-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multisymbol RLE implementation #1052
Conversation
708cfaa
to
7d9f08b
Compare
can you rebase? |
@proppy I've rebased it on to the newest main |
Can you detail a little a bit more in the PR description (or the |
I'll update |
Curious why there is three transaction here, and two separate packets for |
xls/modules/rle/rle_enc_adv.x
Outdated
|
||
// This encoder is implemented as a net of 4 processes. | ||
// 1. Reduce stage - this process takes incoming symbols and symbol_valid | ||
// and reduces them into symbol count pairs. This stage is stateless. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the stage is stateless, does it need to be a proc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand DSLX if I want to perform channel operations I have to use proc's.
xls/modules/rle/rle_enc_adv.x
Outdated
// This encoder is implemented as a net of 4 processes. | ||
// 1. Reduce stage - this process takes incoming symbols and symbol_valid | ||
// and reduces them into symbol count pairs. This stage is stateless. | ||
// 2. Realign stage - this process moves pairs emitted from previous stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a note describing the state for each "stage" when they have one?
xls/modules/rle/rle_enc_adv.x
Outdated
|
||
// This encoder is implemented as a net of 4 processes. | ||
// 1. Reduce stage - this process takes incoming symbols and symbol_valid | ||
// and reduces them into symbol count pairs. This stage is stateless. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this stage equivalent to the current rle_enc
can the implementation be shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see any obvious code that can be shared between reduce step
and rle_enc
.
Main difference is that rle_enc
code has to take into account state and update it f necessary,
while reduce step doesn't have one.
xls/modules/rle/rle_enc_adv.x
Outdated
// 1. Reduce stage - this process takes incoming symbols and symbol_valid | ||
// and reduces them into symbol count pairs. This stage is stateless. | ||
// 2. Realign stage - this process moves pairs emitted from previous stage | ||
// so that they are align to the left, it also calculates propagation distance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give an example to show 'align to the left` refer to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let the input to 1 step be [A, A, B, B]
.
Output from the first step will be [..., (A,2), ..., (B,2)]
.
2 step will transform it to [(A,2), (B,2), ..., ...]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the comment below can you also explain what ...
refers to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here ...
was used for an invalid data symbol, counter pair.
xls/modules/rle/rle_enc_adv.x
Outdated
// 1. Reduce stage - this process takes incoming symbols and symbol_valid | ||
// and reduces them into symbol count pairs. This stage is stateless. | ||
// 2. Realign stage - this process moves pairs emitted from previous stage | ||
// so that they are align to the left, it also calculates propagation distance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give an example of a propagation distance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume that count is 2 bit wide and input can ingest 4 symbols at once.
Let the input be [A,A,A,A]
, so after the second stage it will look like:
[(A,3), (A,1), ..., ...]
and propagation distance will be 1.
For input [A,B,B,B]
second step will emit [(A,1), (B,3), ..., ...]
with propagation distance 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the comment below, can you update the comment with a textual explanation of what the propagation distance refer to? (here is seems that it mainly indicate with a packet has been broken or not)
xls/modules/rle/rle_enc_adv.x
Outdated
// 2. Realign stage - this process moves pairs emitted from previous stage | ||
// so that they are align to the left, it also calculates propagation distance | ||
// for the first pair. | ||
// 3. Core stage - this stage is stateful. It takes align pairs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document what the state is?
xls/modules/rle/rle_enc_adv.x
Outdated
// so that they are align to the left, it also calculates propagation distance | ||
// for the first pair. | ||
// 3. Core stage - this stage is stateful. It takes align pairs, | ||
// and combines them with its state.It outputs multiple symbol/count pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give an example of the output to show how it different from the received input?
xls/modules/rle/rle_enc_adv.x
Outdated
// for the first pair. | ||
// 3. Core stage - this stage is stateful. It takes align pairs, | ||
// and combines them with its state.It outputs multiple symbol/count pairs. | ||
// 4 - Adjust Width stage - this stage takes output from the core stage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/4 - /4./
xls/modules/rle/rle_enc_adv.x
Outdated
// This encoder is implemented as a net of 4 processes. | ||
// 1. Reduce stage - this process takes incoming symbols and symbol_valid | ||
// and reduces them into symbol count pairs. This stage is stateless. | ||
// 2. Realign stage - this process moves pairs emitted from previous stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it takes input from the preceding "stage"? if yes, maybe we should state it explicitly.
xls/modules/rle/rle_enc_adv.x
Outdated
// 2. Realign stage - this process moves pairs emitted from previous stage | ||
// so that they are align to the left, it also calculates propagation distance | ||
// for the first pair. | ||
// 3. Core stage - this stage is stateful. It takes align pairs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it takes input from the preceding "stage"? if yes, maybe we should state it explicitly.
// 3. Core stage - this stage is stateful. It takes align pairs, | ||
// and combines them with its state.It outputs multiple symbol/count pairs. | ||
// 4 - Adjust Width stage - this stage takes output from the core stage. | ||
// If output can handle more or equal number of pairs as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the "stage" know that the "output can handle more or equal the number of pairs", can you give an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output and input widths are parametrized INPUT_WIDTH
and OUTPUT_WIDTH
.
If OUTPUT_WIDTH
is greater or equal to INPUT_WIDTH
, adjust width step has nothing to adjust.
xls/modules/rle/rle_enc_adv.x
Outdated
// If output can handle more or equal number of pairs as | ||
// input number of symbols. This stage does nothing. | ||
// If the output is narrower than the input, | ||
// this stage will serialize symbol counter pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this stage will serialize symbol counter pairs.
as it will split them to match the size of the desired output? can you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let input width be 4 and output with is 2. Core step can emit the following : [(A,1), (B,1) , (A,1), (B,1)]
.
Adjust Width step will serialize it to [(A,1(, (B,1)], [(A,1), (B,1)]
This was with an assumption that RLE counter is 2 bit wide, so it can only hold at most 3 symbols, while C was repeated 4 times. |
c9daa7d
to
5a7ead3
Compare
6bb2500
to
981283c
Compare
@@ -63,8 +63,8 @@ struct RunLengthDecoderState<SYMBOL_WIDTH: u32, COUNT_WIDTH: u32> { | |||
} | |||
// RLE decoder implementation | |||
pub proc RunLengthDecoder<SYMBOL_WIDTH: u32, COUNT_WIDTH: u32> { | |||
input_r: chan<DecInData<SYMBOL_WIDTH, COUNT_WIDTH>> in; | |||
output_s: chan<DecOutData<SYMBOL_WIDTH>> out; | |||
input_r: chan<DecInData<SYMBOL_WIDTH, COUNT_WIDTH, 1>> in; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be more friendly to parameterize with max_count
and use https://google.github.io/xls/dslx_std/#stdclog2 + 1 to represent the width?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember there are parts of code that assume max_count
to be 2^k-1.
I'm not sure when user would like to set max_count to be something other than power of 2 -1.
// The behavior of the encoder is presented on the waveform below: | ||
|
||
// This encoder is implemented as a net of 4 proc. | ||
// 1. Reduce proc - this process takes incoming symbols and symbol_valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 1.
effectively equivalent to the existing RLE enc implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not exactly the same thing as existing RLE enc. Existing RLE has state that is combined with incoming data,
this stage doesn't have state, it only works on data visible in a single channel payload and doesn't include stream history.
// distance for the first pair. | ||
// Example behaviours: | ||
// 1) | ||
// input: [.., (A, 2), .., (B, 2)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does ..
means here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to mark invalid data and data, count pairs.
// and reduces them into symbol count pairs. This step is stateless. | ||
// 2. Realign proc - this process moves pairs emitted from the reduce step | ||
// so that they are aligned to the left, it also calculates propagation | ||
// distance for the first pair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also explain what "propagation distance" mean here? (it's not clear, at least to me, from just reading the example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here propagation distance is the number of consecutive pairs, starting form first pair, that have the same symbol value as the first one and have their counter maxed out. This way 'Core proc' can only check "propagation distance"th pair value and counter.
- If value differs from the internal state, then all pairs from [0:"propagation distance") will differ and core should combine payload with its state.
- If value is equal internal state, then "Core proc" only checks if internal state counter + "propagation distance"th pair counter overflow.
// output: [(A, 2), (B, 2), .., ..] | ||
// propagation distance: 0 | ||
// 2) | ||
// input: [.., .., (A, 3), (A, 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(A, 3), (A, 1)
can this actually happen? wouldn't the previous step simply emit (A, 4)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the examples I was assuming counter 2 bit wide, so that's why there is (A, 3), (A, 1)
and not (A, 4)
.
// 3. Core proc - this step is stateful. It takes align pairs from | ||
// the realign step, and combines them with its state to create multiple | ||
// symbol/count pairs output. State is represented by following tuple | ||
// `<symbol, count, last>`. It contains symbol and count from last pair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it effectively means that we're counting again? is that right to assume that if each transaction only include one symbol this does nothing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first proc with a state that holds stream history. If a stream would only have one data value in all symbols.
First processing state would reduce incoming data stream into (data_value, input_count)
, this stage would combine this pair with one that preceded it, and only emit output pair when either internal counter overflows or stream ends.
||| | ||
|-----|-------| | ||
|input|output | | ||
|[(A, True), (A, True), (A, True), (A, True)]|[.., .., (A, 3), (A, 1)]| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the boolean after the symbol in the example corresponding to !last
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catch this. I've forgotten to put last in this example.
This boolean value next to the symbol is data valid value.
I'll update this example to include last value
repeating symbols to the output stream that contains the symbols and | ||
the number of its consecutive occurrences in the input stream. | ||
|
||
Overall, we can break down the data processing into four stages: reduction, alignment, compression, and output generation. The division of responsibility allowed the specialized blocks to efficiently process data and made it possible test each functionality separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we measure the overhead of splitting the process into different proc in term of additional registers and wires, compared to a single proc implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked how much more resource does multi process implementation uses over single process implementation, and how max frequency is affected by this.
xls/modules/rle/rle_enc_adv.md
Outdated
|
||
### Process | ||
1. Reduce step - this process takes incoming symbols and symbol_valid | ||
and reduces them into symbol count pairs. This step is stateless. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't stateless proc be in theory implemented as fn
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All state less processes are effectively a fn
s, but as far as I know you can't put recv/send
operations into fn
s.
Using fn
would potentially reduce fmax, as first proc would have to do all the processing before it would be able to update it's state, or we would reduce pipeline throughput when using II>1 to improve fmax.
## Encoder processing pipeline detailed breakdown. | ||
|
||
### Initial conditions | ||
- input width is 4 symbols wide, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this effectively assume that the input bit-width is always a multiple of the symbols size? i.e: you couldn't have block with a u9
input bus that process repeating u2
symbol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, implementation assumes that the input is always x symbols wide.
Handling u9 to u2 conversion should be done by additional proc.
As it will know how to handle that extra one bit,
if that bit is a part of symbol that was split on u9 boundary, or it has some other meaning.
981283c
to
e4e50e9
Compare
This commit adds valid signals to PlainData interface. CompressedData interface uses `count > 0` to define valid symbol count pair. This is in preparation for a multisymbol RLE encoder implementation. Signed-off-by: Maciej Dudek <[email protected]>
This encoder is capable of ingesting multiple symbols and produces multiple compressed pairs. It should offer faster compression in exchange for area used. Signed-off-by: Maciej Dudek <[email protected]>
Signed-off-by: Maciej Dudek <[email protected]>
Signed-off-by: Maciej Dudek <[email protected]>
Signed-off-by: Maciej Dudek <[email protected]>
Signed-off-by: Maciej Dudek <[email protected]>
Signed-off-by: Maciej Dudek <[email protected]>
e4e50e9
to
d28d81b
Compare
Should we close this in favor of the work on #1211 ? |
@tmichalak closing this now that the more relevant #1315 has been merged. |
This PR adds more advanced implementation of RunLengthEncoder that is capable of ingesting
multiple symbols at once and produce multiple pairs in output.
This PR depends on #1006
It updates implementations of simple encoder and simple decoder to work with new
channel description.
Main differentiating factor is that original RLE allows for symbol width to be adjusted.
This implementation adds ability to process multiple symbols at once.
Let's take a base RLE and multisybol RLE with input that takes up to 4 symbols and outputs up to 2 pairs
and following sequence:
AABCCCCB
A, A, B, C, C, C, C, B
.Generating output takes require 5 transactions:
(2,A), (1,B), (3,C), (1,C), (1,B)
AABC, CCCB
.Generating output takes only 3 transactions:
[(2,A), (1,B)], [(3,C), (1,C)], [(1,B)]
Multisybol RL encoder should have higher throughput but wt will use more logic to implement.