[FEA] parquet: rle_stream for dictionary pages #14950

Open
abellina opened this issue Feb 1, 2024 · 2 comments
Labels: cuIO (cuIO issue), feature request (New feature or request), Performance (Performance related issue), Spark (Functionality that helps Spark RAPIDS)

Comments

@abellina (Contributor) commented Feb 1, 2024

I've been looking at the rle_stream class in order to decode dictionary streams, in addition to repetition streams, in the Parquet decoder. This is a component of the work that @nvdbaranec has done in #13622, where we'd like to separate out at least a "fixed width" and a "fixed width dictionary encoded" pair of kernels.

With the changes in rle_stream, the core of the logic is able to use more threads for RLE stream decoding. Specifically, a first warp is in charge of generating a set of runs, and the other warps each take one of those runs and decode it in parallel. As part of the micro-kernel work, we feel that focusing on the rle_stream decoder and its effects on gpuComputeStringPageBounds, gpuComputePageSizes, and the new fixed-width kernels is a good first step toward getting the micro-kernel work merged.
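For illustration only, here is a minimal, self-contained sketch of that warp split, assuming a simplified stream of pre-parsed (count, value) runs rather than Parquet's actual RLE/bit-packed hybrid encoding; the kernel and struct names are hypothetical and this is not cudf's rle_stream implementation. Warp 0 publishes the runs and their output offsets, then all warps expand runs in parallel:

```cuda
// Hypothetical, simplified sketch of the warp-specialization idea described
// above: warp 0 walks the run stream sequentially and publishes runs plus
// output offsets to shared memory, then all warps expand runs in parallel.
#include <cstdio>
#include <cuda_runtime.h>

struct Run { int count; int value; int out_offset; };

constexpr int MAX_RUNS   = 64;
constexpr int BLOCK_SIZE = 128;  // 4 warps

__global__ void decode_rle_runs(int2 const* encoded, int num_runs, int* out)
{
  __shared__ Run runs[MAX_RUNS];
  __shared__ int shared_num_runs;

  int const warp_id = threadIdx.x / 32;
  int const lane    = threadIdx.x % 32;

  // Producer: one thread of warp 0 parses runs and computes output offsets.
  if (warp_id == 0 && lane == 0) {
    int offset = 0;
    int n      = min(num_runs, MAX_RUNS);
    for (int i = 0; i < n; ++i) {
      runs[i] = Run{encoded[i].x, encoded[i].y, offset};
      offset += encoded[i].x;
    }
    shared_num_runs = n;
  }
  __syncthreads();

  // Consumers: all warps grab runs round-robin and expand them in parallel.
  int const num_warps = blockDim.x / 32;
  for (int r = warp_id; r < shared_num_runs; r += num_warps) {
    Run const run = runs[r];
    for (int i = lane; i < run.count; i += 32) {
      out[run.out_offset + i] = run.value;
    }
  }
}

int main()
{
  // Three runs: 5 x 7, 100 x 3, 2 x 9  -> 107 output values.
  int2 h_encoded[] = {{5, 7}, {100, 3}, {2, 9}};
  int2* d_encoded;
  int* d_out;
  cudaMalloc(&d_encoded, sizeof(h_encoded));
  cudaMalloc(&d_out, 107 * sizeof(int));
  cudaMemcpy(d_encoded, h_encoded, sizeof(h_encoded), cudaMemcpyHostToDevice);

  decode_rle_runs<<<1, BLOCK_SIZE>>>(d_encoded, 3, d_out);

  int h_out[107];
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("out[0]=%d out[10]=%d out[106]=%d\n", h_out[0], h_out[10], h_out[106]);
  cudaFree(d_encoded);
  cudaFree(d_out);
  return 0;
}
```

The single-thread parse stands in for the real decoder's run detection; the point is only the producer/consumer split between the first warp and the decoding warps.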

This issue, then, is to get a new rle_stream into cuDF that can handle both repetition AND dictionary streams, and to show that performance is the same as or better than what we have now. We hope that having this decoder will help centralize code and clean up the Parquet code base.

@abellina abellina added feature request New feature or request Needs Triage Need team to review and classify cuIO cuIO issue Performance Performance related issue Spark Functionality that helps Spark RAPIDS labels Feb 1, 2024
@abellina abellina self-assigned this Feb 1, 2024
@GregoryKimball GregoryKimball moved this to In progress in libcudf Feb 1, 2024
@etseidl (Contributor) commented Feb 1, 2024

It would be interesting to compare the rle_stream approach to dictionary decoding with the approach in totalDictEntriesSize. The latter makes use of all warps for decoding work and doesn't suffer from load-balancing problems between warps, but it might be harder to save state and pick up again in a batch-processing application.
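For context on the state-saving point, here is a hypothetical host-side sketch (using the same simplified run format as the earlier sketch, not cudf's actual API) of what a batch decoder has to carry between calls so it can stop and resume in the middle of a run:

```cuda
// Hypothetical sketch of carried decoder state for batch processing: the
// decoder emits a fixed-size batch per call and must remember which run it
// was in and how much of it was already consumed. Not cudf's actual API.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Run { int count; int value; };

struct DecoderState {
  size_t run_idx = 0;  // which run we are currently in
  int consumed   = 0;  // values already emitted from that run
};

// Decode up to `batch_size` values, resuming from `state`.
// Returns the number of values written.
int decode_batch(std::vector<Run> const& runs, DecoderState& state,
                 int* out, int batch_size)
{
  int written = 0;
  while (written < batch_size && state.run_idx < runs.size()) {
    Run const& r  = runs[state.run_idx];
    int remaining = r.count - state.consumed;
    int n         = std::min(remaining, batch_size - written);
    for (int i = 0; i < n; ++i) out[written + i] = r.value;
    written        += n;
    state.consumed += n;
    if (state.consumed == r.count) { state.run_idx++; state.consumed = 0; }
  }
  return written;
}

int main()
{
  std::vector<Run> runs{{5, 7}, {100, 3}, {2, 9}};
  DecoderState state;
  int out[32];
  int n1 = decode_batch(runs, state, out, 32);  // stops partway through run 1
  int n2 = decode_batch(runs, state, out, 32);  // resumes where it left off
  printf("batch1=%d batch2=%d\n", n1, n2);
  return 0;
}
```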

@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
@abellina abellina removed their assignment Aug 22, 2024
@pmattione-nvidia (Contributor) commented:

This work is essentially done, except for one corner case: if there is a single RLE run that is extremely long, one warp is busy decoding it while the other warps sit idle. When we detect this case, we should split the work among the warps to better balance the load.
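A minimal sketch of that load-balancing fix, again assuming the simplified run representation and illustrative names rather than cudf's actual kernels: every warp expands a contiguous slice of the long run instead of one warp expanding the whole thing.

```cuda
// Hypothetical sketch of the long-run corner case described above: when one
// run dominates the page, split it into one contiguous slice per warp so no
// warp sits idle. Names and sizes are illustrative only.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void expand_long_run(int value, int count, int* out)
{
  int const warp_id   = threadIdx.x / 32;
  int const lane      = threadIdx.x % 32;
  int const num_warps = blockDim.x / 32;

  // Give each warp a contiguous slice of the run.
  int const per_warp = (count + num_warps - 1) / num_warps;
  int const begin    = warp_id * per_warp;
  int const end      = min(begin + per_warp, count);

  for (int i = begin + lane; i < end; i += 32) { out[i] = value; }
}

int main()
{
  constexpr int count = 1 << 20;  // one very long run
  int* d_out;
  cudaMalloc(&d_out, count * sizeof(int));
  expand_long_run<<<1, 128>>>(42, count, d_out);  // 4 warps share the run

  int last;
  cudaMemcpy(&last, d_out + count - 1, sizeof(int), cudaMemcpyDeviceToHost);
  printf("out[last]=%d\n", last);
  cudaFree(d_out);
  return 0;
}
```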
