[FEA] Update chunked parquet reader benchmarks to include `pass_read_limit` #15057

GregoryKimball · 2024-02-14T21:27:40Z

Is your feature request related to a problem? Please describe.
The `BM_parquet_read_chunks` benchmark in `benchmarks/io/parquet/parquet_reader_input.cpp` includes a `byte_limit` nvbench axis, which controls `chunk_read_limit`. With the features added in #14360, the `chunked_parquet_reader` API now exposes both `chunk_read_limit` and `pass_read_limit` parameters to control reader behavior. We currently have no way to benchmark different `pass_read_limit` values.

Describe the solution you'd like

- Add a new benchmark, such as `BM_parquet_read_subrowgroup_chunks`, that provides nvbench axes for both `chunk_read_limit` and `pass_read_limit`.
- Rename `byte_limit` to `chunk_read_limit` in `BM_parquet_read_chunks` for clarity, now that we have both input and output byte limits in chunked parquet reading.
- Also, please consider adding an nvbench axis for `data_size` for at least the chunked parquet reader benchmarks. It would be useful to allow the benchmarks to operate on tables larger than 536 MB.
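
To illustrate the coverage such a benchmark would give: nvbench runs a benchmark once for every combination of its axis values, so two limit axes produce a grid of configurations. Below is a minimal stand-alone sketch of that grid. It does not use nvbench or libcudf. The `read_limits` struct, the `make_limit_grid` helper, and any axis values passed to it are hypothetical and only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one benchmark configuration.
struct read_limits {
  std::int64_t chunk_read_limit;  // output byte limit per returned chunk; 0 means no limit
  std::int64_t pass_read_limit;   // input (read and decompress) byte limit; 0 means no limit
};

// Enumerate every (chunk_read_limit, pass_read_limit) pair, mirroring how
// nvbench runs a benchmark once per combination of its axis values.
std::vector<read_limits> make_limit_grid(std::vector<std::int64_t> const& chunk_limits,
                                         std::vector<std::int64_t> const& pass_limits)
{
  std::vector<read_limits> grid;
  grid.reserve(chunk_limits.size() * pass_limits.size());
  for (auto const chunk : chunk_limits) {
    for (auto const pass : pass_limits) {
      grid.push_back({chunk, pass});
    }
  }
  return grid;
}
```

For example, two `chunk_read_limit` values and three `pass_read_limit` values would yield six benchmark runs, so the axis values should be chosen with total benchmark runtime in mind.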

Describe alternatives you've considered
n/a

abellina · 2024-07-23T20:46:49Z

@sdrp713 will take this on.

GregoryKimball added the feature request, 0 - Backlog, libcudf, cuIO, and Spark labels Feb 14, 2024

GregoryKimball added this to the Parquet continuous improvement milestone Feb 14, 2024

GregoryKimball added this to libcudf Feb 14, 2024

GregoryKimball moved this to Needs owner in libcudf Feb 14, 2024

abellina assigned abellina and unassigned abellina Jul 23, 2024

GregoryKimball assigned GregoryKimball and nvdbaranec and unassigned GregoryKimball Jul 23, 2024

sdrp713 mentioned this issue Aug 13, 2024

Update chunked parquet reader benchmarks #16543

Open

3 tasks

GregoryKimball removed the status in libcudf Aug 16, 2024
