[FEA] Update chunked parquet reader benchmarks to include pass_read_limit #15057

Open
3 tasks
GregoryKimball opened this issue Feb 14, 2024 · 1 comment
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark

Comments

@GregoryKimball

Is your feature request related to a problem? Please describe.
The BM_parquet_read_chunks benchmark in benchmarks/io/parquet/parquet_reader_input.cpp includes a byte_limit nvbench axis. This axis controls the chunk_read_limit. With the new features added in #14360, there is a new chunked_parquet_reader API that exposes both chunk_read_limit and pass_read_limit parameters to control reader behavior. We currently do not have a method for benchmarking pass_read_limit values.

Describe the solution you'd like

  • Add a new benchmark, such as BM_parquet_read_subrowgroup_chunks, that provides nvbench axes for both chunk_read_limit and pass_read_limit.
  • Rename byte_limit to chunk_read_limit in BM_parquet_read_chunks for clarity, now that we have both input and output byte limits in chunked parquet reading.
  • Also, please consider adding an nvbench axis for data_size for at least the chunked parquet reader benchmarks. It would be useful to allow the benchmarks to operate on tables larger than 536 MB.
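
To make the requested coverage concrete, here is a minimal sketch of the configuration grid the three proposed axes would produce. It uses only the C++ standard library, so it is not the benchmark itself: `sweep_point`, `make_sweep`, and `describe` are hypothetical names introduced for illustration, and any axis values passed in are placeholders rather than values taken from the existing benchmarks. It assumes that a limit of 0 means "no limit" for both chunk_read_limit and pass_read_limit, and that the benchmark would visit every combination of axis values.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One point in the proposed benchmark grid. Field names mirror the issue text;
// the struct itself is hypothetical and is not part of libcudf or nvbench.
struct sweep_point {
  std::int64_t data_size;         // bytes of table data to generate
  std::int64_t chunk_read_limit;  // output byte limit per chunk (assumed: 0 = no limit)
  std::int64_t pass_read_limit;   // input byte limit per pass (assumed: 0 = no limit)
};

// Enumerate every combination of the three axes, which is the set of
// configurations a benchmark with three integer axes would need to run.
std::vector<sweep_point> make_sweep(std::vector<std::int64_t> const& data_sizes,
                                    std::vector<std::int64_t> const& chunk_limits,
                                    std::vector<std::int64_t> const& pass_limits)
{
  std::vector<sweep_point> points;
  points.reserve(data_sizes.size() * chunk_limits.size() * pass_limits.size());
  for (auto const size : data_sizes) {
    for (auto const chunk : chunk_limits) {
      for (auto const pass : pass_limits) {
        points.push_back({size, chunk, pass});
      }
    }
  }
  return points;
}

// Build a readable label for one configuration, printing 0 as "unlimited".
std::string describe(sweep_point const& p)
{
  auto const fmt = [](std::int64_t v) {
    return v == 0 ? std::string{"unlimited"} : std::to_string(v);
  };
  return "data_size=" + std::to_string(p.data_size) +
         " chunk_read_limit=" + fmt(p.chunk_read_limit) +
         " pass_read_limit=" + fmt(p.pass_read_limit);
}
```

For example, two data sizes, two chunk limits, and three pass limits yield 12 configurations, which gives a rough sense of how quickly the run time grows as axis values are added.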

Describe alternatives you've considered
n/a

@GregoryKimball added the feature request, 0 - Backlog, libcudf, cuIO, and Spark labels on Feb 14, 2024
@GregoryKimball moved this to Needs owner in libcudf on Feb 14, 2024
@abellina assigned and unassigned abellina on Jul 23, 2024
@abellina

@sdrp713 will take this on.
