Parquet continuous improvement

No due date. 60% complete.
[QST] Should byte_array_view in parquet reader/writer change
Labels: cuIO, libcudf, question
#11408 opened Jul 29, 2022 by hyperbolic2346
Investigate need for output as binary configuration option
Labels: bug, cuIO, libcudf
#11394 opened Jul 28, 2022 by hyperbolic2346
[FEA] category dtype support in parquet reader
Labels: 0 - Backlog, cuIO, feature request, libcudf
#12497 opened Jan 7, 2023 by mattf
[FEA] Profiling duplicate reading of metadata
Labels: cuIO, feature request, libcudf, Python
#6004 opened Aug 17, 2020 by calebwin
4 tasks
[FEA] Add Parquet and ORC unit tests based on Apache sample files
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark, tests
#13627 opened Jun 27, 2023 by GregoryKimball
[FEA] Increase reader throughput by pipelining IO and compute
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark
#13828 opened Aug 7, 2023 by GregoryKimball
[FEA] Update chunked parquet reader benchmarks to include pass_read_limit
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark
#15057 opened Feb 14, 2024 by GregoryKimball
3 tasks
[FEA] Add GZIP compression support to parquet writer
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark
#14509 opened Nov 28, 2023 by GregoryKimball
[BUG] Misinterpretation of Parquet List schema with single GROUP child named "array"
Labels: 0 - Backlog, bug, cuIO, libcudf
#13313 opened May 9, 2023 by mythrocks
[BUG] String columns written with fastparquet seem to be read incorrectly via CUDF's Parquet reader
Labels: 0 - Backlog, bug, cuIO, libcudf
#14258 opened Oct 5, 2023 by mythrocks
[BUG] CompactProtocolFieldWriter does not write empty value string in key-value pair
Labels: 2 - In Progress, bug, cuIO, libcudf
#14024 opened Aug 31, 2023 by ttnghia
[FEA] Unused variable in JNI/Java binding for readParquet
Labels: 0 - Backlog, feature request, Java
#12031 opened Oct 31, 2022 by ttnghia
[FEA] Add variable bit-width keys and improved key order for Parquet dict pages
Labels: 0 - Backlog, cuIO, feature request, libcudf, Spark
#13995 opened Aug 29, 2023 by abellina
[FEA] Pinned memory pools for parquet decode
Labels: 0 - Blocked, feature request, libcudf, Performance, Spark
#14314 opened Oct 23, 2023 by abellina
[FEA] parquet: rle_stream for dictionary pages
Labels: cuIO, feature request, Performance, Spark
#14950 opened Feb 1, 2024 by abellina
[BUG] Loading a missing column from a Parquet file results in ArrayIndexOutOfBoundsException
Labels: bug, good first issue, Java, Spark
#11278 opened Jul 15, 2022 by jlowe
[BUG] Malformed fixed length byte array Parquet file loads corrupted data instead of error
Labels: 1 - On Deck, bug, cuIO, libcudf, Spark
#14104 opened Sep 13, 2023 by jlowe
[FEA] Enable Page-level filtering based on the ColumnIndex feature from parquet 1.11
Labels: cuIO, feature request, libcudf, Spark
#9269 opened Sep 22, 2021 by revans2
[BUG] Special case Parquet LIST names appear to be ignored
Labels: bug, cuIO, libcudf, Spark
#12043 opened Nov 1, 2022 by revans2
[BUG] Backwards compatible parquet MAP_KEY_VALUE is not treated properly
Labels: 0 - Backlog, bug, cuIO, libcudf, Spark
#12044 opened Nov 1, 2022 by revans2
[BUG] Parquet column selection by name with schemas including list<struct<X, Y>> does not work.
Labels: 0 - Backlog, bug, cuIO, libcudf
#14539 opened Nov 30, 2023 by nvdbaranec
[FEA] The C++ tests for parquet don't test row group selection very well.
Labels: 0 - Backlog, cuIO, feature request, libcudf, tests
#14417 opened Nov 15, 2023 by nvdbaranec
[FEA] Parquet reader: replace skip_rows / num_rows with start_row / end_row
Labels: 0 - Backlog, cuIO, feature request, libcudf
#14465 opened Nov 21, 2023 by nvdbaranec
[FEA] Follow up on refactoring possibility from parquet chunked reader PR
Labels: 1 - On Deck, cuIO, feature request, good first issue, improvement, libcudf
#12143 opened Nov 15, 2022 by nvdbaranec
[FEA] Parquet reader code cleanup, re: nested columns vs columns with lists.
Labels: 0 - Backlog, cuIO, libcudf, proposal
#11793 opened Sep 27, 2022 by nvdbaranec
[FEA] simplify page_state_s and possibly other structures for specialized parquet kernels
Labels: cuIO, feature request, libcudf, Performance
#15267 opened Mar 11, 2024 by abellina
[FEA] Enable using num_rows and skip_rows with ParquetReader
Labels: 0 - Blocked, cuIO, feature request, improvement, non-breaking
#16249 opened Jul 11, 2024 by mhaseeb123
[FEA] Use bloom filters in Parquet reader to filter row groups with equality predicates
Labels: cuco, cuIO, feature request, improvement, libcudf
#17164 opened Oct 24, 2024 by mhaseeb123
[FEA] Reduce the occurrence of uncompressed pages in Parquet writer
Labels: cuIO, feature request, libcudf
#17313 opened Nov 13, 2024 by GregoryKimball
[FEA] Implement shared parquet footer processing for Spark-RAPIDS
Labels: cuIO, feature request, libcudf
#17716 opened Jan 10, 2025 by GregoryKimball