[BUG] Backwards compatible parquet MAP_KEY_VALUE is not treated properly #12044
Labels
0 - Backlog: In queue waiting for assignment
bug: Something isn't working
cuIO: cuIO issue
libcudf: Affects libcudf (C++/CUDA) code.
Spark: Functionality that helps Spark RAPIDS
Describe the bug
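The parquet specification at https://github.com/apache/parquet-format/blob/master/LogicalTypes.md, when talking about backwards compatibility in maps, says that a group annotated with MAP_KEY_VALUE that is not contained by a MAP-annotated group should be handled as a MAP-annotated group.
The example schema given for this is an optional group annotated with MAP_KEY_VALUE that wraps a single repeated group holding a key field and a value field.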
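To make the rule concrete, here is a minimal Python sketch of how a reader might classify such a group. It is only an illustration under stated assumptions: `SchemaNode`, `is_map_like`, and `describe` are hypothetical names, not libcudf or parquet APIs, and the field names and `int32` types in the sample schema are chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaNode:
    """Hypothetical, simplified stand-in for a Parquet schema element."""
    name: str
    repetition: str                       # "required", "optional", or "repeated"
    converted_type: Optional[str] = None  # e.g. "MAP", "MAP_KEY_VALUE", or None
    physical_type: Optional[str] = None   # set for leaf columns only
    children: List["SchemaNode"] = field(default_factory=list)


def is_map_like(node, parent=None):
    """Return True if `node` should be read as a map (a list of key/value structs)."""
    if node.converted_type == "MAP":
        return True
    # Backward-compatibility case: MAP_KEY_VALUE used in place of MAP,
    # on a group that is not itself contained by a MAP-annotated group.
    if node.converted_type == "MAP_KEY_VALUE":
        return parent is None or parent.converted_type != "MAP"
    return False


def describe(node, parent=None):
    """Render the nested type a reader would produce for `node`."""
    if not node.children:
        return node.physical_type.upper()
    wraps_repeated = (len(node.children) == 1
                      and node.children[0].repetition == "repeated")
    if is_map_like(node, parent) and wraps_repeated:
        entry = node.children[0]
        fields = ", ".join(describe(c, entry) for c in entry.children)
        return f"LIST<STRUCT<{fields}>>"
    fields = ", ".join(describe(c, node) for c in node.children)
    return f"STRUCT<{fields}>"


# Assumed sample schema: a legacy map annotated only with MAP_KEY_VALUE.
entries = SchemaNode("map", "repeated", children=[
    SchemaNode("key", "required", physical_type="int32"),
    SchemaNode("value", "required", physical_type="int32"),
])
legacy_map = SchemaNode("my_map", "optional", "MAP_KEY_VALUE", children=[entries])

# The same layout with the annotation ignored, as a naive reader would see it.
unannotated = SchemaNode("my_map", "optional", children=[entries])

print(describe(legacy_map))
print(describe(unannotated))
```

With the compatibility rule applied, the annotated group is rendered as a list of key/value structs; without it, the same layout falls through to nested structs.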
I created a parquet file and put it in file.zip. It is very similar to the example, but it uses `int32` for both the key and the value. When I read the data back using cuDF I get a schema like `TABLE<STRUCT<STRUCT<INT32, INT32>>>`, but what we want is `TABLE<LIST<STRUCT<INT32, INT32>>>`. Because that first column is a STRUCT and not a LIST, only the first row in the LIST is returned. It looks like pandas is able to read the file correctly.
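The data-loss symptom can be pictured with a toy model in Python. This is not how libcudf decodes the column; it only shows why a struct-shaped result cannot hold more than one entry per row. The sample values and the function names `read_as_list` and `read_as_struct` are made up for illustration.

```python
# Each row of the map column holds zero or more (key, value) entries.
# These values are invented for the example.
rows = [
    [(1, 10), (2, 20), (3, 30)],
    [(4, 40)],
]


def read_as_list(rows):
    """LIST<STRUCT<key, value>>: every entry of every row is kept."""
    return [list(entries) for entries in rows]


def read_as_struct(rows):
    """STRUCT<key, value>: one slot per row, so only the first entry fits."""
    return [entries[0] if entries else None for entries in rows]


print(read_as_list(rows))
print(read_as_struct(rows))
```

In this model the list reading keeps all three entries of the first row, while the struct reading keeps only the first one.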
Additional context
This is probably not a super high priority. It is an odd, rare corner case, at least until a customer hits it.