feat(duckdb): add to_json() to Table and duckdb backend #10681
I think I'd rather have
I think we should avoid this and instead test that round-tripping produces semantically equivalent pyarrow tables or dataframes. For example, two JSONL files with the same rows in a different order are equivalent, but comparing their on-disk representation would indicate they were different.
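To make the point concrete, here is a minimal sketch using only the standard library. The helper names `read_jsonl` and `same_rows` are hypothetical and not part of ibis; an actual test would more likely compare pyarrow tables or dataframes after sorting.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Parse a JSONL file into a list of dict rows (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def same_rows(a, b):
    """Compare two lists of rows as multisets, ignoring row order."""
    def canon(row):
        # Serialize with sorted keys so key order does not matter either.
        return json.dumps(row, sort_keys=True)

    return Counter(map(canon, a)) == Counter(map(canon, b))


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.jsonl"
    second = Path(tmp) / "second.jsonl"
    # Same rows, written in opposite order.
    first.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    second.write_text(
        "\n".join(json.dumps(r) for r in reversed(rows)) + "\n", encoding="utf-8"
    )

    bytes_equal = first.read_bytes() == second.read_bytes()
    rows_equal = same_rows(read_jsonl(first), read_jsonl(second))
```

Here the byte-level comparison reports a difference while the row-level comparison does not, which is why asserting on file contents would make the test brittle.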
Sounds good to me. I'll just keep this as
Want me to switch from pandas (as I do here) to pyarrow?
I added I also actually implemented all the duckdb-specific options of compression, dateformat, and timestampformat. I didn't add any tests though since it felt like overkill. I explicitly did NOT add the
The failing CI looks like an unrelated flake; it didn't fail in this previous CI run.
DataFrames are fine if that's what you started with. No need to redo.
if dateformat:
    opts += f", DATEFORMAT '{dateformat}'"
if timestampformat:
    opts += f", TIMESTAMPFORMAT '{timestampformat}'"
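For context, a self-contained sketch of how an option string like the one in this hunk could be assembled. The function names `quote_literal` and `build_copy_options`, the `FORMAT JSON, ARRAY true` prefix, and the quoting of the compression value are assumptions for illustration, not the PR's actual code. Unlike the hunk, this version doubles embedded single quotes so a format string cannot break out of the SQL literal.

```python
from typing import Optional


def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_options(
    compression: Optional[str] = None,
    dateformat: Optional[str] = None,
    timestampformat: Optional[str] = None,
) -> str:
    """Build the parenthesized option list body for a COPY ... TO statement.

    Hypothetical helper: the base options below are assumed, not taken
    from the PR.
    """
    opts = "FORMAT JSON, ARRAY true"
    if compression:
        opts += f", COMPRESSION {quote_literal(compression)}"
    if dateformat:
        opts += f", DATEFORMAT {quote_literal(dateformat)}"
    if timestampformat:
        opts += f", TIMESTAMPFORMAT {quote_literal(timestampformat)}"
    return opts


example = build_copy_options(dateformat="%d/%m/%Y")
```

This is still plain string building; it does not replace generating the statement with a SQL builder.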
Eventually this should be done with sqlglot as much as possible, but this is fine for now.
Ah I see, thanks!
Thanks!
Fixes #10413
I didn't bother adding this to any of the other backends. There is no handy pyarrow.to_json() function that can save the day here. I think we would need to do backend-specific implementations.
Open todos that could come later, but I don't think should be blocking for this PR:
format=Literal["array", "jsonl"] = "array"
being the most likely choice, but not needed until we survey the other backends.
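As a rough illustration of what such a `format` parameter would distinguish, here is a standard-library sketch. The function `dump_rows` is hypothetical and not part of ibis or this PR; it only shows the two output layouts.

```python
import io
import json
from typing import Literal


def dump_rows(rows, buf, format: Literal["array", "jsonl"] = "array") -> None:
    """Write rows either as one JSON array or as one JSON object per line."""
    if format == "array":
        json.dump(rows, buf)
    elif format == "jsonl":
        for row in rows:
            buf.write(json.dumps(row) + "\n")
    else:
        raise ValueError(f"unsupported format: {format!r}")


rows = [{"id": 1}, {"id": 2}]

array_buf = io.StringIO()
dump_rows(rows, array_buf)  # default: a single JSON array

jsonl_buf = io.StringIO()
dump_rows(rows, jsonl_buf, format="jsonl")  # newline-delimited objects
```

Both outputs parse back to the same rows; only the file layout differs.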