feat(duckdb): add to_json() to Table and duckdb backend #10681

NickCrews · 2025-01-17T05:07:37Z

I didn't bother adding this to any of the other backends. There is no handy pyarrow.to_json() function that can save the day here. I think we would need to do backend-specific implementations.

Open todos that could come later, but I don't think should be blocking for this PR:

- Consider taking some of the duckdb-specific config options and hoisting them to be "official" options across all backends. I could see a format=Literal["array", "jsonl"] = "array" being the most likely choice, but not needed until we survey the other backends.
- For dates/datetimes/decimals, where there is no corresponding JSON type, we need to decide what to do.
- Once we decide, implement that.
- Maybe add more tests to ensure that backends produce the same JSON strings? E.g., maybe the current test implementation, which just reads the file back using pandas, is too lenient, since multiple on-disk values could lead to the same result. Maybe we actually want to look at the on-disk representation.
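To make the dates/datetimes/decimals todo concrete, here is a minimal sketch using only the Python standard library. It shows that the stdlib encoder rejects these types, and then one possible convention (ISO 8601 strings for dates and datetimes, strings for decimals). The helper name `encode_fallback` and the convention itself are assumptions for illustration, not something decided in this PR.

```python
import datetime
import decimal
import json

row = {
    "d": datetime.date(2025, 1, 17),
    "ts": datetime.datetime(2025, 1, 17, 5, 7, 37),
    "amount": decimal.Decimal("12.50"),
}

# JSON has no date, datetime, or decimal type, so the stdlib encoder refuses them.
try:
    json.dumps(row)
    raised = False
except TypeError:
    raised = True


def encode_fallback(value):
    """Hypothetical convention: ISO strings for dates, plain strings for decimals."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        # A string keeps the exact digits; converting to float could lose precision.
        return str(value)
    raise TypeError(f"cannot encode {type(value).__name__}")


encoded = json.dumps(row, default=encode_fallback)
```

Whatever convention is picked, reading the file back yields strings, so a round-trip test has to know which columns to parse back into dates or decimals.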

cpcloud · 2025-01-17T07:54:16Z

> I could see a format=Literal["array", "jsonl"] = "array" being the most likely choice, but not needed until we survey the other backends.

I think I'd rather have to_json and to_jsonl methods. Then there's never an ambiguity about what the word "JSON" means when you're reading the code.
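For readers unfamiliar with the distinction, the sketch below shows the two on-disk shapes being discussed, using only the standard library: a single JSON array versus JSON Lines (one object per line). The function names are hypothetical and are not the ibis API.

```python
import json

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]


def dumps_json_array(rows):
    """Serialize all rows as one JSON document: a top-level array."""
    return json.dumps(rows)


def dumps_jsonl(rows):
    """Serialize each row as its own JSON document, one per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def loads_jsonl(text):
    """Parse JSON Lines text back into a list of rows."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


array_text = dumps_json_array(rows)
jsonl_text = dumps_jsonl(rows)

# A JSONL file is not itself a valid JSON document, so a plain JSON parser rejects it.
try:
    json.loads(jsonl_text)
    jsonl_is_valid_json = True
except json.JSONDecodeError:
    jsonl_is_valid_json = False
```

Because the two formats need different readers, separate method names make it obvious at the call site which one a file contains.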

> Maybe add more tests to ensure that backends produce the same JSON strings?

I think we should avoid this and instead test that round-tripping produces semantically equivalent pyarrow tables or dataframes.

For example, two JSONL files with the same rows in a different order are equivalent, but comparing their on-disk representation would indicate they were different.
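A minimal sketch of that point, using only the standard library rather than pyarrow or pandas: two JSONL payloads that differ in row order (and key order) are different as bytes but equal once compared as an unordered collection of rows. The helper `canonical_rows` is a hypothetical name for illustration.

```python
import json


def canonical_rows(jsonl_text):
    """Parse JSONL and return rows in a canonical, order-independent form.

    Each row is re-serialized with sorted keys and the rows are then sorted,
    so row order and key order no longer matter while duplicate rows are
    still counted.
    """
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return sorted(json.dumps(row, sort_keys=True) for row in rows)


file_a = '{"id": 1, "x": "a"}\n{"id": 2, "x": "b"}\n'
file_b = '{"x": "b", "id": 2}\n{"id": 1, "x": "a"}\n'

same_bytes = file_a == file_b
same_rows = canonical_rows(file_a) == canonical_rows(file_b)
```

A byte-level comparison would flag these two files as different even though they hold the same data, which is why the round-trip comparison is the more robust test.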

NickCrews · 2025-01-18T05:50:04Z

> I think I'd rather have to_json and to_jsonl methods

Sounds good to me. I'll just keep this as to_json for now and NOT expose any format options. We can add to_jsonl later if requested.

> I think we should avoid this and instead test that round-tripping produces semantically equivalent pyarrow tables or dataframes.

Want me to switch from pandas (as I do here) to pyarrow?

Fixes ibis-project#10413

NickCrews · 2025-01-18T06:06:10Z

I added trino as an xfail. Now tests should be passing.

I also implemented all the duckdb-specific options: compression, dateformat, and timestampformat. I didn't add any tests, though, since it felt like overkill.

I explicitly did NOT add the array option, which means users cannot get jsonl through this backdoor. I want that to actually get implemented with the to_jsonl solution when the time comes.
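As a rough illustration of how such options could map onto a DuckDB `COPY ... TO` statement, here is a sketch that builds the SQL string. It is not the code in this PR: the function names are hypothetical, hard-coding `ARRAY true` is an assumption based on the comment above, and the single-quote escaping is an addition that the diff excerpt does not show.

```python
def sql_string(value):
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_to_json(
    query, path, *, compression=None, dateformat=None, timestampformat=None
):
    """Build a DuckDB COPY statement that writes `query` to `path` as a JSON array.

    ARRAY is always set to true, so callers cannot request JSONL through
    this function (assumed behaviour, mirroring the comment above).
    """
    opts = ["FORMAT JSON", "ARRAY true"]
    if compression:
        opts.append(f"COMPRESSION {sql_string(compression)}")
    if dateformat:
        opts.append(f"DATEFORMAT {sql_string(dateformat)}")
    if timestampformat:
        opts.append(f"TIMESTAMPFORMAT {sql_string(timestampformat)}")
    return f"COPY ({query}) TO {sql_string(path)} ({', '.join(opts)})"


statement = build_copy_to_json(
    "SELECT * FROM t", "out.json", dateformat="%d/%m/%Y"
)
```

Collecting options in a list and joining them at the end avoids tracking leading commas by hand; building the statement with sqlglot instead would remove the manual quoting altogether.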

NickCrews · 2025-01-18T06:12:52Z

The failing CI looks like an unrelated flake; it didn't fail in the previous CI run.

cpcloud · 2025-01-19T11:17:35Z

> Want me to switch from pandas (as I do here) to pyarrow?

DataFrames are fine if that's what you started with. No need to redo.

cpcloud · 2025-01-19T11:19:17Z

ibis/backends/duckdb/__init__.py

+        if dateformat:
+            opts += f", DATEFORMAT '{dateformat}'"
+        if timestampformat:
+            opts += f", TIMESTAMPFORMAT '{timestampformat}'"


Eventually this should be done with sqlglot as much as possible, but this is fine for now.

NickCrews · Jan 19, 2025

Ah I see, thanks!

cpcloud

Thanks!

github-actions bot added tests Issues or PRs related to tests duckdb The DuckDB backend labels Jan 17, 2025

NickCrews force-pushed the to_json branch from 167a601 to 3788554 Compare January 18, 2025 05:54

feat(duckdb): add to_json() to Table and duckdb backend

0e2ed8c

Fixes ibis-project#10413

NickCrews force-pushed the to_json branch from 3788554 to 0e2ed8c Compare January 18, 2025 06:02

cpcloud reviewed Jan 19, 2025

cpcloud approved these changes Jan 19, 2025

cpcloud added this to the 10.0 milestone Jan 19, 2025

cpcloud added the feature Features or general enhancements label Jan 19, 2025

cpcloud merged commit b17c28d into ibis-project:main Jan 19, 2025
89 checks passed
