Add support for maintain_order param in joins #17698
base: branch-25.04
Thanks @Matt711, I think we are doing too much work in the "none" case.
"tests/unit/operations/test_join.py::test_update": 'LazyFrame.update doesn\'t maintain order of either table (ie. maintain_order="none")',
"tests/unit/operations/test_join.py::test_join_preserve_order_left": 'The test includes a join w/ maintain_order="none"',
"tests/unit/streaming/test_streaming.py::test_streaming_generic_left_and_inner_join_from_disk": 'The test includes a join w/ maintain_order="none"',
"tests/unit/streaming/test_streaming_join.py::test_join_null_matches[False]": 'The test includes a join w/ maintain_order="none"',
"tests/unit/streaming/test_streaming_join.py::test_join_null_matches_multiple_keys[False]": 'The test includes a join w/ maintain_order="none"',
"tests/unit/test_cse.py::test_cse_rename_cross_join_5405": 'The test includes a join w/ maintain_order="none"',
Should I open an issue in polars to add maintain_order="left_right" (or some value other than "none") to these tests? That way we wouldn't have to xfail some of them. WDYT?
cc @wence-
Oh maybe, yes
I tested this PR with Polars 1.19 and these tests now pass
I think the problem is related to left joins preserving the order of the left table by default. I see two options:
- If I'm right about left joins, we can handle that case in this PR (i.e. how="left" and maintain_order=None => preserve the left table's order).
- Keep the PR as is and bump the polars version.
Thoughts?
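A minimal plain-Python sketch of the first option, assuming (as suggested above, not verified here) that a left join with `maintain_order=None` keeps the left table's order. `required_order` is a hypothetical helper, not part of polars or cudf-polars: it maps `how` and `maintain_order` to the table whose order the output must follow, and returns `None` when no ordering is required, so the "none" case can skip any reordering work.

```python
from typing import Optional

VALID_ORDERS = {"none", "left", "right", "left_right", "right_left"}


def required_order(how: str, maintain_order: Optional[str]) -> Optional[str]:
    """Hypothetical helper: which order must the join output follow?"""
    # Treat maintain_order=None the same as "none".
    order = "none" if maintain_order is None else maintain_order
    if order not in VALID_ORDERS:
        raise ValueError(f"unknown maintain_order: {order!r}")
    # Assumption from the comment above: a left join with no explicit
    # ordering still preserves the left table's order.
    if order == "none" and how == "left":
        return "left"
    # None means "no guarantee": the caller can skip sorting entirely.
    return None if order == "none" else order
```

With this split, only joins that actually need an order pay for a sort; inner joins with `maintain_order="none"` return `None` and do no extra work.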
Let's bump the version, I think. I would like to be aligned with the most recent polars release before we go into burndown anyway.
These tests still fail after upgrading polars to 1.21. I rechecked, and I was wrong about them passing with 1.19. That makes sense: I think all of these tests contain joins with maintain_order="none" (i.e. no ordering is guaranteed).
So I think we still need to xfail these tests until we (possibly) update those polars tests.
cc @wence-
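To illustrate what "no ordering is guaranteed" means for these tests: with `maintain_order="none"` only the multiset of output rows is defined, so a CPU and a GPU result can both be correct yet differ in row order, and an order-sensitive assertion fails. A plain-Python sketch of an order-insensitive comparison (the function name and the example rows are hypothetical, not taken from the cudf-polars test utilities or the failing tests):

```python
from collections import Counter
from typing import Sequence, Tuple

Row = Tuple[object, ...]


def rows_equal_ignoring_order(got: Sequence[Row], expected: Sequence[Row]) -> bool:
    """True if both results contain the same rows with the same multiplicities."""
    # Counter compares multisets: duplicate rows must appear equally often,
    # but their positions are irrelevant. Rows must be hashable tuples.
    return Counter(got) == Counter(expected)


# Two equally valid outputs of the same unordered join, in different orders.
cpu_rows = [(1, 2, 1), (3, 4, 3), (None, None, 7)]
gpu_rows = [(3, 4, 3), (None, None, 7), (1, 2, 1)]
```

Here `cpu_rows != gpu_rows`, but `rows_equal_ignoring_order(cpu_rows, gpu_rows)` is true.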
Is this PR gated on #17771 going in, if there are Polars version-specific behaviors that we care about?
I think we should get that in first; then I'll take a second look here.
right_left = left.join(right, on="a", how="right", maintain_order="left")
assert_gpu_result_equal(right_left)

right_right = left.join(right, on="a", how="right", maintain_order="right")
We aren't computing the order-preserving right join correctly. See this example:
In [1]: import polars as pl
In [2]: right = pl.LazyFrame(
...: {
...: "a": [1, 4, 3, 7, None, None, 1],
...: "c": [2, 3, 4, 5, 6, 7, 8],
...: "d": [6, None, 7, 8, -1, 2, 4],
...: }
...: )
In [3]: left = pl.LazyFrame(
...: {
...: "a": [1, 2, 3, 1, None],
...: "b": [1, 2, 3, 4, 5],
...: "c": [2, 3, 4, 5, 6],
...: }
...: )
In [4]: left.join(right, on="a", how="right", maintain_order="right").collect(engine="cpu")
Out[4]:
shape: (9, 5)
┌──────┬──────┬──────┬─────────┬──────┐
│ b ┆ c ┆ a ┆ c_right ┆ d │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═════════╪══════╡
│ 1 ┆ 2 ┆ 1 ┆ 2 ┆ 6 │
│ 4 ┆ 5 ┆ 1 ┆ 2 ┆ 6 │
│ null ┆ null ┆ 4 ┆ 3 ┆ null │
│ 3 ┆ 4 ┆ 3 ┆ 4 ┆ 7 │
│ null ┆ null ┆ 7 ┆ 5 ┆ 8 │
│ null ┆ null ┆ null ┆ 6 ┆ -1 │
│ null ┆ null ┆ null ┆ 7 ┆ 2 │
│ 1 ┆ 2 ┆ 1 ┆ 8 ┆ 4 │
│ 4 ┆ 5 ┆ 1 ┆ 8 ┆ 4 │
└──────┴──────┴──────┴─────────┴──────┘
In [5]: left.join(right, on="a", how="right", maintain_order="right").collect(engine="gpu")
Out[5]:
shape: (9, 5)
┌──────┬──────┬──────┬─────────┬──────┐
│ b ┆ c ┆ a ┆ c_right ┆ d │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═════════╪══════╡
│ 1 ┆ 2 ┆ 1 ┆ 2 ┆ 6 │
│ 1 ┆ 2 ┆ 1 ┆ 8 ┆ 4 │
│ 3 ┆ 4 ┆ 3 ┆ 4 ┆ 7 │
│ 4 ┆ 5 ┆ 1 ┆ 2 ┆ 6 │
│ 4 ┆ 5 ┆ 1 ┆ 8 ┆ 4 │
│ null ┆ null ┆ 4 ┆ 3 ┆ null │
│ null ┆ null ┆ 7 ┆ 5 ┆ 8 │
│ null ┆ null ┆ null ┆ 6 ┆ -1 │
│ null ┆ null ┆ null ┆ 7 ┆ 2 │
└──────┴──────┴──────┴─────────┴──────┘
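One common way to get the expected order, sketched in plain Python under stated assumptions: take the (left row, right row) index pairs an unordered join produces and sort them by right row index, breaking ties by left row index. That tie-break matches the CPU output in `Out[4]`, but whether polars guarantees it is an assumption. `order_right_join_pairs` and `materialize` are hypothetical helpers, not the cudf-polars implementation, and `gpu_pairs` is read off the GPU output in `Out[5]`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One output row of a right join: (left_row, right_row). left_row is None
# when the right row has no match in the left table.
Pair = Tuple[Optional[int], int]


def order_right_join_pairs(pairs: Sequence[Pair]) -> List[Pair]:
    """Sort unordered join pairs into maintain_order="right" order."""
    # Primary key: right row index. Tie-break (one right row matching several
    # left rows): left row index. An unmatched right row appears only once,
    # so the -1 placeholder never competes with a real left index.
    return sorted(pairs, key=lambda p: (p[1], -1 if p[0] is None else p[0]))


def materialize(pairs: Sequence[Pair], left: Dict[str, list], right: Dict[str, list]) -> List[tuple]:
    """Build output rows with columns b, c, a, c_right, d, as in the example."""
    rows = []
    for li, ri in pairs:
        b = None if li is None else left["b"][li]
        c = None if li is None else left["c"][li]
        # The key column "a" comes from the right table in a right join.
        rows.append((b, c, right["a"][ri], right["c"][ri], right["d"][ri]))
    return rows


left = {"a": [1, 2, 3, 1, None], "b": [1, 2, 3, 4, 5], "c": [2, 3, 4, 5, 6]}
right = {
    "a": [1, 4, 3, 7, None, None, 1],
    "c": [2, 3, 4, 5, 6, 7, 8],
    "d": [6, None, 7, 8, -1, 2, 4],
}

# (left_row, right_row) pairs in the order the GPU engine returned them.
gpu_pairs = [(0, 0), (0, 6), (2, 2), (3, 0), (3, 6), (None, 1), (None, 3), (None, 4), (None, 5)]

restored = materialize(order_right_join_pairs(gpu_pairs), left, right)
```

Sorting the pairs by `(right_row, left_row)` turns the GPU row order into the CPU row order shown in `Out[4]`.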
Description
Closes #17696
Checklist