Reintroduce slice fusion #11638
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0, 15 suites ±0, 4h 30m 42s ⏱️ +2m 56s
Results for commit 27a5441. ± Comparison against base commit 7f97e68.
♻️ This comment has been updated with latest results.
I'm open to adding new APIs as long as they are somewhat general. I wouldn't want to add a Is there any other API you have in mind that would be useful to have?
cc @hendrikmakait who's thinking about slimming down fuse and possibly changing its internal representation
I honestly don't know what a good API here would look like. This all feels like a horrible abstraction leak. We might consider just living with the ugly thing here? Rebuilding the list of nested tasks just puts the ugliness somewhere else.
I find the fuse/unfuse to not be horrible. In fact, I could see this being useful outside of this application, e.g. if we encountered two sequential fused tasks and wanted to fuse those more efficiently like...
"""Optimize slices

1. Fuse repeated slices, like x[5:][2:6] -> x[7:11]

This is generally not very important since we are fusing those tasks anyway. There
is one specific exception to how xarray implements opening netcdf files and subsequent
slices. Not merging them together can cause reading the whole netcdf file before
we drop the unnecessary data. Fusing slices avoids that pattern.

See https://github.com/pydata/xarray/issues/9926
Can we somehow confirm this is actually working now? A manual confirmation is good enough for me at this point
I tested this manually with netcdf files, but the test also validates this behavior more generally
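For readers following the thread, the rule quoted in the docstring (x[5:][2:6] -> x[7:11]) can be sketched in plain Python. This is only an illustration of the arithmetic, not the code in this PR: the function name `fuse_slices` is hypothetical, and the sketch assumes step-1 slices with non-negative (or missing) bounds.

```python
def fuse_slices(first: slice, second: slice) -> slice:
    """Return a slice s such that x[first][second] == x[s].

    Hypothetical helper for illustration. Only handles step-1 slices
    with non-negative (or missing) bounds.
    """
    for s in (first, second):
        if s.step not in (None, 1):
            raise NotImplementedError("only step-1 slices are handled")
        if any(b is not None and b < 0 for b in (s.start, s.stop)):
            raise NotImplementedError("negative bounds are not handled")

    # The second slice is expressed relative to the start of the first one.
    offset = first.start or 0
    start = offset + (second.start or 0)

    if second.stop is None:
        stop = first.stop
    elif first.stop is None:
        stop = offset + second.stop
    else:
        # The fused slice can never reach past the end of the first slice.
        stop = min(first.stop, offset + second.stop)
    return slice(start, stop)


x = list(range(20))
fused = fuse_slices(slice(5, None), slice(2, 6))
print(fused)                      # slice(7, 11, None)
print(x[5:][2:6] == x[fused])     # True
```

Applied to the xarray case, a single fused slice means the read can be restricted to the requested region instead of materialising the outer slice first.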
pre-commit run --all-files
This is mostly what the old implementation did...
I still have to add more docs and probably more tests too, but the general idea is there. We are only looking at fused tasks, since they might contain the pattern that caused pydata/xarray#9926.
The way we currently modify these fused tasks isn't great, though: we take the dictionary that defines the fused task and replace the fused getitem tasks with an alias before modifying the parent task with the new slice, i.e.
We might want to consider adding an API for this instead of modifying the dictionary in place? cc @fjetter for thoughts?
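To make the alias-and-rewrite idea concrete, here is a rough sketch on a plain dictionary of tuple-style tasks. It is an assumption-laden illustration rather than the PR's implementation: it does not use dask's actual fused-task representation, the names `fuse_child_into_parent`, `_compose` and `evaluate` are hypothetical, and it assumes the parent getitem has no other consumers. It returns a new dictionary instead of mutating the input, which is one possible shape for the API discussed above.

```python
from operator import getitem


def _compose(first: slice, second: slice) -> slice:
    # Assumes step-1 slices with non-negative or missing bounds.
    offset = first.start or 0
    start = offset + (second.start or 0)
    candidates = [first.stop]
    if second.stop is not None:
        candidates.append(offset + second.stop)
    stops = [s for s in candidates if s is not None]
    return slice(start, min(stops) if stops else None)


def fuse_child_into_parent(graph: dict, parent: str, child: str) -> dict:
    """Fold the child's slice into the parent and turn the child into an alias.

    Only valid if nothing else consumes ``parent``, since its output changes.
    """
    _, source, first = graph[parent]
    _, _, second = graph[child]
    new_graph = dict(graph)  # leave the input dictionary untouched
    new_graph[parent] = (getitem, source, _compose(first, second))
    new_graph[child] = parent  # alias: the child now just forwards the parent
    return new_graph


def evaluate(graph: dict, key: str):
    """Tiny recursive evaluator for this toy graph format."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, arg, index = task
        return func(evaluate(graph, arg), index)
    if isinstance(task, str) and task in graph:
        return evaluate(graph, task)  # follow the alias
    return task  # literal data


graph = {
    "x": list(range(20)),
    "a": (getitem, "x", slice(5, None)),
    "b": (getitem, "a", slice(2, 6)),
}
fused = fuse_child_into_parent(graph, "a", "b")
print(fused["a"])            # getitem on "x" with slice(7, 11, None)
print(evaluate(fused, "b"))  # [7, 8, 9, 10]
```

Returning a copy keeps the original task definition intact, at the cost of rebuilding the dictionary for every fusion.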