Chunk dependency tracing: Re-compute only necessary chunks in Cubed plan #645

Open
TomNicholas opened this issue Dec 13, 2024 · 1 comment
Labels
enhancement New feature or request icechunk 🧊

Comments

@TomNicholas
Member

Concept

Icechunk solves the problem of handling incremental updates to Zarr stores, meaning that users can transparently track changes to datasets at the chunk level. Real-world data pipelines often compute some aggregate result from one or more entire input datasets, but currently, if you change just one chunk in a store, you likely have to recompute from the entire dataset to get the new result. This is potentially massively wasteful if only part of the new result actually depends on chunks that changed since the last version of the input dataset.

Can we use Cubed to automatically re-compute only the output chunks that actually depend on the updated input chunks?

This would be an extremely powerful optimization: in the extreme case, the differences in the re-computed result might depend on only 1 or 2 updated chunks in the original dataset, so only the few output chunks that depend on them need to be re-computed instead of the entire result.

Cubed's plan potentially contains enough information to trace back from each chunk of the desired result to the input chunks it actually depends on.
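To make the tracing idea concrete, here is a minimal sketch in plain Python. It does not use Cubed's plan format or API: the `deps` mapping, the chunk keys, and the helper names `required_inputs` and `stale_outputs` are all hypothetical, and the toy pipeline (eight input chunks reduced in two stages) is only an assumption for illustration.

```python
# Hypothetical chunk-level dependency map: each key is a chunk, and each value
# lists the chunks it is computed from. This is NOT Cubed's plan format.
deps = {
    ("mean", 0): [("partial", 0), ("partial", 1)],
    ("mean", 1): [("partial", 2), ("partial", 3)],
    ("partial", 0): [("input", 0), ("input", 1)],
    ("partial", 1): [("input", 2), ("input", 3)],
    ("partial", 2): [("input", 4), ("input", 5)],
    ("partial", 3): [("input", 6), ("input", 7)],
}


def required_inputs(deps, target):
    """Walk backwards from `target` and return the source chunks it depends on."""
    seen = set()
    sources = set()
    stack = [target]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        parents = deps.get(key, [])
        if not parents:
            # No recorded dependencies: treat this key as an input chunk.
            sources.add(key)
        stack.extend(parents)
    return sources


def stale_outputs(deps, outputs, changed):
    """Return the output chunks whose required inputs overlap `changed`."""
    return [out for out in outputs if required_inputs(deps, out) & changed]


outputs = [("mean", 0), ("mean", 1)]
changed = {("input", 5)}  # e.g. chunks reported as modified between two versions
stale = stale_outputs(deps, outputs, changed)
print(stale)
```

In this toy case only `("mean", 1)` traces back to the changed chunk, so `("mean", 0)` could be reused from the previous result. How the set of changed chunks would actually be obtained from Icechunk is left open here.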

cc @rabernat (whose idea this was) @tomwhite @sharkinsspatial

@TomNicholas TomNicholas added enhancement New feature or request icechunk 🧊 labels Dec 13, 2024
@dcherian

hehe, funnily enough, it seems like you could reverse the graph so that inputs become outputs, apply a selection to the inputs to isolate the chunks that have changed, call cull, then reverse the graph back.
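A rough sketch of that reverse, select, cull, reverse sequence on a toy task graph, in plain Python. The `reverse` and `cull` helpers below are hypothetical stand-ins written for this example, not Cubed functions; `cull` is assumed to mean pruning a graph down to what is reachable from a requested set of keys, which is one reading of the comment.

```python
def reverse(graph):
    """Flip every edge: key -> dependencies becomes dependency -> dependents."""
    flipped = {key: [] for key in graph}
    for key, parents in graph.items():
        for parent in parents:
            flipped.setdefault(parent, []).append(key)
    return flipped


def cull(graph, keys):
    """Keep only the nodes reachable from `keys` by following the graph's edges."""
    keep = {}
    stack = list(keys)
    while stack:
        key = stack.pop()
        if key in keep:
            continue
        keep[key] = list(graph.get(key, []))
        stack.extend(keep[key])
    return keep


# Toy graph in the usual direction: each task lists what it is computed from.
graph = {
    "in0": [], "in1": [], "in2": [], "in3": [],
    "sum01": ["in0", "in1"],
    "sum23": ["in2", "in3"],
    "out_a": ["sum01"],
    "out_b": ["sum23"],
    "total": ["sum01", "sum23"],
}

changed = {"in3"}                             # selection: inputs that changed
downstream = cull(reverse(graph), changed)    # everything affected by them
pruned = reverse(downstream)                  # back to the usual direction

# Tasks to re-run are the affected nodes that are not themselves inputs.
to_rerun = sorted(key for key in pruned if key not in changed)
print(to_rerun)
```

Note that the pruned graph only keeps edges between affected nodes. A task such as `sum23` still needs its unchanged dependency `in2`, which would have to be looked up in the original graph and read from storage rather than recomputed.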
