Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add dask_awkward wrapper to Correction and CompoundCorrection #219

Merged
merged 6 commits into from
Feb 2, 2024

Conversation

lgray
Copy link
Contributor

@lgray lgray commented Jan 20, 2024

Also add awkward wrapper to CompoundCorrection.

This PR lets us pass dask_awkward.Array into correctionlib corrections.
It does the wrapping of the correction into a delayed object and map_partitions call internally now.

evaluate = sf.evaluate(
    dx,
    1.0,
)

Is significantly cleaner than the map_partitions version.

@lgray lgray changed the title feat: add dask_awkward wrapper to Correction and CompoundCorrection feat: add dask_awkward wrapper to Correction and CompoundCorrection Jan 20, 2024
also make min awkward/dask_awkward versions more easily configurable
@lgray
Copy link
Contributor Author

lgray commented Jan 22, 2024

@nsmith- please review when you have time, thanks!

@nsmith- nsmith- self-requested a review January 22, 2024 17:01
src/correctionlib/highlevel.py Show resolved Hide resolved
src/correctionlib/highlevel.py Show resolved Hide resolved
@lgray
Copy link
Contributor Author

lgray commented Jan 22, 2024

and just to be sure you're fine with the dask.delayed object getting cached (this was to enforce re-use on multiple calls to wrap the correction, otherwise it keeps generating a new key in the graph / more payload).

Mostly just an issue of thread safety, but I don't imagine people using correctionlib in python threads (as opposed to processes) that much.

@nsmith-
Copy link
Collaborator

nsmith- commented Jan 22, 2024

I'm much more scared of attempting to persist the dask.delayed object in the library code. Just having it wrapped is fine I think.

@lgray
Copy link
Contributor Author

lgray commented Jan 22, 2024

Yeah what I've implemented here was more or less what Martin suggested so far as dask usage patterns are concerned. No need to persist if you wrap it in the delayed object. It'll be handled by any scheduler that conforms to the spec. This is also what's being done over in coffea for corrections and ml models after his suggestion.

@nsmith- nsmith- mentioned this pull request Feb 2, 2024
@nsmith- nsmith- merged commit 2ab0000 into cms-nanoAOD:master Feb 2, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants