Switch from dill to cloudpickle #1164
Primarily for consistency with our other developments
Won't this break existing values dilled into HDF? At the very least we should have a fallback for when cloud-unpickling fails.
So far I have not found anything that does not work, two examples:
and
Still I agree this change is potentially dangerous and we should discuss very carefully when and how we want to do the migration. Nevertheless, given the advantages of |
Ah, I wasn't aware that they are interoperable. Is that something they promise, or does it just happen to be the case? Either way, moving to cloudpickle is a-ok with me, as long as we put a backwards compatibility test in.
Would it harm to try:
    try:
        cloudpickle.loads(hdf_item)
    except (IDontKnowWhichError):
        dill.loads(hdf_item)
for a while? In any case I agree, it is good to use one framework for such jobs :)
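The fallback idea above could be sketched roughly as follows. This is only an illustration, not code from this pull request: the helper name `loads_with_fallback` is hypothetical, and because the thread leaves open which exception cloudpickle raises, the sketch catches any `Exception` and only gives up after every loader has failed.

```python
import pickle


def loads_with_fallback(payload, loaders):
    """Try each loader in order and return the first successful result.

    `loaders` is a sequence of callables such as
    (cloudpickle.loads, dill.loads); the name and signature of this
    helper are assumptions made for illustration only.
    """
    errors = []
    for loader in loaders:
        try:
            return loader(payload)
        except Exception as exc:  # which error to expect is not settled in the thread
            errors.append(exc)
    raise pickle.UnpicklingError(
        "all loaders failed: " + "; ".join(repr(e) for e in errors)
    )


# Intended use during a migration period (requires both packages):
#   value = loads_with_fallback(hdf_item, (cloudpickle.loads, dill.loads))
```

Passing the loaders in explicitly keeps the order of preference visible at the call site and makes it easy to drop the dill fallback once old data has been migrated.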
So I haven't found any mention of the interoperability in the cloudpickle README. Maybe it's obvious when looking at the implementation, but barring that I suggest we add explicit tests that it works and a backwards compatibility test for tables. I think that's the only location where we use pickle/dill/cloudpickle explicitly for long-term storage. Then we can go ahead and swap it out.
From the cloudpickle github:
This sounds like it will cause massive trouble with old objects at some point in the future. |
The same applies for |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
# Conflicts: # .ci_support/environment.yml # setup.py
@niklassiemer and @pmrv: I finally got back to this pull request and would like to merge it by the end of this year. As suggested by @pmrv, I added two backwards compatibility tests based on the data_mining notebook in pyiron_atomistics https://github.com/pyiron/pyiron_atomistics/blob/main/notebooks/data_mining.ipynb . While the system functions can be correctly unpickled by cloudpickle if they were initially pickled with dill, this does not apply to user-defined functions. These raise an error on loading. From my perspective this pull request is now ready to be reviewed again.
I've browsed cloudpickle's discussion of long-term storage a bit, and it seems that the statement "cloudpickle is not meant for long-term storage" mainly relates to the fact that it uses the highest pickle protocol by default, so its output is potentially not backwards compatible. For us that should mean that as long as we specify the pickling protocol explicitly, we should be good.
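A minimal sketch of what pinning the protocol could look like. The value 4 and the helper name `dumps_pinned` are assumptions chosen for illustration, not something decided in this thread; the sketch falls back to the standard `pickle` module when cloudpickle is not installed, since both accept a `protocol` argument in `dumps`.

```python
import pickle

try:
    import cloudpickle as pickler
except ImportError:  # stand-in so the sketch runs without cloudpickle
    pickler = pickle

# Hypothetical project-wide constant: pin the protocol instead of relying
# on the library default, so older Python versions can still read the data.
PINNED_PROTOCOL = 4


def dumps_pinned(obj):
    """Serialize `obj` with an explicitly pinned pickle protocol."""
    return pickler.dumps(obj, protocol=PINNED_PROTOCOL)


payload = dumps_pinned({"a": 1})
# Streams written with protocol 2 or higher start with the PROTO opcode
# (0x80) followed by the protocol number, so the pin can be checked.
assert payload[:2] == bytes([0x80, PINNED_PROTOCOL])
```

Pinning only addresses the protocol version. It does not by itself guarantee that functions pickled by value remain loadable across library or Python versions.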