Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from dill to cloudpickle #1164

Merged
merged 12 commits into from
Dec 19, 2023
Merged

Switch from dill to cloudpickle #1164

merged 12 commits into from
Dec 19, 2023

Conversation

jan-janssen
Copy link
Member

Primarily for consistency with our other developments

jan-janssen and others added 3 commits July 12, 2023 08:01
Primarily for consistency with our other developments
@pmrv
Copy link
Contributor

pmrv commented Jul 13, 2023

Won't this break existing values dilled into HDF? At least we should have a fallback when cloud-unpickling fails.

@jan-janssen
Copy link
Member Author

So far I have not found anything that does not work, two examples:

import dill
import cloudpickle
test_dict = {"a": 1, "b": 2}
cloudpickle.loads(dill.dumps(test_dict))

and

import dill
import cloudpickle
from ase.build import bulk
s = bulk("Al", cubic=True)
cloudpickle.loads(dill.dumps(s))

Still I agree this change is potentially dangerous and we should discuss very carefully when and how we want to do the migration. Nevertheless, given the advantages of pickle_by_value and pickle_by_reference that cloudpickle offers, I believe we have to do this switch sooner or later.

@pmrv
Copy link
Contributor

pmrv commented Jul 13, 2023

Ah, I wasn't aware that they are interoperable. Is that something that they promise or does it just happen to be? Either way, moving to cloudpickle is a-ok with me, as long as we put a backwards test in.

@niklassiemer
Copy link
Member

Would it harm to

try:
    cloudpickle.loads(hdf_item)
except (IDontKnowWhichError):
    dill.loads(hdf_item)

for a while? In each case I agree, it is good to use one framework to do such jobs :)

@pmrv
Copy link
Contributor

pmrv commented Jul 17, 2023

So I haven't found any mention of the interoperability on the cloudpickle README. Maybe it's obvious when looking at the implementation, but barring that I suggest we add explicit tests that it works and a backwards compatibility test for tables. I think that's the only location where we use pickle/dill/cloud explicitly for long term storage. Then we can go ahead and swap it out.

@Leimeroth
Copy link
Member

I think that's the only location where we use pickle/dill/cloud explicitly for long term storage. Then we can go ahead and swap it out.

From the cloudpickle github:

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

This sounds like it will cause massive trouble with old objects at some point in the future.

@jan-janssen
Copy link
Member Author

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

The same applies for pickle and dill - in general serialized objects are not guaranteed to remain compatible.

@stale
Copy link

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 12, 2023
@stale stale bot removed the stale label Aug 14, 2023
@jan-janssen jan-janssen added the help wanted Extra attention is needed label Aug 14, 2023
@stale
Copy link

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 17, 2023
@jan-janssen jan-janssen removed the stale label Dec 17, 2023
@jan-janssen jan-janssen added the format_black reformat the code using the black standard label Dec 17, 2023
@jan-janssen
Copy link
Member Author

@niklassiemer and @pmrv : I finally got back to this pull request and would like to merge it till the end of this year. As suggested by @pmrv I added two backwards compatibility tests based on the data_mining notebook in pyiron_atomistics https://github.com/pyiron/pyiron_atomistics/blob/main/notebooks/data_mining.ipynb . While the the system functions can be correctly unpickled by cloudpickle is they were initially pickled with dill, this does not apply to the user defined functions. These raise a ModuleNotFoundError error, so I followed @niklassiemer recommendation to fall back to dill if cloudpickle is not able to restore the object.

From my perspective this pull request is now ready to be reviewed again.

@jan-janssen jan-janssen requested a review from pmrv December 18, 2023 16:28
Copy link
Contributor

@pmrv pmrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've browsed cloudpickles discussion wrt long term storage a bit and it seems that the statement concerning "cloudpickle is not meant for long-term storage" is mainly related to the fact that they use the highest pickle protocol by default and are hence potentially not backwards compatible. However for us that should mean as long as we do specify the pickling protocol we should be good.

@jan-janssen jan-janssen merged commit e359d95 into main Dec 19, 2023
23 of 24 checks passed
@delete-merged-branch delete-merged-branch bot deleted the cloudpickel branch December 19, 2023 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format_black reformat the code using the black standard help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants