
Project.wrap_python_function does not play nicely with name lengths #1348

Open
liamhuber opened this issue Feb 20, 2024 · 5 comments

@liamhuber
Member

Because the hash is pretty long by itself, the new `wrap_python_function` functionality bumps into our 50-character limit on job name lengths and fails ungracefully:

```python
from pyiron_base import Project

pr = Project("test")
pr.remove_jobs(recursive=True, silently=True)


def f(x):
    return 42


job = pr.wrap_python_function(f)
job.run()  # Works fine


def fffffffffffffffffff(x):
    return 42


job = pr.wrap_python_function(fffffffffffffffffff)
job.run()
# ValueError: Invalid name for a PyIron object: must be less then or equal to 50 characters
```

If we're sure we know the maximum length of the hash (I don't know whether it's fixed), then `PythonFunctionContainerJob.python_function` could either warn that the name will be truncated, or fail hard. In the former case, `PythonFunctionContainerJob.save` also needs to do the truncating.
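
A minimal sketch of the warn-and-truncate variant, assuming the job name is built as `<function name>_<hash>` with a fixed hash width (the constant names and helper below are illustrative, not pyiron's actual code):

```python
import warnings

# Assumed constants -- illustrative, not pyiron_base's actual names:
MAX_JOB_NAME_LENGTH = 50  # the current hard limit
HASH_LENGTH = 32          # hex MD5, see the size discussion below


def check_function_name(function_name: str) -> str:
    """Warn and truncate if '<function>_<hash>' would exceed the limit."""
    budget = MAX_JOB_NAME_LENGTH - HASH_LENGTH - 1  # -1 for the separator
    if len(function_name) > budget:
        warnings.warn(
            f"Function name '{function_name}' is longer than {budget} "
            f"characters and will be truncated in the job name."
        )
        return function_name[:budget]
    return function_name
```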

Otherwise, `PythonFunctionContainerJob.save` should fail more gracefully, e.g. by breaking the failing name apart into the function-name length and the hash length, so the user can see whether it is possible to sufficiently truncate the function name themselves and try again.
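
And a sketch of the graceful-failure alternative, reporting the two contributing lengths separately (again purely illustrative, not the actual `save()` code):

```python
def validate_job_name(function_name: str, hash_str: str, limit: int = 50) -> None:
    """Raise with a length breakdown so the user knows how much to shorten."""
    total = len(function_name) + 1 + len(hash_str)  # +1 for the separator
    if total > limit:
        raise ValueError(
            f"Job name would be {total} characters long "
            f"({len(function_name)} from the function name plus "
            f"{len(hash_str)} from the hash), exceeding the {limit}-character "
            f"limit. Shorten the function name by at least {total - limit} "
            f"characters."
        )
```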

@pmrv
Contributor

pmrv commented Feb 21, 2024

Hashes are of fixed size, so issuing a warning on instantiation would be good. A hexadecimal representation of MD5 takes 32 characters. With base64 we could get it down to 24.
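
For reference, those sizes can be checked with the standard library alone (the hashed payload here is just a stand-in):

```python
import base64
import hashlib

digest = hashlib.md5(b"function source + inputs").digest()  # 16 raw bytes
print(len(digest.hex()))              # 32 -> hexadecimal representation
print(len(base64.b64encode(digest)))  # 24 -> base64, '==' padding included
```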

@jan-janssen
Member

Alternatively, we could increase the limit from 50 to 256 characters; that should keep us safe. Or @XzzX, do you have an idea whether it is possible to remove the length constraint, and what the performance penalty of doing so would be?
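
For illustration, if the job table is defined via SQLAlchemy, either option amounts to changing one column definition; the table and column names below are assumptions, not pyiron_base's actual schema:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()
jobs = Table(
    "jobs",
    metadata,
    Column("id", Integer, primary_key=True),
    # was String(50); String(256) raises the cap, Text would remove it entirely
    Column("job", String(256)),
)
```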

@XzzX
Contributor

XzzX commented Feb 21, 2024

We can remove the length constraint. However, we would then allow users to store arbitrarily long names. From a performance perspective I think we should be fine.

@liamhuber
Member Author

> Alternatively, we could increase the limit from 50 to 256 characters; that should keep us safe.

Only heuristically -- albeit with very high probability. Nonetheless, in this case I would still want a check so that we fail early if it ever gets broken.

> We can remove the length constraint. However, we would then allow users to store arbitrarily long names. From a performance perspective I think we should be fine.

This is also a fine solution from my perspective 🚀

Either way we're going to need a PR that either (a) fails early, or (b) stops using 50 (in the database code, and in the hard failure in `pyiron_base.jobs.job.util`).

@pmrv
Contributor

pmrv commented Feb 21, 2024

According to the PostgreSQL docs there's no performance difference between fixed-length and variable-length strings (modulo storage, which I don't think matters: a hundred million jobs at ~256 bytes per name would only be ~25 GB).

We'll still need to cap it at some maximum because of filesystem limitations (filenames are limited to 255 bytes on ext4).
