
Update README.md #785

Open · wants to merge 1 commit into master
Conversation

@folivetti (Author)

Added a tips section to the README on how to set the hyperparameters, based on recent experience running PySR with multiple datasets. Feel free to make edits or reject the pull request if you think it is not appropriate for the README.
@MilesCranmer (Owner)

Maybe this should go on the tuning page instead? https://ai.damtp.cam.ac.uk/pysr/tuning/

```python
juliapkg.require_julia("~1.10")
```

2. Another memory issue can happen when using a large `maxsize` parameter **(Miles: is there any explanation for that?)**. Since the main usage of PySR is the discovery of scientific equations, this value should be kept small anyway. Anything beyond $50$ seems to create a significant slowdown and memory usage.
@MilesCranmer (Owner)

> Anything beyond $50$ seems to create a significant slowdown and memory usage.

I think we can turn this off now. It was basically only there because some beginners were running with like 10,000 maxsize, so I wanted it to warn them 😄

@folivetti (Author)

$10,000$ 😶 lol, I wonder why

2. Another memory issue can happen when using a large `maxsize` parameter **(Miles: is there any explanation for that?)**. Since the main usage of PySR is the discovery of scientific equations, this value should be kept small anyway. Anything beyond $50$ seems to create a significant slowdown and memory usage.
3. Using a single population often makes the algorithm unstable, with high variance in the results. A good starting value for this parameter is $10$.
4. It can be good practice to set `optimizer_nrestarts` to something larger than $1$, depending on the computational budget. The error minimization for nonlinear regression models is multimodal, and multiple restarts may be required to assess the quality of the equation.
5. The default model selection may not work well in many situations. It may be worth implementing your own, or using the following code to predict with the most accurate equation on the Pareto front:
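The selection rule described in tip 5 can be sketched minimally as follows. The `(complexity, loss)` pairs below are made-up for illustration; with PySR you would read them from the fitted model's equation table instead.

```python
# Minimal sketch: "most accurate on the Pareto front" = smallest loss,
# regardless of complexity. Pairs are (complexity, loss), made-up values.
pareto_front = [
    (1, 2.50),
    (5, 0.80),
    (9, 0.12),
    (13, 0.11),
]

# Pick the equation with the minimum loss.
most_accurate = min(pareto_front, key=lambda eq: eq[1])
```

This ignores the usual complexity/accuracy trade-off entirely, which is exactly what the tip suggests when the default score-based selection misbehaves.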
@MilesCranmer (Owner)

> The default model selection may not work well in many situations. It may be worth implementing your own, or using the following code to predict with the most accurate equation on the Pareto front:

You can do `model.model_selection = "accuracy"` for this btw

@folivetti (Author)

It didn't work for me, and I don't know why. But I can rerun some experiments to see if it was related to using a single population instead of multiple islands. I'll let you know.

@folivetti (Author)

OK, just tested here again. The instability issue was due to using `populations=1`; `model_selection = "accuracy"` works.
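Collecting the tips from this thread, a working starting configuration might look like the sketch below. The parameter names follow the `PySRRegressor` API; the values are only illustrative starting points from this discussion, not recommendations from the PySR docs.

```python
# Hypothetical starting configuration based on the tips in this thread.
pysr_kwargs = {
    "populations": 10,              # a single population was unstable
    "optimizer_nrestarts": 3,       # more than 1 restart, budget permitting
    "maxsize": 30,                  # keep equations small
    "model_selection": "accuracy",  # predict with the most accurate equation
}
# from pysr import PySRRegressor           # assumes pysr is installed
# model = PySRRegressor(**pysr_kwargs)
```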


### Tips

1. When running PySR with Julia `1.11`, the process seems to run into a memory leak bug that halts execution by exceeding the available memory. Forcing version `1.10` can help avoid this problem:
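A defensive version of the pin above can be sketched as follows. This assumes the `juliapkg` package that PySR uses to manage its Julia installation; the call must run before the first `import pysr`, which is when the Julia version gets resolved.

```python
# Sketch: pin Julia to the 1.10.x series before the first `import pysr`.
JULIA_CONSTRAINT = "~1.10"  # semver "tilde" range: >=1.10.0, <1.11.0

try:
    import juliapkg
    juliapkg.require_julia(JULIA_CONSTRAINT)
except ImportError:
    pass  # juliapkg not available in this environment
```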
@MilesCranmer (Owner)

This memory leak is a Julia bug and will be fixed once 1.11.3 is released (as well as 1.10.8 hopefully) – JuliaLang/julia#56801. So we probably don't need to provide this guidance as it will only be temporary until the new Julia is out?

@folivetti (Author)

Then maybe put it as a highlighted issue (same as the "have you used PySR in your paper" notice) while this is not fixed.

@MilesCranmer (Owner)

Good idea!

@folivetti (Author)

> Maybe this should go on the tuning page instead? https://ai.damtp.cam.ac.uk/pysr/tuning/

I think it would be more appropriate there, yes.
But let me suggest highlighting the existence of that page in the README :-D I went through the example and didn't notice the link.

@MilesCranmer (Owner)

It's linked in the last sentence of the Quickstart: https://github.com/MilesCranmer/PySR?tab=readme-ov-file#quickstart

> There are also tips for tuning PySR on this page.

But I guess people miss this. How should it be made more prominent? Maybe a special "docs/tuning" badge at the top of the README or something?

@folivetti (Author)

> It's linked in the last sentence of the Quickstart: https://github.com/MilesCranmer/PySR?tab=readme-ov-file#quickstart
>
> There are also tips for tuning PySR on this page.
>
> But I guess people miss this. How should it be made more prominent? Maybe a special "docs/tuning" badge at the top of the README or something?

yes!!!
