Update README.md #785

folivetti · 2024-12-18T20:09:05Z

Added a tips section to the README on how to set the hyper-parameters, based on recent experience running PySR on multiple datasets. Feel free to edit it, or reject the pull request if you think it is not appropriate for the README.

MilesCranmer · 2024-12-18T20:10:31Z

Maybe this should go on the tuning page instead? https://ai.damtp.cam.ac.uk/pysr/tuning/

MilesCranmer · 2024-12-18T20:12:13Z

README.md

+juliapkg.require_julia("~1.10")
+```
+
+2. Another memory issue can happen if using a large enough `maxsize` parameter **(Miles: is there any explanation for that?)**. Since the main usage of PySR is for the discovery of scientific equations, this value should be set small anyway. Anything beyond $50$ seems to create a significant slowdown and memory usage.


Anything beyond $50$ seems to create a significant slowdown and memory usage.

I think we can turn this off now. It was basically only there because some beginners were running with like 10,000 maxsize, so I wanted it to warn them 😄

$10,000$ 😶 lol, I wonder why

MilesCranmer · 2024-12-18T20:13:16Z

README.md

+2. Another memory issue can happen if using a large enough `maxsize` parameter **(Miles: is there any explanation for that?)**. Since the main usage of PySR is for the discovery of scientific equations, this value should be set small anyway. Anything beyond $50$ seems to create a significant slowdown and memory usage.
+3. Using a single population often makes the algorithm unstable, with a high variance on the results. A good enough starting value for this parameter is $10$.
+4. It can be a good practice to set `optimizer_nrestarts` to something larger than $1$, depending on the computational budget. The minimization of error for nonlinear regression models is multimodal and multiple restarts may be required to assess the quality of the equation.
+5. The default model selection may not work well in many situations. It may be worth to implement your own, or use the following code to predict using the most accurate in the Pareto front:


The default model selection may not work well in many situations. It may be worth to implement your own, or use the following code to predict using the most accurate in the Pareto front:

You can do `model.model_selection = "accuracy"` for this, btw.
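For readers unfamiliar with the option, here is a minimal sketch of what picking "the most accurate in the Pareto front" amounts to: choosing the candidate with the lowest loss, regardless of complexity. The rows are made-up placeholders, and the `complexity`, `loss`, and `equation` field names are an assumption meant to mirror the columns PySR reports; none of it comes from this thread or from PySR's internals.

```python
# Hypothetical Pareto front: one candidate per complexity level (made-up values).
pareto_front = [
    {"complexity": 1, "loss": 2.50, "equation": "x0"},
    {"complexity": 3, "loss": 0.90, "equation": "x0 * x1"},
    {"complexity": 5, "loss": 0.12, "equation": "x0 * x1 + 1.5"},
    {"complexity": 7, "loss": 0.11, "equation": "x0 * x1 + cos(x1)"},
]


def most_accurate(front):
    """Return the candidate with the lowest loss, ignoring complexity."""
    return min(front, key=lambda row: row["loss"])


chosen = most_accurate(pareto_front)
print(chosen["equation"])
```

Note that this rule always favours the lowest loss, so it tends to return the most complex expressions on the front.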

It didn't work for me; I don't know why. But I can rerun some experiments to see whether it was related to using a single population instead of multiple islands. I'll let you know.

OK, just tested again. The instability was due to using `populations=1`; `model_selection = "accuracy"` works.
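Putting the points from this review together, the settings could be collected roughly as follows. This is only a sketch: `suggested_settings` and `build_model` are hypothetical helper names, `populations=10` is the starting value from the proposed tips, and the `maxsize=30` and four-restart values are assumptions chosen to respect "well below 50" and "larger than 1", not recommendations made in the thread.

```python
def suggested_settings(n_restarts=4):
    """Collect the settings discussed above as PySRRegressor keyword arguments.

    The concrete numbers for maxsize and n_restarts are illustrative assumptions.
    """
    if n_restarts < 1:
        raise ValueError("n_restarts must be at least 1")
    return {
        "populations": 10,                  # more than one population, for stability
        "maxsize": 30,                      # assumed value, kept well below 50
        "optimizer_nrestarts": n_restarts,  # several restarts for multimodal fits
        "model_selection": "accuracy",      # pick the lowest-loss equation
    }


def build_model(**overrides):
    """Build a regressor from the settings; requires PySR to be installed."""
    # Imported lazily so the settings can be inspected without a Julia install.
    from pysr import PySRRegressor

    return PySRRegressor(**{**suggested_settings(), **overrides})


settings = suggested_settings()
print(settings)
```

With PySR installed, `build_model(niterations=40).fit(X, y)` would then run a search with these values; any keyword passed to `build_model` overrides the sketch's defaults.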

MilesCranmer · 2024-12-18T20:14:54Z

README.md

+
+### Tips
+
+1. When running PySR with `julia 1.11`, the process seems to run into a memory leak bug that halts the execution by exceeding the available memory. Forcing the version `1.10` can help avoiding such problem:


This memory leak is a Julia bug and will be fixed once 1.11.3 is released (as well as 1.10.8 hopefully) – JuliaLang/julia#56801. So we probably don't need to provide this guidance as it will only be temporary until the new Julia is out?

Then maybe post it as a highlighted issue (same as the "have you used PySR in your paper" one) until this is fixed.

folivetti · 2024-12-18T20:15:54Z

Maybe this should go on the tuning page instead? https://ai.damtp.cam.ac.uk/pysr/tuning/

I think it would be more appropriate there, yes.
But let me suggest highlighting the existence of this page in the README :-D I went through the example and didn't notice the link.

MilesCranmer · 2024-12-18T20:17:59Z

It's linked in the last sentence of the Quickstart: https://github.com/MilesCranmer/PySR?tab=readme-ov-file#quickstart

There are also tips for tuning PySR on this page.

But I guess people miss this. How should it be made more prominent? Maybe a special "docs/tuning" badge at the top of the README or something?

folivetti · 2024-12-18T20:18:26Z

yes!!!

Update README.md

06a34f2

MilesCranmer reviewed Dec 18, 2024


