Optimize trace generation #1558
Additional context: #1533 (comment)
I think a good target here would be to get main trace generation to run at 50 MHz or more on something like an M1, and a stretch goal could be 100 MHz. I think this should be doable, but it would require quite a bit of investigation and potentially some refactoring.
One of the approaches to investigate here is to completely split execution and trace generation. So, we'd have two components: an executor and a set of trace generators.
The main idea is that the executor would run in a single thread while multiple trace generators could run concurrently in multiple threads. The key question is which parts of the state the trace generators would need in order to generate the trace. Here is my initial stab at this:
Assuming trace generation is at least 10x more expensive than execution, we should be able to get close to a 5x-10x speedup on high-end laptops, and even more on server-grade machines (i.e., the more cores we can throw at it, the faster we'll generate the trace).
Currently, trace generation (both main and auxiliary) takes roughly 10% of the total proving time. We should benchmark and profile this step in order to get an idea of the bottlenecks.
Witness generation is an inherently sequential process, but there are still strategies one could explore to improve things. One such approach is multi-stage building of the main trace through several passes, where, for example, the first pass fills the minimal set of trace values needed to then fill the hasher chiplet in parallel.