Improve Startup Indexing Time #4633

Open
SirTyson opened this issue Jan 29, 2025 · 2 comments · May be fixed by #4634
@SirTyson (Contributor)

When starting fresh after new-db, core first downloads Buckets, reads them once to verify the hash, then reads them all again to construct the BucketIndex. We should combine the index and verify steps, since startup is mostly disk bound. This imposes no additional DOS risk: if a History Archive provider is malicious, they could zip bomb us anyway as an OOM attack vector.
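
For illustration, here is a minimal, self-contained sketch of the single-pass idea in generic C++ (not stellar-core's actual classes or APIs); `StreamingHasher`, `IndexBuilder`, and `verifyAndIndex` are hypothetical stand-ins, with FNV-1a standing in for the real SHA-256 hash, just to show verification and indexing consuming the same read buffer in one pass over the file:

```cpp
// Sketch only: stream the downloaded bucket file from disk once and feed
// each chunk to both the hash-verification step and the index-building step.

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct StreamingHasher // placeholder for the real streaming SHA-256 hasher
{
    std::uint64_t state = 14695981039346656037ull; // FNV-1a offset basis
    void add(char const* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i)
        {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ull; // FNV-1a prime
        }
    }
};

struct IndexBuilder // placeholder for real BucketIndex construction
{
    std::vector<std::streamoff> offsets;
    void add(char const* /*data*/, std::size_t /*len*/, std::streamoff offset)
    {
        // Real code would parse bucket entries and record key -> offset;
        // here we only record where each chunk started.
        offsets.push_back(offset);
    }
};

// Read the file exactly once; hash and index from the same buffer.
bool verifyAndIndex(std::string const& path, std::uint64_t expectedHash,
                    IndexBuilder& index)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        return false;
    }
    StreamingHasher hasher;
    std::vector<char> buf(1 << 20); // 1 MiB read buffer
    std::streamoff offset = 0;
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0)
    {
        auto n = static_cast<std::size_t>(in.gcount());
        hasher.add(buf.data(), n);        // verification work
        index.add(buf.data(), n, offset); // indexing work, same pass
        offset += static_cast<std::streamoff>(n);
    }
    return hasher.state == expectedHash; // caller rejects the bucket on mismatch
}
```

The point is simply that the disk read, which dominates startup time, happens once instead of twice.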

@SirTyson self-assigned this Jan 29, 2025
@SirTyson changed the title from Improve Startup Time to Improve Startup Indexing Time Jan 29, 2025
@SirTyson linked a pull request (#4634) Jan 29, 2025 that will close this issue
@MonsieurNicolas (Contributor)

A couple of comments:

  • A zip bomb today does not cause an OOM, but rather a temporary (potentially full) disk space issue that gets resolved on retry.
  • On some systems, a process eating up all RAM may take the whole system down; the OOM killer may not be fast enough to catch the issue, and other processes will fail in not-so-deterministic ways (because virtual memory is at capacity). I think I've seen some systems lock up entirely (and need to be rebooted via the AWS console).

Net is: a proper analysis is probably needed, plus enforcing some sort of upper bound "just in case".

@SirTyson (Contributor, Author)

Leaving the full analysis off here because it's a bit of a DOS angle, but I ran the numbers and the worst-case index attack is as follows:

  • 100 GB worst-case bucket = 2.04 GB index
  • 150 GB worst-case bucket = 4.6 GB index
  • 200 GB worst-case bucket = 8.18 GB index

I think it might be reasonable to put a 100 GB hard limit on unzipped Buckets. If an unzipped Bucket is over the limit, we treat it as invalid and throw before we start the hashing or indexing process.
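
For illustration, a minimal sketch (not stellar-core code) of such a pre-check; `checkBucketSizeOrThrow`, the constant name, and the exception type are hypothetical placeholders assumed for this sketch, with the 100 GB figure taken from the discussion above:

```cpp
// Sketch only: reject an unzipped bucket that exceeds a hard size limit
// before any hashing or indexing work begins.

#include <cstdint>
#include <filesystem>
#include <stdexcept>

// 100 GB cap on the size of an unzipped bucket file (decimal GB here).
constexpr std::uintmax_t MAX_UNZIPPED_BUCKET_BYTES =
    100ull * 1000 * 1000 * 1000;

void
checkBucketSizeOrThrow(std::filesystem::path const& bucketPath)
{
    auto size = std::filesystem::file_size(bucketPath);
    if (size > MAX_UNZIPPED_BUCKET_BYTES)
    {
        throw std::invalid_argument(
            "unzipped bucket exceeds hard size limit, rejecting as invalid: " +
            bucketPath.string());
    }
}
```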
