Improve Startup Indexing Time #4633

Open
SirTyson opened this issue Jan 29, 2025 · 2 comments · May be fixed by #4634
@SirTyson (Contributor)

When starting fresh after new-db, core first downloads Buckets, reads them once to verify the hash, then reads them all again to construct the BucketIndex. We should combine the index and verify steps, since startup is mostly disk bound. This imposes no additional DOS risk: if a History Archive provider is malicious, they could zip bomb us anyway as an OOM attack vector.
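
For illustration, here is a minimal, self-contained sketch of the single-pass idea in generic C++ (not stellar-core's actual classes or APIs); `StreamingHasher`, `IndexBuilder`, and `verifyAndIndex` are hypothetical stand-ins, with FNV-1a standing in for the real SHA-256 hash, just to show verification and indexing consuming the same read buffer in one pass over the file:

```cpp
// Sketch only: stream the downloaded bucket file from disk once and feed
// each chunk to both the hash-verification step and the index-building step.

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct StreamingHasher // placeholder for the real streaming SHA-256 hasher
{
    std::uint64_t state = 14695981039346656037ull; // FNV-1a offset basis
    void add(char const* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i)
        {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ull; // FNV-1a prime
        }
    }
};

struct IndexBuilder // placeholder for real BucketIndex construction
{
    std::vector<std::streamoff> offsets;
    void add(char const* /*data*/, std::size_t /*len*/, std::streamoff offset)
    {
        // Real code would parse bucket entries and record key -> offset;
        // here we only record where each chunk started.
        offsets.push_back(offset);
    }
};

// Read the file exactly once; hash and index from the same buffer.
bool verifyAndIndex(std::string const& path, std::uint64_t expectedHash,
                    IndexBuilder& index)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        return false;
    }
    StreamingHasher hasher;
    std::vector<char> buf(1 << 20); // 1 MiB read buffer
    std::streamoff offset = 0;
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0)
    {
        auto n = static_cast<std::size_t>(in.gcount());
        hasher.add(buf.data(), n);        // verification work
        index.add(buf.data(), n, offset); // indexing work, same pass
        offset += static_cast<std::streamoff>(n);
    }
    return hasher.state == expectedHash; // caller rejects the bucket on mismatch
}
```

The point is simply that the disk read, which dominates startup time, happens once instead of twice.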

@SirTyson self-assigned this Jan 29, 2025
@SirTyson changed the title from Improve Startup Time to Improve Startup Indexing Time Jan 29, 2025
@SirTyson linked a pull request (#4634) Jan 29, 2025 that will close this issue
@MonsieurNicolas (Contributor)

A couple of comments:

  • A zip bomb today does not cause an OOM, but rather a temporary (potentially full) disk space issue that gets resolved on retry.
  • On some systems, a process eating up all RAM may take the whole system down; the OOM killer may not be fast enough to catch the issue, and other processes will fail in not-so-deterministic ways (because virtual memory is at capacity). I think I've seen some systems lock up entirely (and need to be rebooted via the AWS console).

Net is: a proper analysis is probably needed, plus enforcing some sort of upper bound "just in case".

@SirTyson (Contributor, Author)

Leaving the full analysis off here because it's a bit of a DOS angle, but I ran the numbers and the worst-case index attack is as follows:

  • 100 GB worst-case bucket = 2.04 GB index
  • 150 GB worst-case bucket = 4.6 GB index
  • 200 GB worst-case bucket = 8.18 GB index

I think it might be reasonable to put a 100 GB hard limit on unzipped Buckets. If an unzipped Bucket is over the limit, we treat it as invalid and throw before we start the hashing or indexing process.
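
For illustration, a minimal sketch (not stellar-core code) of such a pre-check; `checkBucketSizeOrThrow`, the constant name, and the exception type are hypothetical placeholders assumed for this sketch, with the 100 GB figure taken from the discussion above:

```cpp
// Sketch only: reject an unzipped bucket that exceeds a hard size limit
// before any hashing or indexing work begins.

#include <cstdint>
#include <filesystem>
#include <stdexcept>

// 100 GB cap on the size of an unzipped bucket file (decimal GB here).
constexpr std::uintmax_t MAX_UNZIPPED_BUCKET_BYTES =
    100ull * 1000 * 1000 * 1000;

void
checkBucketSizeOrThrow(std::filesystem::path const& bucketPath)
{
    auto size = std::filesystem::file_size(bucketPath);
    if (size > MAX_UNZIPPED_BUCKET_BYTES)
    {
        throw std::invalid_argument(
            "unzipped bucket exceeds hard size limit, rejecting as invalid: " +
            bucketPath.string());
    }
}
```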
