Storage capacity estimates #1653

Open
morph-dev opened this issue Jan 29, 2025 · 3 comments
Labels: enhancement (New feature or request), priority, shelf-stable (Will not be closed by stale-bot)

Comments

@morph-dev
Collaborator

morph-dev commented Jan 29, 2025

We are wrongly estimating our storage capacity usage. The goal of this issue is to enumerate all known problems on this topic, so we can start a discussion on how to address them.

We underestimate how much space is used

Currently, we estimate how much disk space is used by adding the lengths of content_id, content_key and content_value. However, there is extra overhead, which I believe is caused by:

  • other columns (rowid, distance_short, content_size)
  • data needed for indexes
  • other "empty" space in DB pages

I strongly believe that this overhead depends on the number of stored items, not on their size. Because the average content size on the state network is a fraction of the average content size on the history network (see data below), our estimate is significantly worse for state network data.
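For illustration, here is a minimal sketch of the gap, assuming the history content lives in the ii1_history table with content_id, content_key and content_value columns (names as they appear later in this issue). The first query reproduces our current length-based estimate; the second reports the bytes actually consumed by the table and its indexes via the dbstat virtual table (requires an SQLite build with dbstat enabled):

```sql
-- Current style of estimate: sum the raw lengths of the stored blobs.
SELECT sum(length(content_id) + length(content_key) + length(content_value))
       AS estimated_bytes
FROM ii1_history;

-- Actual pages consumed by the table and its indexes.
SELECT sum(pgsize) AS actual_bytes
FROM dbstat
WHERE aggregate = TRUE
  AND name IN ('ii1_history',
               'sqlite_autoindex_ii1_history_1',
               'ii1_history_distance_short_idx',
               'ii1_history_content_size_idx');
```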

I extracted the data from 3 different types of prod nodes:

History node (4444 nodes) - actual disk usage 36 183 mb

network  radius  content_count  content_used  content_capacity
-------  ------  -------------  ------------  ----------------
history  1.9%    1 344 329      34 999 mb     35 000 mb

State nodes - actual disk usage 54 198 mb

network  radius  content_count  content_used  content_capacity
-------  ------  -------------  ------------  ----------------
history  1.2%    761 237        17 500 mb     17 500 mb
state    37%     60 700 610     17 498 mb     17 500 mb
beacon   0%      326            434 mb        0 mb

Big storage nodes (21m state) - actual disk usage 639 759 mb

network  radius  content_count  content_used  content_capacity
-------  ------  -------------  ------------  ----------------
history  13%     218 968        5 000 mb      5 000 mb
state    100%    2 019 082 740  392 787 mb    500 000 mb
beacon   0%      129            15 mb         0 mb

We can observe that actual usage in nodes that store both history and state content is significantly worse (up to ~60% extra, e.g. 639 759 mb actual vs. 397 802 mb estimated for the big storage node) than when we only have history network content (only ~3% extra).

Beacon network declares that it doesn't need any space

While beacon clearly uses some disk space, it's unclear how much that is. Also, it never reserves any of the capacity provided via the flag (meaning we just assume it uses zero and distribute the specified capacity between history and state).

I would also highlight that I didn't investigate whether beacon correctly reports its usage (I know there are multiple tables, and I don't know if we are actually reporting stats from all of them).
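One way to start checking would be to sum the beacon-related tables and indexes via dbstat; a sketch below, with the table/index names taken from the dbstat listing later in this thread (the set may be incomplete):

```sql
-- Rough beacon disk usage: total pages of beacon-related tables and indexes.
-- The name list is assumed from the dbstat dump below; it may not cover everything.
SELECT sum(pgsize) AS beacon_bytes
FROM dbstat
WHERE aggregate = TRUE
  AND name IN ('lc_bootstrap',
               'sqlite_autoindex_lc_bootstrap_1',
               'bootstrap_slot_idx',
               'bootstrap_content_size_idx',
               'lc_update',
               'update_size_idx',
               'historical_summaries');
```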

Disk usage never goes down

Something to be tested, but I'm pretty sure that disk usage never goes down, even if we remove a lot of content at once.
My understanding is that sqlite keeps free database pages around, so it can reuse them later.

Forcing the database to free space can be done either with the VACUUM command (which I believe temporarily requires up to twice as much disk space), or by setting auto_vacuum, which reclaims space as it goes (but which I believe only takes effect on a new DB, or on an existing one when followed by a VACUUM).
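For concreteness, the relevant commands (standard SQLite behaviour, not trin-specific):

```sql
-- One-off compaction: rebuilds the database file and returns free pages to
-- the filesystem. Needs extra temporary disk space while the copy is built.
VACUUM;

-- auto_vacuum only takes effect on a brand-new database, or on an existing
-- one once the pragma is followed by a VACUUM.
PRAGMA auto_vacuum = FULL;        -- or INCREMENTAL
VACUUM;

-- With INCREMENTAL, free pages are released only on demand:
PRAGMA incremental_vacuum(1000);  -- free up to 1000 pages
```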

@morph-dev added the enhancement (New feature or request), priority, and shelf-stable (Will not be closed by stale-bot) labels on Jan 29, 2025
@morph-dev
Collaborator Author

morph-dev commented Feb 5, 2025

I did more analysis, using the following data:

  • I estimated disk usage for each table/index by running the following SQL command:
    > SELECT name, pageno, pgsize FROM dbstat WHERE aggregate=TRUE ORDER BY pgsize DESC;
    
    name                             pageno  pgsize    
    -------------------------------  ------  ----------
    ii1_history                      460273  1885278208
    ii1_state                        370773  1518686208
    sqlite_autoindex_ii1_state_1     34092   139640832 
    ii1_state_distance_short_idx     10591   43380736  
    ii1_state_content_size_idx       9154    37494784  
    lc_bootstrap                     2788    11419648  
    sqlite_autoindex_ii1_history_1   1091    4468736   
    ii1_history_distance_short_idx   337     1380352   
    ii1_history_content_size_idx     295     1208320   
    lc_update                        15      61440     
    sqlite_autoindex_lc_bootstrap_1  6       24576     
    bootstrap_slot_idx               3       12288     
    bootstrap_content_size_idx       3       12288     
    sqlite_schema                    1       4096      
    update_size_idx                  1       4096      
    historical_summaries             1       4096      
    store_info                       1       4096      
    sqlite_autoindex_store_info_1    1       4096      
    
    • pageno is the number of pages that the table/index uses, and pgsize is the aggregated size of those pages (each page should be 4096 bytes)
  • I took the estimated usage from trin logs:
    2025-02-05T15:16:12.709757Z  INFO trin_state: reports~ data: radius=5.0% content=1351.6/17500mb #=3039996 disk=3646.9mb; msgs: offers=7025042/7681965, accepts=5354861/5354891, validations=1227428/1227428; cpu=24.8%
    2025-02-05T15:16:12.714019Z  INFO trin_beacon: reports~ data: radius=0.0000% content=7.8/0mb #=138 disk=3646.7mb; msgs: offers=44667/45579, accepts=40824/40824, validations=5728/5818
    2025-02-05T15:16:12.734855Z  INFO trin_history: reports~ data: radius=5.0% content=1855.6/17500mb #=97950 disk=3646.9mb; msgs: offers=50567/51681, accepts=30851/30868, validations=9051/9051
    

I assumed that the estimated usage corresponds only to the usage of the table itself (i.e. that the estimated disk usage of indexes is zero), and came up with the following table (how much we are underestimating per content item):

[image: table of per-content-item underestimates]
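For reference, a sketch of how such a per-item underestimate can be computed directly in SQLite for a single network (history shown; table and column names as above; overhead = actual pages minus summed blob lengths, divided by item count):

```sql
WITH actual AS (
  -- actual bytes used by the table plus all of its indexes
  SELECT sum(pgsize) AS bytes
  FROM dbstat
  WHERE aggregate = TRUE AND name LIKE '%ii1_history%'
),
estimate AS (
  -- our length-based estimate and the number of stored items
  SELECT count(*) AS items,
         sum(length(content_id) + length(content_key) + length(content_value)) AS bytes
  FROM ii1_history
)
SELECT (actual.bytes - estimate.bytes) * 1.0 / estimate.items
       AS overhead_bytes_per_item
FROM actual, estimate;
```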

A few notes:

  • This is extracted from clients with various db sizes:
    • my local node - first 2 rows, db size: 1GB
    • state nodes - next 4 rows, db size: 3.6 GB
    • history nodes - last 2 rows, db size: 35GB
      • These nodes have been running for a very long time, and for a long time they used a different (even less accurate) formula to estimate content size

Note: I ignored storage used by beacon network as it seems significantly smaller in comparison.
@ogenev , do we have an estimate on upper bound on how much data beacon chain can store in the db?

@morph-dev
Collaborator Author

Based on the information from the previous message, I think we can draw the following conclusions:

  • estimate that distance_index and size_index take 16 bytes per item (maybe two 64-bit values?!)
  • estimate that content_key_index takes 48 bytes (32 bytes for content id + 16 bytes for something else?!)
  • Big differences per subnetwork can easily be explained by the fact that state content items are smaller, so they can be packed more compactly into one page (meaning less wasted space)
  • differences observed within the same network can be explained by the fact that a long-running client will most likely have worse storage utilization (due to pruning). This would imply that estimates obtained from longer-running clients with big databases should be given more weight

Based on everything posted so far, I'm working on an estimate of the extra disk space needed per content item, so that we don't underestimate usage for long-running clients.
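A possible shape for the revised estimate is sketched below; :per_item_overhead is a placeholder to be calibrated from the measurements above (roughly the 16 + 48 index bytes plus some allowance for page slack), not a final number:

```sql
-- Hypothetical revised estimate: raw blob lengths plus a fixed per-item
-- overhead. :per_item_overhead is a parameter still to be calibrated.
SELECT sum(length(content_id) + length(content_key) + length(content_value))
       + count(*) * :per_item_overhead AS estimated_bytes
FROM ii1_state;
```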

@carver
Collaborator

carver commented Feb 6, 2025

Hm, I'm not so sure we want autovacuum on by default. I get that there are situations where being able to run vacuum would be important, like when restarting trin with a lower db size limit. But maybe we should do a manual vacuum at that point, rather than have autovacuum on all the time and significantly increase IOPS & DB wear.
