Storage capacity estimates #1653

Open
morph-dev opened this issue Jan 29, 2025 · 3 comments
Labels: enhancement (New feature or request), priority, shelf-stable (Will not be closed by stale-bot)

Comments

@morph-dev
Collaborator

morph-dev commented Jan 29, 2025

We are wrongly estimating our storage capacity usage. The goal of this issue is to enumerate all known problems on this topic, so we can start a discussion on how to address them.

We underestimate how much space is used

Currently, we estimate how much disk space is used by adding the lengths of content_id, content_key and content_value. However, there is extra overhead, which I believe is caused by:

  • other columns (rowid, distance_short, content_size)
  • data needed for indexes
  • other "empty" space in DB pages

I strongly believe that this overhead depends on the number of stored items, not on their size. Because the average content size on the state network is a fraction of the average content size on the history network (see data below), our estimate is significantly worse for state network data.
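For illustration, here is a minimal sketch of the gap, assuming the history content lives in the ii1_history table with content_id, content_key and content_value columns (names as they appear later in this issue). The first query reproduces our current length-based estimate; the second reports the bytes actually consumed by the table and its indexes via the dbstat virtual table (requires an SQLite build with dbstat enabled):

```sql
-- Current style of estimate: sum the raw lengths of the stored blobs.
SELECT sum(length(content_id) + length(content_key) + length(content_value))
       AS estimated_bytes
FROM ii1_history;

-- Actual pages consumed by the table and its indexes.
SELECT sum(pgsize) AS actual_bytes
FROM dbstat
WHERE aggregate = TRUE
  AND name IN ('ii1_history',
               'sqlite_autoindex_ii1_history_1',
               'ii1_history_distance_short_idx',
               'ii1_history_content_size_idx');
```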

I extracted the data from 3 different types of prod nodes:

History node (4444 nodes) - actual disk usage 36 183 mb

network  radius  content_count  content_used  content_capacity
-------  ------  -------------  ------------  ----------------
history  1.9%    1 344 329      34 999 mb     35 000 mb

State nodes - actual disk usage 54 198 mb

network  radius  content_count  content_used  content_capacity
-------  ------  -------------  ------------  ----------------
history  1.2%    761 237        17 500 mb     17 500 mb
state    37%     60 700 610     17 498 mb     17 500 mb
beacon   0%      326            434 mb        0 mb

Big storage nodes (21m state) - actual disk usage 639 759 mb

network  radius  content_count  content_used  content_capacity
-------  ------  -------------  ------------  ----------------
history  13%     218 968        5 000 mb      5 000 mb
state    100%    2 019 082 740  392 787 mb    500 000 mb
beacon   0%      129            15 mb         0 mb

We can observe that actual usage in nodes that store both history and state content is significantly worse (up to ~60% extra, e.g. 639 759 mb actual vs. 397 802 mb estimated for the big storage node) than when we only have history network content (only ~3% extra).

Beacon network declares that it doesn't need any space

While beacon clearly uses some disk space, it's unclear how much that is. Also, it never reserves any of the capacity provided via the flag (meaning we just assume it uses zero and distribute the specified capacity between history and state).

I would also highlight that I didn't investigate whether beacon correctly reports its usage (I know there are multiple tables, and I don't know if we are actually reporting stats from all of them).
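One way to start checking would be to sum the beacon-related tables and indexes via dbstat; a sketch below, with the table/index names taken from the dbstat listing later in this thread (the set may be incomplete):

```sql
-- Rough beacon disk usage: total pages of beacon-related tables and indexes.
-- The name list is assumed from the dbstat dump below; it may not cover everything.
SELECT sum(pgsize) AS beacon_bytes
FROM dbstat
WHERE aggregate = TRUE
  AND name IN ('lc_bootstrap',
               'sqlite_autoindex_lc_bootstrap_1',
               'bootstrap_slot_idx',
               'bootstrap_content_size_idx',
               'lc_update',
               'update_size_idx',
               'historical_summaries');
```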

Disk usage never goes down

Something to be tested, but I'm pretty sure that disk usage never goes down, even if we remove a lot of content at once.
My understanding is that sqlite keeps free database pages around, so it can reuse them later.

Forcing the database to free space can be done either with the VACUUM command (which I believe temporarily requires up to twice as much disk space), or by setting auto_vacuum, which reclaims space as it goes (but which I believe only takes effect on a new DB, or on an existing one when followed by a VACUUM).
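For concreteness, the relevant commands (standard SQLite behaviour, not trin-specific):

```sql
-- One-off compaction: rebuilds the database file and returns free pages to
-- the filesystem. Needs extra temporary disk space while the copy is built.
VACUUM;

-- auto_vacuum only takes effect on a brand-new database, or on an existing
-- one once the pragma is followed by a VACUUM.
PRAGMA auto_vacuum = FULL;        -- or INCREMENTAL
VACUUM;

-- With INCREMENTAL, free pages are released only on demand:
PRAGMA incremental_vacuum(1000);  -- free up to 1000 pages
```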

@morph-dev added the enhancement (New feature or request), priority, and shelf-stable (Will not be closed by stale-bot) labels on Jan 29, 2025
@morph-dev
Collaborator Author

morph-dev commented Feb 5, 2025

I did more analysis, using the following data:

  • I estimated disk usage for each table/index by running the following SQL command:
    > SELECT name, pageno, pgsize FROM dbstat WHERE aggregate=TRUE ORDER BY pgsize DESC;
    
    name                             pageno  pgsize    
    -------------------------------  ------  ----------
    ii1_history                      460273  1885278208
    ii1_state                        370773  1518686208
    sqlite_autoindex_ii1_state_1     34092   139640832 
    ii1_state_distance_short_idx     10591   43380736  
    ii1_state_content_size_idx       9154    37494784  
    lc_bootstrap                     2788    11419648  
    sqlite_autoindex_ii1_history_1   1091    4468736   
    ii1_history_distance_short_idx   337     1380352   
    ii1_history_content_size_idx     295     1208320   
    lc_update                        15      61440     
    sqlite_autoindex_lc_bootstrap_1  6       24576     
    bootstrap_slot_idx               3       12288     
    bootstrap_content_size_idx       3       12288     
    sqlite_schema                    1       4096      
    update_size_idx                  1       4096      
    historical_summaries             1       4096      
    store_info                       1       4096      
    sqlite_autoindex_store_info_1    1       4096      
    
    • pageno is the number of pages that the table/index uses, and pgsize is the aggregated size of those pages (each page should be 4096 bytes)
  • I took the estimated usage from trin logs:
    2025-02-05T15:16:12.709757Z  INFO trin_state: reports~ data: radius=5.0% content=1351.6/17500mb #=3039996 disk=3646.9mb; msgs: offers=7025042/7681965, accepts=5354861/5354891, validations=1227428/1227428; cpu=24.8%
    2025-02-05T15:16:12.714019Z  INFO trin_beacon: reports~ data: radius=0.0000% content=7.8/0mb #=138 disk=3646.7mb; msgs: offers=44667/45579, accepts=40824/40824, validations=5728/5818
    2025-02-05T15:16:12.734855Z  INFO trin_history: reports~ data: radius=5.0% content=1855.6/17500mb #=97950 disk=3646.9mb; msgs: offers=50567/51681, accepts=30851/30868, validations=9051/9051
    

I assumed that the estimated usage corresponds only to the usage of the table itself (i.e. that the estimated disk usage of indexes is zero), and came up with the following table (how much we are underestimating per content item):

[image: table of per-content-item underestimates]
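For reference, a sketch of how such a per-item underestimate can be computed directly in SQLite for a single network (history shown; table and column names as above; overhead = actual pages minus summed blob lengths, divided by item count):

```sql
WITH actual AS (
  -- actual bytes used by the table plus all of its indexes
  SELECT sum(pgsize) AS bytes
  FROM dbstat
  WHERE aggregate = TRUE AND name LIKE '%ii1_history%'
),
estimate AS (
  -- our length-based estimate and the number of stored items
  SELECT count(*) AS items,
         sum(length(content_id) + length(content_key) + length(content_value)) AS bytes
  FROM ii1_history
)
SELECT (actual.bytes - estimate.bytes) * 1.0 / estimate.items
       AS overhead_bytes_per_item
FROM actual, estimate;
```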

A few notes:

  • This is extracted from clients with various db sizes:
    • my local node - first 2 rows, db size: 1GB
    • state nodes - next 4 rows, db size: 3.6 GB
    • history nodes - last 2 rows, db size: 35GB
      • These nodes have been running for a very long time, and for a long time they used a different (even less accurate) formula to estimate content size

Note: I ignored storage used by beacon network as it seems significantly smaller in comparison.
@ogenev , do we have an estimate on upper bound on how much data beacon chain can store in the db?

@morph-dev
Collaborator Author

Based on the information from the previous message, I think we can draw the following conclusions:

  • estimate that distance_index and size_index take 16 bytes per item (maybe two 64-bit values?!)
  • estimate that content_key_index takes 48 bytes (32 bytes for content id + 16 bytes for something else?!)
  • Big differences per subnetwork can easily be explained by the fact that state content items are smaller, so they can be packed more compactly into one page (meaning less wasted space)
  • differences observed within the same network can be explained by the fact that a long-running client will most likely have worse storage utilization (due to pruning). This would imply that estimates obtained from longer-running clients with big databases should be given more weight

Based on everything posted so far, I'm working on an estimate of the extra disk space needed per content item, so that we don't underestimate usage for long-running clients.
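A possible shape for the revised estimate is sketched below; :per_item_overhead is a placeholder to be calibrated from the measurements above (roughly the 16 + 48 index bytes plus some allowance for page slack), not a final number:

```sql
-- Hypothetical revised estimate: raw blob lengths plus a fixed per-item
-- overhead. :per_item_overhead is a parameter still to be calibrated.
SELECT sum(length(content_id) + length(content_key) + length(content_value))
       + count(*) * :per_item_overhead AS estimated_bytes
FROM ii1_state;
```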

@carver
Collaborator

carver commented Feb 6, 2025

Hm, I'm not so sure we want autovacuum on by default. I get that there are situations where being able to run vacuum would be important, like when restarting trin with a lower db size limit. But maybe we should do a manual vacuum at that point, rather than have autovacuum on all the time and significantly increase IOPS & DB wear.
