
feat: store seed command in stress test CLI #657

Open · wants to merge 42 commits into base: next
Conversation

SantiagoPittella
Collaborator

@SantiagoPittella SantiagoPittella commented Jan 29, 2025

@bobbinth I'm applying your suggestion of a new structure (your comment here). The only thing I changed is that I create all the accounts beforehand.

I added some queries to check the size of each table, and the avg size of each entry in every table. This is the current output of the binary:

For example, this is the current output for 100k accounts:

Creating new faucet account...
Generated 100000 accounts in 19.377 seconds
Creating notes...
Created notes: inserted 393 blocks with avg insertion time 641 ms
Store file size every 50 blocks:
Block 0: 4096 bytes
Block 50: 2645048 bytes
Block 100: 72965512 bytes
Block 150: 137780152 bytes
Block 200: 201362360 bytes
Block 250: 264911800 bytes
Block 300: 328420280 bytes
Block 350: 392027064 bytes
Block 400: 455551928 bytes
Average growth rate: 1159154.7888040713 bytes per block
Total time: 274.344 seconds
DB Stats for account_deltas: 4096 bytes, 4096.0 bytes/entry
DB Stats for block_headers: 122880 bytes, 311.088607594937 bytes/entry
DB Stats for account_fungible_asset_deltas: 4096 bytes,  bytes/entry
DB Stats for notes: 460816384 bytes, 4608.16384 bytes/entry
DB Stats for account_non_fungible_asset_updates: 4096 bytes,  bytes/entry
DB Stats for nullifiers: 4370432 bytes, 43.7218087234894 bytes/entry
DB Stats for account_storage_map_updates: 4096 bytes,  bytes/entry
DB Stats for settings: 4096 bytes, 2048.0 bytes/entry
DB Stats for account_storage_slot_updates: 4096 bytes, 1365.33333333333 bytes/entry
DB Stats for transactions: 5644288 bytes, 56.244337488665 bytes/entry
DB Stats for accounts: 6062080 bytes, 60.6438446609712 bytes/entry
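
For reference, the "Average growth rate" figure above can be reproduced from the sampled sizes. A minimal sketch, assuming the rate is computed as (final size - initial size) / blocks inserted:

```python
# Hedged sketch: reproducing the "Average growth rate" metric from the
# sampled store sizes above. Values are copied from the 100k-account
# run's output; the formula is an assumption, not taken from the code.
initial_size = 4_096        # bytes at block 0
final_size = 455_551_928    # bytes at the last sample (block 400)
blocks_inserted = 393       # "inserted 393 blocks" above

growth_rate = (final_size - initial_size) / blocks_inserted
print(growth_rate)  # ≈ 1159154.79 bytes per block, matching the output
```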

This PR was built on top of #621.

@bobbinth
Contributor

I'm getting like 1.56 blocks/second. I'm now investigating what the bottleneck might be.

Do you know where the bottlenecks are? Taking a brief look at the code it seems like we are instantiating a lot of random number generators - these could be quite expensive (especially the RPO ones). So, I'd switch to lighter versions and also try to use the same instance as much as possible.

Store file size every 1k batches:
0: 4096 bytes
1000: 4096 bytes
2000: 78749696 bytes
3000: 160059392 bytes
4000: 238931968 bytes
5000: 318918656 bytes
6000: 397672448 bytes
7000: 477810688 bytes
Average growth rate: 1215792.854961832 bytes per batch

Something doesn't seem right here:

  • Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.
  • 7000 blocks with 256 accounts created per block should result in 1.8M accounts. Where does the 100K accounts number come from?
  • Related to the above, if there are only 100K accounts, a database size of almost 500MB doesn't really make a lot of sense (this would imply almost 5MB per account). If it is more like 1.8M then it seems a bit low (about 270 bytes per account) - though, maybe it is possible.
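
The per-account arithmetic in the last bullet can be checked quickly. A sketch using the last sampled size, under the 1.8M-account interpretation:

```python
# Bytes per account if the 7,000 batches really created ~1.8M accounts.
# Sizes are copied from the sampled output above; this is a sanity check
# of the estimate, not a measured value.
db_size = 477_810_688          # bytes after 7,000 batches
accounts = 1_800_000
print(db_size / accounts)      # ≈ 265 bytes/account ("about 270" above)
```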

@Mirko-von-Leipzig
Contributor

Something doesn't seem right here:

  • Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.

I didn't check how size is measured; but the write could be stuck in the WAL file still?
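
One way to rule this out is to measure the main database file together with its WAL sidecar files. A minimal sketch (the helper name and the throwaway demo database are made up for illustration, not part of the stress-test code):

```python
import os
import sqlite3
import tempfile

def store_size(db_path: str) -> int:
    """Total on-disk size of an SQLite store, including the -wal and -shm
    sidecar files, since committed writes may still sit in the WAL."""
    return sum(
        os.path.getsize(db_path + suffix)
        for suffix in ("", "-wal", "-shm")
        if os.path.exists(db_path + suffix)
    )

# Demo: in WAL mode, freshly committed rows live in the -wal file, so the
# main file alone under-reports the real footprint.
db = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
con = sqlite3.connect(db)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
con.commit()
wal_held = store_size(db) > os.path.getsize(db)
print(wal_held)  # True while the committed data is still in the WAL
con.close()
```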

@SantiagoPittella
Collaborator Author

Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.

I was checking only the size of miden-store.sqlite3; I'm now running it again for 1M accounts, using the combined size of both files (as Mirko mentioned) as the total size.

Though I want to clarify that the sampling is every 1000 batches; each block is 16 batches in this implementation, so we track the size increase every 62.5 blocks.

7000 blocks with 256 accounts created per block should result in 1.8M accounts. Where does the 100K accounts number come from?

The 7k are batches, so it is ~440 blocks. We are using 255 accounts per block, plus 1 tx to mint assets to each of the accounts.
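
To make the batch-to-block conversion explicit (numbers from the discussion above):

```python
# Each block holds 16 batches in this implementation, so the 1,000-batch
# sampling interval corresponds to 62.5 blocks, and 7,000 batches to
# roughly 440 blocks.
BATCHES_PER_BLOCK = 16
print(1000 / BATCHES_PER_BLOCK)  # 62.5 blocks per sampling interval
print(7000 / BATCHES_PER_BLOCK)  # 437.5, i.e. ~440 blocks in total
```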

Related to the above, if there are only 100K accounts, a database size of almost 500MB doesn't really make a lot of sense (this would imply almost 5MB per account). If it is more like 1.8M then it seems a bit low (about 270 bytes per account) - though, maybe it is possible.

I will come back with the results of this new run with 1M accounts and with the store size measurement fixed. Currently it takes ~1 hour to run the whole process for 1M accounts.

@SantiagoPittella
Collaborator Author

Here is a flamegraph for 1M accounts (I removed the preview because it was too big):
https://github.com/user-attachments/assets/f7639a8c-63cc-4384-99cb-fedc50e8cc58

@SantiagoPittella
Collaborator Author

I added a couple more metrics, and re-ran a couple of times with different numbers of accounts.

I'm consistently getting 4550~ish bytes/account

@SantiagoPittella
Collaborator Author

I'm consistently getting 4550~ish bytes/account

It is worth mentioning that this number is the result of total_db_size / total_number_accounts, so it includes blocks and everything else that we store in the DB.
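
As a quick cross-check, that figure matches the final sampled store size divided by the account count. A sketch with values from the 100k run's output:

```python
# total_db_size / total_number_accounts, using the final sampled size
# from the 100k-account run above (so the result includes blocks, notes,
# and everything else stored alongside the accounts).
total_db_size = 455_551_928   # bytes at block 400
total_accounts = 100_000
print(total_db_size / total_accounts)  # ≈ 4555.5 bytes per account
```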

I ran the following query against the DB:

SELECT
    name,
    SUM(pgsize) AS size_bytes,
    (SUM(pgsize) * 1.0) / (SELECT COUNT(*) FROM accounts) AS bytes_per_row
FROM dbstat
WHERE name = 'accounts';

And got these results:
accounts | 6025216 bytes | 60.2750645245193 bytes per account

@bobbinth
Contributor

I added a couple more metrics, and re-ran a couple of times with different numbers of accounts.

I'm consistently getting 4550~ish bytes/account

4.5KB is better than 5MB - but still looks pretty high. Part of this is nullifiers and notes - but I don't see how these contribute more than 1KB (about 40 bytes for nullifiers + 80 bytes for notes + 500 bytes for note authentication paths). But it is also possible I'm missing something.

Another possibility is that SQLite doesn't do compaction, and maybe there is a lot of "slack" in the file.

@igamigo
Collaborator

igamigo commented Jan 30, 2025

There's also the overhead of block headers and indices, but those are probably too small to matter.
One test you can do is run a VACUUM command and see whether the file sizes change considerably after the initial seeding finishes.
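
A self-contained way to see what VACUUM can reclaim (this uses a throwaway demo database, not the miden store itself):

```python
import os
import sqlite3
import tempfile

# Create a database with some slack: insert pages, then delete the rows,
# which leaves free pages in the file without shrinking it.
db = os.path.join(tempfile.mkdtemp(), "vacuum_demo.sqlite3")
con = sqlite3.connect(db)
con.execute("CREATE TABLE notes (data BLOB)")
con.executemany("INSERT INTO notes VALUES (?)", [(b"x" * 4096,) for _ in range(500)])
con.commit()
con.execute("DELETE FROM notes")
con.commit()

size_before = os.path.getsize(db)
con.execute("VACUUM")        # rebuilds the file, dropping the free pages
con.close()
size_after = os.path.getsize(db)
print(size_after < size_before)  # True: the slack was reclaimed
```

If the miden store shrinks substantially after a VACUUM, a lot of the measured size is slack rather than live data.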

@SantiagoPittella
Collaborator Author

I added some queries to check the size of each table, and the avg size of each entry in every table. This is the current output of the binary:

Creating new faucet account...
Generated 100000 accounts in 19.377 seconds
Creating notes...
Created notes: inserted 393 blocks with avg insertion time 641 ms
Store file size every 50 blocks:
Block 0: 4096 bytes
Block 50: 2645048 bytes
Block 100: 72965512 bytes
Block 150: 137780152 bytes
Block 200: 201362360 bytes
Block 250: 264911800 bytes
Block 300: 328420280 bytes
Block 350: 392027064 bytes
Block 400: 455551928 bytes
Average growth rate: 1159154.7888040713 bytes per block
Average space used per account: 4545.705054133613 bytes
Total time: 274.344 seconds
DB Stats for account_deltas: 4096 bytes, 4096.0 bytes/entry
DB Stats for block_headers: 122880 bytes, 311.088607594937 bytes/entry
DB Stats for account_fungible_asset_deltas: 4096 bytes,  bytes/entry
DB Stats for notes: 460816384 bytes, 4608.16384 bytes/entry
DB Stats for account_non_fungible_asset_updates: 4096 bytes,  bytes/entry
DB Stats for nullifiers: 4370432 bytes, 43.7218087234894 bytes/entry
DB Stats for account_storage_map_updates: 4096 bytes,  bytes/entry
DB Stats for settings: 4096 bytes, 2048.0 bytes/entry
DB Stats for account_storage_slot_updates: 4096 bytes, 1365.33333333333 bytes/entry
DB Stats for transactions: 5644288 bytes, 56.244337488665 bytes/entry
DB Stats for accounts: 6062080 bytes, 60.6438446609712 bytes/entry

@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-stress-testing-copy branch from 58cac73 to 348b54c Compare January 31, 2025 15:00
@SantiagoPittella SantiagoPittella changed the base branch from tomasarrachea-stress-test to next January 31, 2025 15:01
@SantiagoPittella SantiagoPittella marked this pull request as ready for review January 31, 2025 15:18
@SantiagoPittella SantiagoPittella changed the title draft: store seed refactor feat: store seed command in stress test CLI Jan 31, 2025
@SantiagoPittella
Collaborator Author

I could not reduce the time each block takes. It is currently at 1.6 blocks/sec.

@bobbinth
Contributor

I could not reduce the time each block takes. It is currently at 1.6 blocks/sec.

Do you have a breakdown of the different components of this? Specifically, how much of the 1.6 seconds is taken up by block construction vs. block insertion?

@bobbinth
Contributor

DB Stats for account_deltas: 4096 bytes, 4096.0 bytes/entry
DB Stats for block_headers: 122880 bytes, 311.088607594937 bytes/entry
DB Stats for account_fungible_asset_deltas: 4096 bytes,  bytes/entry
DB Stats for notes: 460816384 bytes, 4608.16384 bytes/entry
DB Stats for account_non_fungible_asset_updates: 4096 bytes,  bytes/entry
DB Stats for nullifiers: 4370432 bytes, 43.7218087234894 bytes/entry
DB Stats for account_storage_map_updates: 4096 bytes,  bytes/entry
DB Stats for settings: 4096 bytes, 2048.0 bytes/entry
DB Stats for account_storage_slot_updates: 4096 bytes, 1365.33333333333 bytes/entry
DB Stats for transactions: 5644288 bytes, 56.244337488665 bytes/entry
DB Stats for accounts: 6062080 bytes, 60.6438446609712 bytes/entry

Interesting! It seems like 97% of the database is in the notes table (we are adding 4.5KB of data per note). Let's create an issue to optimize note storage. I think we should be able to reduce the size by about 10x.
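
The 97% figure can be recomputed from the per-table stats. A sketch with the sizes copied from the output (it comes out to roughly 96.6%):

```python
# Share of the database taken up by the notes table, using the per-table
# sizes (in bytes) reported above.
table_sizes = {
    "account_deltas": 4_096,
    "block_headers": 122_880,
    "account_fungible_asset_deltas": 4_096,
    "notes": 460_816_384,
    "account_non_fungible_asset_updates": 4_096,
    "nullifiers": 4_370_432,
    "account_storage_map_updates": 4_096,
    "settings": 4_096,
    "account_storage_slot_updates": 4_096,
    "transactions": 5_644_288,
    "accounts": 6_062_080,
}
notes_share = table_sizes["notes"] / sum(table_sizes.values())
print(round(notes_share * 100, 1))  # ≈ 96.6 (the "97%" above)
```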

@SantiagoPittella SantiagoPittella mentioned this pull request Jan 31, 2025
@SantiagoPittella
Collaborator Author

@bobbinth I created #662 to address the notes issue.

Do you have the breakdown of different components of this? Specifically, how much, out of 1.6 seconds is taken up by block construction vs. block insertion?

According to the flamegraph, a big part of this process is spent in miden_processor::Process::execute_mast_node. Though I'm making some changes to get more insight into this.

@bobbinth
Contributor

Do you have the breakdown of different components of this? Specifically, how much, out of 1.6 seconds is taken up by block construction vs. block insertion?

According to the flamegraph, a big part of this process is spent in miden_processor::Process::execute_mast_node. Though I'm making some changes to get more insight into this.

Looking at the code, I think what we are measuring as insertion time is actually BlockBuilder::build_block() time. This actually covers much more than just inserting the block into the store (e.g., it also executes the block kernel to build the block), so it will be much longer than what we want. What I really mean by insertion time is the time it takes the store to process the apply_block() request. My hope is that this is less than 100ms - but we'll need to confirm.

What may make sense to do is build blocks manually (and probably the same for batches as well), without using block and batch builders. This is somewhat difficult now, but should become much easier after #659 and subsequent work is done.

So, I guess we have 3 paths forward:

  1. Review & merge this work roughly as is, and then refactor it once manual block/batch building becomes easier.
  2. Try to implement manual block/batch building now, and then update it with the new structure later.
  3. Put this on hold and then re-work once manual block/batch building becomes easier.

Assuming this is pretty close to being done, path 1 probably makes the most sense - but let me know what you think.

@@ -15,35 +15,37 @@ version.workspace = true
workspace = true

[features]
testing = ["dep:miden-air", "dep:miden-node-test-macro", "dep:rand_chacha", "dep:winterfell"]
Contributor

With a more manual approach to batch/block building, we probably wouldn't need to introduce the testing feature. So, maybe that's an argument for putting the work on hold for now.
