
Refactored account tree in order to store last N reverse updates, compute opening for recent blocks #643

Open · wants to merge 15 commits into base: next

Conversation


@polydez polydez commented Jan 24, 2025

In this PR we refactored the account's in-memory storage to store the last $N$ reverse updates alongside the latest account SMT. Updates are persisted in files, and old updates are deleted. We also reorganized the storage files by moving them into the storage directory, and implemented a compute_opening method which constructs an account's opening for any recent block.

This solution is not expected to be significantly slower than constructing an opening for the SMT itself, but we will measure this in benchmarks.

We will need to reimplement the calculation of the reverse mutation set in order to keep the application of changes transactional (the current solution might leave the DB and in-memory states inconsistent if writing an update fails).
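The idea of keeping the last $N$ reverse updates and rewinding them to answer historical queries can be sketched roughly as follows. This is a minimal, hypothetical model using plain maps instead of the real SMT and node types; `ReverseUpdate`, `AccountTree`, and `value_at` are illustrative names, not the PR's API:

```rust
use std::collections::{HashMap, VecDeque};

/// Reverse update for one block: for every key written in that block,
/// the value it had *before* the block (None = key did not exist).
struct ReverseUpdate {
    block_num: u32,
    reverted: HashMap<u64, Option<u64>>,
}

/// Latest state plus a ring buffer of the last `max_updates` reverse updates.
struct AccountTree {
    latest: HashMap<u64, u64>,
    max_updates: usize,
    updates: VecDeque<ReverseUpdate>, // oldest at the front, newest at the back
}

impl AccountTree {
    fn new(max_updates: usize) -> Self {
        Self { latest: HashMap::new(), max_updates, updates: VecDeque::new() }
    }

    /// Apply a block's writes, recording the reverse update and evicting
    /// the oldest one once more than `max_updates` are retained.
    fn apply_block(&mut self, block_num: u32, writes: HashMap<u64, u64>) {
        let mut reverted = HashMap::new();
        for (key, value) in writes {
            reverted.insert(key, self.latest.insert(key, value));
        }
        self.updates.push_back(ReverseUpdate { block_num, reverted });
        if self.updates.len() > self.max_updates {
            self.updates.pop_front();
        }
    }

    /// Value of `key` as of `block_num`: start from the latest state and
    /// rewind reverse updates, newest first, until the target block.
    fn value_at(&self, key: u64, block_num: u32) -> Option<u64> {
        let mut value = self.latest.get(&key).copied();
        for update in self.updates.iter().rev() {
            if update.block_num <= block_num {
                break; // target block reached; stop rewinding
            }
            if let Some(previous) = update.reverted.get(&key) {
                value = *previous;
            }
        }
        value
    }
}

fn main() {
    let mut tree = AccountTree::new(4);
    tree.apply_block(1, HashMap::from([(7, 100), (8, 5)]));
    tree.apply_block(2, HashMap::from([(7, 200)]));
    assert_eq!(tree.value_at(7, 2), Some(200)); // latest value
    assert_eq!(tree.value_at(7, 1), Some(100)); // rewound by one block
    assert_eq!(tree.value_at(9, 2), None);      // never written
}
```

A real implementation would also rebuild the Merkle path against the rewound leaves; this sketch only shows the rewind bookkeeping.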

@polydez polydez self-assigned this Jan 24, 2025
# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	crates/store/src/state.rs
@Mirko-von-Leipzig
Contributor

@polydez is this ready for review? Still marked as a draft so just checking.

@polydez
Contributor Author

polydez commented Jan 24, 2025

> @polydez is this ready for review? Still marked as a draft so just checking.

It's almost ready, thanks for noticing. I will update CHANGELOG.md and mark the PR as ready for review.

@polydez polydez changed the title Refactored account tree in order to store last $N$ reverse updates, compute opening for recent blocks Refactored account tree in order to store last N reverse updates, compute opening for recent blocks Jan 24, 2025
@polydez polydez marked this pull request as ready for review January 24, 2025 08:02

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment


Some initial comments; still need to review the actual tree.

Do we need persistent storage for this? Could we not just store this in-memory only in a ring-buffer? Or are these huge? If they're huge then we will also have disk IO issues I think.

@@ -17,6 +17,7 @@ workspace = true
[dependencies]
deadpool-sqlite = { version = "0.9.0", features = ["rt_tokio_1"] }
hex = { version = "0.4" }
miden-crypto = { version = "0.13" }
Contributor


Should this not stay the workspace version so it gets the patch?

Contributor Author


It gets the patch as long as the crate has the same version (major.minor.* per semver) as the patched one. We usually specify a dependency in the Cargo.toml of a specific crate if it's used only there. But I'm not sure this is the best approach; I would rather specify all versions in a BoM (the workspace's Cargo.toml).

Contributor


Yeah it does get a bit weird.

Contributor Author


As far as I know, @bobbinth has a different opinion on this; we may need to discuss which approach is better. My motivation for specifying all versions in the workspace's Cargo.toml is that it should help reduce incompatibilities between dependency versions in our project (when the same structure/trait from different versions of a dependency is used across different crates of the same workspace). This approach is called a "bill of materials" (BoM) and helps developers manage dependency versions across the subprojects of complex projects. It won't eliminate all incompatibilities (dependencies outside the workspace may still depend on different versions), but I do think we should use it, at least to simplify dependency management within our workspace(s).

}
}

#[allow(async_fn_in_trait)]
Contributor


Are we sure we want to suppress this?

Contributor Author


Yes, we're going to use the trait only in our own code, so it's okay to suppress it (or perhaps even better to change it to #[expect(async_fn_in_trait)]).

Contributor Author

@polydez polydez Jan 27, 2025


Other approaches would be to desugar the async functions into methods returning impl Future<...>, or to use the async_trait crate (but I think we shouldn't use async_trait anyway).
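A minimal sketch of the desugared form, assuming a simplified storage trait. The `UpdatesStorage`/`InMemory` names and the hand-rolled `block_on` executor are illustrative, not from the PR; naming the return type explicitly avoids the `async_fn_in_trait` lint and lets callers see (and bound) the future type:

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Desugared form: return `impl Future` instead of writing `async fn`.
// The `+ Send` bound is now expressible directly in the trait.
trait UpdatesStorage {
    fn load(&self, block_num: u32) -> impl Future<Output = Option<u64>> + Send;
}

struct InMemory(HashMap<u32, u64>);

impl UpdatesStorage for InMemory {
    fn load(&self, block_num: u32) -> impl Future<Output = Option<u64>> + Send {
        let value = self.0.get(&block_num).copied();
        async move { value } // future is immediately ready
    }
}

/// Minimal executor so the example runs without an async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let storage = InMemory(HashMap::from([(5, 42)]));
    assert_eq!(block_on(storage.load(5)), Some(42));
    assert_eq!(block_on(storage.load(6)), None);
}
```

Return-position `impl Trait` in traits is stable since Rust 1.75, the same release that stabilized `async fn` in traits.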

Comment on lines +304 to +308
pub trait PersistentUpdatesStorage {
async fn load(&self, block_num: BlockNumber) -> Result<Option<Update>, DatabaseError>;
async fn save(&self, block_num: BlockNumber, update: &Update) -> Result<(), DatabaseError>;
async fn remove(&self, block_num: BlockNumber) -> Result<(), DatabaseError>;
}
Contributor


Why do we need a trait? If it's only for test purposes then I would prefer using something like a tempfs for testing over having traits/generics.

Contributor Author


Yes, we need this trait for testing and benchmarking. This approach also allows us to mock storage implementations in tests.
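A mock along those lines might look like this. This is a hypothetical sketch: the type aliases and `MockStorage` are stand-ins for the PR's real types, and `block_on` is a toy executor so the example runs without an async runtime:

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical stand-ins for the PR's real types.
pub type BlockNumber = u32;
pub type Update = Vec<u8>;
#[derive(Debug, PartialEq)]
pub struct DatabaseError;

#[allow(async_fn_in_trait)]
pub trait PersistentUpdatesStorage {
    async fn load(&self, block_num: BlockNumber) -> Result<Option<Update>, DatabaseError>;
    async fn save(&self, block_num: BlockNumber, update: &Update) -> Result<(), DatabaseError>;
    async fn remove(&self, block_num: BlockNumber) -> Result<(), DatabaseError>;
}

/// In-memory mock: no disk IO, usable in unit tests and benchmarks.
#[derive(Default)]
pub struct MockStorage {
    map: Mutex<HashMap<BlockNumber, Update>>,
}

impl PersistentUpdatesStorage for MockStorage {
    async fn load(&self, block_num: BlockNumber) -> Result<Option<Update>, DatabaseError> {
        Ok(self.map.lock().unwrap().get(&block_num).cloned())
    }
    async fn save(&self, block_num: BlockNumber, update: &Update) -> Result<(), DatabaseError> {
        self.map.lock().unwrap().insert(block_num, update.clone());
        Ok(())
    }
    async fn remove(&self, block_num: BlockNumber) -> Result<(), DatabaseError> {
        self.map.lock().unwrap().remove(&block_num);
        Ok(())
    }
}

/// Minimal executor so the example runs without an async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let storage = MockStorage::default();
    block_on(storage.save(7, &vec![1, 2, 3])).unwrap();
    assert_eq!(block_on(storage.load(7)), Ok(Some(vec![1, 2, 3])));
    block_on(storage.remove(7)).unwrap();
    assert_eq!(block_on(storage.load(7)), Ok(None));
}
```

In production code the same trait would be implemented by the file-backed storage, so tests and benchmarks can swap the mock in without touching the tree logic.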

@polydez
Contributor Author

polydez commented Jan 28, 2025

I benchmarked my solution, which computes an opening for one of the recent blocks, against computing the opening in the latest account SMT.

For 10k accounts with 10k updates in the 1st block and 1k updates per each of remaining 99 blocks:

```
SMT opening elapsed (for 10000 accounts, average): 108 ms
AccountTree `compute_opening` elapsed (for 10000 accounts, average): 321 ms
```

So compute_opening over reverse updates is about 3 times slower than an opening in a single SMT. This is slower than I expected, but I think it's still pretty good, since computing the opening of a single account takes only ~0.03 ms (on an M3 Max MacBook Pro).

@polydez
Contributor Author

polydez commented Jan 30, 2025

> Do we need persistent storage for this? Could we not just store this in-memory only in a ring-buffer? Or are these huge? If they're huge then we will also have disk IO issues I think.

I apologize, I forgot to answer this question. We need persistent storage in order to reconstruct the in-memory updates on node restart. Otherwise, we would need to wait for $N$ blocks to be applied before being able to serve our future endpoint for up to $N$ recent proofs.
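A one-file-per-block layout under a storage directory, as described in the PR summary, could be sketched like this. This is hypothetical: `FileUpdatesStorage` and the `updates/<block>.dat` naming are illustrative, not the PR's actual implementation:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical on-disk layout: one serialized reverse update per block,
/// stored as `<storage>/updates/<block_num>.dat`.
struct FileUpdatesStorage {
    dir: PathBuf,
}

impl FileUpdatesStorage {
    fn new(storage_dir: &Path) -> io::Result<Self> {
        let dir = storage_dir.join("updates");
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    fn path(&self, block_num: u32) -> PathBuf {
        self.dir.join(format!("{block_num}.dat"))
    }

    fn save(&self, block_num: u32, update: &[u8]) -> io::Result<()> {
        fs::write(self.path(block_num), update)
    }

    fn load(&self, block_num: u32) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.path(block_num)) {
            Ok(bytes) => Ok(Some(bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }

    fn remove(&self, block_num: u32) -> io::Result<()> {
        fs::remove_file(self.path(block_num))
    }

    /// On restart, list the block numbers that still have a persisted update,
    /// so the in-memory ring buffer can be rebuilt without replaying blocks.
    fn persisted_blocks(&self) -> io::Result<Vec<u32>> {
        let mut blocks = Vec::new();
        for entry in fs::read_dir(&self.dir)? {
            let name = entry?.file_name();
            if let Some(num) = name
                .to_str()
                .and_then(|n| n.strip_suffix(".dat"))
                .and_then(|n| n.parse().ok())
            {
                blocks.push(num);
            }
        }
        blocks.sort_unstable();
        Ok(blocks)
    }
}

fn main() -> io::Result<()> {
    // Demo in a process-unique temp directory.
    let base = std::env::temp_dir().join(format!("updates_demo_{}", std::process::id()));
    let storage = FileUpdatesStorage::new(&base)?;
    storage.save(10, b"reverse-update-bytes")?;
    assert_eq!(storage.load(10)?, Some(b"reverse-update-bytes".to_vec()));
    storage.remove(10)?;
    assert_eq!(storage.load(10)?, None);
    Ok(())
}
```

Evicting an old update then amounts to a single `remove` call for the block falling out of the retained window.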

@Mirko-von-Leipzig
Contributor

> Do we need persistent storage for this? Could we not just store this in-memory only in a ring-buffer? Or are these huge? If they're huge then we will also have disk IO issues I think.
>
> I apologize, I forgot to answer this question. We need persistent storage in order to reconstruct the in-memory updates on node restart. Otherwise, we would need to wait for N blocks to be applied before being able to serve our future endpoint for up to N recent proofs.

Is the alternative of rebuilding them on restart not possible? I'd prefer to minimize disk IO as much as possible - every new file is something that can lead to strange failures.

@polydez
Contributor Author

polydez commented Jan 30, 2025

> Is the alternative of rebuilding them on restart not possible? I'd prefer to minimize disk IO as much as possible - every new file is something that can lead to strange failures.

I was thinking about this, and it seems too expensive: we would need to construct the reverse mutation sets, which is only possible when we know the previous account hash for each recent block. Blocks contain all the required information, but this might require processing blocks from the genesis up to the latest block, because some accounts might have changed only in the first blocks. In the past we had the previous account hash in the BlockAccountUpdate structure (if I remember correctly), but we no longer do.
