-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored account tree in order to store last N
reverse updates, compute opening for recent blocks
#643
base: next
Are you sure you want to change the base?
Conversation
…ofs for old blocks
# Conflicts: # Cargo.lock # Cargo.toml # crates/store/src/state.rs
@polydez is this ready for review? Still marked as a |
It's almost ready, thanks for noticing. I will update CHANGELOG.md and mark the PR as ready for review. |
N
reverse updates, compute opening for recent blocks
…o separate implementation, make code more suitable for testing and benchmarking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments; still need to review the actual tree.
Do we need persistent storage for this? Could we not just store this in-memory only in a ring-buffer? Or are these huge? If they're huge then we will also have disk IO issues I think.
@@ -17,6 +17,7 @@ workspace = true | |||
[dependencies] | |||
deadpool-sqlite = { version = "0.9.0", features = ["rt_tokio_1"] } | |||
hex = { version = "0.4" } | |||
miden-crypto = { version = "0.13" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not stay the workspace version so it gets the patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets the patch while the crate has the same version (version.minor.*
by semver) as the patched one. We usually specify dependencies in Cargo.toml
of specific crate, if we use them only there. But I'm not sure this is the best approach, I would rather specify all versions in BoM (workspace's Cargo.toml
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it does get a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, @bobbinth has different opinion on this, we maybe need to discuss which approach is better to use. I have motivation to specify all versions in workspace's Cargo.toml
: I expect that this will help us to reduce possible incompatibilities between different crates' versions in our project (when the same structure/trait from different dependency versions is used across different crates of the same workspace). This approach is also called as "bill of materials" (BoM) and helps developers to manage dependency versions in subprojects of some complex projects. This won't help to eliminate all incompatibilities (when dependencies outside of the workspace depend on different dependency versions), but I do think, we should use it at least to simplify dependency management within our workspace(s).
crates/store/src/account_tree.rs
Outdated
} | ||
} | ||
|
||
#[allow(async_fn_in_trait)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to suppress this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we're going to use the trait only in our code, so it's okay to suppress it (or maybe even better to change this to #[expect(async_fn_in_trait)]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other approaches would be to desugar async
functions to just returning impl Future<...>
or use async_trait
crate (but I think, we shouldn't use async_trait
anyway).
pub trait PersistentUpdatesStorage { | ||
async fn load(&self, block_num: BlockNumber) -> Result<Option<Update>, DatabaseError>; | ||
async fn save(&self, block_num: BlockNumber, update: &Update) -> Result<(), DatabaseError>; | ||
async fn remove(&self, block_num: BlockNumber) -> Result<(), DatabaseError>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a trait? If its only for test purposes then I would prefer using something like tempfs
for testing over having trait/generics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need this trait for testing and benchmarking. This approach also allows us to mockup implementations of storage in tests.
I made benchmarking for my solution which computes opening for one of recent blocks, comparing with opening in the latest account SMT. For 10k accounts with 10k updates in the 1st block and 1k updates per each of remaining 99 blocks:
So the |
I apologize, I forgot to answer this question. We need persistent storage in order to reconstruct in-memory updates on node restart. Otherwise, we would need to wait for |
Is the alternative of rebuilding them on restart not possible? I'd prefer to minimize disk IO as much as possible - every new file is something that can lead to strange failures. |
I was thinking about this and it seems to be too expensive: we need to construct reverse mutations sets and it's possible only when we know previous account's hash for each recent block. Blocks consist of all required information, but this might require to process blocks from the genesis up to the latest block because of accounts which might be changed only in first blocks. In the past we had previous account's hash in |
In this PR we refactored account's in-memory storage in order to store last$N$ reverse updates alongside with the latest account's SMT. Updates are persistent (in files), old updates are deleted. We also reorganize storage files by putting them into the
storage
directory. Also implemented methodcompute_opening
which constructs opening for account for any recent block.This solution is not expected to be significantly slower than constructing opening for the SMT itself, but we will measure this in benchmarks.
We will need to reimplement calculation of reverse mutation set in order to keep transactional approach of applying changes (the current solution might cause inconsistent DB and in-memory states if writing of update failed).