
Better management of state witness proof size limits #12701

Open
Tracked by #103
shreyan-gupta opened this issue Jan 8, 2025 · 0 comments

Description

Context

With the introduction of stateless validation, managing state witness size limits has become very important. An important component of the state witness size limit is the soft limit on the storage proof, governed by the main_storage_proof_size_soft_limit config.

The Trie contains a recorder that keeps track of all trie nodes read during runtime execution. In the runtime, we call the recorded_storage_size and recorded_storage_size_upper_bound functions after applying each receipt to check whether we have exceeded the size limit.

Problem

There are a couple of problems with the current approach that we would like to address.

Manual check for limits

As developers working in the runtime, we need to be mindful of the proof size limits, and it becomes our duty to add checks. This is a potential source of bugs if we forget to check the limits.

It would be much better and easier if the size check were handled automatically at the recorder layer.
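As a rough sketch of the recorder-layer idea (the names `TrieRecorder`, `record`, and `SizeLimitExceeded` here are illustrative, not the actual nearcore API), the recorder itself could reject any read that would push the recorded proof past the soft limit:

```rust
/// Illustrative error type; nearcore's real error handling differs.
#[derive(Debug, PartialEq)]
pub struct SizeLimitExceeded;

/// Hypothetical recorder that enforces the proof size soft limit on
/// every recorded node, so runtime code needs no manual checks.
pub struct TrieRecorder {
    recorded_size: usize,
    proof_size_limit: usize,
}

impl TrieRecorder {
    pub fn new(proof_size_limit: usize) -> Self {
        Self { recorded_size: 0, proof_size_limit }
    }

    /// Record a trie node, failing if the soft limit would be exceeded.
    pub fn record(&mut self, node: &[u8]) -> Result<(), SizeLimitExceeded> {
        if self.recorded_size + node.len() > self.proof_size_limit {
            return Err(SizeLimitExceeded);
        }
        self.recorded_size += node.len();
        Ok(())
    }
}
```

With this shape, forgetting a check is no longer possible: every read funnels through `record`, and the error propagates out of the runtime naturally.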

Practicality of checking limits

Sometimes the code is structured in such a way that it's not possible to check the size limit. The solution is similar to the one above: remove the manual limit checks and integrate them at the recorder layer.

A practical example we encountered was in resharding. During resharding, we duplicate the delayed receipts across both child shards. On the runtime side, we modified the code to skip receipts that don't belong to our shard; however, due to the way the code is structured, checking the limit after reading each receipt isn't possible.

Checking of the limit is done here; however, within the loop we could potentially be reading multiple receipts.
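To illustrate (with hypothetical types; `LimitedRecorder` and the `(shard_id, size)` receipt representation are stand-ins, not the nearcore structures), once the limit lives in the recorder, the delayed-receipt loop gets the check after every read for free, including reads of receipts that are skipped as belonging to the sibling shard:

```rust
/// Illustrative recorder that fails once the limit would be exceeded.
pub struct LimitedRecorder {
    recorded: usize,
    limit: usize,
}

impl LimitedRecorder {
    pub fn new(limit: usize) -> Self {
        Self { recorded: 0, limit }
    }

    pub fn record(&mut self, bytes: usize) -> Result<(), &'static str> {
        if self.recorded + bytes > self.limit {
            return Err("proof size limit exceeded");
        }
        self.recorded += bytes;
        Ok(())
    }
}

/// Process delayed receipts copied to a child shard after resharding.
/// Every read goes through the recorder, so the limit is checked after
/// EVERY receipt, even ones skipped as belonging to the other child.
pub fn process_delayed_receipts(
    receipts: &[(u64, usize)], // (shard_id, receipt size in bytes)
    our_shard: u64,
    recorder: &mut LimitedRecorder,
) -> Result<usize, &'static str> {
    let mut applied = 0;
    for &(shard_id, size) in receipts {
        recorder.record(size)?; // reading the receipt may hit the limit
        if shard_id != our_shard {
            continue; // skip: belongs to the sibling child shard
        }
        applied += 1;
    }
    Ok(applied)
}
```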

Optimize access to trie key

Sometimes the leaf node we are reading can be much larger than the remaining limit. For example, suppose the limit is 100 KB but the receipt we are reading is ~1 MB. Since we check the limit AFTER reading the key, we've already exceeded the limit by roughly 900 KB.

However, in the trie layer, it's actually possible to check the size limit before reading the leaf node. When getting a key, we first obtain the optimized_ref and later dereference it. With the size limit check added to the recorder, it's possible to check the limit before calling self.deref_optimized.

pub enum OptimizedValueRef {
    Ref(ValueRef),
    AvailableValue(ValueAccessToken),
}

pub struct ValueRef {
    /// Value length in bytes.
    pub length: u32,
    /// Unique value hash.
    pub hash: CryptoHash,
}

/// Retrieves the full value for the given key.
pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError> {
    match self.get_optimized_ref(key, KeyLookupMode::FlatStorage)? {
        Some(optimized_ref) => Ok(Some(self.deref_optimized(&optimized_ref)?)),
        None => Ok(None),
    }
}

So in this case, we would read a few bytes walking through the trie nodes until we reach the leaf node. Then, for the ValueRef case, we would first see that the length is 1 MB, exceeding our 100 KB limit, so we can return an error without reading or recording the 1 MB value.
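A minimal sketch of this "check before deref" idea (the struct shapes mirror the snippet above, but `check_before_deref`, `remaining_limit`, and the simplified `AvailableValue(Vec<u8>)` payload are illustrative assumptions, not the actual nearcore code):

```rust
/// Simplified mirror of the quoted ValueRef: length known up front.
pub struct ValueRef {
    pub length: u32,
}

/// Simplified mirror of OptimizedValueRef; the real AvailableValue
/// holds a ValueAccessToken rather than raw bytes.
pub enum OptimizedValueRef {
    Ref(ValueRef),
    AvailableValue(Vec<u8>),
}

/// For the Ref variant, the value length is available before the value
/// is fetched, so an oversized read can be rejected without reading or
/// recording the value at all.
fn check_before_deref(
    value_ref: &OptimizedValueRef,
    remaining_limit: u32,
) -> Result<(), &'static str> {
    match value_ref {
        OptimizedValueRef::Ref(r) if r.length > remaining_limit => {
            Err("proof size limit would be exceeded")
        }
        _ => Ok(()),
    }
}
```

In the 100 KB / 1 MB scenario above, this check would fail on the ValueRef's length alone, before the 1 MB value is ever dereferenced.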

I don't remember the exact details, but there was another issue we hit while handling buffered receipts during resharding. We needed to read a buffered receipt and make a decision based on it. In that case too, the act of reading the receipt could exceed the size limit.

Proposed solution

As described alongside each problem, the most straightforward solution to all three is to integrate the size limit into the recorder. From a design perspective, that's the best place for it.

github-merge-queue bot pushed a commit that referenced this issue Jan 11, 2025
This PR is the first part of
#12701

The PR moves tracking of proof_size_limit from runtime to trie recorder.
There should be no functional change.

This is a more natural place to expose the check for proof_size_limit
and sets the basis for future improvements like
- Potentially moving compute_limit to recorder as well
- Better tracking and checking of limits (required for resharding)
- Potential setup to add better checks for limits (required for reading
and managing buffered receipts)
@shreyan-gupta shreyan-gupta self-assigned this Jan 11, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 13, 2025
…12721)

This PR builds on top of #12710 and
is related to issue #12701

After resharding, we copy the set of delayed receipts from parent to
both children. While handling delayed receipts in the new shard, we
iterated over all the delayed receipts and ignored/removed the ones that
did not belong to our shard.

Due to the implementation however, we bypassed checks to the state
witness size limit in the specific case of when the receipt did not
belong to our shard.

Now that we are doing the checks for the state witness size within the
trie recorder, it's much simpler to go back and check the proof size
limit after reading EVERY delayed receipt.