
Better management of state witness proof size limits #12701

Open
Tracked by #103
shreyan-gupta opened this issue Jan 8, 2025 · 0 comments

Description

Context

With the introduction of stateless validation, managing state witness size limits has become very important. An important component of the state witness size limit is the soft limit on the storage proof, governed by the main_storage_proof_size_soft_limit config.

The Trie contains a recorder that keeps track of all trie nodes read during runtime execution. In the runtime, we call the recorded_storage_size and recorded_storage_size_upper_bound functions after applying each receipt to check whether we have exceeded the size limit.

Problem

There are a couple of problems with the current approach that we would like to address.

Manual check for limits

As developers working in the runtime, we need to be mindful of the proof size limits, and it becomes our duty to add checks. This is a potential source of bugs if we forget to check the limits.

It would be much better and easier if the size check were handled automatically at the recorder layer.
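As a rough sketch of the recorder-layer idea (the names `TrieRecorder`, `record`, and `SizeLimitExceeded` here are illustrative, not the actual nearcore API), the recorder itself could reject any read that would push the recorded proof past the soft limit:

```rust
/// Illustrative error type; nearcore's real error handling differs.
#[derive(Debug, PartialEq)]
pub struct SizeLimitExceeded;

/// Hypothetical recorder that enforces the proof size soft limit on
/// every recorded node, so runtime code needs no manual checks.
pub struct TrieRecorder {
    recorded_size: usize,
    proof_size_limit: usize,
}

impl TrieRecorder {
    pub fn new(proof_size_limit: usize) -> Self {
        Self { recorded_size: 0, proof_size_limit }
    }

    /// Record a trie node, failing if the soft limit would be exceeded.
    pub fn record(&mut self, node: &[u8]) -> Result<(), SizeLimitExceeded> {
        if self.recorded_size + node.len() > self.proof_size_limit {
            return Err(SizeLimitExceeded);
        }
        self.recorded_size += node.len();
        Ok(())
    }
}
```

With this shape, forgetting a check is no longer possible: every read funnels through `record`, and the error propagates out of the runtime naturally.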

Practicality of checking limits

Sometimes the code is structured in such a way that it's not possible to check the size limit. The solution is similar to the one above: remove the manual limit checks and integrate them at the recorder layer.

A practical example we encountered was in resharding. During resharding, we duplicate the delayed receipts across both child shards. On the runtime side, we modified the code to skip receipts that don't belong to our shard; however, due to the way the code is structured, checking the limit after reading each receipt isn't possible.

Checking of the limit is done here; however, within the loop we could potentially be reading multiple receipts.
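To illustrate (with hypothetical types; `LimitedRecorder` and the `(shard_id, size)` receipt representation are stand-ins, not the nearcore structures), once the limit lives in the recorder, the delayed-receipt loop gets the check after every read for free, including reads of receipts that are skipped as belonging to the sibling shard:

```rust
/// Illustrative recorder that fails once the limit would be exceeded.
pub struct LimitedRecorder {
    recorded: usize,
    limit: usize,
}

impl LimitedRecorder {
    pub fn new(limit: usize) -> Self {
        Self { recorded: 0, limit }
    }

    pub fn record(&mut self, bytes: usize) -> Result<(), &'static str> {
        if self.recorded + bytes > self.limit {
            return Err("proof size limit exceeded");
        }
        self.recorded += bytes;
        Ok(())
    }
}

/// Process delayed receipts copied to a child shard after resharding.
/// Every read goes through the recorder, so the limit is checked after
/// EVERY receipt, even ones skipped as belonging to the other child.
pub fn process_delayed_receipts(
    receipts: &[(u64, usize)], // (shard_id, receipt size in bytes)
    our_shard: u64,
    recorder: &mut LimitedRecorder,
) -> Result<usize, &'static str> {
    let mut applied = 0;
    for &(shard_id, size) in receipts {
        recorder.record(size)?; // reading the receipt may hit the limit
        if shard_id != our_shard {
            continue; // skip: belongs to the sibling child shard
        }
        applied += 1;
    }
    Ok(applied)
}
```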

Optimize access to trie key

Sometimes the leaf node we are reading can be much larger than the remaining limit. For example, suppose the limit is 100 KB but the receipt we are reading is ~1 MB. Since we check the limit AFTER reading the key, we've already exceeded the limit by roughly 900 KB.

However, in the trie layer, it's actually possible to check the size limit before reading the leaf node. When getting a key, we first obtain the optimized_ref and later dereference it. With the size limit check added to the recorder, it's possible to check the limit before calling self.deref_optimized.

pub enum OptimizedValueRef {
    Ref(ValueRef),
    AvailableValue(ValueAccessToken),
}

pub struct ValueRef {
    /// Value length in bytes.
    pub length: u32,
    /// Unique value hash.
    pub hash: CryptoHash,
}

/// Retrieves the full value for the given key.
pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError> {
    match self.get_optimized_ref(key, KeyLookupMode::FlatStorage)? {
        Some(optimized_ref) => Ok(Some(self.deref_optimized(&optimized_ref)?)),
        None => Ok(None),
    }
}

So in this case, we would read a few bytes walking through the trie nodes until we reach the leaf node. Then, for the ValueRef case, we would first see that the length is 1 MB, exceeding our 100 KB limit, so we can return an error without reading or recording the 1 MB value.
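A minimal sketch of this "check before deref" idea (the struct shapes mirror the snippet above, but `check_before_deref`, `remaining_limit`, and the simplified `AvailableValue(Vec<u8>)` payload are illustrative assumptions, not the actual nearcore code):

```rust
/// Simplified mirror of the quoted ValueRef: length known up front.
pub struct ValueRef {
    pub length: u32,
}

/// Simplified mirror of OptimizedValueRef; the real AvailableValue
/// holds a ValueAccessToken rather than raw bytes.
pub enum OptimizedValueRef {
    Ref(ValueRef),
    AvailableValue(Vec<u8>),
}

/// For the Ref variant, the value length is available before the value
/// is fetched, so an oversized read can be rejected without reading or
/// recording the value at all.
fn check_before_deref(
    value_ref: &OptimizedValueRef,
    remaining_limit: u32,
) -> Result<(), &'static str> {
    match value_ref {
        OptimizedValueRef::Ref(r) if r.length > remaining_limit => {
            Err("proof size limit would be exceeded")
        }
        _ => Ok(()),
    }
}
```

In the 100 KB / 1 MB scenario above, this check would fail on the ValueRef's length alone, before the 1 MB value is ever dereferenced.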

I don't remember the exact details, but there was another issue we hit while handling buffered receipts during resharding. We needed to read a buffered receipt and make a decision based on it. In that case too, the act of reading the receipt could exceed the size limit.

Proposed solution

As described alongside each problem, the most straightforward solution to all three is to integrate the size limit into the recorder. From a design perspective, that's the best place for it.

github-merge-queue bot pushed a commit that referenced this issue Jan 11, 2025
This PR is the first part of
#12701

The PR moves tracking of proof_size_limit from runtime to trie recorder.
There should be no functional change.

This is a more natural place to expose the check for proof_size_limit
and sets the basis for future improvements like
- Potentially moving compute_limit to recorder as well
- Better tracking and checking of limits (required for resharding)
- Potential setup to add better checks for limits (required for reading
and managing buffered receipts)
@shreyan-gupta shreyan-gupta self-assigned this Jan 11, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 13, 2025
…12721)

This PR builds on top of #12710 and
is related to issue #12701

After resharding, we copy the set of delayed receipts from parent to
both children. While handling delayed receipts in the new shard, we
iterated over all the delayed receipts and ignored/removed the ones that
did not belong to our shard.

Due to the implementation however, we bypassed checks to the state
witness size limit in the specific case of when the receipt did not
belong to our shard.

Now that we are doing the checks for the state witness size within the
trie recorder, it's much simpler to go back and check the proof size
limit after reading EVERY delayed receipt.