With the introduction of stateless validation, managing state witness size limits has become very important. An important component of the state witness size limit is the soft limit for the storage proof, managed by the main_storage_proof_size_soft_limit config.
The Trie contains a recorder that keeps track of all the trie nodes that are read during runtime execution. In the runtime we call the recorded_storage_size and recorded_storage_size_upper_bound functions after applying each receipt to check whether we have exceeded the size limit.
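As a rough illustration (simplified types and names, not the actual nearcore code), the current flow looks roughly like this: after each receipt, the runtime asks the recorder for its recorded size and compares it against the soft limit.

```rust
// Minimal sketch of the current, runtime-driven check (illustrative only).
struct TrieRecorder {
    /// Total size of recorded trie nodes, in bytes.
    recorded_bytes: usize,
}

impl TrieRecorder {
    fn recorded_storage_size(&self) -> usize {
        self.recorded_bytes
    }
}

#[derive(Debug)]
struct SizeLimitExceeded;

fn apply_receipt(recorder: &mut TrieRecorder, receipt: &[u8]) {
    // Pretend each receipt records its own byte length worth of trie nodes.
    recorder.recorded_bytes += receipt.len();
}

fn apply_receipts(
    recorder: &mut TrieRecorder,
    receipts: &[&[u8]],
    soft_limit: usize,
) -> Result<(), SizeLimitExceeded> {
    for receipt in receipts {
        apply_receipt(recorder, receipt);
        // The manual check, repeated at every call site that reads state.
        if recorder.recorded_storage_size() > soft_limit {
            return Err(SizeLimitExceeded);
        }
    }
    Ok(())
}

fn main() {
    let mut recorder = TrieRecorder { recorded_bytes: 0 };
    let receipts: [&[u8]; 2] = [b"r1", b"r2"];
    assert!(apply_receipts(&mut recorder, &receipts, 1024).is_ok());
}
```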
Problem
There are a couple of problems with the current approach that we would like to address.
Manual check for limits
As developers working in the runtime, we need to be mindful of the proof size limits, and it becomes our duty to add the checks. This is a potential source of bugs in case we forget to check the limits.
It would be much better and easier if the size check were handled automatically at the recorder layer.
Practicality of checking limits
Sometimes the code is structured in such a way that it's not possible to check the size limit. The solution is the same as above: remove the manual checks for the limit and integrate them at the recorder layer.
A practical example that we encountered was in resharding. During resharding we duplicate the delayed receipts across both child shards. On the runtime side, we modified the code to skip receipts that didn't belong to our shard; however, due to the way the code is structured, checking the limit isn't possible after reading each receipt.
The limit check is done here, but within the loop we could potentially be reading multiple receipts.
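A rough sketch of the shape of the problem, with made-up names that only loosely mirror the real code: one call may read and record several delayed receipts before returning, so the caller can only check the limit between calls, not between individual reads.

```rust
// Hypothetical sketch of the structural problem (illustrative names only).
struct Trie {
    recorded_bytes: usize,
}

impl Trie {
    /// Reads a delayed receipt and records the trie nodes touched along the way.
    fn read_delayed_receipt(&mut self, _index: u64) -> Vec<u8> {
        let receipt = vec![0u8; 1000]; // stand-in for the real receipt read
        self.recorded_bytes += receipt.len();
        receipt
    }

    fn recorded_storage_size(&self) -> usize {
        self.recorded_bytes
    }
}

/// Returns the next receipt belonging to this shard, silently reading (and
/// recording!) any receipts that belong to the sibling shard along the way.
fn next_receipt_for_shard(
    trie: &mut Trie,
    indices: &mut impl Iterator<Item = u64>,
) -> Option<Vec<u8>> {
    for index in indices {
        let receipt = trie.read_delayed_receipt(index);
        let belongs_to_our_shard = index % 2 == 0; // stand-in shard check
        if belongs_to_our_shard {
            return Some(receipt);
        }
        // No opportunity for the caller to check the limit between these reads.
    }
    None
}

fn main() {
    let mut trie = Trie { recorded_bytes: 0 };
    let mut indices = 1..=6u64;
    let soft_limit = 2_500;
    while let Some(_receipt) = next_receipt_for_shard(&mut trie, &mut indices) {
        // The only place the limit can be checked: several receipts may already
        // have been read and recorded by the time we get here.
        if trie.recorded_storage_size() > soft_limit {
            break;
        }
    }
}
```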
Optimize access to trie key
Sometimes the leaf node that we are reading can be much larger than the remaining limit. Take an example where the limit is 100 KB but the receipt we are reading is ~1 MB. Since we check the limit AFTER reading the key, we've already exceeded the limit by roughly 900 KB.
However, in the trie layer it's actually possible to check the size limit before reading the leaf node. While getting a key, we first get the optimized_ref and only later dereference it. With the size limit check added to the recorder, it's possible to check the limit before calling self.deref_optimized.
```rust
pub enum OptimizedValueRef {
    Ref(ValueRef),
    AvailableValue(ValueAccessToken),
}

pub struct ValueRef {
    /// Value length in bytes.
    pub length: u32,
    /// Unique value hash.
    pub hash: CryptoHash,
}

/// Retrieves the full value for the given key.
pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError> {
    match self.get_optimized_ref(key, KeyLookupMode::FlatStorage)? {
        Some(optimized_ref) => Ok(Some(self.deref_optimized(&optimized_ref)?)),
        None => Ok(None),
    }
}
```
So in this case, we would read a few bytes while walking the trie nodes until we reach the leaf node. Then, in the ValueRef case, we would first see the length, notice that it is 1 MB and exceeds our 100 KB limit, and return an error without reading or recording the 1 MB value.
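The sketch below illustrates this idea under simplified, hypothetical types (Trie, check_proof_size_limit and deref_value are stand-ins, not the real nearcore API): the length stored in the ValueRef lets us reject the read before the large value is ever fetched or recorded.

```rust
// Hedged sketch: check the limit using ValueRef::length BEFORE dereferencing.
struct CryptoHash([u8; 32]);

struct ValueRef {
    /// Value length in bytes.
    length: u32,
    /// Unique value hash.
    hash: CryptoHash,
}

struct StorageError(String);

struct Trie {
    recorded_bytes: usize,
    proof_size_soft_limit: usize,
}

impl Trie {
    /// Would recording `extra` more bytes exceed the soft limit?
    fn check_proof_size_limit(&self, extra: usize) -> Result<(), StorageError> {
        if self.recorded_bytes + extra > self.proof_size_soft_limit {
            Err(StorageError("proof size limit exceeded".to_string()))
        } else {
            Ok(())
        }
    }

    fn deref_value(&mut self, value_ref: &ValueRef) -> Result<Vec<u8>, StorageError> {
        // Check BEFORE reading: a 1 MB value against a 100 KB remaining budget
        // is rejected here, without reading or recording the value itself.
        self.check_proof_size_limit(value_ref.length as usize)?;
        let value = vec![0u8; value_ref.length as usize]; // stand-in for the real read
        self.recorded_bytes += value.len();
        Ok(value)
    }
}

fn main() {
    let mut trie = Trie { recorded_bytes: 0, proof_size_soft_limit: 100 * 1024 };
    let big = ValueRef { length: 1024 * 1024, hash: CryptoHash([0; 32]) };
    assert!(trie.deref_value(&big).is_err());
}
```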
I don't remember the exact details, but there was another issue we hit while handling buffered receipts during resharding. We needed to read a buffered receipt and make a decision based on it. In this case too, the act of reading the receipt could exceed the size limit.
Proposed solution
As described along with each problem above, the most straightforward solution to all three of these problems is to integrate the size limit into the recorder. From a design perspective that's the best place for it.
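A minimal sketch of that direction, with hypothetical names rather than the real implementation: the recorder owns the soft limit and refuses to record past it, so every trie read gets the check for free and runtime code no longer needs explicit per-receipt checks.

```rust
// Sketch of a recorder that enforces the proof size limit itself (illustrative).
use std::collections::HashMap;

struct CryptoHash(u64);

#[derive(Debug)]
struct StorageError(&'static str);

struct TrieRecorder {
    recorded: HashMap<u64, Vec<u8>>,
    recorded_bytes: usize,
    proof_size_soft_limit: usize,
}

impl TrieRecorder {
    fn new(proof_size_soft_limit: usize) -> Self {
        Self { recorded: HashMap::new(), recorded_bytes: 0, proof_size_soft_limit }
    }

    /// Record a trie node, failing if this would push the proof past the limit.
    fn record(&mut self, hash: CryptoHash, node: Vec<u8>) -> Result<(), StorageError> {
        if self.recorded_bytes + node.len() > self.proof_size_soft_limit {
            return Err(StorageError("state witness proof size limit exceeded"));
        }
        self.recorded_bytes += node.len();
        self.recorded.insert(hash.0, node);
        Ok(())
    }
}

fn main() {
    let mut recorder = TrieRecorder::new(16);
    // Every read path funnels through `record`, so the limit is enforced in one
    // place instead of at every runtime call site.
    assert!(recorder.record(CryptoHash(1), vec![0u8; 10]).is_ok());
    assert!(recorder.record(CryptoHash(2), vec![0u8; 10]).is_err());
}
```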
This PR is the first part of #12701.
The PR moves tracking of proof_size_limit from the runtime to the trie recorder.
There should be no functional change.
This is a more natural place to expose the check for proof_size_limit, and it sets the basis for future improvements like:
- Potentially moving compute_limit to recorder as well
- Better tracking and checking of limits (required for resharding)
- Potential setup to add better checks for limits (required for reading
and managing buffered receipts)
(#12721)
This PR builds on top of #12710 and
is related to issue #12701
After resharding, we copy the set of delayed receipts from parent to
both children. While handling delayed receipts in the new shard, we
iterated over all the delayed receipts and ignored/removed the ones that
did not belong to our shard.
Due to the implementation, however, we bypassed the state witness size limit checks in the specific case where a receipt did not belong to our shard.
Now that the state witness size checks are done within the trie recorder, it's much simpler to go back and check the proof size limit after reading EVERY delayed receipt.
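As a rough illustration of the call-site effect (illustrative names only, not the actual nearcore code): once the recorder enforces the limit, every trie read can return a StorageError when the proof grows too large, and the delayed-receipt loop simply propagates it; receipts skipped because they belong to the sibling shard are still counted against the limit.

```rust
// Sketch of the delayed-receipt loop once the limit lives in the recorder.
#[derive(Debug)]
struct StorageError(&'static str);

struct Trie {
    recorded_bytes: usize,
    proof_size_soft_limit: usize,
}

impl Trie {
    fn read_delayed_receipt(&mut self, _index: u64) -> Result<Vec<u8>, StorageError> {
        let receipt = vec![0u8; 1000]; // stand-in for the real receipt read
        // The recorder-side check: every read is accounted for, no matter which
        // shard the receipt belongs to or where in the code the read happens.
        if self.recorded_bytes + receipt.len() > self.proof_size_soft_limit {
            return Err(StorageError("proof size limit exceeded"));
        }
        self.recorded_bytes += receipt.len();
        Ok(receipt)
    }
}

fn process_delayed_receipts(trie: &mut Trie, indices: &[u64]) -> Result<(), StorageError> {
    for &index in indices {
        // `?` propagates the limit error; no explicit size check is needed here.
        let _receipt = trie.read_delayed_receipt(index)?;
        let belongs_to_our_shard = index % 2 == 0; // stand-in shard check
        if !belongs_to_our_shard {
            continue; // skipped receipts were still counted against the limit
        }
        // ... apply the receipt ...
    }
    Ok(())
}

fn main() {
    let mut trie = Trie { recorded_bytes: 0, proof_size_soft_limit: 2_500 };
    assert!(process_delayed_receipts(&mut trie, &[1, 2, 3, 4, 5]).is_err());
}
```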