Advice provider, MMR, and SMT #750

bobbinth · 2023-03-06T07:56:39Z

bobbinth
Mar 6, 2023
Maintainer

In this discussion I wanted to think through potential refactoring that we'll need to make to the advice provider to better support MMR and SMT data structures. But first, I'd like to describe high-level goals and general structure of the advice provider.

Purpose of Advice provider

The purpose of the advice provider has evolved somewhat over time. At first, it was just for providing nondeterministic inputs to the VM, but more recently it became a full-blown interface for the VM to communicate with the host. With the latest round of updates, the advice provider now persists beyond the execution of the VM and thus can be used by the VM to return large amounts of data (though this data needs to be committed to via the values returned on the stack).

The advice provider consists of 3 parts:

1. Advice tape - the main way the VM reads nondeterministic inputs. In reality this is not a tape but a stack, so at some point we should probably rename it to advice stack.
2. Advice map - a key-value map where keys are words (4 field elements) and values are vectors of field elements of arbitrary length. The VM can do two things with the advice map:
   - Insert new values into it (useful for returning data via the advice provider).
   - Copy values from the advice map onto the advice tape.
3. Merkle sets - a store of Merkle-tree-like data structures, which should probably be called Merkle store (that's how I will refer to it throughout this note). This is the component relevant to MMR and SMT data structures and it will be the focus of this note.
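A minimal sketch of the first two components, to make the "tape is really a stack" point concrete. Everything here is hypothetical: `ToyAdvice`, its method names, the use of `u64` as a stand-in for a field element, and the choice to place the first map element on top of the stack are assumptions made for illustration, not the actual advice provider API.

```rust
use std::collections::BTreeMap;

/// A word is 4 field elements; `u64` stands in for a field element (assumption).
type Word = [u64; 4];

/// Toy model of the advice tape and the advice map (hypothetical type).
#[derive(Default)]
struct ToyAdvice {
    /// The "advice tape", which behaves like a stack: last pushed, first read.
    stack: Vec<u64>,
    /// The advice map: word-sized keys to element vectors of arbitrary length.
    map: BTreeMap<Word, Vec<u64>>,
}

impl ToyAdvice {
    /// Insert values under a key (e.g. to return data to the host).
    fn insert_into_map(&mut self, key: Word, values: Vec<u64>) {
        self.map.insert(key, values);
    }

    /// Copy the values stored under `key` onto the advice stack.
    /// Returns false if the key is unknown.
    fn push_map_value_to_stack(&mut self, key: &Word) -> bool {
        match self.map.get(key) {
            Some(values) => {
                // Push in reverse so the first element ends up on top
                // (an ordering chosen for this sketch only).
                self.stack.extend(values.iter().rev());
                true
            }
            None => false,
        }
    }

    /// Read the next nondeterministic input from the top of the stack.
    fn read(&mut self) -> Option<u64> {
        self.stack.pop()
    }
}
```

The map values stay in place after being copied, so the same key can be read onto the stack more than once.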

The purpose of the Merkle store is to let the VM manipulate authenticated data structures without having to worry about how these data structures are implemented. That is, as long as a data structure meets the required interface, the VM can work with it regardless of what the data structure actually is.

The interface the VM imposes on these data structures is as follows:

- All structures can be uniquely identified by a root (4 field elements).
- Given an index and a depth, we should be able to get a node from these data structures (a node is also 4 elements).
- Given an index and a depth, we should be able to get a Merkle path from the node at that location to the root.
- Given an index and a depth, we should be able to update the node at that location, and this should update the root of the data structure. In the current implementation, this is actually not fully supported as we can update only leaf nodes - not internal nodes.
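The four requirements above can be sketched as a trait plus one toy implementation. This is an illustration under stated assumptions, not the VM's interface: the `MerkleLike` trait, `ToyTree`, and `compute_root` are hypothetical names, a `u64` stands in for a 4-element word, and `merge` uses the standard library's `DefaultHasher` as a placeholder for the real hash function. Updates are limited to leaves, matching the limitation mentioned above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 4-element word (assumption made to keep the sketch short).
type Digest = u64;

/// Toy 2-to-1 hash; NOT the VM's hash function.
fn merge(left: Digest, right: Digest) -> Digest {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical trait mirroring the four requirements listed above.
trait MerkleLike {
    fn root(&self) -> Digest;
    fn get_node(&self, depth: u32, index: u64) -> Option<Digest>;
    /// Siblings from the node at (depth, index) up to, but excluding, the root.
    fn get_path(&self, depth: u32, index: u64) -> Option<Vec<Digest>>;
    /// Leaf-only update; returns the new root.
    fn update_leaf(&mut self, index: u64, value: Digest) -> Option<Digest>;
}

/// Fully balanced tree stored by layer: layers[0] = [root], layers[d] has 2^d nodes.
struct ToyTree {
    layers: Vec<Vec<Digest>>,
}

impl ToyTree {
    fn new(leaves: Vec<Digest>) -> Self {
        assert!(leaves.len().is_power_of_two(), "leaf count must be a power of two");
        let mut layers = vec![leaves];
        while layers.last().unwrap().len() > 1 {
            let next: Vec<Digest> = layers
                .last()
                .unwrap()
                .chunks(2)
                .map(|pair| merge(pair[0], pair[1]))
                .collect();
            layers.push(next);
        }
        layers.reverse();
        Self { layers }
    }
}

impl MerkleLike for ToyTree {
    fn root(&self) -> Digest {
        self.layers[0][0]
    }

    fn get_node(&self, depth: u32, index: u64) -> Option<Digest> {
        self.layers.get(depth as usize)?.get(index as usize).copied()
    }

    fn get_path(&self, depth: u32, index: u64) -> Option<Vec<Digest>> {
        self.get_node(depth, index)?; // bounds check
        let mut path = Vec::new();
        let mut idx = index as usize;
        for d in (1..=depth as usize).rev() {
            path.push(self.layers[d][idx ^ 1]); // sibling at this level
            idx >>= 1;
        }
        Some(path)
    }

    fn update_leaf(&mut self, index: u64, value: Digest) -> Option<Digest> {
        let depth = self.layers.len() - 1;
        let mut idx = index as usize;
        *self.layers[depth].get_mut(idx)? = value;
        // Rehash every ancestor of the updated leaf.
        for d in (1..=depth).rev() {
            idx >>= 1;
            let parent = merge(self.layers[d][2 * idx], self.layers[d][2 * idx + 1]);
            self.layers[d - 1][idx] = parent;
        }
        Some(self.root())
    }
}

/// Recompute the root from a node, its index, and its Merkle path.
fn compute_root(node: Digest, index: u64, path: &[Digest]) -> Digest {
    let mut acc = node;
    let mut idx = index;
    for sibling in path {
        acc = if idx & 1 == 0 { merge(acc, *sibling) } else { merge(*sibling, acc) };
        idx >>= 1;
    }
    acc
}
```

A partial structure holding only a few paths could implement the same trait and simply return `None` for nodes it does not know about, which is the property the VM relies on.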

The data structures we have implemented as of yet (considering SMT and MMR to be still in progress):

| Structure | Description |
| --- | --- |
| `MerkleTree` | A fully balanced Merkle tree of depth at most 64. |
| `SimpleSmt` | A simple Sparse Merkle tree of depth at most 64. This is just a more efficient version of `MerkleTree` for when we know that most leaves are zeros. |
| `MerklePathSet` | A collection of Merkle paths all having the same length and resolving to the same root. We probably should refactor this and rename into something like a `PartialMerkleTree`. |

The VM exposes 3 instructions to work with these structures: mtree_get, mtree_set, and mtree_cwm (see here). These instructions don't care which one of the 3 data structures they work with. For example, a program using mtree_get should work in exactly the same way regardless of whether the underlying data structure is MerkleTree, SimpleSmt, or MerklePathSet, as long as the advice provider has enough data to satisfy this specific instruction invocation.

This is useful, for example, when we want to prepare a transaction for proving by someone else. In this scenario, we would first execute the transaction and instantiate the advice provider with full sparse Merkle trees describing account storage, vault, etc., as we don't know which storage slots will be read/updated by the transaction. But as the transaction is executing, we can collect the minimum amount of data needed to execute/prove it. Thus, we can send this data to the prover and the prover can instantiate their advice provider with only the partial data they have.

Current shortcomings

The way the advice provider is currently set up is not sufficient to support compact SMT and MMR data structures, and we'll probably need to modify it somewhat. But even before we get there, the advice provider has a couple of issues.

Current Issue 1

The clone-and-update operation works awkwardly and is challenging to use in some situations. Currently, this is done via mtree_cwm - but I now believe this instruction was a mistake. Instead, we should have had an instruction to clone an existing Merkle set and then use the already existing mtree_set to update it.

To support a separate clone operation, we need to add a new method to the AdviceProvider interface. Something like:

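```rust
pub trait AdviceProvider {
    // Clones the Merkle set identified by `root`, so that a later mtree_set
    // can update one copy while the original remains available.
    fn clone_merkle_set(&mut self, root: Word) -> Result<(), ExecutionError>;
}
```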

In the implementation (e.g., MemAdviceProvider) this shouldn't be too difficult to support either - i.e., instead of using BTreeMap<[u8; 32], MerkleSet> we could use BTreeMap<[u8; 32], Vec<MerkleSet>>, and then the clone operation would add a new item to the vector under the specified key.
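A rough sketch of how the `Vec`-per-root idea could look. The types here (`ToyMerkleSet`, `ToyMerkleStore`) and the `take_copy` helper are hypothetical stand-ins, and errors are plain strings rather than `ExecutionError`. The assumption is that an update removes one copy from under the old root, mutates it, and re-inserts it under its new root, leaving any remaining copies untouched.

```rust
use std::collections::BTreeMap;

/// Placeholder for the real Merkle set type; only `Clone` matters here.
#[derive(Clone, Debug, PartialEq)]
struct ToyMerkleSet {
    leaves: Vec<u64>,
}

/// Hypothetical store keyed by root bytes, holding one or more copies per root.
#[derive(Default)]
struct ToyMerkleStore {
    sets: BTreeMap<[u8; 32], Vec<ToyMerkleSet>>,
}

impl ToyMerkleStore {
    /// Register a set under the given root.
    fn insert(&mut self, root: [u8; 32], set: ToyMerkleSet) {
        self.sets.entry(root).or_default().push(set);
    }

    /// Add one more copy of the set stored under `root`.
    fn clone_merkle_set(&mut self, root: [u8; 32]) -> Result<(), String> {
        let copies = self
            .sets
            .get_mut(&root)
            .ok_or_else(|| "no Merkle set with this root".to_string())?;
        let copy = copies
            .last()
            .cloned()
            .ok_or_else(|| "no copies left under this root".to_string())?;
        copies.push(copy);
        Ok(())
    }

    /// Remove one copy so it can be updated and re-inserted under its new root.
    fn take_copy(&mut self, root: [u8; 32]) -> Option<ToyMerkleSet> {
        let copies = self.sets.get_mut(&root)?;
        let set = copies.pop();
        if copies.is_empty() {
            self.sets.remove(&root);
        }
        set
    }

    /// Number of copies currently stored under `root`.
    fn copies(&self, root: &[u8; 32]) -> usize {
        self.sets.get(root).map_or(0, Vec::len)
    }
}
```

With this layout, cloning before an update keeps the old root resolvable, at the cost of a full copy of the set per clone.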

Current Issue 2

There is no way to create a new Merkle set in the advice provider. That is, we can pre-load the advice provider with some Merkle sets (each having a unique root), and we can modify them, but we cannot create a blank Merkle set and start adding leaves to it.

One way to address this is to always pre-load the advice provider with immutable empty Sparse Merkle trees of all possible depths (1 - 64). That will allow users to clone empty trees of the desired depth (using the clone method described above) and then add leaves to them using the mtree_set instruction.
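Pre-loading empty trees is cheap because the root of an empty tree of depth d is just the hash of two copies of the empty root of depth d - 1. A minimal sketch, assuming an all-zero empty leaf, a `u64` stand-in for a word, and a placeholder hash (`merge` below is not the VM's hash function; `empty_roots` is a hypothetical helper):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 4-element word (assumption).
type Digest = u64;

/// Value of an empty leaf; assumed to be zero in this sketch.
const EMPTY_LEAF: Digest = 0;

/// Toy 2-to-1 hash; NOT the VM's hash function.
fn merge(left: Digest, right: Digest) -> Digest {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Returns `roots` where `roots[d]` is the root of an empty tree of depth `d`
/// (`roots[0]` is the empty leaf itself).
fn empty_roots(max_depth: usize) -> Vec<Digest> {
    let mut roots = Vec::with_capacity(max_depth + 1);
    roots.push(EMPTY_LEAF);
    for depth in 1..=max_depth {
        // Both children of an empty node are the empty root one level down.
        let child = roots[depth - 1];
        roots.push(merge(child, child));
    }
    roots
}
```

Calling `empty_roots(64)` takes 64 hash invocations, so the set of roots to pre-load can be computed once and reused.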

There could be other ways too.

Challenges with MMR

MMR is not a single tree but rather a collection of trees each with a different root. Thus, it doesn't quite fit the current advice provider model well (where every structure is identified by a single root). There are two main operations we want to support with an MMR: (1) verify that a node is in the MMR and (2) append a new node to the MMR.
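To illustrate why an MMR has several roots, here is a small sketch of the append operation tracking only the peaks. It assumes the usual MMR rule that two peaks of equal height are merged into one; `ToyMmr` is a hypothetical type, a `u64` stands in for a word, and `merge` is a placeholder hash rather than the VM's hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 4-element word (assumption).
type Digest = u64;

/// Toy 2-to-1 hash; NOT the VM's hash function.
fn merge(left: Digest, right: Digest) -> Digest {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical MMR that keeps only its peaks as (height, root) pairs,
/// ordered from the largest tree to the smallest.
#[derive(Default)]
struct ToyMmr {
    peaks: Vec<(u32, Digest)>,
    num_leaves: u64,
}

impl ToyMmr {
    /// Append a leaf, merging trailing peaks of equal height.
    fn append(&mut self, leaf: Digest) {
        self.peaks.push((0, leaf));
        self.num_leaves += 1;
        while self.peaks.len() >= 2 {
            let (right_height, right) = self.peaks[self.peaks.len() - 1];
            let (left_height, left) = self.peaks[self.peaks.len() - 2];
            if left_height != right_height {
                break;
            }
            // Replace the two equal-height peaks with their parent.
            self.peaks.truncate(self.peaks.len() - 2);
            self.peaks.push((left_height + 1, merge(left, right)));
        }
    }
}
```

After n appends the number of peaks equals the number of set bits in n, so a single append can change several roots at once, which is why it is hard to express as a sequence of independent single-root updates.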

I think there are broadly two ways of supporting MMRs in the advice provider:

1. We can introduce MMR as a completely new data structure type.
2. We can make it work as a collection of Merkle sets.

The first approach would require adding new methods to the AdviceProvider trait which would access Mmr or PartialMmr structs directly. This would probably imply that we need dedicated assembly instructions to work with MMRs (rather than putting the functionality into stdlib). The instruction to verify membership would probably be pretty simple, but I'm not sure how complex the instruction for appending a new node would be. Another open question for this approach is how to identify a given MMR in the advice provider (since there is no single unique root). We could probably use a collection of peaks to do that - but maybe there is a better approach.

The second approach would require fewer changes to the AdviceProvider interface, but we'd probably need to change the underlying implementation to add another level of indirection. Specifically, given a root we would first need to figure out which data structure could "serve" information about that root, and then go to that data structure. The data structure itself could contain information about one or more roots (vs. the current situation where each data structure contains only a single root). Handling membership proofs this way would again be easy - but I'm not sure yet if there is a good way to handle insertions, since for efficient insertions we need to treat the whole operation as atomic.

I'm leaning somewhat towards the second approach - but we should think through both to see all the pros and cons.

Challenges with compact SMT

The main challenge with supporting compact SMT (i.e., similar to the one here) is that currently the AdviceProvider assumes that, for all Merkle sets, leaves are located at the same depth, and moreover, we can update only leaves (even though the mtree_set instruction is more general).

Similar to MMRs, we have two broad approaches to add support for compact SMTs:

1. We can introduce compact SMT as a completely new data structure type.
2. We can make it work as a Merkle set.

The first approach is similar to what is proposed in #724. We need to add some specialized methods for handling compact SMTs to the advice provider. We will also probably need 2 different implementations: one for the full SMT and one for the compact one. And we may have to introduce specialized assembly instructions to work with the SMT data structures directly.

The second approach would probably require adding the ability to update internal nodes of Merkle sets via the mtree_set instruction. This could be a fairly complex thing to do.

As with MMR, I'm leaning towards the second approach - but we should evaluate both.

grjte · 2023-03-06T09:51:30Z

grjte
Mar 6, 2023

My initial feeling is that I would also prefer to make both SMT and MMR work with Merkle sets, either as a single Merkle set (SMT) or a collection of them (MMR). I agree that we should also evaluate the approach of introducing SMT and MMR as new data structures, but having more flexible core Merkle set functionality that is data-set agnostic and usable for any (or most) Merkle-tree-based data structure(s) would make the advice provider more powerful without adding too many different moving parts.

Regarding the other current shortcomings of the advice provider -

> Instead, we should have had an instruction to clone an existing Merkle set and then use the already existing mtree_set to update it.

Just to clarify - you're talking about a new instruction that just handles cloning, right? So the user would call mtree_clone and then the user would call mtree_set? I think this makes sense as a change. It helps address a lot of the issues we're having & gives more flexibility for supporting a variety of data structures.

> One way to address this is to always pre-load the advice provider with immutable empty Sparse Merkle trees of all possible depths (1 - 64). That will allow users to clone empty trees of the desired depth (using the clone method described above) and then add leaves to them using the mtree_set instruction.

I would like to think about other ways, but this is straightforward and works, so maybe it makes sense to take this approach for now so we can get initial implementations of everything we currently need for the rollup.

5 replies

bobbinth Mar 6, 2023
Maintainer Author

> Just to clarify - you're talking about a new instruction that just handles cloning, right? So the user would call mtree_clone and then the user would call mtree_set?

Yes, that's exactly how I was thinking about it. mtree_clone would actually be more like a decorator as it would only affect the state of the advice provider but not the rest of the VM state.

hackaugusto Mar 6, 2023

IMO the mtree_clone is not really necessary. Merkle Trees are effectively immutable data structures, the update operation actually creates a totally new tree with a new root, the old one never ceases to exist.

The mtree_clone instruction would effectively expose the internals of the VM to the user, and it basically is asking the user to manage the number of copies we have in memory on our behalf because a mtree_set mutates in place. A more user friendly approach, which would also be a bit more efficient since the MASM code doesn't have to manage the number of copies in the advice set, is to have copy-on-write structures or persistent data structures. What I mean by this is that after a user performs a mtree_set, they can still identify the exact tree prior to the change with the old root, we just need a way to keep it around.

Edit: I did suggest a while ago that we should have an opcode to clone a range of elements, I think that is totally unnecessary because the user can do that by referencing an existing tree's internal node.

bobbinth Mar 6, 2023
Maintainer Author

I would prefer this approach but it seems like it would be much more complex to implement. Like if we start out with some Merkle tree and make 1000 updates to it while executing a program, wouldn't the advice provider end up having to track 1000 different trees (even if 999 of these are fairly light)? Also, the question is how "light" each additional tree view would be.

hackaugusto Mar 6, 2023

> I would prefer this approach but it seems like it would be much more complex to implement. Like if we start out with some Merkle tree and make 1000 updates to it while executing a program, wouldn't the advice provider end up having to track 1000 different trees (even if 999 of these are fairly light)? Also, the question is how "light" each additional tree view would be.

Observations:

- An update to a node at depth d forces an update of d nodes, since all of its ancestors have to be rehashed.

so:

- If we define mtree_set to only allow updates to leaves, then 1000 updates would cause 1000 paths to change, so we would end up with the initial tree of data + 1000 paths. If we allow updates of inner nodes, that would be the worst case; the best case would be 1000 updates to the root, so we would have the initial tree + 1000 nodes.
- I just noticed that we can't have reference counting done automatically in the VM, so clone would be needed here to do efficient copy-on-write. So I guess the question is whether the cost outlined above would be too high. Edit: We can implement copy-on-write a bit more efficiently than I thought; see the response below and "Compact representation for persistent structures" (crypto#90) for an alternative.

frisitano Mar 8, 2023

> I would prefer this approach but it seems like it would be much more complex to implement. Like if we start out with some Merkle tree and make 1000 updates to it while executing a program, wouldn't the advice provider end up having to track 1000 different trees (even if 999 of these are fairly light)? Also, the question is how "light" each additional tree view would be.

I think the way in which the data is structured / stored could have significant implications here. If we store data using node index as a key e.g. a map of (node_index -> node_hash) then I believe we will have to store a unique copy of all nodes for each version of the tree as there could be differences in the node_hash for particular node_index's for different tree versions. However, if instead we store nodes using a map of node_hash -> (left_child_hash, right_child_hash) this would no longer be the case as the key uniquely represents the data in the node. To do a lookup you just need the tree root and the key. You lookup the root hash in the map and get the (left_child_hash, right_child_hash) tuple, you then select the appropriate child based on the bit in the key and use that to lookup the next node in the map. You continue this until you reach the leaf - see an example here. As @hackaugusto mentioned, updates to a tree involve adding changed nodes to the map. All versions of the tree remain accessible, you just need to specify the root of the tree version you are interested in and a key. Nodes that are the same across different versions of the tree are only stored once in the map and are not replicated.
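The hash-keyed layout described above can be sketched as follows. This is only an illustration: `NodeStore` and its methods are hypothetical, a `u64` stands in for a word, `merge` is a placeholder for the real hash function, and trees are assumed to be fully balanced with the index bits read from most significant to least significant.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a 4-element word (assumption).
type Digest = u64;

/// Toy 2-to-1 hash; NOT the VM's hash function.
fn merge(left: Digest, right: Digest) -> Digest {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical store: node_hash -> (left_child_hash, right_child_hash).
struct NodeStore {
    nodes: HashMap<Digest, (Digest, Digest)>,
}

impl NodeStore {
    fn new() -> Self {
        Self { nodes: HashMap::new() }
    }

    /// Insert a fully balanced tree over `leaves` (power-of-two length); returns its root.
    fn add_tree(&mut self, leaves: &[Digest]) -> Digest {
        assert!(leaves.len().is_power_of_two(), "leaf count must be a power of two");
        let mut layer = leaves.to_vec();
        while layer.len() > 1 {
            layer = layer
                .chunks(2)
                .map(|pair| {
                    let parent = merge(pair[0], pair[1]);
                    self.nodes.insert(parent, (pair[0], pair[1]));
                    parent
                })
                .collect();
        }
        layer[0]
    }

    /// Walk from `root` down to the leaf at `index`, choosing a child per index bit.
    fn get_leaf(&self, root: Digest, depth: u32, index: u64) -> Option<Digest> {
        let mut node = root;
        for level in (0..depth).rev() {
            let (left, right) = *self.nodes.get(&node)?;
            node = if (index >> level) & 1 == 0 { left } else { right };
        }
        Some(node)
    }

    /// Set a leaf under `root`, adding only the changed nodes; returns the new root.
    /// The old root and all of its nodes remain in the map.
    fn set_leaf(&mut self, root: Digest, depth: u32, index: u64, value: Digest) -> Option<Digest> {
        // Collect the siblings along the path, top-down.
        let mut siblings = Vec::with_capacity(depth as usize);
        let mut node = root;
        for level in (0..depth).rev() {
            let (left, right) = *self.nodes.get(&node)?;
            let go_right = (index >> level) & 1 == 1;
            siblings.push(if go_right { left } else { right });
            node = if go_right { right } else { left };
        }
        // Rebuild the path bottom-up; popping yields the lowest sibling first.
        let mut acc = value;
        for level in 0..depth {
            let sibling = siblings.pop()?;
            let pair = if (index >> level) & 1 == 1 { (sibling, acc) } else { (acc, sibling) };
            acc = merge(pair.0, pair.1);
            self.nodes.insert(acc, pair);
        }
        Some(acc)
    }
}
```

In this sketch a leaf update at depth d adds at most d entries to the map, and every earlier version of the tree stays readable through its own root.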

grjte · 2023-03-08T09:29:31Z

grjte
Mar 8, 2023

A new idea is to address some of these issues by replacing MerklePathSet with MerkleStore, as described here: 0xPolygonMiden/crypto#89

With this, SimpleSMT, TieredSMT, and MerkleTree (the fully-realized Merkle tree) would all be handled as special cases of MerkleStore

0 replies

hackaugusto · 2023-03-08T22:29:57Z

hackaugusto
Mar 8, 2023

@bobbinth could you explain the bit below further?

> The second approach would require fewer changes to the AdviceProvider interface, but we'd probably need to change the underlying implementation to add another level of indirection. Specifically, given a root we would first need to figure out which data structure could "serve" information about that root, and then go to that data structure

Would the lookup for the root be done in the current version of MASM or are you thinking about some change to the VM?

I was trying to figure out if we could extend the MPverify to not only work on binary nodes, but also on a list of elements. The opcodes on the Hasher chiplet that we need are there, namely LINEAR_HASH, I just couldn't find a way of expressing "this value corresponds to the nth position on this list" that would be needed for the MMR.

2 replies

bobbinth Mar 8, 2023
Maintainer Author

If we go with implementation suggested in 0xPolygonMiden/crypto#89 (comment), my comment would not be as relevant.

> I was trying to figure out if we could extend the MPverify to not only work on binary nodes, but also on a list of elements.

I'm not sure that's needed. If we can determine the relevant root and node index for a specific lookup in the MMR we can execute the existing mtree_get instruction.

I think the operations we want to support are:

- Given an absolute position, retrieve the leaf value from the MMR. For this we can add a decorator which, given a position and information about the MMR state (this could come from VM memory), can return the peak index, the relative node index in the tree under this peak, and the depth of the tree.
- Given a node value, assert that the value is somewhere in the MMR. I'm not sure how to do this without having an extra map that lets us look up node position based on its value - so this will probably require some changes to the advice provider. But I also wonder if this functionality is really needed.

bobbinth Mar 9, 2023
Maintainer Author

> For this we can add a decorator which given a position and information about MMR state (this could come from VM memory) can return peak index, relative node index in the tree under this peak, and the depth of the tree.

Actually, we might have to compute this in the VM. I believe logic in this method will give us the peak index, and probably some more logic will be needed to determine relative index and depth.
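One possible shape of that computation, as a sketch rather than the method referenced above. It assumes the "absolute position" counts leaves only (0-based), that the MMR state is summarized by its number of leaves, and that peaks are ordered from the largest tree to the smallest; `locate_leaf` is a hypothetical name.

```rust
/// Given the number of leaves in the MMR and a 0-based leaf position, return
/// (peak index, leaf index relative to that peak's tree, depth of that tree).
/// Returns None if the position is out of range.
fn locate_leaf(num_leaves: u64, pos: u64) -> Option<(usize, u64, u32)> {
    if pos >= num_leaves {
        return None;
    }
    // Each set bit of `num_leaves` corresponds to one peak; bit `b` is a
    // tree of depth `b` covering 2^b consecutive leaves.
    let mut start = 0u64; // position of the first leaf under the current peak
    let mut peak_index = 0usize;
    for bit in (0..64u32).rev() {
        if num_leaves & (1u64 << bit) != 0 {
            let size = 1u64 << bit;
            if pos - start < size {
                return Some((peak_index, pos - start, bit));
            }
            start += size;
            peak_index += 1;
        }
    }
    None
}
```

For an MMR with 7 leaves (peaks covering 4, 2, and 1 leaves), position 5 falls under the second peak at relative index 1 in a tree of depth 1, so the lookup could then be served by mtree_get against that peak's root.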
