Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miden Component Model MVP #1624

Open
plafer opened this issue Jan 14, 2025 · 1 comment
Open

Miden Component Model MVP #1624

plafer opened this issue Jan 14, 2025 · 1 comment

Comments

@plafer
Copy link
Contributor

plafer commented Jan 14, 2025

In this issue, I'd like to propose a concrete MVP for a "Miden Component Model" as introduced by @bitwalker in #1171 and subsequent issues. Concretely speaking, this calls for a change in how execution contexts work in the Miden VM, among a few other things. The goals of this post are to start the conversation on a more concrete implementation plan of a Miden Component model, confirm that the current proposal solves the problems on the compiler side while still being correct from the rollup's perspective, and evaluate if and when we would want to move forward with the proposal (or a subset of it).

The problems with the current design of the VM that this proposal tackles are:

  1. Each procedure call creates a new memory context, creating issues when compiling from higher-level languages (specifically Rust/wasm).
    • For example, currently the compiler is forced to reinitialize any global variables on every call to procedures, most likely leading to a non-trivial performance penalty.
    • Note that currently only the "root context" can be called into multiple times (using syscall), and hence is the only context in which we can persist state for the duration of the program.
    • This design also means that there's often one correct way to call a procedure (i.e. call, exec, dyncall or dynexec), leading us to document the correct way to call each procedure, which is awkward and error-prone.
  2. Provide a simple native access-control mechanism
    • this would supersede the current caller-based mechanism used in the rollup, where only Account-based methods are allowed to call into the kernel.
    • the kernel maintains a whitelist of procedures which are allowed to call into it, and on every kernel method call, first checks whether the caller procedure is whitelisted (using the caller instruction)
  3. (Optional) provide an "call indirect" instruction that is not MAST root-based

This issue is structured as follows:

  1. First, we introduce what a "Miden component" is concretely
  2. Next, we talk about all the changes needed in the VM
  3. Next, we talk about the changes needed in miden-base
  4. Finally, we touch on the possibility of adding a "call indirect" instruction

The Miden component

A Miden component is simply

  1. a memory image that is persistent across calls,
  2. a set of exposed procedures that other components can CALL into,
  3. a statically-defined list of components which are allowed to call the exposed procedures.

Think of it as a module with its own address space. Intuitively, the "component" is a generalization of our current kernel concept, with the modifications that

  1. The kernel defines the only memory context that is re-accessible across calls. In the new model, we can have "multiple kernels" in that sense.
  2. With the current kernel, you need to use the syscall instruction to call into a procedure exposed by the kernel. In the new model, you just use call, because there is no kernel anymore (or, "every component is a kernel").
  • hence, we will need to modify the call instruction slightly so that you can specify which component you're calling into.
  1. The current VM has no concept of access control, and therefore the rollup needs to hack one together using the caller instruction. Therefore the Miden component defines an access control system to let a component (e.g. the transaction kernel) define who is allowed to call it (e.g. the Account component)
    • Note: the current proposal has the access-control list statically-defined for simplicity, but I'm not sure if that is compatible with how the rollup works.

To talk to another component, we use the call instruction on one of the callee's public procedure roots (otherwise the call results in an error). As in the current design, components can use the stack to pass procedure arguments. If 16 elements is not enough, they can use the advice provider to un-hash the data (i.e. caller passes the data hash on the stack, while the callee reads the data from advice provider, hashes it, and verifies that the hash matches the one passed on the stack).

The VM will also still need to know which component is the "kernel" - that is, the component that defines the entrypoint of the program (encoded in the public inputs). Other than this small edge case, all components in the system are identical (and the kernel is just another component). Hence, a Program becomes a set of interacting components, with a single component defining the entrypoint.

Changes to the VM

The new responsibilities required from the VM are:

  1. enforce that only public procedures exposed by components be call-able
  2. each component can define an "allow list" of components that can call is procedures
  3. a mechanism to initialize the memory image of each component exactly once

Both require the call instruction (the mechanism for cross-component calls) to be modified to take a "component id": call.component_id.procedure_root. That is, in the current system, the memory execution context is determined by the clock cycle; in the new system, it is written directly in the instruction.

(1) is easily solvable by re-using the existing kernel ROM machinery. The general idea is that a program would declare all components and their exposed procedure roots in the public inputs, in a similar way to our current Kernel declaration. Then, on every call instruction, the (modified) kernel ROM would ensure that the (component_id, procedure_root) pair is present in the public inputs (in an analogous way to how it currently checks that all syscalls are indeed made to an exposed kernel procedure).

As mentioned previously, we will no longer need the syscall and caller instructions.

(2) would be solved in a similar way. Every component defines a list of components that are allowed to call it (where maybe "empty list" means "all"). We would define a new lookup table similar in spirit to the kernel ROM (converted to a lookup table as described in #1518), but instead would look like source_id, dest_id, multiplicity. The table is initialized from the public inputs with all allowed calls (e.g. if component 0 only wants to allow calls from component 1, we would have an entry 1, 0, ? where the multiplicity is determined after running the program).

As for (3), there are a few ways to do it. I propose that we use a privileged "bootloader" that would run at the beginning of all programs and initialize all the memory contexts (thanks to @bitwalker for pointing me in this direction). Hence, this requires

2.1 Adding a "privileged mode" to the VM that would allow the bootloader to write to any component's memory (along with privileged versions of the mem_store and mem_storew instructions), and
2.2 a new INIT MastNode, for which

  • Every program is required to start with INIT
  • INIT works just like JOIN, except that the left child is always the bootloader.
  • The left child of INIT runs in privileged mode, while the right child (i.e. the user program) runs in non-privileged mode.

2.1 Privileged mode

This requires adding a bit column, which is set to 1 at the beginning of the program (i.e. when executing the left child of INIT), and turned off when transitioning to the right child of INIT (and for the rest of the program).

Note that this privileged mode can be extended in the future to be turned on in other scenarios than only at the beginning on the program. It can also be used for more than only writing to any component's memory.

We will also need two new instructions mem_store_p and mem_storew_p, which work analogously to mem_store and mem_storew, but that take an extra component_id parameter. They will be constrained to only be allowed when the privileged bit is set to 1.

2.2 the INIT node, and the bootloader

The VM will constrain the first node to be the INIT node, which as mentioned previously, works like a JOIN node: it has two children, and executes the left followed by the right. However, it also manages the privileged bit, which is turned on at the beginning of the program, and turned off after executing the left child (that we call "bootloader").

We will also want to constrain the bootloader that is executed, by having the user place the MAST root of the bootloader that it expects in public inputs, as well as a constraint that checks that the left child of INIT is indeed set to that value.

The bootloader would unhash from the advice provider the description of the memory images for each component (and compare the resulting hash against the expected hash provided in public inputs). For example, a possible format could be a serialized version of

- number of components
- component id 1
  - [address_1, word_1, ..., address_n, word_n]  # all the words 
  - [address_1, element_1, ..., address_m, element_m] # all the individual elements
- ...
- component id k
  - [address_1, word_1, ..., address_n, word_n]  # all the words 
  - [address_1, element_1, ..., address_m, element_m] # all the individual elements 

The bootloader would run through this data, use mem_storew_p to write the words to the appropriate addresses in the specified component, and similarly mem_store_p to write the elements.

Note: We would have a similar serialized data structure for the "allow lists" defined by each component.

Note: there are many bootloaders possible, and in this design, we are free to change the bootloader without any changes to the VM. This of course excludes the case where a new bootloader needs a new privileged instruction.

Changes to miden-base

I don't think many changes to miden-base are needed. For transaction execution, the transaction kernel would still the "kernel component" defining the entrypoint, and would store the account's data. The Account component would presumably always be component 1, such that the allow-list of the transaction kernel can always be "only component 1". Then, all note scripts can run in their own component (with component IDs 2+). Everything else pretty much is structured as it currently is.

As previously mentioned, I'm not sure if having a static access control list works; @bobbinth would love your input on this. Also, the access control list has no analog in the Wasm component model, so @bitwalker I would love to know if this causes issues when compiling down from the Wasm component model.

(Optional) The "call indirect" instruction

Currently, the rollup uses exec_kernel_proc so that changes to the kernel doesn't result in changes to an Account's MAST root. For example, if the kernel method was called directly using syscall.kernel_proc_hash, and the kernel method changed, then the syscall.kernel_proc_hash MAST hash would also change, and therefore the Account's procedure containing the syscall's MAST root would change.

This shows the desire to have both "content-addressable" calls (e.g. with call.proc_hash), as well as "dynamic/indirect" calls (using exec_kernel_proc). Adding a "call indirect" instruction would fit pretty nicely in the proposed component model. In MASM it would look like calli.component_id.proc_idx, where proc_idx is the index of the procedure exposed by the component. This would have the desired effect that the target component's procedure can change its MAST root without changing the MAST root of the caller.

Conclusion

Although I tried to design this so that it be the smallest possible distance from our current state, it still seems like it would be a significant amount of work to fully implement. Assuming that there aren't major flaws that completely invalidate the design, we should discuss if and when we want to start working on this.

@greenhat
Copy link
Contributor

...

Changes to the VM

The new responsibilities required from the VM are:

  1. enforce that only public procedures exposed by components be call-able
  2. each component can define an "allow list" of components that can call is procedures
  3. a mechanism to initialize the memory image of each component exactly once

Both require the call instruction (the mechanism for cross-component calls) to be modified to take a "component id": call.component_id.procedure_root. That is, in the current system, the memory execution context is determined by the clock cycle; in the new system, it is written directly in the instruction.

(1) is easily solvable by re-using the existing kernel ROM machinery. The general idea is that a program would declare all components and their exposed procedure roots in the public inputs, in a similar way to our current Kernel declaration. Then, on every call instruction, the (modified) kernel ROM would ensure that the (component_id, procedure_root) pair is present in the public inputs (in an analogous way to how it currently checks that all syscalls are indeed made to an exposed kernel procedure).

As mentioned previously, we will no longer need the syscall and caller instructions.

(2) would be solved in a similar way. Every component defines a list of components that are allowed to call it (where maybe "empty list" means "all"). We would define a new lookup table similar in spirit to the kernel ROM (converted to a lookup table as described in #1518), but instead would look like source_id, dest_id, multiplicity. The table is initialized from the public inputs with all allowed calls (e.g. if component 0 only wants to allow calls from component 1, we would have an entry 1, 0, ? where the multiplicity is determined after running the program).

...

Since the Wasm CM does not have any access control tools baked in, we need to be able to encode the "all allowed" value in the "allow list". Empty "allow list" sounds good for this.
In case we want to specify an "allow list" in the Rust project, we could do it in the package metadata in Cargo.toml and populate the "allow list".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants