Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assembler: enable vendoring of compiled libraries (fixes #1435) #1643

Draft
wants to merge 13 commits into
base: next
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#### Changes
- Update minimum supported Rust version to 1.84.
- Change Chiplet Fields to Public (#1629).
- Added to the `Assembler` the ability to vendor a compiled library.


## 0.12.0 (2025-01-22)
Expand Down
45 changes: 40 additions & 5 deletions assembly/src/assembler/mast_forest_builder.rs
Original file line number Diff line number Diff line change
@@ -1,19 +1,22 @@
use alloc::{
collections::{BTreeMap, BTreeSet},
sync::Arc,
vec::Vec,
};
use core::ops::{Index, IndexMut};

use miette::{IntoDiagnostic, Report};
use vm_core::{
crypto::hash::RpoDigest,
mast::{
DecoratorFingerprint, DecoratorId, MastForest, MastNode, MastNodeFingerprint, MastNodeId,
Remapping, RootIterator,
},
Decorator, DecoratorList, Operation,
};

use super::{GlobalProcedureIndex, Procedure};
use crate::AssemblyError;
use crate::{AssemblyError, Library};

// CONSTANTS
// ================================================================================================
Expand Down Expand Up @@ -59,6 +62,11 @@ pub struct MastForestBuilder {
/// used as a candidate set of nodes that may be eliminated if the are not referenced by any
/// other node in the forest and are not a root of any procedure.
merged_basic_block_ids: BTreeSet<MastNodeId>,
/// A MastForest that contains vendored libraries, it's used to find precompiled procedures and
/// copy their subtrees instead of inserting external nodes.
vendored_mast: Arc<MastForest>,
/// Keeps track of the new ids assigned to nodes that are copied from the vendored_mast.
vendored_remapping: Remapping,
}

impl MastForestBuilder {
Expand All @@ -68,12 +76,24 @@ impl MastForestBuilder {
/// It also returns the map from old node IDs to new node IDs; or `None` if the `MastForest` was
/// unchanged. Any [`MastNodeId`] used in reference to the old [`MastForest`] should be remapped
/// using this map.
pub fn build(mut self) -> (MastForest, Option<BTreeMap<MastNodeId, MastNodeId>>) {
pub fn build(mut self) -> (MastForest, BTreeMap<MastNodeId, MastNodeId>) {
let nodes_to_remove = get_nodes_to_remove(self.merged_basic_block_ids, &self.mast_forest);
let id_remappings = self.mast_forest.remove_nodes(&nodes_to_remove);

(self.mast_forest, id_remappings)
}

pub fn new<'a>(
vendored_libraries: impl IntoIterator<Item = &'a Library>,
) -> Result<Self, Report> {
Comment on lines +86 to +88
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add doc comments to this function.

Also, nit: I'd move it above the build() function (e.g., above line 79).

let forests = vendored_libraries.into_iter().map(|lib| lib.mast_forest().as_ref());
let (mast_forest, _remapping) =
vm_core::mast::MastForest::merge(forests).into_diagnostic()?;
Ok(MastForestBuilder {
vendored_mast: Arc::new(mast_forest),
..Self::default()
})
}
}

/// Takes the set of MAST node ids (all basic blocks) that were merged as part of the assembly
Expand Down Expand Up @@ -450,9 +470,24 @@ impl MastForestBuilder {
self.ensure_node(MastNode::new_dyncall())
}

/// Adds an external node to the forest, and returns the [`MastNodeId`] associated with it.
pub fn ensure_external(&mut self, mast_root: RpoDigest) -> Result<MastNodeId, AssemblyError> {
self.ensure_node(MastNode::new_external(mast_root))
/// If the root is present in the vendored mast, copy its subtree. Otherwise add an external
/// node to the forest, and return the [`MastNodeId`] associated with it.
pub fn vendor_or_ensure_external(
&mut self,
mast_root: RpoDigest,
) -> Result<MastNodeId, AssemblyError> {
if let Some(root_id) = self.vendored_mast.find_procedure_root(mast_root) {
for old_id in RootIterator::new(&root_id, &self.vendored_mast.clone()) {
let mut node = self.vendored_mast[old_id].clone();
node.remap(&self.vendored_remapping);
let new_id = self.ensure_node(node)?;
self.vendored_remapping.insert(old_id, new_id);
}
Comment on lines +480 to +485
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will copy the node + its decorators. But where do we copy the advice map data from the vendored libraries?

let new_root_id = root_id.remap(&self.vendored_remapping);
Ok(new_root_id)
} else {
self.ensure_node(MastNode::new_external(mast_root))
}
}

pub fn set_before_enter(&mut self, node_id: MastNodeId, decorator_ids: Vec<DecoratorId>) {
Expand Down
43 changes: 29 additions & 14 deletions assembly/src/assembler/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ pub struct Assembler {
warnings_as_errors: bool,
/// Whether the assembler enables extra debugging information.
in_debug_mode: bool,
vendored_libraries: BTreeMap<RpoDigest, Library>,
}

impl Default for Assembler {
Expand All @@ -82,6 +83,7 @@ impl Default for Assembler {
module_graph,
warnings_as_errors: false,
in_debug_mode: false,
vendored_libraries: BTreeMap::new(),
}
}
}
Expand All @@ -97,6 +99,7 @@ impl Assembler {
module_graph,
warnings_as_errors: false,
in_debug_mode: false,
vendored_libraries: BTreeMap::new(),
}
}

Expand Down Expand Up @@ -170,7 +173,7 @@ impl Assembler {
module: impl Compile,
options: CompileOptions,
) -> Result<ModuleIndex, Report> {
let ids = self.add_modules_with_options(vec![module], options)?;
let ids = self.add_modules_with_options([module], options)?;
Ok(ids[0])
}

Expand All @@ -196,7 +199,7 @@ impl Assembler {
Ok(module)
})
.collect::<Result<Vec<_>, Report>>()?;
let ids = self.module_graph.add_ast_modules(modules.into_iter())?;
let ids = self.module_graph.add_ast_modules(modules)?;
Ok(ids)
}
/// Adds all modules (defined by ".masm" files) from the specified directory to the module
Expand Down Expand Up @@ -251,6 +254,21 @@ impl Assembler {
self.add_library(library)?;
Ok(self)
}

/// Adds a compiled library that can be used to copy procedures during assembly instead of
/// introducing external nodes.
pub fn add_vendored_library(&mut self, library: impl AsRef<Library>) -> Result<(), Report> {
Comment on lines +258 to +260
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write this as:

/// Adds a compiled library procedures from which will be vendored into the assembled code.
///
/// Vendoring in this context means that when a procedure from this library is invoked from the
/// assembled code, the entire procedure MAST will be copied into the assembled code. Thus,
/// when the resulting code is executed on the VM, the vendored library does not need to be 
/// provided to the VM to resolve external calls.

We should also update the description of add_library() procedure above.

self.add_library(&library)?;
self.vendored_libraries
.insert(*library.as_ref().digest(), library.as_ref().clone());
Ok(())
}

/// See [`Self::add_vendored_library`]
pub fn with_vendored_library(mut self, library: impl AsRef<Library>) -> Result<Self, Report> {
Comment on lines +267 to +268
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write this as:

/// Adds a compiled library procedures from which will be vendored into the assembled code.
///
/// See [`Self::add_vendored_library`]

self.add_vendored_library(library)?;
Ok(self)
}
}

// ------------------------------------------------------------------------------------------------
Expand Down Expand Up @@ -298,9 +316,9 @@ impl Assembler {
modules: impl IntoIterator<Item = impl Compile>,
options: CompileOptions,
) -> Result<Library, Report> {
let ast_module_indices = self.add_modules_with_options(modules, options)?;
let mut mast_forest_builder = MastForestBuilder::new(self.vendored_libraries.values())?;

let mut mast_forest_builder = MastForestBuilder::default();
let ast_module_indices = self.add_modules_with_options(modules, options)?;

let mut exports = {
let mut exports = BTreeMap::new();
Expand All @@ -326,11 +344,9 @@ impl Assembler {
};

let (mast_forest, id_remappings) = mast_forest_builder.build();
if let Some(id_remappings) = id_remappings {
for (_proc_name, node_id) in exports.iter_mut() {
if let Some(&new_node_id) = id_remappings.get(node_id) {
*node_id = new_node_id;
}
for (_proc_name, node_id) in exports.iter_mut() {
if let Some(&new_node_id) = id_remappings.get(node_id) {
*node_id = new_node_id;
}
}

Expand Down Expand Up @@ -393,7 +409,8 @@ impl Assembler {
.ok_or(SemanticAnalysisError::MissingEntrypoint)?;

// Compile the module graph rooted at the entrypoint
let mut mast_forest_builder = MastForestBuilder::default();
let mut mast_forest_builder = MastForestBuilder::new(self.vendored_libraries.values())?;

self.compile_subgraph(entrypoint, &mut mast_forest_builder)?;
let entry_node_id = mast_forest_builder
.get_procedure(entrypoint)
Expand All @@ -402,9 +419,7 @@ impl Assembler {

// in case the node IDs changed, update the entrypoint ID to the new value
let (mast_forest, id_remappings) = mast_forest_builder.build();
let entry_node_id = id_remappings
.map(|id_remappings| id_remappings[&entry_node_id])
.unwrap_or(entry_node_id);
let entry_node_id = *id_remappings.get(&entry_node_id).unwrap_or(&entry_node_id);

Ok(Program::with_kernel(
mast_forest.into(),
Expand Down Expand Up @@ -822,7 +837,7 @@ impl Assembler {
Some(_) | None => (),
}

mast_forest_builder.ensure_external(mast_root)
mast_forest_builder.vendor_or_ensure_external(mast_root)
}
}

Expand Down
11 changes: 3 additions & 8 deletions assembly/src/assembler/module_graph/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -207,10 +207,6 @@ impl ModuleGraph {

/// Add `module` to the graph.
///
/// NOTE: This operation only adds a module to the graph, but does not perform the
/// important analysis needed for compilation, you must call [recompute] once all modules
/// are added to ensure the analysis results reflect the current version of the graph.
///
/// # Errors
///
/// This operation can fail for the following reasons:
Expand All @@ -223,9 +219,8 @@ impl ModuleGraph {
/// This function will panic if the number of modules exceeds the maximum representable
/// [ModuleIndex] value, `u16::MAX`.
pub fn add_ast_module(&mut self, module: Box<Module>) -> Result<ModuleIndex, AssemblyError> {
let res = self.add_module(PendingWrappedModule::Ast(module))?;
self.recompute()?;
Ok(res)
let ids = self.add_ast_modules([module])?;
Ok(ids[0])
}

fn add_module(&mut self, module: PendingWrappedModule) -> Result<ModuleIndex, AssemblyError> {
Expand All @@ -242,7 +237,7 @@ impl ModuleGraph {

pub fn add_ast_modules(
&mut self,
modules: impl Iterator<Item = Box<Module>>,
modules: impl IntoIterator<Item = Box<Module>>,
) -> Result<Vec<ModuleIndex>, AssemblyError> {
let idx = modules
.into_iter()
Expand Down
28 changes: 28 additions & 0 deletions assembly/src/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3029,3 +3029,31 @@ fn test_program_serde_with_decorators() {

assert_eq!(original_program, deserialized_program);
}

#[test]
fn vendoring() -> TestResult {
let context = TestContext::new();
let mut mod_parser = ModuleParser::new(ModuleKind::Library);
let vendor_lib = {
let source = source_file!(&context, "export.bar push.1 end export.prune push.2 end");
let mod1 = mod_parser.parse(LibraryPath::new("test::mod1").unwrap(), source).unwrap();
Assembler::default().assemble_library([mod1]).unwrap()
};

let lib = {
let source = source_file!(&context, "export.foo exec.::test::mod1::bar end");
let mod2 = mod_parser.parse(LibraryPath::new("test::mod2").unwrap(), source).unwrap();

let mut assembler = Assembler::default();
assembler.add_vendored_library(vendor_lib)?;
assembler.assemble_library([mod2]).unwrap()
};

let expected_lib = {
let source = source_file!(&context, "export.foo push.1 end");
let mod2 = mod_parser.parse(LibraryPath::new("test::mod2").unwrap(), source).unwrap();
Assembler::default().assemble_library([mod2]).unwrap()
};
assert!(lib == expected_lib);
Ok(())
}
40 changes: 37 additions & 3 deletions core/src/mast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -185,9 +185,9 @@ impl MastForest {
pub fn remove_nodes(
&mut self,
nodes_to_remove: &BTreeSet<MastNodeId>,
) -> Option<BTreeMap<MastNodeId, MastNodeId>> {
) -> BTreeMap<MastNodeId, MastNodeId> {
Comment on lines 185 to +188
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed for this PR?

if nodes_to_remove.is_empty() {
return None;
return [].into();
}

let old_nodes = mem::take(&mut self.nodes);
Expand All @@ -196,7 +196,7 @@ impl MastForest {

self.remap_and_add_nodes(retained_nodes, &id_remappings);
self.remap_and_add_roots(old_root_ids, &id_remappings);
Some(id_remappings)
id_remappings
}

pub fn set_before_enter(&mut self, node_id: MastNodeId, decorator_ids: Vec<DecoratorId>) {
Expand Down Expand Up @@ -277,6 +277,35 @@ impl MastForest {
}
}

/// Iterates over all the nodes a root depends on, in pre-order.
/// The iteration can include other roots in the same forest.
pub struct RootIterator<'a> {
forest: &'a MastForest,
discovered: Vec<MastNodeId>,
unvisited: Vec<MastNodeId>,
}
impl<'a> RootIterator<'a> {
pub fn new(root: &MastNodeId, forest: &'a MastForest) -> Self {
let discovered = vec![];
let unvisited = vec![*root];
RootIterator { forest, discovered, unvisited }
}
}
impl Iterator for RootIterator<'_> {
type Item = MastNodeId;
fn next(&mut self) -> Option<MastNodeId> {
while let Some(id) = self.unvisited.pop() {
let mut children = self.forest[id].children();
if children.is_empty() {
return Some(id);
};
self.discovered.push(id);
self.unvisited.append(&mut children);
Comment on lines +298 to +303
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the previous comment, we could re-write this as:

let node = self.forest[id];
if !node.has_children() {
    return Some(id);
} else {
    self.discovered.push(id);
    node.append_children_to(&mut self.unvisited);
}

}
self.discovered.pop()
}
}
Comment on lines +280 to +307
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it feels a bit weird to have this defined between MastForest impl blocks. I would probably move down to line 657 and add a section separator there.

Also, RootIterator feels like we are iterating over roots. Maybe it RootSubtreeIterator or something similar would be more clear?


/// Helpers
impl MastForest {
/// Adds all provided nodes to the internal set of nodes, remapping all [`MastNodeId`]
Expand Down Expand Up @@ -528,6 +557,7 @@ impl IndexMut<DecoratorId> for MastForest {
/// the underlying [`MastNode`].
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct MastNodeId(u32);
pub type Remapping = BTreeMap<MastNodeId, MastNodeId>;

impl MastNodeId {
/// Returns a new `MastNodeId` with the provided inner value, or an error if the provided
Expand Down Expand Up @@ -595,6 +625,10 @@ impl MastNodeId {
pub fn as_u32(&self) -> u32 {
self.0
}

pub fn remap(self, remapping: &Remapping) -> Self {
*remapping.get(&self).unwrap_or(&self)
}
Comment on lines +629 to +631
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add doc comments to this function.

}

impl From<MastNodeId> for usize {
Expand Down
6 changes: 5 additions & 1 deletion core/src/mast/node/call_node.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ use miden_formatting::{

use crate::{
chiplets::hasher,
mast::{DecoratorId, MastForest, MastForestError, MastNodeId},
mast::{DecoratorId, MastForest, MastForestError, MastNodeId, Remapping},
OPCODE_CALL, OPCODE_SYSCALL,
};

Expand Down Expand Up @@ -169,6 +169,10 @@ impl CallNode {

/// Mutators
impl CallNode {
pub fn remap(&mut self, remapping: &Remapping) {
self.callee = self.callee.remap(remapping)
}

/// Sets the list of decorators to be executed before this node.
pub fn set_before_enter(&mut self, decorator_ids: Vec<DecoratorId>) {
self.before_enter = decorator_ids;
Expand Down
Loading
Loading