refactor: finish initial rewrite of backend using hir2
This commit builds on the last one with the remaining changes to get all
of the necessary pieces of the codegen backend rebuilt on top of the
HIR2 infrastructure.

NOTE: The new backend changes are untested. Testing will follow in a
subsequent commit that cleans things up a bit more and validates that
the new codegen stage components work as expected.

There are many changes here, but in summary:

* The old IR maintained a duplicate representation of MASM for
  historical reasons. We're at a stage where that is no longer
  particularly useful, so we are now lowering directly to the AST nodes
  provided in miden-assembly.
* Original op/inst emitter code remains largely unchanged
* Original operand stack management code and optimization remains
  unchanged
* Linker is now part of the backend, and handles examining a Component
  to be compiled in order to extract data segments and global variables
  for layout in memory. This information is provided to the instruction
  lowering code.
* Conversion pass infra is replaced with a ToMasmComponent trait,
  implemented for Component (it may also be implemented for Module
  later, to support classic Wasm inputs).
* Old FunctionEmitter is removed.
* BlockEmitter remains, but acts primarily as a driver for visiting
  operations in the entry block of a Function operation, and invoking
  the instruction lowering trait. It is assumed that by the time we
  reach the backend, only structured control flow remains.
* Instruction lowering is facilitated by an HirLowering trait, which is
  implemented for all of the operations in the 'hir' dialect. These are
  given the current BlockEmitter and other important context, and use
  the emitter to handle any lowering details.
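
As a rough illustration of the last two points, the driver/trait split can
be sketched as below. This is a toy model, not the code in this commit:
`Inst`, `Constant`, `AddOp`, the `lower` method name, and the `String`
error type are all hypothetical stand-ins, and the real backend emits
miden-assembly AST nodes rather than this small enum.

```rust
// Hypothetical, simplified stand-ins; none of these are the real midenc types.

/// A toy target instruction (assumption: stands in for a MASM AST node).
#[derive(Debug, Clone, PartialEq)]
pub enum Inst {
    Push(u32),
    Add,
}

/// Collects the instructions lowered for a single block.
#[derive(Default)]
pub struct BlockEmitter {
    pub target: Vec<Inst>,
}

impl BlockEmitter {
    /// Append one lowered instruction to the current block.
    pub fn emit(&mut self, inst: Inst) {
        self.target.push(inst);
    }

    /// Act as a driver: visit each operation in order and let it lower itself.
    pub fn emit_block(&mut self, ops: &[Box<dyn HirLowering>]) -> Result<(), String> {
        for op in ops {
            op.lower(self)?;
        }
        Ok(())
    }
}

/// Implemented by every operation that knows how to lower itself.
/// (The method name `lower` is an assumption made for this sketch.)
pub trait HirLowering {
    fn lower(&self, emitter: &mut BlockEmitter) -> Result<(), String>;
}

/// Toy operation: materialize a constant.
pub struct Constant(pub u32);

/// Toy operation: add the two most recently produced values.
pub struct AddOp;

impl HirLowering for Constant {
    fn lower(&self, emitter: &mut BlockEmitter) -> Result<(), String> {
        emitter.emit(Inst::Push(self.0));
        Ok(())
    }
}

impl HirLowering for AddOp {
    fn lower(&self, emitter: &mut BlockEmitter) -> Result<(), String> {
        emitter.emit(Inst::Add);
        Ok(())
    }
}
```

In this sketch the emitter owns no per-operation logic: each operation
decides which instructions to emit, and the driver only fixes the visiting
order.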

There are a variety of other changes as well, mostly things that fell
out of this work, e.g. symbol management has been reworked in HIR2 to
better reflect the Wasm Component Model, and recent packaging efforts.

There are various TODOs noted in the code, but the main things remaining
for this to be ready for use are:

* Figure out if any of the old tests from the codegen crate are still
  useful, as well as the old MasmCompiler type, and if so, port them
  over and validate they are still good
* Revisit spills analysis/transform once we have useful tests to work
  with. These passes were hard to port cleanly to the new IR and
  dataflow analysis framework, so they need to be revalidated.
* Rework compiler stage(s) in frontend
* Lower from Wasm to HIR2
* Implement CFG-to-SCF pass and ensure it is run on initial IR to
  recover structured control flow needed for codegen.
* Remove old IR crates that are deprecated, rename HIR2 crates to remove
  references to HIR2.
bitwalker committed Dec 24, 2024
1 parent cdaa58c commit be4c100
Showing 55 changed files with 2,561 additions and 1,591 deletions.
13 changes: 0 additions & 13 deletions Cargo.lock

12 changes: 3 additions & 9 deletions codegen/masm2/src/artifact.rs
@@ -3,7 +3,7 @@ use core::fmt;

 use miden_core::utils::DisplayHex;
 use miden_processor::Digest;
-use midenc_hir2::{constants::ConstantData, dialects::builtin, FunctionIdent};
+use midenc_hir2::{constants::ConstantData, dialects::builtin};

 use crate::{lower::NativePtr, masm};

@@ -13,11 +13,11 @@ pub struct MasmComponent {
     ///
     /// This function is responsible for initializing global variables and writing data segments
     /// into memory at program startup, and at cross-context call boundaries (in callee prologue).
-    pub init: Option<FunctionIdent>,
+    pub init: Option<masm::InvocationTarget>,
     /// The symbol name of the program entrypoint, if this component is executable.
     ///
     /// If unset, it indicates that the component is a library, even if it could be made executable.
-    pub entrypoint: Option<FunctionIdent>,
+    pub entrypoint: Option<masm::InvocationTarget>,
     /// The kernel library to link against
     pub kernel: Option<masm::KernelLibrary>,
     /// The rodata segments of this component keyed by the offset of the segment
@@ -26,12 +26,6 @@ pub struct MasmComponent {
     pub stack_pointer: Option<u32>,
     /// The set of modules in this component
     pub modules: Vec<Box<masm::Module>>,
-    /// The set of components nested within this component.
-    ///
-    /// Nested components are typically only visible internally. When assembling to MAST, modules
-    /// and rodata from nested components are merged into their parent component. Nested components
-    /// must either have an unspecified kernel, or the same kernel as their ancestors.
-    pub components: Vec<Box<MasmComponent>>,
 }

 /// Represents a read-only data segment, combined with its content digest
36 changes: 11 additions & 25 deletions codegen/masm2/src/emit/mem.rs
@@ -1,27 +1,11 @@
 use miden_core::{Felt, FieldElement};
-use midenc_hir2::{self as hir, SourceSpan, StructType, Type};
+use midenc_hir2::{dialects::builtin::LocalId, SourceSpan, StructType, Type};

 use super::{masm, OpEmitter};
 use crate::lower::NativePtr;

 /// Allocation
 impl<'a> OpEmitter<'a> {
-    /// Allocate a procedure-local memory slot of sufficient size to store a value
-    /// indicated by the given pointer type, i.e the pointee type dictates the
-    /// amount of memory allocated.
-    ///
-    /// The address of that slot is placed on the operand stack.
-    pub fn alloca(&mut self, ptr: &Type, span: SourceSpan) {
-        match ptr {
-            Type::Ptr(pointee) => {
-                let local = self.function.alloc_local(pointee.as_ref().clone());
-                self.emit(masm::Instruction::LocAddr(local), span);
-                self.push(ptr.clone());
-            }
-            ty => panic!("expected a pointer type, got {ty}"),
-        }
-    }
-
     /// Return the base address of the heap
     #[allow(unused)]
     pub fn heap_base(&mut self, span: SourceSpan) {
@@ -58,9 +42,10 @@ impl<'a> OpEmitter<'a> {
     ///
     /// Internally, this pushes the address of the local on the stack, then delegates to
     /// [OpEmitter::load]
-    pub fn load_local(&mut self, local: hir::LocalId, span: SourceSpan) {
-        let ty = self.function.local(local).ty.clone();
-        self.emit(masm::Instruction::LocAddr(local), span);
+    pub fn load_local(&mut self, local: LocalId, span: SourceSpan) {
+        let local_index = local.as_usize();
+        let ty = self.locals[local_index].clone();
+        self.emit(masm::Instruction::Locaddr((local_index as u16).into()), span);
         self.push(Type::Ptr(Box::new(ty.clone())));
         self.load(ty, span)
     }
@@ -932,9 +917,10 @@ impl<'a> OpEmitter<'a> {
     ///
     /// Internally, this pushes the address of the given local on the stack, and delegates to
     /// [OpEmitter::store] to perform the actual store.
-    pub fn store_local(&mut self, local: hir::LocalId, span: SourceSpan) {
-        let ty = self.function.local(local).ty.clone();
-        self.emit(masm::Instruction::LocAddr(local), span);
+    pub fn store_local(&mut self, local: LocalId, span: SourceSpan) {
+        let local_index = local.as_usize();
+        let ty = self.locals[local_index].clone();
+        self.emit(masm::Instruction::Locaddr((local_index as u16).into()), span);
         self.push(Type::Ptr(Box::new(ty)));
         self.store(span)
     }
@@ -1008,7 +994,7 @@ impl<'a> OpEmitter<'a> {

         // Create new block for loop body and switch to it temporarily
         let mut body = Vec::default();
-        let mut body_emitter = OpEmitter::new(self.function, &mut body, self.stack);
+        let mut body_emitter = OpEmitter::new(self.locals, self.invoked, &mut body, self.stack);

         // Loop body - compute address for next value to be written
         let value_size = value.ty().size_in_bytes();
@@ -1128,7 +1114,7 @@ impl<'a> OpEmitter<'a> {

         // Create new block for loop body and switch to it temporarily
         let mut body = Vec::default();
-        let mut body_emitter = OpEmitter::new(self.function, &mut body, self.stack);
+        let mut body_emitter = OpEmitter::new(self.locals, self.invoked, &mut body, self.stack);

         // Loop body - compute address for next value to be written
         // Compute the source and destination addresses
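
The new `load_local` and `store_local` bodies in this file narrow the local
index with `local_index as u16`, which wraps silently for any index above
65,535. Whether such an index can ever reach this code is not stated in the
commit, so the following is only a sketch under that assumption. Both
helper names are hypothetical, and the sketch stops at producing the
`u16`; it does not touch the real `masm::Instruction` API.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert a local's index into a 16-bit operand,
/// reporting an error instead of truncating when it does not fit.
pub fn local_operand(local_index: usize) -> Result<u16, String> {
    u16::try_from(local_index)
        .map_err(|_| format!("local index {} does not fit in 16 bits", local_index))
}

/// The plain cast, for comparison: the value wraps modulo 2^16.
pub fn local_operand_truncating(local_index: usize) -> u16 {
    local_index as u16
}
```

The checked form makes the overflow case visible at the call site, at the
cost of threading a `Result` (or an explicit panic) through the emitter.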