
wip: hir refactoring #332

Merged 65 commits into next on Jan 17, 2025

Conversation

bitwalker
Contributor

TBD

@bitwalker added the frontend and blocker labels on Sep 23, 2024
@bitwalker requested a review from greenhat on Sep 23, 2024
@bitwalker self-assigned this on Sep 23, 2024

Values have _uses_ corresponding to operands or successor arguments (special operands which are used
to satisfy successor block parameters). As a result, values also have _users_, corresponding to the
specific operation and operand forming a _use.
Contributor

Suggested change
specific operation and operand forming a _use.
specific operation and operand forming a _use_.

}
}

pub fn match_and_rewrite<A, F, S>(
Contributor

This looks astonishing! Looking forward to a RewritePattern impl example.

Contributor Author

I've gotten virtually all of the rewrite infra built now, but the last kink I'm working out is how to approach the way the Listener classes are used in MLIR. In effect, the way they are used in MLIR relies on mutable aliases, which is obviously a no-go in Rust, and because of that (and, I suspect, somewhat as an artifact of the design they chose), the class hierarchy and interactions between builders/rewriters/listeners are very confusing.

I've untangled the nest a bit in terms of how it actually works in MLIR, but it's a non-starter in Rust, so a different design is required, ideally one that achieves similar goals. The key thing listeners enable in MLIR is the ability for, say, the pattern rewrite driver to be notified whenever a rewrite modifies the IR in some way, and as a result, to modify its own state to take those modifications into account. There are other things as well, such as listeners which perform expensive debugging checks without those checks needing to be implemented by each builder/rewriter, but the main thing I care about is the rewrite driver.

There are a couple of approaches that I think should work, but I need to test them out first. In any case, that's the main reason things have been a bit slow in the last week.
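
To make that concrete, here is a minimal sketch of one Rust-friendly direction that avoids the mutable aliasing: the rewriter holds an exclusive borrow of a listener trait object and forwards notifications to it. All names and types here are hypothetical stand-ins, not the API from this PR.

// Stand-in for a real IR handle type.
type OperationId = usize;

trait Listener {
    // Called after an operation is inserted, so e.g. the pattern rewrite
    // driver can add it to its worklist.
    fn notify_operation_inserted(&mut self, op: OperationId);
    // Called before an operation is erased, so stale worklist entries can
    // be dropped.
    fn notify_operation_erased(&mut self, op: OperationId);
}

struct SketchRewriter<'a> {
    listener: Option<&'a mut dyn Listener>,
}

impl<'a> SketchRewriter<'a> {
    fn insert_op(&mut self, op: OperationId) {
        // ...perform the actual IR mutation here, then notify:
        if let Some(listener) = self.listener.as_deref_mut() {
            listener.notify_operation_inserted(op);
        }
    }
}
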

}
}

derive! {
Contributor

The verification subsystem is fascinating! Great job! There is a lot of macro and type-level magic going on here. I must admit that after reading all the extensive documentation, I still don't fully understand how it works. I'm fine with this level of complexity, but looking at the derive! macro usage, I'm not sure the macro's benefits outweigh the complexity. I mean, this derive! call expands to the following code:

#[doc = r" Op's regions have no arguments"]
pub trait NoRegionArguments {}
impl<T: crate::Op + NoRegionArguments> crate::Verify<dyn NoRegionArguments> for T {
    #[inline]
    fn verify(&self, context: &crate::Context) -> Result<(), crate::Report> {
        <crate::Operation as crate::Verify<dyn NoRegionArguments>>::verify(
            self.as_operation(),
            context,
        )
    }
}
impl crate::Verify<dyn NoRegionArguments> for crate::Operation {
    fn should_verify(&self, _context: &crate::Context) -> bool {
        self.implements::<dyn NoRegionArguments>()
    }

    fn verify(&self, context: &crate::Context) -> Result<(), crate::Report> {
        #[inline]
        fn no_region_arguments(op: &Operation, context: &Context) -> Result<(), Report> {
            for region in op.regions().iter() {
                if region.entry().has_arguments() {
                    return Err(context
                        .session
                        .diagnostics
                        .diagnostic(Severity::Error)
                        .with_message("invalid operation")
                        .with_primary_label(
                            op.span(),
                            "this operation does not permit regions with arguments, but one was \
                             found",
                        )
                        .into_report());
                }
            }
            Ok(())
        }
        no_region_arguments(self, context)?;
        Ok(())
    }
}

This is not that much larger than the macro call itself. If part of it can be generated by an attribute macro, it's not that hard to write the rest of it by hand (Verify::verify only?). I mean, the benefits of the derive! macro might not outweigh the drawbacks (remembering the syntax, writing code inside a macro call, etc.).

Contributor Author

Yeah, I mostly didn't want any of the type-level magic (and boilerplate) to be hand-written while things were evolving - modifying the macro and updating all of the traits at once was far easier, and that is really the only reason the derive! macro still exists. However, as you've pointed out, it would be best to define an attribute macro for this as well, just like #[operation], but I've punted on that since the derive! macro gets the job done for now, and not many new op traits will be getting defined in the near term. It's certainly on the TODO list, though.

hir2/src/lib.rs Outdated
Comment on lines 1 to 22
#![feature(allocator_api)]
#![feature(alloc_layout_extra)]
#![feature(coerce_unsized)]
#![feature(unsize)]
#![feature(ptr_metadata)]
#![feature(layout_for_ptr)]
#![feature(slice_ptr_get)]
#![feature(specialization)]
#![feature(rustc_attrs)]
#![feature(debug_closure_helpers)]
#![feature(trait_alias)]
#![feature(is_none_or)]
#![feature(try_trait_v2)]
#![feature(try_trait_v2_residual)]
#![feature(tuple_trait)]
#![feature(fn_traits)]
#![feature(unboxed_closures)]
#![feature(const_type_id)]
#![feature(exact_size_is_empty)]
Contributor

There goes my hope to someday ditch the nightly. :)

Contributor Author

Yeah, I've really blown a hole in that plan 😂. Once you need some of the nigh-permanently-unstable features like ptr_metadata, fn_traits, or specialization/min_specialization, it's basically game over for using stable, unless you have alternative options that allow you to sidestep the use of those features.

That said, a fair number of these are pretty close to stabilization. The ones that are problematic are:

  • specialization and rustc_attrs are needed because min_specialization isn't quite sufficient for the type-level magic I'm doing to automatically derive verifiers for operations. If improvements are made to min_specialization, we can at least switch over to that. Longer term, though, we'd have to come up with an alternative approach to verification if we wanted to switch to stable.
  • ptr_metadata has an unclear stabilization path, and while it is basically table stakes for implementing custom smart pointer types, stabilizing it will require Rust to commit to some implementation details they are still unsure they want to commit to.
  • allocator_api and alloc_layout_extra should've been stabilized ages ago IMO, but they are still being fiddled with. However, we aren't really leaning on these very much, and may actually be able to remove them.
  • coerce_unsized and unsize are unlikely to be stabilized any time soon, as they've been a source of unsoundness in the language (not due to the features, but due to the traits themselves, which in stable are automatically derived). Unfortunately, they are also table stakes for implementing smart pointer types with ergonomics like the ones in libstd/liballoc, so it's hard to avoid them.
  • try_trait_v2 and try_trait_v2_residual are purely ergonomic, so we could remove them in exchange for more verbose code in a couple of places; not a big deal.
  • fn_traits, unboxed_closures, and tuple_trait are all intertwined with the ability to implement Fn/FnMut/FnOnce by hand, which I'm currently using to make the ergonomics of op builders as pleasant as possible (a minimal illustration follows below). We might be able to tackle ergonomics in another way, though, and as a result no longer require these three features.

So, long story short, most of these provide some really significant ergonomic benefits - but if we really wanted to ditch nightly for some reason, we could sacrifice some of those benefits and likely escape the need for most of these features. Verification is the main one that is not so easily abandoned, as the alternative is a lot of verbose and fragile boilerplate for verifying each individual operation - that's not to say it can't be done, just that I'm not sure the tradeoff is worth it.
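
For reference, this is roughly what implementing the Fn traits by hand looks like, and why fn_traits/unboxed_closures appear in the list above. This is a generic nightly-Rust illustration, not the actual op-builder code from this PR:

#![feature(fn_traits, unboxed_closures)]

// A builder type that can be invoked with ordinary call syntax, like the
// `constant_builder(...)` call in the rewrite example later in this PR.
struct ConstantBuilder {
    span: u32, // stand-in for a real source span
}

impl FnOnce<(u64,)> for ConstantBuilder {
    type Output = Result<u64, String>;

    extern "rust-call" fn call_once(self, (value,): (u64,)) -> Self::Output {
        // A real builder would construct and insert an op here; we just
        // echo the value back to keep the sketch self-contained.
        let _ = self.span;
        Ok(value)
    }
}

fn main() {
    let builder = ConstantBuilder { span: 0 };
    assert_eq!(builder(2), Ok(2));
}
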

Contributor

I can live with it. The verification is totally worth it!

/// else_region: RegionRef,
/// }
#[proc_macro_attribute]
pub fn operation(
Contributor

I really dig the operation proc macro attribute. I expanded it in a few ops. So much pretty complex boilerplate is generated. Awesome!

Contributor Author

I don't know if you saw the derive! macro before I implemented the #[operation] attribute macro, but...it was madness 😅.

Now it is much cleaner (and more maintainable!), but it does a lot under the covers, so the main tradeoff is a lack of visibility into all of that. That said, I tried to keep the macro focused on generating boilerplate for things that feel pretty intuitive, e.g. deriving traits, op construction and verification, and the correspondence between fields of the struct definition and the methods that get generated. So hopefully we never need to actually look at the generated code except in rare circumstances.

}

#[inline(always)]
unsafe fn downcast_ref_unchecked<T: Op>(&self, ptr: *const ()) -> &T {
Contributor

I'm a fan of op traits implementation! TIL about DynMetadata. So cool!

Contributor Author

I'm definitely happy with how that has turned out so far, it enables a number of really neat tricks! You might also be interested in a bit of trivia about pointer metadata in Rust:

  • DynMetadata<dyn Trait> is the type of metadata associated with a trait object for the trait Trait, and is basically just a newtype around a pointer to the vtable of the trait object. While you could write DynMetadata<T> for some T that isn't a trait object, that doesn't actually mean anything, and is technically invalid AFAIK.
  • usize is the type of metadata associated with "unsized" types (except trait objects), and corresponds to a count of units for the pointee type. For example, &[T] is an unsized reference whose pointer metadata is the number of T in the slice, i.e. a typical "fat" pointer; but the semantics of the metadata value depend on the type, so you can have custom types that use usize metadata to indicate the allocation size of the pointee, or something else along those lines.
  • () is the type of metadata associated with all "sized" types, i.e. pointers to those types don't actually have any metadata.
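
A small nightly-only demonstration of all three cases (a generic sketch using core::ptr::metadata, not code from this PR):

#![feature(ptr_metadata)]

use core::ptr::{metadata, DynMetadata};

trait Op {
    fn name(&self) -> &'static str;
}

struct Add;
impl Op for Add {
    fn name(&self) -> &'static str {
        "add"
    }
}

fn main() {
    let add = Add;

    // Trait object: the metadata is a (newtyped) pointer to the vtable.
    let obj: &dyn Op = &add;
    let vtable: DynMetadata<dyn Op> = metadata(obj as *const dyn Op);
    println!("vtable metadata: {vtable:?}");

    // Slice: the metadata is the element count, making a "fat" pointer.
    let slice: &[u8] = &[1, 2, 3];
    assert_eq!(metadata(slice as *const [u8]), 3);

    // Sized type: the metadata is (), i.e. a plain "thin" pointer.
    let () = metadata(&add as *const Add);
}
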

Contributor

Thank you! I'm definitely going to dig into it.

@greenhat left a comment

It looks super cool! I'm looking forward to using it!

@bitwalker commented Oct 16, 2024

Thanks for reviewing things, and for the kind comments! As mentioned, things are quite close to finished now; the last thing that needs some fiddling is finding a Rust-friendly design for handling the interaction between patterns, builders/rewriters, and the rewrite driver (i.e. so that the rewrite driver is made aware of changes to the IR made by a rewrite pattern via the builder/rewriter).

Once that's out of the way, there is a short list of tasks to switch over to this new IR:

  • Port dominator tree and loop tree analyses on top of the new IR
  • Port data flow analysis framework on top of the new IR
  • Port liveness analysis and spills transformation using the data flow analysis framework
  • Wire up lowering from the new IR to MASM
  • Wire up lowering from Wasm to the new IR

Comment on lines +275 to +325
fn matches(&self, op: OperationRef) -> Result<bool, Report> {
    use crate::matchers::{self, match_chain, match_op, MatchWith, Matcher};

    let binder = MatchWith(|op: &UnsafeIntrusiveEntityRef<Shl>| {
        log::trace!(
            "found matching 'hir.shl' operation, checking if `shift` operand is foldable"
        );
        let op = op.borrow();
        let shift = op.shift().as_operand_ref();
        let matched = matchers::foldable_operand_of::<Immediate>().matches(&shift);
        matched.and_then(|imm| {
            log::trace!("`shift` operand is an immediate: {imm}");
            let imm = imm.as_u64();
            if imm.is_none() {
                log::trace!("`shift` operand is not a valid u64 value");
            }
            if imm.is_some_and(|imm| imm == 1) {
                Some(())
            } else {
                None
            }
        })
    });
    log::trace!("attempting to match '{}'", self.name());
    let matched = match_chain(match_op::<Shl>(), binder).matches(&op.borrow()).is_some();
    log::trace!("'{}' matched: {matched}", self.name());
    Ok(matched)
}

fn rewrite(&self, op: OperationRef, rewriter: &mut dyn Rewriter) {
    log::trace!("found match, rewriting '{}'", op.borrow().name());
    let (span, lhs) = {
        let shl = op.borrow();
        let shl = shl.downcast_ref::<Shl>().unwrap();
        let span = shl.span();
        let lhs = shl.lhs().as_value_ref();
        (span, lhs)
    };
    let constant_builder = rewriter.create::<Constant, _>(span);
    let constant: UnsafeIntrusiveEntityRef<Constant> =
        constant_builder(Immediate::U32(2)).unwrap();
    let shift = constant.borrow().result().as_value_ref();
    let mul_builder = rewriter.create::<Mul, _>(span);
    let mul = mul_builder(lhs, shift, Overflow::Wrapping).unwrap();
    let mul = mul.borrow().as_operation().as_operation_ref();
    log::trace!("replacing shl with mul");
    rewriter.replace_op(op, mul);
}
@greenhat commented Oct 30, 2024

The ergonomics look great! The code gives strong MLIR vibes! It'd take some time to get used to the types, techniques, etc., but it's worth it. I'm looking forward to starting to write some rewrites.

@greenhat left a comment

Looking great!

The component and module interfaces and instances feel generic and flexible enough to represent the Wasm CM.
Let's consider the following WIT:

package miden:basic-wallet@1.0.0;

use miden:base/core-types@1.0.0;

interface basic-wallet {
    use core-types.{core-asset, tag, recipient, note-type};
    receive-asset: func(core-asset: core-asset);
    send-asset: func(core-asset: core-asset, tag: tag, note-type: note-type, recipient: recipient);
}

interface aux {
    use core-types.{felt};
    process-list-felt: func(input: list<felt>) -> list<felt>;
}

interface foo {
    use core-types.{felt};
    get-foo: func(a: list<felt>) -> list<felt>;
}

interface bar {
    use core-types.{felt};
    get-bar: func(a: list<felt>) -> list<felt>;
}


world basic-wallet-world {
    include miden:core-import/all@1.0.0;
    
    import foo;
    import bar;

    export basic-wallet;
    export aux;
}

The component (high-level) imports (foo and bar) would be represented as module interfaces (one per imported component interface?), and we would generate the lowerings by creating a module that imports this module interface and exports the generated lowering functions with low-level type signatures, which in turn become imports for the core Wasm module.
The component (high-level) exports (basic-wallet and aux) would be represented as module interfaces (one per exported component interface?), and we would generate the liftings by creating a module that imports the core Wasm module exports and exports the generated lifting functions with high-level type signatures.
We're also going to need to give those generated lifting/lowering modules access to the core Wasm module's memory.

region: RegionRef,
patterns: Rc<FrozenRewritePatternSet>,
mut config: GreedyRewriteConfig,
) -> Result<bool, bool> {
Contributor

It took me some time to grasp the "error" bool part of the Result<bool, bool> pattern. I wonder if this extra mental overhead is worth it. Plus, it needs a comment explaining its meaning, versus a self-explaining error type.

Contributor Author

A dedicated type might be worth it; basically, we want to signal whether or not convergence occurred, and in either case, we also want to know whether the IR was changed.
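
For example, something along these lines (the type and field names are hypothetical, not from this PR):

/// Hypothetical replacement for Result<bool, bool>: both signals become
/// explicit fields instead of being encoded in Ok vs. Err.
pub struct RewriteOutcome {
    /// Did the driver reach a fixpoint before hitting its iteration limit?
    pub converged: bool,
    /// Was the IR modified at all while the driver ran?
    pub changed: bool,
}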

This framework is ported from MLIR, but redesigned somewhat in a more
Rust-friendly fashion. There are still some pending fixes/improvements
here to enable some interesting use cases this implementation doesn't
support quite yet, but that can wait for now. The high notes:

* Load multiple analyses into the DataFlowSolver, and run it on an
  operation until the set of analyses reach a fixpoint. Analyses, in a
  sense, run simultaneously, though that is not literally the case.
  Instead, a change subscription/propagation mechanism is implemented so
  that an analysis can query for some interesting state it wishes to
  know (but does not itself compute), and implicitly this subscribes it
  to future changes to that state, at the queried program point. When
  that state changes, the analysis in question is re-run with the new
  state at that program point, and is able to derive new facts and
  information, which might then trigger further re-analysis by other
  analyses. This is guaranteed to reach fixpoint unless someone writes
  an incorrect join/meet implementation (i.e. one that does not actually
  follow the requirements of a join/meet semilattice; a minimal join
  sketch follows this list).
* Analyses are run once on the entire operation the solver is applied
  to; however, they will be re-run (as needed) whenever some new facts
  are discovered about a program point they have derived facts about
  from the changed state, but _only on that program point_. This ensures
  minimal re-computation of analysis results, while still maximizing the
  benefit of the combined analyses (e.g. dead code elimination +
  constant propagation == sparse conditional constant propagation)
* Analyses have a fair amount of ways in which they can hook into the
  change management system, as well as customize the analysis itself.
* There are two primary analysis types for which most of the heavy
  lifting has been done here: forward and backward analyses of both
  dense and sparse varieties. Dense analyses anchor state to program
  points, while sparse analyses anchor state to values (which do not
  change after initial definition, hence sparse). Forward analyses
  follow the CFG, while backward analyses work bottom-up.
* Dead code analysis is implemented, and some of the infra for constant
  propagation is implemented, which I will revisit once the new IR is
  hooked up to codegen.
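
As a concrete illustration of the join/meet requirement mentioned in the first bullet above, here is a minimal join for a constant-propagation-style semilattice (illustrative names, not code from this PR):

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ConstLattice {
    /// Nothing known yet (bottom).
    Unknown,
    /// Known to be exactly this constant.
    Constant(i64),
    /// Known to be non-constant (top).
    Overdefined,
}

impl ConstLattice {
    /// A join must be commutative, associative, and idempotent, and must
    /// only move *up* the lattice; otherwise the solver may never converge.
    fn join(self, other: Self) -> Self {
        use ConstLattice::*;
        match (self, other) {
            (Unknown, x) | (x, Unknown) => x,
            (Constant(a), Constant(b)) if a == b => Constant(a),
            _ => Overdefined,
        }
    }
}
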
This commit ports over the liveness analysis from HIR1, and makes
various improvements/extensions to the dataflow framework. The primary
changes to the dataflow framework are the following:

* Give the solver a reference to an AnalysisManager so that analyses can
  be requested by a DataFlowAnalysis implementation during
  initialization. We are primarily interested in this to allow obtaining
  cached analyses from it, as the alternative is computing those
  analyses from scratch when needed, e.g. the dominator tree or loop forest.
* Use the dominator tree and loop forest analyses, when applicable, to
  choose a better visitation order when initializing dense dataflow
  analyses. For forward analyses, we follow the CFG in reverse postorder,
  and skip any unreachable blocks entirely. For backward analyses, we
  visit the CFG in postorder, also skipping unreachable blocks. If an op
  has no regions, or the regions do not use SSA dominance, or consist
  only of a single block, the naive visitation order is used, i.e. just
  iterating over the set of regions/blocks/ops regardless of the CFG
  edges between them. (A minimal reverse-postorder sketch follows this
  list.)
* An additional hook was added to the dense forward/backward dataflow
  analysis traits that allows an analysis to control how propagation is
  handled along unstructured/non-regional control flow edges. This was
  needed by liveness analysis to be able to handle subtleties in the
  transfer function along such edges.
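
For reference, the reverse-postorder visitation described above amounts to something like the following, over a toy adjacency-list CFG (generic sketch, not the PR's IR types):

fn reverse_postorder(succs: &[Vec<usize>], entry: usize) -> Vec<usize> {
    fn dfs(block: usize, succs: &[Vec<usize>], visited: &mut [bool], out: &mut Vec<usize>) {
        visited[block] = true;
        for &succ in &succs[block] {
            if !visited[succ] {
                dfs(succ, succs, visited, out);
            }
        }
        // A block is pushed only after all of its successors, yielding postorder.
        out.push(block);
    }

    let mut visited = vec![false; succs.len()];
    let mut postorder = Vec::new();
    dfs(entry, succs, &mut visited, &mut postorder);
    // Unreachable blocks are never visited, so they are skipped entirely,
    // matching the behavior described above.
    postorder.reverse();
    postorder
}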

Liveness analysis has been adapted to run as a dense backward
dataflow analysis via the dataflow framework, and to support
region-based (structured) control flow in addition to unstructured
control flow. With some additional tweaks, inter-procedural
analysis could be supported as well, which would primarily be useful
for optimizing calls to functions where the caller is the sole caller of
the function and not all of the arguments are being used (and thus
could be removed from both the callsite and the callee definition).
Combined with sparse-conditional constant propagation, it now also takes
into account whether or not control flow edges are executable.
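
The backward transfer function at the heart of such an analysis is the classic live_in = uses ∪ (live_out \ defs); a minimal standalone version, with illustrative types rather than the PR's actual IR handles:

use std::collections::HashSet;

/// Stand-in for the IR's actual value handle type.
type ValueId = u32;

/// Classic backward liveness transfer function for a block:
/// live_in = uses ∪ (live_out \ defs).
fn live_in(
    uses: &HashSet<ValueId>,
    defs: &HashSet<ValueId>,
    live_out: &HashSet<ValueId>,
) -> HashSet<ValueId> {
    // Values live out of the block stay live unless the block defines them...
    let mut live: HashSet<ValueId> = live_out.difference(defs).copied().collect();
    // ...and anything the block uses becomes live on entry.
    live.extend(uses.iter().copied());
    live
}
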
This commit builds on the last one with the remaining changes to get all
of the necessary pieces of the codegen backend rebuilt on top of the
HIR2 infrastructure.

NOTE: The new backend changes are untested, that will follow in a
subsequent commit that cleans things up a bit more, and validates that
the new codegen stage components work as expected.

There are many changes here, but in summary:

* The old IR maintained a duplicate representation of MASM for
  historical reasons. We're at a stage where that is no longer
  particularly useful, so we are now lowering directly to the AST nodes
  provided in miden-assembly.
* Original op/inst emitter code remains largely unchanged
* Original operand stack management code and optimization remains
  unchanged
* Linker is now part of the backend, and handles examining a Component
  to be compiled in order to extract data segments and global variables
  for layout in memory. This information is provided to the instruction
  lowering code.
* Conversion pass infra is replaced with a ToMasmComponent trait,
  implemented for Component (we may implement it for Module as well, to
  support classic Wasm inputs).
* Old FunctionEmitter is removed.
* BlockEmitter remains, but acts primarily as a driver for visiting
  operations in the entry block of a Function operation, and invoking
  the instruction lowering trait. It is assumed that by the time we
  reach the backend, only structured control flow remains.
* Instruction lowering is facilitated by an HirLowering trait, which is
  implemented for all of the operations in the 'hir' dialect. These are
  given the current BlockEmitter and other important context, and use
  the emitter to handle any lowering details.

There are a variety of other changes as well, mostly things that fell
out of this work, e.g. symbol management has been reworked in HIR2 to
better reflect the Wasm Component Model, and recent packaging efforts.

There are various TODOs noted in the code, but the main things remaining
for this to be ready for use are:

* Figure out if any of the old tests from the codegen crate are still
  useful, as well as the old MasmCompiler type, and if so, port them
  over and validate they are still good
* Revisit spills analysis/transform once we have useful tests to work
  with. These passes were hard to port cleanly to the new IR and
  dataflow analysis framework, so need to be revalidated.
* Rework compiler stage(s) in frontend
* Lower from Wasm to HIR2
* Implement CFG-to-SCF pass and ensure it is run on initial IR to
  recover structured control flow needed for codegen.
* Remove old IR crates that are deprecated, rename HIR2 crates to remove
  references to HIR2.
@bitwalker marked this pull request as ready for review on January 13, 2025
@bitwalker merged commit c7ee0fd into next on Jan 17, 2025
5 checks passed
@bitwalker deleted the bitwalker/hir2 branch on January 17, 2025