HUGR Model and binary serialisation format. #1433

Open
7 of 14 tasks
ss2165 opened this issue Aug 15, 2024 · 2 comments
Labels
breaking-change Changes that break semver python Pull requests that update Python code rust Pull requests that update Rust code spec Issues to do with the specification document(s) tracking-issue An issue tracking progress on multiple sub-tasks.

Comments

@ss2165
Member

ss2165 commented Aug 15, 2024

Goals:

  1. Stop serialising core HUGR structures directly and go via a new Model structure. See `hugr-model` #1326.
  2. Switch to a binary serialisation format with efficient table-encoding.
    Unblocks:
  3. Text format in/out from model.
  4. Stable-ish serialisation.
  5. Resolve conceptual issues with "value". See Values and constants #1425.

Tasks:

@ss2165 ss2165 added the tracking-issue An issue tracking progress on multiple sub-tasks. label Aug 15, 2024
@ss2165 ss2165 added this to the hugr-rs 0.13.0 / hugr-py 0.9.0 milestone Sep 2, 2024
@ss2165 ss2165 added spec Issues to do with the specification document(s) rust Pull requests that update Rust code breaking-change Changes that break semver python Pull requests that update Python code labels Sep 5, 2024
@ss2165 ss2165 pinned this issue Sep 5, 2024
@zrho
Contributor

zrho commented Oct 4, 2024

Ref Types

The model has three Ref types: GlobalRef, LocalRef, and LinkRef. These types are pointers to other objects in the model and come in two variants: each type of Ref has a named variant and a more compact variant that directly refers to the thing being referenced by its id.
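The two-variant pattern described above could look roughly like the following in Rust. This is only a sketch: the names `NodeId`, `LinkId`, `Named`, `Direct` and `Id`, and the helper `is_resolved`, are assumptions made for illustration and are not necessarily what `hugr-model` uses.

```rust
/// Hypothetical id of a node in the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Hypothetical id of a link in the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LinkId(pub u32);

/// Reference to a global definition: by name, or compactly by node id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum GlobalRef {
    Named(String),
    Direct(NodeId),
}

/// Reference to a link: by name, or compactly by link id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LinkRef {
    Named(String),
    Id(LinkId),
}

impl GlobalRef {
    /// True when the reference already points at a node id
    /// and needs no name resolution.
    pub fn is_resolved(&self) -> bool {
        matches!(self, GlobalRef::Direct(_))
    }
}
```

A text-format parser would typically produce only the `Named` variants, while a compact binary encoding would favour the id variants.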

GlobalRef

GlobalRefs point to global definitions, that is, things defined by a node: functions, aliases, custom operations and custom types. Wherever a GlobalRef is used, values can be passed to the parameters of the global definition.

Functions and aliases can already be defined or declared by nodes with the corresponding operations, so we can refer to them either by name or by the id of that node. We don't have these operations for custom types and custom operations yet, so those must be referred to by name. As part of the declarative extensions project, we can add the missing operations to declare custom types and custom operations, and then refer to them simply by the id of the corresponding node. Moreover, we can attach metadata to these nodes, including a name and description, to generate documentation or code where desired.

  • Declarative extensions: declare custom operations in the model.
  • Declarative extensions: declare custom types in the model.

LocalRef

LocalRefs point to variables that are bound by a node's parameters. For instance, when a node defines a function that is polymorphic in some type, this type can be referenced by a LocalRef. Currently, the indexed variant of LocalRef stores only the index of the parameter in the node's parameter list. I've come to see that this is not ideal: term deduplication would identify two variables as the same even though they reference parameters of different nodes, potentially even of different types. This can easily be fixed by adding the id of the node to the LocalRef.

  • Add the id of the node to the LocalRef.
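The deduplication problem can be illustrated with a small sketch. The types `IndexOnly` and `NodeAndIndex` and the helper `distinct` are hypothetical, and a `HashSet` stands in for whatever term deduplication the model actually performs.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical id of a node in the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Indexed local reference that stores only the parameter position.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct IndexOnly(pub u16);

/// Indexed local reference that also stores the binding node.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeAndIndex(pub NodeId, pub u16);

/// Number of distinct terms left after structural deduplication.
pub fn distinct<T: Hash + Eq>(terms: impl IntoIterator<Item = T>) -> usize {
    terms.into_iter().collect::<HashSet<T>>().len()
}
```

With `IndexOnly`, the first parameter of node 1 and the first parameter of node 2 are both `IndexOnly(0)` and collapse into one term. With `NodeAndIndex`, they remain distinct as `NodeAndIndex(NodeId(1), 0)` and `NodeAndIndex(NodeId(2), 0)`.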

LinkRef

"Links" are what I called "edges" until recently. "Edge" was not quite the appropriate name, since a link can include any number of ports. Every port has an associated link, and ports are considered connected when they share a link. This subsumes both copying in data flow regions and merging of branches in control flow regions.

Aside: "hyperedge" would also not be the right term. We can interpret
hugr as a form of hypergraph, but in that interpretation the hugr nodes are the
hyperedges and the links are the nodes. Since that would be rather confusing
at this point, I think we shouldn't rely on the terminology that comes from this
interpretation too heavily.

This model also allows for a more general mode of connectivity, where more than one input and output port are connected to a single link. That is quite useful for various applications from applied category theory.
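As a minimal sketch of this connectivity rule, ports can be grouped by the link they carry. The `Port` layout, the `Direction` enum and both helper functions are assumptions for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical id of a link in the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LinkId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Input,
    Output,
}

/// A port of some node, together with the link it is attached to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port {
    pub node: u32,
    pub direction: Direction,
    pub offset: u16,
    pub link: LinkId,
}

/// Two ports are connected exactly when they share a link.
pub fn connected(a: &Port, b: &Port) -> bool {
    a.link == b.link
}

/// Groups ports by link; each group may hold any number of
/// input and output ports.
pub fn ports_by_link(ports: &[Port]) -> BTreeMap<LinkId, Vec<Port>> {
    let mut groups: BTreeMap<LinkId, Vec<Port>> = BTreeMap::new();
    for port in ports {
        groups.entry(port.link).or_default().push(*port);
    }
    groups
}
```

A copy in a data flow region then shows up as one output port and several input ports in the same group, with no separate copy node or multi-edge needed.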

Name Resolution

The Refs allow us to address multiple problems at once:

  • We can read the text format directly into the model without having to mix name resolution and the parser.

  • We can postpone name resolution for a bit, so that we can iterate on the model, text format and binary serialisation without first needing a conclusive name resolution story.

  • Once we implement name resolution, it can be a separate pass that takes in a model and modifies it. This way it does not need to be baked into every consumer of the model.

  • We also get a smooth transition to declarative extensions. For now, a custom operation or type is referred to by name, just as in the core. Once we have the operations to declare these, we already have the capability to refer to those nodes by their id instead.

  • Figure out appropriate scoping rules for the different types of Ref.

  • Implement name resolution for those scoping rules.
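A name resolution pass over the model, kept separate from parsing, might be sketched as follows. Since the scoping rules are still open, this assumes a single flat symbol table; `NodeId`, `GlobalRef` and `resolve_names` are hypothetical names, not an existing API.

```rust
use std::collections::HashMap;

/// Hypothetical id of a node in the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Reference to a global definition, by name or by node id.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GlobalRef {
    Named(String),
    Direct(NodeId),
}

/// Rewrites every named reference that appears in `symbols` into a
/// direct one, in place. Returns the names that could not be resolved.
/// Assumption: one flat scope; real scoping rules are still undecided.
pub fn resolve_names(
    refs: &mut [GlobalRef],
    symbols: &HashMap<String, NodeId>,
) -> Vec<String> {
    let mut unresolved = Vec::new();
    for r in refs.iter_mut() {
        // Look up first, then mutate, to keep the borrows simple.
        let resolved = match &*r {
            GlobalRef::Named(name) => match symbols.get(name.as_str()) {
                Some(&id) => Some(id),
                None => {
                    unresolved.push(name.clone());
                    None
                }
            },
            GlobalRef::Direct(_) => None,
        };
        if let Some(id) = resolved {
            *r = GlobalRef::Direct(id);
        }
    }
    unresolved
}
```

Because the pass takes a model fragment and modifies it, consumers that only ever see resolved references do not need to know about names at all.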

@zrho
Contributor

zrho commented Oct 4, 2024

Variables in Lists/Rows and Extension Sets

Core rows are called lists in the model to avoid confusion since rows are different things in type theory. I suggest adopting that terminology in the core as well. That also opens up the name "row" for bona fide rows (those with labels, see Tierkreis for instance) in case we want them.

There is a mismatch between the core and the model in how variables are treated within lists/rows and extension sets. In the core, rows and extension sets can contain multiple variables. In the model they can contain at most one; in the case of lists it must be at the end. I think this is preferable since it is closer to well-studied objects from type theory. With multiple variables, unification becomes ambiguous, which is needless complexity. In the cases where we need concatenation, using a constraint would be the cleanest solution.
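The ambiguity argument can be made concrete with a small sketch. It models list items as plain strings and is not how the model represents terms; `ListPattern`, `match_list` and `two_variable_splits` are hypothetical names.

```rust
/// A list pattern with a fixed prefix and at most one variable,
/// which, if present, sits at the end of the list.
pub struct ListPattern {
    pub prefix: Vec<String>,
    pub has_tail_var: bool,
}

/// Matches a pattern against a concrete list. On success, returns the
/// items bound to the tail variable (empty if there is no variable).
/// The binding is unique: it is whatever follows the prefix.
pub fn match_list<'a>(
    pattern: &ListPattern,
    concrete: &'a [String],
) -> Option<&'a [String]> {
    let n = pattern.prefix.len();
    if concrete.len() < n || concrete[..n] != pattern.prefix[..] {
        return None;
    }
    let rest = &concrete[n..];
    if pattern.has_tail_var || rest.is_empty() {
        Some(rest)
    } else {
        None
    }
}

/// All ways a pattern made of two adjacent variables could match a
/// concrete list: one candidate per split point.
pub fn two_variable_splits(concrete: &[String]) -> Vec<(&[String], &[String])> {
    (0..=concrete.len()).map(|i| concrete.split_at(i)).collect()
}
```

With a single tail variable there is at most one solution, whereas two adjacent variables matched against a list of length n admit n + 1 candidate splits, so unification has to pick among them or defer the choice.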

Update: On reflection, variables in the middle of lists and extension sets don't break too many things: #1609


3 participants