Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Value::Function contain only the Node (index) of a FuncDefn #1856

Open
acl-cqc opened this issue Jan 13, 2025 · 16 comments
Open

Make Value::Function contain only the Node (index) of a FuncDefn #1856

acl-cqc opened this issue Jan 13, 2025 · 16 comments
Assignees

Comments

@acl-cqc
Copy link
Contributor

acl-cqc commented Jan 13, 2025

(or FuncDecl, I guess). Then, remove LoadFunction - it becomes superceded by LoadConstant.

This allows many use cases of "constant Hugrs" via lambda-lifting (turning Lambdas into top-level functions), and "a function pointer" (no captures, not a closure) is the only runtime interpretation of a Type::Function that works without dynamic memory allocation.

Ops taking Type::Function thus should generally also take a row of extra arguments. Note that there is still a difference between a EdgeKind::Value(Type::Function(....)) and EdgeKind:Static(Type::Function(...)) (following #1594 enabling the latter) i.e. do we perform runtime computation to select the function pointer (this cannot also select any captured values - the functions must have exactly the same signature including any "captures"==extra arguments).

The many use cases of closures requiring dynamic allocation (captures, runtime-generated Hugrs, etc.) should thus be captured by various extension types, with their own apply ops etc. There are just too many variations here, and it's not clear which we should favour

  • BRAT will need closures that can reference functions in the enclosing Hugr; this requires a linking step when such closures are executed (the approach of copying any functions needed from the outer hugr, into the inner Hugr, doesn't work - the outer Hugr/functions might contain the constant/inner Hugr, thus leading to an infinitely-big Hugr).
  • E.g. @doug-q has called for "self-contained Hugrs" that do not require a linking step

I think this corresponds to the hugr-model idea of a value/constant referring to an "isolated" region of the Hugr, where "isolated" means - no incoming nonlocal edges, but static references would be allowed. @zrho right?

@acl-cqc acl-cqc self-assigned this Jan 13, 2025
@doug-q
Copy link
Collaborator

doug-q commented Jan 13, 2025

I think it's a very bad idea to literally put the node of the function into the const node. This is another encoding of an edge. All graph operations will break if there are additional hidden edges. E.g. replace the function with a different one (different nodeid).

@acl-cqc
Copy link
Contributor Author

acl-cqc commented Jan 13, 2025

So maybe we remove Value::Function altogether and only have LoadFunction? We'd lose the ability to have "constants" that were, say, tuples of multiple functions / functions with other things, but that feels minor. (We have also talked about whether we maintain all edge data structures all the time...@zrho)

@acl-cqc
Copy link
Contributor Author

acl-cqc commented Jan 13, 2025

I think we talked about turning hugr-model terms that referred to nodes, into static edges in the hugr. I mean, giving Const nodes an arbitrary number of static inputs would be...a bigger change than I had anticipated here (not sure how big, depends on all those other-port-kind methods etc.), but that may be where we head eventually

@zrho
Copy link
Contributor

zrho commented Jan 13, 2025

Let my try to formulate the ideas in my head on where this could go.

Functions as parameters, isolated regions

There appears to be two different ways an operation can be passed a "subgraph" that both make sense but have different characteristics. The first way is to pass a (reference to a) function as a parameter to the operation. Parameters need to be static data, and so an operation that expects a function would have a parameter of type (const (-> ints outs exts) exts). Terms of this type are function constants that do not capture any of their runtime environment and can be produced arbitrarily often. In other words, these are "blueprints" for functions. In hugr-model there are currently two ways to produce such a thing: either by referring to a function definition/declaration or by an (fn ...) term that embeds an isolated region.

An isolated region in this context is a subgraph that does not have any non-local edges to the outside. The goal is that this subgraph may be represented as its own independent Hugr. There is some open design space still for how to deal with references to symbols in the parent, but there's options for that. I understand the point of contention in the discussion #1546 to be that it should be possible to treat functions that are passed to custom ops as opaque. This is precisely achieved by the isolation requirement.

A particular point of this design is that the nested Hugr or the reference to a node that defines a function would not live in Value but in TypeArg. This is currently not possible, but should be. No separate parameter list needed and nicely compositional.

Regions with non-local edges

The mechanism of passing a function to an operation is not entirely sufficient for all purposes. In particular, it can not be used for the builtin tail loop or conditional. The reason is that the tail loop and conditional allow non-local edges. The intended semantics for the non-local edges in the case of the loop is that the values of those edges are copied so that there is one of them for each iteration of the loop. In the case of the conditional, a non-local edge connected with multiple input ports doesn't need a copy as long as each input port is in a different branch. I don't currently have a good answer on how to capture this behaviour in types, so it would be - for now - restricted to the builtins. The builtins, in addition to parameters, take a list of non-isolated regions. These are represented in the same Hugr via the portgraph hierarchy.

We might in the future find the need to extend the ability to pass non-isolated regions directly to custom ops, and we might find a way to encode the requirements on the non-local edges in the type system, and to deal with the concerns of #1546 in a satisfactory way. This would likely involve extending the type of operations with the additional parameter list that @acl-cqc mentioned above. But this is not needed at the moment, nor is it fully worked out.

Static edges

I think that static edges and Const nodes are a misfeature that we should migrate away from. The current state of hugr-model is a proposal of how to do this as it contains neither static edges nor Const nodes (it has const nodes but they correspond to LoadConstant nodes in hugr-core). Constants in hugr-model are represented as terms and can be composed freely and passed as arguments to operations.

Const nodes in hugr-model allow sharing of constant values that are used in multiple LoadConstant nodes. This sharing optimisation is subsumed by a deduplicated representation of terms. Term deduplication also exploits more sharing than is achieved by const nodes, as it can share substructures.

I appreciate that there is value in going from a function declaration to all points where the function is referenced, and that this is something that the static edges do when the reference is from a node. However that precludes us from referring to function declarations/definitions from locations that aren't themselves nodes. This includes terms/TypeArgs that refer to a function in order to pass it as a parameter to an operation. This includes the tuples of functions etc. that @acl-cqc mentioned.

Referring to nodes from TypeArgs, really?

It was mentioned before that we likely want extension types that model closures. This is one case in which terms that refer to functions are very useful. I can expand on this more tomorrow, but for now I wanted to make the point that referring to functions from the middle of a term/TypeArg is incredibly useful, and the paradigm in which functions are referred to by static edges precludes this.

@acl-cqc
Copy link
Contributor Author

acl-cqc commented Jan 13, 2025

@zrho that's super-useful, thank you :) :)

Can I check - is it that "isolated" regions cannot have incoming edges from other parts of the Hugr, even static edges (I guess that would mean they cannot refer to terms that contain static references to other parts)? Or is it only that they can't capture runtime values from elsewhere?

And, there is the issue of linking - for BRAT purposes, a "nested hugr"/isolated region that can't refer to outer FuncDefns is of little use, i.e. we're likely to have to define our own extension function type anyway. (As e.g. partial or compose is an outer function yes, but then you still need captures.) Hence it seems to me that removing the current Value::Function in favour of many extension types that can each specify exactly the semantics they want is probably a step in the right direction. (If it's not clear what to replace it with then maybe we don't replace it with anything yet.) That may possibly go against your proposal with isolated regions though....

@doug-q
Copy link
Collaborator

doug-q commented Jan 14, 2025

I think that static edges and Const nodes are a misfeature that we should migrate away from

You say this, with which I strongly disagree(because it breaks rewriting the const node/func node), and then you say:

referring to functions from the middle of a term/TypeArg is incredibly useful

With which I agree.

We should not refer to nodes from TypeArgs(because then our graph would have implicit edges). We should refer to static edges of the op to which the TypeArg is applied.

This solves:

However that precludes us from referring to function declarations/definitions from locations that aren't themselves nodes.

Which seems to be the only objection to static edges that you raise.

@zrho
Copy link
Contributor

zrho commented Jan 14, 2025

We should not refer to nodes from TypeArgs (because then our graph would have implicit edges). We should refer to static edges of the op to which the TypeArg is applied.

I'm trying to understand how this is compatible with referring to functions from the middle of a TypeArg. Do you mean that there'd be a TypeArg that carries an index i which refers to the ith static port of the node that contains the TypeArg? I feel like this would make rewriting (and everything else) even harder: When referring to nodes TypeArgs are to be interpreted in the context of the Hugr in which they are contained; when referring to the static input port index a TypeArg is interpreted in the context of a single node. So even just checking if the types of a connected pair of input and output ports are the same is non-trivial. Let alone rewriting.

This feels like a hack to try and squeeze the use-def chain between TypeArgs and nodes into the graph even though TypeArgs aren't in the graph themselves.

because it breaks rewriting the const node/func node

Can you explain what you mean here?

@doug-q
Copy link
Collaborator

doug-q commented Jan 14, 2025

because it breaks rewriting the const node/func node

Can you explain what you mean here?

If you have a static edge pointing to a const node you can modify that const node, (without changing the type) and everything is fine.

If you create a new const node replace an old one, you can move the edges out of the old const node to come out of the new const node.

I understand your suggestion to be inlining the const value into the type arg. Now the const values have lost their identity. How do you rewrite that const value now? you walk every operation looking for type args that match I guess? How do you know which type args to mutate? Const values can't be checked for equality in general, so this seems hopeless.

All of this also applies to FuncDefns, the other case where we use static edges.

I'm trying to understand how this is compatible with referring to functions from the middle of a TypeArg. Do you mean that there'd be a TypeArg that carries an index i which refers to the ith static port of the node that contains the TypeArg? I feel like this would make rewriting (and everything else) even harder: When referring to nodes TypeArgs are to be interpreted in the context of the Hugr in which they are contained; when referring to the static input port index a TypeArg is interpreted in the context of a single node. So even just checking if the types of a connected pair of input and output ports are the same is non-trivial. Let alone rewriting.

Yes that is what I mean.

Harder than what?

TypeArgs to Ops are currently interpreted in the context of the Op in which they occur. The Op defines the TypeParams to which type var occurances (in ops) refer.
EDIT: That's not true, Type Vars in signatures are interpreted in the context of op, TypeArgs to ops are interpreted in the context of the function in which they occur

I don't think one can sensibly encode Values into type args, because doing so would mean you would lose the ability to compare (saturated) ops for equality, because Values don't support equality in general. If one could find a way to do so then allowing for "holes" (Value Vars) that reference static inputs seems fine.
EDIT: To clarify, clearly you can encode Values into type args via opaque blobs like json etc. But the opacity of said blob seems to preclude "holes"

@zrho
Copy link
Contributor

zrho commented Jan 14, 2025

I have to expand the scope a bit more to explain where I am coming from here.

I do not propose to store Values in TypeArgs but rather that constants are encoded in TypeArgs (or something like those). You mention something like this in your edit in form of an opaque JSON encoding, but this can be done in a less opaque way. When serialised via hugr-model, we represent constants simply as terms. In particular, they can contain references to nodes and local variables (i.e. parameters of the function that contains them). Via this encoding constants immediately enjoy all the infrastructure that other terms have, including deduplication, nesting, references, typing, custom constructors etc.

At the moment hugr-core uses Types, TypeArgs, TypeParams, Values and metadata as four different encodings of essentially the same thing (with Value living in an uncanny valley between syntax and semantics, hence the entire Eq distraction). I understand the motivation to make illegal states unrepresentable, but this requires us to have different infrastructure for all of these things, precludes us from recognising their similarity, and makes it difficult for them to interact. Then we develop adhoc and incomparable ways to refer to these things from each other, to rewrite them, serialize them, define them etc.

I remember there was a guppy issue about using a nat that was passed as a type parameter. With a constant as a Value this is not possible since they don't have a space to refer to variables. Even if they did, they are too opaque to see if there is any nested variable. Now the immediate temptation is to extend the typeclass for Values in such a way that we could traverse contained subvalues, do substitutions, etc. After writing a bunch of code we are then in a situation where there's yet another place to put variables.

  • TypeEnum::Variable for variables in a Type.
  • TypeEnum::RowVar for variables in a TypeRow.
  • TypeArg::Variable or TypeEnum::Variable nested in TypeArg::Type in a TypeArg.
  • Yet to be added at some point in TypeParam probably.
  • The newly added VarConst.
    When serialised into the model they all become Term::Var.

I fear that if we do not address this (not the variables per se but the redundancy) first, whatever solution we come up with to refer to FuncDefns will inevitably work completely differently for TypeArgs and Values, have odd corner cases and lead to yet more implementation and conceptual complexity that is entirely avoidable. This is not just a question of elegance, it is a very pragmatic issue.

@doug-q
Copy link
Collaborator

doug-q commented Jan 15, 2025

To be clear, I am only focused on hugr-core here and above. My primary concern is that we should not add non-edge links between nodes. My understanding of the issue is that it proposes to add a Node into a constructor of Value, which, since Values live in Const nodes, is a non-edge link between nodes. If we do decide to abandon "HUGRs are graphs", which to me is what non-edge links between nodes entails, then we should do so deliberately and with the consensus of all stakeholders.

I do not propose to store Values in TypeArgs but rather that constants are encoded in TypeArgs (or something like those)

Values are our constants. This doesn't seem to make sense?.

At the moment hugr-core uses Types, TypeArgs, TypeParams, Values and metadata as four different encodings of essentially the same thing

This is not clear. They are certainly closely related and tightly coupled, but this is not sufficient for them to be essentially the same thing.

TypeEnum::RowVar for variables in a TypeRow.

RowVar variables occur in TypeRV. (a flavour of Type.)

Yes it would be good for things to be simpler, but it's not clear that they can be. Even if these ideas were the same as you claim, it's not clear that we could unify them, or that doing so would not incur massive breakage. Implementing things non-generically is not the worst thing.

I think it would be more productive to think about how we should enrich Const nodes and extension ops with static egdes.

@zrho
Copy link
Contributor

zrho commented Jan 15, 2025

The issue is that in order to have references to functions nested in TypeArgs and Values we need a link not between two nodes but between a (sub-) TypeArg/Value and a node. You don't like the idea of the TypeArg/Value referencing the node because that is a non-edge link, but your suggestion is to add an edge link between the containing op and the node, and then on top of that a non-edge link from the TypeArg/Value to the corresponding static port of the containing op. So that still entails a connection that isn't an edge, except now there's additional bookkeeping involved for managing the static ports.

Yes it would be good for things to be simpler, but it's not clear that they can be. Even if these ideas were the same as you claim, it's not clear that we could unify them, or that doing so would not incur massive breakage.

They definitely can be simpler and we can unify them; I'm happy to go into details how this could be once the current pre-conference crunch is over. It is true that there'd be breakage which is why I haven't touched any of this. Part of the reason for a serialisation format that isn't tied to the internal representation is that we gain more freedom to rearrange internals. We would sooner or later have to do that rearrangement anyway since the current representation is really slow. At that point we can piggyback on that break.

@doug-q
Copy link
Collaborator

doug-q commented Jan 15, 2025

So that still entails a connection that isn't an edge, except now there's additional bookkeeping involved for managing the static ports.

That's not the case. TypeArgs and Values are components of a Node, not first class nodes themselves.

We would sooner or later have to do that rearrangement anyway since the current representation is really slow.

There are no requirements I'm aware of specifying how fast it needs to be. If there were such requirements it's not clear that they could not be satisfied without large breakage.

The majority (or maybe all?) of downstream dependencies use hugr-core to do their work. A serialisation format that isn't tied to the internal representation is of very comfort here.

@zrho
Copy link
Contributor

zrho commented Jan 15, 2025

That's not the case. TypeArgs and Values are components of a Node, not first class nodes themselves.

Indeed they aren't first class nodes or individually addressable things but they should be precisely in order to track dependencies like this. I am not saying to make TypeArgs into nodes in the same graph. I am saying that they should have ids so that we can have the auxiliary data structures on top of them to track relationships with other parts of the Hugr.

Consider this illustration:

Image

Your proposal is on the left. The TypeArg which is nested somewhere within the node refers to the static input of the containing node which then in turn is connected via a static edge to the static output of the FuncDefn. The red arrow is the connection that isn't an edge in the portgraph. It is still a connection: The TypeArg refers to the static input port and the static input port is used by the TypeArg.

On the right is a variant where the TypeArg directly refers to the node. Again there's a red arrow that signifies a connection that isn't an edge in the portgraph. If the TypeArg had an id, we could record this connection and make it navigable in both directions.

Say that now I have a type variable somewhere in a TypeArg instead. How do I substitute the type variable with a reference to the function? With the static port approach it is difficult to even express because TypeArgs then don't exist outside of the context of a node. I'd have to add a static port that is connected to the function and then construct a TypeArg that refers to that new static port. Certainly possible, but suddenly term substitution needs to do bookkeeping on the parent nodes.

@doug-q
Copy link
Collaborator

doug-q commented Jan 16, 2025

I agree that your picture is an accurate representation of the two options.

I think adding an additional layer of bookkeeping on top of the graph is worse than keeping it all in the graph.

Type args requiring the context of a node (as they do now, to interpret variable occurences) doesn't seem too bad.

@acl-cqc
Copy link
Contributor Author

acl-cqc commented Jan 17, 2025

Ok I'm really sorry for raising this issue just before I went away. As a halfway house / migration issue it was not thought through; (a) I had believed I needed it for static-evaluation of BRAT but I don't (likely I just need to define an extension type with appropriate constant-folding routines), (b) as an intermediate step it indeed breaks some things that we have come to expect from a Hugr.

I think to @doug-q I would say - Hugr "graph" is already a mixture of two things: there is the runtime graph (edges down which values are passed) and the static graph (edges down which....nothing is passed: it's a reference from the target to the source). Then the TypeArgs also contain static information, but with massive duplication in order to maintain treelikeness, which is what we need so that a TypeArg can be interpreted in the context of a node (i.e. there always is a node). Lukas' proposal is to combine the latter two in a non-tree-like representation i.e. "terms". These can also be seen as a graph but not treelike. So, rather than "one graph" with multiple edge types, we might have two graphs - one for runtime and one for static - and then, do we use the same edge representation for both? For the static graph, we might not wish to maintain back-references all the time - they are expensive to maintain/update when they are not needed - instead we might store the "edges" in one direction only, and then compute backreferences when needed (potentially updating these under mutation).

So, perhaps we have to change more things at a time in order not to lose functionality.

I think that leaves open the question of whether there is any intermediate step that's useful. I would seriously consider just removing Value::Function, then, in order to dissuade people from using something whose semantics have never been entirely clear (e.g. linking, e.g. can the inner Hugr be a FuncDefn/Module or does it have to be a DFG? Or a CFG/other parent? etc.) - all the possible variants can then be introduced as extension types when/if desired. This would leave LoadFunction as the only node producing a value output of Type::Function. Doing this now might be a good plan as I don't think we'll have many users of Value::Function yet but these will likely increase over time...

A side goal is devirtualization (detecting when the target of an indirect call is known to be a single FuncDefn). For that we could then add a separate enum ValueOrFunc {Extension(CustomConst), Sum(Vec<Vec<ValueOrFunc>>), Function(Node)} just used for dataflow analysis (i.e. the first two variants work like Value, to capture all varieties of constant folding; the final variant is produced only by LoadFunction nodes). Then a CallIndirect whose target was a ValueOrFunc::Function could be converted to a Call with static target; there is no need for core Value to serve this purpose.

@doug-q
Copy link
Collaborator

doug-q commented Jan 17, 2025

Hugr "graph" is already a mixture of two things: there is the runtime graph (edges down which values are passed) and the static graph (edges down which....nothing is passed: it's a reference from the target to the source).

A hugr is literally a single graph with labeled edges distinguishing categories. I'm not sure I agree on your characterisation of "runtime graph". (A linear unit type is a useful value edge that carries no information). But I don't think this detail is important.

I would seriously consider just removing Value::Function,

I think this is a good idea, not because the semantics are unclear but because it can be implemented as a CustomConst.

can the inner Hugr be a FuncDefn/Module or does it have to be a DFG? Or a CFG/other parent?

It can be a monomorphic FuncDefn or a DFG this is defined in the code. IMO we should also allow CFG + TailLoop. I can escape this oppressive tyranny with my own CustomConst though!

This would leave LoadFunction as the only node producing a value output of Type::Function

This is not true. A CustomConst can have a FunctionType. Also sums-of-products including FunctionType.

A side goal is devirtualizatio

A side goal of what? I agree we should do it with dataflow-analysis + constant folding,

Lukas' proposal is to combine the latter two in a non-tree-like representation i.e. "terms". These can also be seen as a graph but not treelike. So, rather than "one graph" with multiple edge types, we might have two graphs - one for runtime and one for static - and then, do we use the same edge representation for both? For the static graph, we might not wish to maintain back-references all the time - they are expensive to maintain/update when they are not needed - instead we might store the "edges" in one direction only, and then compute backreferences when needed (potentially updating these under mutation).

Yes I understand the proposal. Maybe it would have been a better design. I can''t see how redesigning everything helps us further our goals. It's not like what we have now doesn't work well. A large redesign is likely to have unforseen downsides.

Let's move forward with what we have, not backwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants