Type representation design #2348
Comments
Outcome of some thinking during a nice hike through the snow. Let’s collect all the things we have to think of now! |
Thanks for collecting this! I agree with most of these points. A few questions below.
Can you elaborate on this? In particular, isn't dfinity/candid#168 about subtyping over Candid types, while typerep would represent Motoko types (which are different and have different subtyping rules)?
What error messages do you have in mind? This strikes me as rather an anti-requirement -- it would be unfortunate if we couldn't erase field names anymore, and if rtts were even more bloated than they already have to be. They are not supposed to become a user-facing reflection feature.
Hm, where will type identity be needed? I can see that for optimisations, e.g., for memoisation, but that wouldn't necessarily need to be accurate. But for semantics, aren't you always inspecting the type instead of comparing it? Or if you compare, won't it be up to subtyping? Notably, even in the static semantics, there are very few occurrences of using type equivalence -- essentially, only subtyping over invariant type constructors (which could be replaced with antisymmetry).
This will probably be the most tricky part, since it requires a form of substitution, so graph copy. (Fortunately not involving reduction, since we do not have higher-order polymorphism (yet?).)
In Motoko you could represent this with var indirections, though, can't you?
Yes, there is only one typerep type, so subtyping doesn't enter the picture, AFAICS.
Let's try to keep strings out of typereps? I don't have a specific opinion regarding an embedded vs native representation. Would it be realistic to do some prototyping to see which one is simpler in practice? |
When implementing deserialization, you actually need
The last point can’t be omitted: And if you don’t want to do a fresh candid subtyping check repeatedly on each recursion of what’s now
Without them you can’t implement Field names are part of our types, for what it’s worth, I don’t think we can avoid them.
Yes, memoization for Candid decoding (only use-case so far) and yes, it doesn’t need to be accurate (that’s what I wanted to say). We could rephrase this as “must support loop breakers”? Or what is the right technical term for the property we need? I don't see a use-case yet for Motoko subtyping on typreps (thankfully).
It’s probably not too bad; you statically compile code that allocates and builds the graph with holes, and then put in the pointers to
You could, but since we don't need the properties of mutable boxes (e.g. having identity), I don’t actually see the gain over just creating a coinductive structure.
See above, I don’t think we can.
Actually, I don’t think that choice is very important, most of the hard work is orthogonal to that. Slightly leaning towards using the native representation, so that we can maybe write some useful helper functions in Motoko. |
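To make the “loop breaker” idea from this comment concrete, here is a minimal sketch in Rust. It assumes a hypothetical arena-based type graph (`Node`, `Graph` and `same_shape` are illustrative names, not anything from the Motoko code base) and compares two possibly cyclic types structurally. A pair of nodes that has already been visited is assumed to match, which is what lets the recursion terminate on recursive types.

```rust
use std::collections::HashSet;

// Hypothetical type graph: nodes refer to each other by index into `nodes`,
// so recursive types are simply cycles in the index structure.
enum Node {
    Nat,
    Opt(usize),
    Tuple(Vec<usize>),
}

struct Graph {
    nodes: Vec<Node>,
}

// Coinductive structural comparison of the type rooted at `a` in `g1`
// with the type rooted at `b` in `g2`.
// `seen` is the loop breaker: a pair that was already visited (or is still
// being compared further up the call stack) is assumed to match.
fn same_shape(
    g1: &Graph,
    a: usize,
    g2: &Graph,
    b: usize,
    seen: &mut HashSet<(usize, usize)>,
) -> bool {
    if !seen.insert((a, b)) {
        return true; // pair already known: break the loop here
    }
    match (&g1.nodes[a], &g2.nodes[b]) {
        (Node::Nat, Node::Nat) => true,
        (Node::Opt(x), Node::Opt(y)) => same_shape(g1, *x, g2, *y, seen),
        (Node::Tuple(xs), Node::Tuple(ys)) => {
            xs.len() == ys.len()
                && xs
                    .iter()
                    .zip(ys.iter())
                    .all(|(x, y)| same_shape(g1, *x, g2, *y, seen))
        }
        _ => false,
    }
}
```

Because every combinator here is a conjunction, a single mismatch propagates to the root, so treating visited pairs as matching is safe within one top-level call. A real decoder would memoize pairs of (wire type, expected type) in a similar way, and as noted in the comment, that cache would not need to be exact.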
On second thought, it must be possible to implement the Candid decoding statically; it's just the partial evaluation of the dynamic one (using typrep) to the statically created typerep. But probably not a lot of fun, with the memoization, and lots of code bloat. So this is just an aside. |
I'm not worried about the field name strings. We already embed them in the code (show, or in Candid deserialisation for the error message about missing fields); moving that into the type rep will only reduce bloat. And they will all be just pointers to statically allocated strings (ah: and hence always contiguous), so effectively just words. What is your worry about them? |
Hm, now this is why I'm so reluctant to let hacks like Ideally, typereps should have minimal overhead and only contain the information necessary to implement the language semantics. Syntactic names, whether variables or labels, aren't semantically relevant up to equivalence, so shouldn't need to be represented at runtime. But FWIW, this is a debugging feature, isn't it? In principle, we could reasonably produce text names only in debugging mode. Whether it's worth bothering is a different question. |
I am not sure. Pretty-printing, or decoding/encoding, are very fundamental use cases for any kind of generic or deriving features. And I really don't see the problem with representing them with a unique number that happens to be a pointer to static memory. 🤷♂️ |
Aren't the hashes a leaky abstraction in the first place? I mean, in principle, users need to worry about hash collisions, which depends on the concrete choice, not distinction, of names, even if collisions are rare in practice. Using a table index to represent label names doesn't seem like a terrible idea, though it would complicate separate compilation (and representing static data) in a way that hashing avoids. Is the Motoko I imagine
But if we add indexing, what would |
What is the advantage of a table over simply treating it like we treat other static string constants? |
I was discussing the representation of |
You mean dedup and allocate in static memory, and use static address for equality? I guess that would work for now, but break on separate compilation. I was imagining some scheme where each separately compiled module has a table used to look up runtime representations of labels, with tables patched on linking so they agree on label representation. |
Where do we need (at runtime) to compare the strings? We need them
For the hashes, it may be worth including them in the typrep, to avoid doing the hashing dynamically, e.g. something like But I don’t see a use-case for comparing the labels of two type reps, unless I am missing something. (Or did you think I was proposing to change our value representation as well, replacing field hashes with pointers to strings? That was not my intention here, sorry if I was unclear.) |
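As an illustration of keeping both the label and its hash in the typrep, here is a small Rust sketch. `FieldRep`, `label_hash` and `field` are hypothetical names, and the hash function is an assumption: a fold over the UTF-8 bytes with multiplier 223 modulo 2^32, in the style of Candid field ids. The point is only that the hash can be computed when the typrep is built, so consumers never hash at run time.

```rust
// Hypothetical field entry in a typerep: the label is kept for
// pretty-printing and error messages, the hash for matching against values.
struct FieldRep {
    label: &'static str,
    hash: u32,
}

// Assumed label hash (Candid-style): fold the UTF-8 bytes with
// multiplier 223, wrapping modulo 2^32.
const fn label_hash(label: &str) -> u32 {
    let bytes = label.as_bytes();
    let mut h: u32 = 0;
    let mut i = 0;
    while i < bytes.len() {
        h = h.wrapping_mul(223).wrapping_add(bytes[i] as u32);
        i += 1;
    }
    h
}

// Builds a field entry with its hash precomputed.
const fn field(label: &'static str) -> FieldRep {
    FieldRep { label, hash: label_hash(label) }
}

// Evaluated at compile time and placed in static memory,
// so no hashing happens when the typerep is consumed.
static NAME_FIELD: FieldRep = field("name");
```

In this sketch the label is just a pointer to a static string, so a field entry costs a few words, and nothing compares labels of two typreps.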
I did, in fact, interpret it this way... |
Playing around a little bit with this. I think it makes sense to pursue approach one and use Motoko-values (or rather, IR values), if only for the singular reason that we can generate the values in an IR-to-IR pass, so that we get type-checking of the result, and can run the IR interpreter on them. I am pondering if we should allow cyclic values in our IR language, so that the typrep of
which looks like this in IR
will compile. Would that be a viable direction? |
Hmm, not sure. We’d have to
Probably too big of a change. I guess I can insert
into the value representation; this would allow me to tie the knot, and consumers of that type can just follow that dereference. This would work in the interpreter, but we’d get these extra indirections in the backend. Maybe not too bad, I think I can still compile that to static memory. These mutable cells would become GC roots, because the backend doesn’t know that they are write-once. Maybe good enough to get started, the rest is just optimization. |
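A rough Rust sketch of tying the knot through a write-once indirection, as described above. Everything here is hypothetical (`Node`, `Arena`, `rec_opt_pair`), and the arena indices stand in for heap pointers. It builds the typrep for `let t2 = ?(t, t2) in t2` by first allocating a hole, building the structure around it, and then patching the hole to point back at the root. Consumers follow the indirection with `resolve`.

```rust
// Hypothetical typerep nodes stored in an arena; indices play the role of pointers.
enum Node {
    Nat,
    Opt(usize),
    Tuple(Vec<usize>),
    // Write-once indirection: `None` while the graph is under construction,
    // `Some(target)` once the knot has been tied.
    Ind(Option<usize>),
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    // Appends a node and returns its index.
    fn alloc(&mut self, n: Node) -> usize {
        self.nodes.push(n);
        self.nodes.len() - 1
    }

    // Follows patched indirections until a non-indirection node is reached.
    fn resolve(&self, mut i: usize) -> usize {
        while let Node::Ind(Some(target)) = &self.nodes[i] {
            i = *target;
        }
        i
    }
}

// Builds the typerep for `let t2 = ?(t, t2) in t2`, given the typerep index of `t`.
fn rec_opt_pair(arena: &mut Arena, t: usize) -> usize {
    let hole = arena.alloc(Node::Ind(None)); // placeholder for t2
    let pair = arena.alloc(Node::Tuple(vec![t, hole]));
    let t2 = arena.alloc(Node::Opt(pair));
    arena.nodes[hole] = Node::Ind(Some(t2)); // tie the knot (written exactly once)
    t2
}
```

The extra `Ind` hop is the cost mentioned in the comment. A backend that knows these cells are write-once could in principle drop the indirection later.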
So, this seems to work. Not the most efficient in-memory representation right now, but maybe first get this working, and then refactor if needed (e.g. use less Guess the next step is to actually use the type rep for something interesting, such as pretty-printing. I am a bit torn whether I want to implement that completely in Rust, or in |
NB: with dfinity/candid#311 changing the way Candid decoding works again we don't need this anymore, so the second bullet in the original motivation doesn't apply anymore. The rest does, but makes this less pressing. |
In this issue I’d like to work out our initial type representation (typerep) design.
What’s this?
The (run-time) type representation is a value at run-time that indicates a type.
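As a rough illustration only, here is what “a value that indicates a type” could look like, together with one generic routine that interprets it. This is a Rust sketch under assumed names (`TypRep`, `Value`, `show` are hypothetical and much simpler than anything Motoko would need), not the proposed design.

```rust
// Hypothetical typerep: a plain value describing a type.
enum TypRep {
    Nat,
    Bool,
    Text,
    Opt(Box<TypRep>),
    Tuple(Vec<TypRep>),
}

// A toy value model that carries no type information of its own;
// the typerep tells the consumer how to read it.
enum Value {
    Word(u64),
    Str(String),
    Null,
    Just(Box<Value>),
    Fields(Vec<Value>),
}

// One generic routine driven by the typerep, instead of one generated
// function per type.
fn show(rep: &TypRep, v: &Value) -> String {
    match (rep, v) {
        (TypRep::Nat, Value::Word(n)) => n.to_string(),
        (TypRep::Bool, Value::Word(b)) => (*b != 0).to_string(),
        (TypRep::Text, Value::Str(s)) => format!("{:?}", s),
        (TypRep::Opt(_), Value::Null) => "null".to_string(),
        (TypRep::Opt(inner), Value::Just(x)) => format!("?{}", show(inner, x)),
        (TypRep::Tuple(reps), Value::Fields(vs)) if reps.len() == vs.len() => {
            let parts: Vec<String> = reps
                .iter()
                .zip(vs.iter())
                .map(|(r, x)| show(r, x))
                .collect();
            format!("({})", parts.join(", "))
        }
        // The value does not fit the typerep.
        _ => "<ill-typed>".to_string(),
    }
}
```

Note that the same word is printed as a number or as a boolean depending only on the typerep passed in. This tree-shaped sketch cannot express recursive types; those need sharing and cycles, which the requirements in this issue cover.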
What for, and why now?
We need this (or can use it) for:
Currently, we do type-driven code generation in the IR-to-IR passes (`show.ml`) and the code generator (for serialization/deserialization). This could be replaced with type-driven generation of a typrep value, and then interpreting that value in (static) Rust code. The benefits are:
In particular, I don’t see how we can extend our existing approach to deserialization code generation to the new requirement of dynamic subtyping checks (Spec: Do a subtyping check when decoding candid#168); this really seems to require a typerep value (with identity).
A bit down the road, this is a prerequisite for Shared Generics (Shared generics #2096), and even if we don’t do that right away, it would be good to keep that use-case in mind in the design.
Because this seems to be a blocker for correct Candid deserialization, now seems to be the right time to tackle this. Nevertheless, it might be advisable to first migrate `debug_show`, as it is simpler.
Requirements
`?t` should contain the typrep for `t`). Statically meaning the typrep for `t` exists in memory already. This seems to be necessary for all kinds of recursive algorithms (`show`), and it would be odd if they have to allocate. This likely rules out a format like our “type hash” (which otherwise is, in a way, a very compact representation of the full type).
The following requirements come from the use case of shared generics.
`t` one can construct the typrep for `?t`). This must also work for infinite types (e.g. from `t` construct `let t2 = ?(t, t2) in t2`).
Approach 1: Use Motoko Values
One possible design is to define a Motoko type:
and use the representation of that.
`prelude.ml` or in IR-to-IR passes
`TypRep` are always contiguous.
Approach 2: Dedicated heap objects
We could introduce dedicated heap objects for the TypReps.
(No details here yet.)
Pros and Cons mostly the inverse of Approach 1.
And now?
Discuss!