Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Arbitrary (First-order) Datatypes in Ports #490

Closed
maxsnew opened this issue Feb 5, 2014 · 25 comments
Closed

Feature Request: Arbitrary (First-order) Datatypes in Ports #490

maxsnew opened this issue Feb 5, 2014 · 25 comments
Labels
Milestone

Comments

@maxsnew
Copy link
Contributor

maxsnew commented Feb 5, 2014

This seems pretty simple to implement:

data Foo a = Bar String
           | Baz (Foo a) Int
x = Bar "bleh"
y = Baz x 7

then x and y could be serialized to the following javascript (field names could easily be something else):

var x = { _ctor: "Bar", _0: "bleh" }
var y = { _ctor: "Baz", _0: x, _1: 7 }

The only problem I see with this is that the serialization of Maybe would be inconsistent with this.

Right now it's actually impossible to serialize a Sum type. Even if we're unsure about doing this in general, we definitely need to at least support the Either type. Currently I'm doing the following to get around this limitation

type Either' a b = (Maybe a, Maybe b)

where I ignore the 2nd value if the first one is non-null, but this is obviously suboptimal.

@johnpmayer
Copy link
Contributor

Totally support this and I've already given it some thought, so just some
comments:

First, in
https://github.com/johnpmayer/Elm/blob/master/compiler/Generate/JavaScript/Ports.hs,
the code path currently only knows that it's dealing with a Foo, but it
doesn't know that Foo is made up of Bars and Bazs. This "ADT shape
information" isn't available to this part of the program (at least it
wasn't obvious to me where that would be found), and it will have to be
made available. I defer to Evan on the actual implementation challenges of
wiring through this information.

Personally, I don't like the fact the null equates with Maybe. I would
remove that compatibility in favor of arbitrary ADT serialization.

As far as the serialization scheme, I noticed that you essentially chose
the native representation of Elm types. I agree that this is a nice way to
do it; browsers are pretty efficient at JSON.parse and JSON.toString, so it
leads to an implementation that is both simple and fast. However, I would
drop the {_ctor,_0,_1,_2} pattern in favor of {_ctor,_params}, where
_params is a list of arguments, as this is compatible with existing code in
Haskell's Data.Aeson.TH. If _params is a list, then ADTs are simply sum
types with product types, and our basic product types (tuples) are already
implemented as lists, so that's a nice property.

On Wed, Feb 5, 2014 at 6:51 PM, Max S. New [email protected] wrote:

This seems pretty simple to implement:

data Foo a = Bar String
| Baz (Foo a) Intx = Bar "bleh"y = Baz x 7

then x and y could be serialized to the following javascript (field names
could easily be something else):

var x = { _ctor: "Bar", _0: "bleh" }var y = { _ctor: "Baz", _0: x, _1: 7 }

The only problem I see with this is that the serialization of Maybe would
be inconsistent with this.

Right now it's actually impossible to serialize a Sum type. Even if we're
unsure about doing this in general, we definitely need to at least support
the Either type. Currently I'm doing the following to get around this
limitation

type Either' a b = (Maybe a, Maybe b)

where I ignore the 2nd value if the first one is non-null, but this is
obviously suboptimal.

Reply to this email directly or view it on GitHubhttps://github.com//issues/490
.

@johnpmayer
Copy link
Contributor

My overall feeling on this are that, because elm is a strict language, all non-lambda types should have default serialization built in, and thus could all be used in ports.

@johnpmayer
Copy link
Contributor

One small problem I see in your example: Without type annotations, the type variable a, in both the type of x and the type of y, is indeterminate. I point this out mainly to say that, if there are free variables in a type, then that type should not be serializable.

@maxsnew
Copy link
Contributor Author

maxsnew commented Feb 6, 2014

Oh woops, I meant to use the a.
Given

data Foo a = Bar a
           | Baz (Foo a) Int

This should work:

port out : Signal (Foo String)
port out = constant <| Bar "Work!"

producing

{ _ctor: "Bar", _0: "Work!" }

but this should obviously be an error:

port out : Signal ([a] -> a)
port out = contant head

This is already handled to some extent since Maybe/List are serializable.

@evancz
Copy link
Member

evancz commented Feb 6, 2014

I'd be more inclined to use the _0, _1, etc style. I currently think it is easier to use (and should generally have better performance). Either way, I think the leading underscore for _ctor and _params is not adding any value, and I believe it is ctor in the compiler now?

Let's do some examples to see on how it looks:

switch(value.ctor) { ... }
switch(value.constructor) { ... }  // collides with built-in JS name?

value._0

// I feel like this is not a thing that is named, but
value.params[0]
value.values[0]

// or maybe
value.v0
value.p0
value.a0
value[0] // you can use numbers as field names! :O

No idea!

As it is, you don't need to have a full list of constructors to do any of this. Although, it could be very valuable to have such a list because generated JS could switch on integers instead of strings, which should be significantly faster:

switch (x.ctor) {
case "Just":
case "Nothing":
}

// possibly faster version
switch (x.ctor) {
case 0:
case 1:
}

I think this kind of optimization is not a good idea right now. No one will notice speed gains in this realm, and it'll make things harder to maintain and make generated code harder to read.

@JohnNilsson
Copy link

Why not just use an array, using the first element for ctor?
["Baz", x, 7]
On Feb 6, 2014 2:35 AM, "John P Mayer, Jr" [email protected] wrote:

Totally support this and I've already given it some thought, so just some
comments:

First, in

https://github.com/johnpmayer/Elm/blob/master/compiler/Generate/JavaScript/Ports.hs
,
the code path currently only knows that it's dealing with a Foo, but it
doesn't know that Foo is made up of Bars and Bazs. This "ADT shape
information" isn't available to this part of the program (at least it
wasn't obvious to me where that would be found), and it will have to be
made available. I defer to Evan on the actual implementation challenges of
wiring through this information.

Personally, I don't like the fact the null equates with Maybe. I would
remove that compatibility in favor of arbitrary ADT serialization.

As far as the serialization scheme, I noticed that you essentially chose
the native representation of Elm types. I agree that this is a nice way to
do it; browsers are pretty efficient at JSON.parse and JSON.toString, so it
leads to an implementation that is both simple and fast. However, I would
drop the {_ctor,_0,_1,_2} pattern in favor of {_ctor,_params}, where
_params is a list of arguments, as this is compatible with existing code in
Haskell's Data.Aeson.TH. If _params is a list, then ADTs are simply sum
types with product types, and our basic product types (tuples) are already
implemented as lists, so that's a nice property.

On Wed, Feb 5, 2014 at 6:51 PM, Max S. New [email protected]
wrote:

This seems pretty simple to implement:

data Foo a = Bar String
| Baz (Foo a) Intx = Bar "bleh"y = Baz x 7

then x and y could be serialized to the following javascript (field names
could easily be something else):

var x = { _ctor: "Bar", _0: "bleh" }var y = { _ctor: "Baz", _0: x, _1: 7
}

The only problem I see with this is that the serialization of Maybe would
be inconsistent with this.

Right now it's actually impossible to serialize a Sum type. Even if we're
unsure about doing this in general, we definitely need to at least
support
the Either type. Currently I'm doing the following to get around this
limitation

type Either' a b = (Maybe a, Maybe b)

where I ignore the 2nd value if the first one is non-null, but this is
obviously suboptimal.

Reply to this email directly or view it on GitHub<
https://github.com/evancz/Elm/issues/490>
.

Reply to this email directly or view it on GitHubhttps://github.com//issues/490#issuecomment-34283540
.

@evancz
Copy link
Member

evancz commented Feb 6, 2014

@JohnNilsson, that's actually how things were represented early on in the generated code :) I think it was slower, but I don't think that's the point of this API.

switch(foo[0]) {
case "Baz":
  return foo[1] + foo[2];
case "Bar":
}

Any opinions on the looks of this? (with your JS cap on)

@evancz
Copy link
Member

evancz commented Feb 6, 2014

Maybe we should have a more realistic example and do a bunch of gists?

@maxsnew
Copy link
Contributor Author

maxsnew commented Feb 7, 2014

Here is the code for the program I was writing: https://gist.github.com/maxsnew/8855176 .

@johnpmayer
Copy link
Contributor

I'm really just hoping for ADT compatibility with Data.Aeson.TH, which
automatically generates JSON serialization in Haskell. I know this isn't a
priority for everybody, but I feel like it's certainly something to consider,
at least as prior art.

Let's say we have:

data Foo = Bar Int String
x = Bar 5 "baz"

Elm internals currently look like { "ctor" : "Bar", "_0" : 5, "_1" : "baz" }

Data.Aeson.TH has three modes for ADTs.

TaggedObject looks like { "tag" : "Bar", "contents" : [ 5, "baz" ] }

ObjectWithSingleField looks like { "Bar" : [ 5, "baz" ] }

TwoElemArray looks liks [ "Bar", [ 5, "baz" ] ]

On Thu, Feb 6, 2014 at 7:15 PM, Max S. New [email protected] wrote:

Here is the code for the program I was writing:
https://gist.github.com/maxsnew/8855176 .

Reply to this email directly or view it on GitHubhttps://github.com//issues/490#issuecomment-34391435
.

@evancz
Copy link
Member

evancz commented Feb 8, 2014

Cool, @maxsnew that examples helps me :) I like the names "tag" and "contents" a lot! That is much more intuitive for me when I think of them as JS data structures.

I think the ObjectWithSingleField always seems like a cool and great idea until you actually try to use it in JS and find that it's a decent pain in the ass (this is the key problem with the type format in generated documentation).

So I would currently lean towards the TaggedObject style.

@evancz
Copy link
Member

evancz commented Feb 8, 2014

As for the issue of "should Maybe Int represent the set of all integers and null or be serialized to this format?"

It may make sense to have a type alias type Nullable a = Maybe a that has the different behavior. I'm not sure though. I think this is a particular case where it makes sense to have a special case. Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.

@johnpmayer
Copy link
Contributor

Yes I agree that TaggedObject is the nicest pattern. I don't think there
would be any significant performance hit.

I've backtracked on my opinion of Maybe; I think it does make sense to have
something to handle nulls in JS. Maybe is the natural candidate, and that's
how Aeson handles null as well.

On Sat, Feb 8, 2014 at 10:19 AM, Evan Czaplicki [email protected]:

As for the issue of "should Maybe Int represent the set of all integers
and null or be serialized to this format?"

It may make sense to have a type alias type Nullable a = Maybe a that has
the different behavior. I'm not sure though. I think this is a particular
case where it makes sense to have a special case. Ports are about writing
natural and colloquial code in both Elm and JS, not forcing an approach on
JS people.

Reply to this email directly or view it on GitHubhttps://github.com//issues/490#issuecomment-34546381
.

@evancz evancz added this to the Ports milestone Apr 1, 2014
@evancz
Copy link
Member

evancz commented Apr 1, 2014

Okay, so the main goal of this issue is: Get conversion with the TaggedObject style.

The subgoal is: Optimize case expressions to use extra information to generate switches on integers.

I think there is a hack to do this :) When you generate constructors for an ADT, you also generate some helper structures in JS.

data Either a b = Left a | Right b

would generate the normal Left and Right constructor, but also something like this:

var Either = { Left : 0, Right : 1, schemas : [ ... ] };

And from there you use Either.Left and Either.Right to do switches on integers while maintaining readability in the source code. I don't think that's an important issue, but some people do. We just have to hope that inlining integers is faster than comparing strings in switches. You can also use the schemas to convert from JS to Elm.

@jprider63
Copy link

Hi,

I was wondering what the status of this feature is. I made a post on the mailing list that is related to this. I can potentially contribute towards this if it would help accelerate this feature.

By the way, I've benchmarked switching on strings vs. inlining integers, and surprisingly strings seems to do better. I ran this in chrome 35 and strings are twice as fast:

Average string time: 0.0010951002826914192
Average constructor time: 0.002199900755658746

var benchmark = function ( f) {
        var t0 = performance.now(); //new Date().getTime();
        f();
        var tf = performance.now(); //new Date().getTime();

        // console.log( tf);
        // console.log( t0);

        return tf - t0;
}

var match_string = function( str) {
        return ( function() {
                switch ( str) {
                        case "ConstructorA":
                                break;
                        case "ConstructorB":
                                break;
                        case "ConstructorC":
                                break;
                        case "ConstructorD":
                                break;
                        default:
                                break;
                };
        });
}

var ADT = { ConstructorA: 0, ConstructorB: 1, ConstructorC: 2, ConstructorD: 3}
var match_constructor = function( cstor) {
        return ( function() {
                switch ( cstor) {
                        case ADT.ConstructorA:
                                break;
                        case ADT.ConstructorB:
                                break;
                        case ADT.ConstructorC:
                                break;
                        case ADT.ConstructorD:
                                break;
                        default:
                                break;
                }
        });
}

var sum_string = 0.0;
var sum_cstor = 0.0;

var n = 10000;

for ( var i = 0; i < n; i++) {
        var rand = Math.floor(Math.random() * 4);
        var rand_string;
        var rand_cstor;
        switch ( rand) {
                case 0:
                        rand_string = "ConstructorA";
                        rand_cstor = ADT.ConstructorA;
                        break;
                case 1:
                        rand_string = "ConstructorB";
                        rand_cstor = ADT.ConstructorB;
                        break;
                case 2:
                        rand_string = "ConstructorC";
                        rand_cstor = ADT.ConstructorC;
                        break;
                case 3:
                        rand_string = "ConstructorD";
                        rand_cstor = ADT.ConstructorD;
                        break;
        }

        sum_string += benchmark( match_string( rand_string));
        sum_cstor += benchmark( match_constructor( rand_cstor));
}

console.log( "Average string time: " + (sum_string / n));
console.log( "Average constructor time: " + (sum_cstor / n));

@evancz
Copy link
Member

evancz commented Jul 11, 2014

Just talked with @michaelbjames about working on this feature. These benchmarks appear to confirm your results, which is very confusing to me. Perhaps people switch on strings more often? In any case, let's not do that part of this then.

You are welcome to take a look, but keep @michaelbjames up to date on what you find. I think it's a trickier to implement than it sounds.

@deadfoxygrandpa
Copy link
Contributor

I remember making a jsperf test for this int vs string switching a long time ago, and I remember being surprised that strings were faster. It's apparently been this way for quite a while.

@jprider63
Copy link

Ok, I'll try to get in contact with @michaelbjames about working on this.

@thSoft
Copy link
Contributor

thSoft commented Nov 11, 2014

+1 What is the status of this? I would advise first implementing, then optimizing it. :)

@thSoft
Copy link
Contributor

thSoft commented Jan 24, 2015

As of now, a workaround is to pass Json.Values through ports and Json.Decode/Encode them.

@rtfeldman
Copy link
Member

I'm a bit late to the party here, but has anyone considered exposing (and mandating the use of) explicit constructor functions? For example:

Elm.Just(42)
Elm.Nothing()

I see a few benefits to this:

  1. The intermediate representation is no longer exposed, so it can change between versions without breaking existing code.
  2. You can no longer accidentally use this API for dirty hacks; all the exposed constructors only permit doing things "the right way," so you'd have to deliberately avoid using them in order to do things the wrong way.
  3. This looks more like the equivalent Elm code.

Thoughts?

@knuton
Copy link

knuton commented May 29, 2015

I pitched an idea to Evan last week which boils down to registering destructors (in-port) and constructors (out-port) for arbitrary representations of variant types. We agreed that I'd write it up here to get some feedback.

The main motivation is to arrive at a unified approach that subsumes currently port-compatible ADTs while allowing arbitrary data representations outside of Elm. This picks up on @rtfeldman's proposal.

I very much tried to think with my JS cap on: How might I want to deal with this data in a JS codebase? (Hence trying to avoid if (foo[0] === 'Bar') return foo[1];.)

Credo:

Ports are about writing natural and colloquial code in both Elm and JS, not forcing an approach on JS people.

Benefits:

  • No need to fight over representation: Outside representation can be "idiomatic" in various JS styles
  • Conversion of e.g. Maybe fits into this approach naturally
  • Be compatible with Data.Aeson.TH if you want
  • The default behavior can be convenient and simple, e.g. TaggedObject going in and out
  • Conversion and border policing collapse into one process, we incur low overhead

I'll reuse the example from above:

data Foo a = Bar a
           | Baz (Foo a) Int

Let's assume that in my JS I represent those as (assuming Foo String)

{ kind: 'Bar', size: '40sqm' }
{ kind: 'Baz', bar: { ... }, rooms: 5 }

Other encodings abound, e.g. OOP style.

Destructing Outside Representations (in-ports)

port : Signal (Foo String)

The basic idea for conversion at an in-port with type T is to register a function that knows how to transform an incoming value when given access to all of the constructors of T. The trick is to only accept conversion results when they are constructed with our constructors. This allows us to assure that ultimately incoming values are of the right form.

So in this case, for whatever outside representation OuterFoo String of Foo String we need a function

   OuterFoo String
-> (String -> Foo String) -- Bar
-> (Foo String -> Int -> Foo String) -- Baz
-> Foo String

for example suppose the following function is registered for Foo

function fooIn(val, Bar, Baz) {
  switch (val.tag) {
    case 'Bar': return Bar(val.size);
    case 'Baz': return Baz(val.bar, val.rooms);
  }
}

Bar and Baz are not simple constructors, but are guarded -- this is the border policing:

What does Bar do?

  • If val.size is a string, construct a Foo String
  • Otherwise, throw

What does Baz do?

  • Look up the converter for Foo, i.e. fooIn
  • If fooIn(val.bar) returns and val.rooms is a number, construct a Foo String
  • Otherwise, throw

What happens with the return value of fooIn?

  • Check whether it was constructed with either Bar or Baz, throw if not

Note that this construction process is recursive, that is, nested ADTs can be sent through ports.

Constructing Outside Representations (out-ports)

The idea is similar but now the converter is called with a tag (probably a string?) and the data, from which it constructs whichever representation desired.

Converter Registration

A default converter (e.g. from and to TaggedObject) can be used when no converter is registered for a type. Converters can be registered globally for all types and/or per type.

Because ADTs may appear nested inside ADTs or records, we need to do one of two things:

  • Fall back to a default if we don't have a converter for a type
  • Throw if we don't have a converter for a type (this is basically what happens now at compile time)
    • Either when initializing the Elm app (requires type traversal on defined ports)
    • Or on processing incoming/outgoing values

Builtins

The Elm runtime could make use of this approach to register a TaggedObject converter as default, while pre-registering special converters for builtin types such as Maybe, List, etc.


I have more details, but this is too long alreay. (Sorry!) What do you think?

@evancz
Copy link
Member

evancz commented Aug 6, 2015

I am trying to clean up the issues on these repos. My goal is for issues on repos like elm-compiler and elm-repl are about specific bugs. It is not working to manage proposals and stuff that requires changes to the language here, so I am devising a way of keeping track of all this stuff in a transparent way.

The idea is that it doesn't super matter where the proposal is opened, but we should collect related ideas in certain places so we can look at them as a collection of issues and find a coherent solution that addresses them all as much as possible. So I'm gonna close this and reference elm-lang/elm-plans#17 which is collecting a bunch of port problems.

If there is more to say about this idea, we should continue on this thread. The fact that is closed is more about decluttering stuff!

@evancz evancz closed this as completed Aug 6, 2015
jwmerrill added a commit to jwmerrill/elm-todomvc that referenced this issue Jan 31, 2016
This makes it more difficult to send the model through a port for
serialization, so I ripped that feature out. It seems like this won't
be easy until there is a solution to

elm/compiler#490
@thewoolleyman
Copy link

Hello,

What is the status of this? elm-lang/elm-plans#17 is still open with no status, but the repo is "deprecated"...

Thanks,
-- Chad

@dullbananas
Copy link

@evancz what is the progress on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests