Rework data pattern matching to use default cases #5557

Merged: 3 commits merged into trunk from topic/pattern-compilation on Jan 29, 2025

@dolio dolio commented Jan 24, 2025

Previously, we always expanded data type matching to explicitly match against every case. This was necessary because we would eagerly dump the entire data contents to the stack and analyze the tag from there. Different constructors with different numbers of arguments would result in a different stack layout, so default cases could not be uniform.

However, a while ago we started analyzing data type tags directly, to avoid putting them on the stack. This change takes the final step. It selects the branch before deciding to dump the data type to the stack, and only does so in specific case branches. Nothing is added to the stack in default branches, so they can be shared among all constructors.
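
To make the stack-layout argument concrete, here is a toy Python model. It is not the Unison interpreter: `eager_match`, `lazy_match`, and the list-as-stack representation are invented for illustration, and each branch simply reports the stack depth it observes.

```python
# Toy model, not the Unison interpreter: a value is a (tag, fields) pair and
# the "stack" is a plain Python list. All names here are hypothetical.

def eager_match(stack, value, branches, default):
    """Old scheme: push every field first, then choose a branch by tag."""
    tag, fields = value
    stack = stack + list(fields)  # layout now depends on the constructor's arity
    return branches.get(tag, default)(stack)


def lazy_match(stack, value, branches, default):
    """New scheme: choose by tag first; only explicit branches unpack fields."""
    tag, fields = value
    if tag in branches:
        return branches[tag](stack + list(fields))
    return default(stack)  # nothing pushed, so every constructor sees the same layout


# Only C1 is matched explicitly; C2 and C3 fall through to the default branch.
branches = {"C1": len}
others = [("C2", ()), ("C3", (7,))]
base = ["caller frame"]

eager_depths = {tag: eager_match(base, (tag, fs), branches, len) for tag, fs in others}
lazy_depths = {tag: lazy_match(base, (tag, fs), branches, len) for tag, fs in others}

print(eager_depths)  # {'C2': 1, 'C3': 2}
print(lazy_depths)   # {'C2': 1, 'C3': 1}
```

In the eager version the default branch sees a different depth for each constructor, so one compiled default body cannot serve all of them. In the lazy version the depth is the same, which is what allows sharing.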

On the other end, the pattern matching compiler has been changed to preserve default cases when possible. When splitting on a data type, the actual constructors matched in the source are used as the explicit branches, and a default case is used if it exists in the source. The translation will still sometimes duplicate branches, but not nearly as much as before (and mainly due to complicated pattern matching).
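
As a rough illustration of the splitting step, the sketch below handles only flat, single-level patterns. The real compiler also deals with nested patterns, and none of these function names come from the Unison codebase; they are assumptions made for the example.

```python
# Hypothetical sketch of splitting a flat match on one data type.
# `cases` is an ordered list of (constructor_name, body); "_" marks a catch-all.

def split_cases(cases):
    """Collect the constructors matched in the source, plus the default if any."""
    explicit, default = {}, None
    for ctor, body in cases:
        if ctor == "_":
            default = body
            break  # cases after a catch-all are unreachable
        explicit.setdefault(ctor, body)  # first matching case wins
    return explicit, default


def expand_every_constructor(constructors, cases):
    """Earlier translation: one branch per constructor, default body copied."""
    explicit, default = split_cases(cases)
    branches = {}
    for ctor in constructors:
        if ctor in explicit:
            branches[ctor] = explicit[ctor]
        elif default is not None:
            branches[ctor] = default  # duplicated once per uncovered constructor
    return branches, None


def preserve_default(constructors, cases):
    """Translation described here: explicit branches plus one shared default.

    `constructors` is unused; it is kept so both functions share a signature.
    """
    return split_cases(cases)


constructors = ["C1", "C2", "C3", "C4"]
cases = [("C1", "body1"), ("_", "fallback")]

print(expand_every_constructor(constructors, cases))
print(preserve_default(constructors, cases))
```

The first call yields four branches, three of them copies of `fallback`. The second yields one explicit branch and a single default.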

Passed base tests, unison tests, unison transcripts and runtime tests.

dolio commented Jan 24, 2025

No appreciable benchmark differences as far as I can see. Probably not surprising.

@ceedubs ceedubs left a comment

I can confirm that the Unison Cloud integration tests pass with this change.

Given that the benchmarks show no appreciable change, could you shed some light on the motivation for this? Is the thought that it could make a significant difference in some cases of pattern matching? If so, should we benchmark it in those cases?

dolio commented Jan 24, 2025

This change only affects situations where you have something like:

match v with
  C1 x y z -> ...
  _ -> ...

where there are several constructors besides C1. Before, it duplicated the default case many times; now it uses the interpreter's built-in default support, and it can use certain special 1- and 2-case matching constructs, which can be a little faster. I don't think the benchmarks really contain any situations like this, so they wouldn't be expected to change.
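
A back-of-the-envelope count of branch bodies for a match of this shape, as a Python sketch. `emitted_bodies` is a made-up helper, and it assumes a flat match where every explicit case names a distinct constructor.

```python
def emitted_bodies(num_constructors, explicit_in_source, has_default, share_default):
    """Count the branch bodies emitted for one flat match (illustrative only)."""
    if not has_default:
        return explicit_in_source
    if share_default:
        return explicit_in_source + 1  # explicit branches plus one shared default
    return num_constructors  # every uncovered constructor gets its own copy


for n in (2, 4, 30):
    before = emitted_bodies(n, 1, True, share_default=False)
    after = emitted_bodies(n, 1, True, share_default=True)
    print(f"{n} constructors: {before} bodies before, {after} after")
```

With one explicit case and a catch-all, the count grows with the number of constructors when the default is copied, and stays at 2 when it is shared. For a two-constructor type the two schemes emit the same number of bodies.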

Chris had an example where it made some difference, but I don't think it's in our tests right now.

@pchiusano pchiusano left a comment

😎👍

If nothing else this will reduce code sizes, which matters for distributed execution. Though I suspect we could craft some examples where there’s a decent speedup.

cc @ChrisPenner

@aryairani

Is this one ready to merge, then?

dolio commented Jan 28, 2025

Yeah, I think so. Unless we want to wait for Chris to weigh in.

dolio commented Jan 28, 2025

Oh. One other thing I should maybe mention. I didn't change the way that request matching works, so that will still expand cases. I guess I could change that, but it actually seems kind of weird to me to have catch-all cases there. I'm not sure people actually write them (although I think it's technically allowed).

Let me know if you think I should change this.

@ChrisPenner

Sorry I've been slacking on reading this notification haha; I'll build and run this on some of the tests I was using on my prototype

ChrisPenner commented Jan 28, 2025

Here's the suite I'm running:

type Four
  = One Nat
  | Two Nat
  | Three Nat
  | Four Nat

type Thirty
  = T1
  | T2
  | T3
  | T4
  | T5
  | T6
  | T7
  | T8
  | T9
  | T10
  | T11
  | T12
  | T13
  | T14
  | T15
  | T16
  | T17
  | T18
  | T19
  | T20
  | T21
  | T22
  | T23
  | T24
  | T25
  | T26
  | T27
  | T28
  | T29
  | T30

type Three
  = One Nat
  | Two Nat
  | Three Nat

type Two
  = One Nat
  | Two Nat


t : '{IO, Exception} ()
t = do
  use Nat - <= ==
  printTime
    "Two match" 1 let
      use Two One Two
      go = cases
        One x -> if x == 0 then () else go (Two (x - 1))
        Two x -> if x == 0 then () else go (One (x - 1))
      n -> repeat n do go (One 100000)
  printTime
    "Three match" 1 let
      use Three One Three Two
      go = cases
        One x   -> if x == 0 then () else go (Two (x - 1))
        Two x   -> if x == 0 then () else go (Three (x - 1))
        Three x -> if x == 0 then () else go (One (x - 1))
      n -> repeat n do go (One 100000)
  printTime
    "Four match" 1 let
      use Four One Two
      go = cases
        One x        -> if x == 0 then () else go (Two (x - 1))
        Two x        -> if x == 0 then () else go (One (x - 1))
        Four.Three x -> if x == 0 then () else go (Four (x - 1))
        Four x       -> if x == 0 then () else go (One (x - 1))
      n -> repeat n do go (One 100000)
  printTime
    "Thirty match" 1 let
      go n = cases
        T8 -> if n <= 0 then () else go (n - 1) T7
        _  -> if n <= 0 then () else go (n - 1) T8
      n -> repeat n do go 100000 T8

Here are the results:

trunk -> pattern-compilation
Two match
4.672896ms -> 4.903328ms

Three match
4.948268ms -> 5.099365ms

Four match
4.969759ms -> 5.075553ms

Thirty match
4.927771ms -> 4.487581ms

After several runs it does seem to be consistently worse in the Two match case, which is interesting. It only seems to catch up in the Thirty match case.
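
For scale, the relative differences can be computed from the figures above. This is only arithmetic on the single reported timings, not a new measurement, and `percent_change` is a helper defined here for the purpose.

```python
# Timings copied from the results above, in milliseconds:
# (trunk, pattern-compilation).
timings_ms = {
    "Two match": (4.672896, 4.903328),
    "Three match": (4.948268, 5.099365),
    "Four match": (4.969759, 5.075553),
    "Thirty match": (4.927771, 4.487581),
}


def percent_change(before, after):
    """Relative change in percent; positive means slower than trunk."""
    return (after - before) / before * 100.0


changes = {name: percent_change(b, a) for name, (b, a) in timings_ms.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

That works out to roughly 5%, 3%, and 2% slower for the Two, Three, and Four cases, and roughly 9% faster for Thirty.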

@@ -13,7 +13,7 @@ module Unison.Runtime.Pattern
 where

 import Control.Monad.State (State, evalState, modify, runState, state)
-import Data.List (transpose)
+import Data.List (nub, transpose)

v should have an Ord instance, right? Is it possible to use nubOrd instead?
If not, no worries.

dolio commented Jan 29, 2025

Hmm, it's strange that it's slower for those other examples, because those are already completely split.

Oh, except I had to change the way some things work in the machine to accommodate this. I'll check that out.

@pchiusano

Agree that for request handling, it'd be extremely unusual to not handle all the cases, so I wouldn't sweat that now.

dolio commented Jan 29, 2025

FYI, it seems like there's a random component in the transcripts. The Windows run failed once (two lines were transposed out of alphabetical order somewhere), but succeeded on re-run.

@dolio dolio merged commit 02e09da into trunk Jan 29, 2025
36 checks passed
@dolio dolio deleted the topic/pattern-compilation branch January 29, 2025 23:20