Rework data pattern matching to use default cases #5557

dolio · 2025-01-24T20:47:46Z

Previously, we were always expanding data type matching to explicitly match against every case. This was necessary because we would eagerly dump the entire data contents to the stack and analyze the tag from there. Different constructors with different numbers of arguments would result in a different stack layout, so default cases could not be uniform.

However, a while ago we started analyzing data type tags directly, to avoid putting them on the stack. This change goes the final step. It selects the branch before deciding to dump the data type to the stack, and only does so in specific case branches. Nothing is added to the stack in default branches, so they can be shared among all constructors.

On the other end, the pattern matching compiler has been changed to preserve default cases when possible. When splitting on a data type, the actual constructors matched in the source are used as the explicit branches, and a default case is used if it exists in the source. The translation will still sometimes duplicate branches, but not nearly as much as before (and mainly due to complicated pattern matching).
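To make the stack-layout point concrete, here is a toy model of the new dispatch order, a sketch rather than the actual runtime code. Every name in it (`Value`, `Branches`, `dispatch`, `exampleBranches`) is hypothetical and the stack is just a list of integers. The point it illustrates is that the tag is inspected first, fields are pushed only in explicit branches, and the default branch sees the stack unchanged no matter how many fields the constructor has.

```haskell
-- Hypothetical toy model; none of these names come from the Unison runtime.
type Tag = Int
type Stack = [Int]

-- A constructor value: a tag plus its fields.
data Value = Value Tag [Int]

-- Explicit branches expect the fields on the stack; the default does not.
data Branches r = Branches
  { explicit :: [(Tag, Stack -> r)]
  , fallback :: Stack -> r
  }

-- Select the branch from the tag first, then unpack only for explicit hits.
dispatch :: Branches r -> Stack -> Value -> r
dispatch bs stk (Value tag fields) =
  case lookup tag (explicit bs) of
    Just k  -> k (fields ++ stk) -- fields are pushed only in this branch
    Nothing -> fallback bs stk   -- stack untouched, so one default serves every tag

-- One explicit branch for tag 0 (sums its three fields) and a shared default
-- (reports the stack depth).
exampleBranches :: Branches Int
exampleBranches = Branches [(0, sum . take 3)] length

main :: IO ()
main = do
  print (dispatch exampleBranches [10, 20] (Value 0 [1, 2, 3]))     -- 6
  print (dispatch exampleBranches [10, 20] (Value 5 [7]))           -- 2
  print (dispatch exampleBranches [10, 20] (Value 9 [7, 8, 9, 10])) -- 2
```

The last two calls hit the same default with constructors of different arity and get the same answer, which is what eager unpacking could not guarantee.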

Passed base tests, unison tests, unison transcripts and runtime tests.

dolio · 2025-01-24T21:00:05Z

No appreciable benchmark differences as far as I can see. Probably not surprising.

ceedubs

I can confirm that the Unison Cloud integration tests pass with this change.

Given that the benchmarks show no appreciable change, could you shed some light on the motivation for this? Is the thought that it could make a significant difference in some cases of pattern matching? If so, should we benchmark it in those cases?

dolio · 2025-01-24T21:42:37Z

This change only affects situations where you have something like:

match v with
  C1 x y z -> ...
  _ -> ...

where there are several constructors besides C1. Before, it duplicated the default case once per remaining constructor. Now it uses the interpreter's built-in default support, and it can use certain special 1- and 2-case matching constructs, which can be a little faster. I don't think the benchmarks really contain any situations like this, so they wouldn't be expected to change.
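As a rough illustration of the size difference, the sketch below counts compiled branches for a simple, non-nested match under both schemes. It is a back-of-the-envelope model, not the pattern compiler: `Match`, `expandedBranches`, and `sharedBranches` are hypothetical names, and it assumes the old scheme emits exactly one branch per constructor whenever a default is present.

```haskell
-- Hypothetical summary of a source-level match on one data type.
data Match = Match
  { matched    :: [String] -- constructors named explicitly in the source
  , hasDefault :: Bool     -- whether the source has a catch-all case
  }

-- Old scheme (assumed): the default is copied to every unmatched constructor,
-- so a match with a default always has one branch per constructor.
expandedBranches :: [String] -> Match -> Int
expandedBranches allCons m
  | hasDefault m = length allCons
  | otherwise    = length (matched m)

-- New scheme: the explicit constructors plus at most one shared default.
sharedBranches :: Match -> Int
sharedBranches m = length (matched m) + fromEnum (hasDefault m)

-- A 30-constructor type with a single explicit case and a catch-all.
thirty :: [String]
thirty = ["T" ++ show n | n <- [1 .. 30 :: Int]]

t8Match :: Match
t8Match = Match ["T8"] True

main :: IO ()
main = do
  print (expandedBranches thirty t8Match) -- 30
  print (sharedBranches t8Match)          -- 2
```

Under these assumptions a one-constructor-plus-default match on a 30-constructor type drops from 30 branches to 2, which is the shape that qualifies for the small-case matching constructs.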

Chris had an example where it made some difference, but I don't think it's in our tests right now.

pchiusano

😎👍

If nothing else this will reduce code sizes, which matters for distributed execution. Though I suspect we could craft some examples where there’s a decent speedup.

cc @ChrisPenner

aryairani · 2025-01-28T14:51:46Z

Is this one ready to merge, then?

dolio · 2025-01-28T15:56:51Z

Yeah, I think so. Unless we want to wait for Chris to weigh in.

dolio · 2025-01-28T20:34:04Z

Oh. One other thing I should maybe mention. I didn't change the way that request matching works, so that will still expand cases. I guess I could change that, but it actually seems kind of weird to me to have catch-all cases there. I'm not sure people actually write them (although I think it's technically allowed).

Let me know if you think I should change this.

ChrisPenner · 2025-01-28T20:40:24Z

Sorry I've been slacking on reading this notification haha; I'll build and run this on some of the tests I was using on my prototype

ChrisPenner · 2025-01-28T21:08:31Z

Here's the suite I'm running:

type Four
  = One Nat
  | Two Nat
  | Three Nat
  | Four Nat

type Thirty
  = T1
  | T2
  | T3
  | T4
  | T5
  | T6
  | T7
  | T8
  | T9
  | T10
  | T11
  | T12
  | T13
  | T14
  | T15
  | T16
  | T17
  | T18
  | T19
  | T20
  | T21
  | T22
  | T23
  | T24
  | T25
  | T26
  | T27
  | T28
  | T29
  | T30

type Three
  = One Nat
  | Two Nat
  | Three Nat

type Two
  = One Nat
  | Two Nat


t : '{IO, Exception} ()
t = do
  use Nat - <= ==
  printTime
    "Two match" 1 let
      use Two One Two
      go = cases
        One x -> if x == 0 then () else go (Two (x - 1))
        Two x -> if x == 0 then () else go (One (x - 1))
      n -> repeat n do go (One 100000)
  printTime
    "Three match" 1 let
      use Three One Three Two
      go = cases
        One x   -> if x == 0 then () else go (Two (x - 1))
        Two x   -> if x == 0 then () else go (Three (x - 1))
        Three x -> if x == 0 then () else go (One (x - 1))
      n -> repeat n do go (One 100000)
  printTime
    "Four match" 1 let
      use Four One Two
      go = cases
        One x        -> if x == 0 then () else go (Two (x - 1))
        Two x        -> if x == 0 then () else go (One (x - 1))
        Four.Three x -> if x == 0 then () else go (Four (x - 1))
        Four x       -> if x == 0 then () else go (One (x - 1))
      n -> repeat n do go (One 100000)
  printTime
    "Thirty match" 1 let
      go n = cases
        T8 -> if n <= 0 then () else go (n - 1) T7
        _  -> if n <= 0 then () else go (n - 1) T8
      n -> repeat n do go 100000 T8

Here are the results:

trunk -> pattern-compilation
Two match
4.672896ms -> 4.903328ms

Three match
4.948268ms -> 5.099365ms

Four match
4.969759ms -> 5.075553ms

Thirty match
4.927771ms -> 4.487581ms

After several runs it does seem to be consistently worse in the Two match case, which is interesting. It only seems to catch up in the Thirty match case.

ChrisPenner · 2025-01-28T21:14:06Z

unison-runtime/src/Unison/Runtime/Pattern.hs

@@ -13,7 +13,7 @@ module Unison.Runtime.Pattern
 where

 import Control.Monad.State (State, evalState, modify, runState, state)
-import Data.List (transpose)
+import Data.List (nub, transpose)


v should have Ord, right? Is it possible to use nubOrd instead?
If not, no worries.
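For context on the suggestion, here is a small standalone comparison, not code from the PR. `nub` from `Data.List` needs only `Eq` and is quadratic in the list length, while `nubOrd` from `Data.Containers.ListUtils` (in the `containers` package) needs `Ord` and runs in O(n log n). Both keep the first occurrence of each element in its original position. The `vars` list is a made-up stand-in for whatever is being deduplicated in Pattern.hs.

```haskell
import Data.Containers.ListUtils (nubOrd)
import Data.List (nub)

-- Hypothetical input with repeated names.
vars :: [String]
vars = ["x", "y", "x", "z", "y"]

main :: IO ()
main = do
  print (nub vars)    -- ["x","y","z"], using Eq only
  print (nubOrd vars) -- ["x","y","z"], using Ord
```

Since the results agree whenever the `Eq` and `Ord` instances are consistent, swapping one for the other should be a drop-in change if the element type has `Ord`.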

dolio · 2025-01-29T15:02:52Z

Hmm, it's strange that it's slower for those other examples, because those are already completely split.

Oh, except I had to change the way some things work in the machine to accommodate this. I'll check that out.

pchiusano · 2025-01-29T15:28:19Z

Agree that for request handling, it'd be extremely unusual to not handle all the cases, so I wouldn't sweat that now.

dolio · 2025-01-29T23:18:14Z

FYI, seems like there's a random component in the transcripts. The Windows run failed once (two lines were transposed out of alphabetical order somewhere), but succeeded on re-run.

dolio requested review from pchiusano, aryairani, ceedubs and ChrisPenner January 24, 2025 20:47

ceedubs approved these changes Jan 24, 2025

pchiusano approved these changes Jan 25, 2025

ChrisPenner reviewed Jan 28, 2025

dolio added 2 commits January 29, 2025 15:37

Rewrite dataBranch to perform a bit better.

83718f9

Use nubOrd instead of nub

cd81713

dolio merged commit 02e09da into trunk Jan 29, 2025
36 checks passed

dolio deleted the topic/pattern-compilation branch January 29, 2025 23:20
