-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework data pattern matching to use default cases #5557
Conversation
Previously, we were always expanding data type matching to explicitly match against every case. This was necessary because we would eagerly dump the entire data contents to the stack and analyze the tag from there. Different constructors with different numbrs of arguments would result in a different stack layout, so default cases could not be uniform. However, a while ago we started analyzing data type tags directly, to avoid putting them on the stack. This change goes the final step. It selects the branch before deciding to dump the data type to the stack, and only does so in specific case branches. Nothing is added to the stack in default branches, so they can be shared among all constructors. On the other end, the pattern matching compiler has been changed to preserve default cases when possible. When splitting on a data type, the actual constructors matched in the source are used as the explicit branches, and a default case is used if it exists in the source. The translation will still sometimes duplicate branches, but not nearly as much as before (and mainly due to complicated pattern matching).
No appreciable benchmark differences as far as I can see. Probably not surprising. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that the Unison Cloud integration tests pass with this change.
Given that the benchmarks show no appreciable change, could you shed some light on the motivation for this? Is the thought that it could make a significant difference in some cases of pattern matching? If so, should we benchmark it in those cases?
This change only affects situations where you have something like:
where are several constructors besides Chris had an example where it made some difference, but I don't think it's in our tests right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😎👍
If nothing else this will reduce code sizes, which matters for distributed execution. Though I suspect we could craft some examples where there’s a decent speedup.
cc @ChrisPenner
Is this one ready to merge, then? |
Yeah, I think so. Unless we want to wait for Chris to weigh in. |
Oh. One other thing I should maybe mention. I didn't change the way that request matching works, so that will still expand cases. I guess I could change that, but it actually seems kind of weird to me to have catch-all cases there. I'm not sure people actually write them (although I think it's technically allowed). Let me know if you think I should change this. |
Sorry I've been slacking on reading this notification haha; I'll build and run this on some of the tests I was using on my prototype |
Here's the suite I'm running:
Here are the results:
After several runs it does seem to be consistently worse in the |
@@ -13,7 +13,7 @@ module Unison.Runtime.Pattern | |||
where | |||
|
|||
import Control.Monad.State (State, evalState, modify, runState, state) | |||
import Data.List (transpose) | |||
import Data.List (nub, transpose) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v
should have Ord right? Is it possible to nubOrd
instead?
If not, no worries.
Hmm, it's strange that it's slower for those other examples, because those are already completely split. Oh, except I had to change the way some things work in the machine to accommodate this. I'll check that out. |
Agree that for request handling, it'd be extremely unusual to not handle all the cases, so I wouldn't sweat that now. |
FYI, seems like there's a random component in the transcripts. The windows run failed once (two lines were transposed out of alphabetical order somewhere), but succeeded on re-run. |
Previously, we were always expanding data type matching to explicitly match against every case. This was necessary because we would eagerly dump the entire data contents to the stack and analyze the tag from there. Different constructors with different numbrs of arguments would result in a different stack layout, so default cases could not be uniform.
However, a while ago we started analyzing data type tags directly, to avoid putting them on the stack. This change goes the final step. It selects the branch before deciding to dump the data type to the stack, and only does so in specific case branches. Nothing is added to the stack in default branches, so they can be shared among all constructors.
On the other end, the pattern matching compiler has been changed to preserve default cases when possible. When splitting on a data type, the actual constructors matched in the source are used as the explicit branches, and a default case is used if it exists in the source. The translation will still sometimes duplicate branches, but not nearly as much as before (and mainly due to complicated pattern matching).
Passed base tests, unison tests, unison transcripts and runtime tests.