-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid table scans on registry #1330
Conversation
Hey @bryanjos can you post a It'd be good to see some specific examples to demonstrate this. |
@benwilson512 I'll try to put that together. A teammate made this change so I might not have the full picture. I think the issue stems from There's a not at the bottom of the docs for select that I think summarizes the issue |
@benwilson512 I put together a benchee script and recorded the output in this gist. The proposed change is definitely faster, but does use more memory |
in practice, i believe it'd be a neat PR into the erlang codebase to (compile-time?) warn that the match spec for what it's worth, this was a service killer for us, it brought down entire nodes, whatever is going on under the covers with |
hey @koudelka sorry to hear that it caused such issues. I'm happy to move forward with the PR. There is still a linear part to this PR, because have to call I have a few "code golf" items I'll remark on and then we can head towards a merge. |
Map.put_new_lazy(acc, doc_id, fn -> | ||
Map.update!(doc, :initial_phases, &PipelineSerializer.unpack/1) | ||
end) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure here where you first uniq the doc_ids
and then reduce into a map with Map.put_new_lazy
seems redundant to me. The Mapset already guarantees that each doc_id
will only exist once, so Map.put_new_lazy
will never be lazy. Each doc_id will only happen once.
Beyond that, in my experiments with the MapSet
and Map.new
code back in the day it was generally faster to make a list of keys and values and then call Map.new
on those keys / values. So perhaps can you benchmark it with using a list as the accumulator instead of the map and then pipe it to map.new?
@benwilson512 Looks like the made a small distance in the right direction. I'll update the PR to use a list as the accumulator and pipe into
|
I am very puzzled at the 5x difference in memory performance between what we have today and the proposed changes. Let me see if I can poke at this and figure out what the cause is. |
I'm thinking it's probably the use of the data structures and size of the examples. There probably is something that could tweak that could be made to balance out space and time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and looks great, thanks!
Related to #1329, we noticed that looking in the registry would take a long time, making the unbounded concurrency issue worse as processes would wait for the lookups here to finish. This removes the table scan and replaces it with a MapSet instead.