Try speeding up weeder with a bloom filter #186
base: master
-- TODO maybe we can make this faster by only hashing the location.
instance Hashable Declaration where
  hashIO32 d s = hashIO32 (declarationStableName d) s
This is definitely not fast, and could be why this approach doesn't work.
Could be worth trying the `Uniquable` instances of `Module` and `OccName` to produce hashes.
Note [The Unique of an OccName]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
They are efficient, because FastStrings have unique Int# keys. We assume
this key is less than 2^24, and indeed FastStrings are allocated keys
sequentially starting at 0.
So we can make a Unique using
mkUnique ns key :: Unique
where 'ns' is a Char representing the name space. This in turn makes it
easy to build an OccEnv.
-}
(or maybe even better: the `Uniquable` instance of the original `Name`s we derive the declarations from with `nameToDeclaration`)
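To make the quoted Note concrete, here is a toy model of the scheme it describes: strings are interned with sequential keys starting at 0, and a key is combined with a namespace character to form one integer. This is only a sketch in plain Haskell; `InternTable`, `intern` and `mkToyUnique` are hypothetical names, and the bit layout is an assumption chosen to match the Note's "less than 2^24" remark, not GHC's actual `mkUnique`.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import qualified Data.Map.Strict as Map

-- Toy intern table: each distinct string gets the next sequential key,
-- mirroring how the Note describes FastString keys (this is NOT GHC's code).
data InternTable = InternTable
  { nextKey :: !Int
  , keys    :: !(Map.Map String Int)
  }

emptyTable :: InternTable
emptyTable = InternTable 0 Map.empty

-- Return the key for a string, allocating a fresh one on first sight.
intern :: String -> InternTable -> (Int, InternTable)
intern s t@(InternTable n m) =
  case Map.lookup s m of
    Just k  -> (k, t)
    Nothing -> (n, InternTable (n + 1) (Map.insert s n m))

-- Hypothetical stand-in for mkUnique: namespace character in the high bits,
-- string key (assumed < 2^24, as in the Note) in the low 24 bits.
mkToyUnique :: Char -> Int -> Int
mkToyUnique ns key = (ord ns `shiftL` 24) .|. key
```

The point for this PR is that such a key is already an `Int`, so using it as the hash input would avoid walking the string on every lookup.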
-- The elem docs say:
-- @
-- If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.
-- @
-- I.e. if some declaration is a weed, it will definitely show up in the result, but also some weeds will show up in the result.
-- So we need to do another set difference afterwards, but with a much smaller set.
in Set.difference (Set.filter (not . (`BloomFilter.elem` bloom)) allDecls) usedDecls
I'm about 50% sure I got this backward in some way, so that might be part of the issue as well.
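One way to check the direction is with a toy filter. The sketch below assumes the filter is built from the used declarations (the excerpt does not show how `bloom` is constructed). Under that assumption, a declaration the filter rejects is definitely unused, while a declaration it accepts is only possibly used, so the exact check has to run on the accepted ones. The `Bloom` type, `hashes`, `member` and `weeds` are hypothetical stand-ins written against `Data.Set` and `Data.Bits`, not the library used in the PR.

```haskell
import Data.Bits (setBit, testBit)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Set as Set

-- Toy Bloom filter: an Integer used as a bit vector of 'size' bits.
data Bloom = Bloom { size :: Int, bits :: Integer }

-- Two cheap, hypothetical string hashes (illustrative, not tuned).
hashes :: Int -> String -> [Int]
hashes m s = [h 31 `mod` m, h 131 `mod` m]
  where h p = foldl' (\acc c -> (acc * p + ord c) `mod` 1000003) 7 s

fromList :: Int -> [String] -> Bloom
fromList m = foldl' add (Bloom m 0)
  where add (Bloom n b) s = Bloom n (foldl' setBit b (hashes n s))

-- Never False for an inserted value; may be True for a value never inserted.
member :: String -> Bloom -> Bool
member s (Bloom n b) = all (testBit b) (hashes n s)

-- Weeds = all declarations minus used ones, with the filter as a fast path:
-- rejected declarations are weeds outright, accepted ones get the exact check.
weeds :: Int -> Set.Set String -> Set.Set String -> Set.Set String
weeds m allDecls usedDecls = Set.union definite (Set.difference maybeUsed usedDecls)
  where
    bloom = fromList m (Set.toList usedDecls)
    (maybeUsed, definite) = Set.partition (`member` bloom) allDecls
```

Read this way, filtering with `not . elem` and then subtracting `usedDecls` would keep only the definite weeds and drop any weed the filter falsely accepts, which may be the inversion suspected above. It also suggests the speed-up depends on most declarations being rejected by the filter; if most declarations are used, nearly everything still goes through the exact set difference.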
Another idea that might help make this work: We're now using both a hash and non-hash-based way of putting declarations in a set. If we could re-use the hash of a declaration then we would only need to hash it once and compare only the hash.
It might be even better to avoid hashing altogether and just use Then you just need to worry about the string-y values at the boundaries.
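The hash-reuse idea could look roughly like the following sketch: a wrapper that computes the hash once at construction and lets `Set` compare on that cached value first. `Hashed`, `mkHashed` and `hashString` are hypothetical names, and the string hash is a placeholder. Unlike the "compare only the hash" suggestion, this version falls back to the payload when two hashes tie, an assumption made here so that collisions cannot merge distinct declarations.

```haskell
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Set as Set

-- A value paired with its hash, computed once at construction time.
data Hashed a = Hashed { cachedHash :: !Int, payload :: a }

-- Placeholder hash; a real implementation would use a proper hash function.
hashString :: String -> Int
hashString = foldl' (\acc c -> (acc * 31 + ord c) `mod` 1000000007) 7

-- Hypothetical smart constructor: 'render' turns the value into hash input.
mkHashed :: (a -> String) -> a -> Hashed a
mkHashed render x = Hashed (hashString (render x)) x

-- Compare hashes first; fall back to the payload only on a hash tie,
-- so that colliding values are still kept apart.
instance Eq a => Eq (Hashed a) where
  a == b = cachedHash a == cachedHash b && payload a == payload b

instance Ord a => Ord (Hashed a) where
  compare a b =
    compare (cachedHash a) (cachedHash b) <> compare (payload a) (payload b)

-- Example: a set keyed by cached hashes.
toHashedSet :: [String] -> Set.Set (Hashed String)
toHashedSet = Set.fromList . map (mkHashed id)
```

With this shape, most set comparisons are a single `Int` comparison, and the same cached value could be fed to the Bloom filter instead of rehashing.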
The timings are so similar that I'm a bit suspicious about whether anything is happening at all.
I had this idea last night so I wanted to try it out, but it looks like this isn't actually faster:
Perhaps this could still work with a better hash function or some more tuning with respect to how the bloom filter is constructed.
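For the tuning side, the standard Bloom filter estimates give a starting point: with m bits, k hash functions and n inserted elements, the false-positive rate is approximately (1 - e^(-kn/m))^k, and for a target rate p the usual choices are m = -n ln p / (ln 2)^2 and k = (m/n) ln 2. A small sketch of those formulas follows; `falsePositiveRate` and `sizingFor` are hypothetical helper names, not part of the library used in this PR.

```haskell
-- Approximate false-positive rate for m bits, k hash functions, n elements.
falsePositiveRate :: Int -> Int -> Int -> Double
falsePositiveRate m k n =
  (1 - exp (negate (fromIntegral (k * n)) / fromIntegral m)) ^ k

-- Suggested (bits, hash count) for n elements at target false-positive rate p.
sizingFor :: Int -> Double -> (Int, Int)
sizingFor n p = (m, k)
  where
    m = ceiling (negate (fromIntegral n) * log p / (log 2 ^ (2 :: Int)))
    k = max 1 (round (fromIntegral m / fromIntegral n * log 2 :: Double))
```

Feeding the number of used declarations into `sizingFor` and comparing the result with the parameters actually used would show whether the filter is undersized (many false positives, so little work is saved) or oversized (construction cost dominates).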
@ocharles I figured you might still like to see this, even though the experiment seems to have failed.
Feel free to close this PR