Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try speeding up weeder with a bloom filter #186

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

NorfairKing
Copy link
Contributor

I had this idea last night so I wanted to try it out, but it looks like this isn't actually faster:

Test with these changes:                       
weeder-check> real      0m32.576s
weeder-check> user      0m30.698s
weeder-check> sys       0m1.788s
 
Test without these changes:
weeder-check> real      0m31.765s
weeder-check> user      0m30.047s
weeder-check> sys       0m1.633s

Perhaps this could still work with a better hash function or some more tuning with respect to how the bloom filter is constructed.

@ocharles I figured you might still like to see this, even though the experiment seems to have failed.
Feel free to close this PR

Comment on lines +143 to +145
-- TODO maybe we can make this faster by only hashing the location.
instance Hashable Declaration where
hashIO32 d s = hashIO32 (declarationStableName d) s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely not fast, and could be why this approach doesn't work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth trying the Uniquable instance of Module and OccName to produce hashes.

Note [The Unique of an OccName]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
They are efficient, because FastStrings have unique Int# keys.  We assume
this key is less than 2^24, and indeed FastStrings are allocated keys
sequentially starting at 0.

So we can make a Unique using
        mkUnique ns key  :: Unique
where 'ns' is a Char representing the name space.  This in turn makes it
easy to build an OccEnv.
-}

(or maybe even better: the Uniquable instance of the original Names we derive the declarations from with nameToDeclaration)

Comment on lines +186 to +192
-- The elem docs say:
-- @
-- If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.
-- @
-- I.e. if some declaration is a weed, it will definitely show up in the result, but also some weeds will show up in the result.
-- So we need to do another set difference afterwards, but with a much smaller set.
in Set.difference (Set.filter (not . (`BloomFilter.elem` bloom)) allDecls) usedDecls
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about 50% sure I got this backward in some way, so that might be part of the issue as well.

@NorfairKing
Copy link
Contributor Author

Another idea that might help make this work: We're now using both a hash and non-hash-based way of putting declarations in a set. If we could re-use the hash of a declaration then we would only need to hash it once and compare only the hash.

@TeofilC
Copy link
Collaborator

TeofilC commented Dec 21, 2024

It might be even better to avoid hashing altogether and just use Ints as keys and IntMap/IntSet. (Maybe from the Unique's that @ryndubei mentions)

Then you just need to worry about the string-y values at the boundaries.

HashMap needs to do Eq checks on the unhashed values if the hashes are equal. So, you can get some savings if you can avoid that.

@NorfairKing
Copy link
Contributor Author

I had this idea last night so I wanted to try it out, but it looks like this isn't actually faster:

The timings are so similar that I'm a bit suspicious about whether anything is happening at all..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants