Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash_id: Change to xorshift*. #455

Merged

Conversation

silentbicycle
Copy link
Collaborator

@silentbicycle silentbicycle commented Jan 3, 2024

This was supposed to go in #452 -- it's what kate's "Scott confirmed this is uniformly better" was referring to -- but that got merged just before I pushed.

katef's testing with words.sh found some suspicious timing, profiling with callgrind showed there's still some kind of bad collision behavior doing PHI64(a) ^ PHI64(b) with exactly two IDs. It's probably still a bad idea to combine multiple Fibonacci hashes, even with xor-ing rather than adding.

Changing to xorshift* (another fast, high quality hash function for 64-bit ints) immediately makes the issue go away, so do that.

katef's testing with words.sh found some suspicious timing, profiling
with callgrind showed there's still some kind of bad collision behavior
doing PHI64(a) ^ PHI64(b) with exactly two IDs. It's probably still a
bad idea to combine multiple Fibonacci hashes, even with xor-ing rather
than adding.

Changing to xorshift* (another fast, high quality hash function for
64-bit ints) immediately makes the issue go away, so do that.
@silentbicycle silentbicycle requested a review from katef January 3, 2024 20:09
@katef
Copy link
Owner

katef commented Jan 3, 2024

Current word.sh hits less sticky-out special cases (although they're still visible in the same spots, due to the text blab generates for the same seed):

; ./words.sh 40000 100 | guff -b
    x: [0 - 19]    y: [0 - 3435]
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⡀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⠁⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠠⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⢀⠀⠠⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⢀⠀⠄⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣇⣀⣂⣈⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀

Noting that the total time for this trie is about the same as we were prior to #452 (see the same benchmark commented on that PR). So relative to main prior to #452, changing the hash improves things for the situation Scott mentions, but doesn't make anything worse elsewhere.

@katef katef merged commit 84d7da4 into main Jan 3, 2024
322 checks passed
@katef katef deleted the sv/improve-hash-performance-in-determinisation--continued branch January 3, 2024 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants