Improve performance when hashing sets of state IDs. #452

silentbicycle · 2023-12-19T18:54:03Z

Add probe count logging to to_set_htab_check.

~~Switch from adding each PHI_64*(id+1) to xor-ing, it has better collision behavior.~~ Switch to xoshiro64* instead.

This change makes

time ./re -rpcre 'a[^bz][^bz][^bz][^b][^bz][^bz][^bz][^bz][^bz][^bz][^b][^bz][^bz][^bz][^bz][^bz]'

go from taking ~14 seconds to ~6 when I run it locally, because adding leads to much higher probe counts on average.
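One way to see why adding collides more often than xor-ing: if the combination is literally a sum of `PHI_64*(id+1)` terms, multiplication distributes over addition modulo 2^64, so the result depends only on the sum of the IDs and the size of the set. The sketch below illustrates this. It is not the code from this PR: the function names are hypothetical, and the value used for `PHI_64` is the usual 64-bit golden-ratio constant, which is assumed rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the common 64-bit golden-ratio constant. */
#define PHI_64 UINT64_C(0x9e3779b97f4a7c15)

/* Hash of a single ID; the +1 keeps ID 0 from hashing to 0. */
static uint64_t
hash_one(unsigned id)
{
	return PHI_64 * ((uint64_t)id + 1);
}

/* Hypothetical "before": combine per-ID hashes by addition.
 * Equivalent to PHI_64 * (sum of ids + n), modulo 2^64. */
static uint64_t
hash_set_add(const unsigned *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	uint64_t h = 0;
	for (size_t i = 0; i < n; i++) {
		h += hash_one(ids[i]);
	}
	return h;
}

/* Hypothetical "after": combine per-ID hashes with xor. */
static uint64_t
hash_set_xor(const unsigned *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	uint64_t h = 0;
	for (size_t i = 0; i < n; i++) {
		h ^= hash_one(ids[i]);
	}
	return h;
}
```

With these definitions, the sets {1, 4} and {2, 3} have the same additive hash, because both reduce to `PHI_64 * 7`, while their xor hashes differ. Both combinations are order-independent, which is what a hash over a set needs.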


silentbicycle · 2023-12-19T18:55:09Z

The bad hash function became noticeable while fuzzing.
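For readers unfamiliar with the term, here is a minimal sketch of what a probe count measures, assuming an open-addressing table with linear probing. The actual table layout behind `to_set_htab_check` is not shown in this thread, and `total_probes` is a hypothetical helper written only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simulate inserting n distinct keys, given only their hashes, into an
 * empty linear-probing table with bucket_count buckets (a power of two,
 * larger than n). Returns the total number of buckets inspected, or 0
 * if allocation fails. */
static size_t
total_probes(const uint64_t *hashes, size_t n, size_t bucket_count)
{
	assert(bucket_count > n);
	assert((bucket_count & (bucket_count - 1)) == 0);

	unsigned char *used = calloc(bucket_count, 1);
	if (used == NULL) {
		return 0;
	}

	const size_t mask = bucket_count - 1;
	size_t probes = 0;
	for (size_t i = 0; i < n; i++) {
		size_t b = (size_t)hashes[i] & mask;
		probes++;               /* the first bucket inspected */
		while (used[b]) {       /* occupied: step to the next bucket */
			b = (b + 1) & mask;
			probes++;
		}
		used[b] = 1;
	}

	free(used);
	return probes;
}
```

Three hashes that all land in the same bucket of an 8-bucket table (0, 8, 16) cost 1 + 2 + 3 = 6 probes in this model, while three hashes that spread out (0, 1, 2) cost 3. Logging a count like this makes a poorly distributed hash visible.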

katef · 2024-01-03T17:37:01Z

Trie of 40k 100-character random words, on my desktop:

before:

; ./words.sh 40000 100 | guff -b
    x: [0 - 19]    y: [0 - 3250]
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⢀⠀⠂⠀⠄
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⢀⠀⠠⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⢀⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⢀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣇⣀⣂⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀

after:

; ./words.sh 40000 100 | grep -v 6... | grep -v ..... | guff -b
    x: [0 - 17]    y: [0 - 3417]
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠁⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠂⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠄⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣇⣀⣄⣀⣁⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀

(I grepped out a couple of outliers; the random seed for blab occasionally produces extreme situations.)

So the local details differ slightly in this case, but the overall behaviour follows the same curve.

Those outliers look like this:

; ./words.sh 40000 100 | guff -b
    x: [0 - 19]    y: [0 - 12048]
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⢀⠀⠀⠀⠠⠀⠂⠀⠂
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠠⠀⠠⠀⠐⠀⠁⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣇⣀⣀⣠⣀⣐⣀⣐⣀⣁⣀⣁⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀

I've seen these before, but I think we should check that this hashing isn't somehow introducing new, especially bad cases.

katef · 2024-01-03T19:39:50Z

Scott confirmed this is uniformly better

silentbicycle · 2024-01-03T20:12:43Z

I was confirming that changing to xorshift* was uniformly better, but was just about to push when this got merged, so that's in a new PR.
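As background on xorshift*, here is a minimal sketch of using one xorshift64* step as a hash for a single ID. It assumes the widely published shift triple (12, 25, 27) and multiplier 2685821657736338717; which variant and constants the follow-up PR uses is not stated in this thread. The names `xorshift64star`, `hash_id_sketch` and `hash_set_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One xorshift64* step: three xor-shifts, then an odd multiplier.
 * Each stage is invertible, so distinct inputs give distinct outputs.
 * Note that 0 maps to 0. */
static uint64_t
xorshift64star(uint64_t x)
{
	x ^= x >> 12;
	x ^= x << 25;
	x ^= x >> 27;
	return x * UINT64_C(2685821657736338717);
}

/* Hash one ID; the +1 avoids feeding 0 into the mixer. */
static uint64_t
hash_id_sketch(unsigned id)
{
	return xorshift64star((uint64_t)id + 1);
}

/* Order-independent hash of a set of IDs: xor of the per-ID hashes. */
static uint64_t
hash_set_sketch(const unsigned *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	uint64_t h = 0;
	for (size_t i = 0; i < n; i++) {
		h ^= hash_id_sketch(ids[i]);
	}
	return h;
}
```

Unlike a bare multiply by a constant, the shifts mix high and low bits of the ID before the multiplication, so the per-ID hashes are no longer simple multiples of one another.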

silentbicycle added the enhancement label Dec 19, 2023

silentbicycle requested a review from katef December 19, 2023 18:54

katef merged commit 7dd550a into main Jan 3, 2024
322 checks passed

katef deleted the sv/improve-hash-performance-in-determinisation branch January 3, 2024 19:39

silentbicycle mentioned this pull request Jan 3, 2024

hash_id: Change to xorshift*. #455

Merged
