perf: Use bit-reversed CRC32 computation with hash/crc32 package #43
Merged
Improved BZIP2 CRC32 calculation by using bit-reversed values with the standard hash/crc32 package. This allows us to leverage hardware CRC32 instructions:

- ARM64: 10-12% speedup using RBIT instruction for bit reversal
- AMD64: ~7% speedup using lookup table for bit reversal

Test data shows consistent improvements on large inputs (>1GB).

Benchmark results (ARM64):

Before:

    goos: darwin
    goarch: arm64
    pkg: github.com/cosnicolaou/pbzip2/internal/bzip2
    cpu: Apple M2 Max
    BenchmarkDecodeDigits-12     330       3624395 ns/op   27.59 MB/s     3612915 B/op       51 allocs/op
    BenchmarkDecodeDigits-12     330       3644010 ns/op   27.44 MB/s     3612915 B/op       51 allocs/op
    BenchmarkDecodeDigits-12     327       3732479 ns/op   26.79 MB/s     3612947 B/op       51 allocs/op
    BenchmarkDecodeNewton-12      76      14911637 ns/op   38.04 MB/s     3630765 B/op       51 allocs/op
    BenchmarkDecodeNewton-12      78      14745067 ns/op   38.47 MB/s     3630768 B/op       51 allocs/op
    BenchmarkDecodeNewton-12      80      14724860 ns/op   38.52 MB/s     3630758 B/op       51 allocs/op
    BenchmarkDecodeRand-12       966       1359254 ns/op   12.05 MB/s     3644075 B/op       51 allocs/op
    BenchmarkDecodeRand-12       969       1265783 ns/op   12.94 MB/s     3644089 B/op       51 allocs/op
    BenchmarkDecodeRand-12       960       1255644 ns/op   13.05 MB/s     3644082 B/op       51 allocs/op
    BenchmarkWiktionary-12         1  205158310917 ns/op   51.18 MB/s   367216776 B/op   542483 allocs/op
    BenchmarkWiktionary-12         1  207003968667 ns/op   50.72 MB/s   367216776 B/op   542483 allocs/op
    BenchmarkWiktionary-12         1  208614686041 ns/op   50.33 MB/s   367216760 B/op   542483 allocs/op

After:

    goos: darwin
    goarch: arm64
    pkg: github.com/cosnicolaou/pbzip2/internal/bzip2
    cpu: Apple M2 Max
    BenchmarkDecodeDigits-12     348       3410393 ns/op   29.32 MB/s     3613298 B/op       51 allocs/op
    BenchmarkDecodeDigits-12     351       3401614 ns/op   29.40 MB/s     3613299 B/op       51 allocs/op
    BenchmarkDecodeDigits-12     351       3397780 ns/op   29.43 MB/s     3613294 B/op       51 allocs/op
    BenchmarkDecodeNewton-12      88      13143153 ns/op   43.16 MB/s     3631144 B/op       51 allocs/op
    BenchmarkDecodeNewton-12      90      13220578 ns/op   42.90 MB/s     3631145 B/op       51 allocs/op
    BenchmarkDecodeNewton-12      86      13253067 ns/op   42.80 MB/s     3631149 B/op       51 allocs/op
    BenchmarkDecodeRand-12      1011       1212203 ns/op   13.52 MB/s     3644467 B/op       51 allocs/op
    BenchmarkDecodeRand-12      1005       1216967 ns/op   13.46 MB/s     3644460 B/op       51 allocs/op
    BenchmarkDecodeRand-12       979       1227642 ns/op   13.35 MB/s     3644458 B/op       51 allocs/op
    BenchmarkWiktionary-12         1  182874540791 ns/op   57.41 MB/s   367217160 B/op   542483 allocs/op
    BenchmarkWiktionary-12         1  183373722875 ns/op   57.26 MB/s   367217160 B/op   542483 allocs/op
    BenchmarkWiktionary-12         1  182450789709 ns/op   57.54 MB/s   367217160 B/op   542483 allocs/op

Benchmark results (AMD64):

Before:

    goos: linux
    goarch: amd64
    pkg: github.com/cosnicolaou/pbzip2/internal/bzip2
    cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
    BenchmarkDecodeDigits-8      225       5091263 ns/op   19.64 MB/s     3612579 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      213       5065560 ns/op   19.74 MB/s     3612580 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      236       5303314 ns/op   18.86 MB/s     3612582 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      225       5297896 ns/op   18.88 MB/s     3612579 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      226       5209545 ns/op   19.20 MB/s     3612580 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      192       5342964 ns/op   18.72 MB/s     3612583 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       57      19633889 ns/op   28.89 MB/s     3630421 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       58      19672671 ns/op   28.83 MB/s     3630428 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       61      19758570 ns/op   28.71 MB/s     3630423 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       61      19739598 ns/op   28.73 MB/s     3630422 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       61      19763141 ns/op   28.70 MB/s     3630418 B/op       50 allocs/op
    BenchmarkDecodeNewton-8       62      19736282 ns/op   28.74 MB/s     3630428 B/op       51 allocs/op
    BenchmarkDecodeRand-8        660       2487447 ns/op    6.59 MB/s     3643743 B/op       51 allocs/op
    BenchmarkDecodeRand-8        519       2424910 ns/op    6.76 MB/s     3643740 B/op       51 allocs/op
    BenchmarkDecodeRand-8        681       2246711 ns/op    7.29 MB/s     3643742 B/op       51 allocs/op
    BenchmarkDecodeRand-8        519       2818677 ns/op    5.81 MB/s     3643742 B/op       51 allocs/op
    BenchmarkDecodeRand-8        480       3195923 ns/op    5.13 MB/s     3643744 B/op       51 allocs/op
    BenchmarkDecodeRand-8        426       2545397 ns/op    6.44 MB/s     3643741 B/op       51 allocs/op
    BenchmarkWiktionary-8          1  255098611022 ns/op   41.16 MB/s   367215496 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  250397169327 ns/op   41.93 MB/s   367215480 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  238734724759 ns/op   43.98 MB/s   367215464 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  244597109758 ns/op   42.92 MB/s   367215480 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  252179664415 ns/op   41.63 MB/s   367215464 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  257409744381 ns/op   40.79 MB/s   367215480 B/op   542483 allocs/op

After:

    goos: linux
    goarch: amd64
    pkg: github.com/cosnicolaou/pbzip2/internal/bzip2
    cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
    BenchmarkDecodeDigits-8      238       5338924 ns/op   18.73 MB/s     3612966 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      192       5548144 ns/op   18.02 MB/s     3612965 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      231       5160329 ns/op   19.38 MB/s     3612964 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      234       5213871 ns/op   19.18 MB/s     3612967 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      237       5183392 ns/op   19.29 MB/s     3612964 B/op       51 allocs/op
    BenchmarkDecodeDigits-8      234       5216613 ns/op   19.17 MB/s     3612966 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       66      18813414 ns/op   30.15 MB/s     3630812 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       64      18738811 ns/op   30.27 MB/s     3630815 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       64      18774511 ns/op   30.21 MB/s     3630812 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       64      18803172 ns/op   30.17 MB/s     3630817 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       64      18793181 ns/op   30.18 MB/s     3630815 B/op       51 allocs/op
    BenchmarkDecodeNewton-8       64      18779499 ns/op   30.20 MB/s     3630812 B/op       51 allocs/op
    BenchmarkDecodeRand-8        710       2445520 ns/op    6.70 MB/s     3644127 B/op       51 allocs/op
    BenchmarkDecodeRand-8        573       2337578 ns/op    7.01 MB/s     3644126 B/op       51 allocs/op
    BenchmarkDecodeRand-8        544       2681357 ns/op    6.11 MB/s     3644127 B/op       51 allocs/op
    BenchmarkDecodeRand-8        354       2838394 ns/op    5.77 MB/s     3644129 B/op       51 allocs/op
    BenchmarkDecodeRand-8        439       2403619 ns/op    6.82 MB/s     3644126 B/op       51 allocs/op
    BenchmarkDecodeRand-8        687       2569978 ns/op    6.38 MB/s     3644126 B/op       51 allocs/op
    BenchmarkWiktionary-8          1  243108260459 ns/op   43.19 MB/s   367215880 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  233205853611 ns/op   45.02 MB/s   367215864 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  225249072544 ns/op   46.61 MB/s   367215864 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  229793144010 ns/op   45.69 MB/s   367215864 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  233835578682 ns/op   44.90 MB/s   367215848 B/op   542483 allocs/op
    BenchmarkWiktionary-8          1  241809950266 ns/op   43.42 MB/s   367215864 B/op   542483 allocs/op
klauspost
approved these changes
Oct 30, 2024
lgtm. Lint stuff is fixed in #42
I resolved conflicts.