Force hashing to be done on aligned usizes #35

Closed
krtab wants to merge 1 commit

krtab
Contributor

@krtab krtab commented Mar 26, 2024

Summary

This is a WIP/PoC of doing hashing on aligned usizes. This seems to reduce the size of the generated assembly, and I would like to benchmark it, but I don't know how to run a compiler bench using a modified crate.

Drawbacks of this PR:

  • Adds unsafe code
  • Changes the generated hashes

Explanation

Let's call N the size in bytes of a usize.
Two cases are distinguished. First, if bytes.len() < N, the previous strategy is used: 32 bits are read if available, then 16, then 8.
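For illustration, the short-slice case could look roughly like the sketch below. `short_words` is a hypothetical helper, not code from this PR or from rustc-hash; it only collects the integers that would be fed to the hasher and assumes native-endian reads.

```rust
/// Hypothetical helper (not the crate's API): for a slice shorter than a
/// `usize`, collect the integers that would be added to the hasher by
/// reading 4 bytes if available, then 2 bytes, then 1 byte.
fn short_words(mut bytes: &[u8]) -> Vec<usize> {
    // This path is only meant for inputs shorter than one usize.
    debug_assert!(bytes.len() < std::mem::size_of::<usize>());
    let mut words = Vec::new();
    if bytes.len() >= 4 {
        // Native-endian 32-bit read of the first four bytes.
        let w = u32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        words.push(w as usize);
        bytes = &bytes[4..];
    }
    if bytes.len() >= 2 {
        // Native-endian 16-bit read of the next two bytes.
        let w = u16::from_ne_bytes([bytes[0], bytes[1]]);
        words.push(w as usize);
        bytes = &bytes[2..];
    }
    if let Some(&b) = bytes.first() {
        // At most one byte can remain at this point.
        words.push(b as usize);
    }
    words
}
```

A 3-byte input, for example, yields one 16-bit read followed by one 8-bit read.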

However, if the slice is at least N long, we can:

  1. Extract the first N bytes into a usize (this read may be unaligned),
  2. Extract the last N bytes into a different usize (this read may also be unaligned),
  3. Extract a correctly aligned usize slice with align_to. In real code (i.e. not under Miri), align_to guarantees that this slice is maximal.

These three elements may overlap, but they cover all input bytes. We can now add to the hasher all these usizes, and the compiler will take advantage of the middle slice being well aligned.
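A minimal sketch of this three-part split, assuming native-endian reads. `cover` is a hypothetical name chosen for illustration and is not the PR's actual code; it returns the pieces instead of feeding them to a hasher.

```rust
use std::mem::size_of;

/// Size in bytes of a usize, called N in the explanation above.
const N: usize = size_of::<usize>();

/// Hypothetical helper (not the PR's code): for a slice of at least N bytes,
/// return the first N bytes as a usize, the aligned middle slice, and the
/// last N bytes as a usize. Returns None for shorter slices.
fn cover(bytes: &[u8]) -> Option<(usize, &[usize], usize)> {
    if bytes.len() < N {
        return None;
    }
    let mut buf = [0u8; N];
    // Copying into a local array sidesteps any alignment requirement.
    buf.copy_from_slice(&bytes[..N]);
    let first = usize::from_ne_bytes(buf);
    buf.copy_from_slice(&bytes[bytes.len() - N..]);
    let last = usize::from_ne_bytes(buf);
    // SAFETY: every bit pattern is a valid usize, so reinterpreting
    // initialized bytes as usize values is sound.
    let (_prefix, middle, _suffix) = unsafe { bytes.align_to::<usize>() };
    // When `middle` is maximal, the prefix and suffix are each shorter than
    // N bytes, so `first` and `last` cover them and no input byte is missed.
    Some((first, middle, last))
}
```

Note that where `middle` starts depends on the address of `bytes`, not only on its contents.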

Remarks

I haven't updated the tests so it will inevitably fail CI.

@krtab
Contributor Author

krtab commented Mar 26, 2024

I've found how to do the benchmarking here: rust-lang/rust#59594

@Noratrieb
Member

Yes, make a PR to rust-lang/rust changing the dependency to your branch.

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 26, 2024
[DO NOT MERGE] bench tentative perf improvements in rustc-hash

This is a bench PR for rust-lang/rustc-hash#35

r? Nilstrieb
@krtab
Contributor Author

krtab commented Mar 26, 2024

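I've realized that my algorithm is buggy. Hashing two equal byte slices can lead to different results if their start addresses differ modulo the alignment of usize (64 bits on a 64-bit target), because the aligned middle slice then covers different bytes.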
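The problem can be reproduced with a small sketch. The helper names below are hypothetical and this is not the PR's code: the same payload is placed once at an aligned offset and once at a misaligned offset inside a single buffer, and the words in the aligned middle slice are compared.

```rust
use std::mem::{align_of, size_of};

const N: usize = size_of::<usize>();

/// Returns the words of the aligned middle slice that `align_to` yields.
fn middle_words(bytes: &[u8]) -> Vec<usize> {
    // SAFETY: every bit pattern is a valid usize.
    let (_prefix, middle, _suffix) = unsafe { bytes.align_to::<usize>() };
    middle.to_vec()
}

/// Places the same 2*N-byte payload at an aligned and at a misaligned
/// address, then returns (bytes are equal, aligned middle, shifted middle).
fn same_bytes_different_middles() -> (bool, Vec<usize>, Vec<usize>) {
    let payload: Vec<u8> = (0..(2 * N) as u8).collect();
    let mut buf = vec![0u8; 8 * N];
    // Offset of the first usize-aligned byte in the buffer.
    let aligned = buf.as_ptr().align_offset(align_of::<usize>());
    // A second, non-overlapping position that is off by one byte.
    let shifted = aligned + 3 * N + 1;
    buf[aligned..aligned + 2 * N].copy_from_slice(&payload);
    buf[shifted..shifted + 2 * N].copy_from_slice(&payload);
    let a = &buf[aligned..aligned + 2 * N];
    let b = &buf[shifted..shifted + 2 * N];
    (a == b, middle_words(a), middle_words(b))
}
```

The two slices compare equal as bytes, yet their middle slices differ, so any hash that consumes the middle words directly depends on where the data happens to live in memory.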

@krtab krtab closed this Mar 26, 2024

2 participants