Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix thread unsafety #63

Merged
merged 4 commits into from
Apr 7, 2024
Merged

Conversation

adienes
Copy link
Contributor

@adienes adienes commented Apr 1, 2024

fixes some potential race conditions involving the use of threadid. see https://julialang.org/blog/2023/07/PSA-dont-use-threadid/ for details

I've chosen to use the library OhMyThreads.jl here, which did not exist at the time the original code here was written, but it's a quite nice + lightweight dependency to do parallel stuff

@adienes adienes marked this pull request as draft April 1, 2024 01:27
@adienes adienes marked this pull request as ready for review April 1, 2024 01:47
@matthieugomez
Copy link
Owner

matthieugomez commented Apr 5, 2024

Thank you so much for your work and for spotting this issue. However, I really do not want to add dependencies (also, OhMyTreads also seems to depend on other dependencies so does not seem that lightweight). It would be great if you could try to fix it without adding this dependency.

@adienes
Copy link
Contributor Author

adienes commented Apr 5, 2024

while IMO it is a worthwhile dependency for any multithreaded application, I can definitely understand the reticence. the use case here is pretty vanilla so it shouldn't be too bad to just stick to Base

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 92.30769% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 97.34%. Comparing base (b384824) to head (b617b3a).

Files Patch % Lines
src/find.jl 92.30% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
- Coverage   97.54%   97.34%   -0.20%     
==========================================
  Files           8        8              
  Lines         571      566       -5     
==========================================
- Hits          557      551       -6     
- Misses         14       15       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adienes
Copy link
Contributor Author

adienes commented Apr 7, 2024

reimplemented using the @spawn pattern described in the blog post

note that I've assumed here that the compare calls dominate the runtime and that collecting over the input is cheap. if that's not the case other optimizations may have to be made. for thorough optimization in multithreading type stability etc., a new dependency may be required (else will end up re-inventing a lot)

@matthieugomez
Copy link
Owner

One issue in your current rewrite is that the min_score does not get updated over the iteration loop. This is actually super important, as a lot of StringDistances can bail early if they realize that compare is never going to give something higher than min_score.

@adienes
Copy link
Contributor Author

adienes commented Apr 7, 2024

gotcha, so I'll have to stick closer to the original process

re: the atomic operations, what is your preference w.r.t. Julia version compatibility? the current code uses Threads.atomic_max!, but it seems these are soft-deprecated in favor of an @atomic based API developed for v1.7 and later

@matthieugomez
Copy link
Owner

matthieugomez commented Apr 7, 2024

I'm fine with doing a v1.7 version. The optimal thing would be to keep having a common min_score across threads (what I have right now), but I think it'd be fine to have one specific to each thread if that makes things much easier.

@adienes
Copy link
Contributor Author

adienes commented Apr 7, 2024

latest commit should allow for short circuiting with a common min_score. I would recommend trying to benchmark in the same context as benchmark/benchmark.jl was written. I did run it locally and "everything is faster" but that's almost certainly incomparable due to differing julia versions, hardware, thread count, etc.

@matthieugomez matthieugomez merged commit b7dd39f into matthieugomez:main Apr 7, 2024
2 checks passed
@matthieugomez
Copy link
Owner

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants