Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use actual thread local queues instead of using a RwLock #93

Merged
merged 22 commits into from
Feb 22, 2024

Conversation

james7132
Copy link
Contributor

@james7132 james7132 commented Feb 14, 2024

Currently, runner local queues rely on a RwLock<Vec<Arc<ConcurrentQueue>>>> to store the queues instead of using actual thread-local storage.

This adds thread_local as a dependency, but this should allow the executor to work steal without needing to hold a lock, as well as allow tasks to schedule onto the local queue directly, where possible, instead of always relying on the global injector queue.

Fixes #62.

@james7132
Copy link
Contributor Author

james7132 commented Feb 14, 2024

Whoa, that miri failure is a big red flag. Not sure how that's happening. Looks like a soundness bug with ThreadLocal::iter. Closing this out for now.

EDIT: This happens regardless of whether we're chaining the iterators or not, iterating over elements always seems to trigger Miri.

@james7132 james7132 closed this Feb 14, 2024
@james7132
Copy link
Contributor Author

Filed an issue regarding the UB: Amanieu/thread_local-rs#70

@james7132 james7132 reopened this Feb 16, 2024
@james7132
Copy link
Contributor Author

The miri issue seems to be fixed with Amanieu/thread_local-rs#72, so this is likely going to be blocked on that being merged and released.

Copy link
Member

@notgull notgull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main issue here is that thread_local's MSRV might exceed ours at some point. I've considered this in the past. However, the maintainer of thread_local has stated that the MSRV for thread-local might exceed Rust v1.63 in the future.

The issue is that, if thread-local's MSRV is bumped, we would be forced to depend on an older version of thread-local. For our dependents with higher MSRVs this would cause duplicate dependencies in the tree.

In the past we'd encountered this issue with once_cell, see smol-rs/async-io#93. This is why I avoid the use of once_cell throughout smol, instead opting to prefer async_lock::OnceCell.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@notgull notgull marked this pull request as draft February 17, 2024 20:25
src/lib.rs Outdated Show resolved Hide resolved
@james7132
Copy link
Contributor Author

The issue is that, if thread-local's MSRV is bumped, we would be forced to depend on an older version of thread-local. For our dependents with higher MSRVs this would cause duplicate dependencies in the tree.

Is there a location where the MSRV policy for crates under this organization is documented? All things considered, I think the update frequency for thread_local is so low that this should be pretty low risk. That said, the once_cell dependency also is under a similar situation.

@notgull
Copy link
Member

notgull commented Feb 17, 2024

Is there a location where the MSRV policy for crates under this organization is documented? All things considered, I think the update frequency for thread_local is so low that this should be pretty low risk. That said, the once_cell dependency also is under a similar situation.

The MSRV policy is here. I should really add it explicitly to all crates.

Even if the risk is low it's not zero. It was a headache the first time and I'd like to avoid a repeat if I can.

@notgull
Copy link
Member

notgull commented Feb 18, 2024

I'm fine with merging it for now; from my perspective it doesn't look like thread-local will be bumping MSRV anytime soon. Not to mention with async traits coming out in the near future I expect a futures breaking change, which will translate into a breaking change in smol alongside a liberal MSRV bump.

@james7132
Copy link
Contributor Author

Did a quick benchmark comparison after all of the aforementioned changes, not sure what to make of these results:

executor::create        time:   [1.2670 µs 1.2680 µs 1.2693 µs]
                        change: [+10.796% +10.967% +11.086%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

single_thread/executor::spawn_one
                        time:   [1.6679 µs 1.6746 µs 1.6822 µs]
                        change: [+0.2716% +1.8302% +3.6198%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
single_thread/executor::spawn_many_local
                        time:   [6.0765 ms 6.1225 ms 6.1719 ms]
                        change: [+1.7673% +2.7052% +3.7431%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
single_thread/executor::spawn_recursively
                        time:   [35.880 ms 36.350 ms 36.840 ms]
                        change: [-3.4080% -1.6301% +0.2145%] (p = 0.09 > 0.05)
                        No change in performance detected.
single_thread/executor::yield_now
                        time:   [5.2446 ms 5.2552 ms 5.2670 ms]
                        change: [-10.483% -10.256% -10.034%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

multi_thread/executor::spawn_one
                        time:   [1.4244 µs 1.4298 µs 1.4361 µs]
                        change: [-1.6323% -0.6865% +0.2260%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
multi_thread/executor::spawn_many_local
                        time:   [24.883 ms 24.969 ms 25.057 ms]
                        change: [+3.4173% +3.8879% +4.4088%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.5s, or reduce sample count to 30.
multi_thread/executor::spawn_recursively
                        time:   [162.44 ms 162.78 ms 163.19 ms]
                        change: [-4.6241% -4.2275% -3.8289%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Benchmarking multi_thread/executor::yield_now: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
multi_thread/executor::yield_now
                        time:   [1.8351 ms 1.8451 ms 1.8580 ms]
                        change: [-92.286% -92.235% -92.165%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

The impact to create I think can be chalked up to runtime environment since we've changed nothing about the initialization of the executor, and the mulitthreaded/executor::yield_now results are very suspicious.

@james7132
Copy link
Contributor Author

Cross checking the benchmark results for #37, it seems like the results are to be expected.

src/lib.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
@james7132 james7132 marked this pull request as ready for review February 20, 2024 23:58
@james7132 james7132 requested a review from notgull February 20, 2024 23:58
Copy link
Member

@notgull notgull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me

Cargo.toml Outdated Show resolved Hide resolved

// Pick a random starting point in the iterator list and rotate the list.
let n = local_queues.len();
let n = local_queues.iter().count();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a cold operation? It seems like this would take a while.

Copy link
Contributor Author

@james7132 james7132 Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one part I'm not so sure about. Generally this shouldn't be under contention, since the cost to spin up new threads is going to be higher than it is to scan over the entire container, unless you have literally thousands of threads. It otherwise is just a scan through fairly small buckets.

We could use an atomic counter to track how many there are, but since you can't remove items from the ThreadLocal, there will be residual thread locals from currently unused threads (as thread IDs are reused), that may get out of sync.

src/lib.rs Show resolved Hide resolved
@james7132 james7132 requested a review from notgull February 21, 2024 23:43
src/lib.rs Outdated Show resolved Hide resolved
Co-authored-by: John Nunley <[email protected]>
@notgull notgull merged commit 7592d41 into smol-rs:master Feb 22, 2024
8 checks passed
@notgull notgull mentioned this pull request Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Push task directly to the local queue
2 participants