MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, LLVM uses simple stores when lowering LLVM IR like `atomic store ... release` on x86, itself a lowering of Rust's `AtomicBool::store(.., .., Ordering::Release)`. These facts could allow code to be emitted in which a series of nontemporal stores is followed by a simple store that sets a flag.

But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of all at once, and a write may be further deferred until a serialization point. Even a "yes-temporal" write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in an order other than program order, e.g. with the flag store becoming visible before some of the nontemporal data.
Such reordering could e.g. result in other threads reading the written memory after the flag is set, through safe code that assumes the memory is correctly synchronized. Those threads could then observe tearing or other inconsistent program states. The risk grows with the number of writes, since more writes mean more write buffers that may begin retiring simultaneously, and thus a greater chance of them resolving in an unfortunate order.
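As a minimal Rust sketch of the pattern described above (not code from this change), the hypothetical helper `fill_and_publish` below issues streaming stores through the `_mm_stream_si32` intrinsic and then publishes a flag with a release store. It assumes an x86_64 target for the streaming path and falls back to plain stores elsewhere. The `_mm_sfence` call marks the point where ordering has to be restored; removing it gives the hazardous shape, where the flag may become visible before the data.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical helper: fill `buf` with streaming stores, then publish `ready`.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // some toolchains may expose `_mm_sfence` as a safe fn
pub fn fill_and_publish(buf: &mut [i32], value: i32, ready: &AtomicBool) {
    use std::arch::x86_64::{_mm_sfence, _mm_stream_si32};

    for slot in buf.iter_mut() {
        // SAFETY: `slot` is a valid, aligned, exclusive pointer to an i32, and
        // the SFENCE below runs before anything else can observe `buf`.
        unsafe { _mm_stream_si32(slot as *mut i32, value) };
    }

    // Drain the write-combining buffers. Without this fence, the release
    // store below is a plain store on x86 and is not ordered after the
    // nontemporal stores above.
    unsafe { _mm_sfence() };

    ready.store(true, Ordering::Release);
}

/// Plain-store fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn fill_and_publish(buf: &mut [i32], value: i32, ready: &AtomicBool) {
    buf.fill(value);
    ready.store(true, Ordering::Release);
}
```

A reader thread that loads `ready` with `Ordering::Acquire` and sees `true` can then rely on `buf` being fully written, because the fence sits inside the function that performed the streaming stores.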
To guarantee program soundness, code using nontemporal stores must currently use SFENCE in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require explicit SFENCE. Even an `unsafe fn` must explicitly pass this invariant to its callers, so it can be preferable for a function to internally "close over" any possible resulting unsoundness.

As one function using streaming stores, `DisplayFrameBuffer::copy_untrusted_row_rgb24_to_bgrx32`, is used in a fairly tight loop itself, name it as a streaming store and pass the invariant up. Then use SFENCE inside `copy_from_raw_untrusted_rgb24_to_bgrx32`. This prevents undue performance overhead from repeated SFENCE usage. The SFENCE itself IS required, however, as some callers immediately do use atomic stores in precisely the way that causes undefined behavior.
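The split between a streaming row function and a fencing outer function might look roughly like the sketch below. Every name in it (`nt::store`, `nt::fence`, `copy_row_streaming`, `copy_frame`) is hypothetical and chosen for illustration; the signatures and bodies of the real functions in this change are not shown here. The sketch assumes x86_64 for the streaming path, a little-endian target for the BGRX byte order, and a padding byte of 0.

```rust
// Hypothetical wrappers; on x86_64 they map to MOVNTI and SFENCE.
#[cfg(target_arch = "x86_64")]
mod nt {
    use std::arch::x86_64::{_mm_sfence, _mm_stream_si32};

    /// # Safety
    /// The caller must run `fence()` before `dst` is read again or before
    /// any synchronizing store publishes it.
    pub unsafe fn store(dst: &mut u32, value: u32) {
        unsafe { _mm_stream_si32(dst as *mut u32 as *mut i32, value as i32) }
    }

    #[allow(unused_unsafe)] // some toolchains may expose `_mm_sfence` as a safe fn
    pub fn fence() {
        unsafe { _mm_sfence() }
    }
}

// Plain-store fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
mod nt {
    pub unsafe fn store(dst: &mut u32, value: u32) {
        *dst = value;
    }
    pub fn fence() {}
}

/// Converts one row of RGB24 pixels to BGRX32 with streaming stores.
///
/// # Safety
/// This function does NOT fence. The caller inherits the invariant and must
/// call `nt::fence()` before `dst` is read or published.
unsafe fn copy_row_streaming(src: &[u8], dst: &mut [u32]) {
    for (px, out) in src.chunks_exact(3).zip(dst.iter_mut()) {
        // RGB bytes in, BGRX bytes out (X left as 0 in this sketch).
        let bgrx = u32::from_le_bytes([px[2], px[1], px[0], 0]);
        unsafe { nt::store(out, bgrx) };
    }
}

/// Safe outer function: closes over the invariant with one fence per frame.
pub fn copy_frame(src: &[u8], dst: &mut [u32], width: usize) {
    assert!(width > 0 && src.len() == dst.len() * 3 && dst.len() % width == 0);
    for (row_in, row_out) in src.chunks_exact(width * 3).zip(dst.chunks_exact_mut(width)) {
        // SAFETY: the single fence below runs before `dst` is handed back.
        unsafe { copy_row_streaming(row_in, row_out) };
    }
    nt::fence();
}
```

Fencing once in the outer function keeps SFENCE out of the per-row loop, while callers of `copy_frame` can follow it with an atomic release store without any extra care.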