Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect unsynchronized TSCs across cores and fallback to the OS time source? #111

Open
tatsuya6502 opened this issue Jan 3, 2025 · 3 comments

Comments

@tatsuya6502
Copy link

tatsuya6502 commented Jan 3, 2025

As we discovered in #61, some machines have TSCs that are not synchronized across cores. On such a machine, quanta::Instant will not return monotonically increasing time when queried from different cores. A users of my library, who uses a Thinkpad with Ryzen mobile CPU, seems to have this problem:

moka-rs/moka#472 (comment)

CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics
System: NixOS Linux x86_64

EDIT: Looking at the linked issue:

Thinkpads with Ryzen mobile CPUs

Yep, this is the case here.

I added a workaround by doing a saturating subtraction of quanta::Instant instead of checked subtraction, so it will not return None when a time warp happens. But this is not a solution, as it only hides the problem.

I did some search and found this: https://bugzilla.kernel.org/show_bug.cgi?id=202525#c2

I used a simple tool to watch the TSC values for each CPU, and it looks like CPUs 1-7 are synchronized, but CPU 0 has a large offset compared to the others. They're all advancing at the same frequency (2195MHz) though:

1028:     -8   1606   1606   1606   1606   1606   1606   1606  3695718 KHz

...
The interesting and incriminating part in the output above is that CPU0 is about -1600ms offset from the TSCs on the other CPUs.

So I am wondering if quanta can detect this situation and fallback to the OS time source. Maybe detecting it in the calibration phase? I am not sure how to do this, but the only way I can think of is to read TSCs of different cores and see if the differences between the values are within some threshold.

Maybe it is something like this?

  1. Create a std::sync::Barrier to synchronize the threads.
  2. Spawn a thread on each core:
    • Pin the thread to the core using core_affinity crate.
      • (This will not work on macOS, but I think it is not a problem as Intel Macs would not have this unsynchronized TSC problem.)
  3. Wait on the barrier.
  4. Read the TSC.
  5. Compare the differences between the TSCs of different cores.

But I have got questions like: what would be the good threshold? (Given the above -1600ms off example, maybe a few milliseconds would be large enough? Maybe this is application dependent?)

A completely different solution for my library would be to use the OS time source (e.g. std::time::Instant) where the time accuracy is important, and use TSC (quanta::Instant) in other places.

@tobz
Copy link
Member

tobz commented Jan 3, 2025

👋🏻

It would likely be fine to do this as part of the calibration phase as you suggest. An easier approach may simply be to copy what minstant and ask the OS if the TSC was detected as being stable/synchronized.

I trust the kernel to do a better job of that than trying to do it in userland.

This would only cover Linux x86-64 platforms, but that strikes me as acceptable. If you're running a hacked together Ryzen laptop w/ macOS... I'd say you're on your own for support. 😅

@tatsuya6502
Copy link
Author

Thank you for your prompt reply. OK. I will try to implement this in the calibration phase, and open a PR.

This would only cover Linux x86-64 platforms

The user reported the problem on Linux x86_64, so it should be fine. I wonder what would happen on Windows, but I will never know since I do not have such a Ryzen laptop to test. So, I will just leave it at that for now.

@tobz
Copy link
Member

tobz commented Jan 6, 2025

Just an update, since I realized I gave some bad advice before: we wouldn't want to do this in calibration, but in the calls we make to see if the TSC is available.

By the time we're in calibration, we've decided to use the TSC, so it has to come earlier than that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants