core/sync: add wait queues #21123

Open · wants to merge 19 commits into base: master
Conversation

@derMihai (Contributor) commented Jan 8, 2025

Contribution description

Adds a Linux-kernel-like wait queue for mutex-less condition signaling.

Motivation

Copy-paste from core/include/wait_queue.h:

Wait queues enable lock-free, IRQ-safe condition signaling. This implementation is inspired by the Linux kernel.

Wait queues have similar semantics to condition variables, but don't require setting the condition and signaling it to be atomic, hence no mutex is needed. In turn, one may safely call queue_wake() from an ISR. Note that while cond_signal() and cond_broadcast() are safe to call from an ISR context too, doing so will very probably cause a race condition elsewhere. Consider the following scenario using condition variables:

static uint64_t measurement;
mutex_t cond_lock = MUTEX_INIT;
cond_t cond = COND_INIT;

void measurement_irq(void)
{
    measurement = measure();
    cond_broadcast(&cond);
}

void wait_for_critical_value(void)
{
    mutex_lock(&cond_lock);
    while (atomic_load_u64(&measurement) < THRESHOLD) {
        cond_wait(&cond, &cond_lock);
    }
    mutex_unlock(&cond_lock);
}

Note that the mutex is there only for the cond_wait() API call, as we're not allowed to call mutex_lock() inside the ISR. This alone is a hint that something isn't right, and indeed the following sequence of events is possible:

  1. thread sees measurement < THRESHOLD, and is about to call cond_wait()
  2. ISR fires, sets measurement = THRESHOLD, and signals the condition
  3. thread calls cond_wait() and goes to sleep, possibly forever

Using a wait queue, we can do this:

static uint64_t measurement;
wait_queue_t wq = WAIT_QUEUE_INIT;

void measurement_irq(void)
{
    measurement = measure();
    queue_wake(&wq);
}

void wait_for_critical_value(void)
{
    QUEUE_WAIT(&wq, atomic_load_u64(&measurement) >= THRESHOLD);
}

This is free of the race condition above because QUEUE_WAIT() is a macro that checks the condition AFTER enqueueing the current thread to be woken up.

When to use?

QUEUE_WAIT() is a macro and might come with some additional code-size cost due to inlining. If you're not synchronizing with an ISR and care very much about code size, then go for condition variables; otherwise there is no reason not to use the wait queue.

Can't I just use a mutex?

You can abuse a mutex by locking it in a loop in the thread context and unlocking it from the ISR context (see the sketch below). But this only works for a single waiter and makes improper use of the mutex semantics.
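
A rough sketch of that workaround, reusing the names from the condition variable example above (data_ready is a made-up name):

#include "mutex.h"

static mutex_t data_ready = MUTEX_INIT_LOCKED; /* note: starts out locked */

void measurement_irq(void)
{
    measurement = measure();
    mutex_unlock(&data_ready); /* ISR-safe, but wakes at most ONE waiting thread */
}

void wait_for_critical_value(void)
{
    while (atomic_load_u64(&measurement) < THRESHOLD) {
        mutex_lock(&data_ready); /* blocks until the ISR unlocks the mutex */
    }
}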

Correctness

Below is a state diagram of a thread calling QUEUE_WAIT(). I'll argue about its correctness and assume that my code implements it. Note: for simplicity I dropped the early exit optimization from the state diagram, as it's trivially clear that it doesn't affect correctness.
[state diagram: QUEUE_WAIT() states and transitions]

events
  • cond true - the user's condition expression evaluated to true
  • cond false - the user's condition expression evaluated to false
  • wake up - another thread/ISR called queue_wake(), which woke up this thread
  • block - the user's condition expression blocks on some other locking primitive (including some other wait queue)
  • unblock - the locking primitive the user's condition expression blocks on got signaled
states

Each state is a combination of two sub-states: the queuing status and the thread state. A thread is enqueued in the wait queue only if the state is labeled enqueued. Regarding the scheduler states:

  • RUNNABLE - on the scheduler's runqueue (e.g. running or pending)
  • WAITING - blocked, waiting on this wait queue
  • BLOCKED - blocked on some other locking primitive during the user condition expression evaluation, including some other wait queue
transitions

We assume transitions between states to be atomic.

  • enqueued RUNNABLE - the initial state

    • cond true -> EXIT: the condition evaluated true. The thread is then de-queued and QUEUE_WAIT() exits.
    • cond false -> enqueued WAITING: the condition evaluated false. The thread sleeps on the wait queue.
    • wake up -> RUNNABLE: the wait queue got signaled, which removed the current thread from the wait queue, but the condition expression is still being evaluated.
    • block -> enqueued BLOCKED: the condition expression blocks on some other locking primitive
  • RUNNABLE

    • cond true -> EXIT: the condition evaluated true. QUEUE_WAIT() exits.
    • cond false -> enqueued RUNNABLE: the condition evaluated false. Since the thread is not enqueued anymore, it has to enqueue again and do another condition check.
    • block -> BLOCKED: the condition expression blocks on some other locking primitive
  • BLOCKED

    • unblock -> RUNNABLE: the locking primitive is signaled and the thread can continue evaluating the condition expression
  • enqueued BLOCKED

    • unblock -> enqueued RUNNABLE: the locking primitive is signaled and the thread can continue evaluating the condition expression. The thread is still enqueued.
    • wake up -> BLOCKED: the wait queue is signaled. The thread is removed from the wait queue, but it may not wake up yet, as the thread is blocked on something else.
  • enqueued WAITING

    • wake up -> enqueued RUNNABLE: the queue is signaled. We don't know the state of the condition, so we have to enqueue again and run the condition check again.

Code size

QUEUE_WAIT() is a macro and comes with the expected costs associated with inlining. I haven't run any code-size tests comparing it with the condition variable, but I expect it to require more. Not much more, though, and here's why:

Inlining QUEUE_WAIT(&wq, atomic_load_u64(&cond_val) > THRESHOLD); looks like this:

do {
    if (atomic_load_u64(&cond_val) > THRESHOLD) {
        break;
    }

    wait_queue_entry_t me;
    _prepare_to_wait(&wq, &me);
    while (!(atomic_load_u64(&cond_val) > THRESHOLD)) {
        _maybe_yield_and_enqueue(&wq, &me);
    }
    _wait_dequeue(&wq, &me);
} while (0);

A typical condition variable counterpart looks as follows:

mutex_lock(&cond_lock);
while (!(cond_val > THRESHOLD)) {
    cond_wait(&cond, &cond_lock);
}
mutex_unlock(&cond_lock);

Apart from the early exit optimization (which can be turned off with a compile flag), it looks pretty much the same. I don't have any hard numbers, but I think the benefits (leaner code, IRQ compatibility, lower deadlock risk) outweigh the code-size penalty, such that it makes sense to use it in all cases where a condition variable is suitable.

Future work

A timed variant of QUEUE_WAIT() would be a nice addition at some point. I've already sketched out a solution, but it's not trivial and adds complexity to the already not-so-simple implementation.

Testing procedure

There are two unit tests: tests/core/wait_queue and tests/core/wait_queue_optimized. The former is the extensive one, but it disables the early exit optimization in order to better test some edge cases. I think that's sensible, given that the early exit is not really part of the wait queue logic. But, for the sake of completeness, the second test does enable the optimization. It's just a symlink to the sources in tests/core/wait_queue, but with some parts of the test disabled.

I've run the unit tests on the following boards:

  • native
  • nucleo-f401re
  • samd20-xpro

I've copied the board blacklist from the condition variable tests, but to be honest it seems a bit restrictive. I tried to compile for bluepill-stm32f030c8 (which is part of the blacklist) and it worked, but I know the CI builds differently. Should I try removing some boards from the blacklist? And if yes, which?

@github-actions bot added the Area: doc, Area: tests and Area: core labels on Jan 8, 2025
@derMihai derMihai force-pushed the mir/wait_queue_head branch 2 times, most recently from a8091c5 to 4b66ba8 Compare January 8, 2025 19:32
@derMihai derMihai force-pushed the mir/wait_queue_head branch from 4b66ba8 to ca2557c Compare January 8, 2025 19:41
@derMihai derMihai marked this pull request as ready for review January 8, 2025 20:12
@benpicco benpicco requested a review from maribu January 9, 2025 14:16
@maribu (Member) commented Jan 9, 2025

Just to be sure I understand the motivation: this caters to the same use case as cond, but is faster, correct?

If so, is there a concrete use case where cond actually is too slow?

To be honest, we already have issues getting the existing mechanisms for thread synchronization and inter-thread communication to work reliably, due to premature optimizations (see e.g. #20878). I'm not a fan of adding to that complexity without a very compelling reason. I'd much rather have effort spent on fixing and cleaning up the existing ones.

@derMihai (Contributor, Author) commented Jan 9, 2025

Speed is not the point of this, please read the motivation. RIOT currently doesn't have a way of signaling conditions from an ISR, or at least not one that covers all the requirements. TL;DR:

  • condition variables will almost always lead to some race condition
  • unlocking a mutex from ISR won't work for multiple waiters
  • thread flags are global and shouldn't be used by applications/drivers etc.

With the wait queue you can write context-agnostic code and it's simpler to use.

@maribu (Member) commented Jan 9, 2025

What is the issue you have with thread_flags? Why can't you unlock more than one mutex from an ISR? Why do you need to signal more than one waiter from an ISR in the first place?

@derMihai (Contributor, Author) commented Jan 9, 2025

Thread flags are fine for building core stuff because they are fast and simple. But they are global. If drivers, applications, libraries and so on use them extensively, they will conflict at some point. With wait queues, just as with condition variables, you can have as many as you want.

You may signal multiple mutexes, but you can't have multiple threads waiting on a single mutex, because it will wake just one of them. Mutexes just aren't meant for condition signaling. And as for why multiple threads: think about a simple buffer, with multiple consumers waiting for an ISR to fill it.
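
For illustration, with the wait queue that could look roughly like this (buf_level, fill_buffer() and consume() are placeholders):

static wait_queue_t wq = WAIT_QUEUE_INIT;
static uint8_t buf_level;

void fill_irq(void)
{
    buf_level = fill_buffer();
    queue_wake(&wq); /* wakes every consumer currently waiting */
}

void consumer_thread(void)
{
    QUEUE_WAIT(&wq, atomic_load_u8(&buf_level) > 0);
    consume();
}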

@derMihai (Contributor, Author) commented Jan 9, 2025

Also, with thread flags, you have to know which thread to wake up in the first place. So you need to manually queue them in some way, which is what the wait queue does for you.

@maribu (Member) commented Jan 9, 2025

If drivers, applications, libraries and so on use them extensively, they will conflict at some point.

There are right now only 2 thread flags in THREAD_FLAG_PREDEFINED_MASK reserved for RIOT. All other thread flags can be used for drivers and apps.

Note that flag BIT3 on thread A may have a completely different meaning than BIT3 on thread B; the flags do not need to be unique across different threads.
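
For example, waking a single known waiter from an ISR with a per-thread flag (sketch; the flag value and the waiter bookkeeping are arbitrary):

#include "thread.h"
#include "thread_flags.h"

#define FLAG_DATA_READY (1u << 3) /* only has a meaning for this particular waiter */

static thread_t *waiter; /* registered by the waiting thread itself */

void measurement_irq(void)
{
    if (waiter) {
        thread_flags_set(waiter, FLAG_DATA_READY);
    }
}

void wait_for_data(void)
{
    waiter = thread_get_active();
    thread_flags_wait_any(FLAG_DATA_READY);
}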

You may signal multiple mutexes but you can't have multiple threads waiting on a single mutex, because it will wake just one of them.

Yeah, unlock one mutex per thread to wake up.
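
Rough sketch, assuming a fixed set of waiters:

#include "mutex.h"

#define NUM_WAITERS 2

static mutex_t wake[NUM_WAITERS] = { MUTEX_INIT_LOCKED, MUTEX_INIT_LOCKED };

void measurement_irq(void)
{
    for (unsigned i = 0; i < NUM_WAITERS; i++) {
        mutex_unlock(&wake[i]); /* mutex_unlock() may be called from an ISR */
    }
}

void wait_for_data(unsigned id)
{
    mutex_lock(&wake[id]); /* blocks until the ISR unlocks this thread's mutex */
}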

And as for why multiple threads: think about a simple buffer, with multiple consumers waiting for an ISR to fill it.

Why would there be multiple consumers on a single buffer? RIOT has no SMP support. You would waste CPU cycles on context switching and RAM for stacks by splitting the processing of a single buffer into multiple threads.

Even if there really were a need to wake up multiple threads from an ISR (which I'm not convinced of), unlocking multiple mutexes would work fine. After all, the number of threads running with RIOT is quite limited, so there is no need for a mechanism that can efficiently wake up an unbounded number of threads.

@derMihai (Contributor, Author) commented Jan 9, 2025

You can have a library, a driver, and a user application all using BIT3, each of them meaning something different. You then signal BIT3 meaning the one from the application, while the thread is waiting for the same BIT3, but in the driver, where it means something completely different. Pray and hope that the driver is hardened against spurious wakeups.

I agree with you on the buffer example; in an embedded context you will probably have one consumer thread. But generally, I've noticed that threads are spawned way more often than they should be (simply because, if you're not syncing a lot, it's so much easier than using e.g. an event queue). And then you want these threads to sync on some event.

And in both the thread flag and the mutex case, if you have multiple threads you have to register them somehow, manually. If a thread is waiting for a flag, you have to know which one. If you have multiple mutexes from multiple threads, you have to know them.

And, just to play devil's advocate: you can actually use one mutex with multiple threads, it's just unintuitive and hacky (and still doesn't cover all cases), which leads to my main point: even if one can improvise with existing mechanisms, there's always some aspect that is overlooked, lots of boilerplate, and opportunity for bugs. The main point of the wait queue is to have one powerful mechanism that covers all these use cases and makes code simpler, safer and more portable across contexts (ISR/thread).

maribu previously approved these changes Jan 9, 2025

@maribu (Member) left a comment:

NACK.

Let's not add thread synchronization mechanisms without a tangible use case. Thread synchronization and inter-thread communication mechanisms in particular, and core in general, are crucial to RIOT, so they really need to be well maintained. Adding code and complexity here without a tangible use case is not a good idea. In addition, RIOT already has so many synchronization mechanisms that new users can easily be overwhelmed and often choose a poor fit for their task. Adding a mechanism that has no tangible use case to the mix will only add to that problem.

This PR also has a number of issues:

  1. It uses a custom linked list scheme, rather than thread_add_to_list() and container_of(), as the other mechanisms do
  2. It breaks OpenOCD's RIOT support (e.g. info threads). Adding a new thread state would require coordination with OpenOCD's source code and adding versioning to RIOT's threading data structures
  3. It probably breaks assumptions in the FreeRTOS threading compatibility layer used in the ESP port to run the binary blob WiFi driver

But to avoid confusion: The main issue is the lack of a tangible use case here.

@maribu maribu dismissed their stale review January 9, 2025 23:10

Wrong button

@maribu (Member) left a comment:

Now the right button, see above for the reasoning.

@derMihai (Contributor, Author) commented:

Let's not make simplistic assumptions about usage, because this is what nudges users towards "creative" and often wrong workarounds. We shouldn't punish users for sub-optimal or even bad design decisions if what they're doing is actually correct. If that were the case, RIOT shouldn't have recursive mutexes or even condition variables in the first place. Recursive mutexes are a red flag of concurrent design, but people still use them because they make life easier; and condition variables solve solely the problem that you dismiss: multiple waiters. Otherwise one could just use a mutex and call it a day.

I do understand that this is an embedded operating system, and especially regarding core stuff, we can't just bloat it on a whim. But the wait queue is such a simple thing (it's really just condition variables "in reverse") and it covers pretty much every signaling scenario. I think it's worth considering.

In any case, I'm not going to provide any other use cases for you to take apart and provide workarounds for, as the whole point of this is to avoid hacks. You started this discussion with a dismissive stance and without trying to understand what this is about. I'm not willing to commit to any more convincing work. It should also be noted that I didn't invent wait queues; they've been in the Linux kernel for decades and there are plenty of resources about them that back my arguments.

However, you also noted some technical difficulties that admittedly weren't known to me (OpenOCD, FreeRTOS). I'm willing to look into these and try to solve them, but I need to know if I'll get some support here, otherwise I'm just going to close this. While I'd really love to have and use wait queues for my daily work, I meanwhile have enough understanding of RIOT's core mechanics to just keep hacking my way around.

Regarding reliability and correctness of core stuff: we should resume work on #20940 and #20941.

@kaspar030 (Contributor) commented Jan 10, 2025

I stumbled over the difficulties of cond_var myself, so I see some need here.
Thanks @derMihai for trying to tackle this!

... but I also think we shouldn't introduce another IPC mechanism. (The implementation here wouldn't pass our guidelines, at least because QUEUE_WAIT() is a macro, and there's a hand-rolled linked list.)

I think the argument that the thread flags namespace is global is overrated. Unless the exact number of wakeups is required, even multi-use of the same thread flag doesn't do much harm, and it can be avoided.

Anyhow, I played with waking up multiple waiters using thread flags, here's some code: https://github.com/kaspar030/RIOT/tree/wq_thread_flags

edit: fixed link

@derMihai (Contributor, Author) commented Jan 10, 2025

Thanks @kaspar030 for the productive feedback. The reason I coded the linked list myself is to keep the exclusive-wake functionality of the condition variable, which requires entries to be inserted at a specific location based on thread priority. But here I would argue along the same lines as @maribu: I don't actually see the need for exclusive waking, I just wanted to keep feature parity. Exclusive waking is no guarantee of exclusivity, but rather a performance optimization. Without that requirement in the way, I can use the standard linked-list manipulation code.

I really like your variant based on thread flags, and I even considered doing something similar at some point. In the end I went for a new sync mechanism because it's better optimized, allowed me to keep the exclusive-wake feature, and mainly because I didn't know of the additional technical difficulties that @maribu pointed out (OpenOCD, FreeRTOS), which definitely are a deal breaker. If that's fine for you, I'd like to continue on top of your sketch. That would also allow the wait queue to be included optionally, as a module.

One more question: what is wrong with wrapping the waiter-loop boilerplate in a macro?

@maribu (Member) commented Jan 10, 2025

One more question: what is wrong with wrapping the waiter-loop boilerplate in a macro?

We generally try to avoid the use of the preprocessor if the same can be done with C code instead.

If that's fine for you, I'd like to continue on top of your sketch.

Implementing the API proposed here on top of existing mechanisms is very welcome.

@kaspar030 (Contributor) commented:

If that's fine for you, I'd like to continue on top of your sketch.

Sure! While writing it, I thought this is so close to thread_flags that its name should reflect that. Maybe thread_flag_group_t/attach()/detach()/...? Expect some bikeshedding there.
And maybe, instead of hard-coding 0x1 as the thread flag, just add the mask as a parameter to the wake() function?
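
To make the shape concrete, roughly something like this (all names hypothetical and up for bikeshedding):

#include "list.h"
#include "thread_flags.h"

typedef struct {
    list_node_t waiters; /* threads currently attached to the group */
} thread_flag_group_t;

void thread_flag_group_attach(thread_flag_group_t *group);
void thread_flag_group_detach(thread_flag_group_t *group);
/* sets `mask` on every attached thread; callable from ISR */
void thread_flag_group_wake(thread_flag_group_t *group, thread_flags_t mask);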

One more question: what is wrong with wrapping the waiter-loop boilerplate in a macro?

In general: style, coding conventions (somewhat), and just look at how clean the RIOT code base is without all those macros. Also, macros are often not understood by tooling.

In this specific case, nothing in particular; maybe that the condition is evaluated twice. And that it just might not be needed.

@derMihai (Contributor, Author) commented:

We generally try to avoid the use of the preprocessor if the same can be done with C code instead.

I see. The issue here is that the user will always have to write the same boilerplate, something like:

enqueue()
while (!(condition_expression)):
   wait()

dequeue()

I wanted all this to be something like queue_wait(condition_expression). This cannot be wrapped in a function, because the condition expression is, well... an expression, and we can't pass that to a function.

What I can do instead is provide these functions as public API, and a macro as a shorthand.
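
Roughly like this (a sketch; the public function names are placeholders mirroring the internal helpers shown in the expansion above):

#define QUEUE_WAIT(wq, cond)               \
    do {                                   \
        wait_queue_entry_t _me;            \
        queue_wait_prepare((wq), &_me);    \
        while (!(cond)) {                  \
            queue_wait_yield((wq), &_me);  \
        }                                  \
        queue_wait_done((wq), &_me);       \
    } while (0)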

Sure! While writing it, I thought this is so close to thread_flags, its name should reflect that. Maybe thread_flag_group_t/attach()/detach()/...? Expect some bike shedding there.

@kaspar030 I have no preference on where to put the code and how to name the API, I'll follow your suggestion.

[..] maybe that the condition is evaluated twice. And that it just might not be needed.

Probably not needed, which is also why I made that early check optional behind a compile-time switch. I'll leave it out.

@kaspar030 (Contributor) commented:

I see. The issue here is that the user will always have to write the same boilerplate, something like:

Yeah, that's life for C programmers.

Do you have an actual use case? Is it more like one-off functions, as in wait_for_critical_value() in the PR description? Or do you have multiple classic waiting-in-a-loop threads?

I'd much prefer it if this turned into a helper around thread flags (basically just adding the attach()/detach() functionality), as it would then compose with msg, mbox, events and timers (timeouts).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: doc Area: Documentation Area: tests Area: tests and testing framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants