POSIX Simulator: Handle pthreads not created by FreeRTOS differently #1223

Merged (5 commits) Jan 25, 2025

@johnboiles (Contributor) commented Jan 17, 2025

This avoids calling pthread_sigmask on pthreads not created by FreeRTOS. It also avoids waiting forever in vPortEndScheduler when it is called from a non-FreeRTOS thread.

Description

This PR modifies the behavior of the FreeRTOS POSIX simulator. The tick handler (installed via sigaction) may run on any thread in the current process. This can hang non-FreeRTOS threads, because prvSuspendSelf may be called on them.
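The idea of treating foreign threads differently can be sketched with pthread thread-specific data: the port tags every thread it creates, and tick-related code does nothing on untagged threads. This is a minimal sketch under assumptions, not the code merged into port.c. The names markAsFreeRTOSThread, isFreeRTOSThread, and handleTick are hypothetical, loosely modeled on the prvMarkAsFreeRTOSThread function that shows up in an ASan trace later in this thread. Before calling isFreeRTOSThread() from a real signal handler, check your platform's rules on which functions are safe to call there.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdlib>

// Hypothetical helpers (not the actual port.c code) showing how a port can
// tag the threads it creates so tick handling can skip foreign threads.
namespace sketch {

pthread_key_t freertosThreadKey;
pthread_once_t keyOnce = PTHREAD_ONCE_INIT;

// Runs automatically when a tagged thread exits and frees its marker.
void threadKeyDestructor(void *marker) {
    std::free(marker);
}

void createKey() {
    int rc = pthread_key_create(&freertosThreadKey, threadKeyDestructor);
    assert(rc == 0);
    (void)rc;
}

// Call first thing in every thread the kernel creates for a task.
void markAsFreeRTOSThread() {
    pthread_once(&keyOnce, createKey);
    // Any non-NULL value works as the flag; heap memory gives the
    // destructor something to release when the thread exits.
    pthread_setspecific(freertosThreadKey, std::malloc(1));
}

// True only on threads that called markAsFreeRTOSThread().
bool isFreeRTOSThread() {
    pthread_once(&keyOnce, createKey);
    return pthread_getspecific(freertosThreadKey) != nullptr;
}

// Per-tick work, guarded so that a tick landing on a foreign thread is a
// no-op instead of suspending that thread. Returns whether it did any work.
bool handleTick() {
    bool handled = false;
    if (isFreeRTOSThread()) {
        // ... increment the tick count and possibly switch tasks ...
        handled = true;
    }
    return handled;
}

}  // namespace sketch
```

A thread started with plain pthread_create or std::thread never calls markAsFreeRTOSThread(), so handleTick() returns false on it and leaves the thread running.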

Test Steps

Run this program against the main branch and notice that it hangs on shutdown.

#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <thread>
#include <iostream>
#include <unistd.h>
#include <cstdio>  // printf

void appMainTask(void *parameters) {
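    for (;;) {
        // Busy task. The call gives the loop a side effect, so the compiler
        // may not assume the loop terminates (an empty infinite loop is UB in C++).
        taskYIELD();
    }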
    vTaskDelete(NULL);
}

bool mainLoop() {
    static bool shouldRun = true;
    if (std::cin.peek() != EOF) {
        char input;
        std::cin >> input;
        if (input == 'q') {
            shouldRun = false;
        }
    }
    return shouldRun;
}

auto main(int argc, char *argv[]) -> int {
    // Start the FreeRTOS scheduler
    std::thread schedulerThread([]() {
        xTaskCreate(appMainTask, "app_main", 10000, NULL, 1, NULL);
        vTaskStartScheduler();
        printf("Scheduler thread done\n");
    });

    while (mainLoop()) {
        // Limit to ~60fps so we don't murder battery unnecessarily
        usleep(1000000 / 60);  // ~16.7 ms, in integer microseconds
    }

    vTaskEndScheduler();
    schedulerThread.join();

    return 0;
}

Now run it again with this change and notice that it shuts down cleanly.

Checklist:

  • I have tested my changes. No regression in existing tests.
  • I have modified and/or added unit-tests to cover the code changes in this Pull Request.

I still need to read up on the test suite. I'm looking for directional feedback first.

@johnboiles requested a review from a team as a code owner January 17, 2025 22:23
@johnboiles changed the title from "Avoid calling pthread_sigmask on pthreads not created by FreeRTOS" to "Posix Simulator: Handle pthreads not created by FreeRTOS differently" Jan 17, 2025
@johnboiles force-pushed the posix-dont-break-external-threads branch from 6f630dc to aa8fcad

codecov bot commented Jan 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.61%. Comparing base (31dd1e3) to head (aa8fcad).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1223      +/-   ##
==========================================
- Coverage   91.64%   91.61%   -0.03%     
==========================================
  Files           6        6              
  Lines        3254     3257       +3     
  Branches      903      901       -2     
==========================================
+ Hits         2982     2984       +2     
  Misses        132      132              
- Partials      140      141       +1     
Flag        Coverage Δ
unittests   91.61% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

@johnboiles changed the title from "Posix Simulator: Handle pthreads not created by FreeRTOS differently" to "POSIX Simulator: Handle pthreads not created by FreeRTOS differently" Jan 18, 2025

@aggarg (Member) commented Jan 21, 2025

The aim of the POSIX port is to aid development of FreeRTOS applications. Why would you want to create native pthreads in a FreeRTOS application?

@johnboiles (Contributor Author) commented Jan 21, 2025

A simulator program may need other threads to simulate peripheral hardware. For example I need another thread to run my virtual display window that simulates my hardware display. My main FreeRTOS application does not need to know about my display simulator.

On some platforms (I'm experimenting with iOS) the system frameworks create several threads at startup, so other threads already exist when user code runs. Running my FreeRTOS code on iOS is useful to me because it lets me easily prototype peripherals like touch screens and HDMI output without bringing up the target hardware (which we haven't received yet).

The current FreeRTOS implementation eventually causes all other threads to lock up.

@hoxi commented Jan 21, 2025

We’re also using separate (non-FreeRTOS) pthreads to simulate various interrupt sources, and this change aligns well with a pull request we plan to submit. In our case, the interrupt pthreads call different FreeRTOS ISR APIs to inject data into the FreeRTOS application. For this to work, we’ve added an additional port mutex layer to prevent ISR APIs from being executed while FreeRTOS is in a critical section. It would be great to see this patch merged in some form so that we can rebase our planned pull request onto it.
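A minimal sketch of such a mutex layer, assuming a recursive mutex that task threads hold for the duration of each (possibly nested) critical section and that simulated-interrupt threads must acquire before calling an ISR API. IsrGate, runIsr, tryRunIsr, and isrGateDemo are hypothetical names; this is not hoxi's planned code or a FreeRTOS API. A real port would also have to reconcile this with the signal-based tick, since mutex locking is not async-signal-safe.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical "port mutex layer" between FreeRTOS critical sections and
// non-FreeRTOS threads that simulate interrupts.
class IsrGate {
public:
    // Task side: wrap every critical section. A recursive mutex lets the
    // same thread nest critical sections, as taskENTER_CRITICAL() can nest.
    void enterCritical() { mutex_.lock(); }
    void exitCritical() { mutex_.unlock(); }

    // Interrupt-simulator side: waits until no task is in a critical
    // section, then runs the ISR body (e.g. a call to a ...FromISR API).
    template <typename Isr>
    void runIsr(Isr isr) {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        isr();
    }

    // Non-blocking variant: returns false instead of waiting.
    template <typename Isr>
    bool tryRunIsr(Isr isr) {
        std::unique_lock<std::recursive_mutex> lock(mutex_, std::try_to_lock);
        if (lock.owns_lock()) {
            isr();
        }
        return lock.owns_lock();
    }

private:
    std::recursive_mutex mutex_;
};

// Usage: an "interrupt" thread is refused while a task holds a nested
// critical section and succeeds once the section is left.
// Returns the number of injected events, or -1 if injection was not blocked.
int isrGateDemo() {
    IsrGate gate;
    int injected = 0;

    gate.enterCritical();
    gate.enterCritical();  // nested critical section
    bool ranDuringCritical = true;
    std::thread during([&] { ranDuringCritical = gate.tryRunIsr([&] { ++injected; }); });
    during.join();
    gate.exitCritical();
    gate.exitCritical();

    std::thread after([&] { gate.runIsr([&] { ++injected; }); });
    after.join();

    return ranDuringCritical ? -1 : injected;
}
```

The recursive mutex is a design choice for nesting: a plain std::mutex would deadlock the first time a task entered a critical section it already held.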

@archigup previously approved these changes Jan 23, 2025

@jasonpcarroll (Member) left a comment

This is a great change, and it's clear that other users find it valuable as well. Thank you! I just had some minor comments about early returns, mainly to keep static analyzers happy, since MISRA C will flag them.
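For reference, the kind of rewrite this review asks for, shown on a hypothetical tick guard rather than the PR's actual diff: MISRA C:2012 Rule 15.5 (advisory) asks for a single point of exit at the end of a function, so an early return is replaced by wrapping the work in the condition. gOnFreeRTOSThread and gTicksHandled are illustrative stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-ins: whether the current thread belongs to FreeRTOS,
// and a counter of ticks actually processed.
static bool gOnFreeRTOSThread = false;
static int gTicksHandled = 0;

// Early-return style: two exit points, which Rule 15.5 reports.
void tickEarlyReturn() {
    if (!gOnFreeRTOSThread) {
        return;
    }
    ++gTicksHandled;
}

// Single-exit style: same behavior, the guard wraps the work instead.
void tickSingleExit() {
    if (gOnFreeRTOSThread) {
        ++gTicksHandled;
    }
}
```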

Three review comments on portable/ThirdParty/GCC/Posix/port.c (outdated, resolved)
@jasonpcarroll (Member)

Hmm, a check is failing, but it doesn't seem related to your change. I will look into it.

@archigup merged commit 2b35979 into FreeRTOS:main Jan 25, 2025 (16 of 17 checks passed)

@jasonpcarroll (Member)

For now, we will merge this as there is no action needed on this PR. Thanks @johnboiles!

@johnboiles deleted the posix-dont-break-external-threads branch January 25, 2025 01:16

@johnboiles (Contributor Author)

Thanks for getting this in there!

aggarg added a commit to aggarg/FreeRTOS-Kernel that referenced this pull request Jan 25, 2025
@denravonska commented Jan 29, 2025

Are leak checks a part of the CI? After switching to main, we started noticing memory leaks in our tests, and I'm trying to figure out whether it's due to this PR or whether we're shutting down incorrectly.

Direct leak of 1 byte(s) in 1 object(s) allocated from:
    #0 0x7ffff78fd891 in malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x5555557ab096 in prvMarkAsFreeRTOSThread ../third-party/freertos/repo/portable/ThirdParty/GCC/Posix/port.c:155
    #2 0x7ffff6aa32cd  (/usr/lib/libc.so.6+0x942cd) (BuildId: aed3a2b0cf4e6cc12296052529af22f6a450a75a)

Edit: This seems to happen only in one of our test suites, so I think it's something on our end.
Edit: It was on my end. Interestingly, killing all threads via vTaskEndScheduler and then killing the thread again via a C++ destructor shows up as a memory leak.

@aggarg (Member) commented Jan 29, 2025

Thank you for sharing! Which tool are you using for finding these leaks?

@johnboiles (Contributor Author)

I think the problem is a pre-existing issue that now shows up in your tools because of the malloc call added in this PR. Values stored under a pthread key should be destructed (in our case, prvThreadKeyDestructor is called) when the associated pthread terminates. But I do not think the FreeRTOS POSIX port terminates all of its pthreads.
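The POSIX behavior this comment relies on can be checked in isolation: a destructor registered with pthread_key_create runs when a thread that stored a non-NULL value under that key exits. This is a standalone sketch with hypothetical names (keydemo, runOnce), not the port's code; a thread that never exits never runs its destructor, which is the situation suspected here.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstdlib>

namespace keydemo {

std::atomic<int> destructorCalls{0};
pthread_key_t key;

// Registered with the key: runs at thread exit for each non-NULL value.
void keyDestructor(void *value) {
    std::free(value);
    destructorCalls.fetch_add(1);
}

void *worker(void *) {
    // Attach a 1-byte heap marker to this thread.
    pthread_setspecific(key, std::malloc(1));
    return nullptr;  // returning ends the thread and triggers keyDestructor
}

// Spawns one thread, waits for it to terminate, and reports how many
// destructors ran.
int runOnce() {
    destructorCalls = 0;
    int rc = pthread_key_create(&key, keyDestructor);
    assert(rc == 0);
    pthread_t t;
    rc = pthread_create(&t, nullptr, worker, nullptr);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, nullptr);  // destructors run before the thread terminates
    int calls = destructorCalls.load();
    pthread_key_delete(key);
    return calls;
}

}  // namespace keydemo
```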

I've noticed this issue when trying to start and stop FreeRTOS multiple times within the same process (e.g. for running unit tests): some threads from the first run stick around and cause problems with subsequent FreeRTOS runs. My workaround has been to run each test that needs to start/stop FreeRTOS in a new process.
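The process-per-test workaround can be sketched with fork() and waitpid(). runIsolated is a hypothetical helper, not part of FreeRTOS. It assumes the test driver is single-threaded when it forks (fork() copies only the calling thread), so the FreeRTOS scheduler should be started and stopped inside the test body, never in the parent.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs one test body in a fresh child process so that any threads or
// global scheduler state it leaves behind vanish when the child exits.
// Returns true only if the child exited normally with status 0.
bool runIsolated(void (*testBody)()) {
    assert(testBody != nullptr);
    std::fflush(nullptr);  // keep the child from re-emitting buffered output
    pid_t pid = fork();
    if (pid == 0) {
        testBody();
        std::fflush(nullptr);
        std::_Exit(0);  // skip atexit handlers and static destructors
    }
    bool passed = false;
    if (pid > 0) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid) {
            passed = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        }
    }
    return passed;  // false if fork or waitpid failed
}
```

A test body that fails an assert() aborts the child, which waitpid reports as a signal rather than a normal exit, so runIsolated returns false for it.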

@denravonska commented Jan 29, 2025

> Thank you for sharing! Which tool are you using for finding these leaks?

We try to build as much as possible for the host so we can enable ASAN for this very purpose. In this case it was the reddest herring I've ever seen.

> I think the problem is a pre-existing issue that is now showing up in your tools because of the added malloc call in this PR. pthread_keys should get destructed (in our case prvThreadKeyDestructor is called) when the associated pthread terminates. But I do not think the FreeRTOS POSIX port terminates all of its pthreads.

> I've noticed this issue when trying to start and stop FreeRTOS multiple times within the same process (e.g. for running unit tests) -- some threads from the first execution stick around and cause problems with subsequent FreeRTOS runs. My workaround has been to execute each test that needs to start/stop FreeRTOS in a new process.

I think the pure FreeRTOS implementation is safe in this case, as I was unable to reproduce it without our C++ wrappers. Also, after reading more about the thread-local key/value mechanism, I get the impression that the way it's implemented in this PR is the correct way to do it.
In my particular case it was a bug/design flaw that double-killed long-lived threads, but I'm not exactly sure why it manifests as a memory leak rather than an assertion failure or a crash.

Edit: Try your test now that #1233 is merged. We also saw sticky threads prior to that fix.

@johnboiles (Contributor Author)

> Edit: Try your test now that #1233 PR is merged. We also saw sticky threads prior to that fix.

OK! I see the threads getting cleaned up! I still see deadlocks sometimes, though. I'm testing with this example:

#include <FreeRTOS.h>
#include <task.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

extern "C" {
    void vAssertCalled(const char *file, int line) {
        fprintf(stderr, "Assertion failed in file %s:%d\n", file, line);
        abort();
    }
}

int main() {
    TaskHandle_t task;
    xTaskCreate(
        [](void *param) {
            printf("FreeRTOS scheduler started\n");
            vTaskDelay(pdMS_TO_TICKS(1000));
            printf("Task Done, ending scheduler\n");
            vTaskEndScheduler();
            assert(false && "After scheduler ended (SHOULD NOT GET HERE)");
        }, "start", 10000, nullptr, 1, &task);
    printf("Starting FreeRTOS scheduler\n");
    vTaskStartScheduler();
    printf("FreeRTOS scheduler exited\n");
    vTaskDelete(task);
    printf("Task deleted\n");
}

If I set breakpoints on the vTaskDelay call and on the final printf("Task deleted\n"), I usually see the expected threads:

Process 86424 stopped
  thread #1: tid = 0x54bdc87, 0x00000001815da960 libsystem_kernel.dylib`__sigwait + 8, name = 'Scheduler', queue = 'com.apple.main-thread'
* thread #2: tid = 0x54bdca1, 0x0000000100003ce4 freertos_posix_example`main::$_0::operator()(this=0x000000016fe86f97, param=0x0000000000000000) const at main.cpp:19:9, name = 'start', stop reason = breakpoint 2.1
  thread #3: tid = 0x54bdca2, 0x00000001815d26ec libsystem_kernel.dylib`__psynch_cvwait + 8
  thread #4: tid = 0x54bdca3, 0x00000001815d26ec libsystem_kernel.dylib`__psynch_cvwait + 8, name = 'Tmr Svc'
  thread #5: tid = 0x54bdca4, 0x00000001815d7720 libsystem_kernel.dylib`__pthread_kill + 8, name = 'Scheduler timer'
(lldb) th l
Process 86424 stopped
* thread #1: tid = 0x54bdc87, 0x0000000100003c68 freertos_posix_example`main at main.cpp:28:5, name = 'Scheduler', queue = 'com.apple.main-thread', stop reason = breakpoint 5.1

When it deadlocks, the threads look like this:

(lldb) th l
Process 98952 stopped
* thread #1: tid = 0x54c5baa, 0x00000001815da960 libsystem_kernel.dylib`__sigwait + 8, name = 'Scheduler', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0x54c5bbb, 0x00000001815d1bbc libsystem_kernel.dylib`__psynch_mutexwait + 8, name = 'start'
  thread #5: tid = 0x54c5bbe, 0x00000001815d24e8 libsystem_kernel.dylib`__semwait_signal + 8, name = 'Scheduler timer'
(lldb) t 2
* thread #2, name = 'start'
    frame #0: 0x00000001815d1bbc libsystem_kernel.dylib`__psynch_mutexwait + 8
libsystem_kernel.dylib`__psynch_mutexwait:
->  0x1815d1bbc <+8>:  b.lo   0x1815d1bdc    ; <+40>
    0x1815d1bc0 <+12>: pacibsp
    0x1815d1bc4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1815d1bc8 <+20>: mov    x29, sp
(lldb) bt
* thread #2, name = 'start'
  * frame #0: 0x00000001815d1bbc libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x000000018160d3f8 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x000000018160adbc libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 220
    frame #3: 0x000000010000f980 freertos_posix_example`event_signal(ev=0x0000600002748090) at wait_for_event.c:128:9
    frame #4: 0x000000010000f344 freertos_posix_example`vPortCancelThread(pxTaskToDelete=0x00006000022480b0) at port.c:534:5
    frame #5: 0x00000001000074c0 freertos_posix_example`prvDeleteTCB(pxTCB=0x00006000022480b0) at tasks.c:6463:9
    frame #6: 0x00000001000073bc freertos_posix_example`vTaskDelete(xTaskToDelete=0x00006000022480b0) at tasks.c:2337:13
    frame #7: 0x00000001000091ac freertos_posix_example`vTaskEndScheduler at tasks.c:3808:13
    frame #8: 0x0000000100003cfc freertos_posix_example`main::$_0::operator()(this=0x000000016fe86f97, param=0x0000000000000000) const at main.cpp:21:9
    frame #9: 0x0000000100003cb8 freertos_posix_example`main::$_0::__invoke(param=0x0000000000000000) at main.cpp:17:9
    frame #10: 0x000000010000ede8 freertos_posix_example`prvWaitForStart(pvParams=0x000000015801b858) at port.c:556:5
    frame #11: 0x00000001816102e4 libsystem_pthread.dylib`_pthread_start + 136
(lldb) t 5
* thread #5, name = 'Scheduler timer'
    frame #0: 0x00000001815d24e8 libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1815d24e8 <+8>:  b.lo   0x1815d2508    ; <+40>
    0x1815d24ec <+12>: pacibsp
    0x1815d24f0 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1815d24f4 <+20>: mov    x29, sp
(lldb) bt
* thread #5, name = 'Scheduler timer'
  * frame #0: 0x00000001815d24e8 libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001814b16f0 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x00000001814b1608 libsystem_c.dylib`usleep + 68
    frame #3: 0x000000010000f4e4 freertos_posix_example`prvTimerTickHandler(arg=0x0000000000000000) at port.c:463:9
    frame #4: 0x00000001816102e4 libsystem_pthread.dylib`_pthread_start + 136

@johnboiles (Contributor Author) commented Jan 29, 2025

It looks to me like it calls pthread_cancel on the idle thread but then gets stuck at event_signal( pxThreadToCancel->ev );. I wonder if there's something we need to do first to suspend the idle task.

(lldb) bt
* thread #2, name = 'start'
    frame #0: 0x00000001815d1bbc libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x000000018160d3f8 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x000000018160adbc libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 220
    frame #3: 0x000000010000f980 freertos_posix_example`event_signal(ev=0x0000600002748090) at wait_for_event.c:128:9
    frame #4: 0x000000010000f344 freertos_posix_example`vPortCancelThread(pxTaskToDelete=0x00006000022480b0) at port.c:534:5
    frame #5: 0x00000001000074c0 freertos_posix_example`prvDeleteTCB(pxTCB=0x00006000022480b0) at tasks.c:6463:9
    frame #6: 0x00000001000073bc freertos_posix_example`vTaskDelete(xTaskToDelete=0x00006000022480b0) at tasks.c:2337:13
  * frame #7: 0x00000001000091ac freertos_posix_example`vTaskEndScheduler at tasks.c:3808:13
    frame #8: 0x0000000100003cfc freertos_posix_example`main::$_0::operator()(this=0x000000016fe86f97, param=0x0000000000000000) const at main.cpp:21:9
    frame #9: 0x0000000100003cb8 freertos_posix_example`main::$_0::__invoke(param=0x0000000000000000) at main.cpp:17:9
    frame #10: 0x000000010000ede8 freertos_posix_example`prvWaitForStart(pvParams=0x000000015801b858) at port.c:556:5
    frame #11: 0x00000001816102e4 libsystem_pthread.dylib`_pthread_start + 136
(lldb) f 7
frame #7: 0x00000001000091ac freertos_posix_example`vTaskEndScheduler at tasks.c:3808:13
   3805	        /* Delete Idle tasks created by the kernel.*/
   3806	        for( xCoreID = 0; xCoreID < ( BaseType_t ) configNUMBER_OF_CORES; xCoreID++ )
   3807	        {
-> 3808	            vTaskDelete( xIdleTaskHandles[ xCoreID ] );
   3809	        }
   3810
   3811	        /* Idle task is responsible for reclaiming the resources of the tasks in
(lldb)
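One general pthreads pitfall produces this shape of hang: pthread_cond_wait() is a cancellation point, and a thread cancelled inside it re-acquires the mutex before its cleanup handlers run. If no handler unlocks the mutex, the next pthread_mutex_lock() on it, for example from a signaling function, blocks forever. Whether wait_for_event.c actually hits this is not established in this thread. The sketch below uses hypothetical names (canceldemo, waiter, cancelWaiterLeavesMutexFree) and shows the pattern with a cleanup handler that releases the mutex.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>

namespace canceldemo {

struct Shared {
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    bool waiting = false;   // set by the waiter while it holds the mutex
    bool signaled = false;  // never set here; the waiter is only cancelled
};

// Cleanup handler: releases the mutex that pthread_cond_wait re-acquired.
void unlockMutex(void *m) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t *>(m));
}

void *waiter(void *arg) {
    auto *s = static_cast<Shared *>(arg);
    pthread_mutex_lock(&s->mutex);
    pthread_cleanup_push(unlockMutex, &s->mutex);
    s->waiting = true;
    while (!s->signaled) {
        pthread_cond_wait(&s->cond, &s->mutex);  // cancellation point
    }
    pthread_cleanup_pop(1);  // the normal exit path also unlocks
    return nullptr;
}

// Cancels a thread blocked in pthread_cond_wait and reports whether the
// mutex is free afterwards, i.e. whether a later lock would not block.
bool cancelWaiterLeavesMutexFree() {
    Shared s;
    pthread_t t;
    int rc = pthread_create(&t, nullptr, waiter, &s);
    assert(rc == 0);
    (void)rc;

    // Once we can take the mutex and see waiting == true, the waiter has
    // released it inside pthread_cond_wait.
    for (;;) {
        pthread_mutex_lock(&s.mutex);
        bool ready = s.waiting;
        pthread_mutex_unlock(&s.mutex);
        if (ready) {
            break;
        }
        sched_yield();
    }

    pthread_cancel(t);
    void *result = nullptr;
    pthread_join(t, &result);

    bool mutexFree = pthread_mutex_trylock(&s.mutex) == 0;
    if (mutexFree) {
        pthread_mutex_unlock(&s.mutex);
    }
    return result == PTHREAD_CANCELED && mutexFree;
}

}  // namespace canceldemo
```

Without the pthread_cleanup_push/pthread_cleanup_pop pair, the cancelled waiter would exit still owning the mutex, and the final trylock would fail.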

@johnboiles (Contributor Author) commented Jan 29, 2025

Also, I don't know if it's related, but I've never been able to start and stop the FreeRTOS POSIX simulator multiple times in the same process. If I wrap the contents of my main() above in a for loop, it never gets past the second call to vTaskDelay, and a breakpoint on vPortSystemTickHandler never fires, meaning tick handling has stopped. Let me know if you think that's related; I'm happy to provide more details. It would be nice for unit tests to be able to start and stop FreeRTOS in the same process!

@johnboiles (Contributor Author) commented Jan 29, 2025

Ah, I'm on macOS, so I don't get the benefits of #1233 :(
