Race condition in tracemalloc causes segfaults #128679

Open
itamarst opened this issue Jan 9, 2025 · 4 comments

Labels
type-crash A hard crash of the interpreter, possibly with a core dump

Comments

itamarst commented Jan 9, 2025

Crash report

What happened?

This is a bit of a tricky situation, but it is real and it is impacting my ability to use tracemalloc. As background, I've added code to Polars to make it record all of its allocations in tracemalloc, and this is enabled in debug builds. This then allows writing unit tests that check memory usage, which is very useful for verifying that high memory usage gets fixed and for making sure it doesn't regress.
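
For context, the integration is essentially a thin wrapper around the library's allocator that reports every allocation to tracemalloc under a custom domain. A rough sketch of that pattern (illustrative only, not the actual Polars code, and with an arbitrary domain number) looks like this:

#include <Python.h>
#include <stdint.h>
#include <stdlib.h>

#define MY_DOMAIN 123

static void *
traced_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        /* Returns 0 on success, -2 if tracemalloc is not tracing. */
        PyTraceMalloc_Track(MY_DOMAIN, (uintptr_t)ptr, size);
    }
    return ptr;
}

static void
traced_free(void *ptr)
{
    if (ptr != NULL) {
        PyTraceMalloc_Untrack(MY_DOMAIN, (uintptr_t)ptr);
    }
    free(ptr);
}

The important detail is that these calls can come from threads that Python did not create and that do not hold the GIL.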

Unfortunately, I'm hitting a situation where tracemalloc causes segfaults in multi-threaded code. I believe this is a race condition between PyTraceMalloc_Track() being called in a new non-Python thread that does not hold the GIL, and tracemalloc.stop() being called in another thread. My hypothesis in detail:

  • Thread 1 does tracemalloc.start().
  • Thread 2 does not hold the GIL.
  • Thread 2 calls PyTraceMalloc_Track().
  • Thread 2 checks if tracemalloc is enabled, the answer is yes.
  • Thread 2 starts acquiring the GIL (see the PyTraceMalloc_Track implementation). As part of acquiring the GIL, memory needs to be allocated.
  • Thread 2 therefore once again checks if tracemalloc is enabled (I haven't actually read the source code enough to verify if this part is true, but the core dump suggests it probably is). The answer is yes.
  • At this point, Thread 1 calls tracemalloc.stop().
  • Thread 2 continues as if tracemalloc were enabled, but it is not!
  • BOOM

If this hypothesis is correct, one solution would be for GIL acquisition to bypass tracemalloc altogether when it allocates; it's not like it allocates a lot of memory, so not tracking it is fine. This may be difficult in practice, so another approach would be to add an additional lock so there's no race condition around checking whether tracemalloc is enabled.
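
To illustrate the second approach, here is a rough, untested sketch of what PyTraceMalloc_Track() could look like if both it and tracemalloc.stop() used the same lock around the tracing flag. It reuses the existing internal names from Python/tracemalloc.c (TABLES_LOCK(), tracemalloc_config, tracemalloc_add_trace()) and is only meant to show the shape of the idea, not to be a working patch:

int
PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr, size_t size)
{
    int res;

    if (!tracemalloc_config.tracing) {
        /* tracing is not enabled: nothing to record */
        return -2;
    }

    PyGILState_STATE gil_state = PyGILState_Ensure();

    TABLES_LOCK();
    if (!tracemalloc_config.tracing) {
        /* tracemalloc.stop() ran while we were acquiring the GIL */
        res = -2;
    }
    else {
        res = tracemalloc_add_trace(domain, ptr, size);
    }
    TABLES_UNLOCK();

    PyGILState_Release(gil_state);
    return res;
}

The point is that the final check happens under the same lock that stop() would have to take before clearing the flag, so there is no window between "tracing is enabled" and "add the trace".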

Here is a stack trace from a core dump produced by the reproducer (see below), which led me to the above hypothesis:

#0  0x00000000005f0482 in traceback_new () at ../Python/tracemalloc.c:371
#1  0x00000000005f0083 in tracemalloc_add_trace (domain=0, ptr=132539872185888, size=336) at ../Python/tracemalloc.c:471
#2  0x00000000006930db in tracemalloc_alloc (elsize=336, nelem=<optimized out>, ctx=0x9ec4f8 <_PyRuntime+10136>, use_calloc=<optimized out>) at ../Python/tracemalloc.c:541
#3  tracemalloc_raw_alloc (use_calloc=<optimized out>, ctx=0x9ec4f8 <_PyRuntime+10136>, nelem=<optimized out>, elsize=336) at ../Python/tracemalloc.c:715
#4  0x0000000000639a52 in alloc_threadstate (interp=0x9ff6b0 <_PyRuntime+88400>) at ../Python/pystate.c:1452
#5  new_threadstate (interp=0x9ff6b0 <_PyRuntime+88400>, whence=4) at ../Python/pystate.c:1563
#6  0x0000000000480ef0 in PyGILState_Ensure () at ../Python/pystate.c:2766
#7  0x00000000005203fc in PyTraceMalloc_Track (domain=123, ptr=10, size=1) at ../Python/tracemalloc.c:1318
#8  0x0000788b7088774c in tracemalloc_repro::in_thread () at src/lib.rs:23

To run the reproducer, you will need to pip install rustimport and have Rust installed. (I tried with Cython, had a hard time, and gave up.)

Here's the Python file:

import gc
import time
import tracemalloc

import rustimport.import_hook

# rustimport will automatically compile tracemalloc_repro.rs into an extension:
import tracemalloc_repro

tracemalloc.start()
# This launches ~50 threads that run in parallel to this one:
tracemalloc_repro.run()
# This happens in parallel to the new threads running:
tracemalloc.stop()
gc.collect()

time.sleep(10)

And here is the Rust file; save it as tracemalloc_repro.rs:

// rustimport
//: [package]
//: name = "tracemalloc_repro"
//: version = "0.1.0"
//: edition = "2021"
//:
//: [lib]
//: name = "tracemalloc_repro"
//: crate-type = ["cdylib"]
//:
//: [dependencies]
//: pyo3 = {version = "0.23", features = ["extension-module"]}
//: libc = "0.2"

use pyo3::prelude::*;
use libc::{c_int, c_uint, size_t, uintptr_t};

extern "C" {
    fn PyTraceMalloc_Track(domain: c_uint, ptr: uintptr_t, size: size_t) -> c_int;
}

fn in_thread() {
    let result = unsafe { PyTraceMalloc_Track(123, 10, 1) };
    println!("Result of tracking: {result}");
}

#[pyfunction]
fn run(py: Python) {
    py.allow_threads(|| {
        // With GIL released, run function in a thread.
        for _ in 0..50 {
            let _ = std::thread::spawn(in_thread);
        }
    });
}

#[pymodule]
fn tracemalloc_repro(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(run, m)?)?;
    Ok(())
}

You can reproduce by saving the Python code above as repro.py and running it. Because this is a race condition, you may need to run it a few times; I had more consistent crashes with Python 3.12, but it does crash on Python 3.13 as well. You may need to tweak the thread count (50) above to make it happen.

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.1 (main, Dec 4 2024, 08:54:15) [GCC 13.2.0]

Linked PRs

itamarst added the type-crash label Jan 9, 2025
@tom-pytel

Thread 2 therefore once again checks if tracemalloc is enabled (I haven't actually read the source code enough to verify if this part is true, but the core dump suggests it probably is). The answer is yes.

PyTraceMalloc_Track() in thread 2 does not in fact currently re-check whether tracemalloc is enabled after acquiring the GIL, but that is not the problem here. The problem is that tracing is turned off at some point WHILE PyGILState_Ensure() is being processed, after it has already decided to call the tracemalloc allocators. A simple quick solution is to add one final check before attempting ADD_TRACE() and just skip adding the trace as an emergency fallback if tracing has been turned off in this critical interval. This is just a hack to start from; a better fix would be to properly delineate and lock this critical interval. I'm also including a C extension module which seems to reproduce the problem 100% of the time:

mymod.c:

#include <pthread.h>

#include "Python.h"

void *
threadfunc(void *p) {
    PyTraceMalloc_Track(123, 10, 1);

    return NULL;
}

static PyObject *
test(PyObject *self, PyObject *args) {
    for (int i = 0; i < 50; i++) {
        pthread_t p;

        if (pthread_create(&p, NULL, threadfunc, NULL) != 0)
            break;

        pthread_detach(p);
    }

    Py_RETURN_NONE;
}

static PyMethodDef module_methods[] = {
    {"test", test, METH_NOARGS},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef module_definition = {
    PyModuleDef_HEAD_INIT,
    "mymod",
    "test module",
    -1,
    module_methods
};

PyMODINIT_FUNC
PyInit_mymod(void) {
    return PyModule_Create(&module_definition);
}

test_mymod.py:

import gc
import time
import tracemalloc

import mymod

tracemalloc.start()
mymod.test()
tracemalloc.stop()
gc.collect()

print('Waiting...')
time.sleep(10)
print('Done.')

Here are the changes as a starting point for the eventual fix:

$ git diff main
diff --git a/Python/tracemalloc.c b/Python/tracemalloc.c
index f661d69c03..d2e3dfc53b 100644
--- a/Python/tracemalloc.c
+++ b/Python/tracemalloc.c
@@ -538,11 +538,13 @@ tracemalloc_alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize)
         return NULL;
 
     TABLES_LOCK();
-    if (ADD_TRACE(ptr, nelem * elsize) < 0) {
-        /* Failed to allocate a trace for the new memory block */
-        TABLES_UNLOCK();
-        alloc->free(alloc->ctx, ptr);
-        return NULL;
+    if (tracemalloc_config.tracing) {
+        if (ADD_TRACE(ptr, nelem * elsize) < 0) {
+            /* Failed to allocate a trace for the new memory block */
+            TABLES_UNLOCK();
+            alloc->free(alloc->ctx, ptr);
+            return NULL;
+        }
     }
     TABLES_UNLOCK();
     return ptr;
@@ -963,8 +965,11 @@ _PyTraceMalloc_Stop(void)
     if (!tracemalloc_config.tracing)
         return;
 
-    /* stop tracing Python memory allocations */
+    /* stop tracing Python memory allocations,
+       but not while something might be in the middle of an operation */
+    TABLES_LOCK();
     tracemalloc_config.tracing = 0;
+    TABLES_UNLOCK();
 
     /* unregister the hook on memory allocators */
 #ifdef TRACE_RAW_MALLOC
@@ -1317,6 +1322,12 @@ PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
 
     gil_state = PyGILState_Ensure();
 
+    if (!tracemalloc_config.tracing) {
+        /* tracing may have been turned off as we were acquiring the GIL */
+        PyGILState_Release(gil_state);
+        return -2;
+    }
+
     TABLES_LOCK();
     res = tracemalloc_add_trace(domain, ptr, size);
     TABLES_UNLOCK();

As stated, the C extension module reproduces the problem 100% of the time and this patch appears to fix it 100% of the time, but the person in charge of tracemalloc should really have a look at this.

itamarst commented Jan 9, 2025

Thank you for the quick response and consistent reproducer!

tomasr8 commented Jan 9, 2025

@tom-pytel I can confirm that your patch appears to fix the problem. Using the tables_lock to protect tracemalloc_config.tracing seems reasonable to me given that we cannot rely on the GIL yet, though I'm not an expert in this area.
Would you like to open a PR with the proposed changes?

@tom-pytel

Would you like to open a PR with the proposed changes?

Done.
