Race condition in tracemalloc causes segfaults #128679
#include <pthread.h>
#include "Python.h"

void *
threadfunc(void *p) {
    PyTraceMalloc_Track(123, 10, 1);
    return NULL;
}

static PyObject *
test(PyObject *self, PyObject *args) {
    for (int i = 0; i < 50; i++) {
        pthread_t p;
        if (pthread_create(&p, NULL, threadfunc, NULL) != 0)
            break;
        pthread_detach(p);
    }
    Py_RETURN_NONE;
}

static PyMethodDef module_methods[] = {
    {"test", test, METH_NOARGS},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef module_definition = {
    PyModuleDef_HEAD_INIT,
    "mymod",
    "test module",
    -1,
    module_methods
};

PyMODINIT_FUNC
PyInit_mymod(void) {
    return PyModule_Create(&module_definition);
}
import gc
import time
import tracemalloc
import mymod
tracemalloc.start()
mymod.test()
tracemalloc.stop()
gc.collect()
print('Waiting...')
time.sleep(10)
print('Done.')

Here are the changes as a starting point for the eventual fix:

$ git diff main
diff --git a/Python/tracemalloc.c b/Python/tracemalloc.c
index f661d69c03..d2e3dfc53b 100644
--- a/Python/tracemalloc.c
+++ b/Python/tracemalloc.c
@@ -538,11 +538,13 @@ tracemalloc_alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize)
         return NULL;
     TABLES_LOCK();
-    if (ADD_TRACE(ptr, nelem * elsize) < 0) {
-        /* Failed to allocate a trace for the new memory block */
-        TABLES_UNLOCK();
-        alloc->free(alloc->ctx, ptr);
-        return NULL;
+    if (tracemalloc_config.tracing) {
+        if (ADD_TRACE(ptr, nelem * elsize) < 0) {
+            /* Failed to allocate a trace for the new memory block */
+            TABLES_UNLOCK();
+            alloc->free(alloc->ctx, ptr);
+            return NULL;
+        }
     }
     TABLES_UNLOCK();
     return ptr;
@@ -963,8 +965,11 @@ _PyTraceMalloc_Stop(void)
     if (!tracemalloc_config.tracing)
         return;
-    /* stop tracing Python memory allocations */
+    /* stop tracing Python memory allocations,
+       but not while something might be in the middle of an operation */
+    TABLES_LOCK();
     tracemalloc_config.tracing = 0;
+    TABLES_UNLOCK();
     /* unregister the hook on memory allocators */
 #ifdef TRACE_RAW_MALLOC
@@ -1317,6 +1322,12 @@ PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
     gil_state = PyGILState_Ensure();
+    if (!tracemalloc_config.tracing) {
+        /* tracing may have been turned off as we were acquiring the GIL */
+        PyGILState_Release(gil_state);
+        return -2;
+    }
+
     TABLES_LOCK();
     res = tracemalloc_add_trace(domain, ptr, size);
     TABLES_UNLOCK();

As stated, the C extension module reproduces the problem 100% of the time and this fix appears to fix it 100% of the time but the person in charge of tracemalloc should really have a look at this.
Thank you for the quick response and consistent reproducer!
@tom-pytel I can confirm that your patch appears to fix the problem. Using the
Done.
Crash report
What happened?
This is a bit of a tricky situation, but it is real and impacting my ability to use tracemalloc. As background, I've added code to Polars to make it record all of its allocations in tracemalloc, and this is enabled in debug builds. This then allows writing unit tests that check memory usage, which is very useful in ensuring high memory usage is fixed, and making sure it doesn't get high again.
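For readers unfamiliar with this kind of test, here is a minimal sketch of a memory-usage check built only on the standard tracemalloc API. The helper name, the workload, and the 20 MB budget are hypothetical and are not taken from Polars.

```python
import tracemalloc


def peak_traced_bytes(func):
    """Run func() under tracemalloc and return the peak traced size in bytes."""
    tracemalloc.start()
    try:
        tracemalloc.reset_peak()  # available since Python 3.9
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_buffer():
    # Hypothetical workload: allocate roughly 10 MB in a single block.
    return bytearray(10_000_000)


peak = peak_traced_bytes(build_buffer)
# A real test would assert an upper bound; this budget is an arbitrary example.
assert peak < 20_000_000
```

Note that the helper calls tracemalloc.stop() after every measurement, which is the call that races with PyTraceMalloc_Track() in this report.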
Unfortunately, I'm hitting a situation where tracemalloc causes segfaults in multi-threaded situations. I believe that this is a race condition between PyTraceMalloc_Track() in a new non-Python thread that does not hold the GIL, and tracemalloc.stop() being called in another thread. My hypothesis in detail:
... tracemalloc.start().
... tracemalloc.stop().
If this hypothesis is correct, the solution would be for GIL acquisition to bypass tracemalloc altogether if it allocates; it's not like it allocates a lot of memory, so not tracking it is fine. This may be difficult in practice, so another approach would involve having an additional lock so there's no race condition around checking if tracemalloc is enabled.
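The "additional lock" idea can be illustrated with a toy model in pure Python. This is only a sketch of the check-under-lock pattern, not CPython's implementation: the Tracer class and its attributes are hypothetical stand-ins for the tracing flag and the trace tables.

```python
import threading


class Tracer:
    """Toy stand-in for a tracer with an on/off flag and a trace table."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the tables lock
        self.tracing = False
        self.traces = None

    def start(self):
        with self._lock:
            self.traces = {}
            self.tracing = True

    def stop(self):
        # Flip the flag and drop the table under the lock, so no tracker
        # can be halfway through an update while the table goes away.
        with self._lock:
            self.tracing = False
            self.traces = None

    def track(self, ptr, size):
        # Check the flag only once the lock is held; a check made before
        # taking the lock could be stale by the time the table is touched.
        with self._lock:
            if not self.tracing:
                return -2  # "tracing is disabled" code
            self.traces[ptr] = size
            return 0


tracer = Tracer()
tracer.start()
results = []


def worker(i):
    results.append(tracer.track(i, 10))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
tracer.stop()  # races with the workers, as in the report
for t in threads:
    t.join()
```

Whichever thread wins the race, every call either records its trace or returns -2; none touches a table that has already been dropped.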
Here is a stack trace from a coredump from the reproducer (see below) that led me to the above hypothesis:
To run the reproducer you will need to pip install rustimport and have Rust installed. (I tried with Cython, had a hard time, gave up.)
Here's the Python file:
And here is the Rust file, you should call it tracemalloc_repro.rs:
You can reproduce by calling repro.py. Because this is a race condition, you may need to run it a few times; I had more consistent crashes with Python 3.12, but it does crash on Python 3.13. You may need to tweak the number 50 above to make it happen.
CPython versions tested on:
3.12, 3.13
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.13.1 (main, Dec 4 2024, 08:54:15) [GCC 13.2.0]
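Since the crash is a race and the reproducer may need several runs, a small driver can repeat it and count how many runs die from a signal. This is a sketch under assumptions: the function name is hypothetical, and it relies on the POSIX convention that subprocess reports death by signal N as return code -N.

```python
import subprocess
import sys


def count_crashes(cmd, runs=20):
    """Run cmd `runs` times; return how many runs were killed by a signal."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means the child died from
        # signal N (SIGSEGV is 11 on Linux).
        if proc.returncode < 0:
            crashes += 1
    return crashes


# Example (assumes repro.py is in the current directory):
# count_crashes([sys.executable, "repro.py"])
```

Each run of a reproducer that sleeps for 10 seconds takes at least that long, so a smaller `runs` value may be more practical.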