Segmentation faults with Structural Plasticity in NEST v3.6 onwards #3394

Open
neuroady opened this issue Jan 15, 2025 · 0 comments

This issue has been opened in reference to a mailing list post. Details about the original post can be found here.

Using structural plasticity (SP) in MPI-based simulations leads to spontaneous crashes in NEST v3.6 and later.

To Reproduce
Steps to reproduce the behavior:

  1. Create an MPI-based script that demonstrates structural plasticity.
  2. Run the script with 32 or more MPI processes.
    • Runs with fewer MPI processes can also end in a segmentation fault.
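
For readers without access to the original script, the steps above can be sketched roughly as follows. This is a hypothetical outline, not the minimal.py referenced in this issue: the function names, the synapse label `sp_synapse`, the element names `Axon_ex` and `Den_ex`, and all numeric values are assumptions chosen for illustration, and whether this particular configuration triggers the crash has not been verified.

```python
"""Hypothetical structural-plasticity sketch; NOT the minimal.py from this issue."""


def build_sp_config(update_interval=10000):
    """Return (kernel settings, synaptic elements). All values are illustrative."""
    growth_curve = {
        "growth_curve": "gaussian",
        "growth_rate": 0.0001,
        "continuous": False,
        "eta": 0.0,
        "eps": 0.05,
    }
    kernel = {
        "structural_plasticity_update_interval": update_interval,
        "structural_plasticity_synapses": {
            # "sp_synapse" is an arbitrary label chosen for this sketch.
            "sp_synapse": {
                "synapse_model": "static_synapse",
                "pre_synaptic_element": "Axon_ex",
                "post_synaptic_element": "Den_ex",
            }
        },
    }
    # Every neuron gets one axonal and one dendritic element type.
    elements = {"Axon_ex": dict(growth_curve), "Den_ex": dict(growth_curve)}
    return kernel, elements


def run(n_neurons=1000, sim_time=10000.0):
    """Build and simulate the network; requires a NEST installation with MPI."""
    import nest  # imported lazily so the config helper works without NEST

    nest.ResetKernel()
    kernel, elements = build_sp_config()
    nest.SetKernelStatus(kernel)

    neurons = nest.Create(
        "iaf_psc_alpha", n_neurons, params={"synaptic_elements": elements}
    )
    # External drive so that neurons spike and events are delivered.
    noise = nest.Create("poisson_generator", params={"rate": 10000.0})
    nest.Connect(noise, neurons, "all_to_all", {"weight": 20.0, "delay": 1.0})

    nest.EnableStructuralPlasticity()
    nest.Simulate(sim_time)


if __name__ == "__main__":
    run()
```

Such a script would be started through the site's MPI launcher with the desired number of processes. The srun lines in the log below suggest Slurm was used on JUSUF, but the exact command line is not given in this report.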

Expected behavior
The simulation crashes, and the error output indicates that a segmentation fault has occurred.

  • The stderr dump from minimal.py on NEST v3.6 with 32 MPI processes is shown below:

[jsfc114:24182:0:24182] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7473)
==== backtrace (tid:  24182) ====
0 0x000000000003e6f0 __GI___sigaction()  :0
1 0x0000000000655387 nest::Connector<nest::static_synapse<nest::TargetIdentifierPtrRport> >::send()  ???:0
2 0x000000000043cfd6 nest::EventDeliveryManager::deliver_events_<nest::SpikeData>()  event_delivery_manager.cpp:0
3 0x000000000043f29f nest::EventDeliveryManager::deliver_events()  ???:0
4 0x000000000040abaa nest::SimulationManager::update_()  simulation_manager.cpp:0
5 0x00000000000156e6 GOMP_parallel()  /dev/shm/swmanage/jusuf/GCCcore/12.3.0/system-system/gcc-12.3.0/stage3_obj/x86_64-pc-linux-gnu/libgomp/../../../libgomp/parallel.c:178
6 0x00000000000156e6 GOMP_parallel_end()  /dev/shm/swmanage/jusuf/GCCcore/12.3.0/system-system/gcc-12.3.0/stage3_obj/x86_64-pc-linux-gnu/libgomp/../../../libgomp/parallel.c:140
7 0x00000000000156e6 GOMP_parallel()  /dev/shm/swmanage/jusuf/GCCcore/12.3.0/system-system/gcc-12.3.0/stage3_obj/x86_64-pc-linux-gnu/libgomp/../../../libgomp/parallel.c:179
8 0x000000000040c067 nest::SimulationManager::update_()  ???:0
9 0x000000000040c96c nest::SimulationManager::call_update_()  ???:0
10 0x0000000000411129 nest::SimulationManager::run()  ???:0
11 0x00000000003f5d7d nest::run()  ???:0
12 0x00000000003f5e51 nest::simulate()  ???:0
13 0x00000000003b1836 nest::NestModule::SimulateFunction::execute()  ???:0
14 0x00000000000bac21 SLIInterpreter::execute_()  interpret.cc:0
15 0x0000000000030d04 __pyx_pw_12pynestkernel_10NESTEngine_9run()  pynestkernel.cxx:0
16 0x00000000001d5e9c _PyEval_EvalFrameDefault()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:5225
17 0x00000000001d5e9c _PyEval_EvalFrameDefault()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:5226
18 0x00000000001ce50a _PyEval_EvalFrame()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/./Include/internal/pycore_ceval.h:73
19 0x00000000001ce50a _PyEval_Vector()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:6443
20 0x00000000001d6c3a _PyEval_EvalFrameDefault()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:5380
21 0x00000000001ce50a _PyEval_EvalFrame()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/./Include/internal/pycore_ceval.h:73
22 0x00000000001ce50a _PyEval_Vector()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:6443
23 0x00000000002562e1 PyEval_EvalCode()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/ceval.c:1154
24 0x0000000000273443 run_eval_code_obj()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:1714
25 0x000000000026fbaa run_mod()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:1735
26 0x00000000002851e1 pyrun_file()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:1630
27 0x0000000000284054 _PyRun_SimpleFileObject()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:440
28 0x0000000000283c24 _PyRun_AnyFileObject()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Python/pythonrun.c:79
29 0x000000000027df4c pymain_run_file_obj()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:360
30 0x000000000027df4c pymain_run_file()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:379
31 0x000000000027df4c pymain_run_python()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:601
32 0x000000000027df4c Py_RunMain()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:680
33 0x0000000000246c67 Py_BytesMain()  /dev/shm/swmanage/jusuf/Python/3.11.3/GCCcore-12.3.0/Python-3.11.3/Modules/main.c:734
34 0x0000000000029590 __libc_start_call_main()  ???:0
35 0x0000000000029640 __libc_start_main_alias_2()  :0
36 0x000000000040106e _start()  ???:0
=================================
<PSP:r0000028:Backtrace after SIGSEGV (Invalid memory reference):>
<PSP:r0000028:# 0: /p/software/jusuf/stages/2024/software/pscom/5-default-GCCcore-12.3.0/lib/libpscom.so.2(+0xb4e4) [0x1529ccad14e4]>
<PSP:r0000028:# 1: /usr/lib64/libc.so.6(+0x3e6f0) [0x152a4963e6f0]>
<PSP:r0000028:# 2: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest9ConnectorINS_14static_synapseINS_24TargetIdentifierPtrRportEEEE4sendEmmRKSt6vectorIPNS_14ConnectorModelESaIS7_EERNS_5EventE+0x87) [0x152a3bc69387]>
<PSP:r0000028:# 3: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(+0x43cfd6) [0x152a3ba50fd6]>
<PSP:r0000028:# 4: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest20EventDeliveryManager14deliver_eventsEm+0x6f) [0x152a3ba5329f]>
<PSP:r0000028:# 5: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(+0x40abaa) [0x152a3ba1ebaa]>
<PSP:r0000028:# 6: /p/software/jusuf/stages/2024/software/GCCcore/12.3.0/lib64/libgomp.so.1(GOMP_parallel+0x46) [0x152a406b06e6]>
<PSP:r0000028:# 7: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager7update_Ev+0x197) [0x152a3ba20067]>
<PSP:r0000028:# 8: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager12call_update_Ev+0x5dc) [0x152a3ba2096c]>
<PSP:r0000028:# 9: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest17SimulationManager3runERKNS_4TimeE+0x339) [0x152a3ba25129]>
<PSP:r0000028:#10: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest3runERKd+0x9d) [0x152a3ba09d7d]>
<PSP:r0000028:#11: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZN4nest8simulateERKd+0x11) [0x152a3ba09e51]>
<PSP:r0000028:#12: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libnest.so.3(_ZNK4nest10NestModule16SimulateFunction7executeEP14SLIInterpreter+0x36) [0x152a3b9c5836]>
<PSP:r0000028:#13: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/../../../nest/libsli.so.3(_ZN14SLIInterpreter8execute_Em+0x201) [0x152a3b041c21]>
<PSP:r0000028:#14: /p/software/jusuf/stages/2024/software/nest-simulator/3.6-gpsmpi-2023a/lib/python3.11/site-packages/nest/pynestkernel.so(+0x30d04) [0x152a3c1cfd04]>
<PSP:r0000028:#15: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x41bc) [0x152a49c2ce9c]>
<PSP:r0000028:#16: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(+0x1ce50a) [0x152a49c2550a]>
<PSP:r0000028:#17: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x4f5a) [0x152a49c2dc3a]>
<PSP:r0000028:#18: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(+0x1ce50a) [0x152a49c2550a]>
<PSP:r0000028:#19: /p/software/jusuf/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/libpython3.11.so.1.0(PyEval_EvalCode+0xa1) [0x152a49cad2e1]>
readFromPMIClient: lost connection to the PMI client
kvsprovider[23316]: releaseMySelf: wrong message type 3 (PSP_CD_CLIENTREFUSED)
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
readFromPMIClient: lost connection to the PMI client
srun: error: jsfc114: tasks 0-27,29-31: Terminated
srun: error: jsfc114: task 28: Exited with exit code 1
srun: Force Terminated StepId=659919.0
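
When comparing dumps from several runs, it can help to pull out the first NEST frame of the backtrace and the task that exited abnormally. The helper below is a small sketch for that purpose. Its function names and regular expressions are assumptions fitted to the log format shown above and may need adjusting for other systems.

```python
import re

# Matches backtrace lines such as
# "1 0x0000000000655387 nest::Connector<...>::send()  ???:0"
FRAME_RE = re.compile(r"^\s*(\d+)\s+0x[0-9a-f]+\s+(.+?)\(\)\s+(\S+)$")

# Matches srun lines such as
# "srun: error: jsfc114: task 28: Exited with exit code 1"
EXIT_RE = re.compile(r"task (\d+): Exited with exit code (\d+)")


def first_nest_frame(lines):
    """Return (frame number, symbol) of the first frame in the nest:: namespace."""
    for line in lines:
        match = FRAME_RE.match(line)
        if match and match.group(2).startswith("nest::"):
            return int(match.group(1)), match.group(2)
    return None


def exited_tasks(lines):
    """Return a list of (task id, exit code) pairs reported by srun."""
    found = []
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


if __name__ == "__main__":
    import sys

    log = sys.stdin.read().splitlines()
    print(first_nest_frame(log))
    print(exited_tasks(log))
```

Applied to the dump above, the first NEST frame is the `send()` call of the `static_synapse` connector, and the only task reported with a non-zero exit code is task 28.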

Desktop/Environment (please complete the following information):

  • OS: Linux 5.4.0-204-generic x86_64; HPCs (NEMO, JUSUF)
  • Shell: bash
  • Python-Version: Python 3.8.10, Python 3.9.7 :: Intel Corporation, Python 3.12.3
  • NEST-Version: v3.6, v3.8
  • Installation: using cmake with MPI support