back_spawn tests failing with SEGFAULT #18
Thank you very much for taking the time to write this. I tested NECI, commit 558e88c, by compiling it with GCC 12.3.0 and OpenMPI 4.2.5, on both Intel and AMD hardware. I could not reproduce the issue. I compile the program by using:
Could you please tell me the following info?
@demetriovilardi Thank you for looking into this. We saw the segfaults happen on both RHEL 8.8 and 9.4 systems, across a range of CPU types (different generations of Intel & AMD). We build NECI via EasyBuild, which uses the following
In addition, we set
Just a quick comment on this one.¹ NECI anyway uses
@demetriovilardi it seems like a good idea to also add
¹ I changed groups and am no longer one of the maintainers.
This has grown historically, I guess:
After analysing the test with valgrind, we noticed a missing initialisation of a variable inside the test
@demetriovilardi Can you point us to the details of the fix, so we can apply a patch for this and have a test suite that passes in full?
Essentially, inside
one line after allocation in both cases. After this change I get no further valgrind warnings or errors.
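For readers unfamiliar with this class of bug, the general pattern (initialising an allocatable array directly after allocating it) can be sketched as below. This is only a minimal, hypothetical illustration: the program and the names `occ_list` and `n_el` are made up for the example and are not taken from NECI or from the actual fix.

```fortran
program init_after_allocate
    ! Hypothetical illustration; none of these names come from NECI.
    implicit none
    integer, allocatable :: occ_list(:)
    integer :: n_el, ierr, i

    n_el = 4

    allocate(occ_list(n_el), stat=ierr)
    if (ierr /= 0) error stop "allocation failed"

    ! allocate() does not define the contents of the array. Without
    ! this line, reading an entry that was never assigned is undefined
    ! behaviour; if such a value is later used as an index, the access
    ! can land outside valid memory and trigger a SIGSEGV.
    occ_list = 0

    ! Only every second entry is assigned here; entries 2 and 4 keep
    ! the default value set above.
    do i = 1, n_el, 2
        occ_list(i) = i
    end do

    print '(4(i0,1x))', occ_list
end program init_after_allocate
```

Whether an uninitialised read actually crashes depends on what happens to be in memory, which may explain why such a bug shows up with one compiler version and not another. Building with gfortran's `-g -fcheck=all` and running under `valgrind --track-origins=yes` are common ways to locate reads of this kind.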
@PetrKralCZ Can you look into making a patch file for NECI based on this info, and see if that fixes the segfaults we were seeing in the test suite?
When building the latest version of NECI (commit 558e88c) with GCC 12.3.0, we're consistently seeing segfaults in the tests, even when a lot of memory is available on the system (~185GB, with 36 cores).
More details on test_neci_back_spawn_excit_gen below, incl. backtrace. The actual full error is "Program received signal SIGSEGV: Segmentation fault - invalid memory reference." (other tests fail in a very similar way, and always in back_spawn.F90:593):

Are there any known problems when NECI is built with a recent compiler?

Any suggestions on how to get to the bottom of this? Could it be a bug in back_spawn.F90?

Although we've seen very similar problems before with an older version of NECI and GCC 11.3.0 (see easybuilders/easybuild-easyconfigs#17164), we didn't observe these problems when using GCC 12.2.0 to build NECI commit 558e88c, which seems strange to me...