Running NECI fails #13
Unfortunately I am having a hard time recreating your problem. I suppose you are running the test in […]. You could try something like the following (start in the root dir):
mkdir build && cd build # or wherever you prefer to build
cmake -DENABLE_HDF5=OFF -DCMAKE_BUILD_TYPE=Debug -DCMAKE_Fortran_COMPILER=mpifort -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx ..
cmake --build . -j -v --
ctest -j # you may wish to end this early if they are all passing so far; it is rather slow with Debug mode
cd ../test_suite/neci/parallel/N_FCIMCPar/
../../../../build/bin/neci neci.inp
At which step (if any) do you get an error, and what is it?
Currently, I have a bunch of compilation issues around routines incorrectly declared as pure:
1. Starting from src/lib/error_handling_neci.F90, line 8: the routine implementing the interface is definitely NOT pure, which is not allowed.
2. GCC apparently has a false positive in src/matmul.F90, lines 24/29, assuming that the variable I may be 1 in the second branch.
I finished the tests. A few executables are not created (including neci). The test summary is: Total Test time (real) = 5.06 sec. The following tests FAILED:
Sorry, could you please clarify how you were able to run it before? From your first post I was under the impression that it compiles but crashes at the start of the run (after some printout). I suppose you must have compiled differently; do you have those commands and/or your CMakeCache? The stop_all routine is written that way, with an interface, specifically as a trick that allows us to use it as a pure function; that shouldn't be the problem. What specifically is the compilation error you are getting?
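For readers unfamiliar with the trick being discussed, here is a minimal, self-contained sketch (type and routine names are invented, not the actual NECI code): an interface declares an external error routine as pure, so pure procedures may call it, while the actual implementation does I/O and stops the program. This is deliberately nonconforming, which is exactly why a strict compiler can complain.

```fortran
module error_demo
  implicit none
  interface
    ! The interface lies: it declares stop_all_demo as pure, so that
    ! pure code is allowed to call it. A strict compiler may reject
    ! this once it sees the non-pure implementation.
    pure subroutine stop_all_demo(msg)
      character(*), intent(in) :: msg
    end subroutine
  end interface
end module

! External implementation: deliberately NOT pure (it prints and stops).
subroutine stop_all_demo(msg)
  character(*), intent(in) :: msg
  write (*, *) 'Fatal error: ', msg
  stop 1
end subroutine

program demo
  use error_demo
  implicit none
  call check(-1)
contains
  ! A pure procedure can call stop_all_demo only because the interface
  ! claims purity; this is the "trick" described above.
  pure subroutine check(n)
    integer, intent(in) :: n
    if (n < 0) call stop_all_demo('negative input')
  end subroutine
end program
```

Because the interface and the implementation live in separate compilation units, most compilers cannot verify the purity claim at compile time, which is what makes the trick work in practice.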
I did more tests. Apparently, CMake sometimes picked up the wrong compilers. In a clean setup, I started with the Intel compilers (Intel oneAPI HPC kit, version 2021.10.0), which compiles cleanly, and most tests pass. Then I compiled with GCC 13.2 + OpenMPI 4.1.5, and the code does not compile because of the errors described above.
EDIT: It would be a pity if only the Intel compiler worked, as there is plenty of non-Intel hardware out there (clusters, notebooks, desktop machines, 8 out of the top 10 of the TOP500 list, 7 out of the top 10 of the GREEN500 list) on which the Intel compiler is either not expected to produce efficient code or not available at all. Which compiler do you commonly use to build the code?
Dear @fstein93, thank you for all these comments and sorry for our late reply.
Fully agree; we also use AMD hardware ourselves and strive to support as many compilers as possible. Regarding your second comment:
I agree, GCC 7 is not widely available (or even supported) anymore on most systems (my OS starts with GCC 10). I still double-checked the standard; you are right regarding […]. Unrelated to that, you might also consider switching to the modern […].
Another aspect you might consider is OpenMP parallelization. If I understand it correctly, your code is only MPI-parallelized. As such, you have to replicate the ERIs between all ranks, which becomes your memory bottleneck. With a hybrid MPI/OpenMP approach, you could reduce the memory requirements by using (ideally) only a single rank (or a few ranks) per node and letting the OpenMP threads share the ERIs (they do not change anyway).
Can you compile with gcc10?
That is already done in our private repo; we will soon update this public one. Regarding OpenMP:
ERI = Electron Repulsion Integral. GCC 10 was not possible for me, with similar issues to those with GCC 13.
Ah, good to know. As I wrote above, they already reside in node-shared memory. Regarding GCC 10, I will leave the issue open, and we aim to update our compilers sooner rather than later.
I have just given the recompilation with GCC another try. The code does indeed compile with GCC 13.1 in release mode. As soon as I compile NECI in debug mode, the compilation fails with the error(s) described above. In release mode, the test suite fails with the same unspecific error (segmentation fault without any useful backtrace). Attaching NECI to the GNU debugger reveals the line (/home/fstein/NECI/NECI_STABLE/build/fypp/libneci/excitation_types.F90, line 1496; I am aware that this file is produced by fypp). I have observed the same issue in my own projects, and I know that it is a bug in gfortran related to the assignment of a polymorphic variable. It is fixed by using sourced allocation (compare here). The same issue appears in /home/fstein/NECI/NECI_STABLE/build/fypp/libkmneci/excitation_types.F90, line 1536.
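For reference, a minimal, self-contained sketch of the workaround (type and function names are invented here, not the NECI ones): replacing intrinsic assignment to a polymorphic variable with sourced allocation sidesteps the gfortran bug.

```fortran
module workaround_demo
  implicit none
  type :: base_t
    integer :: n = 0
  end type
  type, extends(base_t) :: child_t
  end type
contains
  function make_child(n) result(res)
    integer, intent(in) :: n
    type(child_t) :: res
    res%n = n
  end function
end module

program demo
  use workaround_demo
  implicit none
  class(base_t), allocatable :: obj
  ! Intrinsic polymorphic assignment, "obj = make_child(3)", is what
  ! triggers the miscompilation in affected gfortran versions.
  ! Sourced allocation expresses the same thing and avoids the bug:
  allocate(obj, source=make_child(3))
  print *, obj%n
end program
```

Both forms are standard-conforming Fortran 2008; the sourced-allocation spelling is simply the one gfortran handles reliably.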
Ok, this is good news. As a workaround, you can use set( ${PROJECT_NAME}_Fortran_WARN_ERROR_FLAG "") instead of set( ${PROJECT_NAME}_Fortran_WARN_ERROR_FLAG "-Werror") in […]. Oh yes, I found a similar bug in gfortran here. PS: If you need access because you would like to use NECI for actual production runs, I can ask whether we can give you access to the private repo. It is just that we don't want unpublished implementations of new algorithms out in the wild.
Dear developers,
I am currently trying to compile and run NECI on my notebook. I am able to compile the code, but any calculation I start with NECI fails with an unhelpful error message.
Error message
Backtrace for this error:
#0 0x7f07aecf151f in ???
at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1 0x0 in ???
The last few lines of the output file in the directory NECI_STABLE/test_suite/neci/parallel/N_FCIMCPar:
Setting integer bit-length of determinants as bit-strings to: 64
SYMMETRY MULTIPLICATION TABLE
No Symmetry table found.
21 Symmetry PAIRS
8 DISTINCT ORBITAL PAIR PRODUCT SYMS
Symmetry and spin of orbitals correctly set up for excitation generators.
Simply transferring this into a spin orbital representation.
Not storing the H matrix.
My setup:
Ubuntu 22 hosted on the Windows Subsystem for Linux
Compilers: GCC 10 (the oldest compiler still available on the system) and GCC 13
MPI: OpenMPI 4.1.5
AMD Ryzen5 5600H
16 GB RAM
I tried to compile with and without HDF5 1.12.2.
I tried to compile the code with the standard optimization level (-O3) and the debug one (-Og), always with the same result.
I tried a serial run and a parallel run.
Do you have any advice on how to compile and run the code? What are the memory requirements of the code apart from the replicated ERIs?