[BUG/ISSUE] Crash in OpenMP loop in hcox_lightnox_mod #75
Comments
Thanks Seb, this is good info to have. Perhaps the thing to do in the interim is disable OMP in HEMCOBuildProperties, and GEOSChemBuildProperties, but leave it enabled in MAPL since IIRC that's where the segfault with OMP=OFF was happening. Does that seem reasonable to you? If so I will check that it doesn't break GCHP+Intel runs on Compute1. Would you be able to try the update on Pleiades? Note: The recommended way to turn OMP off is running |
Happy to give that a shot on Pleiades! Will update ASAP. |
For what it's worth, I previously found an issue in the lightning NOx HEMCO extension file when compiling with Intel debug flag to check pointers. I narrowed it down to an OpenMP issue. Not sure if it's related to what you are seeing @sdeastham, but it may indeed point to the culprit. See HEMCO issue geoschem/HEMCO#50. |
I pushed b9a70c4 on a new branch Could you |
It looks like Intel MPI isn't working on Compute1 today, so I'll have to follow up on my test tomorrow. |
I can confirm that, with the given cherry-pick, the code compiles and runs as-is (defaulting to OMP_HEMCO=OFF) and fails in lightnox if OMP_HEMCO=ON. |
It must be something in MAPL then that's causing the crash. Thanks for checking. Here's the issue that Lizzie found: geoschem/GCST-internal#5. I think we need to dig a bit deeper since Intel+Debug mode+OMP=ON on Pleiades crashes while Intel+Release mode+OMP=OFF crashes on Cannon. I will check if
@sdeastham Have you tried the default build in |
A quick 1-hour test of those two setups with ifort 18 and OpenMPI on Cannon yields no crash for GCHP compiled with |
Thanks for checking that @WilliamDowns. I tried both cases on Compute1, and both were executed successfully (with Intel 2020.0 compilers). Since @sdeastham (Pleiades), @WilliamDowns (Cannon), and my (Compute1) tests of GCHP built with Intel compilers and |
Sounds reasonable to me. My understanding is using OpenMP in GCHP should not improve performance and the only reason it was on was to avoid an error associated with GMAO libraries that I previously ran into. If that is no longer an issue, and my performance assumption is correct, then it makes sense to have OpenMP off by default. |
Any concerns to me making |
Works for me! |
Is this issue all set to close? I still have the HEMCO issue open where HEMCO crashes in the lightning NOx extension when using Intel compilers and the -check-pointers debug flag. But if that was the cause of this issue then it is no longer relevant to GCHP. |
This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue. |
Description of the problem
When building GCHP 13.0.0 with Intel 18.3 and the HPE MPT (an MPI implementation) on NASA's Pleiades cluster, I found that the code would fail (with no stack trace) immediately after the HEMCO (VOLCANO) printout which reports read-in of the volcano emissions file. Based on some hand-debugging, the error appears to originate within an OpenMP loop in hcox_lightnox_mod.F90. This bug is resolved by either removing the relevant OpenMP directives or modifying the root CMakeLists.txt file so that OMP=OFF.
There are already several reasons to want to disable OpenMP when building GCHP (see e.g. geoschem/HEMCO#57), but @lizziel noted in commit 7bf5fdd that compiling GCHP with Intel compilers resulted in a segfault if OpenMP was disabled. Since that does not seem to be the case on Pleiades, and OpenMP is continuing to cause issues, I'd like to suggest re-opening the investigation of if and why OpenMP is needed for Intel compilers.
NB: The lack of a stack trace appears to be a semi-reliable indication that the issue is originating in an OpenMP loop.
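For reference, the OMP=OFF workaround described above could probably also be applied at configure time instead of by editing the root CMakeLists.txt. This is only a sketch: it assumes OMP is defined as a CMake cache variable (so a -D option overrides it) and that the command is run from a build directory one level below the source root; neither assumption has been verified against the GCHP 13.0.0 build system here.

```shell
# Sketch only: configure a Debug build with OpenMP disabled.
# Assumption: OMP is a CMake cache variable, so -DOMP=OFF overrides the
# value set in the root CMakeLists.txt.
# Assumption: the current directory is a build directory directly below
# the source root, hence the ".." source path.
cmake .. -DCMAKE_BUILD_TYPE=Debug -DOMP=OFF
```

If OMP is set as a normal (non-cache) variable instead, the -D option would be overridden and the CMakeLists.txt edit would still be needed.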
GEOS-Chem version
GEOS-Chem v13.0.0
Description of code modifications
No modifications were necessary to produce this bug. The code was compiled with the CMake directive -DCMAKE_BUILD_TYPE=Debug.
Software versions