FATES Sat phenology test failing restart starting in ctsm5.1.dev174 #2478
Comments
Note this was passing with I think the culprit was the change to the |
Well -- the change from a single node to four nodes does seem to be a problem. However, we expect that NO change in PE layout should break restarts. So the fact that it was working with a single node isn't the real issue. What we want to fix is getting restarts to work for any PE layout.
I've been able to narrow down the fates tag that this came in with by checking out a API33, updating the
This might have been fixed via NGEET/fates#1189?
Brief summary of bug
The test on Derecho
ERS_Ld30.f45_f45_mg37.I2000Clm51FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhen
fails the restart comparison.
General bug information
CTSM version you are using: Seen in ctsm5.1.dev175, ctsm5.1.dev176 and in what will be ctsm5.2.0
Does this bug cause significantly incorrect results in the model's science? Maybe? Restart issues can be a problem in the science...
Configurations affected: FATES SP mode.
Details of bug
@glemieux found it works in ctsm5.1.dev174 but fails in later versions.
Important details of your setup / configuration so we can reproduce the bug
Interestingly this test
ERP_P128x2_Ld30.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhen
has been passing. I've verified that there are only two differences:
ERS vs ERP
PE layout of 128x2 vs the default
The test-type difference just means the restart portion of the ERP test runs on half the number of processors, so 64 MPI tasks with one thread.
The PE layout difference is a sequential, threaded layout on 2 nodes versus the default, which is a concurrent layout with one node for datm and 4 nodes for CTSM, with no threads.
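To make the layout arithmetic above concrete, here is a minimal Python sketch. It only reads the `_P<tasks>x<threads>` modifier out of a test name and halves it, which is what the "64 MPI tasks with one thread" figure above implies for the restart portion of the ERP test. The helper names `parse_pe_modifier` and `halved_layout` are hypothetical and are not part of CIME or CTSM. Halving both tasks and threads is an assumption inferred from the 128x2 to 64x1 description in this issue.

```python
import re


def parse_pe_modifier(test_name):
    """Return (tasks, threads) from a _P<tasks>x<threads> modifier, or None.

    Only the first dot-separated field (test type plus modifiers) is searched.
    """
    test_part = test_name.split(".", 1)[0]
    match = re.search(r"_P(\d+)x(\d+)(?:_|$)", test_part)
    if match is None:
        return None  # no explicit PE layout: the default layout is used
    return int(match.group(1)), int(match.group(2))


def halved_layout(tasks, threads):
    """Assumed restart-leg layout: half the tasks and half the threads (min 1)."""
    return max(1, tasks // 2), max(1, threads // 2)


erp = "ERP_P128x2_Ld30.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhen"
ers = "ERS_Ld30.f45_f45_mg37.I2000Clm51FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhen"

print(parse_pe_modifier(erp))                  # explicit layout from the test name
print(halved_layout(*parse_pe_modifier(erp)))  # layout assumed for the restart leg
print(parse_pe_modifier(ers))                  # None, so the default layout applies
```

The ERS test name carries no `_P` modifier, so the sketch returns `None` for it, which matches the statement that it runs on the default PE layout.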
Important output or errors that show the problem
The cprnc.out files in TestStatus.log show no differences in the cpl files. But, four fields differ in the CTSM history files as follows: