failure running develop code on hercules #820
@jack-woollen, thank you for reporting this error. The following test was conducted on Hercules
GSI. Perhaps something didn't get merged correctly when you brought
@RussTreadon-NOAA Thanks for the info. As a next step I cloned and built develop directly, reran, and got the same outcome. The output is in /work2/noaa/da/jwoollen/RAEXPS/2023exp2/2023060106/logs/run_gsiobserver.out.
forrtl: severe (174): SIGSEGV, segmentation fault occurred
@jack-woollen, since the ctests work and no one else has reported similar problems on Hercules, we will need to debug this case. Where is the script used to run this case?
@RussTreadon-NOAA This is the structure @jswhit made to run reanalysis scout runs, since the workflow for RA isn't worked out yet. I also made some changes for my testing. It was working well until the merge with current develop. The directory with the scripts is /work2/noaa/da/jwoollen/RAEXPS/scripts/2023exp2. There is a change in the merged version of netcdfgfs_io.f90 defining num_fields. Could this be a problem?
@jack-woollen, I agree. This looks like a configuration error. One caution: we no longer have GSI code support given the transition to JEDI. The only global GSI development these days is what is required for GFS v16.4 and v17. We stop using GSI for global atmospheric DA with GFS v18. The longer you use GSI, the more likely you are to encounter issues. An ncdump of
In contrast, the global_4denvar ctest sigf06 contains
Note that the fields you identified are named differently between the two sets of backgrounds. Your sigf06 has
I am not sure whether we need to change the variable names in the background fields, the entries in anavinfo, and/or the setting of GSI namelist variable(s). It seems we need to do some old-fashioned debugging by adding print statements to your older pre-merge code that works and to the updated post-merge code that does not.
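A minimal sketch of that kind of check, assuming the NCEPLIBS-ncio module_ncio interface (open_dataset, has_var, close_dataset) that GSI links against; the file name and variable list below are placeholders, not necessarily the fields at issue here.

program check_bg_vars
  use module_ncio, only: Dataset, open_dataset, close_dataset, has_var
  implicit none
  type(Dataset) :: dset
  character(len=16) :: vars(4) = [character(len=16) :: 'ugrd', 'vgrd', 'dpres', 'delz']
  integer :: i
  ! open the background (first guess) file and report which fields it carries
  dset = open_dataset('sigf06')
  do i = 1, size(vars)
     print *, trim(vars(i)), ' present: ', has_var(dset, trim(vars(i)))
  end do
  call close_dataset(dset)
end program check_bg_vars

Running the same checker against both the scout-run sigf06 and the global_4denvar ctest sigf06 would show exactly which names differ.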
@jack-woollen, I modified a stand-alone rungsi script to use what I think is the input for your 20230601 06Z case. The script ran develop
Here are the key files and directories:
I see that my
@RussTreadon-NOAA Thanks for your time on this. Given what you found, I think I can track down the seg fault and move on.
@RussTreadon-NOAA After some more debugging, it turns out that setting imp_physics=11 (GFDL) instead of imp_physics=8 (Thompson) allows the merged code and scripts to run as is. Some relatively recent changes to general_read_fv3atm added an if block for imp_physics=8 that tries to read netCDF variable name(s) not defined in the scout-run forecast records. I'm not sure what the implications of switching imp_physics from 8 to 11 are. Maybe @jswhit2 has some insight about this setting in the GSI for reanalysis use. One note about the newly merged code: it has a refactored version of read_satwnd that enables the use of all platforms back to 1979, speeds up execution a fair amount compared to current develop, and gives identical satwnd counts in gsi observer testing with 2023 data.
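For reference, a sketch of where that imp_physics switch would go, assuming it lives in the &SETUP block of gsiparm.anl as in recent GSI namelists; all other entries are left out here.

 &SETUP
   ! only the microphysics flag shown; all other &SETUP entries unchanged
   imp_physics=11,   ! 11 = GFDL; 8 = Thompson, which reads extra hydrometeor fields the scout-run backgrounds lack
 /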
Good to hear, @jack-woollen, that you got
The satwind speed-up you mention sounds interesting. Do you have your changes in a branch? Processing the satwnd file takes a long time. @BrettHoover-NOAA has been testing splitting the satwnd file by subset and processing the subset files in parallel.
@RussTreadon-NOAA Thanks. It's good to get it right. The merged fork is at https://github.com/jack-woollen/GSI. @BrettHoover-NOAA The refactored read_satwnd is about 15% faster. Parallel reads could speed it up further. Maybe something like the read_bufrtovs approach would work.
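On the parallel-read idea, a rough sketch of a read_bufrtovs-style round-robin assignment of per-subset satwnd files across MPI tasks; the subset file names and the notion of pre-split files are hypothetical, and the actual decode call is omitted.

program satwnd_parallel_sketch
  use mpi
  implicit none
  integer :: ierr, npe, mype, i
  character(len=20) :: subsets(4) = [character(len=20) :: &
       'satwnd.NC005030', 'satwnd.NC005031', 'satwnd.NC005032', 'satwnd.NC005039']
  call mpi_init(ierr)
  call mpi_comm_size(mpi_comm_world, npe, ierr)
  call mpi_comm_rank(mpi_comm_world, mype, ierr)
  ! round-robin: task mype handles every npe-th subset file
  do i = 1, size(subsets)
     if (mod(i-1, npe) == mype) then
        print *, 'task', mype, ' would decode ', trim(subsets(i))
     end if
  end do
  call mpi_finalize(ierr)
end program satwnd_parallel_sketch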
I think this patch to general_read_gfsatm.f90 might work (it tells the code to try using the old variable names if it can't read the new ones):
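The actual diff is not captured above, but here is a sketch of that kind of fallback, assuming the module_ncio read_vardata interface with its optional errcode argument; the file and variable names are placeholders rather than the real GSI ones.

program fallback_read_sketch
  use module_ncio, only: Dataset, open_dataset, close_dataset, read_vardata
  implicit none
  type(Dataset) :: dset
  real(4), allocatable :: work3d(:,:,:)
  integer :: iret
  dset = open_dataset('sigf06')
  ! try the newer variable name first; a nonzero errcode means it is absent
  call read_vardata(dset, 'new_varname', work3d, errcode=iret)
  if (iret /= 0) then
     ! fall back to the name used by older backgrounds
     call read_vardata(dset, 'old_varname', work3d, errcode=iret)
  end if
  if (iret /= 0) stop 'neither variable name found in background file'
  call close_dataset(dset)
end program fallback_read_sketch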
Apparently there is a problem with module_ncio error handling in MPI codes, and this patch should work around that:
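If read_vardata's error handling itself is what misbehaves under MPI, one way to avoid exercising it at all (again only a sketch with placeholder names) is to test for the variable with has_var before reading, replacing the errcode-based block in the previous sketch:

  if (has_var(dset, 'new_varname')) then
     call read_vardata(dset, 'new_varname', work3d)
  else if (has_var(dset, 'old_varname')) then
     call read_vardata(dset, 'old_varname', work3d)
  else
     stop 'neither variable name found in background file'
  end if

(has_var would also need to be added to the only list of the use module_ncio statement in that sketch.)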
@jswhit2 Yup, that workaround does work around it. @RussTreadon-NOAA @BrettHoover-NOAA I could use this issue to make a new pull request. Any comments before I do?
@jack-woollen and @jswhit2: Thank you for developing and testing a workaround change for
Some comments and questions:
@RussTreadon-NOAA Regarding 3., I'm happy to PR just the read_satwnd code. Then we can think more about 1. and 2.
@DavidHuber-NOAA @RussTreadon-NOAA I'm testing my current reanalysis fork on Hercules and it runs okay. When I update the fork from GSI develop, it fails in what looks like netCDF reading. I think there are some other active issues like this. Is there a status on this problem? Thanks!
libc.so.6 00001530CB738D90 Unknown Unknown Unknown
gsi.x 0000000001D92F13 module_ncio_mp_re 50 read_vardata_code_3d.f90
gsi.x 000000000115E783 general_read_gfsa 3603 general_read_gfsatm.f90
gsi.x 00000000013730B4 netcdfgfs_io_mp_r 225 netcdfgfs_io.f90
gsi.x 000000000092FFE4 read_guess_ 201 read_guess.F90
gsi.x 000000000085D849 observermod_mp_in 165 observer.F90