Ultra low resolution configuration for testing #2508

Open
danholdaway opened this issue Nov 21, 2024 · 78 comments · May be fixed by #2564
Labels
enhancement New feature or request

Comments

@danholdaway

Description

In order to test changes to the global-workflow system we have to run the cycled system. Currently this is done with a C48 atmosphere and 127 level / 5 degree ocean model configuration. However, this is not fast enough for the testing we need and is limiting the number of pull requests that can be merged in global-workflow. Additionally, the ensemble and deterministic forecasts are both run at C48, meaning we are not replicating the dual-resolution setup that is used in production.

Note that the configuration does not have to produce anything scientifically sound, just be able to reliably run in order to test the connection between tasks of the workflow.

Solution

  • Develop a C12 and 32 level atmosphere with 10 degree ocean configuration for the ensemble.
  • Develop a C18 and 32 level atmosphere with 5 degree ocean configuration for the deterministic. If C18 is not a permissible configuration in FV3, then use C24.

Note that the ocean can already be run at 5 degrees.

The data assimilation group at EMC will work on enabling these configurations with GSI and JEDI.

@danholdaway danholdaway added the enhancement New feature or request label Nov 21, 2024
@JessicaMeixner-NOAA
Collaborator

@danholdaway - Please let me know if ultra-low wave configurations are also required and I'm happy to help provide those parts. (Also fine to turn off waves for various tests too).

@danholdaway
Author

Thank you @JessicaMeixner-NOAA I think it would be useful to include waves in this effort. We aren't currently testing with waves turned on but we should start to include that.

@junwang-noaa
Collaborator

@danholdaway Our team will look into atm/ocn/ice components for the test you requested. May I ask some background questions:

  1. For atmosphere, is the choice of C12/C18/C24 (800km/500km/400km resolution) coming from the running speed only or are there some other reasons? Do you have test cases with 32 vertical levels for us as a configuration reference? I am thinking that we may have to use a different atm physics package at this coarse resolution from those used for GFS/GEFS/SFS, and the dycore needs to be hydrostatic, which is different from GFS/GEFS.
  2. For ocean, do you know if there is a 10 degree super grid available? I assume you are OK with a 5 degree ocean/ice if we can't find a 10 degree super grid.
  3. How long do you expect these tests to run? I assume these tests only need to run 15hrs with IAU. There might be some coupling configuration change if longer running time is expected. Thanks

@danholdaway
Author

Thanks for looking into this @junwang-noaa, we very much appreciate the help.

  1. I had a quick play around with JEDI and found that I couldn't create a C18 geometry with FV3, so that might be out of the question. The choice of C12/C24 is driven by running speed. For most of the PRs we are creating we don't want to test what the model does but rather the connections between the tasks of the workflow. We need the model to run from one cycle to the next and output files for the DA to pick up. Changing the model configuration (e.g. physics, or non-hydrostatic -> hydrostatic) should be fine, though we need to be aware that we might be reading less from the files, e.g. no delz to pick up. We don't have any configuration for 32 levels yet. We thought about starting with the 127 level file and taking ~every 4th level. Do you think that would work?

  2. @guillaumevernieres do you know if there exist any 10 degree grids for MOM6? I think you mentioned you'd tested something at that resolution in SOCA.

  3. We don't need long forecasts with the coupled model. Just long enough to cycle. I think that's just 12h with IAU.
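The "every 4th level" idea can be made concrete in terms of layer interfaces. A 127-level file defines 128 interfaces; keeping every 4th interface (0, 4, ..., 124) and always retaining the last one (127) gives 33 interfaces, i.e. 32 layers. The sketch below only computes which interface indices would be kept; how those indices are applied to the ak/bk coefficients of a level file is left open, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# thin_interface_indices NINTERFACES STRIDE
# Hypothetical helper: prints the 0-based interface indices kept when taking
# every STRIDE-th interface, always retaining the final interface so the
# thinned column still spans the full depth of the original one.
thin_interface_indices() {
  local n=$1 stride=$2 i last=-1
  for (( i = 0; i < n; i += stride )); do
    echo "$i"
    last=$i
  done
  if (( last != n - 1 )); then
    echo $(( n - 1 ))
  fi
}

# 127 levels -> 128 interfaces; count what is left after thinning.
nkeep=$(thin_interface_indices 128 4 | wc -l)
echo "interfaces kept: $(( nkeep )), layers: $(( nkeep - 1 ))"
```

Uniform index thinning preserves the relative stretching of the original levels while making each layer roughly four times thicker; whether that is stable at C12/C24 would still need to be tested.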

@guillaumevernieres

@danholdaway , we do have a 10deg grid for mom6. Something would need to be done for cice6 as well.

@DeniseWorthen
Collaborator

We create the CICE6 fix files from the MOM6 supergrid and mask. We'd need a MOM_input consistent w/ a 10deg MOM6 physics.

@junwang-noaa
Collaborator

@danholdaway Thanks for the information.
@guillaumevernieres Would you please share the 10deg mom6 super grid and run configuration with us?

@guillaumevernieres

Let me see what we have. We did that work ages ago and I don't remember if we went beyond being able to use it within soca/jedi.

@guillaumevernieres

@DeniseWorthen , @junwang-noaa , here's a link to the jcsda soca repo with hopefully all the needed bits and pieces:
https://github.com/JCSDA/soca/tree/develop/test/Data/36x17x25

@DeniseWorthen
Collaborator

To use the cpld_gridgen utility, we need an ocean_hgrid.nc file, a topography file and a mask file for MOM6. The grid_spec file you point to seems to already be at the target resolution (17x36x35).

I see that MOM_input is asking for the super grid file GRID_FILE = "ocean_hgrid_small.nc" and the topography file ocean_topog_small.nc. Those are the files we'd need. Do you have those somewhere?

@guillaumevernieres

Yes, sorry, I thought these were in one of the sub-directories, but apparently not. I'll get back to you when I find these files.

@guillaumevernieres

@DeniseWorthen , the files are on MSU:

/work/noaa/da/gvernier/mom6-10geg/36x17x25

do you need the MOM.res.nc restart as well?

@junwang-noaa
Collaborator

@yangfanglin do you have a physics package for the ultra low resolutions (~800km/400km) that we can use to set up the C12/C24 with 32 vertical level tests? Thanks

@DeniseWorthen
Collaborator

I don't need the MOM.res.nc restart to generate the fix files. Are you setting the land mask = 0 where depth = 0 (unless you have a mask file somewhere)?

@GeorgeGayno-NOAA
Contributor

Note: UFS_UTILS requires some slight updates to create low-resolution grids. This is being worked here:
ufs-community/UFS_UTILS#1000

@DeniseWorthen
Collaborator

@guillaumevernieres I'm not making sense of the ocean_hgrid_small file. This should contain 2x the desired resolution, so for a final resolution of 36x17, it should be 72x34. I see 72x70 in the file.
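The rule applied here is that a MOM6 supergrid has twice as many cells as the model grid in each direction, so a 36x17 model grid needs a 72x34 supergrid. A minimal check under that rule, with a hypothetical function name; the supergrid cell counts could be read, for example, from the dimensions reported by `ncdump -h ocean_hgrid_small.nc` (the exact dimension names are an assumption).

```shell
#!/usr/bin/env bash
# check_supergrid NX NY SUPER_NX SUPER_NY
# Hypothetical helper: a MOM6 supergrid should have 2*NX by 2*NY cells for an
# NX by NY model grid. Prints "ok", or a mismatch message and returns 1.
check_supergrid() {
  local nx=$1 ny=$2 snx=$3 sny=$4
  if (( snx == 2 * nx && sny == 2 * ny )); then
    echo "ok"
  else
    echo "mismatch: expected $(( 2 * nx ))x$(( 2 * ny )), got ${snx}x${sny}"
    return 1
  fi
}

check_supergrid 36 17 72 34           # ok
check_supergrid 36 17 72 70 || true   # mismatch: expected 72x34, got 72x70
```

The 9-deg grid built later in this thread (40x20 model cells, `--nlon 80 --nlat 40` passed to make_hgrid) satisfies the same rule.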

@DeniseWorthen
Collaborator

Establishing a 10-deg ocean resolution is going to play havoc w/ the naming convention and have knock-on impacts from fix file generation down through the regression test scripts, inputs, etc. This is because we currently use a 3-character string to denote the ocean/ice resolution. For example, mx050 is 1/2 deg, mx100 is 1 deg, etc. A 10-deg resolution would require a 4-character string, so the 1/2 deg would need to become mx0050, the 5-deg mx0500 and the 10-deg mx1000.

I wonder if a 9-deg ocean/ice resolution would be a better idea. That would be a 40x20 grid (vs 10deg = 36x17) but would avoid the issue with file naming and file regeneration, RT modifications etc.
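The naming constraint is a fixed-width formatting issue: the tag is the resolution in hundredths of a degree, zero-padded to three digits. A hypothetical helper that reproduces the existing tags and refuses anything needing a fourth digit shows why 9 deg (mx900) fits and 10 deg (mx1000) does not. The awk rounding step is an assumption about how fractional degrees would be converted.

```shell
#!/usr/bin/env bash
# ocean_res_tag DEGREES
# Hypothetical helper: converts an ocean/ice resolution in degrees to the
# 3-digit "mxNNN" tag (hundredths of a degree). Fails if 3 digits are not enough.
ocean_res_tag() {
  local h
  # Round to the nearest hundredth of a degree; bash has no floating point.
  h=$(awk -v d="$1" 'BEGIN { printf "%d", d * 100 + 0.5 }')
  if (( h > 999 )); then
    echo "ocean_res_tag: $1 deg needs a 4-character tag" >&2
    return 1
  fi
  printf 'mx%03d\n' "$h"
}

for deg in 0.25 0.5 1 5 9; do
  ocean_res_tag "$deg"      # mx025 mx050 mx100 mx500 mx900
done
ocean_res_tag 10 || true    # error on stderr: needs a 4-character tag
```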

@danholdaway
Author

Thanks for reporting this @DeniseWorthen. Probably not worth changing the entire naming convention for this and I think 9 degree would suffice.

@DeniseWorthen
Collaborator

DeniseWorthen commented Dec 4, 2024

I've been able to create a 9-deg configuration for ocean and produce the associated CICE grid files using the ufs-utils/cpld_gridgen utility. I'll also document this in the ufs-utils PR I'll open. I have not tried to run this yet, but I'll try w/ a DATM-OCN-ICE configuration next.

The ufs-utils repo has some of the required fre-nctools available to build, but not the make_topog tool, which is also required. I found the tools were installed system-wide on Gaea-C5, so I was able to use those to do the following:

  1. make the supergrid for 9deg
/ncrc/home2/fms/local/opt/fre-nctools/2024.04/ncrc5/bin/make_hgrid --grid_type tripolar_grid --nxbnd 2 --nybnd 2 --xbnd -280,80 --ybnd -85,90 --nlon 80 --nlat 40 --grid_name ocean_hgrid --center c_cell
  1. make the ocean mosaic
/ncrc/home2/fms/local/opt/fre-nctools/2024.04/ncrc5/bin/make_solo_mosaic --num_tiles 1 --dir ./ --mosaic_name ocean_mosaic --tile_file ocean_hgrid.nc --periodx 360
  1. make the ocean topog
/ncrc/home2/fms/local/opt/fre-nctools/2024.04/ncrc5/bin/make_topog --mosaic ocean_mosaic.nc --topog_type realistic --topog_file /ncrc/home2/fms/archive/mom4/input_data/OCCAM_p5degree.nc --topog_field TOPO --scale_factor=-1
  1. make the ocean mask
ncap2 -s 'maskdpt=0.0*depth;where(depth>=10.)maskdpt=1.0;tmask=float(maskdpt);tmask(0,:)=0.0' topog.nc ocean_mask.nc
ncks -O -x -v depth,maskdpt ocean_mask.nc ocean_mask.nc
ncrename -v tmask,mask ocean_mask.nc
ncatted -a standard_name,,d,, -a units,,d,, ocean_mask.nc
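In step 1 the resolution enters only through `--nlon`/`--nlat`, which are twice the model grid dimensions (80x40 supergrid cells for the 40x20 9-deg grid). A dry-run wrapper that derives those two values from an integer resolution in degrees and echoes the command instead of running it; `FRE_BIN` and `hgrid_cmd` are hypothetical names, the default path is the Gaea-C5 install used above, and only the 9-deg values are confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around step 1 (make_hgrid).
FRE_BIN=${FRE_BIN:-/ncrc/home2/fms/local/opt/fre-nctools/2024.04/ncrc5/bin}

# hgrid_cmd DEGREES (integer): echo the make_hgrid command for that resolution.
hgrid_cmd() {
  local deg=$1
  # Model grid is 360/deg x 180/deg cells; the supergrid doubles each.
  local nlon=$(( 2 * 360 / deg )) nlat=$(( 2 * 180 / deg ))
  echo "$FRE_BIN/make_hgrid --grid_type tripolar_grid --nxbnd 2 --nybnd 2" \
       "--xbnd -280,80 --ybnd -85,90 --nlon $nlon --nlat $nlat" \
       "--grid_name ocean_hgrid --center c_cell"
}

hgrid_cmd 9   # ... --nlon 80 --nlat 40 ... (the values used above)
```

Other resolutions produced this way are untested; the remaining bounds (`--xbnd`, `--ybnd`) are simply carried over from the 9-deg command.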

@guillaumevernieres

@danholdaway
Author

Great progress, thank you @DeniseWorthen

@DeniseWorthen
Collaborator

DeniseWorthen commented Dec 4, 2024

Well, by golly, it ran! I completed 24 hours using the DATM coupled to the 9deg MOM6/CICE6.

/work2/noaa/stmp/dworthen/CPLD_GRIDGEN/rt_349784/datm.mx900

@guillaumevernieres

You rock @DeniseWorthen 🎉 🎉 Thanks for doing this!

@DeniseWorthen
Collaborator

DeniseWorthen commented Dec 5, 2024

The next step is to generate the mapped ocean masks for the new ATM resolutions. These are used by chgres to create the oro-data and ICs. It sounds like you definitely want C12+9deg and C18/24+5deg, but there's some question of what ATM configuration (# levels, physics) will work?

George has created the required low-res C-grids, and I've created the mapped ocean masks (eg. C12.mx900.tile1.nc) on hercules here

/work2/noaa/stmp/dworthen/CPLD_GRIDGEN/rt_2310699

It seems like the best next step is to try to create the atm ICs and see if any of them run?

@LarissaReames-NOAA
Collaborator

Yeah, my plan is to try using global_hyblev.l28.txt for vertical levels to create ATM ICs from existing chgres_cube regression test input GFS data once we have C12.mx900 orography files. It looks like George was able to create C12mx100 files on Hera (/scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS/fix/orog.lowres), so hopefully the mx900 files won't give us problems.

@DeniseWorthen
Collaborator

OK, thanks. After generating the mapped ocean masks, I'm not well-versed on the process so I won't be much help.

@LarissaReames-NOAA
Collaborator

I'd like to try testing out the C24mx500 configuration now that C12mx900 is working. When you're available, @DeniseWorthen, can you create those input directories in /scratch2/NCEPDEV/stmp3/Denise.Worthen/input-data-20240501 like you did for C12mx900? Hopefully getting C24mx500 working will be straightforward from here.

@DeniseWorthen
Collaborator

@LarissaReames-NOAA It should just be a matter of updating the scripting, which currently pairs the 5-deg only w/ C48. There are currently 3 tests for the C48mx500 configuration. Are you thinking that all three will be replaced w/ the C24?

@LarissaReames-NOAA
Collaborator

My understanding is that the goal was a C12mx900 for GEFS testing and C24mx500 for GFS testing, so I was aiming to test that second configuration, not replace any existing regression tests.

@DeniseWorthen
Collaborator

I think then I need to understand whether the same warmstart tests will be needed. The original issue is that we were asked to provide a configuration which would produce a MOM6 restart at hour=1 (I believe this was for Marine DA purposes). But because of the lagged startup for MOM6 and the MOM6 timesteps, we can't produce an hour=1 restart for the really low resolution cases.

Lagged startup means that for initial runs (ie, starting from T,S), MOM6 does not advance until the second coupling period. At the second coupling period, it advances two timesteps. After that, all timesteps/coupling are as specified. So since MOM6 doesn't actually advance until hour=2, we can't produce an hour=1 restart.

That meant we needed a special warmstart/restart test. Those are set up to actually run from 032306, 24 hours later than all other tests. The input they use is created by running the control test for 24 hours and copying the warmstart files into the input-data directory.
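The lagged-startup arithmetic can be tabulated with a small simulation. It assumes, for illustration only, that the ocean coupling period equals the MOM6 timestep (DT hours) and models only the rule described above: no advance in the first coupling period, two timesteps in the second, one per period afterwards. Each output line is the forecast hour at the end of a coupling period followed by the hour MOM6 has actually reached; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# ocean_times NPERIODS DT_HOURS
# Hypothetical model of MOM6 lagged startup, assuming the coupling period
# equals the ocean timestep. Prints "<forecast hour> <ocean model hour>".
ocean_times() {
  local n=$1 dt=$2 p t=0
  for (( p = 1; p <= n; p++ )); do
    if (( p == 2 )); then
      t=$(( t + 2 * dt ))   # catch up: two timesteps in the second period
    elif (( p > 2 )); then
      t=$(( t + dt ))       # normal stepping afterwards
    fi                      # period 1: MOM6 does not advance
    echo "$(( p * dt )) $t"
  done
}

ocean_times 3 1   # 1 0 / 2 2 / 3 3   -> the ocean never sits at hour 1
ocean_times 3 4   # 4 0 / 8 8 / 12 12 -> no hour-1 state with a 4 h step
```

For a warmstart (not starting from T,S) the lag presumably does not apply, so the ocean state lands on every multiple of DT; that is consistent with the point made further down that a 1 h ocean step is the largest that still allows an hour-1 restart.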

@LarissaReames-NOAA
Collaborator

LarissaReames-NOAA commented Jan 2, 2025

Ah, I see your point, thanks for explaining. Since the goal is to avoid running the C48 tests, but we need these restart files, we have some choices to make. Could we change the time step of MOM6? Or do we need to create that special warmstart/restart test for the other ocean resolutions as well? Does the DA team want these restart files for both resolutions, or would they be okay with just one?

@DeniseWorthen
Collaborator

I don't think we want MOM6 to be any slower (ie, cut the timestep) since right now I have the initial 9deg configuration running at 1 hour and it can probably be much larger actually. I'll just set up the same control/warmstart/restart configurations for C12mx900 and we'll add/subtract the desired configurations before commit.

@jiandewang
Collaborator

@DeniseWorthen with a 9x9 super coarse resolution, you can use at least 4hr as its time step. BTW, where did you get MOM_input?

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 2, 2025

@jiandewang Understood, but then we can't produce an hour=1 restart at all.

EDIT: The 9-deg is the 5-deg w/ the resolution changed and (right now) using a z-initialization. The original 5-deg has a MOM (and CICE) restart generated from the Marine DA side

@jiandewang
Collaborator

@DeniseWorthen thanks for the explanation. So a 1 hr time step for the ocean is the max; otherwise you can't generate the hour 1 restart file.
This ultra coarse job will be included in rt.sh, and I'm concerned that it will be unstable (if we have a PR that's going to change answers). I had a similar concern about the previous 5x5 setting, but luckily we didn't encounter issues. Hopefully this 9x9 will be the same.

@DeniseWorthen
Copy link
Collaborator

@LarissaReames-NOAA I'm looking at the configuration in this directory:

/scratch1/NCEPDEV/stmp2/Larissa.Reames/FV3_RT/rt_3367148/control_c24_intel

In input.nml and model.configure, I see these settings:

layout = 2,1
....
write_groups:            1
write_tasks_per_group:   2

ATM should then need 2x1x6 + 2x1 tasks = 14. But in ufs.configure you have

ATM_petlist_bounds:             0 11

?

@LarissaReames-NOAA
Collaborator

@DeniseWorthen Ah, for that run I'd turned off quilting, hence only 12 PETs used. It looks like I had 14 in the job_card, but I think any extra processes assigned to the job at the top level are ignored.

@DeniseWorthen
Collaborator

Does quilting=false turn off the write-grid component?

@LarissaReames-NOAA
Collaborator

Yes.
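The task count above follows from the layout and write-component settings: layout_x * layout_y compute tasks per tile, times the 6 cubed-sphere tiles, plus write_groups * write_tasks_per_group when quilting is on. A hypothetical helper that reproduces both numbers discussed here (14 with quilting, 12 without, matching `ATM_petlist_bounds: 0 11`); it assumes the ATM PETs start at 0, as in the ufs.configure quoted above.

```shell
#!/usr/bin/env bash
# atm_pets LAYOUT_X LAYOUT_Y WRITE_GROUPS WRITE_TASKS_PER_GROUP [QUILTING]
# Hypothetical helper: number of ATM PETs for an FV3 cubed-sphere run.
atm_pets() {
  local lx=$1 ly=$2 groups=$3 tasks=$4 quilting=${5:-true}
  local pets=$(( lx * ly * 6 ))          # compute tasks on 6 tiles
  if [[ $quilting == true ]]; then
    pets=$(( pets + groups * tasks ))    # write-grid component tasks
  fi
  echo "$pets"
}

# atm_petlist_bounds ...: first and last PET, assuming ATM starts at PET 0.
atm_petlist_bounds() {
  local n
  n=$(atm_pets "$@")
  echo "0 $(( n - 1 ))"
}

atm_pets 2 1 1 2                  # 14 (quilting on)
atm_petlist_bounds 2 1 1 2 false  # 0 11 (quilting off)
```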

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 3, 2025

@LarissaReames-NOAA I've run into an issue trying w/ your low-res branch and mine. My job kept hitting a seg-fault with the error:

 8:
 1: forrtl: severe (408): fort: (3): Subscript #3 of the array TRACER1 has value -99 which is less than the lower bound of 2
 0:
 0: Image              PC                Routine            Line        Source
 0: fv3.exe            0000000010464D4F  Unknown               Unknown  Unknown
 0: fv3.exe            0000000006DB46AA  gfs_rrtmg_pre_mp_         841  GFS_rrtmg_pre.F90
 0: fv3.exe            0000000006358D36  ccpp_fv3_coupled_        1049  ccpp_FV3_coupled_lowres_radiation_cap.F90
 0: fv3.exe            00000000052C6AB6  ccpp_static_api_m         272  ccpp_static_api.F90

and I couldn't figure out why you weren't getting the same error.

I can see your lowres_rt is trying to do a debug build (-DCMAKE_BUILD_TYPE==Debug) but if you look in the actual build log, it appears to be actually doing "Release", I think because your build_type is getting over-written w/ a later value. See this log file, which is from my running your lowres_rt:

/scratch1/NCEPDEV/stmp2/Denise.Worthen/FV3_RT/rt_larissa/compile_s2s_intel/out

....
Compiling -DAPP=S2S -DCCPP_SUITES=FV3_coupled_lowres -DCMAKE_BUILD_TYPE==Debug -DHYDRO=ON into fv3_s2s_intel.exe on hera
CMAKE_FLAGS = -DAPP=S2S -DCCPP_SUITES=FV3_coupled_lowres -DCMAKE_BUILD_TYPE==Debug -DHYDRO=ON -DMPI=ON -DCMAKE_BUILD_TYPE=Release
UFS MODEL DIR: /scratch1/NCEPDEV/nems/Denise.Worthen/WORK/ufs_larissa
-- The C compiler identification is Intel 2021.5.0.20211109
....

@LarissaReames-NOAA
Collaborator

@DeniseWorthen Good catch. I'd forgotten to change the field_table for the coupled regression test. I changed it to field_table_gfsv16 and now get errors in the CICE model about 3 hours into the forecast when compiled with DDEBUG=ON:

14: [h21c47:2073385:0:2073385] Caught signal 8 (Floating point exception: floating-point divide by zero)
14: ==== backtrace (tid:2073385) ====
14:  0 0x0000000000053519 ucs_debug_print_backtrace()  ???:0
14:  1 0x0000000000012d10 __funlockfile()  :0
14:  2 0x000000000ff3c70f __libm_log_l9()  ???:0
14:  3 0x000000000f135f89 icepack_atmo_mp_atmo_boundary_layer_()  /scratch1/BMC/gsd-fv3/Larissa.Reames/ufs-weather-model-lowresRT/CICE-interface/CICE/icepack/columnphysics/icepack_atmo.F90:241
14:  4 0x000000000f13a7f2 icepack_atmo_mp_icepack_atm_boundary_()  /scratch1/BMC/gsd-fv3/Larissa.Reames/ufs-weather-model-lowresRT/CICE-interface/CICE/icepack/columnphysics/icepack_atmo.F90:939
14:  5 0x000000000fc72795 icepack_therm_vertical_mp_icepack_step_therm1_()  /scratch1/BMC/gsd-fv3/Larissa.Reames/ufs-weather-model-lowresRT/CICE-interface/CICE/icepack/columnphysics/icepack_therm_vertical.F90:2630
14:  6 0x000000000eff4a0e ice_step_mod_mp_step_therm1_()  /scratch1/BMC/gsd-fv3/Larissa.Reames/ufs-weather-model-lowresRT/CICE-interface/CICE/cicecore/cicedyn/general/ice_step_mod.F90:392
14:  7 0x000000000e4e6ae1 L_cice_runmod_mp_ice_step__206__par_loop0_2_0()  /scratch1/BMC/gsd-fv3/Larissa.Reames/ufs-weather-model-lowresRT/CICE-interface/CICE/cicecore/drivers/nuopc/cmeps/CICE_RunMod.F90:225

As an aside: something we might consider is modifying the build script to not overwrite DCMAKE_BUILD_TYPE if DDEBUG isn't set, as the former is a standard CMake option. Currently, if DDEBUG isn't set, it defaults to OFF and sets DCMAKE_BUILD_TYPE=Release even if DCMAKE_BUILD_TYPE=Debug is already set.
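The suggestion amounts to applying the Release default only when no build type has been given. A minimal bash sketch of that logic; the function name and how it would hook into the RT compile script are hypothetical, and the real script is not reproduced here. It assumes CMake keeps the last -D value given for a variable, which is consistent with the Release build seen in the log above.

```shell
#!/usr/bin/env bash
# resolve_cmake_flags "CMAKE_FLAGS" [DEBUG]
# Hypothetical sketch: add a CMAKE_BUILD_TYPE only when needed.
#   DEBUG=ON           -> append Debug (last -D wins, by assumption)
#   user already set   -> leave the flags alone
#   otherwise          -> default to Release
resolve_cmake_flags() {
  local flags=$1 debug=${2:-OFF}
  if [[ $debug == ON ]]; then
    flags+=" -DCMAKE_BUILD_TYPE=Debug"
  elif [[ $flags != *-DCMAKE_BUILD_TYPE=* ]]; then
    flags+=" -DCMAKE_BUILD_TYPE=Release"
  fi
  printf '%s\n' "${flags# }"   # drop the leading space if flags started empty
}

resolve_cmake_flags "-DAPP=S2S -DCMAKE_BUILD_TYPE=Debug"  # kept as Debug
resolve_cmake_flags "-DAPP=S2S"                           # Release added
resolve_cmake_flags "-DAPP=S2S" ON                        # Debug added
```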

@DeniseWorthen
Collaborator

I'm not sure setting a CMAKE_BUILD_TYPE in the actual rt.conf is expected. Maybe that's applicable for build.sh? @DusanJovic-NOAA would know.

It seems the issue w/ CICE is because of land mismatches between ATM and ICE. Looking at the regridStatus field from CMEPS (set write_dstatus = true in the MED attributes), I can see that there are unmapped points (the pink ones), which are probably sending something invalid to the ICE. Let me look at the fractional land mask for ATM. It should be mapping from all points which are <1.

[Screenshot, 2025-01-06: CMEPS regridStatus field, with the unmapped points shown in pink]

@DusanJovic-NOAA
Collaborator

Currently, we only use DEBUG flag in rt.conf to specify whether to build the executable in debug mode or not. It is used for historical reasons, the same flag was used even before we used cmake as a build system. We could get rid of it and switch to just use cmake specific flags, 'Debug' or 'Release' and set CMAKE_BUILD_TYPE directly in rt.conf, if that's what people prefer.

@LarissaReames-NOAA
Collaborator

LarissaReames-NOAA commented Jan 6, 2025

I don't have any problem with continuing the use of DDEBUG, but I think having the build system not override a user-provided DCMAKE_BUILD_TYPE would be a very tiny but useful change. Or am I misunderstanding how it works, and the two are not interchangeable?

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 6, 2025

@LarissaReames-NOAA We're missing a frac_grid = .true. in the input.nml. Using that gets me back to my tracer index error.

@LarissaReames-NOAA
Collaborator

@DeniseWorthen I thought it might be something like that. Thanks for debugging.

After the discussion on Slack about export_fv3_v16 going away soonish, I changed the RT to use export_fv3 and modified the scheme/test accordingly. The main change was just switching to Thompson MP. I compiled the RT with -DDEBUG=ON and now have a successful C12 run here: /scratch1/NCEPDEV/stmp2/Larissa.Reames/FV3_RT/rt_148030/cpld_control_c12_intel/

I'm guessing that frac_grid option is probably set correctly in export_fv3.

@DeniseWorthen
Collaborator

You'll need a place to slot in the variable to the cpld_lowres.nml.IN (assuming you'll still be using it?)

....
  frac_grid    = @[FRAC_GRID]
  cplchm       = @[CPLCHM]
....
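The `@[NAME]` placeholders in cpld_lowres.nml.IN are filled from shell variables when the regression test is set up. A hypothetical stand-in for that substitution step, useful for checking by hand that a newly added line such as `frac_grid = @[FRAC_GRID]` expands as intended; it is not the RT system's actual implementation.

```shell
#!/usr/bin/env bash
# fill_template < template > output
# Hypothetical stand-in: replaces each @[NAME] with the value of the shell
# variable NAME (empty if unset), line by line.
fill_template() {
  local re='@\[([A-Za-z_][A-Za-z0-9_]*)\]' line var
  while IFS= read -r line || [[ -n $line ]]; do
    while [[ $line =~ $re ]]; do
      var=${BASH_REMATCH[1]}
      line=${line//"@[$var]"/"${!var}"}   # ${!var}: indirect expansion
    done
    printf '%s\n' "$line"
  done
}

FRAC_GRID=.true.
CPLCHM=.false.
printf '  frac_grid    = @[FRAC_GRID]\n  cplchm       = @[CPLCHM]\n' | fill_template
```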

@LarissaReames-NOAA
Collaborator

Yep, you're right. I have a new test with the frac_grid line added to cpld_lowres.nml.IN here: /scratch1/NCEPDEV/stmp2/Larissa.Reames/FV3_RT/rt_545552/

@DeniseWorthen
Collaborator

Both cpld_c12 and cpld_c24 now run to completion in debug mode. I'm going to now create the warmstart files and turn those tests on.

@DeniseWorthen
Collaborator

@danholdaway or @guillaumevernieres Can either of you say whether the C48-5deg configuration is still desired? If we're maintaining it, I need to organize the RT input-data one way vs a different way.

@guillaumevernieres

@DeniseWorthen , I don't see a need for it anymore but give us a day to poll the interested parties.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jan 16, 2025

@LarissaReames-NOAA Do we know what the CDMBWD settings should be for the C12 and C24---or is it even used w/ the physics setup you've created?

For my current tests, I set it to 1.0,1.0,1.0,1.0. I think you had it set to the C48 values.

@DeniseWorthen DeniseWorthen linked a pull request Jan 21, 2025 that will close this issue
Projects
Status: In Progress

9 participants