GSI changes to read and assimilate IASI-NG #805

Merged
merged 27 commits into from
Nov 25, 2024

wx20jjung
Contributor


Description

Metop-sg-a1 will be launched next year. There are several new instruments on this satellite. This pull request addresses some of the changes necessary to read and use IASI-NG data in the GSI and ultimately the global-workflow.

This code adds the read routine for IASI-NG (read_iasing.f90) and adds logic throughout the GSI to assimilate these data. The code is set up to use data from the standard operational feed and both direct broadcast feeds. The cloud and aerosol detection software (CADS) is also set up and can be turned on/off with a flag, similar to the current IASI and CrIS instruments.

There should be no dependencies needed to incorporate these changes. Using the IASI-NG data, however, has several dependencies: connecting to the data, various CRTM coefficient files, and choosing to use CADS.

This pull request addresses issue #804.

Type of change

  • [X] New feature (non-breaking change which adds functionality)

How Has This Been Tested?
I have tested these changes on about 6 hours of proxy IASI-NG data using a modified version of the global-workflow, including the CRTM with IASI-NG coefficient files. I also modified the satinfo and scaninfo files to monitor up to 10,000 channels as well as a potential 500 channel subset. This was conducted on S4 and Hera. These changes do not affect the analysis when the IASI-NG data are not available.

The ctests were conducted on Hera with reproducible results.

The Hera tests:
The global_enkf failed due to time limits on the first try. It passed on the second try.

[Jim.Jung@hfe10 build]$ ctest -j 6
Test project /scratch1/NCEPDEV/jcsda/Jim.Jung/save/ctests/update/build
Start 1: global_4denvar
Start 2: rtma
Start 3: rrfs_3denvar_rdasens
Start 4: hafs_4denvar_glbens
Start 5: hafs_3denvar_hybens
Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens ............. Passed 1526.62 sec
2/6 Test #5: hafs_3denvar_hybens .............. Passed 2256.26 sec
3/6 Test #4: hafs_4denvar_glbens .............. Passed 2323.86 sec
4/6 Test #2: rtma ............................. Passed 2414.80 sec
5/6 Test #6: global_enkf ......................***Failed 2446.27 sec
6/6 Test #1: global_4denvar ................... Passed 2985.02 sec

83% tests passed, 1 tests failed out of 6

Total Test time (real) = 2985.07 sec

The following tests FAILED:
6 - global_enkf (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/NCEPDEV/jcsda/Jim.Jung/save/ctests/update/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
[Jim.Jung@hfe10 build]$ ctest -R global_enkf
Test project /scratch1/NCEPDEV/jcsda/Jim.Jung/save/ctests/update/build
Start 6: global_enkf
1/1 Test #6: global_enkf ...................... Passed 1234.16 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 1234.18 sec
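
For anyone who wants to pick the failed tests out of a saved ctest log before rerunning them, here is a minimal sketch. It assumes only the console line layout shown above; the function names `parse_ctest_results` and `failed_tests` are made up for illustration and are not part of CTest or the GSI.

```python
import re

# Matches result lines such as:
#   "5/6 Test #6: global_enkf ......................***Failed 2446.27 sec"
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\S+)\s+\.+\s*"
    r"(?P<status>\*\*\*Failed|Passed)\s+(?P<secs>[\d.]+)\s+sec"
)


def parse_ctest_results(text):
    """Return {test_name: (status, seconds)} from ctest console output."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            status = "Failed" if "Failed" in match["status"] else "Passed"
            results[match["name"]] = (status, float(match["secs"]))
    return results


def failed_tests(results):
    """Names of tests whose status is Failed, in sorted order."""
    return sorted(name for name, (status, _) in results.items() if status == "Failed")


# Three lines copied from the Hera run above.
sample = """\
1/6 Test #3: rrfs_3denvar_rdasens ............. Passed 1526.62 sec
5/6 Test #6: global_enkf ......................***Failed 2446.27 sec
6/6 Test #1: global_4denvar ................... Passed 2985.02 sec
"""
results = parse_ctest_results(sample)
print(failed_tests(results))
```

The names returned can then be passed to `ctest -R`, as was done for global_enkf above.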

My Jet time is limited. I will try again when Jet is returned from maintenance.

Running the standard ctests should reproduce results.

None

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

wx20jjung and others added 17 commits June 12, 2024 12:19
…on. In this case it is METImage. Specifically, the CRTM coefficient files are needed by CADS. The CRTM coefficient files used are determined by the satinfo file. The METImage entry in the satinfo file generates error messages in various parts of the GSI. These changes silence the error messages. At this time, there is no METImage assimilation code in the GSI.
Conflicts:
	src/gsi/gsimod.F90
	src/gsi/mrmsmod.f90
	src/gsi/read_cris.f90
	src/gsi/read_iasi.f90
	src/gsi/read_obs.F90
	src/gsi/statsrad.f90
Merge branch 'IASI-NG' of https://github.com/wx20jjung/GSI into IASI-NG
Merge remote-tracking branch 'emc/develop' into IASI-NG
@wx20jjung
Contributor Author

@ADCollard , @DavidHuber-NOAA , @InnocentSouopgui-NOAA would you be willing to review these changes?

@RussTreadon-NOAA
Contributor

Thank you @DavidHuber-NOAA for reviewing this PR.

@wx20jjung , who is the second peer reviewer we should assign to this PR?

Co-authored-by: David Huber <[email protected]>
@wx20jjung
Contributor Author

wx20jjung commented Nov 5, 2024 via email

@wx20jjung
Contributor Author

I have pushed @DavidHuber-NOAA's changes to GitHub. I am not able to test them until Hera is returned to users tonight.

@wx20jjung
Contributor Author

@RussTreadon-NOAA These are the changes I made:
diff regression_param.sh regression_param.sh_orig
104,105c104,105
<        topts[1]="0:15:00" ; popts[1]="40/3/" ; ropts[1]="/1"
<        topts[2]="0:15:00" ; popts[2]="40/5/" ; ropts[2]="/1"
---
>        topts[1]="0:15:00" ; popts[1]="5/4/"  ; ropts[1]="/1"
>        topts[2]="0:15:00" ; popts[2]="10/4/"  ; ropts[2]="/1"

I will commit and push them momentarily.

@wx20jjung
Contributor Author

@RussTreadon-NOAA Jet changes were pushed to GitHub and have passed internal checks.

@RussTreadon-NOAA RussTreadon-NOAA self-requested a review November 7, 2024 18:54
Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Approve. Thank you @wx20jjung for updating the Jet job configuration for the rrfs ctest.

@RussTreadon-NOAA RussTreadon-NOAA self-requested a review November 7, 2024 18:55
@RussTreadon-NOAA
Contributor

This PR is awaiting the return of WCOSS2 to developers so WCOSS2 ctests can be run. Assuming acceptable WCOSS2 results, this PR can be merged into develop.

@RussTreadon-NOAA
Contributor

WCOSS2 ctests
Install wx20jjung:IASI-NG at 7518ed8 and develop at b0e3cba on Cactus. Run ctests with the following results

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr805/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens .............***Failed  730.06 sec
2/6 Test #6: global_enkf ......................   Passed  855.88 sec
3/6 Test #2: rtma .............................   Passed  974.56 sec
4/6 Test #4: hafs_4denvar_glbens ..............***Failed  1220.35 sec
5/6 Test #5: hafs_3denvar_hybens ..............***Failed  1220.53 sec
6/6 Test #1: global_4denvar ...................***Failed  1743.61 sec

33% tests passed, 4 tests failed out of 6

Total Test time (real) = 1743.70 sec

The following tests FAILED:
          1 - global_4denvar (Failed)
          3 - rrfs_3denvar_rdasens (Failed)
          4 - hafs_4denvar_glbens (Failed)
          5 - hafs_3denvar_hybens (Failed)

Unfortunately each of the above failures is due to the fact that the updat and contrl analyses differ. The initial penalties are identical. Differences arise in the minimization. It's odd that this behavior is only observed on WCOSS2. ctests pass on other platforms. The WCOSS2 GSI build uses intel/19. The builds on other machines are intel/20+.

@RussTreadon-NOAA
Contributor

Create stand-alone script to run global_4denvar on Cactus. Simplify configuration to 3dvar with no FGAT & no constraints. Only assimilate CrIS and IASI. Non-reproducible results between cntrl (develop) and updat (wx20jjung:IASI-NG) remain. The reason(s) for the non-reproducible results must be understood and, hopefully, resolved before this PR can move forward.

@wx20jjung , I see that you have a WCOSS2 account. Can you log onto Cactus and investigate? The stand-alone script I have is /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/scripts/run_global_4denvar.sh

@RussTreadon-NOAA
Contributor

global_4denvar debugging on Cactus

Recompile develop and wx20jjung:IASI-NG in debug mode. This is done by setting BUILD_TYPE=Debug in ush/build.sh. Submit run_global_4denvar.sh for each executable. Analysis results are identical.

Does optimization alter the instruction order between the two source codes - develop -vs- wx20jjung:IASI-NG? Note - all global_4denvar debug runs set OMP_NUM_THREADS=1.
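
As background to the question above, and not as a diagnosis of this case, the sketch below shows why a change in instruction order can alter results at all: floating-point addition is not associative, so regrouping the same terms can change the last bits of a sum. The values are arbitrary and have nothing to do with the GSI.

```python
# Floating-point addition is not associative: the same three terms,
# grouped differently, round to different double-precision results.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

# The two sums differ in the last bit even though the math is identical.
print(left_to_right == regrouped)
print(abs(left_to_right - regrouped))
```

A difference of this size is harmless on its own. In an iterative minimization it can feed into later iterations, which would be consistent with identical initial penalties followed by diverging analyses, though that remains an assumption here.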

@RussTreadon-NOAA
Contributor

Build develop and wx20jjung:IASI-NG on Cactus using -O2 optimization. Analysis results differ. Change to -O1 optimization. Analysis results are identical. Debug builds use -O0 optimization.

@RussTreadon-NOAA
Contributor

Rebuild develop and wx20jjung:IASI-NG on Cactus using -O3 optimization (default value). Set CRIS_CADS=.false. and IASI_CADS=.false.. Analysis results differ.

@wx20jjung
Contributor Author

I do not have access to restricted data on wcoss yet. Removing only the restricted data from both develop and update generated the following results:
jim.jung@dlogin03:/lfs/h2/emc/da/noscrub/jim.jung/ctests/update/build> ctest -j 6
Test project /lfs/h2/emc/da/noscrub/jim.jung/ctests/update/build
Start 1: global_4denvar
Start 6: global_enkf
Start 2: rtma
Start 3: rrfs_3denvar_rdasens
Start 4: hafs_4denvar_glbens
Start 5: hafs_3denvar_hybens
1/6 Test #6: global_enkf ...................... Passed 251.05 sec
2/6 Test #3: rrfs_3denvar_rdasens ............. Passed 727.29 sec
3/6 Test #2: rtma ............................. Passed 969.02 sec
4/6 Test #5: hafs_3denvar_hybens .............. Passed 1154.36 sec
5/6 Test #4: hafs_4denvar_glbens .............. Passed 1213.25 sec
6/6 Test #1: global_4denvar ................... Passed 1443.14 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) = 1443.27 sec

I can't do further tests until I am added to the rstprod group.

@RussTreadon-NOAA
Contributor

global_4denvar has the following rstprod dump files

+##$nln $datobs/${prefix_obs}.prepbufr                ./prepbufr
+##$nln $datobs/${prefix_obs}.prepbufr.acft_profiles  ./prepbufr_profl
+##$nln $datobs/${prefix_obs}.nsstbufr                ./nsstbufr
+##$nln $datobs/${prefix_obs}.gpsro.${suffix}         ./gpsrobufr
+##$nln $datobs/${prefix_obs}.saphir.${suffix}        ./saphirbufr

As shown above, comment out these dump files from global_4denvar.sh. Run global_4denvar ctest with following result

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr805/build
    Start 1: global_4denvar
1/1 Test #1: global_4denvar ...................***Failed  1568.71 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 1569.30 sec

The following tests FAILED:
          1 - global_4denvar (Failed)

The initial step size on the first iteration of the first loop differs between the updat and contrl. The same behavior is observed when all dump files are processed.

@RussTreadon-NOAA
Contributor

@wx20jjung , your passed case does not assimilate microwave radiances. I repeat the no_rstprod test with the link for microwave radiance dump files also commented out in global_4denvar.sh. Unfortunately, my updat and contrl results still differ.

@RussTreadon-NOAA
Contributor

Modify russ.treadon environment on Dogwood to mimic that of jim.jung. Recompile wx20jjung:IASI-NG and develop. Still unable to reproduce Passed result @wx20jjung reports.

Install develop and wx20jjung:IASI-NG on Dogwood in role account emc.da. Run global_4denvar ctest with the following results

Test project /lfs/h2/emc/da/noscrub/emc.da/git/gsi/pr805/build
    Start 1: global_4denvar
1/1 Test #1: global_4denvar ...................   Passed  1745.49 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 1745.63 sec

Repeat this test using russ.treadon. ctest global_4denvar fails

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr805/build
    Start 1: global_4denvar
1/1 Test #1: global_4denvar ...................***Failed  1694.93 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 1694.96 sec

The following tests FAILED:
          1 - global_4denvar (Failed)

Comparison of the russ.treadon and emc.da results shows that the contrl runs (i.e., the build from develop) are identical. The updat results differ. Why russ.treadon results for updat differ from emc.da remains unanswered. I will open a WCOSS2 ticket.

Since russ.treadon is the outlier with respect to @wx20jjung and emc.da, I will approve this PR.

Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Approve.

@RussTreadon-NOAA
Contributor

Odd WCOSS2 (Dogwood) behavior

The global_4denvar test passes or fails depending on which user runs the test. When users jim.jung, shun.liu, and emc.da build and run gsi.x the test passes. When users russ.treadon and emc.global build and run gsi.x the global_4denvar test fails.

WCOSS2 Ticket#2024112010000039 has been opened reporting this behavior.
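
One way to look for user-specific differences is to dump the environment of a passing and a failing account and compare the two. The following is a minimal sketch of that comparison, assuming each dump is the plain `NAME=value` output of `env`. The helper names and the sample variables are hypothetical and are not taken from any actual account on Dogwood.

```python
def parse_env_dump(text):
    """Parse `env` output (one NAME=value per line) into a dict.

    Values that span several lines are not handled by this simple parser.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def diff_environments(env_a, env_b):
    """Return {name: (value_in_a, value_in_b)} for variables that differ."""
    differences = {}
    for name in sorted(set(env_a) | set(env_b)):
        value_a = env_a.get(name)
        value_b = env_b.get(name)
        if value_a != value_b:
            differences[name] = (value_a, value_b)
    return differences


# Hypothetical dumps, for illustration only.
passing = parse_env_dump("OMP_NUM_THREADS=1\nLD_LIBRARY_PATH=/opt/a\n")
failing = parse_env_dump(
    "OMP_NUM_THREADS=1\nLD_LIBRARY_PATH=/opt/a:/opt/b\nEXTRA_FLAG=1\n"
)
env_diff = diff_environments(passing, failing)
print(env_diff)
```

A variable missing from one side is reported with `None`, so additions and removals show up alongside changed values.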

@RussTreadon-NOAA
Contributor

WCOSS test using spack-stack/1.6.0

Recompile wx20jjung:IASI-NG and develop using test version of spack-stack/1.6.0. Changes to modulefiles/gsi_wcoss2.intel.lua are as follows

--- a/modulefiles/gsi_wcoss2.intel.lua
+++ b/modulefiles/gsi_wcoss2.intel.lua
@@ -1,13 +1,15 @@
 help([[
 ]])
 
-local PrgEnv_intel_ver=os.getenv("PrgEnv_intel_ver") or "8.1.0"
+prepend_path("MODULEPATH", "/apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304/install/modulefiles/Core")
+
+local PrgEnv_intel_ver=os.getenv("PrgEnv_intel_ver") or "8.3.3"
 local intel_ver=os.getenv("intel_ver") or "19.1.3.304"
-local craype_ver=os.getenv("craype_ver") or "2.7.8"
-local cray_mpich_ver=os.getenv("cray_mpich_ver") or "8.1.7"
+local craype_ver=os.getenv("craype_ver") or "2.7.17"
+local cray_mpich_ver=os.getenv("cray_mpich_ver") or "8.1.9"
 local cmake_ver= os.getenv("cmake_ver") or "3.20.2"
 local python_ver=os.getenv("python_ver") or "3.8.6"
-local prod_util_ver=os.getenv("prod_util_ver") or "2.0.10"
+local prod_util_ver=os.getenv("prod_util_ver") or "2.0.14"
 
 local netcdf_ver=os.getenv("netcdf_ver") or "4.7.4"
 local bufr_ver=os.getenv("bufr_ver") or "11.7.0"
@@ -24,9 +26,9 @@ local crtm_ver=os.getenv("crtm_ver") or "2.4.0.1"
 local ncdiag_ver=os.getenv("ncdiag_ver") or "1.1.1"
 
 load(pathJoin("PrgEnv-intel", PrgEnv_intel_ver))
-load(pathJoin("intel", intel_ver))
+load(pathJoin("stack-intel", intel_ver))
 load(pathJoin("craype", craype_ver))
-load(pathJoin("cray-mpich", cray_mpich_ver))
+load(pathJoin("stack-cray-mpich", cray_mpich_ver))
 load(pathJoin("cmake", cmake_ver))
 load(pathJoin("python", python_ver))

Rerun all 6 ctests with following results

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr805/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens .............   Passed  730.09 sec
2/6 Test #6: global_enkf ......................   Passed  853.87 sec
3/6 Test #2: rtma .............................   Passed  969.74 sec
4/6 Test #5: hafs_3denvar_hybens ..............   Passed  1217.84 sec
5/6 Test #4: hafs_4denvar_glbens ..............   Passed  1334.56 sec
6/6 Test #1: global_4denvar ...................   Passed  1683.77 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) = 1684.00 sec

The environment for russ.treadon was not modified between the above spack-stack test and previous tests using the unmodified modulefiles/gsi_wcoss2.intel.lua (which uses hpc-stack).

@DavidHuber-NOAA
Collaborator

@RussTreadon-NOAA It's worth noting that by making the changes that you did for spack-stack, the majority of the GDIT-installed libraries are still being used to build and run the GSI. The main differences in libraries are

PrgEnv 8.1.0 -> 8.3.3
cray-mpich 8.1.7->8.1.9
craype 2.7.8 -> 2.7.17
prod_util 2.0.10 -> 2.0.14
sp, ip, sigio, sfcio, wrf-io, ncio, crtm (same version, but spack-stack)
netcdf, bufr, w3emc, nemsio, ncio, bacio, ncdiag (same version)

I wonder if just incrementing PrgEnv, cray-mpich, craype, and prod_util would have the same result.
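
To list which default versions changed between two copies of the modulefile, a small script along these lines could help. It is a sketch that assumes the `local NAME=os.getenv("NAME") or "VERSION"` pattern seen in the diff above; the function names are made up for illustration, and the sample lines are a subset copied from that diff.

```python
import re

# Matches lines like: local craype_ver=os.getenv("craype_ver") or "2.7.8"
DEFAULT_VER = re.compile(
    r'local\s+(\w+)\s*=\s*os\.getenv\("\w+"\)\s+or\s+"([^"]+)"'
)


def version_defaults(lua_text):
    """Return {variable_name: default_version} found in modulefile text."""
    return dict(DEFAULT_VER.findall(lua_text))


def changed_versions(old_text, new_text):
    """Return {name: (old, new)} for defaults present in both but different."""
    old = version_defaults(old_text)
    new = version_defaults(new_text)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


old_lines = '''
local PrgEnv_intel_ver=os.getenv("PrgEnv_intel_ver") or "8.1.0"
local intel_ver=os.getenv("intel_ver") or "19.1.3.304"
local craype_ver=os.getenv("craype_ver") or "2.7.8"
'''
new_lines = '''
local PrgEnv_intel_ver=os.getenv("PrgEnv_intel_ver") or "8.3.3"
local intel_ver=os.getenv("intel_ver") or "19.1.3.304"
local craype_ver=os.getenv("craype_ver") or "2.7.17"
'''
changes = changed_versions(old_lines, new_lines)
print(changes)
```

This only compares the defaults written in the file. A value set through the corresponding environment variable would override them at load time and is not visible to the script.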

@RussTreadon-NOAA
Contributor

WCOSS2 update

Another data point regarding WCOSS2 behavior for this PR. If develop and wx20jjung:IASI-NG are recompiled with the test spack-stack/1.6.0 crtm/2.4.0.1 all ctests pass.

Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr805_only_crtm/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #3: rrfs_3denvar_rdasens .............   Passed  732.74 sec
2/6 Test #6: global_enkf ......................   Passed  857.12 sec
3/6 Test #2: rtma .............................   Passed  970.45 sec
4/6 Test #5: hafs_3denvar_hybens ..............   Passed  1217.08 sec
5/6 Test #4: hafs_4denvar_glbens ..............   Passed  1276.20 sec
6/6 Test #1: global_4denvar ...................   Passed  1744.99 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) = 1745.08 sec

In this test only the source for crtm was changed. All other modules remain untouched.

Changing from the production to spack-stack crtm/2.4.0.1 alters the initial radiance penalties. < below is the production crtm/2.4.0.1. > is the spack-stack crtm/2.4.0.1.

russ.treadon@dlogin04:/lfs/h2/emc/ptmp/russ.treadon> diff pr805/tmpreg_global_4denvar/global_4denvar_loproc_updat/fort.207 pr805_only_crtm/tmpreg_global_4denvar/global_4denvar_loproc_updat/fort.207 |more
8785c8785
<  npp         atms          675.78829071    157       8      15       4      59      56       0
---
>  npp         atms          675.78829072    157       8      15       4      59      56       0
8787c8787
<                           675.78829071      4       3      16       2       0      11       9
---
>                           675.78829072      4       3      16       2       0      11       9
8791c8791
<  n20         atms          632.69350100    156      10      11       1      60      56       0
---
>  n20         atms          632.69350099    156      10      11       1      60      56       0
8793c8793
<                           632.69350100      4       5      11       1       0      11       7
---
>                           632.69350099      4       5      11       1       0      11       7
8803c8803
<  n18         avhrr         144.77512846    453       0       0      38     196       8       0
---
>  n18         avhrr         144.74571017    453       0       0      38     196       8       0
8805c8805
<                           144.77512846      0     232      87     435       0       0       0
---
>                           144.74571017      0     232      87     436       0       0       0
8821c8821
<  n19         avhrr         184.51177472    706       0       0     112     311      12       0
---
>  n19         avhrr         184.74342823    706       0       0     112     311      12       0
8823c8823
<                           184.51177472      0     367      80     538       0       0       0
---
>                           184.74342823      0     367      80     537       0       0       0
8839c8839
<  metop-c     amsua         895.87416594    151       6      11       2      59      46       0
---
>  metop-c     amsua         895.87416595    151       6      11       2      59      46       0
8841c8841
<                           895.87416594      8       5       1       1       0      21      17
---
>                           895.87416595      8       5       1       1       0      21      17
8892,8893c8892,8893
<   rad total   penalty_all=   20561.1178041497697    
<   rad total qcpenalty_all=   20561.1178041497697    
---
>   rad total   penalty_all=   20561.3200393643165    
>   rad total qcpenalty_all=   20561.3200393643165    
9540c9540
<    1012     4 avhrr3_n18              227      3      0.680   0.3706525   0.0274468   0.2079673   0.4019023   0.4009640
---
>    1012     4 avhrr3_n18              226      3      0.680   0.3696996   0.0267708   0.2087573   0.4026120   0.4017210
9544c9544
<    1019     5 avhrr3_n19              406     11      0.720   0.1948414   0.0599925   0.3162990   0.4793941   0.4756255
---
>    1019     5 avhrr3_n19              407     11      0.720   0.1937809   0.0586771   0.3160910   0.4793843   0.4757797
11247c11247
< o-g 01 rad   n18          avhrr            844913         1237          536    144.78       144.78      0.27010      0.27010    
---
> o-g 01 rad   n18          avhrr            844913         1237          535    144.75       144.75      0.27055      0.27055    
11250c11250
< o-g 01 rad   n19          avhrr           1711343         1720          739    184.51       184.51      0.24968      0.24968    
---
> o-g 01 rad   n19          avhrr           1711343         1720          740    184.74       184.74      0.24965      0.24965    

The number of assimilated avhrr data differs between the two crtms. n18_avhrr has one more observation assimilated with the production crtm. n19_avhrr has one more observation assimilated with the spack-stack crtm.

These differences are a bit surprising since both the production and spack-stack crtm are supposedly version 2.4.0.1.
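
To put a number on how far apart the two runs are, the total radiance penalty can be pulled from each fort.207 and compared in relative terms. This is a minimal sketch that assumes the `rad total   penalty_all=` line format shown in the diff above; the helper names are hypothetical, and the two input lines are copied from that diff.

```python
import re

# Matches "rad total   penalty_all=   20561.1178041497697" but not the
# "qcpenalty_all" line, because only whitespace may precede "penalty_all".
PENALTY_LINE = re.compile(r"rad total\s+penalty_all=\s*([0-9.Ee+-]+)")


def radiance_penalty(text):
    """Return the first 'rad total penalty_all' value found in the text."""
    match = PENALTY_LINE.search(text)
    if match is None:
        raise ValueError("no 'rad total penalty_all' line found")
    return float(match.group(1))


def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


production = "  rad total   penalty_all=   20561.1178041497697    "
spack_stack = "  rad total   penalty_all=   20561.3200393643165    "
rel = relative_difference(
    radiance_penalty(production), radiance_penalty(spack_stack)
)
print(f"relative difference: {rel:.3e}")
```

For the two totals above the relative difference is on the order of 1e-5, far larger than the last-digit changes in the atms and amsua lines, and it comes mostly from the avhrr entries.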

@DavidHuber-NOAA
Collaborator

Thanks for that data point, Russ. I will forward that on to Alex and Wei.

@RussTreadon-NOAA
Contributor

updat and contrl results on WCOSS2 are reproducible as is for some developers. For other developers, results are reproducible if they build with spack-stack/1.6.0. There is as yet no explanation for this behavior. Move this PR to the GSI Handling review team for their consideration.

@RussTreadon-NOAA RussTreadon-NOAA merged commit 9ada88e into NOAA-EMC:develop Nov 25, 2024
4 checks passed
7 participants