ELM on CADES (or-condo): environments, building, and running
UPDATED: 2022-03-10, for the most recent E3SM master with modified user-defined PFT.
**** NOTE: this is for CCSI users. ****
/software/user_tools/current/cades-ccsi/
This is the group-accessible tools directory. For PFLOTRAN we currently provide a GCC-compiled, parallel-enabled (OpenMPI) build of PETSc: PETSC_DIR=/software/user_tools/current/cades-ccsi/petsc4pf/openmpi-1.10-gcc-5.3 for version 3.8.x, or PETSC_DIR=/software/user_tools/current/cades-ccsi/petsc-x/openmpi-1.10-gcc-5.3 for version 3.9.x or later.
/lustre/or-scratch/cades-ccsi/
This is the group-usable file space, hereafter referred to as $CCSI_DIR. Each user should have a folder under it.
LOGIN node: or-slurm-login.ornl.gov, i.e. access it by ssh or-slurm-login.ornl.gov
Note: ./proj-shared, ./world-shared, and ./scratch serve their usual purposes.
(1) It is suggested that you edit your own .bashrc when you first log in.
(2) A suggested .bashrc for ELM is currently as follows:
# the following are with gcc-5.3.0, openmpi-1.10.3 (default)
module purge
module unload PE-intel
module unload PE-pgi
module unload PE-gnu
module load PE-gnu/1.0
module load boost
module load cmake
module load zlib
# needs blas-lapack
module load mkl/2018.1.163
export BLASLAPACK_LIBDIR=/software/dev_tools/swtree/or-condo/mkl/2018.1.163/centos7.5_binary/lib
# parallel-enabled hdf5 built with openmpi-1.10.3 (gnu-5.3.0 build, see path below)
module load hdf5-parallel/1.8.17
export HDF5_PATH=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/hdf5-parallel/1.8.17/centos7.2_gnu5.3.0
export PATH=$HDF5_PATH/bin:$PATH
export LD_LIBRARY_PATH=$HDF5_PATH/lib:$LD_LIBRARY_PATH
# netcdf built with openmpi-1.10.3/gnu and support of hdf5 (i.e. netcdf-4)
module load netcdf-hdf5parallel/4.3.3.1
export NETCDF_PATH=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/netcdf-hdf5parallel/4.3.3.1/centos7.2_gnu5.3.0
export PATH=$NETCDF_PATH/bin:$PATH
export LD_LIBRARY_PATH=$NETCDF_PATH/lib:$LD_LIBRARY_PATH
module load pnetcdf
export PNETCDF_PATH=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/pnetcdf/1.9.0/centos7.2_gnu5.3.0
export PATH=$PNETCDF_PATH/bin:$PATH
export ELM_PFLOTRAN_SOURCE_DIR=$CLM_PFLOTRAN_SOURCE_DIR
export PETSC_PATH=$PETSC_DIR
export LD_LIBRARY_PATH=$PETSC_PATH/lib:$LD_LIBRARY_PATH
module load perl
#nco
#module load nco/4.6.9 # this system version has issues with ncap
export UDUNITS2_PATH=$CCSI_USERTOOLS/udunits2
export PATH=$UDUNITS2_PATH/bin:$PATH
export GSL_ROOT=$CCSI_USERTOOLS/nco/gsl
export PATH=$GSL_ROOT/bin:$PATH
export ANTLR_ROOT=$CCSI_USERTOOLS/nco/antlr
export PATH=$ANTLR_ROOT/bin:$PATH
export NCOPATH=$CCSI_USERTOOLS/nco/nco-4.7.9
export PATH=$NCOPATH/bin:$PATH
# ncl
module load ncl
module rm python
module load python/3.6.3
# user-installed python packages (.local/lib/python3.6/site-packages/)
export PY_USER_PATH=~/.local/lib/python3.6/site-packages
export PYTHONPATH=$PY_USER_PATH:$PYTHONPATH
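After sourcing the .bashrc above, a quick sanity check of the toolchain can save a failed build later. The commands below are a sketch; the exact version strings and wrapper names on CADES may differ from what is assumed here:

```shell
# Verify the compiler/MPI/IO stack picked up from the modules above.
which mpicc && mpicc --version | head -n 1   # expect a GCC 5.3.x wrapper
which ncdump                                 # netCDF tools on PATH
which h5pcc || which h5cc                    # parallel HDF5 compiler wrapper
python3 --version                            # expect Python 3.6.x per the module above
```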
There is a specific branch for our system: https://github.com/E3SM-Project/E3SM.git, branch fmyuan/ornl-cades-machine-update (based on trunk E3SM).
$CCSI_DIR/proj-shared/project_acme/e3sm-inputdata/
We try to keep a minimal but complete copy of the E3SM inputdata in this folder, because most CCSI users do not have access rights to the E3SM data repository.
Additionally, for site-level or project-specific purposes, the following site-specific input data may be used:
/nfs/data/ccsi/proj-shared/E3SM/inputdata
NOTE: this is for the ELM CUSTOMIZED configuration with the CPL_BYPASS and user-defined PFT options.
NOTE: for this example and others to work, you need to clone the pt-e3sm-inputdata repository.
- STEP 1. Create a PTCLM case
1) cd $E3SM_MODEL_DIR/cime/scripts
2) ./create_newcase --case /lustre/or-scratch/cades-ccsi/proj-shared/project_acme/cases/testELM_US-Brw_I1850CNPRDCTCBC --res ELM_USRDAT --mach cades --compiler gnu --compset I1850CNPRDCTCBC --project ccsi --walltime 00:30:00
- STEP 2. Configure and build the case created in STEP 1 (user-defined single-point dataset '1x1pt_US-Brw', forcing years 1985-2015)
3) cd /lustre/or-scratch/cades-ccsi/proj-shared/project_acme/cases/testELM_US-Brw_I1850CNPRDCTCBC
4) use ./xmlchange, or edit env_run.xml directly, to set:
--id SAVE_TIMING --val FALSE
--id DATM_MODE --val CLM1PT
--id DATM_CLMNCEP_YR_START --val 1985
--id DATM_CLMNCEP_YR_END --val 2015
--id CLM_USRDAT_NAME --val 1x1pt_US-Brw
ADDITIONALLY (not available in E3SM trunk):
for 'accelerated spinup', edit env_run.xml as follows:
--id CLM_ACCELERATED_SPINUP --val on
for 'cold startup', edit env_run.xml as follows:
--id CLM_FORCE_COLDSTART --val on
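The edits above can be applied from inside the case directory with ./xmlchange; this is a sketch using the --id/--val syntax shown throughout this guide (newer CIME versions also accept a VARIABLE=value form):

```shell
# Apply the STEP 2 settings in one pass (run inside the case directory).
./xmlchange --id SAVE_TIMING --val FALSE
./xmlchange --id DATM_MODE --val CLM1PT
./xmlchange --id DATM_CLMNCEP_YR_START --val 1985
./xmlchange --id DATM_CLMNCEP_YR_END --val 2015
./xmlchange --id CLM_USRDAT_NAME --val 1x1pt_US-Brw
# Optional (branch-specific, not in E3SM trunk):
# ./xmlchange --id CLM_ACCELERATED_SPINUP --val on
# ./xmlchange --id CLM_FORCE_COLDSTART --val on
```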
5) ./case.setup
(./case.setup --clean to clean up)
6) vi user_nl_datm
Edit this file and add the following line at the end (NOTE: this works around a bug in spinup setup inherited from CESM):
taxmode = "cycle", "extend", "extend"
(NOTE: the first "cycle" entry means the climate forcing data are cycled repeatedly for I1850XXX compsets.)
7) ./case.build
(./case.build --clean      # clean up ALL ACME model components
./case.build --clean lnd   # clean up the LND component
./case.build --clean-all   # clean up ALL model components and external components)
- STEP 3. Run the case
./case.submit
NOTES:
*(a) the run directory is: $CCSI_DIR/scratch/$USER/testELM_US-Brw_I1850CNPRDCTCBC/run
*(b) The default run length is 5 days, with monthly output, i.e. NO real data are written out EXCEPT restart files at the end.
*(c) To run the simulation for whatever length you like, edit env_run.xml, for example as follows:
--id STOP_OPTION --val nyear
--id STOP_N --val 600
(These edits will run the model for 600 years.)
(a) Additional OPTIONS for compiling the coupled code with PFLOTRAN
**edit env_run.xml as follows:
--id CLM_INTERFACE_MODE --val pflotran
**edit env_build.xml as follows:
--id CLM_USE_PETSC --val TRUE
**edit env_mach_specific.xml as follows:
--id CLM_PFLOTRAN_COUPLED --val TRUE
--id CLM_PFLOTRAN_COLMODE --val TRUE # only IF vertical-only transport/flow, e.g. a global simulation
--id CLM_PFLOTRAN_SOURCE_DIR --val /lustre/or-hydra/cades-ccsi/proj-shared/models/pflotran-interface/src/clm-pflotran # modify as needed
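As a sketch, the same edits via ./xmlchange from the case directory; this assumes these branch-specific ids are exposed to xmlchange, so if any command fails, edit the three XML files directly as above:

```shell
# Enable the ELM-PFLOTRAN coupling (branch-specific variables).
./xmlchange --id CLM_INTERFACE_MODE --val pflotran   # in env_run.xml
./xmlchange --id CLM_USE_PETSC --val TRUE            # in env_build.xml
./xmlchange --id CLM_PFLOTRAN_COUPLED --val TRUE     # in env_mach_specific.xml
./xmlchange --id CLM_PFLOTRAN_SOURCE_DIR --val /lustre/or-hydra/cades-ccsi/proj-shared/models/pflotran-interface/src/clm-pflotran
```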
(b) COMPILING: clean the build and compile the model as usual. Before this, you need to make sure libpflotran.a has been compiled in the CLM_PFLOTRAN_SOURCE_DIR, as instructed here.
(c) RUNNING: run the model as usual. For this case, edit env_run.xml to run 5 model years:
--id STOP_OPTION --val nyear
--id STOP_N --val 5
NOTE: if you append
use_pflotran = .false.
use_clm_interface = .false.
to the file user_nl_clm, the model will run as if NOT coupled to PFLOTRAN, although it is fully compiled with it.
II-3. ELM only, with user-defined PFT, 1-point grid-level GSWP3v1 forcing, 6 ecotypes at the NGEE-Arctic Kougarok site.
E3SM: https://github.com/E3SM-Project/E3SM.git, branch fmyuan/lnd/elm-userpft
STEP 1. Create a PTCLM case
1) cd $CCSI_DIR/proj-shared/models/e3sm/cime/scripts
2) ./create_newcase --case /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cases/ELMuserpft_Kougarok_I1850CNPRDCTCBC --res CLM_USRDAT --mach cades --compiler gnu --compset I1850CNPRDCTCBC --project ccsi --walltime 00:30:00
- STEP 2. Configure the case just created in STEP 1: (1) single grid-level atm forcing data (1901-2010); (2) 6 user-defined point datasets for the NGEE-Arctic Kougarok site; (3) user-defined Arctic PFTs.
3) cd /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cases/ELMuserpft_Kougarok_I1850CNPRDCTCBC
4) use ./xmlchange, or edit env_run.xml directly, to modify the configuration as follows:
--id SAVE_TIMING --val FALSE
--id DATM_MODE --val CLMGSWP3v1
--id DATM_CLMNCEP_YR_START --val 1901
--id DATM_CLMNCEP_YR_END --val 2010
--id CLM_USRDAT_NAME --val 1x1pt_kougarok-GRID
--id ATM_DOMAIN_PATH --val /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/share/domains/domain.clm
--id LND_DOMAIN_PATH --val /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/share/domains/domain.clm
--id ATM_DOMAIN_FILE --val domain.lnd.51x63pt_kougarok-NGEE_TransA_navy.nc
--id LND_DOMAIN_FILE --val domain.lnd.51x63pt_kougarok-NGEE_TransA_navy.nc
--id DIN_LOC_ROOT_CLMFORC --val /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/atm/datm7
ADDITIONALLY (not available in E3SM trunk), for 'accelerated spinup', edit env_run.xml as follows:
--id CLM_ACCELERATED_SPINUP --val on
for 'cold startup', edit env_run.xml as follows:
--id CLM_FORCE_COLDSTART --val on
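As with the earlier case, the STEP 2 edits can be applied in one pass with ./xmlchange from inside the case directory; a sketch using the --id/--val syntax shown above:

```shell
# Apply the Kougarok STEP 2 settings (run inside the case directory).
./xmlchange --id SAVE_TIMING --val FALSE
./xmlchange --id DATM_MODE --val CLMGSWP3v1
./xmlchange --id DATM_CLMNCEP_YR_START --val 1901
./xmlchange --id DATM_CLMNCEP_YR_END --val 2010
./xmlchange --id CLM_USRDAT_NAME --val 1x1pt_kougarok-GRID
./xmlchange --id ATM_DOMAIN_PATH --val /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/share/domains/domain.clm
./xmlchange --id LND_DOMAIN_PATH --val /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/share/domains/domain.clm
./xmlchange --id ATM_DOMAIN_FILE --val domain.lnd.51x63pt_kougarok-NGEE_TransA_navy.nc
./xmlchange --id LND_DOMAIN_FILE --val domain.lnd.51x63pt_kougarok-NGEE_TransA_navy.nc
./xmlchange --id DIN_LOC_ROOT_CLMFORC --val /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/atm/datm7
```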
- STEP 3. Configure the model-run parallelism settings
By default, a model run on CADES uses a pre-configured PE layout; the case as set up so far would run on 1 node with 1 MPI task, which is not what we want. Edit env_mach_pes.xml as follows, so the model runs on 1 node with 6 MPI tasks:
5) vi env_mach_pes.xml
<entry id="NTASKS">
<type>integer</type>
<values>
<value compclass="ATM">6</value>
<value compclass="CPL">6</value>
<value compclass="OCN">6</value>
<value compclass="WAV">6</value>
<value compclass="GLC">6</value>
<value compclass="ICE">6</value>
<value compclass="ROF">6</value>
<value compclass="LND">6</value>
<value compclass="ESP">1</value>
</values>
<desc>number of tasks for each component</desc>
</entry>
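As an alternative to hand-editing the XML, recent CIME versions expose these values through ./xmlchange; a sketch (verify that your CIME checkout supports this syntax before relying on it):

```shell
# Set 6 MPI tasks for all components, then keep ESP at 1 task.
./xmlchange NTASKS=6
./xmlchange NTASKS_ESP=1
./pelayout   # print the resulting PE layout for a quick check
```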
NOTE: as created in STEP 1, the model will run on CADES for at most 30 minutes of wallclock time. For a longer simulation we need to modify that, so edit env_batch.xml under 'group id="case.run"', for example to let the model run up to 48 hours (note: this is the maximum wallclock time currently allowed on CADES):
<entry id="JOB_WALLCLOCK_TIME" value="48:00:00">
<type>char</type>
<valid_values/>
<desc>The machine wallclock setting. Default determined in config_machines.xml can be overwritten by testing</desc>
</entry>
<entry id="JOB_QUEUE" value="batch">
<type>char</type>
<valid_values/>
<desc>The machine queue in which to submit the job. Default determined in config_machines.xml can be overwritten by testing</desc>
</entry>
<entry id="USER_REQUESTED_WALLTIME" value="48:00:00">
<type>char</type>
<desc>Store user override for walltime</desc>
</entry>
- STEP 4. Build the case configured in STEP 2-3
6) ./case.setup
(./case.setup --clean to clean up the setup)
The model setup will fail, with an error about a missing 'fsurdat' file and its associated PFT parameter file. So edit the file user_nl_clm by adding the following:
fsurdat = '/lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/lnd/clm2/surfdata_map/surfdata_51x63pt_kougarok-NGEE_TransA_simyr1850_c181115-sub12.nc'
paramfile = '/lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/lnd/clm2/paramdata/clm_params_c180524-sub12.nc'
nyears_ad_carbon_only = 25
spinup_mortality_factor = 10
NOTE: the last 2 options are optional, but they can considerably accelerate spinup.
ONE more edit is 'maxpft' (whose default is 17): in env_run.xml, modify CLM_BLDNML_OPTS as follows:
<entry id="CLM_BLDNML_OPTS" value="-maxpft 12 -bgc bgc -nutrient cnp -nutrient_comp_pathway rd -soil_decomp ctc -methane -nitrif_denitrif ">
<type>char</type>
<desc>CLM build-namelist options</desc>
</entry>
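The same change can be made with ./xmlchange, a sketch using the --id/--val syntax used throughout this guide (quote the value so the embedded spaces survive):

```shell
# Reduce maxpft from 17 to 12 to match the sub12 surface/parameter files above.
./xmlchange --id CLM_BLDNML_OPTS --val "-maxpft 12 -bgc bgc -nutrient cnp -nutrient_comp_pathway rd -soil_decomp ctc -methane -nitrif_denitrif"
```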
AND finally, build the model as follows:
7) ./case.build
(./case.build --clean      # clean up ALL ACME model components
./case.build --clean lnd   # clean up the LND component
./case.build --clean-all   # clean up ALL model components and external components)
- STEP 5. Run the case
./case.submit
NOTES:
*(a) the run directory is: $CCSI_DIR/scratch/$USER/ELMuserpft_Kougarok_I1850CNPRDCTCBC/run
*(b) The default run length is 5 days, with monthly output, i.e. NO real data are written out EXCEPT restart files at the end.
*(c) To run the simulation for whatever length you like, edit env_run.xml, for example as follows:
--id STOP_OPTION --val nyear
--id STOP_N --val 600
(These edits will run the model for 600 years.)
NOTE: you probably need to modify JOB_WALLCLOCK_TIME in env_batch.xml, as mentioned above in STEP 3.
*(d) To produce RESTART files at whatever interval you like, again edit env_run.xml, for example as follows:
--id REST_OPTION --val nyear
--id REST_N --val 100
(These edits will make the model output restart files every 100 years.)
*(e) If a run like this 600-year one may not finish within the wallclock time, edit env_run.xml so that the model resumes from the LAST saved restart point, as follows:
--id CONTINUE_RUN --val TRUE
TIPS: since we configured 'restart' files to be saved every 100 years, the model will resume from THE LAST save point. It is therefore critical to configure 'REST_N', 'STOP_N', and 'JOB_WALLCLOCK_TIME' together so that the run economizes on both I/O and run time. Restart files are normally large, and writing them frequently costs time.
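A quick back-of-the-envelope check of the REST_N/STOP_N pairing above (plain shell arithmetic, not part of the workflow itself):

```shell
# With STOP_N=600 and REST_N=100, the run writes restart files at 6 points,
# so an interrupted job loses at most 100 simulated years of progress.
STOP_N=600
REST_N=100
N_RESTARTS=$((STOP_N / REST_N))
echo "restart points: $N_RESTARTS"
```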
*(f) AFTER you are done with ACCELERATED_SPINUP (aka ad_spinup), you need to run the normal spinup, by editing env_run.xml as follows:
--id CLM_ACCELERATED_SPINUP --val off
--id CLM_FORCE_COLDSTART --val off
AND add the initial state to user_nl_clm, as follows:
finidat = './ELMuserpft_Kougarok_I1850CNPRDCTCBC.clm2.r.0601-01-01-00000.nc'
Here, we assume the file ELMuserpft_Kougarok_I1850CNPRDCTCBC.clm2.r.0601-01-01-00000.nc is in the directory $CCSI_DIR/scratch/$USER/ELMuserpft_Kougarok_I1850CNPRDCTCBC/run.
THEN issue the command ./case.submit as before. This will run the model for another 600 years as the normal spinup stage. The newly generated restart file will be used as `finidat` for the TRANSIENT run (1850-2010 in our case here).
- STEP 6. Run the case for the TRANSIENT stage (1850-2010)
THIS TIME you need to: (1) create a fresh case with --compset I20TRCNPRDCTCBC; (2) configure the case with --id STOP_N --val 160 (i.e. 160 years).
AND add the initial state to user_nl_clm, as follows:
finidat = '$CCSI_DIR/scratch/$USER/ELMuserpft_Kougarok_I1850CNPRDCTCBC/run/ELMuserpft_Kougarok_I1850CNPRDCTCBC.clm2.r.0601-01-01-00000.nc'
(NOTE: you MUST explicitly expand $CCSI_DIR and $USER into the real directory names; OR copy that finidat file into your transient run directory.)
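Putting (1) and (2) together, a sketch of the transient case creation; the case name here is hypothetical, simply following the naming pattern used above:

```shell
# Create and configure a fresh transient (20th-century) case.
cd $CCSI_DIR/proj-shared/models/e3sm/cime/scripts
./create_newcase --case /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cases/ELMuserpft_Kougarok_I20TRCNPRDCTCBC \
  --res CLM_USRDAT --mach cades --compiler gnu --compset I20TRCNPRDCTCBC --project ccsi --walltime 00:30:00
cd /lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cases/ELMuserpft_Kougarok_I20TRCNPRDCTCBC
./xmlchange --id STOP_OPTION --val nyear
./xmlchange --id STOP_N --val 160
```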
- STEP 1. Create a case
cd $CCSI_DIR/proj-shared/models/ACME-fmyuan/cime/scripts/
./create_newcase --case $CCSI_DIR/proj-shared/project_acme/cases/testalm_hcru_I1850CLM45CN --res hcru_hcru --mach cades --compiler gnu --mpilib openmpi --compset I1850CLM45CN --walltime 1:00:00 --project ccsi
This command will set up a case in the following directory:
$CCSI_DIR/proj-shared/project_acme/cases/testalm_hcru_I1850CLM45CN
If you want to change this case name and directory, modify the command option following "--case" above. Usually you should put this in your home directory, like: --case $HOME/project_acme/cases/????
This command will also create a model build directory and a run directory under:
$CCSI_DIR/scratch/$USER/testalm_hcru_I1850CLM45CN
This means that your executable and libraries go in the ./bld folder under that directory, and the model run directory is the ./run folder under it.
- STEP 2. Build a case, under the case directory.
cd $CCSI_DIR/proj-shared/project_acme/cases/testalm_hcru_I1850CLM45CN
Use ./xmlchange, or edit env_run.xml directly, to set:
--id STOP_N --val 1
--id STOP_OPTION --val ndays
(These edits are FOR THE USER AS NEEDED; for example, the above will make the model run for 1 day (STOP_N=1, STOP_OPTION=ndays).)
./case.setup
./case.build
This will build the model in your bld directory, as mentioned at the end of STEP 1. You can modify that location in 'env_build.xml', via the entries EXEROOT or CESMSCRATCHROOT.
- STEP 3. Run case, under the case directory.
TIP: if you don't want to write out CLM netCDF history files (this is usually slow on HPC, which is not good for code testing), you may turn off all output by adding a line of text to user_nl_clm:
hist_empty_htapes = .true.
Then run the model by issuing the command:
./case.submit
It will run the model under the scratch root on the compute nodes; in this case:
$CCSI_DIR/scratch/$USER/testalm_hcru_I1850CLM45CN/run
FINALLY, add the user-defined LULC to user_nl_clm, as follows:
flanduse_timeseries = '/lustre/or-hydra/cades-ccsi/proj-shared/project_acme/cesminput_ngee/lnd/clm2/surfdata_map/landuse.timeseries_51x63pt_kougarok-NGEE_TransA_simyr1850_c181115-sub12.nc'