This subdirectory contains details about how to download and preprocess the WeatherBench data and how to run the corresponding experiments.
Download the 5.625 degree data from the TUM server (which can also be done via the command line, as detailed here) and store the folders containing the separate variables in `(src/dlwpbench/)data/netcdf/weatherbench/`, i.e.,
```
.
|-- data
|   |-- netcdf
|   |   |-- weatherbench
|   |   |   |-- 10m_u_component_of_wind
|   |   |   |-- 10m_v_component_of_wind
|   |   |   |-- 2m_temperature
|   |   |   |-- constants
|   |   |   |-- geopotential
|   |   |   |-- potential_vorticity
|   |   |   |-- relative_humidity
|   |   |   |-- specific_humidity
|   |   |   |-- temperature
|   |   |   |-- toa_incident_solar_radiation
|   |   |   |-- total_cloud_cover
|   |   |   |-- total_precipitation
|   |   |   |-- u_component_of_wind
|   |   |   |-- v_component_of_wind
|   |   |   `-- vorticity
```
> [!IMPORTANT]
> The `5.625deg` suffix must not be part of the directory names and may need to be removed, depending on the download method. Also, `geopotential_500` and `temperature_850` must not be part of the directory tree.
> [!TIP]
> As an example, download the `2m_temperature` field via the command line, uncompress it, and move it to the correct directory with
> ```
> mkdir -p data/netcdf/weatherbench
> rsync -P rsync://[email protected]/m1524895/5.625deg/2m_temperature/2m_temperature_5.625deg.zip data/netcdf/weatherbench/2m_temperature_5.625deg.zip
> # enter password m1524895 when prompted
> unzip data/netcdf/weatherbench/2m_temperature_5.625deg.zip -d data/netcdf/weatherbench/2m_temperature_5.625deg
> rm data/netcdf/weatherbench/2m_temperature_5.625deg.zip
> ```
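The same pattern can be looped over further variables. The following sketch assumes the variable names from the tree above and already drops the `5.625deg` suffix from the extracted directories, as required by the naming convention; rsync's `--password-file` option avoids retyping the password:

```bash
# sketch: download, unpack, and rename several variables in one go
echo "m1524895" > rsync.pass && chmod 600 rsync.pass
for var in 10m_u_component_of_wind 10m_v_component_of_wind 2m_temperature; do
    rsync -P --password-file=rsync.pass \
        rsync://[email protected]/m1524895/5.625deg/${var}/${var}_5.625deg.zip \
        data/netcdf/weatherbench/${var}_5.625deg.zip
    # extract without the 5.625deg suffix, per the naming convention above
    unzip data/netcdf/weatherbench/${var}_5.625deg.zip -d data/netcdf/weatherbench/${var}
    rm data/netcdf/weatherbench/${var}_5.625deg.zip
done
```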
The entire dataset (roughly 250 GB) can be downloaded from https://dataserv.ub.tum.de/s/m1524895/download?path=/5.625deg&files=all_5.625deg.zip
For faster data loading from disk when training and evaluating models, the data can be converted from netCDF (with `.nc` extension) to zarr (with `.zarr` extension) using the nc_to_zarr.py script. That is,
```
python data/processing/nc_to_zarr.py
```
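At its core, such a conversion amounts to reading the netCDF files with xarray and writing them back as a zarr store. The following is only an illustrative sketch (the input pattern and output path are assumptions; nc_to_zarr.py is the authoritative implementation):

```python
# illustrative sketch of a netCDF-to-zarr conversion, not the actual script
import xarray as xr

# read all yearly .nc files of one variable into a single dataset
ds = xr.open_mfdataset(
    "data/netcdf/weatherbench/2m_temperature/*.nc", combine="by_coords"
)

# write the dataset as a zarr store (output path is an assumed example)
ds.to_zarr("data/zarr/weatherbench/2m_temperature.zarr", mode="w")
```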
In the following, the files are expected to be in zarr format, but the package is compatible with netCDF files as well. Working with `.nc` files, though, requires modifications to the data config file in use, setting `data_path: data/netcdf/weatherbench/` and `engine: netcdf4`.
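For illustration, the relevant excerpt of such a data config file might then read as follows (only these two keys are prescribed here; all other entries of the config remain untouched):

```yaml
# hypothetical excerpt of a data config when reading netCDF directly
data_path: data/netcdf/weatherbench/
engine: netcdf4
```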
Statistics for data normalization are computed by and contained in the dataset class in datasets.py and can be recomputed using the `self._compute_statistics()` function of that class.
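Conceptually, these statistics boil down to per-variable means and standard deviations over the data. A minimal sketch of such a computation (the variable name and file path are assumed examples, not the benchmark's implementation):

```python
# sketch: per-variable normalization statistics with xarray
import xarray as xr

ds = xr.open_zarr("data/zarr/weatherbench/2m_temperature.zarr")
da = ds["t2m"]                     # variable name is an assumed example
mean = float(da.mean().compute())  # mean over time, lat, and lon
std = float(da.std().compute())    # standard deviation over the same dims
print(f"mean={mean:.3f}, std={std:.3f}")
```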
The conversion from the rectangular LatLon grid to the HEALPix mesh can be realized by calling the healpix_mapping.py script with appropriate arguments. For example, to project `2m_temperature_5.625deg` files in zarr file format to HEALPix, run
```
python data/processing/healpix_mapping.py -v 2m_temperature_5.625deg
```
or provide additional arguments if desired.
This will convert the data of shape `[time, (level), lat, lon]` to shape `[time, (level), face, height, width]`, where `level` is optional, depending on the variable, `face = 12`, and `height = width = nside`, with `nside` specified in the conversion process.
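To make these shapes concrete, the following nearest-neighbor sketch remaps a single 5.625° field onto a HEALPix mesh using healpy. It is only an illustration: the actual interpolation and within-face pixel ordering of healpix_mapping.py may differ, and `nside = 8` is an assumed example value.

```python
# sketch: nearest-neighbor LatLon-to-HEALPix remapping of one field
import healpy as hp
import numpy as np

nside = 8                                    # assumed example resolution
npix = hp.nside2npix(nside)                  # 12 * nside**2 pixels in total
nlat, nlon = 32, 64                          # 5.625 deg WeatherBench grid
lats = np.linspace(-87.1875, 87.1875, nlat)
lons = np.linspace(0.0, 360.0, nlon, endpoint=False)

# center coordinates of all HEALPix pixels in NESTED ordering
theta, phi = hp.pix2ang(nside, np.arange(npix), nest=True)
pix_lat, pix_lon = 90.0 - np.degrees(theta), np.degrees(phi)

field = np.random.rand(nlat, nlon)                      # stand-in [lat, lon] data
i = np.abs(lats[:, None] - pix_lat[None, :]).argmin(0)  # nearest latitude row
j = np.rint(pix_lon / 5.625).astype(int) % nlon         # nearest longitude column

hpx = field[i, j].reshape(12, nside, nside)             # [face, height, width]
```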
Training and evaluation require the dlwpbench environment to be activated, thus
```
conda activate dlwpbench
```
Model training can be invoked via the training script, e.g., calling
```
python scripts/train.py model=unet model.type=UNetHPX model.name=unet_hpx_example data=example training.epochs=10 device=cuda:0 model.constant_channels=0 model.prescribed_channels=0 model.prognostic_channels=1
```
to train an exemplary U-Net on `2m_temperature` on the HEALPix projection.
> [!NOTE]
> Running the command above requires the `2m_temperature` variable converted to zarr and projected to HEALPix, as described in the Data section. Also, make sure to follow the naming convention of the data directory tree above, i.e., remove `5.625deg` from the `2m_temperature_5.625deg` directory name.
> [!NOTE]
> GraphCast requires a pre-generated `icosahedral.json` file, which specifies the mesh the model operates on. These files can be generated using the icospheres.py script for arbitrary mesh resolutions (numbers of hierarchical levels) and must be linked in GraphCast's config.yaml file. Originally, GraphCast uses six levels, but here this is reduced to three due to the coarser resolution of the 5.625° WeatherBench data.
A selection of pretrained models (see list below) can be downloaded here. To use a respective model, place its `*.ckpt` file into the according `checkpoints` directory in the model's `outputs` folder, e.g., `outputs/clstm16m_cyl_4x228_v2/checkpoints/clstm16m_cyl_4x228_v2_best.ckpt`.
The drive contains weights and exemplary outputs of the following models:

- `clstm16m_cyl_4x228_v2`
- `clstm16m_hpx8_4x228_v2`
- `unet128m_cyl_128-256-512-1024-2014_v2`
- `unet16m_hpx8_92-184-368-736_v0`
- `swint2m_cyl_d88_l2x4_h2x4_v0`
- `swint16m_hpx8_d120_l3x4_h3x4_v2`
- `pangu32m_d216_h6-12-12-6_v1`
- `fno2d64m_cyl_d307_v0`
- `tfno2d128m_cyl_d477_v0`
- `fcnet4m_emb272_nopos_l6_v1`
- `fcnet8m_emb384_nopos_p2x4_l6_v2`
- `fcnet64m_emb940_nopos_p4x4_l8_v0`
- `sfno2d128m_cyl_d686_equi_nonorm_nopos_v0`
- `mgn500k_l4_d116_v0`
- `mgn32m_l8_d470_v0`
- `gcast500k_p4_b1_d99_v0`
- `gcast16m_p4_b1_d565_v2`
To evaluate a successfully trained model, run
```
python scripts/evaluate.py -c outputs/unet_hpx_example
```
The model.name must be provided via the `-c` argument. The evaluation script will compute the ACC and RMSE metrics and write them to the `outputs/unet_hpx_example/evaluation/` directory, create an RMSE-over-leadtime plot called `rmse_plot.pdf` in the same directory, and write a video to the `outputs/unet_hpx_example/evaluation/videos/` directory.
More plots in line with the paper can be generated with the plot_results.py script, which requires the installation of the `cartopy` package.
> [!NOTE]
> To generate videos, the `ffmpeg` package is required.
`Persistence` and `Climatology` baselines can be generated using the build_baselines.py script. This requires, however, that another deep learning model has been trained and evaluated beforehand, as the script follows the selected variables and resolution of the respective forecast.
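Conceptually, the two baselines reduce to the following (a minimal sketch on hypothetical arrays, not the build_baselines.py implementation):

```python
# sketch: persistence and climatology baselines on a [time, lat, lon] series
import numpy as np

obs = np.random.rand(1000, 32, 64)  # hypothetical verification data
lead = 10                           # forecast lead time in steps

# persistence: the forecast simply repeats the last observed state
persistence = obs[:-lead]           # verifies against obs[lead:]

# climatology: the forecast is the long-term mean of the data
climatology = np.broadcast_to(obs.mean(axis=0), obs[lead:].shape)

target = obs[lead:]
rmse_persistence = np.sqrt(((persistence - target) ** 2).mean())
rmse_climatology = np.sqrt(((climatology - target) ** 2).mean())
```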