From 6819a0f19796ca9c34a079855339b37134b8d930 Mon Sep 17 00:00:00 2001
From: Andrew El-Kadi
Date: Tue, 17 Dec 2024 17:11:48 +0000
Subject: [PATCH] Correct Operational GenCast name in README and some minor
 formatting.

---
 README.md              | 2 +-
 docs/cloud_vm_setup.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index e5b0fc6..6e1bf75 100644
--- a/README.md
+++ b/README.md
@@ -56,7 +56,7 @@ and later years.
 This model was described in the paper
 `GenCast: Diffusion-based ensemble forecasting for medium-range weather`
 (https://arxiv.org/abs/2312.15796)
-2. `GenCast 0p25deg Operational <2019`, GenCast model at 0.25deg resolution, with 13 pressure levels and a 6
+2. `GenCast 0p25deg Operational <2022`, GenCast model at 0.25deg resolution, with 13 pressure levels and a 6
 times refined icosahedral mesh. This model is trained on ERA5 data from 1979 to 2018, and
 fine-tuned on HRES-fc0 data from 2016 to 2021 and can be causally evaluated on 2022 and
 later years.
diff --git a/docs/cloud_vm_setup.md b/docs/cloud_vm_setup.md
index b4a7544..92125eb 100644
--- a/docs/cloud_vm_setup.md
+++ b/docs/cloud_vm_setup.md
@@ -84,13 +84,13 @@ This document describes how to run `gencast_demo_cloud_vm.ipynb` through [Colabo
 - There are two possible sources of this discrepancy. The first is the fact that the `splash` and `triblockdiag_mha` attention implementations are not exactly numerically equivalent (despite being algebraically equivalent). We have tested the isolated impact of these numerical differences by comparing performance with each attention implementation, both running on TPU. This comparison (scorecard [here](https://github.com/google-deepmind/graphcast/blob/main/docs/GenCast_0p25deg_attention_implementation_scorecard.png)) shows that there is very little difference caused by numerical differences between attention implementations.
 This implies that the minor degradation is caused primarily by running on GPU instead of TPU, and our initial investigations suggest that the root cause is the difference in the default precision of matmul operations on GPU compared to TPU.

-** Memory requirement comparison vs. TPU **
+**Memory requirement comparison vs. TPU**

 - `triblockdiag_mha` also requires more memory, as such running inference on GPU requires:
   - 0.25deg GenCast: ~300GB of System Memory and ~60GB of vRAM
   - 1deg GenCast: ~24GB of System Memory and ~16GB vRAM

-** Inference time comparison vs. TPU **
+**Inference time comparison vs. TPU**

 - We have observed that running inference on H100 is slower than expected. Specifically we saw that a 30-step rollout of 0.25deg GenCast takes ~8min on TPUv5 with `splash_attention` (once compiled) whereas it takes ~25min on GPU with `triblockdiag_mha` attention.
 - Part of this runtime discrepancy is caused by the fact that using `triblockdiag_mha` attention makes inference ~2x slower, such that running on TPU with `triblockdiag_mha` takes about ~15min, compared to the ~8min using `splash_attention`. However, there remains a discrepancy between the ~15min on a TPU and ~25min on GPU when using `triblockdiag_mha`.