# Video Prediction Benchmarks

We provide benchmark results of spatiotemporal prediction learning (STL) methods on various video prediction datasets. More STL methods will be supported in the future. Issues and PRs are welcome!

- Currently supported spatiotemporal prediction methods
- Currently supported MetaFormer models for SimVP

## Moving MNIST Benchmarks

We provide benchmark results on the popular Moving MNIST dataset using the $10\rightarrow 10$ frame prediction setting following PredRNN. Metrics (MSE, MAE, SSIM, PSNR) of the final models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference FPS are also reported for all methods. All methods are trained with the Adam optimizer and the OneCycle scheduler on a single GPU.
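As a concrete reference for how the frame-level metrics above are computed, here is a minimal sketch. The function name is hypothetical and the averaging convention is an assumption: papers differ on whether MSE is summed or averaged per frame, and on the pixel range.

```python
import numpy as np

def frame_metrics(pred, true, peak=255.0):
    """Per-sequence MSE, MAE, and PSNR for predicted vs. ground-truth frames.

    pred, true: arrays of shape (T, H, W); `peak` is the maximum pixel value
    (255 assumed here; some pipelines normalize frames to [0, 1] instead).
    This sketch averages over all pixels and frames, which is only one of
    the conventions used in the literature.
    """
    pred = pred.astype(np.float64)
    true = true.astype(np.float64)
    mse = np.mean((pred - true) ** 2)
    mae = np.mean(np.abs(pred - true))
    # PSNR follows directly from MSE for a given peak pixel value.
    psnr = 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    return mse, mae, psnr
```

Lower MSE/MAE and higher PSNR indicate better predictions, which matches how the tables below are read.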

### STL Benchmarks on MMNIST

For a fair comparison of different methods, we report the final results of models trained to convergence. We provide config files in `configs/mmnist`.

| Method | Params | FLOPs | FPS | MSE | MAE | SSIM | Download |
|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 15.0M | 56.8G | 113 | 46.26 | 142.18 | 0.878 | model \| log |
| ConvLSTM-L | 33.8M | 127.0G | 50 | 29.88 | 95.05 | 0.925 | model \| log |
| PhyDNet | 3.1M | 15.3G | 182 | 35.68 | 96.70 | 0.917 | model \| log |
| PredRNN | 23.8M | 116.0G | 54 | 25.04 | 76.26 | 0.944 | model \| log |
| PredRNN++ | 38.6M | 171.7G | 38 | 22.45 | 69.70 | 0.950 | model \| log |
| MIM | 38.0M | 179.2G | 37 | 23.66 | 74.37 | 0.946 | model \| log |
| E3D-LSTM | 51.0M | 298.9G | 18 | 36.19 | 78.64 | 0.932 | model \| log |
| CrevNet | 5.0M | 270.7G | 10 | 30.15 | 86.28 | 0.935 | model \| log |
| PredRNN.V2 | 23.9M | 116.6G | 52 | 27.73 | 82.17 | 0.937 | model \| log |
| SimVP+IncepU | 58.0M | 19.4G | 209 | 26.69 | 77.19 | 0.940 | model \| log |
| SimVP+gSTA-S | 46.8M | 16.5G | 282 | 15.05 | 49.80 | 0.967 | model \| log |

### Benchmark of MetaFormers Based on SimVP

Since the hidden Translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 200-epoch and 2000-epoch training. We provide config files in `configs/mmnist/simvp`.
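The pattern shared by every architecture in this table is the same two-stage block: a token mixer (the part that is swapped out per row) followed by a channel MLP, each with a residual connection. A minimal numpy sketch of that structure, with hypothetical names and normalization layers omitted for brevity (this is an illustration of the MetaFormer idea, not OpenSTL's implementation):

```python
import numpy as np

def metaformer_block(x, mix_tokens, mlp_w1, mlp_w2):
    """One generic MetaFormer block: token mixing, then a channel MLP,
    each wrapped in a residual connection (normalization omitted).

    x: (N, C) array of N tokens with C channels.
    mix_tokens: callable (N, C) -> (N, C); attention, pooling, conv, etc.
    mlp_w1, mlp_w2: channel-MLP weights with shapes (C, H) and (H, C).
    """
    x = x + mix_tokens(x)            # token mixing: the interchangeable part
    h = np.maximum(x @ mlp_w1, 0.0)  # channel MLP with ReLU
    return x + h @ mlp_w2            # second residual connection

def pool_mix(x):
    """Example token mixer in the PoolFormer style: pool over tokens,
    subtracting the input so the residual adds only the pooled signal."""
    return x.mean(axis=0, keepdims=True) - x
```

Replacing `pool_mix` with an attention or convolution-based mixer yields the other rows of the table; the channel-MLP half of the block stays the same.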

| MetaFormer | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epoch | 58.0M | 19.4G | 209 | 32.15 | 89.05 | 0.9268 | 37.97 | model \| log |
| gSTA (SimVPv2) | 200 epoch | 46.8M | 16.5G | 282 | 26.69 | 77.19 | 0.9402 | 38.3 | model \| log |
| ViT | 200 epoch | 46.1M | 16.9G | 290 | 35.15 | 95.87 | 0.9139 | 37.79 | model \| log |
| Swin Transformer | 200 epoch | 46.1M | 16.4G | 294 | 29.70 | 84.05 | 0.9331 | 38.14 | model \| log |
| Uniformer | 200 epoch | 44.8M | 16.5G | 296 | 30.38 | 85.87 | 0.9308 | 38.11 | model \| log |
| MLP-Mixer | 200 epoch | 38.2M | 14.7G | 334 | 29.52 | 83.36 | 0.9338 | 38.19 | model \| log |
| ConvMixer | 200 epoch | 3.9M | 5.5G | 658 | 32.09 | 88.93 | 0.9259 | 37.97 | model \| log |
| Poolformer | 200 epoch | 37.1M | 14.1G | 341 | 31.79 | 88.48 | 0.9271 | 38.06 | model \| log |
| ConvNeXt | 200 epoch | 37.3M | 14.1G | 344 | 26.94 | 77.23 | 0.9397 | 38.34 | model \| log |
| VAN | 200 epoch | 44.5M | 16.0G | 288 | 26.10 | 76.11 | 0.9417 | 38.39 | model \| log |
| HorNet | 200 epoch | 45.7M | 16.3G | 287 | 29.64 | 83.26 | 0.9331 | 38.16 | model \| log |
| MogaNet | 200 epoch | 46.8M | 16.5G | 255 | 25.57 | 75.19 | 0.9429 | 38.41 | model \| log |
| IncepU (SimVPv1) | 2000 epoch | 58.0M | 19.4G | 209 | 21.15 | 64.15 | 0.9536 | 38.81 | model \| log |
| gSTA (SimVPv2) | 2000 epoch | 46.8M | 16.5G | 282 | 15.05 | 49.80 | 0.9670 | - | model \| log |
| ViT | 2000 epoch | 46.1M | 16.9G | 290 | 19.74 | 61.65 | 0.9539 | 38.96 | model \| log |
| Swin Transformer | 2000 epoch | 46.1M | 16.4G | 294 | 19.11 | 59.84 | 0.9584 | 39.03 | model \| log |
| Uniformer | 2000 epoch | 44.8M | 16.5G | 296 | 18.01 | 57.52 | 0.9609 | 39.11 | model \| log |
| MLP-Mixer | 2000 epoch | 38.2M | 14.7G | 334 | 18.85 | 59.86 | 0.9589 | 38.98 | model \| log |
| ConvMixer | 2000 epoch | 3.9M | 5.5G | 658 | 22.30 | 67.37 | 0.9507 | 38.67 | model \| log |
| Poolformer | 2000 epoch | 37.1M | 14.1G | 341 | 20.96 | 64.31 | 0.9539 | 38.86 | model \| log |
| ConvNeXt | 2000 epoch | 37.3M | 14.1G | 344 | 17.58 | 55.76 | 0.9617 | 39.19 | model \| log |
| VAN | 2000 epoch | 44.5M | 16.0G | 288 | 16.21 | 53.57 | 0.9646 | 39.26 | model \| log |
| HorNet | 2000 epoch | 45.7M | 16.3G | 287 | 17.40 | 55.70 | 0.9624 | 39.19 | model \| log |
| MogaNet | 2000 epoch | 46.8M | 16.5G | 255 | 15.67 | 51.84 | 0.9661 | 39.35 | model \| log |

(back to top)

## KittiCaltech Benchmarks

We provide benchmark results on the KittiCaltech Pedestrian dataset using the $10\rightarrow 1$ frame prediction setting following PredNet. Metrics (MSE, MAE, SSIM, PSNR) of the final models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference FPS are also reported for all methods. The default setup trains for 100 epochs with the Adam optimizer and the OneCycle scheduler on a single GPU, while computationally expensive methods (denoted by *) use 4 GPUs.
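Unlike MSE and MAE, SSIM compares luminance, contrast, and structure statistics between predicted and ground-truth frames. The sketch below is a deliberate simplification for illustration: standard SSIM computes the same formula over sliding local windows (typically Gaussian-weighted) and averages the local scores, whereas this version applies it once to the whole frame.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over a whole frame (no sliding windows).

    Uses the standard stabilizing constants K1=0.01, K2=0.03 scaled by
    the data range; returns 1.0 for identical images.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

SSIM lies in [-1, 1] with higher values better, which is why the SSIM columns below improve toward 1 as MSE and MAE fall.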

### STL Benchmarks on KittiCaltech

For a fair comparison of different methods, we report the final results of models trained to convergence. We provide config files in `configs/kitticaltech`.

| Method | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 100 epoch | 15.0M | 595.0G | 33 | 139.6 | 1583.3 | 0.9345 | 32.82 | model \| log |
| E3D-LSTM* | 100 epoch | 54.9M | 1004G | 10 | 203.7 | 1929.7 | 0.9062 | 32.04 | model \| log |
| MAU | 100 epoch | 24.3M | 172.0G | 16 | 177.8 | 1800.4 | 0.9176 | 32.24 | model \| log |
| MIM | 100 epoch | 49.2M | 1858G | 39 | 127.3 | 1461.1 | 0.9410 | 33.26 | model \| log |
| PredRNN | 100 epoch | 23.7M | 1216G | 17 | 130.4 | 1525.5 | 0.9374 | 33.01 | model \| log |
| PredRNN++ | 100 epoch | 38.5M | 1803G | 12 | 125.5 | 1453.2 | 0.9433 | 33.27 | model \| log |
| PredRNN.V2 | 100 epoch | 23.8M | 1223G | 52 | 147.8 | 1610.5 | 0.9330 | 32.67 | model \| log |
| SimVP+IncepU | 100 epoch | 8.6M | 60.6G | 57 | 160.2 | 1690.8 | 0.9338 | 32.48 | model \| log |
| SimVP+gSTA-S | 100 epoch | 15.6M | 96.3G | 40 | 129.7 | 1507.7 | 0.9454 | 33.05 | model \| log |

### Benchmark of MetaFormers Based on SimVP

Since the hidden Translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 100-epoch training. We provide config files in `configs/kitticaltech/simvp`.

| MetaFormer | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 100 epoch | 8.6M | 60.6G | 57 | 160.2 | 1690.8 | 0.9338 | 32.48 | model \| log |
| gSTA (SimVPv2) | 100 epoch | 15.6M | 96.3G | 40 | 129.7 | 1507.7 | 0.9454 | 33.05 | model \| log |
| ViT* | 100 epoch | 12.7M | 155.0G | 25 | 146.4 | 1615.8 | 0.9379 | 32.58 | model \| log |
| Swin Transformer | 100 epoch | 15.3M | 95.2G | 49 | 155.2 | 1588.9 | 0.9299 | 32.98 | model \| log |
| Uniformer* | 100 epoch | 11.8M | 104.0G | 28 | 135.9 | 1534.2 | 0.9393 | 32.94 | model \| log |
| MLP-Mixer | 100 epoch | 22.2M | 83.5G | 60 | 207.9 | 1835.9 | 0.9133 | 32.37 | model \| log |
| ConvMixer | 100 epoch | 1.5M | 23.1G | 129 | 174.7 | 1854.3 | 0.9232 | 31.88 | model \| log |
| Poolformer | 100 epoch | 12.4M | 79.8G | 51 | 153.4 | 1613.5 | 0.9334 | 32.79 | model \| log |
| ConvNeXt | 100 epoch | 12.5M | 80.2G | 54 | 146.8 | 1630.0 | 0.9336 | 32.58 | model \| log |
| VAN | 100 epoch | 14.9M | 92.5G | 41 | 132.1 | 1501.5 | 0.9437 | 33.10 | model \| log |
| HorNet | 100 epoch | 15.3M | 94.4G | 43 | 152.8 | 1637.9 | 0.9365 | 32.70 | model \| log |
| MogaNet | 100 epoch | 15.6M | 96.2G | 36 | 131.4 | 1512.1 | 0.9442 | 32.93 | model \| log |

(back to top)