Video classification with UniFormer

We currently release the code and models for:

  • Kinetics-400
  • Kinetics-600
  • Something-Something V1
  • Something-Something V2

Update

05/21/2022

Lightweight models are released, which surpass X3D and MoViNet.

01/13/2022

Pretrained models on Kinetics-400, Kinetics-600, Something-Something V1&V2 d

Model Zoo

The following models and logs can be downloaded from Google Drive: total_models, total_logs.

We also release the models on Baidu Cloud: total_models (gphp), total_logs (q5bw).

Note

  • The config.yaml files in our exp directories are NOT the exact training configs actually used, since some hyperparameters are changed in run.sh or test.sh.
  • All the models are pretrained on ImageNet-1K without Token Labeling and Layer Scale. You can find those pretrained models in image_classification. The reason can be found in issue #12.
  • #Frame = #input_frame x #crop x #clip
  • #input_frame is the number of frames fed to the model per inference.
  • #crop is the number of spatial crops (e.g., 3 for left/right/center).
  • #clip is the number of temporal clips (e.g., 4 means repeatedly sampling four clips with different start indices).
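
As a quick illustration of the #Frame notation, the sketch below parses a spec such as 16x1x4 and reports how many views and frames are processed per video. The names `FrameSpec`, `parse_frame_spec`, `num_views`, and `total_frames` are hypothetical helpers written for this note; they are not part of the repository.

```python
from typing import NamedTuple


class FrameSpec(NamedTuple):
    input_frames: int  # frames fed to the model per forward pass
    crops: int         # spatial crops per clip
    clips: int         # temporal clips per video


def parse_frame_spec(spec: str) -> FrameSpec:
    """Parse a '#input_frame x #crop x #clip' string such as '16x1x4'."""
    parts = spec.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected three 'x'-separated integers, got {spec!r}")
    return FrameSpec(*(int(p) for p in parts))


def num_views(spec: FrameSpec) -> int:
    """Number of forward passes per video: one per (crop, clip) pair."""
    return spec.crops * spec.clips


def total_frames(spec: FrameSpec) -> int:
    """Total frames processed per video across all views."""
    return spec.input_frames * spec.crops * spec.clips


if __name__ == "__main__":
    spec = parse_frame_spec("16x1x4")
    print(spec, num_views(spec), total_frames(spec))
```

For example, 16x1x4 corresponds to 4 views of 16 frames each, and 16x3x5 corresponds to 15 views.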

Kinetics-400

| Model | #Frame | Resolution | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-XXS | 4x1x1 | 128 | 1.0G | 63.2 | google | google | run.sh/config |
| UniFormer-XXS | 4x1x1 | 160 | 1.6G | 65.8 | google | google | run.sh/config |
| UniFormer-XXS | 8x1x1 | 128 | 2.0G | 68.3 | google | google | run.sh/config |
| UniFormer-XXS | 8x1x1 | 160 | 3.3G | 71.4 | google | google | run.sh/config |
| UniFormer-XXS | 16x1x1 | 128 | 4.2G | 73.3 | google | google | run.sh/config |
| UniFormer-XXS | 16x1x1 | 160 | 6.9G | 75.1 | google | google | run.sh/config |
| UniFormer-XXS | 32x1x1 | 160 | 15.4G | 77.9 | google | google | run.sh/config |
| UniFormer-XS | 32x1x1 | 192 | 34.2G | 78.6 | google | google | run.sh/config |

We adopt a sparse sampling method for the lightweight models. To avoid NaN losses, we use the following techniques:

  • Disable mixed-precision training.
  • Use weaker data augmentation.
  • Add Layer Scale.
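
For readers unfamiliar with the term, sparse sampling is commonly implemented in the TSN style: the video is split into equal segments and one frame is taken from each, so a single clip spans the whole video. Dense sampling instead takes frames at a fixed stride from a start index. The sketch below contrasts the two. It illustrates the general technique under that assumption and is not a copy of this repository's data loader; both function names are hypothetical.

```python
from typing import List


def sparse_sample(num_video_frames: int, num_frames: int) -> List[int]:
    """TSN-style sampling (assumed): split the video into `num_frames`
    equal segments and take the center frame of each segment."""
    seg = num_video_frames / num_frames
    return [
        min(int(seg * i + seg / 2), num_video_frames - 1)
        for i in range(num_frames)
    ]


def dense_sample(
    num_video_frames: int, num_frames: int, stride: int, start: int = 0
) -> List[int]:
    """Dense sampling: take `num_frames` frames every `stride` frames
    from `start`, clamping indices to the last frame of the video."""
    return [
        min(start + i * stride, num_video_frames - 1)
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    print(sparse_sample(300, 4))            # frames spread over the whole video
    print(dense_sample(300, 4, stride=8))   # a short window at stride 8
```

With a 300-frame video and 4 frames, the sparse indices cover the full duration, while the dense indices cover only a 25-frame window, which is why dense sampling is usually paired with multiple temporal clips at test time.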

| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 8x1x4 | 8 | 70G | 78.4 | google | google | run.sh/config |
| UniFormer-S | 16x1x4 | 4 | 167G | 80.8 | google | google | run.sh/config |
| UniFormer-S | 16x1x4 | 8 | 167G | 80.8 | google | google | run.sh/config |
| UniFormer-S | 32x1x4 | 4 | 438G | 82.0 | - | google | run.sh/config |
| UniFormer-B | 8x1x4 | 8 | 161G | 79.8 | google | google | run.sh/config |
| UniFormer-B | 16x1x4 | 4 | 387G | 82.0 | google | google | run.sh/config |
| UniFormer-B | 16x1x4 | 8 | 387G | 81.7 | google | google | run.sh/config |
| UniFormer-B | 32x1x4 | 4 | 1036G | 82.9 | google | google | run.sh/config |

Kinetics-600

| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 16x1x4 | 4 | 167G | 82.8 | google | google | run.sh/config |
| UniFormer-S | 16x1x4 | 8 | 167G | 82.7 | google | google | run.sh/config |
| UniFormer-B | 16x1x4 | 4 | 387G | 84.0 | google | google | run.sh/config |
| UniFormer-B | 16x1x4 | 8 | 387G | 83.4 | google | google | run.sh/config |
| UniFormer-B | 32x1x4 | 4 | 1036G | 84.5* | google | google | run.sh/config |

* Since Kinetics-600 is too large to train on a single node (>1 month with 8 A100 GPUs), we provide a model trained on multiple nodes (around 2 weeks with 32 V100 GPUs), but the result is lower due to the lack of hyperparameter tuning.

For multi-node training, please install submitit or follow the training scripts in our UniFormerV2.

Something-Something V1

| Model | Pretrain | #Frame | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | K400 | 16x3x1 | 125G | 57.2 | google | google | run.sh/config |
| UniFormer-S | K600 | 16x3x1 | 125G | 57.6 | google | google | run.sh/config |
| UniFormer-S | K400 | 32x3x1 | 329G | 58.8 | google | google | run.sh/config |
| UniFormer-S | K600 | 32x3x1 | 329G | 59.9 | google | google | run.sh/config |
| UniFormer-B | K400 | 16x3x1 | 290G | 59.1 | google | google | run.sh/config |
| UniFormer-B | K600 | 16x3x1 | 290G | 58.8 | google | google | run.sh/config |
| UniFormer-B | K400 | 32x3x1 | 777G | 60.9 | google | google | run.sh/config |
| UniFormer-B | K600 | 32x3x1 | 777G | 61.0 | google | google | run.sh/config |

Something-Something V2

| Model | Pretrain | #Frame | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | K400 | 16x3x1 | 125G | 67.7 | google | google | run.sh/config |
| UniFormer-S | K600 | 16x3x1 | 125G | 69.4 | google | google | run.sh/config |
| UniFormer-S | K400 | 32x3x1 | 329G | 69.0 | google | google | run.sh/config |
| UniFormer-S | K600 | 32x3x1 | 329G | 70.4 | google | google | run.sh/config |
| UniFormer-B | K400 | 16x3x1 | 290G | 70.4 | google | google | run.sh/config |
| UniFormer-B | K600 | 16x3x1 | 290G | 70.2 | google | google | run.sh/config |
| UniFormer-B | K400 | 32x3x1 | 777G | 71.1 | google | google | run.sh/config |
| UniFormer-B | K600 | 32x3x1 | 777G | 71.2 | google | google | run.sh/config |

UCF101

| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 16x3x5 | 4 | 625G | 98.3 | google | google | run.sh/config |

HMDB51

| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 16x3x5 | 4 | 625G | 77.5 | google | google | run.sh/config |

Usage

Installation

Please follow the installation instructions in INSTALL.md. You may follow the instructions in DATASET.md to prepare the datasets.

Training

  1. Download the pretrained models in our repository.

  2. Simply run the training scripts in exp as follows:

    bash ./exp/uniformer_s8x8_k400/run.sh

[Note]:

  • Due to some bugs in the SlowFast repository, the program terminates during the final testing stage.

  • During training, we follow the SlowFast repository and randomly crop videos for validation. For accurate testing, please follow our testing scripts.

  • For more config details, you can read the comments in slowfast/config/defaults.py.

  • To avoid out-of-memory errors, you can enable torch.utils.checkpoint (in config.yaml or run.sh):

    MODEL.USE_CHECKPOINT True # whether to use checkpointing
    MODEL.CHECKPOINT_NUM [0, 0, 4, 0] # index for using checkpoint in every stage
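
The comment on MODEL.CHECKPOINT_NUM is terse. One plausible reading, which is an assumption here rather than something this README states, is that entry s gives the number of leading blocks in stage s whose forward pass is wrapped in torch.utils.checkpoint. The pure-Python sketch below shows which blocks would be affected under that reading. The helper `checkpoint_plan` and the stage depths `[3, 4, 8, 3]` are illustrative assumptions; check the model code for the actual behavior.

```python
from typing import List


def checkpoint_plan(depths: List[int], checkpoint_num: List[int]) -> List[List[bool]]:
    """Mark, per stage, which blocks would run under activation checkpointing.

    ASSUMPTION: checkpoint_num[s] counts the leading blocks of stage s
    that are checkpointed; 0 disables checkpointing for that stage.
    """
    if len(depths) != len(checkpoint_num):
        raise ValueError("depths and checkpoint_num must have one entry per stage")
    return [
        [block_idx < n for block_idx in range(depth)]
        for depth, n in zip(depths, checkpoint_num)
    ]


if __name__ == "__main__":
    # Hypothetical stage depths, used only for illustration.
    plan = checkpoint_plan(depths=[3, 4, 8, 3], checkpoint_num=[0, 0, 4, 0])
    for stage, flags in enumerate(plan, start=1):
        marks = "".join("C" if f else "." for f in flags)
        print(f"stage {stage}: {marks}")
```

Checkpointing trades compute for memory: activations of the marked blocks are recomputed in the backward pass instead of being stored, so raising the counts lowers peak memory at the cost of slower training.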

Testing

We provide a testing example as follows:

    bash ./exp/uniformer_s8x8_k400/test.sh

Specifically, create a new config for testing and run the multi-crop/multi-clip test:

  1. Copy the training config file config.yaml to create a new testing config, test.yaml.

  2. Change the hyperparameters of data (in test.yaml or test.sh):

    DATA:
      TRAIN_JITTER_SCALES: [224, 224]
      TEST_CROP_SIZE: 224

  3. Set the number of crops and clips (in test.yaml or test.sh):

    Multi-clip testing for Kinetics:

    TEST.NUM_ENSEMBLE_VIEWS 4
    TEST.NUM_SPATIAL_CROPS 1

    Multi-crop testing for Something-Something:

    TEST.NUM_ENSEMBLE_VIEWS 1
    TEST.NUM_SPATIAL_CROPS 3

  4. You can also set the checkpoint path via:

    TEST.CHECKPOINT_FILE_PATH your_model_path
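
To make the multi-clip/multi-crop settings concrete: each video is evaluated on TEST.NUM_ENSEMBLE_VIEWS x TEST.NUM_SPATIAL_CROPS views, and the per-view predictions are combined into one video-level prediction. The sketch below assumes the views are combined by averaging class scores; the SlowFast test code may sum them instead, which yields the same top-1 class. The function names are hypothetical and the scores are made-up numbers.

```python
from typing import List, Sequence


def ensemble_views(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all views (clips x crops) of one video."""
    if not view_scores:
        raise ValueError("need at least one view")
    num_classes = len(view_scores[0])
    if any(len(scores) != num_classes for scores in view_scores):
        raise ValueError("all views must score the same number of classes")
    n = len(view_scores)
    return [sum(scores[c] for scores in view_scores) / n for c in range(num_classes)]


def predict(view_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged score."""
    avg = ensemble_views(view_scores)
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    # 4 temporal clips x 1 spatial crop, 2 classes (illustrative values).
    views = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
    print(ensemble_views(views), predict(views))
```

Note that a single view can disagree with the ensemble: the first clip above favors class 0, while the averaged prediction is class 1.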

Cite Uniformer

If you find this repository useful, please use the following BibTeX entry for citation.

@misc{li2022uniformer,
      title={Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning}, 
      author={Kunchang Li and Yali Wang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
      year={2022},
      eprint={2201.04676},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

This repository is built on the SlowFast repository.