Pretraining is implemented with the BEVFusion framework. We use the BEVFusion LiDAR-only model (69.28 NDS) for pretraining.
We provide models pretrained on the nuScenes dataset. These models can serve as a starting point for your own tasks or for fine-tuning.
| Config | Epochs | Download |
| --- | --- | --- |
| Swin-Base | 50 | Model |
| Swin-Large | 50 | Model |
The code is built with the following libraries:
- Python >= 3.8, <3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see here)
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-devkit
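The version pins above can be sanity-checked before installing. This helper is not part of the codebase; it is a minimal sketch that compares dotted version numbers as integer tuples rather than implementing the full PEP 440 rules:

```python
# Sketch (not part of the repo): check two of the pinned ranges above.
# Version parsing is simplified to dotted integers, not full PEP 440.

def parse(ver):
    """'1.10.2' -> (1, 10, 2), so ranges compare as tuples."""
    return tuple(int(p) for p in ver.split("."))

def satisfies_torch(ver):
    # PyTorch >= 1.9, <= 1.10.2
    return parse("1.9") <= parse(ver) <= parse("1.10.2")

def satisfies_python(ver):
    # Python >= 3.8, < 3.9
    return parse("3.8") <= parse(ver) < parse("3.9")

print(satisfies_torch("1.10.1"), satisfies_python("3.8.12"))  # prints: True True
```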
After installing these dependencies, please run this command to install the codebase:
```bash
python setup.py develop
```
Please follow the instructions from here to download and preprocess the nuScenes dataset. After data preparation, you should see the following directory structure (as indicated in mmdetection3d):
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_database
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_test.pkl
│   │   ├── nuscenes_dbinfos_train.pkl
```
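As a quick sanity check before launching training, the layout above can be verified programmatically. This helper is not part of the codebase; the entry names simply mirror the tree shown above:

```python
# Sketch (not part of the repo): report which expected nuScenes entries
# are missing under the data root described above.
from pathlib import Path

EXPECTED = [
    "maps", "samples", "sweeps", "v1.0-test", "v1.0-trainval",
    "nuscenes_database",
    "nuscenes_infos_train.pkl", "nuscenes_infos_val.pkl",
    "nuscenes_infos_test.pkl", "nuscenes_dbinfos_train.pkl",
]

def missing_entries(root):
    """Return the expected files/dirs absent under the given root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

After data preparation succeeds, `missing_entries("data/nuscenes")` should return an empty list.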
To set up the LiDAR model for pretraining, follow these steps:

1. Download the LiDAR-only model weights from BEVFusion.
2. Download the Swin-Base/Swin-Large weights from MixMAE.
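A common way to use two downloaded checkpoints for initialization is to re-prefix their keys so each lands in the right submodule. The sketch below is hypothetical: the prefixes `encoders.lidar.` and `backbone.` are assumptions for illustration, not the repo's actual parameter naming (plain dicts stand in for `torch` state dicts):

```python
# Hypothetical sketch: combine the BEVFusion LiDAR checkpoint and the
# MixMAE Swin checkpoint into one initialization dict. The key prefixes
# below are assumed, not taken from this codebase.

def merge_checkpoints(lidar_sd, swin_sd):
    """Prefix each source's keys so they map onto distinct submodules."""
    merged = {}
    for k, v in lidar_sd.items():
        merged["encoders.lidar." + k] = v   # assumed LiDAR-branch prefix
    for k, v in swin_sd.items():
        merged["backbone." + k] = v         # assumed image-backbone prefix
    return merged

lidar = {"voxelize.weight": 1}
swin = {"patch_embed.proj.weight": 2}
print(sorted(merge_checkpoints(lidar, swin)))
# prints: ['backbone.patch_embed.proj.weight', 'encoders.lidar.voxelize.weight']
```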
We provide instructions to reproduce our results on nuScenes. You can use PyTorch or Slurm for distributed training.
For example, the Swin-Base model can be pretrained with:
```bash
sh run_pretrain.sh partition 8 config/pretrain_base_50ep.yaml runs/pretrain/pretrain_base_50ep
```
The Swin-Large model can be pretrained in the same way.
The pretraining code is based on BEVFusion.