Introduction

This is an unofficial replication of "Pix2seq: A Language Modeling Framework for Object Detection", implemented on mmdetection, with pretrained weights available.

License

This project is released under the Apache 2.0 license.

Installation

Please refer to get_started.md for installation.

Train & Evaluation

Train by running the command below (about 10 days on 8 × V100 32GB GPUs):

python -m torch.distributed.launch --nproc_per_node=8 --master_port=5003 \
  tools/train.py configs/pix2seq/pix2seq_r50_8x4_50e_coco.py --work-dir pix2seq-output --gpus 8 --launcher pytorch

or

Download pretrained pix2seq weights.

Evaluate with a single GPU:

python tools/test.py configs/pix2seq/pix2seq_r50_8x4_300_coco.py \
  weights/checkpoints.pth --work-dir pix2seq-output --eval bbox --show-dir pix2seq-vis

Evaluate with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=5003 \
  tools/test.py configs/pix2seq/pix2seq_r50_8x4_300_coco.py weights/checkpoints.pth \
  --work-dir pix2seq-output --eval bbox --launcher pytorch

| Method | Backbone | Epochs | Batch Size | AP | AP50 | AP75 | Weights |
|---|---|---|---|---|---|---|---|
| Ours | R50 | 300 | 32 | 36.4 | 52.8 | 38.5 | model |
| Paper | R50 | 300 | 128 | 43.0 | 61.0 | 45.6 | - |
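
The batch size of 32 reported for this replication matches the `8x4` in the config name, which in mmdetection's config naming convention denotes 8 GPUs with 4 images per GPU. The short sketch below works through that arithmetic and the gap to the paper's numbers, using only the values from the results above. The variable names are illustrative and are not part of the repository.

```python
# Values copied from the results table; all names here are illustrative only.
gpus, samples_per_gpu = 8, 4  # the "8x4" in pix2seq_r50_8x4_*_coco.py

ours = {"batch_size": gpus * samples_per_gpu, "AP": 36.4, "AP50": 52.8, "AP75": 38.5}
paper = {"batch_size": 128, "AP": 43.0, "AP50": 61.0, "AP75": 45.6}

# Difference between the paper's metrics and this replication's, rounded to one decimal.
gaps = {k: round(paper[k] - ours[k], 1) for k in ("AP", "AP50", "AP75")}

# How many times larger the paper's batch size is.
batch_ratio = paper["batch_size"] // ours["batch_size"]

print("effective batch size:", ours["batch_size"])
print("paper batch size is", batch_ratio, "times larger")
print("metric gaps (paper - ours):", gaps)
```

The replication therefore trains with a batch four times smaller than the paper's, which may account for part of the AP difference; the README itself does not state the cause.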

Visualization

TO-DO

  • random shuffle targets
  • training from scratch
  • drop class token
  • stochastic depth
  • large scale jittering
  • support for custom dataset
  • two independent augmentations for each image
  • FrozenBatchNorm2d in backbones
  • auto-augment
  • nucleus sampling
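
For readers unfamiliar with the last item above: nucleus (top-p) sampling keeps only the smallest set of most-probable tokens whose cumulative probability reaches a threshold, then samples from that set. The following is a minimal pure-Python sketch of the general technique, not code from this repository. The function name `nucleus_sample` and its parameters are hypothetical, and an actual implementation here would presumably operate on batched PyTorch tensors instead.

```python
import math
import random


def nucleus_sample(logits, top_p=0.9, rng=random):
    """Sample one token index from `logits` using top-p (nucleus) sampling.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices normalises the weights, so this samples from the
    # renormalised distribution restricted to the nucleus.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


# Usage: with these logits the two most likely tokens already cover more
# than 90% of the probability mass, so indices 2 and 3 are never drawn.
rng = random.Random(0)
logits = [5.0, 4.0, 0.0, -1.0]
samples = {nucleus_sample(logits, top_p=0.9, rng=rng) for _ in range(200)}
print(sorted(samples))
```

Lower values of `top_p` make sampling closer to greedy decoding, while `top_p=1.0` keeps the full distribution.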

Acknowledgement

https://github.com/gaopengcuhk/Pretrained-Pix2Seq

https://github.com/open-mmlab/mmdetection