Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer"


DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

Computer Vision Group, RWTH Aachen University

Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe

[Paper] [ArXiv] [Project-Page] [BibTeX]


Interactive Segmentation Demo


  1. Pick a model and its config file from model checkpoints, for example, configs/coco_lvis/swin/dynamite_swin_tiny_bs32_ep50.yaml.
  2. We provide demo.py, which can run the built-in configs:
python demo.py --config-file configs/coco_lvis/swin/dynamite_swin_tiny_bs32_ep50.yaml \
  --model-weights /path/to/checkpoint_file

The configs are made for training, so for evaluation you need to point --model-weights to a checkpoint from the model zoo. The command opens an OpenCV window in which you can select any image and perform interactive segmentation on it.

Interactive segmentation options
  • Click management
    • add instance button adds a new instance; a button for the new instance is created in the same color as its predicted mask.
    • bg clicks button adds background clicks.
    • reset clicks button removes all clicks and instances.
  • Visualisation parameters
    • show masks only button to visualize only the masks without point clicks.
    • Alpha blending coefficient slider adjusts the intensity of all predicted masks.
    • Visualisation click radius slider adjusts the size of red and green dots depicting clicks.
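
Under the hood, the click buttons amount to simple click bookkeeping: per-instance foreground clicks plus background clicks shared by all instances. A minimal sketch (the `ClickManager` class and its method names are hypothetical, not the repo's actual demo code):

```python
# Hypothetical sketch of the demo's click bookkeeping; names are illustrative.
class ClickManager:
    def __init__(self):
        self.instances = []   # one list of (x, y) foreground clicks per instance
        self.bg_clicks = []   # background clicks shared by all instances
        self.active = None    # index of the instance receiving new clicks

    def add_instance(self):
        """'add instance' button: create a new instance and make it active."""
        self.instances.append([])
        self.active = len(self.instances) - 1

    def add_click(self, x, y, background=False):
        """Route a click either to the background or to the active instance."""
        if background:
            self.bg_clicks.append((x, y))
        elif self.active is not None:
            self.instances[self.active].append((x, y))

    def reset(self):
        """'reset clicks' button: drop all clicks and instances."""
        self.instances, self.bg_clicks, self.active = [], [], None
```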

Model Checkpoints

We provide pretrained models with different backbones for interactive segmentation.

Model weights and evaluation results are listed in the tables below. All models are trained in the multi-instance setting and are applicable to both single- and multi-instance interactive segmentation.

Multi-instance Interactive Segmentation

All metrics are reported at an 85% IoU threshold.

| Model | Strategy | COCO NCI | COCO NFO | COCO NFI | COCO mIoU | SBD NCI | SBD NFO | SBD NFI | SBD mIoU | DAVIS NCI | DAVIS NFO | DAVIS NFI | DAVIS mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Segformer-B0 | best | 6.13 | 15219 | 2485 | 81.3 | 2.83 | 655 | 342 | 90.2 | 3.29 | 546 | 364 | 87.5 |
| Segformer-B0 | random | 6.04 | 12986 | 2431 | 84.9 | 2.76 | 528 | 313 | 90.6 | 3.27 | 549 | 356 | 87.9 |
| Segformer-B0 | worst | 6.02 | 19758 | 2414 | 83.0 | 2.75 | 842 | 315 | 90.3 | 3.25 | 707 | 354 | 87.1 |
| Swin-Large | best | 5.80 | 13876 | 2305 | 82.4 | 2.47 | 497 | 266 | 90.7 | 3.06 | 483 | 330 | 88.4 |
| Swin-Large | random | 5.70 | 11958 | 2242 | 85.3 | 2.42 | 428 | 249 | 91.0 | 3.03 | 479 | 320 | 88.8 |
| Swin-Large | worst | 5.66 | 18133 | 2242 | 83.7 | 2.41 | 671 | 251 | 90.8 | 2.99 | 620 | 314 | 88.1 |
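
The mIoU columns report the mean IoU over instances once the click budget is spent. The underlying IoU computation can be sketched as follows (binary NumPy masks assumed; `iou` and `mean_iou` are illustrative helpers, not the repo's evaluation code):

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def mean_iou(preds, gts):
    """Average IoU over all instances."""
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```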
Single-instance Interactive Segmentation

| Model | GrabCut NoC@85 | GrabCut NoC@90 | Berkeley NoC@85 | Berkeley NoC@90 | SBD NoC@85 | SBD NoC@90 | DAVIS NoC@85 | DAVIS NoC@90 | Pascal VOC NoC@85 | Pascal VOC NoC@90 | COCO MVal NoC@85 | COCO MVal NoC@90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet50 | 1.62 | 1.82 | 1.47 | 2.19 | 3.93 | 6.56 | 4.10 | 5.45 | 2.13 | 2.51 | 2.36 | 3.20 |
| HRNet32 | 1.62 | 1.68 | 1.46 | 2.04 | 3.83 | 6.35 | 3.83 | 5.20 | 2.07 | 2.43 | 2.35 | 3.14 |
| Segformer-B0 | 1.58 | 1.68 | 1.61 | 2.06 | 3.89 | 6.48 | 3.85 | 5.08 | 2.04 | 2.40 | 2.47 | 3.28 |
| Swin-Tiny | 1.64 | 1.78 | 1.39 | 1.96 | 3.75 | 6.32 | 3.87 | 5.23 | 1.94 | 2.27 | 2.24 | 3.14 |
| Swin-Large | 1.62 | 1.72 | 1.39 | 1.90 | 3.32 | 5.64 | 3.80 | 5.09 | 1.83 | 2.12 | 2.19 | 2.88 |
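
NoC@85 / NoC@90 is the standard single-instance metric: the average number of clicks needed before the predicted mask first reaches the IoU threshold, with failures capped at a click budget (typically 20). A minimal sketch, assuming a hypothetical `segment(n)` callable that returns the IoU after n simulated clicks:

```python
def noc(segment, threshold=0.85, max_clicks=20):
    """Number of clicks until IoU reaches `threshold`, capped at `max_clicks`.

    `segment` is a hypothetical callable mapping a click count to the IoU of
    the model's prediction after that many simulated clicks.
    """
    for n in range(1, max_clicks + 1):
        if segment(n) >= threshold:
            return n
    return max_clicks  # budget exhausted: the sample counts as a failure
```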

Installation

See Installation Instructions.

Datasets

See Preparing Datasets for DynaMITe.

Getting Started

See Training and Evaluation.

Reproducibility

We train all released checkpoints with a fixed seed, given in the corresponding config file for each backbone. Training uses 16 NVIDIA A100 (40 GB) GPUs with a batch size of 32 and an initial global learning rate of 0.0001; evaluation is done on the same GPUs.
Note: different machines have different hardware and software stacks, which can cause small floating-point variations in the results even with a fixed seed.
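
A seed-everything helper of the kind used for fixed-seed training might look like this (a generic sketch; the actual seed values live in the config files, and a real PyTorch training run would also call torch.manual_seed and torch.cuda.manual_seed_all):

```python
import os
import random

import numpy as np

def seed_everything(seed: int) -> None:
    """Seed the common Python-side RNGs so a run is repeatable on one machine.

    A full PyTorch setup would additionally seed torch; omitted here to keep
    the sketch dependency-free.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
```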

We train the Swin-Tiny model three times with different seeds and report the mean and standard deviation of the evaluation metrics below:

Multi-instance Interactive Segmentation

Results use the best-click strategy; all metrics are at an 85% IoU threshold.

| Model | Stat | COCO NCI | COCO NFO | COCO NFI | COCO mIoU | SBD NCI | SBD NFO | SBD NFI | SBD mIoU | DAVIS NCI | DAVIS NFO | DAVIS NFI | DAVIS mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Tiny | mean | 6.05 | 14845 | 2453 | 82.0 | 2.71 | 616 | 328 | 90.0 | 3.16 | 499 | 344 | 88.0 |
| Swin-Tiny | std | 0.006 | 56 | 6 | 0.0 | 0.006 | 5 | 2 | 0.0 | 0.023 | 8 | 5 | 0.0 |
Single-instance Interactive Segmentation

| Model | Stat | GrabCut NoC@85 | GrabCut NoC@90 | Berkeley NoC@85 | Berkeley NoC@90 | SBD NoC@85 | SBD NoC@90 | DAVIS NoC@85 | DAVIS NoC@90 | Pascal VOC NoC@85 | Pascal VOC NoC@90 | COCO MVal NoC@85 | COCO MVal NoC@90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Tiny | mean | 1.49 | 1.59 | 1.37 | 2.00 | 3.72 | 6.26 | 3.79 | 5.08 | 1.95 | 2.27 | 2.22 | 3.08 |
| Swin-Tiny | std | 0.05 | 0.08 | 0.04 | 0.11 | 0.04 | 0.01 | 0.10 | 0.10 | 0.03 | 0.02 | 0.08 | 0.09 |
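
The mean and std rows are plain per-metric statistics over the three seeded runs. For example, with three illustrative NoC@85 values:

```python
import statistics

# Per-metric results from the three seeded runs (illustrative values only).
runs = [1.44, 1.49, 1.54]

mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation (n - 1 denominator)
print(f"mean={mean:.2f} std={std:.2f}")  # prints mean=1.49 std=0.05
```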

License

License: MIT

The majority of DynaMITe is licensed under the MIT License.

Parts of the codebase are adapted from Mask2Former, which is primarily MIT-licensed (with additional licenses noted in the Mask2Former repository), and the interactive demo tool is adapted from RITM, which is also MIT-licensed.

Citing DynaMITe

If you use our codebase, please cite the papers below.

@inproceedings{RanaMahadevan23Arxiv,
      title={DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer},
      author={Rana, Amit and Mahadevan, Sabarinath and Hermans, Alexander and Leibe, Bastian},
      booktitle={ICCV},
      year={2023}
}

@inproceedings{RanaMahadevan23cvprw,
      title={Clicks as Queries: Interactive Transformer for Multi-instance Segmentation},
      author={Rana, Amit and Mahadevan, Sabarinath and Hermans, Alexander and Leibe, Bastian},
      booktitle={CVPRW},
      year={2023}
}

Acknowledgement

The main codebase is built on top of the detectron2 framework and is inspired by Mask2Former.

The interactive segmentation demo tool is modified from RITM.
