Computer Vision Group, RWTH Aachen University
Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe
[Paper] [ArXiv] [Project-Page] [BibTeX]
- Pick a model and its config file from the model checkpoints, for example `configs/coco_lvis/swin/dynamite_swin_tiny_bs32_ep50.yaml`.
- We provide `demo.py`, which can demo the builtin configs. Run it with:

```
python demo.py --config-file configs/coco_lvis/swin/dynamite_swin_tiny_bs32_ep50.yaml \
  --model-weights /path/to/checkpoint_file
```
The configs are made for training, so for evaluation you need to point `--model-weights` to a model from the model zoo. This command opens an OpenCV window where you can select any image and perform interactive segmentation on it.
Interactive segmentation options
- Clicks management
- add instance button to add a new instance; a button for the new instance is created with the same color as the instance mask.
- bg clicks button to add background clicks.
- reset clicks button to remove all clicks and instances.
- Visualisation parameters
- show masks only button to visualize only the masks without point clicks.
- Alpha blending coefficient slider adjusts the intensity of all predicted masks.
- Visualisation click radius slider adjusts the size of red and green dots depicting clicks.
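The alpha blending slider mixes the colored mask overlay into the image. A minimal NumPy sketch of that blend (function and variable names are illustrative, not from the DynaMITe demo tool):

```python
import numpy as np

def blend_masks(image: np.ndarray, overlay: np.ndarray, alpha: float) -> np.ndarray:
    """Alpha-blend a colored mask overlay onto an image.

    image, overlay: uint8 arrays of shape (H, W, 3); alpha in [0, 1]
    controls the intensity of the predicted masks, as the slider does.
    """
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * overlay.astype(np.float32)
    return blended.round().astype(np.uint8)
```

With `alpha = 0` only the image is shown; with `alpha = 1` only the mask colors remain.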
We provide pretrained models with different backbones for interactive segmentation.
You can find the model weights and evaluation results in the tables below. Although the download links are attached to individual table entries, all models are trained in the multi-instance setting and are applicable to both single- and multi-instance settings.
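For reference, NoC@q (tables below) is the average number of clicks needed to reach IoU q, capped at a click budget, and NFO/NFI count the objects/images that never reach the target; NCI is the analogous average number of clicks per instance in the multi-instance setting. A minimal sketch of the single-instance NoC computation, assuming each object comes with its per-click IoU trajectory (names are illustrative, not from the DynaMITe codebase):

```python
def noc_at(iou_per_click: list, target: float, max_clicks: int = 20) -> int:
    """Number of clicks until IoU first reaches `target`.

    iou_per_click[i] is the IoU after click i+1. If the target is never
    reached within the budget, the sample counts as max_clicks (a failure).
    """
    for i, iou in enumerate(iou_per_click[:max_clicks]):
        if iou >= target:
            return i + 1
    return max_clicks

def mean_noc(trajectories: list, target: float, max_clicks: int = 20) -> float:
    """Average NoC@target over a dataset; a failure counter over the
    trajectories that hit the cap would give the NFO-style statistic."""
    nocs = [noc_at(t, target, max_clicks) for t in trajectories]
    return sum(nocs) / len(nocs)
```

For example, a trajectory `[0.5, 0.8, 0.9]` has NoC@85 = 3, since IoU 0.85 is first reached on the third click.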
Multi-instance Interactive Segmentation

| Model | Strategy | COCO NCI@85 | COCO NFO@85 | COCO NFI@85 | COCO mIoU@85 | SBD NCI@85 | SBD NFO@85 | SBD NFI@85 | SBD mIoU@85 | DAVIS NCI@85 | DAVIS NFO@85 | DAVIS NFI@85 | DAVIS mIoU@85 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Segformer-B0 | best | 6.13 | 15219 | 2485 | 81.3 | 2.83 | 655 | 342 | 90.2 | 3.29 | 546 | 364 | 87.5 |
| Segformer-B0 | random | 6.04 | 12986 | 2431 | 84.9 | 2.76 | 528 | 313 | 90.6 | 3.27 | 549 | 356 | 87.9 |
| Segformer-B0 | worst | 6.02 | 19758 | 2414 | 83.0 | 2.75 | 842 | 315 | 90.3 | 3.25 | 707 | 354 | 87.1 |
| Swin-Large | best | 5.80 | 13876 | 2305 | 82.4 | 2.47 | 497 | 266 | 90.7 | 3.06 | 483 | 330 | 88.4 |
| Swin-Large | random | 5.70 | 11958 | 2242 | 85.3 | 2.42 | 428 | 249 | 91.0 | 3.03 | 479 | 320 | 88.8 |
| Swin-Large | worst | 5.66 | 18133 | 2242 | 83.7 | 2.41 | 671 | 251 | 90.8 | 2.99 | 620 | 314 | 88.1 |
Single-instance Interactive Segmentation

| Model | GrabCut NoC@85 | GrabCut NoC@90 | Berkeley NoC@85 | Berkeley NoC@90 | SBD NoC@85 | SBD NoC@90 | DAVIS NoC@85 | DAVIS NoC@90 | Pascal VOC NoC@85 | Pascal VOC NoC@90 | COCO MVal NoC@85 | COCO MVal NoC@90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet50 | 1.62 | 1.82 | 1.47 | 2.19 | 3.93 | 6.56 | 4.10 | 5.45 | 2.13 | 2.51 | 2.36 | 3.20 |
| HRNet32 | 1.62 | 1.68 | 1.46 | 2.04 | 3.83 | 6.35 | 3.83 | 5.20 | 2.07 | 2.43 | 2.35 | 3.14 |
| Segformer-B0 | 1.58 | 1.68 | 1.61 | 2.06 | 3.89 | 6.48 | 3.85 | 5.08 | 2.04 | 2.40 | 2.47 | 3.28 |
| Swin-Tiny | 1.64 | 1.78 | 1.39 | 1.96 | 3.75 | 6.32 | 3.87 | 5.23 | 1.94 | 2.27 | 2.24 | 3.14 |
| Swin-Large | 1.62 | 1.72 | 1.39 | 1.90 | 3.32 | 5.64 | 3.80 | 5.09 | 1.83 | 2.12 | 2.19 | 2.88 |
See Installation Instructions.
See Preparing Datasets for DynaMITe.
We train all the released checkpoints using a fixed seed, given in the corresponding config file for each backbone. We train with 16 GPUs, a batch size of 32, and an initial global learning rate of 0.0001. Each GPU is an NVIDIA A100 Tensor Core GPU with 40 GB of memory. Evaluation is also done on the same GPUs.
Note: different machines have different hardware and software stacks, which may cause minor variations in floating-point results.
We train the Swin-Tiny model three times with different seeds and report the resulting variance in the evaluation metrics below:
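The mean and std rows in the tables are just per-metric statistics over the three seeded runs. A small sketch of that aggregation, using the standard library (the function name is illustrative; the inputs in the usage example are placeholders, not the actual per-seed results):

```python
from statistics import mean, stdev

def summarize_runs(runs: list) -> dict:
    """Per-metric (mean, std) across independent training runs.

    runs: list of dicts mapping metric name -> value, one dict per seed.
    Requires at least two runs for stdev to be defined.
    """
    metrics = runs[0].keys()
    return {m: (mean(r[m] for r in runs), stdev(r[m] for r in runs)) for m in metrics}
```

For example, `summarize_runs([{"NoC@85": 1.0}, {"NoC@85": 2.0}, {"NoC@85": 3.0}])` yields mean 2.0 and std 1.0 for that metric.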
Multi-instance Interactive Segmentation (best strategy)

| Model | Statistic | COCO NCI@85 | COCO NFO@85 | COCO NFI@85 | COCO mIoU@85 | SBD NCI@85 | SBD NFO@85 | SBD NFI@85 | SBD mIoU@85 | DAVIS NCI@85 | DAVIS NFO@85 | DAVIS NFI@85 | DAVIS mIoU@85 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Tiny | mean | 6.05 | 14845 | 2453 | 82.0 | 2.71 | 616 | 328 | 90.0 | 3.16 | 499 | 344 | 88.0 |
| Swin-Tiny | std | 0.006 | 56 | 6 | 0.0 | 0.006 | 5 | 2 | 0.0 | 0.023 | 8 | 5 | 0.0 |
Single-instance Interactive Segmentation

| Model | Statistic | GrabCut NoC@85 | GrabCut NoC@90 | Berkeley NoC@85 | Berkeley NoC@90 | SBD NoC@85 | SBD NoC@90 | DAVIS NoC@85 | DAVIS NoC@90 | Pascal VOC NoC@85 | Pascal VOC NoC@90 | COCO MVal NoC@85 | COCO MVal NoC@90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Tiny | mean | 1.49 | 1.59 | 1.37 | 2.00 | 3.72 | 6.26 | 3.79 | 5.08 | 1.95 | 2.27 | 2.22 | 3.08 |
| Swin-Tiny | std | 0.05 | 0.08 | 0.04 | 0.11 | 0.04 | 0.01 | 0.10 | 0.10 | 0.03 | 0.02 | 0.08 | 0.09 |
The majority of DynaMITe is licensed under the MIT License.
Parts of the codebase are inspired by Mask2Former, which is largely MIT-licensed with additional licenses noted in the Mask2Former repository, and the interactive demo tool is adapted from RITM, which is also released under the MIT License.
If you use our codebase, please cite the papers below.
@inproceedings{RanaMahadevan23Arxiv,
title={DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer},
author={Rana, Amit and Mahadevan, Sabarinath and Hermans, Alexander and Leibe, Bastian},
booktitle={ICCV},
year={2023}
}
@inproceedings{RanaMahadevan23cvprw,
title={Clicks as Queries: Interactive Transformer for Multi-instance Segmentation},
author={Rana, Amit and Mahadevan, Sabarinath and Hermans, Alexander and Leibe, Bastian},
booktitle={CVPRW},
year={2023}
}
The main codebase is built on top of the detectron2 framework and is inspired by Mask2Former.
The interactive segmentation demo tool is modified from RITM.