Computer Vision Group, RWTH Aachen University
Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe
[Paper] [ArXiv] [Project-Page] [BibTeX]
- Pick a model and its config file from the model checkpoints, for example `configs/coco_lvis/swin/dynamite_swin_tiny_bs32_ep50.yaml`.
- We provide `demo.py`, which can demo the builtin configs. Run it with:

```
python demo.py --config-file configs/coco_lvis/swin/dynamite_swin_tiny_bs32_ep50.yaml \
  --model-weights /path/to/checkpoint_file
```
The configs are made for training, so for evaluation you need to point `--model-weights` to a model from the model zoo. This command opens an OpenCV window where you can select any image and perform interactive segmentation on it.
Interactive segmentation options
- Clicks management
- add instance button to add a new instance; a button for the new instance is created with the same color as the instance mask.
- bg clicks button to add background clicks.
- reset clicks button to remove all clicks and instances.
- Visualisation parameters
- show masks only button to visualize only the masks without point clicks.
- Alpha blending coefficient slider adjusts the intensity of all predicted masks.
- Visualisation click radius slider adjusts the size of red and green dots depicting clicks.
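The alpha blending slider mixes the colored mask overlay into the image. A minimal NumPy sketch of that blend (function and variable names are illustrative, not from the DynaMITe demo tool):

```python
import numpy as np

def blend_masks(image: np.ndarray, overlay: np.ndarray, alpha: float) -> np.ndarray:
    """Alpha-blend a colored mask overlay onto an image.

    image, overlay: uint8 arrays of shape (H, W, 3); alpha in [0, 1]
    controls the intensity of the predicted masks, as the slider does.
    """
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * overlay.astype(np.float32)
    return blended.round().astype(np.uint8)
```

With `alpha = 0` only the image is shown; with `alpha = 1` only the mask colors remain.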
We provide pretrained models with different backbones for interactive segmentation.
You can find the model weights and evaluation results in the tables below. Although the download links are attached to individual table entries, all models are trained in the multi-instance setting and are applicable to both single- and multi-instance settings.
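For reference, NoC@q (tables below) is the average number of clicks needed to reach IoU q, capped at a click budget, and NFO/NFI count the objects/images that never reach the target; NCI is the analogous average number of clicks per instance in the multi-instance setting. A minimal sketch of the single-instance NoC computation, assuming each object comes with its per-click IoU trajectory (names are illustrative, not from the DynaMITe codebase):

```python
def noc_at(iou_per_click: list, target: float, max_clicks: int = 20) -> int:
    """Number of clicks until IoU first reaches `target`.

    iou_per_click[i] is the IoU after click i+1. If the target is never
    reached within the budget, the sample counts as max_clicks (a failure).
    """
    for i, iou in enumerate(iou_per_click[:max_clicks]):
        if iou >= target:
            return i + 1
    return max_clicks

def mean_noc(trajectories: list, target: float, max_clicks: int = 20) -> float:
    """Average NoC@target over a dataset; a failure counter over the
    trajectories that hit the cap would give the NFO-style statistic."""
    nocs = [noc_at(t, target, max_clicks) for t in trajectories]
    return sum(nocs) / len(nocs)
```

For example, a trajectory `[0.5, 0.8, 0.9]` has NoC@85 = 3, since IoU 0.85 is first reached on the third click.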
Multi-instance Interactive Segmentation

| Model | Strategy | COCO NCI@85 | COCO NFO@85 | COCO NFI@85 | COCO mIoU@85 | SBD NCI@85 | SBD NFO@85 | SBD NFI@85 | SBD mIoU@85 | DAVIS NCI@85 | DAVIS NFO@85 | DAVIS NFI@85 | DAVIS mIoU@85 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Segformer-B0 | best | 6.13 | 15219 | 2485 | 81.3 | 2.83 | 655 | 342 | 90.2 | 3.29 | 546 | 364 | 87.5 |
| Segformer-B0 | random | 6.04 | 12986 | 2431 | 84.9 | 2.76 | 528 | 313 | 90.6 | 3.27 | 549 | 356 | 87.9 |
| Segformer-B0 | worst | 6.02 | 19758 | 2414 | 83.0 | 2.75 | 842 | 315 | 90.3 | 3.25 | 707 | 354 | 87.1 |
| Swin-Large | best | 5.80 | 13876 | 2305 | 82.4 | 2.47 | 497 | 266 | 90.7 | 3.06 | 483 | 330 | 88.4 |
| Swin-Large | random | 5.70 | 11958 | 2242 | 85.3 | 2.42 | 428 | 249 | 91.0 | 3.03 | 479 | 320 | 88.8 |
| Swin-Large | worst | 5.66 | 18133 | 2242 | 83.7 | 2.41 | 671 | 251 | 90.8 | 2.99 | 620 | 314 | 88.1 |
Single-instance Interactive Segmentation

| Model | GrabCut NoC@85 | GrabCut NoC@90 | Berkeley NoC@85 | Berkeley NoC@90 | SBD NoC@85 | SBD NoC@90 | DAVIS NoC@85 | DAVIS NoC@90 | Pascal VOC NoC@85 | Pascal VOC NoC@90 | COCO MVal NoC@85 | COCO MVal NoC@90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet50 | 1.62 | 1.82 | 1.47 | 2.19 | 3.93 | 6.56 | 4.10 | 5.45 | 2.13 | 2.51 | 2.36 | 3.20 |
| HRNet32 | 1.62 | 1.68 | 1.46 | 2.04 | 3.83 | 6.35 | 3.83 | 5.20 | 2.07 | 2.43 | 2.35 | 3.14 |
| Segformer-B0 | 1.58 | 1.68 | 1.61 | 2.06 | 3.89 | 6.48 | 3.85 | 5.08 | 2.04 | 2.40 | 2.47 | 3.28 |
| Swin-Tiny | 1.64 | 1.78 | 1.39 | 1.96 | 3.75 | 6.32 | 3.87 | 5.23 | 1.94 | 2.27 | 2.24 | 3.14 |
| Swin-Large | 1.62 | 1.72 | 1.39 | 1.90 | 3.32 | 5.64 | 3.80 | 5.09 | 1.83 | 2.12 | 2.19 | 2.88 |
See Installation Instructions.
See Preparing Datasets for DynaMITe.
We train all the released checkpoints using a fixed seed, given in the corresponding config file for each backbone. We train with 16 GPUs, a batch size of 32, and an initial global learning rate of 0.0001. Each GPU is an NVIDIA A100 Tensor Core GPU with 40 GB of memory. Evaluation is also done on the same GPUs.
Note: different machines have different hardware and software stacks, which may cause minor variations in floating-point results.
We train the Swin-Tiny model three times with different seeds and report the resulting variance in the evaluation metrics below:
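The mean and std rows in the tables are just per-metric statistics over the three seeded runs. A small sketch of that aggregation, using the standard library (the function name is illustrative; the inputs in the usage example are placeholders, not the actual per-seed results):

```python
from statistics import mean, stdev

def summarize_runs(runs: list) -> dict:
    """Per-metric (mean, std) across independent training runs.

    runs: list of dicts mapping metric name -> value, one dict per seed.
    Requires at least two runs for stdev to be defined.
    """
    metrics = runs[0].keys()
    return {m: (mean(r[m] for r in runs), stdev(r[m] for r in runs)) for m in metrics}
```

For example, `summarize_runs([{"NoC@85": 1.0}, {"NoC@85": 2.0}, {"NoC@85": 3.0}])` yields mean 2.0 and std 1.0 for that metric.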
Multi-instance Interactive Segmentation (best strategy)

| Model | Statistic | COCO NCI@85 | COCO NFO@85 | COCO NFI@85 | COCO mIoU@85 | SBD NCI@85 | SBD NFO@85 | SBD NFI@85 | SBD mIoU@85 | DAVIS NCI@85 | DAVIS NFO@85 | DAVIS NFI@85 | DAVIS mIoU@85 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Tiny | mean | 6.05 | 14845 | 2453 | 82.0 | 2.71 | 616 | 328 | 90.0 | 3.16 | 499 | 344 | 88.0 |
| Swin-Tiny | std | 0.006 | 56 | 6 | 0.0 | 0.006 | 5 | 2 | 0.0 | 0.023 | 8 | 5 | 0.0 |
Single-instance Interactive Segmentation

| Model | Statistic | GrabCut NoC@85 | GrabCut NoC@90 | Berkeley NoC@85 | Berkeley NoC@90 | SBD NoC@85 | SBD NoC@90 | DAVIS NoC@85 | DAVIS NoC@90 | Pascal VOC NoC@85 | Pascal VOC NoC@90 | COCO MVal NoC@85 | COCO MVal NoC@90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Tiny | mean | 1.49 | 1.59 | 1.37 | 2.00 | 3.72 | 6.26 | 3.79 | 5.08 | 1.95 | 2.27 | 2.22 | 3.08 |
| Swin-Tiny | std | 0.05 | 0.08 | 0.04 | 0.11 | 0.04 | 0.01 | 0.10 | 0.10 | 0.03 | 0.02 | 0.08 | 0.09 |
The majority of DynaMITe is licensed under the MIT License.
Parts of the codebase are inspired by Mask2Former, which is largely MIT-licensed with additional licenses noted in the Mask2Former repository, and the interactive demo tool is adapted from RITM, which is also released under the MIT License.
If you use our codebase, please cite the papers below.
@inproceedings{RanaMahadevan23Arxiv,
title={DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer},
author={Rana, Amit and Mahadevan, Sabarinath and Hermans, Alexander and Leibe, Bastian},
booktitle={ICCV},
year={2023}
}
@inproceedings{RanaMahadevan23cvprw,
title={Clicks as Queries: Interactive Transformer for Multi-instance Segmentation},
author={Rana, Amit and Mahadevan, Sabarinath and Hermans, Alexander and Leibe, Bastian},
booktitle={CVPRW},
year={2023}
}
The main codebase is built on top of the detectron2 framework and is inspired by Mask2Former.
The interactive segmentation demo tool is modified from RITM.