This repo contains the implementation of "Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation".
Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects from novel classes unseen at training time. However, empirical studies reveal that advanced detectors generally assign lower scores to those novel instances, which are then inadvertently suppressed during inference by commonly adopted greedy strategies such as Non-Maximum Suppression (NMS), leading to sub-optimal detection performance for novel classes. This paper systematically investigates this problem within the commonly adopted two-stage OVOD paradigm. Specifically, in the region-proposal stage, proposals that contain novel instances receive lower objectness scores, since they are treated as background proposals during training. Meanwhile, in the object-classification stage, novel objects receive lower region-text similarities (i.e., classification scores) because the visual-language alignment is biased by the seen training samples. To alleviate this problem, this paper introduces two advanced measures to adjust confidence scores and recover erroneously dismissed objects: (1) a class-agnostic localization quality estimate via the overlap degree of region/object proposals, and (2) a text-guided visual similarity estimate with proxy prototypes for novel classes. Integrated with adjusting techniques specifically designed for the region-proposal and object-classification stages, this paper derives the aggregated confidence estimate for the open-vocabulary object detection paradigm (AggDet).
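To make the first measure more concrete, here is a minimal sketch of an overlap-based localization quality estimate. It is an illustration under stated assumptions, not the formulation used in the paper or in this repo: the function names (`iou`, `overlap_degree`, `adjust_scores`), the choice of averaging the top-k IoUs with neighboring proposals, and the linear blend with weight `alpha` are all hypothetical.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def overlap_degree(boxes: List[Box], topk: int = 3) -> List[float]:
    """Assumed estimate: mean of the top-k IoUs between each box and the others.

    The intuition is that a well-localized object tends to attract several
    heavily overlapping proposals, independent of its class.
    """
    degrees = []
    for i, box in enumerate(boxes):
        overlaps = sorted(
            (iou(box, other) for j, other in enumerate(boxes) if j != i),
            reverse=True,
        )
        top = overlaps[:topk]
        degrees.append(sum(top) / len(top) if top else 0.0)
    return degrees


def adjust_scores(
    scores: List[float], boxes: List[Box], topk: int = 3, alpha: float = 0.05
) -> List[float]:
    """Hypothetical linear blend of the original score and the overlap degree.

    With alpha == 0 the original scores are returned unchanged.
    """
    degrees = overlap_degree(boxes, topk)
    return [(1 - alpha) * s + alpha * d for s, d in zip(scores, degrees)]
```

In this sketch, a proposal that is supported by overlapping neighbors gets a small score boost, while an isolated proposal does not; setting `alpha` to 0 leaves the scores untouched.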
-
Installation
Follow the installation instructions of CoDet to set up the environment.
Set up the environment
conda create --name aggdet python=3.8 -y && conda activate aggdet
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
git clone https://github.com/WarlockWendell/AggDet.git
Install Apex and xFormers (you can skip this step if you do not use the EVA-02 backbone)
pip install ninja
pip install -v -U git+https://github.com/facebookresearch/xformers.git@7e05e2caaaf8060c1c6baadc2b04db02d5458a94
git clone https://github.com/NVIDIA/apex && cd apex
pip install packaging
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && cd ..
Install detectron2 and other dependencies
cd AggDet/third_party/detectron2
pip install -e .
cd ../..
pip install -r requirements.txt
-
Datasets
Please refer to DATA.md for more details.
-
Pretrained weights
You can download the pre-trained weights from the official GitHub repos of Detic and CoDet and put them under the <AGGDET_ROOT>/ckpt/models directory.

| model | dataset | download |
| --- | --- | --- |
| Detic_RN50 | COCO | model |
| CoDet_RN50 | COCO | model |
| Detic_SwinB | LVIS | model |
| CoDet_RN50 | LVIS | model |
| CoDet_SwinB | LVIS | model |
| CoDet_EVA02 | LVIS | model |
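Before running evaluation, it can help to confirm that the downloaded weights ended up in the expected place. The helper below is a small sketch, not part of this repo: the function name `list_checkpoints` and the file extensions it looks for (`.pth`, `.pkl`) are assumptions, and the actual checkpoint file names depend on what the Detic and CoDet repos provide.

```python
from pathlib import Path
from typing import List, Tuple


def list_checkpoints(
    aggdet_root: str, exts: Tuple[str, ...] = (".pth", ".pkl")
) -> List[str]:
    """Return the sorted names of weight files under <aggdet_root>/ckpt/models.

    The extensions are an assumption about common checkpoint formats; an
    empty list means the directory is missing or holds no matching files.
    """
    model_dir = Path(aggdet_root) / "ckpt" / "models"
    if not model_dir.is_dir():
        return []
    return sorted(
        p.name for p in model_dir.iterdir() if p.is_file() and p.suffix in exts
    )
```

Calling `list_checkpoints(".")` from the repository root should then list whatever weight files have been placed there.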
Take Detic with a ResNet50 backbone on the OV-COCO dataset as an example.
python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml
To adjust the hyperparameters described in the paper, modify the following keys in the YAML config file.
OVERLAP_TOPK: 3
ALPHA: 0.05
BETA: 0.75
For example, use the following command to test the baseline model:
python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml \
MODEL.OVERLAP_TOPK 0 MODEL.ALPHA 0.0 MODEL.BETA 0.0
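Detectron2-style launchers typically forward trailing command-line arguments to the config as alternating key/value tokens (yacs `merge_from_list`), so each key needs to be followed by its value as a separate token. The sketch below is a simplified pure-Python stand-in for that behavior, assuming `train_net.py` follows the same convention; `merge_overrides` is a hypothetical helper, not the yacs implementation.

```python
import ast
from typing import Any, Dict, List


def merge_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply ["A.B", "1", "A.C", "0.5", ...] overrides to a nested dict in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must be KEY VALUE pairs, got: %r" % (opts,))
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError if the parent section does not exist
        if leaf not in node:
            raise KeyError("unknown config key: %s" % key)
        try:
            value = ast.literal_eval(raw)  # "0" -> 0, "0.0" -> 0.0
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings such as file paths
        node[leaf] = value
    return cfg
```

For example, `merge_overrides(cfg, "MODEL.OVERLAP_TOPK 0 MODEL.ALPHA 0.0 MODEL.BETA 0.0".split())` sets all three values to zero, whereas a `KEY=VALUE` token leaves an odd number of tokens and is rejected.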
Change the --config-file argument to switch the model and dataset. Refer to REPRODUCE.md for more details.
@article{
title={Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation},
author={Yanhao Zheng and Kai Liu},
journal={arXiv preprint arXiv:2404.08603},
year={2024}
}
AggDet is built upon the awesome works CoDet, EVA, and Detic. Many thanks for their wonderful work.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.