AggDet

This repository contains the implementation of "Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation".

Abstract

Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects from novel classes unseen at training time. However, empirical studies reveal that advanced detectors generally assign lower scores to those novel instances, which are inadvertently suppressed during inference by commonly adopted greedy strategies like Non-Maximum Suppression (NMS), leading to sub-optimal detection performance for novel classes. This paper systematically investigates this problem with the commonly adopted two-stage OVOD paradigm. Specifically, in the region-proposal stage, proposals that contain novel instances exhibit lower objectness scores, since they are treated as background proposals during the training phase. Meanwhile, in the object-classification stage, novel objects receive lower region-text similarities (i.e., classification scores) because the visual-language alignment is biased by seen training samples. To alleviate this problem, this paper introduces two advanced measures to adjust confidence scores and recover erroneously dismissed objects: (1) a class-agnostic localization quality estimate via the overlap degree of region/object proposals, and (2) a text-guided visual similarity estimate with proxy prototypes for novel classes. Integrated with adjusting techniques specifically designed for the region-proposal and object-classification stages, this paper derives the aggregated confidence estimate for the open-vocabulary object detection paradigm (AggDet).
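
To make measure (1) more concrete, the following is a minimal sketch of one possible reading of an "overlap degree" estimate: each proposal is scored by the mean of its top-k IoUs with the other proposals, and that value is blended with the original confidence. The function names and the geometric-mean blend in `aggregate` are assumptions made for illustration only; they are not taken from the paper or from this repository's code.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_degree(boxes, topk):
    """For each box, the mean of its top-k IoUs with all other boxes."""
    degrees = []
    for i, box in enumerate(boxes):
        ious = sorted(
            (iou(box, other) for j, other in enumerate(boxes) if j != i),
            reverse=True,
        )
        top = ious[:topk]
        degrees.append(sum(top) / len(top) if top else 0.0)
    return degrees


def aggregate(scores, degrees, alpha):
    """Hypothetical blend: weighted geometric mean of score and overlap degree.

    With alpha = 0 the original scores are returned unchanged.
    """
    return [s ** (1 - alpha) * d ** alpha for s, d in zip(scores, degrees)]


# Two proposals agree on one object; a third is isolated.
boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (100, 100, 110, 110)]
scores = [0.5, 0.5, 0.5]
degrees = overlap_degree(boxes, topk=1)
adjusted = aggregate(scores, degrees, alpha=0.5)
```

In this toy case the two mutually overlapping proposals keep a higher adjusted score than the isolated one, which is the intuition behind using proposal agreement as a class-agnostic quality cue.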

Figure: AggDet framework

Preparations

  • Installation

    Follow the installation instructions of CoDet to set up the environment.

    Set up the environment

    conda create --name aggdet python=3.8 -y && conda activate aggdet
    pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
    git clone https://github.com/WarlockWendell/AggDet.git

    Install Apex and xFormers (you can skip this part if you do not use the EVA-02 backbone)

    pip install ninja
    pip install -v -U git+https://github.com/facebookresearch/xformers.git@7e05e2caaaf8060c1c6baadc2b04db02d5458a94
    git clone https://github.com/NVIDIA/apex && cd apex
    pip install packaging
    pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && cd ..

    Install detectron2 and other dependencies

    cd AggDet/third_party/detectron2
    pip install -e .
    cd ../..
    pip install -r requirements.txt
  • Datasets

    Please refer to DATA.md for more details.

  • Pretrained weights

    You can download the pre-trained weights from the official GitHub repos of Detic and CoDet, and put them under the <AGGDET_ROOT>/ckpt/models directory.

    Model         Dataset   Download
    Detic_RN50    COCO      model
    CoDet_RN50    COCO      model
    Detic_SwinB   LVIS      model
    CoDet_RN50    LVIS      model
    CoDet_SwinB   LVIS      model
    CoDet_EVA02   LVIS      model
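
Before running inference, it can help to confirm that the downloaded weights are where the configs expect them. The helper below is a small sketch, not part of this repository: `missing_checkpoints` is a hypothetical name, and the file names you pass in must be the actual names of the files you downloaded, which this README does not list.

```python
from pathlib import Path


def missing_checkpoints(aggdet_root, filenames):
    """Return the names from `filenames` not found under <aggdet_root>/ckpt/models."""
    model_dir = Path(aggdet_root) / "ckpt" / "models"
    return [name for name in filenames if not (model_dir / name).is_file()]
```

Call it with the repository root and the list of checkpoint file names; an empty result means every listed file is present.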

Inference

Take Detic with a ResNet50 backbone on the OV-COCO dataset as an example.

python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml

To adjust the hyperparameters described in the paper, modify the following entries in the YAML file.

OVERLAP_TOPK: 3
ALPHA: 0.05
BETA: 0.75

For example, use the following command to test the baseline model:

python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml \
MODEL.OVERLAP_TOPK 0 MODEL.ALPHA 0.0 MODEL.BETA 0.0
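
When comparing several settings, it may be convenient to assemble the command programmatically. The sketch below only builds the argument list; `build_eval_command` is a hypothetical helper, and it assumes `train_net.py` accepts trailing overrides as space-separated KEY VALUE pairs, as in the baseline command above.

```python
import sys


def build_eval_command(config_file, overrides=None):
    """Return the argument list for an evaluation-only run.

    `overrides` maps config keys (e.g. "MODEL.ALPHA") to values; each entry
    is appended as a separate KEY VALUE pair after the named arguments.
    """
    cmd = [sys.executable, "train_net.py", "--eval-only", "--config-file", config_file]
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    return cmd


# Baseline run: all three aggregation parameters set to zero.
baseline = build_eval_command(
    "configs/Detic_RN50_COCO.yaml",
    {"MODEL.OVERLAP_TOPK": 0, "MODEL.ALPHA": 0.0, "MODEL.BETA": 0.0},
)
print(" ".join(baseline[1:]))
```

The resulting list can be passed to `subprocess.run(baseline, check=True)` from the repository root.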

To evaluate a different model or dataset, pass a different file to --config-file. Refer to REPRODUCE.md for more details.

Citation

@article{zheng2024training,
  title={Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation},
  author={Yanhao Zheng and Kai Liu},
  journal={arXiv preprint arXiv:2404.08603},
  year={2024}
}

Acknowledgment

AggDet is built upon the awesome works CoDet, EVA, and Detic. Many thanks for their wonderful work.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.