OTX D-Fine Detection Algorithm Integration (#4142)
* init

* remove convertbox

* Refactor D-FINE detector: remove unused components and update model configuration

* update

* update

* Update

* update recipes

* Add d-fine-m

* Fix recipes

* dfine-l

* Add dfine m - no aug

* format changes

* learnable params + disable teacher distillation

* update

* add recipes

* update

* update

* update recipes

* add dfine_hgnetv2_x

* Update recipes

* add tile DFine recipes

* update recipes and tile batch size

* update

* update LR

* DFine revert LR changes

* make multi-scale optional

* update tile recipes

* update tiling recipes

* add backbone pretrained weights

* update

* update

* loss

* update

* Update

* refactor d-fine criterion

* Fix docstring punctuation and remove unused aux_loss parameter in DFINETransformerModule; refactor DFineCriterion

* Update style changes

* conv batchnorm fuse

* update hybrid encoder

* Refactor DFINE HybridEncoderModule to improve code clarity and remove redundant parameters

* minor update

* Refactor D-FINE module structure by removing obsolete detector file and reorganizing imports

* Refactor import paths in D-FINE module and clean up unused code

* Refactor D-FINE module by removing commented code, cleaning up imports, and updating documentation

* Refactor D-FINE module by updating type hints, improving error messages, and enhancing documentation for RandomIoUCrop

* Refactor D-FINE module by improving the weighting function's return structure and updating type hints in DFINECriterion

* Update d-fine unit test

* Refactor D-FINE module by enhancing docstrings for clarity and updating parameter names for consistency

* Add D-Fine Detection Algorithm entries to CHANGELOG and object detection documentation

* Fix device assignment for positional embeddings in HybridEncoderModule

* Refactor D-FINE module by removing unused functions and integrating dfine_bbox2distance in DFINECriterion

* Update codeowners

* Add advanced parameters to optimization config in DFine model

* Remove DFINE M, S, N model configuration files

* disable tiling mem cache

* Update codeowners

* revert codeowner changes

* Remove unused DFINE model configurations from unit tests

* Add heavy unit test workflow and mark tests accordingly

* Add container configuration for Heavy-Unit-Test job in pre_merge.yaml

* Add additional transformations to D-Fine configuration and update test skips for unsupported models

* Reduce batch size and remove heavy markers from unit tests in test_tiling.py

* Revert "Add additional transformations to D-Fine configuration and update test skips for unsupported models"

This reverts commit d5c66f5.

* Revert "Reduce batch size and remove heavy markers from unit tests in test_tiling.py"

This reverts commit 563e033.

* Add additional transformations to D-Fine configuration in YAML files

* disable pytest heavy tag

* update

* Remove unused DFine-L model configurations and update unit tests

* Add DFine-X model template for class-incremental object detection

* Update docs/source/guide/explanation/algorithms/object_detection/object_detection.rst

Co-authored-by: Samet Akcay <[email protected]>

* Update copyright years from 2024 to 2025 in multiple files

* Rename heavy unit tests to intense unit tests and update related configurations

* Update container image in pre_merge.yaml for Intense-Unit-Test job

* update pre-merge

* update ubuntu container image

* update container image

* Add new object detection model configuration for DFine HGNetV2 X

* update image

* Update pre-merge workflow to use Ubuntu 24.04 and simplify unit test coverage reporting

* install sqlite

* Remove sudo from apt-get command in pre-merge workflow

* Remove sudo from apt-get command in pre-merge workflow

* Update pre-merge workflow to install additional dependencies and correct model name in converter

* Update detection configuration: increase warmup steps and patience, add min_lr, and remove unused callbacks

* Remove D-Fine model recipes from object detection documentation

* Skip tests for unsupported models: add check for D-Fine

* Skip tests for unsupported models: add check for D-Fine

* Skip tests for unsupported models: add check for DFine

* Refactor DFine model: remove unused checkpoint loading and update optimizer configuration documentation; change reg_scale to float in DFINETransformer.

---------

Co-authored-by: Samet Akcay <[email protected]>
eugene123tw and samet-akcay authored Jan 17, 2025
1 parent a6d5795 commit d663fd7
Showing 24 changed files with 3,736 additions and 11 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/pre_merge.yaml
@@ -84,6 +84,38 @@ jobs:
curl -Os https://uploader.codecov.io/latest/linux/codecov
chmod +x codecov
./codecov -t ${{ secrets.CODECOV_TOKEN }} --sha $COMMIT_ID -U $HTTP_PROXY -f .tox/coverage_unit-test-${{ matrix.tox-env }}.xml -F ${{ matrix.tox-env }}
Intense-Unit-Test:
runs-on: [otx-gpu-a10g-1]
container:
image: "ubuntu:24.04"
needs: Code-Quality-Checks
timeout-minutes: 120
strategy:
fail-fast: false
matrix:
include:
- python-version: "3.10"
tox-env: "py310"
- python-version: "3.11"
tox-env: "py311"
name: Intense-Unit-Test-with-Python${{ matrix.python-version }}
steps:
- name: Install dependencies
run: apt-get update && apt-get install -y libsqlite3-0 libsqlite3-dev libgl1 libglib2.0-0
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Install Python
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install tox
run: |
python -m pip install --require-hashes --no-deps -r .ci/requirements.txt
pip-compile --generate-hashes --output-file=/tmp/requirements.txt --extra=ci_tox pyproject.toml
python -m pip install --require-hashes --no-deps -r /tmp/requirements.txt
rm /tmp/requirements.txt
- name: Run unit test
run: tox -vv -e intense-unit-test-${{ matrix.tox-env }}
Integration-Test:
if: |
github.event.pull_request.draft == false &&
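The new job runs the heavyweight tests through dedicated intense-unit-test-* tox environments. As a sketch of the same gating expressed directly in pytest (the RUN_INTENSE_TESTS switch and the skip logic are illustrative assumptions, not what the repository ships):

# conftest.py sketch: skip tests marked "intense" unless explicitly enabled.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    if os.environ.get("RUN_INTENSE_TESTS") == "1":  # hypothetical opt-in switch
        return
    skip_intense = pytest.mark.skip(reason="intense tests run only on the dedicated CI job")
    for item in items:
        if "intense" in item.keywords:
            item.add_marker(skip_intense)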
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -22,6 +22,8 @@ All notable changes to this project will be documented in this file.
(<https://github.com/openvinotoolkit/training_extensions/pull/3979>)
- Add OpenVINO inference for 3D Object Detection task
(<https://github.com/openvinotoolkit/training_extensions/pull/4017>)
- Add D-Fine Detection Algorithm
(<https://github.com/openvinotoolkit/training_extensions/pull/4142>)

### Enhancements

docs/source/guide/explanation/algorithms/object_detection/object_detection.rst
@@ -73,6 +73,8 @@ We support the following ready-to-use model recipes:
+------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Object_Detection_ResNeXt101_ATSS <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/detection/atss_resnext101.yaml>`_ | ResNeXt101-ATSS | 434.75 | 344.0 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `D-Fine X Detection <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/detection/dfine_x.yaml>`_ | D-Fine X | 202.486 | 240.0 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+

The above table can be generated using the following command

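For orientation, a minimal training sketch for the new D-Fine X recipe using the OTX Engine API; the Engine.from_config entry point and the data_root path are assumptions for illustration, not part of this diff:

from otx.engine import Engine

# Sketch under assumptions: recipe path as listed in the table above; data_root is a placeholder.
engine = Engine.from_config(
    config_path="src/otx/recipe/detection/dfine_x.yaml",
    data_root="data/my_detection_dataset",
)
engine.train(max_epochs=1)  # short smoke run; real training uses the recipe's own schedule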
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -398,6 +398,7 @@ convention = "google"
markers = [
"gpu", # mark tests which require NVIDIA GPU
"cpu",
"xpu", # mark tests which require Intel dGPU
"xpu", # mark tests which require Intel dGPU,
"intense", # intense unit tests which require better CI machines
]
python_files = "tests/**/*.py"
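The intense marker is what the Intense-Unit-Test job selects via the intense-unit-test-* tox environments added above. A minimal sketch of how a test opts in (the test name and body are illustrative):

import pytest


@pytest.mark.intense  # collected only where intense tests are enabled
def test_dfine_x_tiling_end_to_end():
    # Placeholder body: a real test would build the D-Fine model and assert on its outputs.
    assert True

Ordinary local runs can deselect these with pytest -m "not intense".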
148 changes: 147 additions & 1 deletion src/otx/algo/common/layers/transformer_layers.py
@@ -1,4 +1,4 @@
# Copyright (C) 2024-2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Implementation of common transformer layers."""
@@ -10,6 +10,7 @@
from typing import Callable

import torch
import torch.nn.functional as f
from otx.algo.common.utils.utils import get_clones
from otx.algo.modules.transformer import deformable_attention_core_func
from torch import Tensor, nn
@@ -306,6 +307,151 @@ def forward(
return self.output_proj(output)


class MSDeformableAttentionV2(nn.Module):
"""Multi-Scale Deformable Attention Module V2.
Note:
This is different from vanilla MSDeformableAttention where it uses
distinct number of sampling points for features at different scales.
Refer to RTDETRv2.
Args:
embed_dim (int): The number of expected features in the input.
num_heads (int): The number of heads in the multiheadattention models.
num_levels (int): The number of levels in MSDeformableAttention.
num_points_list (list[int]): Number of distinct points for each layer. Defaults to [3, 6, 3].
"""

def __init__(
self,
embed_dim: int = 256,
num_heads: int = 8,
num_levels: int = 4,
num_points_list: list[int] = [3, 6, 3], # noqa: B006
) -> None:
super().__init__()
self.embed_dim = embed_dim
self.num_heads = num_heads
self.num_levels = num_levels
self.num_points_list = num_points_list

num_points_scale = [1 / n for n in num_points_list for _ in range(n)]
self.register_buffer(
"num_points_scale",
torch.tensor(num_points_scale, dtype=torch.float32),
)

self.total_points = num_heads * sum(num_points_list)
self.head_dim = embed_dim // num_heads

self.sampling_offsets = nn.Linear(embed_dim, self.total_points * 2)
self.attention_weights = nn.Linear(embed_dim, self.total_points)

self._reset_parameters()

def _reset_parameters(self) -> None:
"""Reset parameters of the model."""
init.constant_(self.sampling_offsets.weight, 0)
thetas = torch.arange(self.num_heads, dtype=torch.float32) * (2.0 * math.pi / self.num_heads)
grid_init = torch.stack([thetas.cos(), thetas.sin()], -1)
grid_init = grid_init / grid_init.abs().max(-1, keepdim=True).values # noqa: PD011
grid_init = grid_init.reshape(self.num_heads, 1, 2).tile([1, sum(self.num_points_list), 1])
scaling = torch.concat([torch.arange(1, n + 1) for n in self.num_points_list]).reshape(1, -1, 1)
grid_init *= scaling
self.sampling_offsets.bias.data[...] = grid_init.flatten()

# attention_weights
init.constant_(self.attention_weights.weight, 0)
init.constant_(self.attention_weights.bias, 0)

def forward(
self,
query: Tensor,
reference_points: Tensor,
value: Tensor,
value_spatial_shapes: list[list[int]],
) -> Tensor:
"""Forward function of MSDeformableAttention.
Args:
query (Tensor): [bs, query_length, C]
reference_points (Tensor): [bs, query_length, n_levels, 2], range in [0, 1], top-left (0,0),
bottom-right (1, 1), including padding area
value (Tensor): [bs, value_length, C]
value_spatial_shapes (List): [n_levels, 2], [(H_0, W_0), (H_1, W_1), ..., (H_{L-1}, W_{L-1})]
Returns:
output (Tensor): [bs, Length_{query}, C]
"""
bs, len_q = query.shape[:2]
_, n_head, c, _ = value[0].shape
num_points_list = self.num_points_list

sampling_offsets = self.sampling_offsets(query).reshape(
bs,
len_q,
self.num_heads,
sum(self.num_points_list),
2,
)

attention_weights = self.attention_weights(query).reshape(
bs,
len_q,
self.num_heads,
sum(self.num_points_list),
)
attention_weights = f.softmax(attention_weights, dim=-1)

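        # Two reference-point layouts: last dim 2 means normalized point centers, so offsets
        # are divided by each level's (W, H); last dim 4 means (cx, cy, w, h) boxes, so offsets
        # are scaled by half the box size and the per-point num_points_scale buffer.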
if reference_points.shape[-1] == 2:
offset_normalizer = torch.tensor(value_spatial_shapes)
offset_normalizer = offset_normalizer.flip([1]).reshape(1, 1, 1, self.num_levels, 1, 2)
sampling_locations = (
reference_points.reshape(
bs,
len_q,
1,
self.num_levels,
1,
2,
)
+ sampling_offsets / offset_normalizer
)
elif reference_points.shape[-1] == 4:
num_points_scale = self.num_points_scale.to(query).unsqueeze(-1)
offset = sampling_offsets * num_points_scale * reference_points[:, :, None, :, 2:] * 0.5
sampling_locations = reference_points[:, :, None, :, :2] + offset
else:
            msg = f"Last dim of reference_points must be 2 or 4, but got {reference_points.shape[-1]} instead."
raise ValueError(msg)

        # Map sampling locations from [0, 1] to grid_sample's [-1, 1] coordinate range.
sampling_grids = 2 * sampling_locations - 1

sampling_grids = sampling_grids.permute(0, 2, 1, 3, 4).flatten(0, 1)
sampling_locations_list = sampling_grids.split(num_points_list, dim=-2)

sampling_value_list = []
for level, (h, w) in enumerate(value_spatial_shapes):
value_l = value[level].reshape(bs * n_head, c, h, w)
sampling_grid_l = sampling_locations_list[level]
sampling_value_l = f.grid_sample(
value_l,
sampling_grid_l,
mode="bilinear",
padding_mode="zeros",
align_corners=False,
)

sampling_value_list.append(sampling_value_l)

attn_weights = attention_weights.permute(0, 2, 1, 3).reshape(bs * n_head, 1, len_q, sum(num_points_list))
weighted_sample_locs = torch.concat(sampling_value_list, dim=-1) * attn_weights
output = weighted_sample_locs.sum(-1).reshape(bs, n_head * c, len_q)

return output.permute(0, 2, 1)


class VisualEncoderLayer(nn.Module):
"""VisualEncoderLayer module consisting of MSDeformableAttention and feed-forward network.
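To make the tensor contract concrete, here is a self-contained smoke run of MSDeformableAttentionV2 using box-form reference points; the import path follows the file shown in this diff, and all sizes are illustrative:

import torch

from otx.algo.common.layers.transformer_layers import MSDeformableAttentionV2

bs, len_q, embed_dim, num_heads = 2, 100, 256, 8
head_dim = embed_dim // num_heads
spatial_shapes = [[32, 32], [16, 16], [8, 8]]  # (H_l, W_l) for each level

attn = MSDeformableAttentionV2(embed_dim=embed_dim, num_heads=num_heads, num_levels=3)

query = torch.randn(bs, len_q, embed_dim)
# Per-level values, each [bs, num_heads, head_dim, H_l * W_l], matching forward()'s unpacking.
value = [torch.randn(bs, num_heads, head_dim, h * w) for h, w in spatial_shapes]
# Box-form (cx, cy, w, h) reference points in [0, 1]; the level axis broadcasts, hence size 1.
reference_points = torch.rand(bs, len_q, 1, 4)

output = attn(query, reference_points, value, spatial_shapes)
print(output.shape)  # torch.Size([2, 100, 256])

This exercises the 4-dim reference-point branch; the 2-dim branch instead expects [bs, len_q, n_levels, 2] point coordinates.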