January 2021
tl;dr: Add positive sample selector (PSS) branch to FCOS to achieve end-to-end detection.
PSS keeps FCOS's structure and training scheme as much as possible, and introduces one positive sample selector branch.
PSS uses a simple binary classification head to enable selection of one positive sample for each instance, while DeFCN uses 3D max filtering to aggregate multi-scale features to suppress duplicated detection.
- PSS branch: one additional head attached to regression head.
- PSS loss. Cross entropy loss between GT and the prediction $\hat{P}= \sigma(\text{cls}) \sigma(\text{centerness}) \sigma(\text{pss})$.
- Still train FCOS with one-to-many label assignment. This helps convergence; DeFCN also keeps it as an auxiliary loss.
- Stop gradient operation to reconcile the conflict between PSS classification loss and the original FCOS loss.
- This is in theory equivalent to training the original FCOS until convergence and freezing FCOS, and then training the PSS head only until convergence (thus PSS is a learnable NMS). In practice, two-step training leads to slightly worse performance.
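
To make the PSS loss concrete, here is a minimal sketch in plain Python of the combined score $\hat{P}$ and a cross entropy loss against a one-to-one target. The function names (`pss_score`, `bce`, `pss_loss`), the per-location scalar logits, and the mean reduction are assumptions for illustration, not the paper's implementation. The stop-gradient step is not modeled here, since it needs an autodiff framework.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def pss_score(cls_logit: float, ctr_logit: float, pss_logit: float) -> float:
    """Combined score P_hat = sigma(cls) * sigma(centerness) * sigma(pss)."""
    return sigmoid(cls_logit) * sigmoid(ctr_logit) * sigmoid(pss_logit)


def bce(p: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross entropy for one probability; p is clamped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def pss_loss(logits, targets) -> float:
    """Mean cross entropy between P_hat and one-to-one targets.

    logits:  list of (cls, centerness, pss) logit triples, one per location.
    targets: list of 0/1 labels; exactly one location per instance is 1
             (hypothetical assignment, chosen by hand in the example below).
    """
    losses = [bce(pss_score(*triple), t) for triple, t in zip(logits, targets)]
    return sum(losses) / len(losses)


# Three locations that all fire on the same instance. The one-to-one target
# keeps only the first; the PSS logit has to push the other two down.
logits = [(4.0, 3.0, 5.0), (4.0, 3.0, -5.0), (4.0, 2.0, -5.0)]
targets = [1.0, 0.0, 0.0]
loss = pss_loss(logits, targets)
```

Because the three sigmoids are multiplied, a low PSS output alone is enough to suppress a duplicate even when its classification and centerness scores stay high, which is what lets the one-to-many FCOS branches remain unchanged.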