Skip to content

Latest commit

 

History

History
25 lines (16 loc) · 1.77 KB

shift_rcnn.md

File metadata and controls

25 lines (16 loc) · 1.77 KB

October 2019

tl;dr: Extend the work of deep3Dbox by regressing residual center positions.

Overall impression

The paper has a good summary on mono 3DOD in introduction.

The geometric constraints become a closed-formed one. This is similar to deep3Dbox but slightly different (over-constraint vs exact-constraint).

The idea of shift RCNN and FQNet are quite similar. Both builds on deep3Dbox and refines the first guess. But FQNet passively densely sample around the GT and train a regressor to tell the difference to GT, shift RCNN actively learns to regress the difference. The followup work of FQNet is RAR-Net which also actively predicts the offset, but does that iteratively with a DRL agent.

Key ideas

  • RoiAligned feature to regress 3D orientation and 3D dimension.
  • Optimization to solve for 3D bbox location t'.
  • Shift Net work is 2 layer FC network to regress improved final translation of 3D center t''. The input features are t', 2d bbox, dimension, local yaw, global yaw, and camera projection matrix.
  • The volume displacement loss is decomposed into 3 sums of 3 terms, each term is $\Delta x \times h \times w$ and alike. w and h are estimated 3D dimension.

Technical details

  • They used best IoU to pick the best configuration. This is a bit different from the previous method of picking one that mininizes residual from least square fitting, such as FQNet or Deep3DBox. This is also used in MVRA.

Notes

  • Questions and notes on how to improve/revise the current work