Skip to content

Latest commit

 

History

History
33 lines (26 loc) · 1.87 KB

pillar_od.md

File metadata and controls

33 lines (26 loc) · 1.87 KB

November 2020

tl;dr: Improved multi-view fusion.

Overall impression

Three key improvements based on MVF. The ablation studies in this paper is super clean and persuasive.

Key ideas

  • Multiview architecture
    • Voxelize points in BEV or spherical view or cylindrical view to pillars.
    • Extract pillar features.
    • Project pillar features to points with nearest neighbor or bilinear interpolation and concat to point features.
    • Transform point features to BEV
    • Detection backbone + head
  • Anchor-based Pillar-based prediction: like CenterPoint and Pixor.
    • Both PointPillars and MVF uses anchor-based prediction.
    • Anchor free avoids complicated anchor matching strategy.
    • Ablation studies show that anchor-based < point-based << pillar-based.
  • Cylindrical view: height z, azimuth angle, radial distance. The radial distance is treated as channels.
    • Cylindrical view is better than spherical view as the vehicle size for distant cars are not distorted. Distant cars appears smaller in spherical view but the same in cylindrical view. --> LaserNet uses a range view (RV) which is very similar to spherical view. The original MVF is also a spherical view.
  • Bilinear upsampling when transferring pillar features to point.
    • This avoids the spatial inconsistency and dependency of quantization into diff bins.
    • Bilinear interpolation is better than nearest neighbor. This observation is consistent with the comparison between RoIAlign with RoIPooling.

Technical details

  • Positive anchors in lidar BEV is very sparse (< 0.1%).

Notes

  • Questions and notes on how to improve/revise the current work