November 2020
tl;dr: Improved multi-view fusion.
Three key improvements on top of MVF. The ablation studies in this paper are super clean and persuasive.
- Multiview architecture
- Anchor-free, pillar-based prediction: like CenterPoint and PIXOR.
  - Both PointPillars and MVF use anchor-based prediction.
  - Anchor-free prediction avoids a complicated anchor matching strategy.
  - Ablation studies show that anchor-based < point-based << pillar-based.
- Cylindrical view: height z, azimuth angle, radial distance. The radial distance is treated as channels.
  - Cylindrical view is better than spherical view because the sizes of distant vehicles are not distorted: distant cars appear smaller in spherical view but keep the same size in cylindrical view. LaserNet uses a range view (RV), which is very similar to spherical view. The original MVF also uses a spherical view.
- Bilinear upsampling when transferring pillar features to points.
  - This avoids spatial inconsistency and the dependence on how points are quantized into different bins.
  - Bilinear interpolation is better than nearest neighbor. This observation is consistent with the comparison between RoIAlign and RoIPool.
- Positive anchors in lidar BEV are very sparse (< 0.1%).
- Questions and notes on how to improve/revise the current work
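
To make the cylindrical vs. spherical point concrete, here is a minimal NumPy sketch (not the paper's code). The function names, the 1.5 m object height, and the sensor sitting at the origin are all assumptions for illustration. It only checks the vertical axis: the cylindrical height coordinate of an object stays fixed with distance, while its spherical inclination extent shrinks.

```python
import numpy as np

def to_cylindrical(xyz):
    """Map (x, y, z) points to (azimuth, height z, radial distance rho)."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.hypot(x, y)        # distance from the vertical axis
    phi = np.arctan2(y, x)      # azimuth angle
    return np.stack([phi, z, rho], axis=1)

def to_spherical(xyz):
    """Map (x, y, z) points to (azimuth, inclination, range)."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.hypot(x, y)
    phi = np.arctan2(y, x)
    theta = np.arctan2(z, rho)  # angle above the horizontal plane
    rng = np.sqrt(x**2 + y**2 + z**2)
    return np.stack([phi, theta, rng], axis=1)

def vertical_extent(view_fn, distance, height=1.5):
    """Extent along the view's vertical axis of an object `height` m tall.

    Assumption: the object spans z in [0, height] straight ahead of a sensor
    located at the origin.
    """
    bottom_top = np.array([[distance, 0.0, 0.0], [distance, 0.0, height]])
    coords = view_fn(bottom_top)
    return coords[1, 1] - coords[0, 1]

for d in (10.0, 50.0):
    print(d, vertical_extent(to_cylindrical, d), vertical_extent(to_spherical, d))
```

In this toy setup the cylindrical extent is 1.5 at both distances, whereas the spherical extent at 50 m is roughly a fifth of what it is at 10 m.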
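
The pillar-to-point step can be sketched the same way. The snippet below is an illustrative NumPy version, not the paper's implementation: `gather_nearest` and `gather_bilinear` are hypothetical helper names, and it assumes a dense `(H, W, C)` pillar feature grid whose cell centers sit at integer `(row, col)` coordinates. It shows why nearest-neighbor lookup depends on which bin a point falls into, while bilinear interpolation varies smoothly across the bin boundary.

```python
import numpy as np

def gather_nearest(grid, uv):
    """Copy to each point the feature of the nearest pillar.

    grid: (H, W, C) pillar features; uv: (N, 2) continuous (row, col) coords.
    """
    h, w, _ = grid.shape
    idx = np.rint(uv).astype(int)
    rows = np.clip(idx[:, 0], 0, h - 1)
    cols = np.clip(idx[:, 1], 0, w - 1)
    return grid[rows, cols]

def gather_bilinear(grid, uv):
    """Blend the four surrounding pillar features with bilinear weights."""
    h, w, _ = grid.shape
    r = np.clip(uv[:, 0], 0, h - 1)
    c = np.clip(uv[:, 1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr = (r - r0)[:, None]  # fractional offset along rows
    wc = (c - c0)[:, None]  # fractional offset along columns
    top = (1 - wc) * grid[r0, c0] + wc * grid[r0, c1]
    bottom = (1 - wc) * grid[r1, c0] + wc * grid[r1, c1]
    return (1 - wr) * top + wr * bottom

# Toy grid whose single feature channel equals the column index.
grid = np.tile(np.arange(4, dtype=float)[None, :, None], (3, 1, 1))
# Two points that are almost at the same location but on opposite sides
# of a bin boundary.
uv = np.array([[1.0, 0.49], [1.0, 0.51]])
print(gather_nearest(grid, uv)[:, 0])
print(gather_bilinear(grid, uv)[:, 0])
```

With nearest-neighbor lookup the two nearly identical points receive features 0 and 1, whereas bilinear interpolation gives them about 0.49 and 0.51. This is the same quantization argument usually made for RoIAlign over RoIPool.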