PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas (2017)

Key points

Novel deep net architecture that directly consumes point clouds, without voxelization or rendering
- Unified architecture that learns both global and local point features
- Output: class scores for the whole input (classification) or per-point label scores (segmentation)
- Using point clouds directly avoids quantization artifacts, high-cost computations and the need for pre-processing
Properties of point clouds:
1. Unordered: a network that consumes N 3D points must be invariant to the N! permutations of the input order
2. Interaction among points: neighboring points are useful --> model needs to be able to capture local structure from nearby points, and the combinatorial interactions among local structures
3. Invariance under transformations: rotating and translating points all together shouldn't change category/segmentation
Three design choices address these properties:
1. Use a symmetric function: max pooling aggregates information from all points, so the result is invariant to input order
2. Combining local and global features: concatenate global feature with each of the point features --> network is able to predict per-point quantities that rely on local geometry and global semantics
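3. Joint alignment network: a small sub-network (T-net) predicts an affine transformation that is applied to the input points; a second one aligns the point features, with a regularization term that keeps the feature transformation matrix close to orthogonal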
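The three ideas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single random linear layer, the toy sizes (128 points, 16 channels) and all function names are assumptions made for illustration. The penalty follows the form ||I - A A^T||_F^2 for a transformation matrix A, which is zero when A is orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, weights, bias):
    """Apply the same linear layer + ReLU to every point independently."""
    return np.maximum(points @ weights + bias, 0.0)

def global_feature(point_features):
    """Symmetric aggregation: channel-wise max over the point axis."""
    return point_features.max(axis=0)

def concat_local_global(point_features, global_feat):
    """Append the global feature to each per-point feature."""
    n = point_features.shape[0]
    tiled = np.broadcast_to(global_feat, (n, global_feat.shape[0]))
    return np.concatenate([point_features, tiled], axis=1)

def orthogonality_penalty(A):
    """Squared Frobenius norm of (I - A A^T); zero when A is orthogonal."""
    k = A.shape[0]
    return float(np.sum((np.eye(k) - A @ A.T) ** 2))

# Toy sizes (assumed): 128 points, 3 coordinates, 16 feature channels.
points = rng.normal(size=(128, 3))
W = rng.normal(size=(3, 16))
b = rng.normal(size=16)

feats = shared_mlp(points, W, b)
g = global_feature(feats)

# Shuffling the input order leaves the global feature unchanged.
shuffled = points[rng.permutation(len(points))]
g_shuffled = global_feature(shared_mlp(shuffled, W, b))
order_invariant = np.allclose(g, g_shuffled)

# Per-point input for a segmentation head: local + global channels.
seg_input = concat_local_global(feats, g)

# A rotation matrix is orthogonal, so its penalty is numerically zero;
# scaling it by 2 breaks orthogonality and the penalty becomes positive.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
penalty_rotation = orthogonality_penalty(R)
penalty_scaled = orthogonality_penalty(2.0 * R)
```

Because the per-point layer is shared and the max is taken over the point axis, any reordering of the rows of `points` produces the same `g`. Here `seg_input` has 32 channels per point: 16 local and 16 global.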
The network learns to summarize a point cloud by a sparse set of key points that roughly corresponds to the skeleton of the object; these points determine the global feature
- Highly robust to small input perturbations and to corruption by point insertion (outliers) or deletion
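The link between key points and robustness can be sketched as follows, again with an assumed random per-point layer and hypothetical helper names. With max pooling, only the points that attain the maximum in some channel contribute to the global feature, so there are at most as many such points as channels, and deleting every other point leaves the global feature unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

def per_point_features(points, weights, bias):
    """Shared linear layer + ReLU applied to each point."""
    return np.maximum(points @ weights + bias, 0.0)

def critical_indices(point_features):
    """Indices of points that attain the max in at least one channel."""
    return np.unique(point_features.argmax(axis=0))

# Toy sizes (assumed): 256 points, 8 feature channels.
points = rng.normal(size=(256, 3))
W = rng.normal(size=(3, 8))
b = rng.normal(size=8)

feats = per_point_features(points, W, b)
g_full = feats.max(axis=0)

# Keep only the critical points and recompute the global feature.
crit = critical_indices(feats)
g_crit = per_point_features(points[crit], W, b).max(axis=0)
same_feature = np.allclose(g_full, g_crit)
```

In this toy setup `crit` holds at most 8 indices out of 256 points, one per channel at most, yet `g_crit` matches `g_full`.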