Chen-Hsuan Lin, Simon Lucey (2017): "Inverse Compositional Spatial Transformer Networks" (CVPR)
- STN (Spatial Transformer Network): a learnable module that can learn invariance to a wide range of transformations/warps and can be inserted into existing models
- Resolves misalignment instead of merely tolerating it (as pooling in standard CNNs does)
- Combining this with the inverse-compositional (IC) Lucas-Kanade algorithm gives better performance with less model capacity, both for pure image alignment and for joint alignment/classification (IC-LK objective sketched below)
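For context, a sketch of the classical IC-LK formulation (standard Baker-Matthews notation; my reconstruction, not from these notes): given a template $T$, an image $I$, and a warp $W(x; p)$, each step solves

$$\Delta p^{*} = \arg\min_{\Delta p} \sum_{x} \big[\, T(W(x; \Delta p)) - I(W(x; p)) \,\big]^2$$

and applies the update inverse-compositionally:

$$W(x; p) \leftarrow W(x; p) \circ W(x; \Delta p)^{-1}$$

Because the linearization is taken around the template at $\Delta p = 0$, the Jacobian $\nabla T \,\partial W / \partial p$ is constant and can be precomputed. IC-STN keeps this compositional update structure but replaces the fixed least-squares solve with a learned predictor.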
- IC-STNs:
- Propagate warp parameters instead of intensities
- Same geometric predictor for all modules
- Original STN suffers from boundary effects: each resampling crops the image, so pixel information warped outside the frame is discarded and cannot be recovered by later modules
- Propagating the warp parameters and resampling only the original image (as IC-STN does) prevents this (minimal sketch after this list)
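A minimal PyTorch sketch of warp-parameter propagation, assuming an affine parameterization and a residual predictor; this is my own reconstruction, not the authors' reference implementation (names like `compose_affine` and `ICSTN` are made up here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def compose_affine(p, dp):
    """Compose two affine warps given as (B, 2, 3) matrices: result = p ∘ dp."""
    B = p.shape[0]
    bottom = torch.tensor([[0.0, 0.0, 1.0]], device=p.device).expand(B, 1, 3)
    P = torch.cat([p, bottom], dim=1)     # lift to homogeneous (B, 3, 3)
    DP = torch.cat([dp, bottom], dim=1)
    return torch.bmm(P, DP)[:, :2, :]     # drop the constant last row

class ICSTN(nn.Module):
    """One shared geometric predictor, applied recurrently in parameter space."""
    def __init__(self, predictor, num_steps=4):
        super().__init__()
        self.predictor = predictor        # same predictor weights for all steps
        self.num_steps = num_steps

    def forward(self, image, num_steps=None):
        B = image.shape[0]
        eye = torch.eye(2, 3, device=image.device)
        p = eye.expand(B, 2, 3).contiguous()              # start at identity
        for _ in range(num_steps or self.num_steps):
            # view the input under the CURRENT warp (used only for prediction)
            grid = F.affine_grid(p, list(image.shape), align_corners=False)
            warped = F.grid_sample(image, grid, align_corners=False)
            # residual update: zero predictor output means identity warp
            dp = eye + self.predictor(warped).view(B, 2, 3)
            p = compose_affine(p, dp)                     # propagate parameters
        # the ORIGINAL image is resampled with the composed warp only at the
        # end, so no intensity information is cropped away along the chain
        grid = F.affine_grid(p, list(image.shape), align_corners=False)
        return F.grid_sample(image, grid, align_corners=False), p
```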
- Prior work learns alignment from data with the supervised descent method (SDM)
- This works, but alignment stays a separate step: we want a single model that can be optimized end-to-end
- Also: a plain STN is less efficient, predicting one large warp instead of multiple small corrective steps
- Propagating warp parameters is insensitive to vanishing gradients
- Even when the recurrent transformation is applied more times at test time than it was trained with, alignment error continues to decrease --> the network captures the correlation between appearance and geometry (usage sketch below)
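Hypothetical usage of the `ICSTN` sketch above, showing the test-time extrapolation these notes describe (`predictor` and `batch` are placeholders):

```python
# predictor: any small CNN mapping an image batch to 6 affine parameters
model = ICSTN(predictor, num_steps=4)       # trained with 4 recurrent steps
aligned4, p4 = model(batch)                 # evaluated at the trained depth
aligned8, p8 = model(batch, num_steps=8)    # more steps than trained with:
                                            # alignment error keeps dropping
```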
- IC-STNs are not a replacement for CNNs (plain CNNs do better when spatial variation in the data is small)
- How can it learn at all?
- Likely the predictor parameters shared across steps; maybe the cropping helps as well
- So:
- STN: outputs a warped image (intensities are resampled at every module)
- IC-STN: accumulates the geometric warp in parameter space and resamples the original image once at the end (schematic contrast below)
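For contrast, a schematic of a plain STN cascade under the same assumed interfaces (again my sketch, not code from the paper): every stage resamples the previous stage's output, so content cropped at an early stage is lost to all later ones.

```python
import torch
import torch.nn.functional as F

def stn_cascade(image, predictors):
    """Plain STN chain: intensities are warped, then re-warped, at every stage."""
    x = image
    eye = torch.eye(2, 3, device=image.device)
    for f in predictors:                    # typically different weights per stage
        theta = eye + f(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, list(x.shape), align_corners=False)
        x = F.grid_sample(x, grid, align_corners=False)  # resample of a resample
    return x   # boundary and interpolation losses compound across stages
```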