You only look once: unified, real-time object detection

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (2016)

Key points

Only looks once at an image
Frames object detection as a single regression problem: one pass through a CNN maps the image directly to bounding boxes and object class probabilities
The image is divided into a 7x7 grid; each cell predicts bounding boxes and a distribution over class labels for the object whose center falls into it
Much faster than multi-stage detection pipelines (single stage) and can be trained end-to-end
YOLO makes more localization errors but fewer background errors than Fast R-CNN, so the two can be combined at a minimal increase in computation time
- A significant part of YOLO's error comes from badly placed or badly sized bounding boxes
However, YOLO falls short in accuracy of state-of-the-art detectors that run slower than real time
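
Sketch of the grid encoding

The grid idea in the key points can be made concrete with a small sketch. It is not the authors' code. The 7x7 grid comes from the notes above; B = 2 boxes per cell, C = 20 classes and the 448x448 input size are assumptions taken from the paper's PASCAL VOC setup, and the function names are hypothetical.

```python
# Minimal sketch of YOLO-style grid encoding (illustrative, not the authors' code).
S = 7   # grid size, as in the notes above
B = 2   # boxes predicted per cell (assumption: paper's PASCAL VOC setup)
C = 20  # number of classes (assumption: PASCAL VOC)


def output_shape(s=S, b=B, c=C):
    """Each cell predicts b boxes (x, y, w, h, confidence) plus c class probabilities."""
    return (s, s, b * 5 + c)


def responsible_cell(cx, cy, img_w, img_h, s=S):
    """Return (row, col) of the grid cell that contains the box center (cx, cy), in pixels."""
    # Clamp so that a center lying exactly on the right/bottom edge stays in the last cell.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def center_offsets(cx, cy, img_w, img_h, s=S):
    """Return the center position relative to its responsible cell, each value in [0, 1]."""
    row, col = responsible_cell(cx, cy, img_w, img_h, s)
    x_off = cx / img_w * s - col
    y_off = cy / img_h * s - row
    return x_off, y_off


if __name__ == "__main__":
    print(output_shape())                         # (7, 7, 30)
    print(responsible_cell(224, 224, 448, 448))   # (3, 3)
    print(center_offsets(224, 224, 448, 448))     # (0.5, 0.5)
```

Because only the cell containing the center is responsible for an object, each object contributes to exactly one cell of the output tensor, which is what lets the whole detector be trained as one regression.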