Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (2016)
- Only looks once at an image
- Object detection as a regression problem for bounding boxes and object class probabilities with 1 pass through a CNN
- Image into 7x7 grid, each cell predicts a distribution over class labels and bounding boxes for the object whose center falls into it
- Much faster (1 stage) + can be trained end-to-end
- YOLO makes more localization errors and less background errors than Fast R-CNN: combine the 2 at minimal increase in computation time
- A significant part of this error comes from badly placed/sized bounding boxes
- However, YOLO falls short of state-of-the-art less-than-real-time algorithms