James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman (2007)
- Returns ranked list of images containing object specified in query image (region)
- Bag-of-visual-words: more words & structure (spatial) than normal BoW, but much noisier
- 2 improvements:
- In terms of visual vocabulary: flat k-means with approximate nearest-neighbor gives best performance, making use of randomized k-d trees (splits randomly across dimensions with highest variance, which helps to mitigate quantization effects)
- Incorporating spatial information in ranking (deterministic)
- Dealing with image variations:
- Estimate transformation between query and target image, taking advantage of the fact that images of landmarks are usually oriented correctly (so no need for in-plane rotations)
- Re-rank based on how well the verified features discriminate