Activations of Convolutional Neural Networks (CNNs) used as image descriptors have become widespread in image retrieval thanks to their efficiency and compact representation. However, training them requires large amounts of annotated data, and high-quality annotation is essential for reasonable results. In this work, we automatically fine-tune CNNs for image retrieval on a collection of unordered images. The selection of training data is guided by state-of-the-art retrieval and Structure-from-Motion (SfM) methods that reconstruct 3D models. We additionally apply a novel trainable Generalized-Mean (GeM) pooling layer, which generalizes max and average pooling and boosts retrieval performance. We conduct our experiments with VGG and ResNet architectures on the Oxford5k, Paris6k, ROxford5k and RParis6k benchmarks.
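GeM pooling computes, for each feature channel k, the power mean of its activations: f_k = (mean of x^p over all spatial positions)^(1/p). Setting p = 1 recovers average pooling, and as p grows the result approaches max pooling. The plain-Python sketch below illustrates the forward pass only; `gem_pool` is a hypothetical name, and the default p = 3 is our assumption. In a trainable layer, p would instead be learned by back-propagation (for example, as a PyTorch parameter).

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-Mean (GeM) pooling over a C x H x W feature map.

    feature_map: list of channels, each a list of rows of activations.
    Returns one pooled value per channel:
        f_k = (mean over spatial positions of x ** p) ** (1 / p)
    """
    pooled = []
    for channel in feature_map:
        # Clamp to eps so x ** p stays well defined for fractional p
        # (activations after ReLU are non-negative anyway).
        values = [max(x, eps) for row in channel for x in row]
        mean_pow = sum(x ** p for x in values) / len(values)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel with a 2x2 spatial grid
    print(gem_pool(fmap, p=1.0))    # [2.5], i.e. average pooling
    print(gem_pool(fmap, p=100.0))  # about [3.94], approaching max pooling (4.0)
```

Because p is a single scalar per layer (or per channel), making it trainable lets the network choose where it sits between average and max pooling without changing the descriptor dimensionality.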
Keywords: Image Retrieval, Convolutional Neural Networks, Deep Learning
🎃🎃 Our full report is available here
🎃🎃 Our demo video is here
This project is developed by:
Full name | Role |
---|---|
Tan Ngoc Pham | Leader |
An Vo | Member |
Dzung Tri Bui | Member |
In this work, we take the approach of unsupervised CNN fine-tuning for image retrieval. First, we exploit SfM information to select both hard non-matching (negative) and hard matching (positive) examples for CNN training. Second, we let our architectures learn the whitening from the same training data, avoiding the limitations of traditional whitening on short representations. Lastly, we use a trainable pooling layer that generalizes popular CNN pooling schemes, which improves performance while preserving the descriptor dimensionality.
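The training approach our work builds on uses a contrastive loss over matching and non-matching image pairs. The plain-Python sketch below illustrates hard-negative mining and that loss under our own assumptions: each training image has a descriptor and the id of the 3D model (SfM reconstruction) it belongs to, negatives are drawn from other 3D models with at most one per model, and the margin of 0.7 is only an illustrative value. The function names are hypothetical, positive selection is omitted, and this is not the cirtorch implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d1, d2, is_match, margin=0.7):
    """Contrastive loss for one descriptor pair.

    Matching pairs are pulled together: 0.5 * dist^2.
    Non-matching pairs are pushed apart until their distance exceeds
    the margin: 0.5 * max(0, margin - dist)^2.
    """
    dist = l2_distance(d1, d2)
    if is_match:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


def mine_hard_negatives(query_desc, query_cluster, pool, num_negatives=5):
    """Pick non-matching images that are closest to the query descriptor.

    pool: list of (image_id, descriptor, cluster_id), where cluster_id is the
    3D model the image was reconstructed in. Images from the query's own
    model are skipped, and at most one negative is taken per model so the
    selected negatives stay diverse.
    """
    candidates = sorted(
        (img for img in pool if img[2] != query_cluster),
        key=lambda img: l2_distance(query_desc, img[1]),
    )
    negatives, used_clusters = [], set()
    for image_id, _, cluster_id in candidates:
        if cluster_id in used_clusters:
            continue
        negatives.append(image_id)
        used_clusters.add(cluster_id)
        if len(negatives) == num_negatives:
            break
    return negatives


if __name__ == "__main__":
    pool = [
        ("a", [1.0, 0.0], 1),  # same 3D model as the query: never a negative
        ("b", [0.9, 0.1], 2),
        ("c", [0.8, 0.2], 2),  # same model as "b", which is closer: skipped
        ("d", [0.0, 1.0], 3),
    ]
    print(mine_hard_negatives([1.0, 0.0], 1, pool, num_negatives=2))  # ['b', 'd']
```

Hard negatives are useful precisely because they look similar to the query yet depict a different place, so they still produce a non-zero loss once easy negatives are already beyond the margin.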
- src: All of our source code
- public
- css
- img: assets of our work
- script/cnnimageretrieval-pytorch: the Python core that handles the models and the retrieval system behind our demo
- resources
- scss
- views: Frontend code
- routes
- index.js: JavaScript core that handles the logic behind the frontend
- index.js: main JavaScript file that routes the demo
- package.json
- public
- notebook: logged results from running our work
- .gitattributes
- .gitignore
- LICENSE
- Procfile
- deploy.sh
- package.json
- requirements.txt
- yarn.lock
- report.pdf: Our final report on this work
On average, the demo takes 18 seconds in total to crop the uploaded image and process the query.
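At its core, processing a query means comparing the query's global descriptor with precomputed database descriptors and ranking the database images by similarity. The sketch below assumes L2-normalized descriptors held in memory, so cosine similarity reduces to a dot product; `l2_normalize` and `rank_database` are hypothetical names, and this is not necessarily how our demo implements the step.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_database(query_vec, database):
    """Rank database images by cosine similarity to the query.

    database: dict mapping image_id -> global descriptor.
    Returns (image_id, score) pairs, most similar first.
    """
    q = l2_normalize(query_vec)
    scores = []
    for image_id, vec in database.items():
        v = l2_normalize(vec)
        # With unit-length vectors, the dot product equals cosine similarity.
        scores.append((image_id, sum(a * b for a, b in zip(q, v))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    db = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
    print([image_id for image_id, _ in rank_database([1.0, 0.2], db)])  # ['x', 'z', 'y']
```

In practice the database descriptors would be normalized once, ahead of time, rather than on every query.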
- Install yarn
- Install Python dependencies:
pip install -r requirements.txt
- Install JavaScript dependencies and run the project:
yarn
yarn start
- Reproduce our final results:
cd src/public/script/cnnimageretrieval-pytorch
python3 -m cirtorch.examples.test \
--gpu-id '0' \
--network-path 'retrievalSfM120k-resnet101-gem' \
--datasets 'oxford5k' \
--whitening 'retrieval-SfM-120k' \
--multiscale '[1, 1/2**(1/2), 1/2]'
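The `--multiscale '[1, 1/2**(1/2), 1/2]'` option evaluates each image at three scales: full size, roughly 0.707x, and half size. The per-scale descriptors must then be merged into a single vector. The sketch below shows one way to do this: a generalized mean over scales followed by L2 normalization. We assume that cirtorch's test script does something similar, with the GeM exponent as p, but check its source before relying on that. Descriptor extraction is not shown, and `combine_multiscale` is a hypothetical name.

```python
import math


def combine_multiscale(descriptors, p=1.0):
    """Merge per-scale descriptors with a generalized mean, then L2-normalize.

    descriptors: one descriptor (list of non-negative floats) per input scale.
    p = 1 is a plain average; larger p favors the strongest response per dimension.
    """
    dim = len(descriptors[0])
    combined = []
    for k in range(dim):
        mean_pow = sum(d[k] ** p for d in descriptors) / len(descriptors)
        combined.append(mean_pow ** (1.0 / p))
    norm = math.sqrt(sum(x * x for x in combined))
    # Return unit-length output so scores stay comparable across images.
    return [x / norm for x in combined] if norm > 0 else combined


if __name__ == "__main__":
    # Hypothetical descriptors extracted at scales 1, 1/sqrt(2) and 1/2.
    per_scale = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.1], [0.4, 0.7, 0.2]]
    print(combine_multiscale(per_scale, p=3.0))
```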
We fine-tuned pre-trained ResNet101-GeM and VGG16-GeM networks. We conducted our experiments with the PyTorch framework on an NVIDIA RTX 3060 GPU and an 11th Gen Intel® Core™ i7-11700K @ 3.60GHz × 16 CPU with 16 GB of RAM.
Our work is inspired by CNN Image Retrieval in PyTorch: Training and evaluating CNNs for Image Retrieval in PyTorch.