Multiple techniques to enhance performance in CNN-based Image Retrieval systems

DTA-UIT/ImageRetrievalSystem

Abstract

Activations of Convolutional Neural Networks (CNNs) used as image descriptors have become the leading approach in image retrieval thanks to their efficiency and compact representation. However, training them requires large amounts of annotated data, and high-quality annotation is essential for achieving reasonable results. In this work, we fine-tune CNNs for image retrieval on a collection of unordered images in a fully automatic manner. The selection of training data is guided by state-of-the-art retrieval and Structure-from-Motion (SfM) methods, which reconstruct 3D models. We additionally apply a novel trainable Generalized-Mean (GeM) pooling layer, which generalizes max and average pooling, to boost retrieval performance. We conduct our experiments with VGG and ResNet architectures on the Oxford5k, Paris6k, ROxford5k and RParis6k benchmarks.

[Figure: cnnimageretrieval_network_medium]

Keywords: Image Retrieval, Convolutional Neural Networks, Deep Learning


Report

🎃🎃 Our full report is shown here
🎃🎃 Our demo video is here

This project is hosted by:

Full name        Role
Tan Ngoc Pham    Leader
An Vo            Member
Dzung Tri Bui    Member

Table of contents

  1. Introduction
  2. Repo structure
  3. Demo
  4. Experimental configuration
  5. Results
  6. References

1. Introduction

In this work, we adopt unsupervised fine-tuning of CNNs for image retrieval. First, we exploit SfM information to select both hard matching and hard non-matching examples for CNN training. Second, we let our architectures learn the whitening from the same training data, which avoids the limitations of traditional whitening on short representations. Lastly, we use a trainable pooling layer that generalizes existing popular pooling schemes for CNNs, which improves performance while preserving the descriptor dimensionality.
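
To make the pooling idea concrete, here is a minimal NumPy sketch of generalized-mean (GeM) pooling over a single feature map. It is an illustration, not the code used in this repo: the function name gem_pool, the eps clamp and the fixed exponent p=3.0 are assumptions chosen for the example. In the trainable layer, p would be a parameter learned by backpropagation rather than a constant.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over the spatial dimensions.

    feature_map: array of shape (C, H, W) holding non-negative activations.
    Returns one value per channel, i.e. a vector of shape (C,).
    """
    # Clamp to eps so that fractional powers are well defined.
    x = np.clip(feature_map, eps, None)
    # (mean over H and W of x^p)^(1/p), computed per channel.
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

# Toy feature map (hypothetical data): 4 channels on a 7x7 grid.
rng = np.random.default_rng(0)
fmap = rng.random((4, 7, 7))

avg_like = gem_pool(fmap, p=1.0)     # p = 1 reduces to average pooling
max_like = gem_pool(fmap, p=200.0)   # a large p approaches max pooling
descriptor = gem_pool(fmap)          # one value per channel, shape (4,)
```

The output has one entry per channel regardless of p, which is why swapping max or average pooling for GeM leaves the descriptor dimensionality unchanged.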

2. Repo structure

  • src: All of our source code
    • public
      • css
      • img: assets of our work
      • script/cnnimageretrieval-pytorch: the Python core that handles the models and the retrieval system behind our demo
    • resources
      • scss
      • views: Frontend code
    • routes
      • index.js: JavaScript core that handles the logic behind the frontend
    • index.js: main JS entry point that routes the demo
    • package.json
  • notebook: logged results from running our work
  • .gitattributes
  • .gitignore
  • LICENSE
  • Procfile
  • deploy.sh
  • package.json
  • requirements.txt
  • yarn.lock
  • report.pdf: Our final report on this work

3. Demo

Handling a query, which includes cropping the uploaded image and running the retrieval, takes 18 seconds on average in total.

Run demo

  • Install yarn
  • Install dependencies:
pip install -r requirements.txt
  • Run project:
yarn 
yarn start
  • Reproduce our final results
cd src/public/script/cnnimageretrieval-pytorch

python3 -m cirtorch.examples.test \
          --gpu-id '0' \
          --network-path 'retrievalSfM120k-resnet101-gem' \
          --datasets 'oxford5k' \
          --whitening 'retrieval-SfM-120k' \
          --multiscale '[1, 1/2**(1/2), 1/2]'
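
The --multiscale value in the command above is written as a Python-style list of image scales. The sketch below evaluates those three scales and shows one plausible way to merge per-scale descriptors into a single vector, assuming they are averaged and then L2-normalized. The helper names and the toy descriptors are hypothetical, and the library may combine scales differently in detail.

```python
import math

# The three scales from the --multiscale argument: 1, about 0.7071, and 0.5.
scales = [1, 1 / 2 ** (1 / 2), 1 / 2]

def l2_normalize(vec, eps=1e-6):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def combine_multiscale(descriptors):
    """Average the per-scale descriptors, then L2-normalize the result.

    descriptors: one equal-length list of floats per scale.
    This averaging rule is an assumption made for illustration.
    """
    dim = len(descriptors[0])
    mean = [sum(d[i] for d in descriptors) / len(descriptors) for i in range(dim)]
    return l2_normalize(mean)

# Hypothetical 3-dimensional descriptors, one per scale.
per_scale = [
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.5],
    [0.7, 0.3, 0.6],
]
combined = combine_multiscale(per_scale)
```

Because the combined vector has unit length, a dot product between two such descriptors behaves like a cosine similarity when ranking database images against a query.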

Screenshots from our demo

[Images: demo4, demo1, demo2, demo3]

4. Experimental configuration

We fine-tuned pre-trained ResNet101-GeM and VGG16-GeM networks. All experiments were run on an NVIDIA RTX 3060 GPU with 16GB RAM and an 11th Gen Intel® Core™ i7-11700K @ 3.60GHz×16 CPU, using the PyTorch framework.

5. Results

Our work is inspired by: CNN Image Retrieval in PyTorch: Training and evaluating CNNs for Image Retrieval in PyTorch