diff --git a/README.md b/README.md
index cfff5e2..7d8f0d3 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,12 @@
# SGRAF
-PyTorch implementation for AAAI2021 paper of [**“Similarity Reasoning and Filtration for Image-Text Matching”**](https://drive.google.com/file/d/1tAE_qkAxiw1CajjHix9EXoI7xu2t66iQ/view?usp=sharing).
-It is built on top of the [SCAN](https://github.com/kuanghuei/SCAN) and [Cross-modal_Retrieval_Tutorial](https://github.com/Paranioar/Cross-modal_Retrieval_Tutorial).
+*PyTorch implementation for the AAAI 2021 paper [**“Similarity Reasoning and Filtration for Image-Text Matching”**](https://drive.google.com/file/d/1tAE_qkAxiw1CajjHix9EXoI7xu2t66iQ/view?usp=sharing).*
+
+*It is built on top of [SCAN](https://github.com/kuanghuei/SCAN) and [Awesome_Matching](https://github.com/Paranioar/Awesome_Matching_Pretraining_Transfering).*
+
+*We have released two versions of SGRAF: **branch `main` for Python 2.7** and **branch `python3.6` for Python 3.6**.*
+
+*If you have any problems, please contact me at r1228240468@gmail.com (r1228240468@mail.dlut.edu.cn is deprecated).*
+
## Introduction
@@ -8,48 +14,52 @@ It is built on top of the [SCAN](https://github.com/kuanghuei/SCAN) and [Cross-m
-**The updated results (better than the original paper)**
-
-| Dataset   | Module | Sentence retrieval R@1 | R@5  | R@10 | Image retrieval R@1 | R@5  | R@10 |
-| :-------: | :----: | :--------------------: | :--: | :--: | :-----------------: | :--: | :--: |
-| Flickr30K | SAF    | 75.6 | 92.7 | 96.9 | 56.5 | 82.0 | 88.4 |
-|           | SGR    | 76.6 | 93.7 | 96.6 | 56.1 | 80.9 | 87.0 |
-|           | SGRAF  | 78.4 | 94.6 | 97.5 | 58.2 | 83.0 | 89.1 |
-| MSCOCO1k  | SAF    | 78.0 | 95.9 | 98.5 | 62.2 | 89.5 | 95.4 |
-|           | SGR    | 77.3 | 96.0 | 98.6 | 62.1 | 89.6 | 95.3 |
-|           | SGRAF  | 79.2 | 96.5 | 98.6 | 63.5 | 90.2 | 95.8 |
-| MSCOCO5k  | SAF    | 55.5 | 83.8 | 91.8 | 40.1 | 69.7 | 80.4 |
-|           | SGR    | 57.3 | 83.2 | 90.6 | 40.5 | 69.6 | 80.3 |
-|           | SGRAF  | 58.8 | 84.8 | 92.1 | 41.6 | 70.9 | 81.5 |
-
+## Requirements
+We recommend the following dependencies for ***branch `python3.6`***:
+* Python 3.6
+* [PyTorch (>=0.4.1)](http://pytorch.org/)
+* [NumPy (>=1.12.1)](http://www.numpy.org/)
+* [TensorBoard](https://github.com/TeamHG-Memex/tensorboard_logger)
+[Note]: The code is compatible with ***Python 3.6 + PyTorch 1.7***.
+
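+Since `data.py` tokenizes captions with NLTK's `word_tokenize`, the Punkt tokenizer models must also be installed. A minimal, non-interactive way to fetch them (equivalent to the interactive `nltk.download()` snippet in the old README):
+```python
+import nltk
+
+# fetch the Punkt tokenizer models required by nltk.tokenize.word_tokenize
+nltk.download('punkt')
+```
+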
+## Acknowledgements
+Thanks to the exploration and discussion with [KevinLight831](https://github.com/KevinLight831), we made the following adjustments:
+**1. Adjust `evaluation.py`**:
+`for i, (k, v) in enumerate(self.meters.iteritems()):`
+**------>** `for i, (k, v) in enumerate(self.meters.items()):`
+`for k, v in self.meters.iteritems():`
+**------>** `for k, v in self.meters.items():`
+
+**2. Adjust `model.py`**:
+`cap_emb = (cap_emb[:, :, :cap_emb.size(2)/2] + cap_emb[:, :, cap_emb.size(2)/2:])/2`
+**------>** `cap_emb = (cap_emb[:, :, :cap_emb.size(2)//2] + cap_emb[:, :, cap_emb.size(2)//2:])/2`
+
+**3. Adjust `data.py`**:
+`img_id = index/self.im_div`
+**------>** `img_id = index//self.im_div`
-
+`for line in open(loc+'%s_caps.txt' % data_split, 'rb'):`
+`tokens = nltk.tokenize.word_tokenize(str(caption).lower().decode('utf-8'))`
-## Requirements
-We recommended the following dependencies.
+**------>** `for line in open(loc+'%s_caps.txt' % data_split, 'rb'):`
+**------>** `tokens = nltk.tokenize.word_tokenize(caption.lower().decode('utf-8'))`
-* Python **(2.7 not 3.\*)**
-* [PyTorch](http://pytorch.org/) **(0.4.1 not 1.\*)**
-* [NumPy](http://www.numpy.org/) **(>1.12.1)**
-* [TensorBoard](https://github.com/TeamHG-Memex/tensorboard_logger)
-* Punkt Sentence Tokenizer:
-```python
-import nltk
-nltk.download()
-> d punkt
-```
+or, reading the file as text instead:
+
+**------>** `for line in open(loc+'%s_caps.txt' % data_split, 'r', encoding='utf-8'):`
+**------>** `tokens = nltk.tokenize.word_tokenize(str(caption).lower())`
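+
+All of the `/`-to-`//` changes above stem from the same Python 2/3 difference: `/` between two integers floors in Python 2 but returns a float in Python 3, which then breaks indexing and slicing. A minimal illustration (plain Python, independent of the project code):
+```python
+# Python 3 division semantics
+print(1024 / 2)    # 512.0 -- true division yields a float, invalid as a slice bound
+print(1024 // 2)   # 512   -- floor division keeps an int, safe for indexing
+```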
## Download data and vocab
We follow [SCAN](https://github.com/kuanghuei/SCAN) to obtain image features and vocabularies, which are available at:
```bash
-wget https://scanproject.blob.core.windows.net/scan-data/data.zip
-wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip
+https://www.kaggle.com/datasets/kuanghueilee/scan-features
+```
+Another download link is available below:
+
+```bash
+https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC
```
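+
+After unpacking, the evaluation script in this patch assumes a repo-local layout along these lines (inferred from the `./data` and `./vocab` paths used in `evaluation.py`, not an official structure):
+```bash
+./data/     # precomputed image features and caption files
+./vocab/    # *_vocab.json files loaded by deserialize_vocab
+```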
## Pre-trained models and evaluation
@@ -82,16 +92,18 @@ For Flickr30K:
If SGRAF is useful for your research, please cite the following paper:
- @inproceedings{Diao2021SGRAF,
- title={Similarity Reasoning and Filtration for Image-Text Matching},
- author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
- booktitle={AAAI},
- year={2021}
- }
+ @inproceedings{Diao2021SGRAF,
+ title={Similarity reasoning and filtration for image-text matching},
+ author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
+ booktitle={Proceedings of the AAAI conference on artificial intelligence},
+ volume={35},
+ number={2},
+ pages={1218--1226},
+ year={2021}
+ }
## License
[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).
-If any problems, please contact me at (r1228240468@mail.dlut.edu.cn) or (r1228240468@gmail.com).
diff --git a/data.py b/data.py
index 03c0b9e..2ab256a 100644
--- a/data.py
+++ b/data.py
@@ -20,9 +20,15 @@ def __init__(self, data_path, data_split, vocab):
# load the raw captions
self.captions = []
- with open(loc+'%s_caps.txt' % data_split, 'rb') as f:
- for line in f:
- self.captions.append(line.strip())
+
+ # -------- The main difference between python2.7 and python3.6 --------#
+ # The suggestion from Hongguang Zhu (https://github.com/KevinLight831)
+ # ---------------------------------------------------------------------#
+ # for line in open(loc+'%s_caps.txt' % data_split, 'r', encoding='utf-8'):
+ # self.captions.append(line.strip())
+
+ for line in open(loc+'%s_caps.txt' % data_split, 'rb'):
+ self.captions.append(line.strip())
# load the image features
self.images = np.load(loc+'%s_ims.npy' % data_split)
@@ -40,14 +46,18 @@ def __init__(self, data_path, data_split, vocab):
def __getitem__(self, index):
# handle the image redundancy
- img_id = index/self.im_div
+ img_id = index//self.im_div
image = torch.Tensor(self.images[img_id])
caption = self.captions[index]
vocab = self.vocab
+ # -------- The main difference between python2.7 and python3.6 --------#
+ # The suggestion from Hongguang Zhu(https://github.com/KevinLight831)
+ # ---------------------------------------------------------------------#
+ # tokens = nltk.tokenize.word_tokenize(str(caption).lower())
+
# convert caption (string) to word ids.
- tokens = nltk.tokenize.word_tokenize(
- str(caption).lower().decode('utf-8'))
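+        # captions were read as bytes ('rb'), so lowercase the bytes and then decode to str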
+ tokens = nltk.tokenize.word_tokenize(caption.lower().decode('utf-8'))
caption = []
caption.append(vocab(''))
caption.extend([vocab(token) for token in tokens])
diff --git a/evaluation.py b/evaluation.py
index 5d300ff..4a48a9b 100644
--- a/evaluation.py
+++ b/evaluation.py
@@ -59,7 +59,7 @@ def __str__(self):
"""Concatenate the meters in one log line
"""
s = ''
- for i, (k, v) in enumerate(self.meters.iteritems()):
+ for i, (k, v) in enumerate(self.meters.items()):
if i > 0:
s += ' '
s += k + ' ' + str(v)
@@ -68,7 +68,7 @@ def __str__(self):
def tb_log(self, tb_logger, prefix='', step=None):
"""Log using tensorboard
"""
- for k, v in self.meters.iteritems():
+ for k, v in self.meters.items():
tb_logger.log_value(prefix + k, v.val, step=step)
@@ -125,7 +125,7 @@ def evalrank(model_path, data_path=None, split='dev', fold5=False):
opt.data_path = data_path
# load vocabulary used by the model
- vocab = deserialize_vocab(os.path.join(opt.vocab_path, '%s_vocab.json' % opt.data_name))
+ vocab = deserialize_vocab('./vocab/%s_vocab.json' % opt.data_name)
opt.vocab_size = len(vocab)
# construct model
@@ -295,5 +295,5 @@ def t2i(images, captions, caplens, sims, npts=None, return_ranks=False):
if __name__ == '__main__':
- evalrank("/apdcephfs/share_1313228/home/haiwendiao/SGRAF-master/runs/SAF_module/checkpoint/model_best.pth.tar",
- data_path="/apdcephfs/share_1313228/home/haiwendiao", split="test", fold5=False)
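+    # example local paths; point these at your own checkpoint and data directory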
+ evalrank("./runs/Flickr30K_SGRAF/f30k_SAF/model_best.pth.tar",
+ data_path='./data', split="test", fold5=False)
diff --git a/model.py b/model.py
index 1b1171b..1985b1e 100644
--- a/model.py
+++ b/model.py
@@ -119,7 +119,7 @@ def forward(self, captions, lengths):
cap_emb, _ = pad_packed_sequence(out, batch_first=True)
if self.use_bi_gru:
- cap_emb = (cap_emb[:, :, :cap_emb.size(2)/2] + cap_emb[:, :, cap_emb.size(2)/2:])/2
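+            # use floor division so the split index stays an int under Python 3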
+ cap_emb = (cap_emb[:, :, :cap_emb.size(2)//2] + cap_emb[:, :, cap_emb.size(2)//2:])/2
# normalization in the joint embedding space
if not self.no_txtnorm:
diff --git a/requirements.txt b/requirements.txt
index 870eadc..dcfe491 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,56 +1,51 @@
-backports.functools-lru-cache==1.6.1
-backports.weakref==1.0.post1
-bleach==1.5.0
-boto3==1.17.8
-botocore==1.20.8
-certifi==2019.11.28
-cffi==1.14.0
+absl-py==0.12.0
+astor==0.8.1
+boto3==1.17.53
+botocore==1.20.53
+cached-property==1.5.2
+certifi==2020.12.5
+cffi==1.14.5
chardet==4.0.0
click==7.1.2
-cloudpickle==1.3.0
-cycler==0.10.0
-Cython==0.29.13
-decorator==4.4.2
-enum34==1.1.10
-funcsigs==1.0.2
-futures==3.3.0
-html5lib==0.9999999
+docopt==0.6.2
+gast==0.4.0
+google-pasta==0.2.0
+grpcio==1.37.0
+h5py==3.1.0
idna==2.10
+importlib-metadata==3.10.1
jmespath==0.10.0
-joblib==0.14.1
-kiwisolver==1.1.0
-Markdown==3.1.1
-matplotlib==2.2.4
-mock==3.0.5
-networkx==2.2
-nltk==3.4.5
-numpy==1.16.5
+joblib==1.0.1
+Keras-Applications==1.0.8
+Keras-Preprocessing==1.1.2
+Markdown==3.3.4
+mkl-fft==1.3.0
+mkl-random==1.1.1
+mkl-service==2.3.0
+nltk==3.6.1
+numpy==1.16.4
olefile==0.46
-opencv-python==4.2.0.32
-pandas==0.24.2
-Pillow==6.2.1
-protobuf==3.12.2
-ptflops==0.6.4
-pycocotools==2.0
+Pillow==8.2.0
+pipreqs==0.4.10
+protobuf==3.15.8
pycparser==2.20
-pyparsing==2.4.7
python-dateutil==2.8.1
-pytz==2020.1
-PyWavelets==1.0.3
-regex==2020.11.13
+regex==2021.4.4
requests==2.25.1
-s3transfer==0.3.4
-sacremoses==0.0.43
-scikit-image==0.14.5
-scipy==1.2.3
-singledispatch==3.4.0.3
+s3transfer==0.3.7
+scipy==1.5.4
six==1.15.0
-subprocess32==3.5.4
+tensorboard==1.14.0
tensorboard-logger==0.1.0
-tensorflow==1.4.0
-tensorflow-tensorboard==0.4.0
-torch==0.4.1.post2
-torchvision==0.2.0
-tqdm==4.56.2
-urllib3==1.26.3
+tensorflow-estimator==1.14.0
+tensorflow-gpu==1.14.0
+termcolor==1.1.0
+torch==1.1.0
+torchvision==0.3.0
+tqdm==4.60.0
+typing-extensions==3.7.4.3
+urllib3==1.26.4
Werkzeug==1.0.1
+wrapt==1.12.1
+yarg==0.1.9
+zipp==3.4.1
diff --git a/visualize.py b/visualize.py
new file mode 100644
index 0000000..2bc6577
--- /dev/null
+++ b/visualize.py
@@ -0,0 +1,16 @@
+"""
+# Please refer to https://github.com/Paranioar/RCAR for related visualization code.
+# It now includes visualize_attention_mechanism, visualize_similarity_distribution, visualize_rank_result, etc.
+
+# I will continue to add more visualization code when I have time.
+# If you find these codes useful, please cite our papers and star our projects. (We do need it! HaHaHaHa.)
+# Thanks for the interest in our projects.
+"""
diff --git a/vocab.py b/vocab.py
index c0e5329..c727bb3 100644
--- a/vocab.py
+++ b/vocab.py
@@ -1,11 +1,3 @@
-# -----------------------------------------------------------
-# Stacked Cross Attention Network implementation based on
-# https://arxiv.org/abs/1803.08024.
-# "Stacked Cross Attention for Image-Text Matching"
-# Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He
-#
-# Writen by Kuang-Huei Lee, 2018
-# ---------------------------------------------------------------
"""Vocabulary wrapper"""
import nltk