StructVBERT

StructVBERT: Visual-Linguistic Pre-training for Visual Question Answering

VQA Challenge 2020 Runner-up

Introduction

We propose a new single-stream visual-linguistic pre-training scheme by leveraging multi-stage progressive pre-training and multi-task learning.
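
To make "single-stream" concrete, here is an illustrative sketch (module names and sizes are ours, not the actual architecture): text token embeddings and projected visual region features are concatenated into one sequence and encoded by a single shared transformer.

import torch
import torch.nn as nn

# A minimal sketch of a single-stream visual-linguistic encoder: word
# embeddings and projected region features share one transformer. All
# names and dimensions are illustrative, not the repo's actual modules.
class SingleStreamEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, visual_dim=2048, layers=12):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)   # region feature -> hidden
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, token_ids, region_feats):
        text = self.word_emb(token_ids)            # (B, T, H)
        visual = self.visual_proj(region_feats)    # (B, R, H)
        stream = torch.cat([text, visual], dim=1)  # one joint sequence
        # nn.TransformerEncoder expects (sequence, batch, hidden)
        return self.encoder(stream.transpose(0, 1)).transpose(0, 1)

enc = SingleStreamEncoder()
out = enc(torch.randint(0, 30522, (2, 20)), torch.randn(2, 36, 2048))
print(out.shape)  # torch.Size([2, 56, 768])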

Pre-trained models

Model                  Description                                     #params   Download
structvbert.en.base    StructVBERT using the BERT-base architecture    110M      structvbert.en.base
structvbert.en.large   StructVBERT using the BERT-large architecture   355M      Coming soon

Results

The results on the VQA and NLVR2 tasks can be reproduced with the hyperparameters listed in the "Example usage" section below.

structvbert.en.base

Split              VQA      NLVR2
Local Validation   71.80%   77.66%
Test-Dev           74.11%   78.13% (Test-P)

Example usage

Requirements and Installation

  • PyTorch version >= 1.3.0 (a quick version check is sketched after this list)

  • Install other libraries via

pip install -r requirements.txt
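
As a quick sanity check (illustrative only, not part of the repo), you can verify the installed PyTorch version before running anything:

import torch

# Illustrative environment check: the code expects PyTorch >= 1.3.0.
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 3), f"PyTorch >= 1.3.0 required, found {torch.__version__}"
print("PyTorch", torch.__version__, "OK")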
Arrange the downloaded data and the pre-trained model as follows:

REPO ROOT
 |
 |-- data
 |    |-- vqa
 |    |    |-- train.json
 |    |    |-- minival.json
 |    |    |-- nominival.json
 |    |    |-- test.json
 |    |    |-- trainval_ans2label.json
 |    |    |-- trainval_label2ans.json
 |    |    |-- all_ans.json
 |    |    |-- coco_minival_img_ids.json
 |    |
 |    |-- mscoco_imgfeat
 |    |    |-- train_npz
 |    |    |    |-- *.npz
 |    |    |-- valid_npz
 |    |    |    |-- *.npz
 |    |    |-- test_npz
 |    |    |    |-- *.npz
 |    |
 |    |-- nlvr2_imgfeat
 |    |    |-- nlvr2_train_npz
 |    |    |    |-- *.npz
 |    |    |-- nlvr2_valid_npz
 |    |    |    |-- *.npz
 |    |    |-- nlvr2_test_npz
 |    |    |    |-- *.npz
 |
 |-- pretrained_model
 |    |-- bert_config.json
 |    |-- pytorch_model.bin
 |    |-- vocab.txt
 |-- lxrt
 |-- tasks
 |-- run_vqa.sh
 |-- run_vqa_predict.sh
 |-- run_nlvr.sh
 |-- *.py
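
Each *.npz file stores the pre-extracted region features for one image. A minimal way to inspect one (the file name and the keys inside the archive are assumptions that depend on the feature extractor, so list them before relying on any):

import numpy as np

# Hypothetical inspection of one feature file; point this at a real file.
feat = np.load("data/mscoco_imgfeat/train_npz/COCO_train2014_000000000009.npz")
print(feat.files)                 # names of the stored arrays
for key in feat.files:
    print(key, feat[key].shape)   # e.g. region features and bounding boxes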

Please kindly contact us if anything is missing!

VQA

Fine-tuning

After all the data and models are downloaded and arranged as in the REPO ROOT example above, you can fine-tune directly with our script:

sh run_vqa.sh

We now use NVIDIA's apex library for faster training, which can increase training speed by more than 1.5x. Note: it can only be used on V100 machines; if you do not have a V100 GPU, please remove the --amp_type O1 option.
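
For reference, below is a minimal sketch of the standard apex O1 wiring that an option like --amp_type O1 typically enables; the tiny model and single step are placeholders, not the repo's training code:

import torch
from apex import amp  # NVIDIA apex; assumes it is installed and a CUDA GPU is available

# Placeholder model/optimizer standing in for the real training setup.
model = torch.nn.Linear(16, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O1 patches common ops to run in fp16 while keeping master weights in fp32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(4, 16).cuda()
loss = model(x).sum()
optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # scale the loss so fp16 gradients do not underflow
optimizer.step()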

Predict & submit to the VQA test server

sh run_vqa_predict.sh

The test results will be saved in output/$name/test_predict.json. This year's VQA 2.0 challenge is hosted on EvalAI at https://eval.ai/web/challenges/challenge-page/830/leaderboard/2278. After registration, you only need to upload the test_predict.json file to the server and check the submission result.
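
Before uploading, you can sanity-check the prediction file. The expected layout below, a JSON list of {"question_id": ..., "answer": ...} entries, is the standard VQA-server format and should be verified against the challenge page:

import json

# Replace $name with your actual run name before running this check.
with open("output/$name/test_predict.json") as f:
    preds = json.load(f)

assert isinstance(preds, list)
assert all({"question_id", "answer"} <= set(p) for p in preds)
print(len(preds), "predictions look well-formed")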

NLVR2

Fine-tuning

After all the data and models are downloaded and arranged as in the REPO ROOT example, you can fine-tune directly with our scripts, just as for VQA. For the local validation set:

sh run_nlvr_val.sh

For the Test-P set:

sh run_nlvr_test.sh

Reference

For more details, please also see our slides and talk from the CVPR 2020 VQA Challenge Workshop.