We propose a new single-stream visual-linguistic pre-training scheme by leveraging multi-stage progressive pre-training and multi-task learning.
Model | Description | #params | Download |
---|---|---|---|
structvbert.en.base | StructVBERT using the BERT-base architecture | 110M | structvbert.en.base |
structvbert.en.large | StructVBERT using the BERT-large architecture | 355M | Coming soon |
The results on the VQA & NLVR2 tasks can be reproduced using the hyperparameters listed in the "Example usage" section below.
Split | VQA | NLVR2 |
---|---|---|
Local Validation | 71.80% | 77.66% |
Test-Dev | 74.11% | 78.13% (Test-P) |
- PyTorch version >= 1.3.0
- Install other libraries via `pip install -r requirements.txt`
- For faster training, install NVIDIA's apex library (a quick environment check is sketched below)
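Before running anything, you can sanity-check the setup with a short snippet. This is a minimal sketch; apex is optional and only needed for the mixed-precision (--amp_type O1) training described below.

```python
# Quick environment sanity check (minimal sketch).
import torch

print("PyTorch version:", torch.__version__)        # should be >= 1.3.0
print("CUDA available:", torch.cuda.is_available())

try:
    from apex import amp  # noqa: F401 -- only needed for --amp_type O1 training
    print("apex found: mixed-precision training is available")
except ImportError:
    print("apex not found: remove --amp_type O1 from the run scripts")
```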
The codebase is built on top of the LXMERT codebase. Please first download the pretrained structvbert model, the VQA and NLVR2 data, and the VQA and NLVR2 image features: VQA train, VQA val, VQA test, NLVR2 train, NLVR2 val, and NLVR2 test.
After downloading the data and features from the drives, please re-organize them according to the following example:
```
REPO ROOT
|
|-- data
|   |-- vqa
|   |   |-- train.json
|   |   |-- minival.json
|   |   |-- nominival.json
|   |   |-- test.json
|   |   |-- trainval_ans2label.json
|   |   |-- trainval_label2ans.json
|   |   |-- all_ans.json
|   |   |-- coco_minival_img_ids.json
|   |
|   |-- mscoco_imgfeat
|   |   |-- train_npz
|   |   |   |-- *.npz
|   |   |-- valid_npz
|   |   |   |-- *.npz
|   |   |-- test_npz
|   |   |   |-- *.npz
|   |
|   |-- nlvr2_imgfeat
|   |   |-- nlvr2_train_npz
|   |   |   |-- *.npz
|   |   |-- nlvr2_valid_npz
|   |   |   |-- *.npz
|   |   |-- nlvr2_test_npz
|   |   |   |-- *.npz
|
|-- pretrained_model
|   |-- bert_config.json
|   |-- pytorch_model.bin
|   |-- vocab.txt
|-- lxrt
|-- tasks
|-- run_vqa.sh
|-- run_vqa_predict.sh
|-- run_nlvr.sh
|-- *.py
```
Please contact us if anything is missing!
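To verify that the image features are in place, you can inspect one of the downloaded .npz archives. This is a minimal sketch; the array names stored inside each archive depend on the released files, so it just prints whatever keys are present:

```python
# Peek into one image-feature archive (minimal sketch).
import glob
import numpy as np

path = glob.glob("data/mscoco_imgfeat/train_npz/*.npz")[0]
with np.load(path) as feat:
    for key in feat.files:  # list every array stored in the archive
        print(key, feat[key].shape, feat[key].dtype)
```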
After all the data and models are downloaded and arranged as in the REPO ROOT example above, you can fine-tune directly with our script:

```
sh run_vqa.sh
```
We now use NVIDIA's apex library for faster training, which speeds up training by more than 1.5x. Note: it can only be used on V100 machines; if you do not have a V100 GPU card, please remove the --amp_type O1 option.
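For reference, O1 is apex's mixed-precision opt_level. The repo's scripts wire this up for you; the sketch below only illustrates the general apex amp pattern with a stand-in model, not the actual training code:

```python
# Generic apex amp usage with opt_level "O1" (illustrative sketch, not the repo's code).
import torch
from apex import amp

model = torch.nn.Linear(2048, 10).cuda()              # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

features = torch.randn(8, 2048).cuda()                # stand-in for a batch of features
loss = model(features).mean()
optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # loss scaling for fp16 safety
    scaled_loss.backward()
optimizer.step()
```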
To predict on the VQA test set, run:

```
sh run_vqa_predict.sh
```
The test results will be saved in output/$name/test_predict.json. This year's VQA 2.0 challenge is hosted on EvalAI at https://eval.ai/web/challenges/challenge-page/830/leaderboard/2278. After registration, you only need to upload the test_predict.json file to the server and check the submission result.
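If you want to sanity-check the prediction file before uploading, here is a minimal sketch. It assumes the standard VQA results format of question_id/answer pairs; verify against the file your run actually produces:

```python
# Sanity-check the prediction file before uploading (minimal sketch).
import json

with open("output/vqa/test_predict.json") as f:  # substitute your $name for "vqa"
    preds = json.load(f)

print(len(preds), "predictions, e.g.:", preds[0])
# Standard VQA results format: entries like {"question_id": ..., "answer": ...}.
assert all("question_id" in p and "answer" in p for p in preds)
```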
After all the data and models are downloaded and arranged as in the REPO ROOT example above, you can fine-tune directly with our scripts, just as for VQA. For the local validation set, run:

```
sh run_nlvr_val.sh
```
For the Test-P set, run:

```
sh run_nlvr_test.sh
```
For more details, you can also see our slides and talk at the CVPR 2020 VQA Challenge Workshop.