We propose a new single-stream visual-linguistic pre-training scheme by leveraging multi-stage progressive pre-training and multi-task learning.
Model | Description | #params | Download |
---|---|---|---|
structvbert.en.base | StructVBERT using the BERT-base architecture | 110M | structvbert.en.base |
structvbert.en.large | StructVBERT using the BERT-large architecture | 355M | Coming soon |
The results on the VQA & NLVR2 tasks can be reproduced using the hyperparameters listed in the "Example usage" section below.
Split | VQA | NLVR2 |
---|---|---|
Local Validation | 71.80% | 77.66% |
Test-Dev | 74.11% | 78.13% (Test-P) |
- PyTorch version >= 1.3.0
- Install other libraries via `pip install -r requirements.txt`
- For faster training, install NVIDIA's apex library (a quick environment check is sketched below)
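Before running anything, you can sanity-check the setup with a short snippet. This is a minimal sketch; apex is optional and only needed for the mixed-precision (--amp_type O1) training described below.

```python
# Quick environment sanity check (minimal sketch).
import torch

print("PyTorch version:", torch.__version__)        # should be >= 1.3.0
print("CUDA available:", torch.cuda.is_available())

try:
    from apex import amp  # noqa: F401 -- only needed for --amp_type O1 training
    print("apex found: mixed-precision training is available")
except ImportError:
    print("apex not found: remove --amp_type O1 from the run scripts")
```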
The codebase is built on top of the LXMERT codebase. Please first download the pretrained structvbert model, the VQA and NLVR2 data, and the VQA and NLVR2 image features: VQA train, VQA val, VQA test, NLVR2 train, NLVR2 val, and NLVR2 test.
After downloading the data and features from the drives, please re-organize them according to the following example:
```
REPO ROOT
|
|-- data
|   |-- vqa
|   |   |-- train.json
|   |   |-- minival.json
|   |   |-- nominival.json
|   |   |-- test.json
|   |   |-- trainval_ans2label.json
|   |   |-- trainval_label2ans.json
|   |   |-- all_ans.json
|   |   |-- coco_minival_img_ids.json
|   |
|   |-- mscoco_imgfeat
|   |   |-- train_npz
|   |   |   |-- *.npz
|   |   |-- valid_npz
|   |   |   |-- *.npz
|   |   |-- test_npz
|   |   |   |-- *.npz
|   |
|   |-- nlvr2_imgfeat
|   |   |-- nlvr2_train_npz
|   |   |   |-- *.npz
|   |   |-- nlvr2_valid_npz
|   |   |   |-- *.npz
|   |   |-- nlvr2_test_npz
|   |   |   |-- *.npz
|
|-- pretrained_model
|   |-- bert_config.json
|   |-- pytorch_model.bin
|   |-- vocab.txt
|-- lxrt
|-- tasks
|-- run_vqa.sh
|-- run_vqa_predict.sh
|-- run_nlvr.sh
|-- *.py
```
Please contact us if anything is missing!
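To verify that the image features are in place, you can inspect one of the downloaded .npz archives. This is a minimal sketch; the array names stored inside each archive depend on the released files, so it just prints whatever keys are present:

```python
# Peek into one image-feature archive (minimal sketch).
import glob
import numpy as np

path = glob.glob("data/mscoco_imgfeat/train_npz/*.npz")[0]
with np.load(path) as feat:
    for key in feat.files:  # list every array stored in the archive
        print(key, feat[key].shape, feat[key].dtype)
```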
After all the data and models are downloaded and arranged as in the REPO ROOT example above, you can fine-tune directly with our script:

```
sh run_vqa.sh
```
We now use NVIDIA's apex library for faster training, which speeds up training by more than 1.5x. Note: it can only be used on V100 machines; if you do not have a V100 GPU card, please remove the --amp_type O1 option.
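For reference, O1 is apex's mixed-precision opt_level. The repo's scripts wire this up for you; the sketch below only illustrates the general apex amp pattern with a stand-in model, not the actual training code:

```python
# Generic apex amp usage with opt_level "O1" (illustrative sketch, not the repo's code).
import torch
from apex import amp

model = torch.nn.Linear(2048, 10).cuda()              # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

features = torch.randn(8, 2048).cuda()                # stand-in for a batch of features
loss = model(features).mean()
optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # loss scaling for fp16 safety
    scaled_loss.backward()
optimizer.step()
```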
To predict on the VQA test set, run:

```
sh run_vqa_predict.sh
```
The test results will be saved in output/$name/test_predict.json. This year's VQA 2.0 challenge is hosted on EvalAI at https://eval.ai/web/challenges/challenge-page/830/leaderboard/2278. After registration, you only need to upload the test_predict.json file to the server and check the submission result.
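If you want to sanity-check the prediction file before uploading, here is a minimal sketch. It assumes the standard VQA results format of question_id/answer pairs; verify against the file your run actually produces:

```python
# Sanity-check the prediction file before uploading (minimal sketch).
import json

with open("output/vqa/test_predict.json") as f:  # substitute your $name for "vqa"
    preds = json.load(f)

print(len(preds), "predictions, e.g.:", preds[0])
# Standard VQA results format: entries like {"question_id": ..., "answer": ...}.
assert all("question_id" in p and "answer" in p for p in preds)
```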
After all the data and models are downloaded and arranged as in the REPO ROOT example above, you can fine-tune directly with our scripts, just as for VQA. For the local validation set, run:

```
sh run_nlvr_val.sh
```
For the Test-P set, run:

```
sh run_nlvr_test.sh
```
For more details, you can also see our slides and talk at the CVPR 2020 VQA Challenge Workshop.