Download


Annotations

Input Questions

Input Images

 

 

  • Training images v1.0* [features†]

    25,482 images from CC3M, Fakeddit and VCR

     

  • Validation images v1.0* [features†]

    3,793 images from CC3M, Fakeddit and VCR

  • Testing images v1.0* [features†]

    11,362 images from CC3M, Fakeddit and VCR

 

 

* Due to copyright issues, we cannot directly release raw images from the Conceptual Captions 3M (CC3M) dataset. Please use the original image URLs included in the question JSON files to download them. Follow the links above to download raw images from Fakeddit and VCR.
† Image features are extracted with a Faster R-CNN model.
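Since CC3M images must be fetched from their original URLs, a download loop can be sketched as follows. This is a sketch, not an official script; the payload below is an illustrative stand-in for a released question file (see the question format described on this page), and the output directory name is hypothetical.

```python
import os
import urllib.request

# Illustrative stand-in for a released question file; in practice,
# load the real file with json.load().
questions = {
    "questions": [
        {"question_id": 1, "image_id": 10,
         "question": "What is shown in the image?",
         "image_url": "https://example.com/10.jpg"},
    ]
}

# Only CC3M questions carry "image_url"; collect (image_id, url) pairs.
to_download = [(q["image_id"], q["image_url"])
               for q in questions["questions"] if "image_url" in q]

fetch = False  # set True to actually download the images
if fetch:
    os.makedirs("cc3m_images", exist_ok=True)
    for image_id, url in to_download:
        path = os.path.join("cc3m_images", f"{image_id}.jpg")
        try:
            urllib.request.urlretrieve(url, path)
        except OSError:
            pass  # some CC3M links may be dead; skip them
```

Dead links are skipped rather than retried, since a fraction of CC3M URLs are no longer reachable.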


Input Questions Format


The questions are stored using the JSON file format.

The question format has the following data structure:

{
  "info" : info,
  "task_type" : str,
  "data_type" : str,
  "data_subtype" : str,
  "questions" : [question],
  "license" : license
}

info {
  "year" : int,
  "version" : str,
  "description" : str,
  "contributor" : str,
  "url" : str,
  "date_created" : datetime
}

license {
  "name" : str,
  "url" : str
}

question {
  "question_id" : int,
  "image_id" : int,
  "question" : str,
  "image_url" : str
}

task_type: type of annotations in the JSON file (Open-Ended).
data_type (image_source in AVQA): source of the images (mscoco for AdVQA; CC3M/VCR/Fakeddit for AVQA).
data_subtype: data split (e.g. val2014/test2015 of mscoco for AdVQA; R1/R2/R3 train/val/test for AVQA).
image_url (only for CC3M images in AVQA): original image URL from the CC3M release.
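A question file in this format can be loaded with the standard json module and indexed by question_id for fast lookup. The payload below is an illustrative stand-in (the field values are not real data, and the commented-out file name is hypothetical):

```python
import json

# with open("avqa_v1.0_train_questions.json") as f:  # hypothetical name
#     data = json.load(f)

# Inline stand-in with the structure described above:
data = {
    "info": {"year": 2021, "version": "1.0", "description": "",
             "contributor": "", "url": "", "date_created": ""},
    "task_type": "Open-Ended",
    "data_type": "CC3M/VCR/Fakeddit",
    "data_subtype": "R1 train",
    "questions": [
        {"question_id": 7, "image_id": 42,
         "question": "What is the person holding?",
         "image_url": "https://example.com/42.jpg"},
    ],
    "license": {"name": "", "url": ""},
}

# Index questions by question_id, e.g. for joining with annotations.
questions_by_id = {q["question_id"]: q for q in data["questions"]}
```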



Annotation Format


The annotations are stored using the JSON file format.

The annotation format has the following data structure:

{
  "info" : info,
  "data_type" : str,
  "data_subtype" : str,
  "annotations" : [annotation],
  "license" : license
}

info {
  "year" : int,
  "version" : str,
  "description" : str,
  "contributor" : str,
  "url" : str,
  "date_created" : datetime
}

license {
  "name" : str,
  "url" : str
}

annotation {
  "question_id" : int,
  "image_id" : int,
  "answer_type" : str,
  "answers" : [answer]
}

answer {
  "answer_id" : int,
  "answer" : str
}

data_type (image_source in AVQA): source of the images (mscoco for AdVQA; CC3M/VCR/Fakeddit for AVQA).
data_subtype: data split (e.g. val2014/test2015 of mscoco for AdVQA; R1/R2/R3 train/val/test for AVQA).
answer_type: type of the answer; currently one of "yes/no", "number", and "other".
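A common way to consume annotations in this format is to take a majority vote over the answer list for each question. The entry below is illustrative, not real data:

```python
from collections import Counter

# Example annotation entry in the format above (values are illustrative).
annotation = {
    "question_id": 7,
    "image_id": 42,
    "answer_type": "other",
    "answers": [
        {"answer_id": 1, "answer": "phone"},
        {"answer_id": 2, "answer": "phone"},
        {"answer_id": 3, "answer": "camera"},
    ],
}

# Majority vote over the per-question answer list.
counts = Counter(a["answer"] for a in annotation["answers"])
majority_answer, _ = counts.most_common(1)[0]
```

The annotation's question_id can then be used to join back to the corresponding question entry.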