692,480 answers for verified questions
73,075 answers for unverified questions
87,700 answers for verified questions
69,248 verified questions
73,075 unverified questions
8,770 verified questions
26,392 verified questions
100,000 answers for verified questions (last updated: Oct. 15th, 2021)
10,000 verified questions
36,807 verified questions
5,000 images from COCO val2014
36,807 images from COCO test2015
* Due to copyright issues, we cannot directly release raw images from the Conceptual Captions 3M dataset (CC3M). Please download them using the original image URLs included in the question JSON files. Follow the links to download raw images from Fakeddit and VCR.
† Image features are extracted with a Faster R-CNN model.
The questions are stored in JSON format.
The question files have the following data structure:
{
"info" : info,
"task_type" : str,
"data_type": str,
"data_subtype": str,
"questions" : [question],
"license" : license
}
info {
"year" : int,
"version" : str,
"description" : str,
"contributor" : str,
"url" : str,
"date_created" : datetime
}
license{
"name" : str,
"url" : str
}
question{
"question_id" : int,
"image_id" : int,
"question" : str,
"image_url": str
}
task_type: type of annotations in the JSON file (Open-Ended).
data_type (image_source in AVQA): source of the images (mscoco or CC3M/VCR/Fakeddit).
data_subtype: data split (e.g. val2014/test2015 of mscoco for AdVQA, R1/2/3 train/val/test for AVQA).
image_url (only for CC3M images in AVQA): original image URL from the CC3M release.
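As a rough illustration of the question format above, the following Python sketch reads a question file and collects the image URLs that CC3M questions carry. The helper names (`load_json`, `image_source`, `collect_image_urls`) and the sample values are made up for this example and are not part of the release. It also assumes that AVQA files use the key `image_source` in place of `data_type`, which is one reading of the field description above.

```python
import json
from typing import Any, Dict, Optional


def load_json(path: str) -> Dict[str, Any]:
    """Read a question file from disk into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def image_source(doc: Dict[str, Any]) -> Optional[str]:
    # Assumption: AVQA files name this field "image_source",
    # AdVQA files name it "data_type".
    return doc.get("data_type", doc.get("image_source"))


def collect_image_urls(doc: Dict[str, Any]) -> Dict[int, str]:
    """Map image_id -> image_url for questions that include a URL."""
    urls: Dict[int, str] = {}
    for q in doc["questions"]:
        url = q.get("image_url")  # only present for CC3M images in AVQA
        if url:
            urls[q["image_id"]] = url
    return urls


# Illustrative document in the documented shape (all values are invented).
sample_questions = {
    "info": {"year": 2021, "version": "1.0"},
    "task_type": "Open-Ended",
    "image_source": "CC3M",
    "data_subtype": "R1 train",
    "questions": [
        {"question_id": 1, "image_id": 10, "question": "What is shown?",
         "image_url": "https://example.com/10.jpg"},
        {"question_id": 2, "image_id": 11, "question": "How many cats?"},
    ],
    "license": {"name": "example", "url": "https://example.com/license"},
}

urls = collect_image_urls(sample_questions)
```

For a downloaded file, `collect_image_urls(load_json(path))` would produce the same kind of mapping, which can then be passed to whatever downloader you prefer.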
The annotations are stored in JSON format.
The annotation files have the following data structure:
{
"info" : info,
"data_type": str,
"data_subtype": str,
"annotations" : [annotation],
"license" : license
}
info {
"year" : int,
"version" : str,
"description" : str,
"contributor" : str,
"url" : str,
"date_created" : datetime
}
license{
"name" : str,
"url" : str
}
annotation{
"question_id" : int,
"image_id" : int,
"answer_type" : str,
"answers" : [answer]
}
answer{
"answer_id" : int,
"answer" : str
}
data_type (image_source in AVQA): source of the images (mscoco or CC3M/VCR/Fakeddit).
data_subtype: data split (e.g. val2014/test2015 of mscoco for AdVQA, R1/2/3 train/val/test for AVQA).
answer_type: type of the answer. Currently one of "yes/no", "number", and "other".
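The annotation format above can be paired with the question format through the shared `question_id` field. The Python sketch below shows one possible way to do that and to count answer types. The function names and the sample data are hypothetical, chosen only to mirror the documented fields.

```python
from collections import Counter
from typing import Any, Dict, List


def index_annotations(ann_doc: Dict[str, Any]) -> Dict[int, Dict[str, Any]]:
    """Index annotations by question_id for quick lookup."""
    return {a["question_id"]: a for a in ann_doc["annotations"]}


def answer_strings(annotation: Dict[str, Any]) -> List[str]:
    """Return the answer texts of one annotation, in file order."""
    return [ans["answer"] for ans in annotation["answers"]]


def answer_type_counts(ann_doc: Dict[str, Any]) -> Counter:
    """Count annotations per answer_type ("yes/no", "number", "other")."""
    return Counter(a["answer_type"] for a in ann_doc["annotations"])


# Illustrative annotation document (all values are invented).
sample_annotations = {
    "data_type": "mscoco",
    "data_subtype": "val2014",
    "annotations": [
        {"question_id": 1, "image_id": 10, "answer_type": "yes/no",
         "answers": [{"answer_id": 1, "answer": "yes"},
                     {"answer_id": 2, "answer": "no"}]},
        {"question_id": 2, "image_id": 11, "answer_type": "number",
         "answers": [{"answer_id": 1, "answer": "2"}]},
    ],
}

by_question = index_annotations(sample_annotations)
counts = answer_type_counts(sample_annotations)
first_answers = answer_strings(by_question[1])
```

Looking up `by_question[q["question_id"]]` for each entry `q` of a question file then yields the answers belonging to that question.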