Download


Annotations

Input Questions

Input Images

 

 

  • Training images v1.0* [features†]

    25,482 images from CC3M, Fakeddit and VCR

     

  • Validation images v1.0* [features†]

    3,793 images from CC3M, Fakeddit and VCR

  • Testing images v1.0* [features†]

    11,362 images from CC3M, Fakeddit and VCR

 

 

* Due to copyright issues, we cannot directly release raw images from the Conceptual Captions 3M (CC3M) dataset. Please use the original image URLs included in the question JSON files to download them. Follow the links above to download raw images from Fakeddit and VCR.
† Image features are extracted with a Faster R-CNN model.
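Since CC3M images must be fetched from their original URLs, a download loop can be sketched as follows. This is a sketch, not an official script; the payload below is an illustrative stand-in for a released question file (see the question format described on this page), and the output directory name is hypothetical.

```python
import os
import urllib.request

# Illustrative stand-in for a released question file; in practice,
# load the real file with json.load().
questions = {
    "questions": [
        {"question_id": 1, "image_id": 10,
         "question": "What is shown in the image?",
         "image_url": "https://example.com/10.jpg"},
    ]
}

# Only CC3M questions carry "image_url"; collect (image_id, url) pairs.
to_download = [(q["image_id"], q["image_url"])
               for q in questions["questions"] if "image_url" in q]

fetch = False  # set True to actually download the images
if fetch:
    os.makedirs("cc3m_images", exist_ok=True)
    for image_id, url in to_download:
        path = os.path.join("cc3m_images", f"{image_id}.jpg")
        try:
            urllib.request.urlretrieve(url, path)
        except OSError:
            pass  # some CC3M links may be dead; skip them
```

Dead links are skipped rather than retried, since a fraction of CC3M URLs are no longer reachable.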


Input Questions Format


The questions are stored using the JSON file format.

The question format has the following data structure:

{
  "info" : info,
  "task_type" : str,
  "data_type" : str,
  "data_subtype" : str,
  "questions" : [question],
  "license" : license
}

info {
  "year" : int,
  "version" : str,
  "description" : str,
  "contributor" : str,
  "url" : str,
  "date_created" : datetime
}

license {
  "name" : str,
  "url" : str
}

question {
  "question_id" : int,
  "image_id" : int,
  "question" : str,
  "image_url" : str
}

task_type: type of annotations in the JSON file (Open-Ended).
data_type (image_source in AVQA): source of the images (mscoco for AdVQA; CC3M/VCR/Fakeddit for AVQA).
data_subtype: data split (e.g. val2014/test2015 of mscoco for AdVQA; R1/R2/R3 train/val/test for AVQA).
image_url (only for CC3M images in AVQA): original image URL from the CC3M release.
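A question file in this format can be loaded with the standard json module and indexed by question_id for fast lookup. The payload below is an illustrative stand-in (the field values are not real data, and the commented-out file name is hypothetical):

```python
import json

# with open("avqa_v1.0_train_questions.json") as f:  # hypothetical name
#     data = json.load(f)

# Inline stand-in with the structure described above:
data = {
    "info": {"year": 2021, "version": "1.0", "description": "",
             "contributor": "", "url": "", "date_created": ""},
    "task_type": "Open-Ended",
    "data_type": "CC3M/VCR/Fakeddit",
    "data_subtype": "R1 train",
    "questions": [
        {"question_id": 7, "image_id": 42,
         "question": "What is the person holding?",
         "image_url": "https://example.com/42.jpg"},
    ],
    "license": {"name": "", "url": ""},
}

# Index questions by question_id, e.g. for joining with annotations.
questions_by_id = {q["question_id"]: q for q in data["questions"]}
```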



Annotation Format


The annotations are stored using the JSON file format.

The annotation format has the following data structure:

{
  "info" : info,
  "data_type" : str,
  "data_subtype" : str,
  "annotations" : [annotation],
  "license" : license
}

info {
  "year" : int,
  "version" : str,
  "description" : str,
  "contributor" : str,
  "url" : str,
  "date_created" : datetime
}

license {
  "name" : str,
  "url" : str
}

annotation {
  "question_id" : int,
  "image_id" : int,
  "answer_type" : str,
  "answers" : [answer]
}

answer {
  "answer_id" : int,
  "answer" : str
}

data_type (image_source in AVQA): source of the images (mscoco for AdVQA; CC3M/VCR/Fakeddit for AVQA).
data_subtype: data split (e.g. val2014/test2015 of mscoco for AdVQA; R1/R2/R3 train/val/test for AVQA).
answer_type: type of the answer; currently one of "yes/no", "number", and "other".
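A common way to consume annotations in this format is to take a majority vote over the answer list for each question. The entry below is illustrative, not real data:

```python
from collections import Counter

# Example annotation entry in the format above (values are illustrative).
annotation = {
    "question_id": 7,
    "image_id": 42,
    "answer_type": "other",
    "answers": [
        {"answer_id": 1, "answer": "phone"},
        {"answer_id": 2, "answer": "phone"},
        {"answer_id": 3, "answer": "camera"},
    ],
}

# Majority vote over the per-question answer list.
counts = Counter(a["answer"] for a in annotation["answers"])
majority_answer, _ = counts.most_common(1)[0]
```

The annotation's question_id can then be used to join back to the corresponding question entry.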