TVQA Dataset

A Localized, Compositional Video Question Answering Dataset

TVQA is a large-scale video QA dataset based on six popular TV shows (Friends, The Big Bang Theory, How I Met Your Mother, House M.D., Grey's Anatomy, Castle). It consists of 152.5K QA pairs from 21.8K video clips, spanning over 460 hours of video. The questions are designed to be compositional, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts. TVQA+ is a subset of the TVQA dataset, additionally augmented with 310.8K bounding boxes that link depicted objects to visual concepts in questions and answers.

Task: Visual Question Answering
Annotation Types: Bounding Boxes
310800
Items
Last updated on January 20, 2022
Licensed under Research Only