
IconQA

A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning


Current visual question answering (VQA) tasks mainly focus on answering human-annotated questions about natural images from everyday life. In this work, we propose a challenging new benchmark, icon question answering (IconQA), which highlights the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. For this benchmark, we build a large-scale IconQA dataset that consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. Compared to existing VQA benchmarks, IconQA requires not only perception skills such as object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric, commonsense, and arithmetic reasoning.
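To make the three sub-task formats concrete, here is a minimal sketch of how a single question might be represented and scored. The record layout (`IconQAProblem`, its field names, and the sub-task string constants) is hypothetical and is not the official IconQA file format, and the exact-match accuracy metric is an assumption chosen for illustration. The two choice sub-tasks differ only in whether the options are image references or text strings, while filling-in-the-blank has no options and expects a free-form answer such as a number.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sub-task labels; the official IconQA release may name these differently.
MULTI_IMAGE_CHOICE = "multi-image-choice"
MULTI_TEXT_CHOICE = "multi-text-choice"
FILL_IN_THE_BLANK = "filling-in-the-blank"


@dataclass
class IconQAProblem:
    """One question in a hypothetical, simplified IconQA-style record."""
    question: str
    sub_task: str
    answer: str                                        # gold answer: a choice or free text
    choices: List[str] = field(default_factory=list)   # image paths or text options; empty for fill-in


def is_correct(problem: IconQAProblem, prediction: str) -> bool:
    """Exact-match check (an assumed metric, not necessarily the official one)."""
    if problem.sub_task in (MULTI_IMAGE_CHOICE, MULTI_TEXT_CHOICE):
        # Multiple choice: the prediction must be one of the offered options and match the gold one.
        return prediction in problem.choices and prediction == problem.answer
    if problem.sub_task == FILL_IN_THE_BLANK:
        # Free-form answer (e.g. a count from arithmetic reasoning): compare after trimming and lowercasing.
        return prediction.strip().lower() == problem.answer.strip().lower()
    raise ValueError(f"unknown sub-task: {problem.sub_task}")


def accuracy(problems: List[IconQAProblem], predictions: List[str]) -> float:
    """Fraction of problems answered correctly; 0.0 for an empty list."""
    if len(problems) != len(predictions):
        raise ValueError("problems and predictions must have the same length")
    if not problems:
        return 0.0
    hits = sum(is_correct(p, y) for p, y in zip(problems, predictions))
    return hits / len(problems)


# Illustrative (made-up) examples, one per sub-task.
problems = [
    IconQAProblem("How many stars are there?", FILL_IN_THE_BLANK, "7"),
    IconQAProblem("Which shape is a triangle?", MULTI_TEXT_CHOICE, "triangle",
                  ["circle", "triangle", "square"]),
    IconQAProblem("Which picture shows 3 apples?", MULTI_IMAGE_CHOICE, "img_2.png",
                  ["img_0.png", "img_1.png", "img_2.png"]),
]
predictions = ["7 ", "circle", "img_2.png"]
print(accuracy(problems, predictions))  # two of three correct
```

Keeping the sub-task as an explicit field lets one scoring function handle all three formats, and per-sub-task accuracy can be reported by filtering on that field before calling `accuracy`.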

Center for Vision, Cognition, Learning and Autonomy, UCLA
Task: Visual Question Answering
Items: 57672
Classes: 377
Labels: 57672
Last updated on October 31, 2023
Licensed under CC-BY-NC-SA