
CapGaze

Human attention in image captioning

The dataset has two parts:

capgaze1: 1,000 images with raw data (eye fixations and verbal descriptions) from 5 native English speakers. This part was used for the analysis. For data privacy reasons, the voices of the verbal descriptions were masked by pitch modulation; the spoken content was preserved.

capgaze2: 3,000 images with processed data: for each image, the eye fixations from all viewers were combined into a single fixation map (a minimal sketch of this step follows below). This part was used to develop a saliency prediction model for the image captioning task.
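The source does not say how the per-viewer fixations were merged into a fixation map. A common approach in saliency research is to pool the fixation coordinates from all viewers, accumulate them into a count map, and smooth with a Gaussian. The sketch below follows that convention; the function name, the sigma value, and the max-normalization are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_fixation_map(fixations, height, width, sigma=25.0):
    """Combine pooled eye fixations into one fixation map per image.

    fixations: iterable of (x, y) pixel coordinates from all viewers.
    sigma:     Gaussian blur in pixels (illustrative assumption; often
               chosen to approximate one degree of visual angle).
    """
    fmap = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fmap[yi, xi] += 1.0          # accumulate fixation counts
    fmap = gaussian_filter(fmap, sigma)  # smooth into a continuous map
    if fmap.max() > 0:
        fmap /= fmap.max()               # normalize to [0, 1]
    return fmap

# Hypothetical usage: pool fixations from 5 viewers on a 480x640 image.
viewers = [[(320, 240), (100, 80)], [(310, 250)], [(400, 300)],
           [(305, 245), (120, 90)], [(330, 235)]]
pooled = [pt for v in viewers for pt in v]
fixation_map = build_fixation_map(pooled, height=480, width=640)
```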

Task: Visual Attention
Annotation types: Bounding boxes
Items: 4,000
Labels: 4,000
Last updated: October 31, 2023
License: Unknown