Audio-Visual Event (AVE) Dataset

Dataset for audio-visual video understanding research

We introduce the novel problem of audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment. We collect an Audio-Visual Event (AVE) dataset to systematically investigate three temporal localization tasks: supervised audio-visual event localization, weakly-supervised audio-visual event localization, and cross-modality localization. The AVE dataset contains 4143 videos covering 28 event categories, and each video is temporally labeled with audio-visual event boundaries.
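As a rough illustration of what "temporally labeled with audio-visual event boundaries" can mean in practice, the sketch below expands one event boundary into per-segment labels. It is only a sketch under stated assumptions: the `EventAnnotation` fields, the one-second segment length, the default of 10 segments, and the `"background"` label are all hypothetical choices for this example and are not taken from the dataset description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventAnnotation:
    """One temporally labeled event (hypothetical layout, not the official schema)."""
    video_id: str   # assumed identifier for the video
    category: str   # one of the event categories
    start_s: int    # event start, in seconds (inclusive)
    end_s: int      # event end, in seconds (exclusive)


def segment_labels(ann: EventAnnotation,
                   num_segments: int = 10,
                   background: str = "background") -> List[str]:
    """Expand an event boundary into one label per one-second segment.

    Segment i covers the interval [i, i + 1) seconds. A segment gets the
    event category if it lies inside [start_s, end_s), else the background label.
    """
    return [
        ann.category if ann.start_s <= i < ann.end_s else background
        for i in range(num_segments)
    ]


# Usage: an event spanning seconds 2 to 5 of a 6-segment clip.
example = EventAnnotation(video_id="example_video", category="Church bell",
                          start_s=2, end_s=5)
labels = segment_labels(example, num_segments=6)
```

Per-segment labels of this kind are one common way to frame supervised temporal localization as segment-level classification; other encodings, such as start and end regression, are equally possible.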

Author: University of Rochester
Task: Event Detection
Annotation Types: Bounding Boxes
Items: 4143
Classes: 28
Labels: 4143
Last updated on: October 31, 2023
Licensed under: Research Only