VGG-Sound

A large scale audio-visual dataset

VGG-Sound is an audio-visual correspondence dataset consisting of short audio clips extracted from videos uploaded to YouTube. The audio spans a wide range of challenging acoustic environments and noise characteristics typical of real-world applications. All videos are captured "in the wild" and exhibit audio-visual correspondence, in the sense that the sound source is visually evident. The dataset provides both the audio and the video for each clip, and each segment is 10 seconds long.

Task: Video Classification
Annotation Types: Classification Tags
Items: 210,000
Classes: 310
Labels: 210,000
Last updated: October 31, 2023
License: CC-BY