VGG-Sound

A large scale audio-visual dataset

VGG-Sound is an audio-visual correspondence dataset of short audio clips extracted from videos uploaded to YouTube. The audio spans a wide range of challenging acoustic environments and the noise characteristics of real-world applications. All videos are captured "in the wild", and audio-visual correspondence holds in the sense that the sound source is visually evident. Each segment is 10 seconds long and includes both audio and video.

Task: Video Classification
Annotation Types: Classification Tags
Items: 210000
Classes: 310
Labels: 210000
Last updated on January 20, 2022
Licensed under CC-BY