VoxCeleb2

A large-scale audio-visual dataset of human speech

VoxCeleb2 is a large-scale speaker recognition dataset collected automatically from open-source media. It contains over a million utterances from more than 6,000 speakers. Because the dataset is collected ‘in the wild’, the speech segments are corrupted with real-world noise, including laughter, cross-talk, channel effects, music and other sounds. The dataset is also multilingual, with speech from speakers of 145 different nationalities, covering a wide range of accents, ages, ethnicities and languages. Because the dataset is audio-visual, it is also useful for other applications, such as visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa, and training face recognition from video to complement existing face recognition datasets.

EPSRC
https://epsrc.ukri.org
Last updated on January 20, 2022
Licensed under: Unknown