4D Face Dataset with Voice Animation
VOCASET is a 4D face dataset with about 29 minutes of high-fidelity 3D scans, captured at 60 fps with a 3dMD head scanner, together with synchronized audio. In total, the dataset contains audio-4D scan pairs captured from 12 subjects (6 female and 6 male). For each subject, the dataset contains 40 sequences of spoken English sentences, each three to five seconds long. Three versions of the data are publicly available: raw scanner data (i.e., raw audio-4D scan pairs), registered data (i.e., scans in FLAME topology), and unposed data (i.e., registered data with the effects of global rotation, translation, and head rotation around the neck removed).
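The figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, not an official tool: the constant names and the `dataset_summary` helper are hypothetical, and because the total duration is given as "about 29 minutes", the derived frame count and mean sequence length are approximate.

```python
# Figures taken from the dataset description; names are illustrative only.
SUBJECTS = 6 + 6              # 6 female + 6 male subjects
SEQUENCES_PER_SUBJECT = 40    # sentences recorded per subject
TOTAL_MINUTES = 29            # "about 29 minutes", so results are approximate
FPS = 60                      # scanner capture rate


def dataset_summary():
    """Derive sequence count, approximate frame count, and mean sequence length."""
    total_sequences = SUBJECTS * SEQUENCES_PER_SUBJECT
    total_seconds = TOTAL_MINUTES * 60
    total_frames = total_seconds * FPS
    mean_sequence_seconds = total_seconds / total_sequences
    return {
        "total_sequences": total_sequences,
        "approx_total_frames": total_frames,
        "approx_mean_sequence_seconds": mean_sequence_seconds,
    }


summary = dataset_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs the calculation gives 480 sequences, roughly 104,400 scan frames, and a mean sequence length of about 3.6 seconds, which is consistent with the stated range of three to five seconds per sentence.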