A Synthetic Dataset for Semantic Amodal Instance Level Video Object Segmentation
SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation) is a dataset intended to stimulate research on semantic amodal segmentation. Humans can effortlessly recognize partially occluded objects and reliably estimate their spatial extent beyond the visible region. However, few modern computer vision techniques can reason about the occluded parts of an object. This is partly because very few image datasets, and no video datasets, exist that permit the development of such methods. To address this gap, we present SAIL-VOS, a synthetic dataset extracted from the photo-realistic game GTA-V.
The SAIL-VOS dataset contains 201 video sequences and 111,654 frames in total. The training set contains 160 video sequences (84,781 images, 1,388,389 objects), while the validation set contains 41 video sequences (26,873 images, 507,906 objects). In addition to the training and validation sets, we retain a test-dev set and a test-challenge set for future use.
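The split statistics above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in this description; the dictionary layout and the names `SPLITS` and `summarize` are hypothetical and are not part of any SAIL-VOS tooling. The test-dev and test-challenge sets are left out because their sizes are not given here.

```python
# Split sizes as stated in the SAIL-VOS description.
# The structure and names here are illustrative only (hypothetical),
# not an official SAIL-VOS API.
SPLITS = {
    "train": {"sequences": 160, "frames": 84_781, "objects": 1_388_389},
    "val": {"sequences": 41, "frames": 26_873, "objects": 507_906},
}


def summarize(splits):
    """Return totals across all splits plus per-split averages."""
    totals = {
        key: sum(split[key] for split in splits.values())
        for key in ("sequences", "frames", "objects")
    }
    averages = {
        name: {
            "frames_per_sequence": s["frames"] / s["sequences"],
            "objects_per_frame": s["objects"] / s["frames"],
        }
        for name, s in splits.items()
    }
    return totals, averages


if __name__ == "__main__":
    totals, averages = summarize(SPLITS)
    # Totals should agree with the stated 201 sequences and 111,654 frames.
    print(totals)
    for name, avg in averages.items():
        print(f"{name}: {avg['frames_per_sequence']:.1f} frames/sequence, "
              f"{avg['objects_per_frame']:.2f} objects/frame")
```

Run as a script, it prints combined totals that agree with the 201 sequences and 111,654 frames stated above, and it shows that validation frames contain somewhat more objects on average than training frames (roughly 18.9 vs. 16.4 objects per frame).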