
ScanRefer Dataset

3D Object Localization in RGB-D Scans using Natural Language

We introduce the new task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, whose core idea is to learn a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor correlates the language expressions with the underlying geometric features of the 3D scan and facilitates regression of the 3D bounding box of the target object. To train and benchmark our method, we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943 objects from 703 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expressions directly in 3D.
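The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the function name `fuse_and_score`, the feature sizes, the random inputs, and the single linear scoring layer are all assumptions made for illustration. Each hypothetical proposal feature is concatenated with the same sentence embedding, and a softmax over the resulting scores selects the proposal that best matches the description.

```python
import math
import random


def fuse_and_score(proposal_feats, sentence_emb, weights, bias=0.0):
    """Fuse each proposal feature with the sentence embedding and score it.

    Hypothetical helper: concatenation plus one linear layer stands in for
    whatever learned fusion a real model would use.
    """
    # Fused descriptor: proposal feature followed by the sentence embedding.
    fused = [list(feat) + list(sentence_emb) for feat in proposal_feats]
    # One linear score per proposal.
    logits = [sum(w * x for w, x in zip(weights, row)) + bias for row in fused]
    # Numerically stable softmax over all proposals.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return fused, probs


random.seed(0)
# Assumed sizes: 5 proposals with 16-dim features, an 8-dim sentence embedding.
proposal_feats = [[random.gauss(0, 1) for _ in range(16)] for _ in range(5)]
sentence_emb = [random.gauss(0, 1) for _ in range(8)]
weights = [random.gauss(0, 1) for _ in range(24)]

fused, probs = fuse_and_score(proposal_feats, sentence_emb, weights)
# Index of the proposal with the highest matching probability.
best = max(range(len(probs)), key=probs.__getitem__)
print("best proposal index:", best)
```

In a trained model the weights would be learned from the description and bounding box pairs, and the selected proposal's box would serve as the localization result.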

Technical University of Munich and Simon Fraser University.
Task: 3D Object Detection
Annotation Types: Bounding Boxes
Items: 46,173
Classes: 8
Labels: 9,943
Last updated on January 20, 2022
Licensed under CC-BY-NC-SA