The WikiScenes Dataset

Large-scale dataset of landmark photo collections with descriptions

Our WikiScenes dataset consists of paired images and language descriptions capturing world landmarks and cultural sites, with associated 3D models and camera poses. WikiScenes is derived from the massive public catalog of freely licensed, crowdsourced data in the Wikimedia Commons project, which contains a large variety of images with captions and other metadata. We extract two forms of textual description for each image: (1) captions associated with the image, which describe it in free-form language, and (2) the WikiCategory hierarchy, obtained from the WikiCategories associated with each image (see the examples in the image below). Overall, WikiScenes contains approximately 63K images with textual descriptions.
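
To make the two forms of description concrete, here is a minimal sketch of how one image record might be held in memory. It is an illustration only and not the dataset's actual file format or loader: the class `SceneImage`, its field names, the helper functions, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneImage:
    """Hypothetical record pairing one image with its two text descriptions."""
    filename: str        # illustrative file name, not a real dataset path
    caption: str         # free-form caption attached to the image
    category_path: List[str] = field(default_factory=list)  # WikiCategories, root first


def category_depth(record: SceneImage) -> int:
    """Number of WikiCategory levels recorded for the image."""
    return len(record.category_path)


def shared_prefix(a: SceneImage, b: SceneImage) -> List[str]:
    """Longest leading run of categories that two images have in common."""
    prefix = []
    for x, y in zip(a.category_path, b.category_path):
        if x != y:
            break
        prefix.append(x)
    return prefix


# Made-up example values, for illustration only.
facade = SceneImage(
    "img_0001.jpg",
    "West facade at sunset",
    ["Cathedral", "Exterior", "Facade"],
)
portal = SceneImage(
    "img_0002.jpg",
    "Detail of the central portal",
    ["Cathedral", "Exterior", "Portals"],
)
print(shared_prefix(facade, portal))
```

Storing the hierarchy as a root-first list keeps the caption and the category chain side by side, so images that share a branch of the hierarchy can be grouped by comparing list prefixes.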

Tsinghua University
Task: Image Captioning
Annotation Types: Semantic Segmentation
Items: 63000
Classes: 11
Labels: 63000
Last updated on October 31, 2023
Licensed under CC-BY-SA