
The Massively Multilingual Image Dataset

Words and their images in 100 languages


MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, words are stored parallel to images that represent the word, and parallel to the word’s translation into English (and its corresponding images). By far the largest dataset of its kind, it covers 100 languages (including English), with up to 10,000 words per language and many more for English.
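The doubly parallel structure described above can be pictured with a small sketch. The class name `WordEntry`, its fields, the file paths, and the sample words are all illustrative assumptions; they do not reflect the dataset's actual file layout or any official loader.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordEntry:
    """One word with its images and its English counterpart (hypothetical schema)."""
    language: str
    word: str
    # First parallel: images that represent the word itself.
    image_paths: List[str] = field(default_factory=list)
    # Second parallel: the English translation and its own images.
    english_translation: str = ""
    english_image_paths: List[str] = field(default_factory=list)


def build_index(entries: List[WordEntry]) -> Dict[str, Dict[str, WordEntry]]:
    """Group entries by language, then by word, for direct lookup."""
    index: Dict[str, Dict[str, WordEntry]] = {}
    for entry in entries:
        index.setdefault(entry.language, {})[entry.word] = entry
    return index


# Made-up sample data, for illustration only.
entries = [
    WordEntry("fr", "chat", ["fr/chat/01.jpg"], "cat", ["en/cat/01.jpg"]),
    WordEntry("es", "gato", ["es/gato/01.jpg"], "cat", ["en/cat/01.jpg"]),
]
index = build_index(entries)

entry = index["fr"]["chat"]
print(entry.word, "->", entry.english_translation)
```

Keeping both parallels on one record means a single lookup by language and word yields the word's images, its English translation, and the English images together.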

University of Pennsylvania
Task: Image Classification
Annotation types: Classification Tags
Last updated on October 31, 2023
Licensed under Research Only