
LAION-400M

The world’s largest openly available image-text-pair dataset with 400 million samples

The LAION-400M dataset is completely open and freely accessible. All images and texts in the dataset have been filtered with OpenAI's CLIP: the cosine similarity between the text and image embeddings is calculated, and pairs with a similarity below 0.3 are dropped. The threshold of 0.3 was determined through human evaluations and seems to be a good heuristic for estimating whether the content of an image and its text match semantically. The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.
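The filtering rule described above can be illustrated with a minimal sketch. This is not LAION's actual pipeline: the helper names (`cosine_similarity`, `keep_pair`) are hypothetical, and the short toy vectors stand in for real CLIP embeddings, which would come from running the CLIP image and text encoders. Only the 0.3 threshold is taken from the description.

```python
import math

# Threshold stated in the dataset description; pairs below it are dropped.
SIMILARITY_THRESHOLD = 0.3


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keep_pair(image_embedding, text_embedding, threshold=SIMILARITY_THRESHOLD):
    """Return True if the image-text pair meets the similarity threshold."""
    return cosine_similarity(image_embedding, text_embedding) >= threshold


# Toy 3-dimensional vectors (assumption: stand-ins for real CLIP embeddings).
pairs = [
    ("matching caption", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ("unrelated caption", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]

# Keep only the pairs whose embeddings are similar enough.
kept = [name for name, image_emb, text_emb in pairs if keep_pair(image_emb, text_emb)]
print(kept)
```

In this toy example the first pair has nearly parallel vectors and is kept, while the second pair is orthogonal (similarity 0) and is dropped.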

Task: Image Captioning
Annotation Types: Bounding Boxes
Items: 400,000,000
Classes: 7
Labels: 400,000,000
Last updated on October 31, 2023
Licensed under CC-BY