
LAION-400M

The world’s largest openly available image-text-pair dataset with 400 million samples


The LAION-400M dataset is completely open and freely accessible. All images and texts in LAION-400M have been filtered with OpenAI's CLIP: the cosine similarity between each pair's text and image embeddings was calculated, and pairs with a similarity below 0.3 were dropped. The 0.3 threshold was determined through human evaluations and appears to be a good heuristic for estimating whether an image and its text match semantically. The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.
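The filtering rule described above reduces to a cosine-similarity threshold. The sketch below illustrates only that rule. It assumes the image and text embeddings have already been produced by a CLIP model and uses tiny hand-written 3-dimensional vectors as stand-ins (real CLIP embeddings have hundreds of dimensions). The function names `cosine_similarity` and `filter_pairs` are hypothetical and are not part of LAION's actual tooling.

```python
import math
from typing import List, Sequence, Tuple

# Threshold from the dataset description: pairs scoring below it are dropped.
SIMILARITY_THRESHOLD = 0.3

# Hypothetical record layout: (caption, image_embedding, text_embedding).
Pair = Tuple[str, Sequence[float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat degenerate (all-zero) embeddings as unrelated
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float = SIMILARITY_THRESHOLD) -> List[str]:
    """Return the captions of pairs whose image-text similarity is >= threshold."""
    kept = []
    for caption, image_emb, text_emb in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept


# Toy stand-in embeddings (hypothetical, not real CLIP output).
example_pairs: List[Pair] = [
    ("a photo of a dog", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),      # similarity ~0.99, kept
    ("dog on a beach", [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]),        # similarity ~0.45, kept
    ("click here to subscribe", [1.0, 0.0, 0.0], [1.0, 4.0, 0.0]),  # similarity ~0.24, dropped
]

if __name__ == "__main__":
    print(filter_pairs(example_pairs))
```

Running the script keeps the first two captions and drops the third, whose text embedding points in a direction largely unrelated to the image embedding. Because cosine similarity ignores vector length, only the direction of the two embeddings affects whether a pair survives the threshold.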

TRULY OPEN AI
https://openai.com
400,000,000 items
7 classes
400,000,000 labels
Last updated on January 20, 2022
Licensed under CC-BY