AI is driven by data—not code.
This bold statement might have sounded outlandish a few years ago, but not anymore.
There is still one problem, though.
But worry not.
In this article, we've put together a comprehensive list of quality computer vision datasets that you can access for free.
Have a look.
The COVID-19 X-Ray dataset is V7’s original dataset containing 6,500 AP/PA chest X-ray images with pixel-level polygonal lung segmentations. Of these, 517 are COVID-19 cases.
Each image contains:
Lung annotations are polygons following pixel-level boundaries. You can export them in COCO, VOC, or Darwin JSON formats. Each annotation file contains a URL to the original full-resolution image and to a reduced-size thumbnail.
For more details, check out: COVID-19 X-Ray dataset (GitHub)
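To make the polygon annotations more concrete, here is a minimal Python sketch, assuming a COCO-style export in which each polygon is a flat list of alternating x and y coordinates. It derives a bounding box and an area from one outline. The function name and the sample coordinates are made up for illustration and are not part of the dataset.

```python
def polygon_bbox_and_area(flat_polygon):
    """Return ([x, y, width, height], area) for a flat [x1, y1, x2, y2, ...] polygon."""
    if len(flat_polygon) < 6 or len(flat_polygon) % 2 != 0:
        raise ValueError("expected at least three (x, y) pairs")
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]

    # Shoelace formula: sum the cross products of consecutive vertices.
    twice_area = 0.0
    count = len(xs)
    for i in range(count):
        j = (i + 1) % count  # wrap around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]

    # COCO-style box: top-left corner plus width and height.
    bbox = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
    return bbox, abs(twice_area) / 2.0


# Hypothetical rectangular outline, used only to exercise the function.
sample_outline = [10, 10, 50, 10, 50, 90, 10, 90]
sample_bbox, sample_area = polygon_bbox_and_area(sample_outline)
print(sample_bbox, sample_area)
```

A real lung outline has many more vertices, but the same two steps apply.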
CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
CIFAR-10 contains 60,000 color images of 32x32 pixels in 10 classes (animals and real-life objects). There are 6,000 images per class. This dataset has 50,000 training images and 10,000 test images. The classes are mutually exclusive, without any overlaps.
CIFAR-100 consists of 100 classes containing 600 images each. There are 500 training images and 100 testing images per class.
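In the Python version of CIFAR-10, each image is stored as a flat row of 3,072 values: 1,024 red, then 1,024 green, then 1,024 blue, with each color plane in row-major order. The sketch below shows one way to read a single pixel from such a row. The helper name and the synthetic row are illustrative assumptions; real code would more likely reshape the array with NumPy.

```python
SIDE = 32            # CIFAR images are 32x32 pixels
PLANE = SIDE * SIDE  # 1,024 values per color channel


def pixel_at(flat_row, row, col):
    """Return the (red, green, blue) values of one pixel from a flat 3,072-value row."""
    if len(flat_row) != 3 * PLANE:
        raise ValueError("expected 3072 values (32 * 32 * 3)")
    if not (0 <= row < SIDE and 0 <= col < SIDE):
        raise IndexError("row and col must be in 0..31")
    offset = row * SIDE + col  # position inside a single color plane
    return (flat_row[offset], flat_row[PLANE + offset], flat_row[2 * PLANE + offset])


# Synthetic row whose values equal their own index, so offsets are easy to check.
synthetic_row = list(range(3 * PLANE))
print(pixel_at(synthetic_row, 1, 2))
```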
ImageNet is one of the most popular image databases with more than 14 million hand-annotated images.
This database is organized according to the WordNet hierarchy (currently only the nouns), in which hundreds and thousands of images depict each node of the hierarchy. Object-level annotations provide a bounding box around the (visible part of the) indicated object.
Kinetics-700 is a large video dataset consisting of 650,000 clips covering 700 human action classes.
The videos include human-object interactions, such as playing instruments, and human-human interactions, such as hugging. Each action class has at least 700 video clips. Each clip lasts about 10 seconds and is annotated with a single action class.
MNIST is a large database of handwritten single digits containing 60,000 training images and 10,000 testing images.
It was released in 1999 and is used for classification tasks.
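The original MNIST files use the simple IDX binary format: a big-endian header followed by raw pixel bytes. As a rough sketch, an image file could be parsed like this. The function name is hypothetical, and the two tiny 2x2 "images" are synthetic stand-ins for real 28x28 digits.

```python
import struct

IDX_IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, three dimensions


def parse_idx_images(raw):
    """Split the bytes of an IDX image file into one bytes object per image."""
    # Header: magic number, image count, rows, columns (all big-endian uint32).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number {magic}")
    pixels = raw[16:]
    size = rows * cols
    if len(pixels) != count * size:
        raise ValueError("pixel data does not match the header")
    images = [pixels[i * size:(i + 1) * size] for i in range(count)]
    return images, rows, cols


# Synthetic file: two 2x2 images instead of 60,000 images of 28x28.
fake_file = struct.pack(">IIII", IDX_IMAGE_MAGIC, 2, 2, 2) + bytes(range(8))
digit_images, n_rows, n_cols = parse_idx_images(fake_file)
print(len(digit_images), n_rows, n_cols)
```

The downloadable files are usually gzip-compressed, so in practice the bytes would first be read with `gzip.open(path, "rb").read()`.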
LSUN (Large-scale Scene Understanding) contains close to one million labeled images for each of 10 scene categories and 20 object categories.
For training data, each category contains from 120,000 to 3,000,000 images. For each category, the validation data includes 300 images and the test data 1,000 images.
IMDB-WIKI is one of the largest publicly available datasets of human faces labeled with gender, age, and name.
It contains 523,051 face images in total: 460,723 images of 20,284 celebrities from IMDb and 62,328 images from Wikipedia.
The MS COCO (Microsoft Common Objects in Context) dataset consists of 328K images. It contains annotations for object detection, keypoint detection, panoptic segmentation, stuff image segmentation, captioning, and dense human pose estimation.
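COCO detection annotations ship as a single JSON file with top-level "images", "annotations", and "categories" lists, where each annotation points to its image and category by id. A minimal sketch of grouping boxes by image might look like this. The helper name and the miniature sample are made up for illustration and are not real COCO data.

```python
import json
from collections import defaultdict

# Made-up miniature of a COCO-style detection file (not real COCO data).
SAMPLE_COCO = json.loads("""
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 100, "image_id": 1, "category_id": 18, "bbox": [10, 20, 30, 40]},
    {"id": 101, "image_id": 1, "category_id": 1, "bbox": [50, 60, 70, 80]}
  ]
}
""")


def boxes_by_image(coco):
    """Map each image id to a list of (category name, [x, y, width, height]) pairs."""
    names = {category["id"]: category["name"] for category in coco["categories"]}
    grouped = defaultdict(list)
    for annotation in coco["annotations"]:
        label = names[annotation["category_id"]]
        grouped[annotation["image_id"]].append((label, annotation["bbox"]))
    return dict(grouped)


grouped_boxes = boxes_by_image(SAMPLE_COCO)
print(grouped_boxes)
```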
Labeled Faces in the Wild (LFW) is a large-scale database of more than 13,000 face photographs designed for facial recognition tasks. Each face has been labeled with the person’s name.
Cityscapes is a database containing a diverse set of stereo video sequences recorded in street scenes from 50 different cities. The images were captured over time in various lighting and weather conditions.
The Cityscapes dataset includes semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories. It provides 5,000 frames with pixel-level annotations and 20,000 coarsely annotated frames.
LabelMe-12-50k contains 50,000 JPEG images (40,000 for training and 10,000 for testing) in 12 classes. The images are extracted from LabelMe.
Classes include objects such as a car, a person, a tree, or a keyboard. In both the training and testing sets, 50% of the images show a centered object, while the remaining 50% show a randomly selected region of a randomly selected image ("clutter").
This dataset can be used for object recognition.
The Places dataset consists of 2.5 million images, each with a category label, across 205 scene categories. There are more than 5,000 images per category. It is used to train CNNs for scene recognition tasks.
Places2 is another dataset contributed by MIT. It has 1.8 million images from 365 scene categories. The dataset contains 50 images per category in the validation set and 900 images per category in the testing set. The Places2 database can be used for scene recognition and for learning generic deep scene features for visual recognition.
Visual Genome is a large dataset and knowledge base of 108,077 images with annotated objects, attributes, and relationships.
The Stanford Dogs dataset has been built using images and annotations (class labels, bounding boxes) from ImageNet. It is a large-scale dataset containing 20,580 images of 120 dog breeds from around the world.
The Stanford Cars dataset contains 16,185 images in 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images, with each class divided roughly 50-50 between the two.
The images are downloaded separately from their class labels and bounding boxes.
The CAT dataset includes over 9,000 cat images with annotated facial features. Each image has nine annotated points on the cat’s head: two for the eyes, one for the mouth, and six for the ears.
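As an illustration of the nine-point scheme, the sketch below parses one annotation into named points. The layout is an assumption: a single line holding the point count followed by x and y pairs, in the order eyes, mouth, then three points per ear. The point names and the sample values are hypothetical, so check them against the files you actually download.

```python
# Assumed point order; verify against the dataset's own documentation.
POINT_NAMES = [
    "left_eye", "right_eye", "mouth",
    "left_ear_1", "left_ear_2", "left_ear_3",
    "right_ear_1", "right_ear_2", "right_ear_3",
]


def parse_cat_annotation(text):
    """Turn 'count x1 y1 x2 y2 ...' into a dict of point name -> (x, y)."""
    values = [int(value) for value in text.split()]
    count, coords = values[0], values[1:]
    if count != len(POINT_NAMES) or len(coords) != 2 * count:
        raise ValueError("expected 9 points, i.e. 18 coordinates")
    points = zip(coords[0::2], coords[1::2])  # pair up x and y values
    return dict(zip(POINT_NAMES, points))


# Hypothetical annotation line, used only to exercise the parser.
sample_line = "9 175 160 239 162 199 199 149 121 137 78 166 93 281 101 312 96 296 133"
cat_points = parse_cat_annotation(sample_line)
print(cat_points["mouth"])
```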
CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations. The dataset covers 10,177 unique identities and provides five landmark locations per image.
The dataset can be used as training and test sets for face detection, face attribute recognition, and landmark (or facial part) localization.
This dataset contains 853 images belonging to 3 classes, with bounding boxes in the PASCAL VOC format. The classes are “with mask”, “without mask”, and “mask worn incorrectly”.
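PASCAL VOC annotations are XML files with one "object" element per box, each holding a "name" and a "bndbox" with xmin, ymin, xmax, and ymax. Here is a minimal sketch of reading them with the standard library. The sample XML, including the file name and the exact label strings, is invented for illustration and may not match the dataset's own spelling.

```python
import xml.etree.ElementTree as ET

# Invented example in PASCAL VOC layout; labels and numbers are hypothetical.
SAMPLE_VOC = """<annotation>
  <filename>example.png</filename>
  <size><width>400</width><height>300</height><depth>3</depth></size>
  <object>
    <name>with_mask</name>
    <bndbox><xmin>79</xmin><ymin>105</ymin><xmax>109</xmax><ymax>142</ymax></bndbox>
  </object>
  <object>
    <name>without_mask</name>
    <bndbox><xmin>185</xmin><ymin>100</ymin><xmax>226</xmax><ymax>144</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) from VOC-style XML."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        corners = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), corners))
    return boxes


mask_boxes = read_voc_boxes(SAMPLE_VOC)
print(mask_boxes)
```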
This dataset contains more than 7,000 unique images in HD resolution.
It consists of early fire and smoke images captured using mobile phones in real-world scenarios. The images were captured under a wide variety of lighting and weather conditions. This dataset can be used for fire and smoke recognition, detection, and anomaly detection.
It also contains various domestic scenes, including garbage burning, field crop burning, and domestic cooking.
FloodNet consists of high-resolution UAS imagery with detailed semantic annotation of the damage caused by hurricanes.
The data was collected with small UAS platforms, DJI Mavic Pro quadcopters, after Hurricane Harvey. The whole dataset has 2,343 images, divided into training (~60%), validation (~20%), and test (~20%) sets.
PS. The FloodNet dataset was annotated using V7.
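For a sense of what an approximate 60/20/20 split of the 2,343 FloodNet images looks like, here is a small sketch that shuffles image indices with a fixed seed and cuts them by ratio. The function name, the seed, and the rounding rule are assumptions, and the resulting counts are not the official split sizes. If the dataset ships its own split, use that instead.

```python
import random


def split_indices(total, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle range(total) reproducibly and cut it into train, validation, test."""
    indices = list(range(total))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_train = round(total * ratios[0])
    n_val = round(total * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the test set takes whatever remains
    return train, val, test


train_ids, val_ids, test_ids = split_indices(2343)
print(len(train_ids), len(val_ids), len(test_ids))
```

Giving the remainder to the last set guarantees that every image lands in exactly one split, even when the ratios do not divide the total evenly.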