MedICaT

A Dataset of Medical Images, Captions, and Textual References

MedICaT is a dataset of medical images, captions, subfigure-subcaption annotations, and inline textual references. Instructions for access are provided here. Figures and captions are extracted from open access articles in PubMed Central, and the corresponding reference text is derived from S2ORC. The dataset consists of:

- 217,060 figures from 131,410 open access papers
- 7,507 subcaption and subfigure annotations for 2,069 compound figures
- Inline references for ~25K figures in the ROCO dataset
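To give a sense of scale, the counts listed above can be turned into simple averages. The sketch below only uses the numbers stated in this description; it assumes each subcaption/subfigure annotation corresponds to one annotated subfigure, which is an interpretation rather than something the description states explicitly.

```python
# Summary counts as stated in the MedICaT description.
FIGURES = 217_060               # figures extracted from PubMed Central papers
PAPERS = 131_410                # open access papers those figures come from
SUBFIGURE_ANNOTATIONS = 7_507   # subcaption/subfigure annotations
COMPOUND_FIGURES = 2_069        # compound figures carrying those annotations


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimal places."""
    return round(numerator / denominator, 2)


# Average number of figures per source paper.
figs_per_paper = ratio(FIGURES, PAPERS)
# Average number of annotations per compound figure
# (assumption: one annotation per subfigure).
subfigs_per_compound = ratio(SUBFIGURE_ANNOTATIONS, COMPOUND_FIGURES)

print(f"figures per paper: {figs_per_paper}")
print(f"annotations per compound figure: {subfigs_per_compound}")
```

Running this prints roughly 1.65 figures per paper and 3.63 annotations per annotated compound figure.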

Author: Allen Institute for AI
Task: Medical Images
Annotation Types: Semantic Segmentation
Items: 131,000
Classes: 28
Labels: 217,000
Last updated on January 20, 2022
License: Research Only