A Dataset of Medical Images, Captions, and Textual References
MedICaT is a dataset of medical images, captions, subfigure-subcaption annotations, and inline textual references. Instructions for access are provided here.

Figures and captions are extracted from open access articles in PubMed Central and corresponding reference text is derived from S2ORC.

The dataset consists of:
- 217,060 figures from 131,410 open access papers
- 7,507 subcaption and subfigure annotations for 2,069 compound figures
- Inline references for ~25K figures in the ROCO dataset