PadChest

A large chest x-ray image dataset with multi-label annotated reports

We present a labeled, large-scale, high-resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. The dataset includes more than 160,000 images from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017. It covers six different position views and includes additional information on image acquisition and patient demography. The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations, organized as a hierarchical taxonomy and mapped to standard Unified Medical Language System (UMLS) terminology. Of the reports, 27% were manually annotated by trained physicians, and the remaining set was labeled using a supervised method based on a recurrent neural network with attention mechanisms. The generated labels were validated on an independent test set, achieving a Micro-F1 score of 0.93. To the best of our knowledge, this is the public chest x-ray database annotated with the largest number of different labels suitable for training supervised models on radiographs, and the first one containing radiographic reports in Spanish.
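For readers unfamiliar with the Micro-F1 metric quoted above, the following is a minimal sketch of how a micro-averaged F1 score can be computed for multi-label annotations. It is not the authors' evaluation code: the function name `micro_f1` and the example labels are hypothetical and chosen only for illustration. Micro-averaging pools true positives, false positives and false negatives across all reports and labels before computing F1.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over paired per-report label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but not predicted
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels for three reports (not taken from PadChest).
truth = [{"cardiomegaly"}, {"pneumonia", "pleural effusion"}, {"normal"}]
pred = [{"cardiomegaly"}, {"pneumonia"}, {"normal", "nodule"}]
score = micro_f1(truth, pred)
print(f"Micro-F1: {score:.2f}")
```

Because the counts are pooled, frequent labels weigh more heavily in the result than rare ones, which is worth keeping in mind for a taxonomy with several hundred classes.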

Author: BIMCV
Task: Medical Images
Annotation Types: Semantic Segmentation
Items: 160000
Classes: 297
Labels: 160000
Last updated on January 20, 2022
Licensed under Research Only