Dataset for Text Detection and Recognition
To facilitate new research in document recognition, we introduce the Distorted Document Images dataset (DDI-100). To create the dataset, we collected 6658 unique document pages and extended them by applying different types of distortions and geometric transformations. In total, DDI-100 contains 99870 document images (an average of 15 per source page), together with text masks, stamp masks, and bounding boxes giving text and character locations.
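When a page is geometrically transformed, its bounding-box annotations must be transformed with it. The following is a minimal sketch of one way this could be done, assuming the transformation is expressed as a 3x3 homography. The function name `transform_box` and the example matrix are hypothetical illustrations, not part of the DDI-100 tooling.

```python
import numpy as np


def transform_box(box, H):
    """Map an axis-aligned box (x_min, y_min, x_max, y_max) through a
    3x3 homography H and return the axis-aligned box that encloses the
    four transformed corners. Hypothetical helper, for illustration only."""
    x0, y0, x1, y1 = box
    # Four corners in homogeneous coordinates, one corner per row.
    corners = np.array([
        [x0, y0, 1.0],
        [x1, y0, 1.0],
        [x1, y1, 1.0],
        [x0, y1, 1.0],
    ])
    mapped = corners @ H.T
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    mapped = mapped[:, :2] / mapped[:, 2:3]
    x_min, y_min = mapped.min(axis=0)
    x_max, y_max = mapped.max(axis=0)
    return (float(x_min), float(y_min), float(x_max), float(y_max))


# Assumed example transform: rotate 90 degrees about the origin,
# then shift 100 pixels along x.
H = np.array([
    [0.0, -1.0, 100.0],
    [1.0,  0.0,   0.0],
    [0.0,  0.0,   1.0],
])

new_box = transform_box((10, 20, 30, 40), H)
print(new_box)
```

Taking the enclosing axis-aligned box is a simplification: for strong rotations or perspective warps it overestimates the text region, so a polygon or the pixel-level mask may be a more faithful annotation.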