Recognizing text from degraded and lowresolution document images is still an open
challenge in the vision community. Existing text recognition systems require a certain
resolution and fails if the document is of lowresolution or heavily degraded or noisy.
This paper presents an end-to-end trainable deep-learning based framework for
joint optimization of document enhancement and recognition. We are using a generative
adversarial network (GAN) based framework to perform image denoising followed by
deep back projection network (DBPN) for super-resolution and use these super-resolved
features to train a bidirectional long short term memory (BLSTM) with Connectionist
Temporal Classification (CTC) for recognition of textual sequences.
The entire network is end-to-end trainable and we obtain improved results than
state-of-the-art for both the image enhancement and document recognition tasks. We
demonstrate results on both printed and handwritten degraded document datasets to
show the generalization capability of our proposed robust framework.