Offline Handwritten Text Extraction and Recognition Using CNN-BLSTM-CTC Network

Authors

  • Ranila Shrestha, National College of Engineering, Talchikhel, Lalitpur, Nepal
  • Oshin Shrestha, National College of Engineering, Talchikhel, Lalitpur, Nepal
  • Monika Shakya, National College of Engineering, Talchikhel, Lalitpur, Nepal
  • Urja Bajracharya, National College of Engineering, Talchikhel, Lalitpur, Nepal
  • Subash Panday, National College of Engineering, Talchikhel, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/injet.v1i1.60941

Keywords:

handwritten forms, offline handwriting recognition, NCE Admission forms, image segmentation, CNN-BLSTM-CTC

Abstract

Offline handwriting recognition is a significant research area that can ease the problems encountered with handwritten forms in college application and registration processes. This study addresses offline recognition of English handwriting using a CNN-BLSTM-CTC neural network applied to the NCE Admission form. The system uses OpenCV for image processing and TensorFlow for training the neural network and recognizing handwritten text. The model is trained and tested on the IAM database, and recognition follows an image segmentation-based approach. Users upload an image of the NCE Admission form, which the system verifies: images that strictly comply with the specified format are accepted, and images that do not conform are rejected. Once a valid image is accepted, the form is processed to extract text from specific regions of interest (ROIs). The extracted text regions are passed to the text recognition block, and the recognized text is recorded in a CSV file under the respective fields. The text recognition model has a character error rate (CER) of approximately 9.33%. In a test on 15 NCE Admission forms, the average CER was approximately 12.2% for scanned images and 19.3% for camera-captured images. The results show that accuracy depends on factors such as the quality and orientation of the image; scanned images are therefore preferred for better performance.
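
The character error rate quoted in the abstract is conventionally defined as the character-level edit distance between the recognized text and the reference text, divided by the length of the reference. The sketch below illustrates that standard definition only; the abstract does not state how the authors computed CER, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # drop a reference character
                curr[j - 1] + 1,                  # add a hypothesis character
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a recognized field that differs from a four-character reference by one substituted character has a CER of 0.25; averaging the per-form values would give a figure comparable in kind to the averages reported above.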

Published

2023-12-21

How to Cite

Shrestha, R., Shrestha, O., Shakya, M., Bajracharya, U., & Panday, S. (2023). Offline Handwritten Text Extraction and Recognition Using CNN-BLSTM-CTC Network. International Journal on Engineering Technology, 1(1), 166–180. https://doi.org/10.3126/injet.v1i1.60941

Section

Articles