Lip Reading Using Convolutional Neural Networks

Authors

  • Chirag Khatiwada Department of Computer and Electronics Engineering, Khwopa College of Engineering, Libali, Bhaktapur, Nepal 44800
  • Bishesh Pokharel Department of Computer and Electronics Engineering, Khwopa College of Engineering, Libali, Bhaktapur, Nepal 44800
  • Mahim Rawal Department of Computer and Electronics Engineering, Khwopa College of Engineering, Libali, Bhaktapur, Nepal 44800
  • Rowel Maharjan Department of Computer and Electronics Engineering, Khwopa College of Engineering, Libali, Bhaktapur, Nepal 44800

DOI:

https://doi.org/10.3126/joeis.v4i1.81574

Keywords:

Lip Reading, Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM)

Abstract

Lip reading, or the decoding of speech from facial movements, is crucial for enhancing communication for individuals with hearing or speech impairments, as well as for generating accurate captions when audio is compromised. Traditional Automatic Speech Recognition (ASR) systems often fail in noisy environments, creating a need for robust visual-based alternatives. The main objective of this study was to develop and evaluate a highly accurate, visual-only automated lip-reading system based on a novel deep-learning architecture.

The methodology employed a hybrid model that combined 3D Convolutional Neural Networks (CNNs) for spatial feature extraction from video frames and Bidirectional Long Short-Term Memory (BiLSTM) networks to analyze temporal dependencies. This model was trained on the GRID corpus dataset, which contains thousands of spoken sentences. Performance was evaluated using Word Error Rate (WER) and Character Error Rate (CER) metrics.
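The hybrid architecture described above can be sketched as follows. This is a minimal illustrative model, not the authors' exact network: the layer counts, channel widths, input resolution, and vocabulary size are all assumptions, and the per-frame character logits are laid out for CTC-style decoding as is common in sentence-level lip reading.

```python
import torch
import torch.nn as nn

class CnnBiLstmLipReader(nn.Module):
    """Illustrative 3D-CNN + BiLSTM lip-reading model (sizes are assumptions)."""

    def __init__(self, vocab_size=28):  # e.g. 26 letters + space + CTC blank
        super().__init__()
        # 3D convolutions extract spatio-temporal features from the mouth region;
        # pooling halves only the spatial dimensions, preserving the frame axis.
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        # Adaptive pooling fixes the spatial size so the LSTM input width is known.
        self.pool = nn.AdaptiveAvgPool3d((None, 4, 8))
        # BiLSTM models temporal dependencies across frames in both directions.
        self.lstm = nn.LSTM(64 * 4 * 8, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 128, vocab_size)

    def forward(self, x):                 # x: (batch, 1, frames, H, W)
        f = self.pool(self.cnn(x))        # (batch, 64, frames, 4, 8)
        b, c, t, h, w = f.shape
        f = f.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.lstm(f)             # (batch, frames, 256)
        return self.fc(out)               # per-frame character logits

model = CnnBiLstmLipReader()
clip = torch.zeros(2, 1, 75, 48, 96)      # 75 grayscale mouth-crop frames
print(model(clip).shape)                  # torch.Size([2, 75, 28])
```

In a GRID-style setup the per-frame logits would typically be trained with a CTC loss and greedily decoded into characters; those details are omitted here.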

The implemented model demonstrated strong performance, achieving an average WER of 0.1706 and an average CER of 0.0712 on 50 unseen test videos. This translates to a word prediction accuracy of approximately 83% and a character prediction accuracy of approximately 93%. The study concludes that the hybrid CNN-BiLSTM architecture is highly effective for visual speech recognition. The findings have significant implications for creating practical assistive technologies that can serve as a hearing aid for the deaf and a voice for the mute, ultimately improving accessibility and communication.
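The WER and CER metrics reported above are both normalized Levenshtein edit distances, computed over words and characters respectively. A minimal reference implementation (the GRID-style sentence below is a made-up example, not from the paper's test set):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # deletions
    for j in range(n + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # delete
                           dp[i][j - 1] + 1,      # insert
                           dp[i - 1][j - 1] + cost)  # substitute
    return dp[m][n]

def wer(ref, hyp):
    """Word Error Rate: word-level edits / reference word count."""
    words = ref.split()
    return edit_distance(words, hyp.split()) / len(words)

def cer(ref, hyp):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)

ref = "place blue at f two now"          # hypothetical GRID-style sentence
hyp = "place blue at f too now"          # one substituted word
print(round(wer(ref, hyp), 4))           # 0.1667  (1 of 6 words wrong)
print(round(cer(ref, hyp), 4))           # 0.0435  (1 of 23 characters wrong)
```

A WER of 0.1706 therefore means roughly one word in six needed an insertion, deletion, or substitution to match the ground-truth transcript.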



Published

2025-07-21

How to Cite

Khatiwada, C., Pokharel, B., Rawal, M., & Maharjan, R. (2025). Lip Reading Using Convolutional Neural Networks. Journal of Engineering Issues and Solutions, 4(1), 187–198. https://doi.org/10.3126/joeis.v4i1.81574

Section

Research Articles