ExerLiteNet: Lightweight CNN-LSTM Architecture for Binary Exercise Recognition from Webcam RGB Video
DOI:
https://doi.org/10.3126/injet.v3i2.95537
Keywords:
Exercise Recognition, MobileNetV3Small, CNN-LSTM, Spatiotemporal Learning, Transfer Learning, RGB Video, Lightweight Deep Learning, Fine-tuning, Overfit
Abstract
Exercise recognition from video is important for building digital workout tracking systems and automated fitness monitoring tools that can provide coaching without expensive equipment. This paper presents ExerLiteNet, a lightweight deep learning model designed for resource-constrained devices, which uses standard RGB webcam video to classify two common resistance exercises: squats and bicep curls. The proposed model combines a fine-tuned MobileNetV3Small CNN, wrapped in a time-distributed layer to extract spatial features from each video frame, with a Long Short-Term Memory (LSTM) network that captures motion patterns across a sequence of 15 frames. The training data consisted of 299 videos across the two exercise classes, each averaging 10 seconds in length, collected from both stock video sources and webcam recordings. After fine-tuning, the model achieved a classification accuracy of 92.08% on the test dataset, outperforming a frozen ResNet50 baseline (82.31%) and an intermediate frozen MobileNetV3Small + LSTM configuration (87.08%). In terms of computational efficiency, the proposed model has a total size of approximately 178 MB and requires only 1.8 GFLOPs per inference sequence, substantially lower than VGG16+LSTM at approximately 553 MB and 15.43 GFLOPs, and lower than ResNet50 at approximately 3.8 GFLOPs, which has no temporal modeling capability. The model also runs at approximately 15 frames per second with an end-to-end inference latency of 2.3 seconds per sequence on a standard webcam setup. These results show that combining a lightweight convolutional architecture with sequence modeling can achieve competitive accuracy while remaining practical for deployment on everyday hardware, without specialized sensors or depth cameras.
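To make the efficiency and accuracy comparison in the abstract concrete, the reported figures can be turned into simple ratios. The following is a minimal sketch that uses only the approximate values quoted above; the dictionary layout and the helper names `ratio_vs_proposed` and `accuracy_gain` are illustrative assumptions, not part of the paper.

```python
# Approximate figures as quoted in the abstract.
# The ResNet50 model size is not stated there, so it is left as None.
REPORTED = {
    "ExerLiteNet": {"size_mb": 178.0, "gflops": 1.8},
    "VGG16+LSTM": {"size_mb": 553.0, "gflops": 15.43},
    "ResNet50": {"size_mb": None, "gflops": 3.8},
}

PROPOSED_ACCURACY = 92.08  # test accuracy (%) after fine-tuning


def ratio_vs_proposed(metric, baseline, proposed="ExerLiteNet"):
    """Return baseline / proposed for a metric, or None if a value is missing."""
    base = REPORTED[baseline][metric]
    prop = REPORTED[proposed][metric]
    if base is None or prop is None:
        return None
    return base / prop


def accuracy_gain(baseline_accuracy):
    """Return the accuracy difference in percentage points, rounded to 2 places."""
    return round(PROPOSED_ACCURACY - baseline_accuracy, 2)


if __name__ == "__main__":
    for name in ("VGG16+LSTM", "ResNet50"):
        for metric in ("gflops", "size_mb"):
            r = ratio_vs_proposed(metric, name)
            text = "not reported" if r is None else f"{r:.2f}x"
            print(f"{name} {metric} relative to ExerLiteNet: {text}")
    print("Gain over frozen ResNet50:", accuracy_gain(82.31), "points")
    print("Gain over frozen MobileNetV3Small + LSTM:", accuracy_gain(87.08), "points")
```

Under these reported values, VGG16+LSTM needs roughly 8.6 times the FLOPs and about 3.1 times the storage of the proposed model, and ResNet50 needs roughly 2.1 times the FLOPs. Because the source figures are themselves approximate, the ratios should be read as rough indications only.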
License
Copyright (c) 2026 International Journal on Engineering Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.