Self-Supervised and Semi-Supervised Learning for Nepali ASR with Limited Labeled Data

Authors

  • Rajesh Raskoti, Department of Electronics and Computer Engineering, Himalaya College of Engineering, Kathmandu, Nepal
  • Kobid Karkee, Department of Electronics and Computer Engineering, Thapathali Campus, Kathmandu, Nepal

DOI:

https://doi.org/10.3126/injet.v3i2.95511

Keywords:

Automatic Speech Recognition, Devanagari, Low-resource ASR, Nepali ASR, Pseudo-labeling, Self-Supervised Learning, Semi-Supervised Learning

Abstract

Nepali ASR suffers from a severe lack of manually transcribed data. This paper proposes a hybrid framework combining Self-Supervised Learning (SSL) and Semi-Supervised Learning (Semi-SL) to develop high-performance ASR systems under low-resource conditions. Three pre-trained transformer architectures, Wav2Vec2, XLSR-53, and CLSRIL-23, are adapted to the Nepali domain through a two-stage strategy: supervised CTC fine-tuning on limited labeled data, followed by an iterative pseudo-labeling loop that progressively incorporates unlabeled data. Experiments are conducted across four supervised training budgets (1h, 5h, 10h, 20h) and evaluated on 16,136 test utterances from the SLR54 Nepali speech corpus. Results demonstrate that the proposed framework achieves competitive performance, with XLSR-53 reaching 23.05% WER and 5.05% CER at 20h, comparable to a strong supervised baseline. Critically, the proposed method shows measurable data-efficiency gains at intermediate training sizes; for the XLSR-53 model at 20 hours, it reduces WER by 17.1% relative to the supervised baseline. Linguistic error analysis reveals that consonant confusion and vowel matra errors consistently account for 66% to 68% of all character-level errors, pointing to language-model integration as the highest-impact next step.
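
The WER, CER, and relative-reduction figures quoted in the abstract follow standard definitions, which can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names, the example sentence, and the example error rates are hypothetical, and it assumes that CER is counted over Unicode code points, so that a dropped vowel matra counts as one character error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate over Unicode code points (an assumption of this sketch)."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def relative_reduction(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`."""
    return (baseline - proposed) / baseline


# Hypothetical example: the hypothesis drops one vowel matra (the "aa" sign in the second word).
reference = "मेरो नाम राम हो"
hypothesis = "मेरो नम राम हो"

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")

# Hypothetical error rates, not taken from the paper.
print(f"Relative reduction: {relative_reduction(0.40, 0.30):.3f}")
```

In this example a single missing matra makes one of four words wrong, which shows why WER can stay well above CER when errors are concentrated in matras and similar-sounding consonants.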

Published

2026-06-18

How to Cite

Raskoti, R., & Karkee, K. (2026). Self-Supervised and Semi-Supervised Learning for Nepali ASR with Limited Labeled Data. International Journal on Engineering Technology, 3(2), 107–114. https://doi.org/10.3126/injet.v3i2.95511