Nepali Speech Emotion Detection Using Deep Learning

Authors

  • Uttam Pandeya, Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
  • Basanta Joshi, Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/injet.v3i2.95516

Keywords:

Speech Emotion Recognition, Nepali Language, Deep Learning, MFCC, CNN

Abstract

Emotionally intelligent human-computer interaction depends on Speech Emotion Recognition (SER), which attempts to recognize emotional states from speech. Research on SER for low-resource languages such as Nepali remains limited. This work presents a deep learning-based Nepali speech emotion detection system that uses Mel-Frequency Cepstral Coefficients (MFCCs) as input features to a one-dimensional Convolutional Neural Network (1D-CNN). To build a dedicated Nepali emotional speech dataset, 1,810 audio samples (632 happy, 560 neutral, and 618 sad utterances) were gathered from studio recordings, mobile recordings, podcasts, and broadcast sources. Every audio sample was preprocessed, resampled to 16 kHz, and converted to mono. MFCC features were then extracted and fed to the 1D-CNN model. Experimental results show that the proposed model achieves an overall accuracy of 88% on the Nepali dataset. The Sad class performed best, with a precision of 0.96, recall of 0.92, and F1-score of 0.94. The Neutral class achieved a precision of 0.89 and an F1-score of 0.81, while the Happy class achieved a recall of 0.98 and an F1-score of 0.89. ROC analysis showed strong discrimination, with AUC values of 0.97 for Neutral and 0.99 for both Happy and Sad.
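
The per-class figures in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' evaluation code: the helper name `f1_score` and the dictionary `class_counts` are assumptions introduced here, and the only inputs are numbers quoted in the abstract. It uses the standard definition of the F1-score as the harmonic mean of precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Per-class utterance counts as reported in the abstract.
# (The dictionary name is hypothetical, chosen for this example.)
class_counts = {"happy": 632, "neutral": 560, "sad": 618}
total_samples = sum(class_counts.values())

# Precision and recall for the Sad class, as reported in the abstract.
sad_f1 = f1_score(precision=0.96, recall=0.92)

print(total_samples)     # 1810
print(round(sad_f1, 2))  # 0.94
```

Both results agree with the abstract: the three class counts sum to the stated 1,810 samples, and the reported Sad precision and recall reproduce the reported F1-score of 0.94 after rounding to two decimals.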

Published

2026-06-18

How to Cite

Pandeya, U., & Joshi, B. (2026). Nepali Speech Emotion Detection Using Deep Learning. International Journal on Engineering Technology, 3(2), 133–141. https://doi.org/10.3126/injet.v3i2.95516