Nepali Speech Emotion Detection Using Deep Learning

Uttam Pandeya; Basanta Joshi

doi:10.3126/injet.v3i2.95516

Authors

Uttam Pandeya, Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal
Basanta Joshi, Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Lalitpur, Nepal

Keywords:

Speech Emotion Recognition, Nepali Language, Deep Learning, MFCC, CNN

Abstract

Emotionally intelligent human-computer interaction systems depend on Speech Emotion Recognition (SER), which attempts to recognize emotional states from speech. Research on SER for low-resource languages such as Nepali remains limited. This work presents a deep learning-based Nepali speech emotion detection system that combines Mel-Frequency Cepstral Coefficients (MFCCs) with a one-dimensional Convolutional Neural Network (1D-CNN). A dedicated Nepali emotional speech dataset of 1,810 audio samples (632 happy, 560 neutral, and 618 sad utterances) was assembled from studio recordings, mobile recordings, podcasts, and broadcast sources. Every audio sample was preprocessed, resampled to 16 kHz, and converted to mono. MFCC features were then extracted and fed to the 1D-CNN model. Experimental results show that the proposed model achieves an overall accuracy of 88% on the Nepali dataset. The Sad class performed best, with a precision of 0.96, recall of 0.92, and F1-score of 0.94. The Neutral class obtained a precision of 0.89 and an F1-score of 0.81, while the Happy class obtained a recall of 0.98 and an F1-score of 0.89. ROC analysis showed strong discrimination, with AUC values of 0.97 for Neutral and 0.99 for both Happy and Sad.
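
The evaluation figures above (accuracy, per-class precision, recall and F1, and per-class AUC) follow standard definitions. The Python below is a minimal sketch of those definitions, not the authors' evaluation code. It computes per-class precision, recall, and F1 from a confusion matrix, overall accuracy, and a one-vs-rest AUC using the ranking definition (the probability that a random positive outscores a random negative). The helper names (`f1`, `per_class_metrics`, `accuracy`, `one_vs_rest_auc`) and the toy confusion matrix are hypothetical. The only values taken from the abstract are the Sad-class precision and recall, used to check that F1 = 2PR / (P + R) reproduces the reported 0.94.

```python
from typing import Dict, List, Sequence

# Class order assumed for the toy example; not taken from the paper.
CLASSES = ["happy", "neutral", "sad"]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(cm: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Precision, recall and F1 per class.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    out: Dict[int, Dict[str, float]] = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: all predictions of k
        actual_k = sum(cm[k])                           # row sum: all true samples of k
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        out[k] = {"precision": p, "recall": r, "f1": f1(p, r)}
    return out


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def one_vs_rest_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC AUC for one class versus the rest.

    Ranking definition: probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one, ties counted as 1/2.
    O(n_pos * n_neg), which is fine for small evaluation sets.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Check against the abstract's Sad-class figures (P = 0.96, R = 0.92).
    print(round(f1(0.96, 0.92), 2))  # 0.94

    # Hypothetical confusion matrix; rows = true class, columns = predicted.
    toy_cm = [[8, 1, 1],
              [2, 7, 1],
              [0, 1, 9]]
    print(f"accuracy = {accuracy(toy_cm):.2f}")
    for k, m in per_class_metrics(toy_cm).items():
        print(CLASSES[k], {name: round(v, 2) for name, v in m.items()})
```

For a three-class Happy/Neutral/Sad model, a per-class AUC would typically be obtained by passing one class's predicted probability as `scores` and membership in that class as `is_positive`. The abstract does not state exactly how its AUC values were computed, so this one-vs-rest reading is an assumption.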

Published

2026-06-18

How to Cite

Pandeya, U., & Joshi, B. (2026). Nepali Speech Emotion Detection Using Deep Learning. International Journal on Engineering Technology, 3(2), 133-141. https://doi.org/10.3126/injet.v3i2.95516

Issue

Vol. 3 No. 2 (2026): Special Issue, KEC Conference 2026

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.