Nepali Music Genre Classification Using CNN-SVM Hybrid Architecture
DOI:
https://doi.org/10.3126/injet.v3i2.95495
Keywords:
Music Genre Classification, CNN, SVM, Mel Spectrogram, Machine Learning, Audio Features
Abstract
A CNN–SVM hybrid model was developed to classify Nepali music into four local genres: Gazal, Lok Dohori, Nephop, and Pop. A dataset of 1,000 manually curated songs (250 per genre) was collected from YouTube. From each song, 30‑second clips were taken at 25%, 50%, and 75% of the track's duration, yielding approximately 3,000 audio segments. Each segment was converted into a 128×128 Log‑Mel spectrogram. A four‑layer CNN extracted a 64‑dimensional embedding from each spectrogram, and the embedding was passed to an SVM classifier. In the experiments, the CNN–SVM hybrid with an RBF kernel achieved 88.29% accuracy, outperforming the standalone CNN baseline (84.28%). Among the evaluated kernels, RBF and Linear tied for the highest accuracy at 88.29%, while the Sigmoid kernel performed worst at 79.60%. The results indicate that combining deep learning feature extraction with a traditional machine learning classifier is effective for Nepali music genre classification on moderate‑sized, domain‑specific datasets.
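The segmentation step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the abstract does not say whether each 30‑second clip starts at, or is centred on, the 25%, 50%, and 75% positions, so the sketch assumes the clip starts there and is shifted back if it would run past the end of the track. The function name `clip_windows` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

CLIP_SECONDS = 30.0               # clip length stated in the abstract
POSITIONS = (0.25, 0.50, 0.75)    # relative positions stated in the abstract


def clip_windows(
    track_seconds: float,
    clip_seconds: float = CLIP_SECONDS,
    positions: Sequence[float] = POSITIONS,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds, one window per position.

    Assumption: each clip *starts* at the given fraction of the track.
    A clip that would overrun the track is shifted back to end at the
    final second instead of being truncated.
    """
    if track_seconds < clip_seconds:
        raise ValueError("track is shorter than one clip")
    windows = []
    for fraction in positions:
        start = fraction * track_seconds
        # Clamp so that start + clip_seconds never exceeds the track length.
        start = min(start, track_seconds - clip_seconds)
        windows.append((start, start + clip_seconds))
    return windows


# A four-minute (240 s) track gives three non-overlapping windows.
print(clip_windows(240.0))
# → [(60.0, 90.0), (120.0, 150.0), (180.0, 210.0)]
```

With three windows per song, 1,000 songs give 3,000 candidate segments, which is consistent with the "approximately 3,000" figure reported in the abstract.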
License
Copyright (c) 2026 International Journal on Engineering Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.