NEPTUN: Normalization for Romanized Nepali Sentiment Analysis

Chandra Prakash Chaudhary; Basanta Joshi; Aman Shakya; Santosh Giri

doi:10.3126/jhcoe.v2i1.91508

Authors

Chandra Prakash Chaudhary, Institute of Engineering, Tribhuvan University (TU), Lalitpur, Nepal
Basanta Joshi, Institute of Engineering, Tribhuvan University (TU), Lalitpur, Nepal
Aman Shakya, Institute of Engineering, Tribhuvan University (TU), Lalitpur, Nepal
Santosh Giri, Institute of Engineering, Tribhuvan University (TU), Lalitpur, Nepal

Keywords:

Romanized Nepali, Phonetic Normalization, Sentiment Analysis, NEPTUN, Text Preprocessing

Abstract

The growth of e-commerce has led to a rise in user-generated reviews, many of which in Nepal are written in Romanized Nepali, a non-standard form with inconsistent spelling and grammar and frequent code-switching with English. These irregularities challenge traditional sentiment analysis methods. This study presents NEPTUN (NEpali Phonetic Translation-Based Unified Normalization), a novel module for normalizing Romanized Nepali. NEPTUN uses phonetic transliteration to map Romanized words to Devanagari, verifies them against a Nepali dictionary, and then back-transliterates them into standardized Romanized forms. It also applies frequency-based filtering to retain common variants, improving consistency. While similar techniques exist for Romanized Hindi and Urdu, NEPTUN is the first tailored to Romanized Nepali. Its effectiveness was tested with several sentiment classifiers: Logistic Regression, Naive Bayes, K-Nearest Neighbors, and BERT. NEPTUN-enhanced preprocessing improved model accuracy, with BERT achieving the highest accuracy at 87.56%. These results emphasize the need for domain-specific preprocessing in low-resource languages like Nepali.
