Enhancing Spam Detection on Nepali Language SMS
DOI:
https://doi.org/10.3126/jost.v5i1.92659
Keywords:
spam filtering, spam, ham, Decision Tree, Logistic Regression, Naive Bayes, Support Vector Machine
Abstract
SMS spam is any junk message delivered to a mobile phone as a text message through the Short Message Service (SMS). As mobile phones have grown in popularity in recent years, SMS has become a multi-billion dollar industry. At the same time, falling messaging costs have led to a rise in unsolicited commercial advertisements (spam) sent to mobile phones. The most common filtering technique is content-based filtering, which uses the actual text of a message to determine whether it is spam or ham. Because the characteristics a filter relies on to identify spam messages change constantly, it is difficult to capture all the relevant information in a single mathematical classification model. Nepali is a morphologically rich language, which makes building such a model for it challenging. In this research, several supervised learning classifiers are used to classify Nepali text as spam or ham: Decision Tree, Logistic Regression, Naive Bayes (NB), and Support Vector Machine (SVM), along with a combination of the SVM and NB classifiers, denoted SVM-NB. The accuracy, recall, precision, and F1-score of these classifiers are analyzed.
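To make the idea of content-based filtering and the four reported metrics concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not describe their features, preprocessing, or dataset, so the whitespace tokenizer, the multinomial Naive Bayes with add-one smoothing, the names `tokenize`, `NaiveBayesFilter`, and `evaluate`, and the English placeholder messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization. A real Nepali pipeline would
    # likely need morphological handling such as stemming, which is not shown.
    return text.lower().split()


class NaiveBayesFilter:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Placeholder training messages (invented, not from the paper's dataset).
train_texts = [
    "free prize win now",
    "win free recharge offer",
    "meeting at home today",
    "call me when you reach home",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesFilter().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in ["free offer win", "reach home today"]]
scores = evaluate(["spam", "ham"], predictions)
```

The same `evaluate` function could be applied to the predictions of any of the classifiers named in the abstract, since it only compares true and predicted labels.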