Importance of Data Preprocessing and Parameters Tuning for Supervised Machine Learning Models on Tweets Sentiment Analysis

Authors

  • Saurab Adhikari, Nesfield International College

DOI:

https://doi.org/10.3126/batuk.v10i1.62303

Keywords:

machine learning, natural language processing, sentiment analysis, text analysis

Abstract

This paper compares five supervised machine learning models for tweet sentiment analysis. It reports the accuracy and classification report of each model and shows how accuracy improves when the data are preprocessed and the parameters are tuned. The five models are Naive Bayes, Support Vector Machine (SVM), Random Forest, Long Short-Term Memory (LSTM) and XGBoost. A total of 25,000 tweets were processed and analyzed, and each model was used to classify them as positive, negative, or neutral. This research aims to help readers decide which models to use and which yield higher accuracy under various data preprocessing and parameter tuning approaches. The paper also argues that standard models can still perform well and remain viable for sentiment analysis, and that SVM and Random Forest classifiers may be regarded as standard learning strategies.
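The abstract does not list the exact preprocessing steps or the hyperparameters that were searched. As a minimal, pure-Python sketch under those assumptions, the block below illustrates the two ideas the title stresses: typical tweet cleaning (lowercasing, stripping URLs and @mentions, dropping non-letters and a small stop-word list) and tuning a single hyperparameter, the smoothing strength `alpha` of a multinomial Naive Bayes classifier, by validation accuracy. The names `preprocess`, `MultinomialNB`, `tune_alpha`, the stop-word list and the `alpha` grid are hypothetical illustrations, not the paper's actual pipeline.

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
# Hypothetical, deliberately tiny stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "i", "this", "it"}


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, strip non-letters, remove stop words."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # '#' is a non-letter, so hashtags lose the symbol but keep the word.
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class MultinomialNB:
    """Minimal multinomial Naive Bayes with additive (Lidstone) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; must be > 0; the tuned hyperparameter

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.totals[c] + self.alpha * v
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def tune_alpha(train_docs, train_y, val_docs, val_y, grid=(0.01, 0.1, 0.5, 1.0, 2.0)):
    """Grid search: return (alpha, validation accuracy) for the first best alpha."""
    best = None
    for alpha in grid:
        model = MultinomialNB(alpha).fit(train_docs, train_y)
        acc = accuracy(val_y, model.predict(val_docs))
        if best is None or acc > best[1]:
            best = (alpha, acc)
    return best
```

For example, `preprocess("Loving the new #iPhone!! @apple https://t.co/abc")` yields `["loving", "new", "iphone"]`. The same grid-search idea extends to the other models named in the abstract (for instance the SVM regularization constant or the number of trees in a Random Forest), typically with cross-validation rather than a single validation split.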


Author Biography

Saurab Adhikari, Nesfield International College

Faculty

Published

2024-01-29

How to Cite

Adhikari, S. (2024). Importance of Data Preprocessing and Parameters Tuning for Supervised Machine Learning Models on Tweets Sentiment Analysis. The Batuk, 10(1), 133–151. https://doi.org/10.3126/batuk.v10i1.62303

Section

Part II: Humanities and Social Sciences