Multi-Label Toxic Comment Detection Using BERT-Based NLP and Machine Learning
DOI:
https://doi.org/10.3126/injet-indev.v2i2.95699
Keywords:
Natural Language Processing, Machine Learning, BERT, Toxic Comment Detection, Multi-Label Classification, Online Safety
Abstract
The rapid growth of online communication platforms has significantly increased the prevalence of toxic and abusive language, creating major challenges for content moderation and online safety. Keyword-based filters and traditional machine learning approaches often fail to capture the contextual and semantic nuances of toxic comments. This study proposes a BERT-based toxic comment detection system that uses Natural Language Processing (NLP) and deep learning techniques for multi-label classification of online comments. Using the Jigsaw Toxic Comment Classification Challenge dataset, the proposed model assigns each comment to any of six toxicity categories: toxic, severe toxic, obscene, threat, insult, and identity hate. The methodology comprises data preprocessing, tokenization with the BERT tokenizer, and fine-tuning of the pre-trained BERT-base-uncased model. In experimental evaluation, the proposed model achieved a micro-average F1-score of 0.7354, outperforming traditional machine learning approaches such as Logistic Regression, Random Forest, Naive Bayes, and Support Vector Machine (SVM). The results indicate that transformer-based architectures capture contextual relationships and implicit toxic expressions more effectively than conventional methods. However, performance on minority classes such as threat and identity hate remained limited because of severe dataset imbalance. The findings demonstrate the effectiveness of BERT for context-aware toxic comment classification and highlight its potential for use in automated moderation systems that support safer online communication platforms.
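As an illustration of the reported metric, the following minimal Python sketch shows how a micro-average F1-score can be computed for six-label predictions, and why per-label scores on rare classes can stay low while the micro-average remains high. It is not the study's evaluation code: the 0.5 decision threshold, the function names, and the toy probabilities are assumptions chosen only for the example.

```python
from typing import List, Sequence

# Category names as listed in the abstract.
LABELS = ["toxic", "severe toxic", "obscene", "threat", "insult", "identity hate"]


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 decisions (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Pool true/false positives and false negatives over all labels, then compute F1."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    return _f1(tp, fp, fn)


def per_label_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Compute F1 separately for each label column."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        scores.append(_f1(tp, fp, fn))
    return scores


# Toy data (invented for illustration): four comments, six labels each.
y_true = [
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
probs = [
    [0.90, 0.10, 0.80, 0.05, 0.70, 0.10],
    [0.20, 0.00, 0.10, 0.00, 0.60, 0.00],
    [0.95, 0.30, 0.90, 0.10, 0.80, 0.20],
    [0.40, 0.00, 0.10, 0.30, 0.20, 0.00],
]
y_pred = binarize(probs)

print("micro F1:", micro_f1(y_true, y_pred))
for name, score in zip(LABELS, per_label_f1(y_true, y_pred)):
    print(f"{name}: {score:.2f}")
```

In this toy case the frequent labels dominate the pooled counts, so the micro-average stays high even though the single threat example is missed entirely. This mirrors the imbalance effect noted for minority classes.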
License
Copyright (c) 2026 International Journal on Engineering Technology and Infrastructure Development

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.