Use of Bidirectional Encoder Representations from Transformers (BERT) and Robustly Optimized Bert Pretraining Approach (RoBERTa) for Nepali News Classification

Authors

  • Kriti Nemkul Ratna Rajyalaxmi Campus, TU, Kathmandu, Nepal

DOI:

https://doi.org/10.3126/tuj.v39i1.66679

Keywords:

Transformer, NLP, LSTM, BERT, RoBERTa, AdamW, SVM

Abstract

News classification is the task of assigning news documents to predefined categories, and it is one of the earliest problems studied in Natural Language Processing. A huge volume of news is generated by different news portals each day, and it is difficult to determine the specific type of each article manually. News therefore needs to be assigned automatically to appropriate classes, as users want to read particular types of news according to their needs. Text classification has previously been performed with machine learning algorithms such as Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) networks. However, Bidirectional Encoder Representations from Transformers (BERT) and the Robustly Optimized BERT Pretraining Approach (RoBERTa) have not been fully examined for Nepali news classification. This research develops two models for Nepali news classification, BERT and RoBERTa, using news data collected from various national news portals. Precision, recall, F1 score, and accuracy are used to evaluate the effectiveness of the models. Both models are trained and tested with the AdamW optimizer at a learning rate of 1e-5 (0.00001). Comparing the two, RoBERTa performs better than BERT, with an accuracy of 95.3 percent.
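
The fine-tuning setup described in the abstract (a pretrained transformer encoder with a classification head, the AdamW optimizer at learning rate 1e-5, and evaluation with precision, recall, F1, and accuracy) can be sketched as follows. This is a minimal illustration using the Hugging Face transformers library; the checkpoint name (xlm-roberta-base), the label set, and the data handling are assumptions made for the example, not the paper's actual configuration or data.

```python
# Sketch: fine-tuning a pretrained transformer for Nepali news classification
# with AdamW (lr = 1e-5), then evaluating with precision, recall, F1, accuracy.
# Checkpoint, labels, and batch format are illustrative assumptions.
import torch
from torch.optim import AdamW
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["politics", "sports", "business", "entertainment", "technology"]  # assumed classes
checkpoint = "xlm-roberta-base"  # any Nepali-capable checkpoint could be substituted

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=len(labels))
optimizer = AdamW(model.parameters(), lr=1e-5)  # learning rate from the abstract
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def encode(texts):
    # Tokenize Nepali news texts into fixed-length input tensors.
    return tokenizer(texts, padding=True, truncation=True,
                     max_length=256, return_tensors="pt")

def train_epoch(batches):
    # One pass over mini-batches of (list_of_texts, list_of_label_ids).
    model.train()
    for texts, label_ids in batches:
        enc = encode(texts).to(device)
        targets = torch.tensor(label_ids, device=device)
        out = model(**enc, labels=targets)   # cross-entropy loss over the classes
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

def evaluate(batches):
    # Compute the metrics reported in the paper on a held-out test set.
    model.eval()
    preds, golds = [], []
    with torch.no_grad():
        for texts, label_ids in batches:
            enc = encode(texts).to(device)
            logits = model(**enc).logits
            preds.extend(logits.argmax(dim=-1).tolist())
            golds.extend(label_ids)
    p, r, f1, _ = precision_recall_fscore_support(golds, preds, average="macro")
    return {"precision": p, "recall": r, "f1": f1,
            "accuracy": accuracy_score(golds, preds)}
```

Swapping the checkpoint string between a BERT-style and a RoBERTa-style model is the only change needed to compare the two architectures under the same optimizer and metrics.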

Published

2024-06-20

How to Cite

Nemkul, K. (2024). Use of Bidirectional Encoder Representations from Transformers (BERT) and Robustly Optimized Bert Pretraining Approach (RoBERTa) for Nepali News Classification. Tribhuvan University Journal, 39(1), 124–137. https://doi.org/10.3126/tuj.v39i1.66679

Issue

Vol. 39 No. 1 (2024)

Section

Articles