Extractive Nepali Question Answering System

Authors

  • Yunika Bajracharya, Department of Electronics and Computer Engineering, Pulchowk Campus, IOE, TU, Nepal.
  • Suban Shrestha, Department of Electronics and Computer Engineering, Pulchowk Campus, IOE, TU, Nepal.
  • Saurav Bastola, Department of Electronics and Computer Engineering, Pulchowk Campus, IOE, TU, Nepal.
  • Sanjivan Satyal, Assoc. Professor, Department of Electronics and Computer Engineering, Pulchowk Campus, IOE, TU, Nepal.

DOI:

https://doi.org/10.3126/kjse.v9i1.78368

Keywords:

Extractive Question Answering, Low-Resource NLP, Nepali Language, BERT, SQuAD

Abstract

Language processing tools and resources for Nepali remain scarce: the language is spoken by more than 17 million people [1] yet is significantly underrepresented in computational linguistics. We present an Extractive Nepali Question Answering System designed to return precise, contextually accurate answers in Nepali. To address the lack of high-quality training data, we contribute three datasets: Nepali and Hindi translations of SQuAD 1.1, a Nepali translation of XQuAD for benchmarking, and a curated Nepali QA dataset derived from Belebele’s MCQ data. To mitigate the loss of answer spans during translation, we use translation-invariant tokens, which improve span retention from 50% to 93%. We assess translation quality with both human evaluation and GPT-4, confirming that the answer span distribution is faithfully preserved. We evaluate our models on XQuAD and on our curated dataset, demonstrating the effectiveness of fine-tuning multilingual models for Nepali QA. Our best-performing model achieves an exact match (EM) score of 72.99 and an F1 score of 84.13 on XQuAD-Nepali. These results establish a strong baseline for Nepali QA and highlight the benefit of cross-lingual transfer from data in languages of the same family. All datasets and code are publicly available to encourage further work in Nepali NLP.
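For readers unfamiliar with the EM and F1 figures quoted in the abstract, the following is a minimal sketch of how SQuAD-style exact match and token-level F1 are commonly computed for a single prediction. It is not the authors' evaluation script. The function names and the normalization step (whitespace tokenization only, with none of the English-specific article or punctuation stripping used in the original SQuAD script) are assumptions made for illustration.

```python
from collections import Counter


def normalize(text):
    # Assumption: whitespace tokenization only. Lowercasing is a no-op for
    # Devanagari but keeps any embedded Latin-script tokens comparable.
    return text.strip().lower().split()


def exact_match(prediction, reference):
    # 1.0 if the normalized token sequences are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    # Token-level F1: harmonic mean of precision and recall over the
    # multiset of tokens shared by prediction and reference.
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction contains the one-token reference
# answer plus two extra tokens.
em = exact_match("नेपालको राजधानी काठमाडौं", "काठमाडौं")
f1 = f1_score("नेपालको राजधानी काठमाडौं", "काठमाडौं")
```

In the usual convention, per-example scores are averaged over the evaluation set and multiplied by 100, which would put them on the same scale as the figures reported above. When a question has several reference answers, the maximum score over the references is typically taken.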

Published

2025-05-07

How to Cite

Yunika Bajracharya, Suban Shrestha, Saurav Bastola, & Sanjivan Satyal. (2025). Extractive Nepali Question Answering System. KEC Journal of Science and Engineering, 9(1), 95–102. https://doi.org/10.3126/kjse.v9i1.78368

Section

Articles