Tagging Angika Corpus Using BIS Scheme: A Preliminary Study

Authors

  • Jyoti Kumari, Department of Linguistics, Banaras Hindu University

DOI:

https://doi.org/10.3126/nl.v39i1.86158

Keywords:

Angika, NLP, POS Tagset, low-resource language

Abstract

Angika is an Eastern Indo-Aryan language spoken mainly in southeastern Bihar, in parts of Jharkhand, and in some areas of Nepal. It is a low-resource language that lacks linguistic resources and NLP tools, and the primary obstacle to developing such tools is the lack of corpora. Part-of-Speech (POS) tagging is a fundamental task in Natural Language Processing (NLP) that assigns grammatical categories, such as nouns, verbs, adjectives, and adverbs, to the words in a text. To make POS tagging possible for Angika, this study adopts the BIS (Bureau of Indian Standards) POS Tagset for Indian languages. The article explores how the BIS POS Tagset applies to Angika.
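To make "assigning grammatical categories to words" concrete, here is a minimal sketch of what a BIS-tagged corpus can look like to a program, plus the simplest possible tagger trained on it (a most-frequent-tag baseline). Several details are assumptions for illustration and are not taken from the article: the `word/TAG` file format, the small subset of hierarchical BIS labels (e.g. `N_NN`, `V_VM`), the placeholder tokens `w1`…`w5` (not real Angika words), and the fallback tag for unseen words.

```python
from collections import Counter, defaultdict

# Assumed, illustrative subset of BIS labels in hierarchical CATEGORY_SUBTYPE form;
# not the full tagset and not necessarily the labels used in the study.
BIS_TAGS = {
    "N_NN", "N_NNP", "N_NST",       # nouns: common, proper, locative
    "PR_PRP", "PR_PRF", "PR_PRQ",   # pronouns: personal, reflexive, wh-word
    "DM_DMD",                       # demonstrative: deictic
    "V_VM", "V_VAUX",               # verbs: main, auxiliary
    "JJ", "RB", "PSP",              # adjective, adverb, postposition
    "CC_CCD", "CC_CCS",             # conjunctions: coordinator, subordinator
    "RP_NEG", "QT_QTC",             # negation particle, cardinal quantifier
    "RD_PUNC", "RD_UNK",            # punctuation, unknown
}


def parse_tagged_line(line: str) -> list[tuple[str, str]]:
    """Split an assumed 'word/TAG word/TAG ...' line into (word, tag) pairs,
    rejecting tokens with no tag or with a label outside BIS_TAGS."""
    pairs = []
    for item in line.split():
        word, sep, tag = item.rpartition("/")
        if not sep:
            raise ValueError(f"missing tag in token {item!r}")
        if tag not in BIS_TAGS:
            raise ValueError(f"unknown BIS tag {tag!r} on {word!r}")
        pairs.append((word, tag))
    return pairs


def train_baseline(sentences: list[list[tuple[str, str]]]) -> dict[str, str]:
    """Most-frequent-tag baseline: map each word to the tag it carries most often."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(tokens: list[str], model: dict[str, str], default: str = "N_NN") -> list[tuple[str, str]]:
    """Tag tokens with the baseline; unseen words get `default` (an assumed fallback)."""
    return [(t, model.get(t, default)) for t in tokens]


if __name__ == "__main__":
    # Placeholder tokens stand in for real annotated Angika sentences.
    corpus = [
        "w1/N_NNP w2/N_NN w3/PSP w4/V_VM ./RD_PUNC",
        "w2/N_NN w4/V_VM w5/V_VAUX ./RD_PUNC",
    ]
    model = train_baseline([parse_tagged_line(line) for line in corpus])
    print(tag(["w1", "w4", "w9"], model))
    # [('w1', 'N_NNP'), ('w4', 'V_VM'), ('w9', 'N_NN')]
```

A baseline like this is only a reference point: it ignores context entirely, so any serious Angika tagger would need to outperform it on the same BIS-annotated data.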

Published

2025-11-12

How to Cite

Kumari, J. (2025). Tagging Angika Corpus Using BIS Scheme: A Preliminary Study. Nepalese Linguistics, 39(1), 58–64. https://doi.org/10.3126/nl.v39i1.86158

Issue

Vol. 39 No. 1 (2025)

Section

Articles