Tagging Angika Corpus Using BIS Scheme: A Preliminary Study
DOI:
https://doi.org/10.3126/nl.v39i1.86158
Keywords:
Angika, NLP, POS Tagset, low-resource language
Abstract
Angika is an Eastern Indo-Aryan language spoken mainly in the southeastern regions of Bihar, in Jharkhand, and in some areas of Nepal. It is a low-resource language, lacking both linguistic resources and natural language processing (NLP) tools. The primary challenge in developing NLP tools for Angika is the lack of corpora. Part-of-speech (POS) tagging is a fundamental NLP task that assigns grammatical categories, such as noun, verb, adjective, and adverb, to the words in a text. In this context, the BIS POS Tagset for Indian languages has been adopted to facilitate POS tagging for Angika. This article explores the application of the BIS POS Tagset to Angika.
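
To make the tagging task concrete, the following is a minimal sketch of lexicon-based POS tagging with a handful of BIS-style tag labels. It is not the method used in the article. The function name `tag_tokens`, the toy lexicon, and the fallback behaviour are assumptions made for illustration, and the sample tokens are Hindi rather than Angika, because the abstract itself gives no Angika examples.

```python
from typing import Dict, List, Tuple

# A small subset of BIS-style tag labels with glosses (illustrative, not the full tagset).
BIS_TAG_GLOSSES: Dict[str, str] = {
    "N_NN": "common noun",
    "N_NNP": "proper noun",
    "V_VM": "main verb",
    "V_VAUX": "auxiliary verb",
    "RD_PUNC": "punctuation",
    "RD_UNK": "unknown",
}

# Hypothetical toy lexicon. The tokens are Hindi stand-ins, NOT Angika data.
TOY_LEXICON: Dict[str, str] = {
    "राम": "N_NNP",
    "घर": "N_NN",
    "जाता": "V_VM",
    "है": "V_VAUX",
    "।": "RD_PUNC",
}


def tag_tokens(
    tokens: List[str],
    lexicon: Dict[str, str],
    default_tag: str = "RD_UNK",
) -> List[Tuple[str, str]]:
    """Pair each token with its lexicon tag, falling back to default_tag."""
    return [(token, lexicon.get(token, default_tag)) for token in tokens]


if __name__ == "__main__":
    sentence = ["राम", "घर", "जाता", "है", "।"]
    for token, tag in tag_tokens(sentence, TOY_LEXICON):
        print(f"{token}\t{tag}\t{BIS_TAG_GLOSSES[tag]}")
```

A lookup of this kind ignores context, so it cannot resolve words that take different tags in different sentences. In practice, a manually tagged corpus is what allows context-sensitive taggers to be trained.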