A Hybrid Cross-Lingual News Aggregator for Nepali Media using mT5 and Dense Vector Embeddings
DOI:
https://doi.org/10.3126/injet.v3i2.95500
Keywords:
Hybrid Search, mT5 Summarization, Cross-Lingual Retrieval, Dense Vector Embeddings, Zero-Shot Classification, Nepali NLP
Abstract
The rapid proliferation of digital journalism in Nepal has created a fragmented information landscape in which navigating hundreds of independent portals imposes significant cognitive load and widens the semantic gap in information retrieval. The proposed aggregator is built around a hybrid search engine that fuses the lexical precision of BM25 with the conceptual matching of dense vector embeddings. This dual-path retrieval mechanism enables cross-lingual access: English queries retrieve semantically relevant Nepali content with an average cosine similarity above 0.72. To reduce information density, a summarization layer built on a fine-tuned mT5 (Multilingual Text-to-Text Transfer Transformer) model distills long-form journalism into concise abstractive summaries, reaching a ROUGE-1 score of 0.33. In addition, zero-shot classification based on Natural Language Inference (NLI) dynamically sorts unstructured news streams into thematic categories without manual labeling. Experimental results show that the proposed framework significantly improves retrieval recall and organizational efficiency, offering a scalable approach to modernizing regional news consumption in low-resource linguistic environments.
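The dual-path retrieval idea can be illustrated with a minimal sketch. The abstract does not state how the lexical and dense scores are combined, so the min-max normalization, the linear weight `alpha`, and all function names below are assumptions made for illustration. The three-dimensional vectors are hand-written stand-ins for the output of a multilingual encoder, not real embeddings.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for the query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two paths are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.3):
    """Rank documents by alpha * lexical + (1 - alpha) * dense (assumed fusion)."""
    lexical = min_max(bm25_scores(query_tokens, docs_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * l + (1 - alpha) * d for l, d in zip(lexical, dense)]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Toy corpus: one Nepali article on election results and two English decoys.
docs_tokens = [
    ["निर्वाचन", "नतिजा", "सार्वजनिक"],
    ["football", "results", "today"],
    ["weather", "forecast", "kathmandu"],
]
# Hypothetical embeddings; a real system would obtain these from an encoder.
doc_vecs = [[0.8, 0.2, 0.1], [0.2, 0.9, 0.1], [0.0, 0.1, 0.9]]
query_tokens = ["election", "results"]
query_vec = [0.9, 0.1, 0.0]

order, fused = hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs)
```

In this toy setup the English query shares no tokens with the Nepali article, so BM25 alone favors the football story. Once the dense path carries most of the weight, the Nepali article ranks first, which is the cross-lingual behavior the abstract describes. The value of `alpha` here is arbitrary and would need tuning on real data.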
License
Copyright (c) 2026 International Journal on Engineering Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.