Hybrid Text Summarizer Using SBERT Extractive Filtering and Fine-Tuned BART Abstractive Generation on a Custom Dataset
DOI:
https://doi.org/10.3126/injet.v3i2.95541
Keywords:
Text Summarization, Hybrid Summarization, Extractive Summarization, Abstractive Summarization, BART, DistilRoBERTa, SBERT, Natural Language Processing
Abstract
The exponential growth of digital information has made efficient extraction of key insights from large text corpora an increasingly critical challenge. Traditional extractive summarization methods often yield disjointed, incoherent summaries, while purely abstractive approaches, despite their fluency, are prone to hallucination and demand considerable computational resources. This paper presents a hybrid deep learning framework that integrates the complementary strengths of both paradigms. The system employs DistilRoBERTa, an encoder-only transformer, to identify the most semantically relevant sentences through a greedy labeling strategy. A Sentence-BERT (SBERT) semantic filtering module then re-ranks the extracted candidates using cosine similarity before serializing them as input to the abstractive module. The abstractive module is built upon the Facebook/BART-Large-CNN architecture, fine-tuned on a custom hybrid dataset of 18,000 samples constructed programmatically from CNN/DailyMail. Evaluation using ROUGE metrics yielded a ROUGE-1 score of 0.4935 and a ROUGE-2 score of 0.2421 at Epoch 2. The final system is deployed with a graphical user interface enabling users to upload documents and receive high-quality, factually grounded summaries.
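The extractive stage described above can be illustrated with a minimal sketch. The abstract does not state which score the greedy labeling strategy optimizes or what the candidates are compared against during re-ranking, so the following are assumptions rather than the authors' implementation: labels are chosen by greedily maximizing a simplified unigram-overlap F1 (a stand-in for ROUGE-1) against a reference summary, and candidates are re-ranked by cosine similarity to a single query vector such as a document-level embedding. The names `tokens`, `rouge1_f1`, `greedy_labels`, `cosine`, and `rerank` are hypothetical, and plain Python lists stand in for the SBERT embeddings a real system would compute.

```python
from collections import Counter
import math
import re


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two token lists (simplified ROUGE-1)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_labels(sentences, reference, max_sentences=3):
    """Assumed labeling rule: add the sentence that most improves the score
    of the selected set, and stop when no sentence improves it.
    Returns one 0/1 label per sentence."""
    ref = tokens(reference)
    selected, best = [], 0.0
    while len(selected) < max_sentences:
        scored = []
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the trial selection in document order.
            trial = [t for j in sorted(selected + [i]) for t in tokens(sentences[j])]
            scored.append((rouge1_f1(trial, ref), i))
        if not scored:
            break
        # Highest score wins; ties go to the earlier sentence.
        score, idx = max(scored, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        selected.append(idx)
        best = score
    return [1 if i in selected else 0 for i in range(len(sentences))]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(candidates, candidate_vecs, query_vec, top_k):
    """Keep the top_k candidates by cosine similarity to query_vec, then
    restore document order before serialization (an assumed design choice)."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: cosine(candidate_vecs[i], query_vec),
        reverse=True,
    )
    return [candidates[i] for i in sorted(order[:top_k])]


sentences = [
    "The cat sat on the mat.",
    "Stocks fell sharply on Monday.",
    "A dog barked at the cat.",
]
reference = "The cat sat on the mat while a dog barked."
labels = greedy_labels(sentences, reference)

# Toy 2-d vectors stand in for sentence embeddings.
kept = rerank(["a", "b", "c"], [[1, 0], [0, 1], [1, 1]], [1, 0.2], top_k=2)
```

In a full pipeline, labels of this kind would serve as training targets for the extractive classifier, and the re-ranked sentences would be joined into a single string as input to the abstractive model.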
License
Copyright (c) 2026 International Journal on Engineering Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.