Semantic Similarity Analysis for Exam Questions Using Sentence Transformer Model

Authors

  • Nischal Shakya Pulchowk Engineering Campus, Tribhuvan University, Pulchowk, Nepal
  • Milan Shrestha Pulchowk Engineering Campus, Tribhuvan University, Pulchowk, Nepal
  • Roshan Subedi Pulchowk Engineering Campus, Tribhuvan University, Pulchowk, Nepal
  • Nitesh Swarnakar Pulchowk Engineering Campus, Tribhuvan University, Pulchowk, Nepal
  • Upendra Prasad Neupane Sagarmatha Engineering College, Tribhuvan University, Sanepa, Lalitpur, Nepal
  • Sharad Kumar Ghimire Pulchowk Engineering Campus, Tribhuvan University, Pulchowk, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/injet.v2i2.78597

Keywords:

Indexing, Information retrieval, Sentence transformer, Vector database

Abstract

This work explores the effectiveness of using an SBERT model and a vector database for question similarity analysis. A sentence transformer model is trained on a large corpus of text data and used to build a vector database of question embeddings. The vector database is then used to analyse question similarity by retrieving, for a given search query, similar questions together with their similarity scores. The model is trained on the AllNLI dataset, on paraphrase datasets such as MRPC and PAWS, and on semantic similarity datasets such as STS, and is finally adapted on a custom-prepared engineering dataset of 9,282 entries. Training uses Multiple Negatives Ranking (MNR) loss as the loss function. The model is evaluated on the STS test dataset and the MRPC test set. The results demonstrate that using a sentence transformer model and vector database for question similarity analysis outperforms the baseline method of keyword matching. The approach achieved a Spearman correlation of 0.863 on the STS benchmark and an accuracy of 88.7% on the MRPC test set. By comparison, the Spearman correlation reported in the SBERT paper for the NLI-large model was below 0.80. These values suggest that continued training of the model on datasets beyond NLI improves performance, including on downstream tasks. The use of a sentence transformer model and vector database is therefore a promising approach for question similarity analysis, which could have significant implications for information retrieval systems.
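
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors below are hand-made stand-ins for real SBERT embeddings, the example questions are invented, and a plain linear scan over a Python list stands in for a vector database. The names `cosine_similarity`, `top_k_similar`, and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, index, k=2):
    """Return the k (question, score) pairs most similar to query_vec.

    `index` is a list of (question_text, embedding) pairs; a real vector
    database would replace this linear scan with an indexed search.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Assumption: toy embeddings chosen by hand, not produced by any model.
toy_index = [
    ("Define Ohm's law.", [0.9, 0.1, 0.0]),
    ("State Ohm's law with an example.", [0.8, 0.2, 0.1]),
    ("Explain the TCP three-way handshake.", [0.0, 0.1, 0.9]),
]

# Assumption: stand-in for the embedding of a search query about Ohm's law.
query_embedding = [0.85, 0.15, 0.05]

results = top_k_similar(query_embedding, toy_index, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In this toy setup the two Ohm's law questions rank above the networking question because their vectors point in nearly the same direction as the query. In a full pipeline the embeddings would come from the trained sentence transformer rather than being written by hand.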

Published

2025-05-19

How to Cite

Shakya, N., Shrestha, M., Subedi, R., Swarnakar, N., Neupane, U. P., & Ghimire, S. K. (2025). Semantic Similarity Analysis for Exam Questions Using Sentence Transformer Model. International Journal on Engineering Technology, 2(2), 98–103. https://doi.org/10.3126/injet.v2i2.78597

Section

Articles