Skin Lesion Segmentation using Vision Transformer and UNet

Authors

  • Bibat Thokar, Department of Computer Engineering, Lalitpur Engineering College, Lalitpur, Nepal

DOI:

https://doi.org/10.3126/injet-indev.v2i1.82195

Keywords:

Medical image, Segmentation, UNet, Vision Transformer, Adam Optimization

Abstract

Medical images play a crucial role in diagnosing and analyzing serious illnesses. Because reviewing these images manually is typically a lengthy process, an automated approach for segmenting abnormal features is essential. Deep learning frameworks for multi-class image segmentation have been developed to cope with the limited availability of medical image data, but many current frameworks lack flexibility. To address this problem, advanced architectures have been incorporated to improve segmentation performance. In particular, the UNet model was enhanced with a Vision Transformer (ViT), enabling it to better capture structural features in medical images. The ISIC dataset was preprocessed using image augmentation, contrast limited adaptive histogram equalization (CLAHE), and normalization, and was then split into training, validation, and testing sets. The model was trained on the training set using the adaptive moment estimation (Adam) optimizer, and categorical cross-entropy was used to assess the loss. On the ISIC dataset, the model achieved an accuracy of 0.9378, a precision of 0.8713, a sensitivity of 0.8345, and an F1-score of 0.8525.
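
The four reported metrics can be illustrated with a minimal Python sketch. It assumes the scores are computed pixel-wise on binary lesion masks, which the abstract does not state, and the function names and toy masks are hypothetical, not taken from the paper. The last helper checks that the reported F1-score is the harmonic mean of the reported precision and sensitivity.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # lesion pixel correctly predicted
            elif p:
                fp += 1      # background pixel predicted as lesion
            elif t:
                fn += 1      # lesion pixel missed
            else:
                tn += 1      # background pixel correctly predicted
    return tp, fp, fn, tn


def f1_from_precision_recall(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    denom = precision + sensitivity
    return 2 * precision * sensitivity / denom if denom else 0.0


def segmentation_metrics(pred, truth):
    """Pixel-wise accuracy, precision, sensitivity, and F1 (an assumed aggregation)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1_from_precision_recall(precision, sensitivity),
    }


# Hypothetical 3x4 masks, for illustration only (1 = lesion, 0 = background).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

metrics = segmentation_metrics(pred, truth)

# Consistency check on the values reported in the abstract.
reported_f1 = f1_from_precision_recall(0.8713, 0.8345)
```

With the reported precision of 0.8713 and sensitivity of 0.8345, the harmonic mean rounds to 0.8525, which matches the F1-score in the abstract. Accuracy is not derivable from the other three values, because it also depends on the number of true-negative (background) pixels.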

Published

2025-08-01

How to Cite

Thokar, B. (2025). Skin Lesion Segmentation using Vision Transformer and UNet. International Journal on Engineering Technology and Infrastructure Development, 2(1), 1–13. https://doi.org/10.3126/injet-indev.v2i1.82195

Section

Articles