Skin Lesion Segmentation using Vision Transformer and UNet
DOI: https://doi.org/10.3126/injet-indev.v2i1.82195
Keywords: Medical image, Segmentation, UNet, Vision Transformer, Adam Optimization
Abstract
Medical images play a crucial role in diagnosing and analyzing serious illnesses. To make the typically lengthy process of reviewing these images more efficient, an automated approach for segmenting abnormal features is essential. Because medical image data are limited, deep learning frameworks for multi-class image segmentation have been developed; however, many current frameworks lack flexibility. To address this problem, advanced architectures have been incorporated to improve segmentation performance. In particular, the UNet model has been enhanced with a Vision Transformer (ViT), enabling it to better capture structural features in medical images. The ISIC dataset has been preprocessed through image augmentation, contrast limited adaptive histogram equalization (CLAHE), and normalization, and then split into training, validation, and testing sets. The training set has been used to train the model, with the adaptive moment estimation (Adam) optimizer handling parameter updates, and model loss has been evaluated using categorical cross-entropy. The model has achieved an accuracy of 0.9378, a precision of 0.8713, a sensitivity of 0.8345, and an F1-Score of 0.8525 on the ISIC dataset.
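As an illustration of the preprocessing stage described in the abstract, the following is a minimal sketch of CLAHE followed by normalization, assuming OpenCV and 8-bit RGB dermoscopic images. The clip limit, tile grid size, target resolution, and file path are illustrative assumptions, not values reported in the paper.

import cv2
import numpy as np

def preprocess_lesion_image(image_bgr, clip_limit=2.0, tile_grid_size=(8, 8),
                            target_size=(256, 256)):
    """Apply CLAHE to the lightness channel and min-max normalize to [0, 1].

    clip_limit, tile_grid_size, and target_size are illustrative defaults,
    not values taken from the paper.
    """
    # Resize to a fixed network input size.
    resized = cv2.resize(image_bgr, target_size)
    # Work in LAB space so contrast enhancement affects lightness only.
    lab = cv2.cvtColor(resized, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    l_eq = clahe.apply(l)
    enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2RGB)
    # Scale pixel intensities to [0, 1] before feeding the ViT-enhanced UNet.
    return enhanced.astype(np.float32) / 255.0

# Example usage on a single ISIC image (file name is hypothetical):
# image = cv2.imread("ISIC_0000000.jpg")
# x = preprocess_lesion_image(image)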
License
Copyright (c) 2025 International Journal on Engineering Technology and Infrastructure Development

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.