AI-Driven Window Detection and Semantic Segmentation from Street View Imagery Using Grounding DINO and DeepLabV3 for Digital Twin Modeling

Authors

  • Sumeer Koirala, Survey Department, Nepal
  • Xiaoxiang Zhu, Technical University of Munich, Germany
  • Yao Sun, German Aerospace Center, Germany
  • Alejandro Rueda Segura, Technical University of Munich, Germany

DOI:

https://doi.org/10.3126/njg.v25i1.95080

Keywords:

Deep learning, Semantic segmentation, Grounding DINO, DeepLabV3, Digital twins

Abstract

Automated, AI-driven generation of façade information from street view images can be a vital step towards large-scale urban digital twin generation. Traditional approaches rely on rule-based methods and manual annotation, which are time-consuming and difficult to apply at large scale. This study focuses on a state-of-the-art AI-based pipeline for window detection in street view images and semantic segmentation for window parameter generation. The proposed workflow first rectifies the street view images to correct perspective distortion. Window regions are then detected using a zero-shot object detection model (Grounding DINO), followed by semantic segmentation using a DeepLabV3 model fine-tuned on the WinSyn dataset. In systematic experiments with different parameters and hyperparameters, reducing the label classes from 11 to 3 significantly improved segmentation performance. The refined model achieved a mean Intersection over Union (mIoU) of 80.74%, an improvement of 44.31 percentage points over the baseline of 36.43% obtained using four classes. This class optimization reduced ambiguity among window components and improved segmentation consistency. Segmentation outputs are further refined using morphological operations to improve frame continuity and remove noise in window panes. Geometric parameters such as pane arrangement, frame thickness, and window layout are extracted from the refined masks and structured into a parametric representation. The proposed pipeline demonstrates the potential of combining zero-shot detection and semantic segmentation for automated façade analysis from street view imagery. The extracted window information can support applications in urban digital twin generation, building energy modeling, and large-scale architectural analysis.
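
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean Intersection over Union score can be computed from a predicted and a reference label mask. It is not the authors' evaluation code. The function name mean_iou, the toy masks, and the class ids (0 = background, 1 = frame, 2 = pane) are assumptions made for illustration; the abstract does not name the three classes. The last lines reproduce the reported gain as a difference in percentage points.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in either mask.

    pred and target are equally sized 2D lists of integer class ids.
    Hypothetical helper for illustration only.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred = p == c
                in_target = t == c
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks with assumed class ids: 0 = background, 1 = frame, 2 = pane.
pred = [[0, 1, 1],
        [0, 2, 2]]
target = [[0, 1, 1],
          [0, 1, 2]]

score = mean_iou(pred, target, num_classes=3)

# Gain reported in the abstract, expressed in percentage points.
gain_pp = round(80.74 - 36.43, 2)
```

On the toy masks the per-class IoUs are 1, 2/3, and 1/2, so the mean is 13/18. Classes missing from both masks are skipped rather than counted as zero, which is one common convention; other evaluation protocols handle absent classes differently.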

Published

2026-05-28

How to Cite

Koirala, S., Zhu, X., Sun, Y., & Rueda Segura, A. (2026). AI-Driven Window Detection and Semantic Segmentation from Street View Imagery Using Grounding DINO and DeepLabV3 for Digital Twin Modeling. Journal on Geoinformatics, Nepal, 25(1), 1–12. https://doi.org/10.3126/njg.v25i1.95080

Section

Articles