D2FLS-Net: Dual-Stage DEM-guided Fusion Transformer for landslide segmentation
Chengwei Zhao,
Long Li,
Yubo Wang,
Xuqing Li,
Chong Xu,
Yubin Song,
Dongsheng Ren and
Cheng Xiao
PLOS ONE, 2025, vol. 20, issue 11, 1-19
Abstract:
Landslide segmentation from remote sensing imagery is crucial for rapid disaster assessment and risk mitigation. The task remains highly challenging for two reasons: landslide scales are highly heterogeneous, and some landslide bodies show only subtle visual contrast with their background. Transformers surpass convolutional neural networks in modeling long-range contextual dependencies. However, channel-level or feature-level fusion strategies provide only intermittent terrain cues. As a result, models underutilize digital elevation model (DEM) information and lack fine-grained adaptability to terrain variability. To address this, we propose D2FLS-Net (Dual-Stage DEM-guided Fusion Transformer for landslide segmentation), a Swin-Transformer-based framework that embeds terrain features through two modules. (1) The Dual-Stage DEM-Guided Fusion (DSDF) module injects DEM cues twice. The early stage emphasizes DEM-related discontinuities before feature extraction, and the late stage coordinates high-level RGB and DEM semantics through a cross-attention mechanism. (2) The Terrain-aware Pixel-wise Adaptive Context Enhancement (T-PACE) module optimizes intermediate features using a DEM-gated, pixel-adaptive hybrid of multi-dilation atrous convolutions. This enables broader context aggregation within homogeneous landslide interiors and more precise discrimination at boundaries. We evaluate D2FLS-Net on the Bijie and Landslide4Sense 2022 datasets. On Bijie, it reaches a mean Intersection over Union (mIoU) of 88.77%, a Recall of 95.27%, and a Precision of 94.60%, exceeding the best competing model, SegFormer, by 7.96%, 7.90%, and 4.05%, respectively. On Landslide4Sense 2022, it achieves an mIoU of 72.86%, a Recall of 82.55%, and a Precision of 93.30%, surpassing SegFormer by 7.06%, 6.56%, and 5.02%, respectively. Ablation studies indicate that DSDF primarily reduces missed detections of landslide traces, whereas T-PACE refines pixel-level context selection. Injecting DEM at the Swin-1 and Swin-4 stages consistently outperforms other stage combinations. In summary, the model shows good detection performance and is well suited to fusing DEM data with remote sensing imagery for landslide recognition.
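The abstract does not specify how the early DSDF stage emphasizes DEM-related discontinuities before feature extraction. As a rough illustration only, the sketch below assumes a hand-crafted stand-in. It computes a normalized DEM gradient magnitude with `np.gradient` and amplifies the RGB input where terrain changes sharply. It then appends that map as an extra input channel. The function names `dem_discontinuity_map` and `early_dem_injection` and the `alpha` factor are hypothetical and are not taken from the paper.

```python
import numpy as np


def dem_discontinuity_map(dem: np.ndarray) -> np.ndarray:
    """Return the DEM gradient magnitude min-max rescaled to [0, 1]."""
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64))  # per-axis finite differences
    mag = np.hypot(dz_dx, dz_dy)
    span = mag.max() - mag.min()
    if span == 0:  # uniform gradient everywhere: no discontinuity stands out
        return np.zeros_like(mag)
    return (mag - mag.min()) / span


def early_dem_injection(rgb: np.ndarray, dem: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical early fusion: boost RGB where the DEM changes sharply, then
    append the discontinuity map as a fourth channel. rgb: (H, W, 3), dem: (H, W)."""
    edge = dem_discontinuity_map(dem)                          # (H, W)
    emphasized = rgb * (1.0 + alpha * edge[..., None])         # (H, W, 3)
    return np.concatenate([emphasized, edge[..., None]], axis=-1)  # (H, W, 4)


# Usage: a vertical terrain step between columns 1 and 2.
rgb = np.ones((4, 4, 3))
dem = np.zeros((4, 4))
dem[:, 2:] = 10.0
fused = early_dem_injection(rgb, dem)
print(fused.shape)  # (4, 4, 4)
```

A learned network would normally replace the fixed `alpha` weighting. The point here is only that terrain discontinuities enter the pipeline before any backbone stage sees the image.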
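For the late DSDF stage, the abstract states only that high-level RGB and DEM semantics are coordinated through cross-attention. The minimal single-head sketch below assumes that RGB tokens act as queries and DEM tokens supply keys and values, with a residual connection back to the RGB stream. The projection matrices are random placeholders for learned weights. The token count and channel width are illustrative and are not the paper's settings. `rgb_dem_cross_attention` is a hypothetical name.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def rgb_dem_cross_attention(rgb_tokens, dem_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: RGB tokens (N, C) query DEM tokens (M, C).

    w_q, w_k: (C, d) projections; w_v: (C, C) so the result can be added back
    to the RGB stream. Returns fused tokens (N, C) and attention weights (N, M).
    """
    q = rgb_tokens @ w_q          # queries from RGB semantics
    k = dem_tokens @ w_k          # keys from DEM semantics
    v = dem_tokens @ w_v          # values carry terrain information
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # scaled dot-product
    return rgb_tokens + attn @ v, attn  # residual keeps the original RGB features


# Usage with random placeholders for learned weights (illustrative sizes only).
rng = np.random.default_rng(0)
N, M, C, d = 49, 49, 32, 16
rgb_tokens = rng.standard_normal((N, C))
dem_tokens = rng.standard_normal((M, C))
w_q = rng.standard_normal((C, d)) / np.sqrt(C)
w_k = rng.standard_normal((C, d)) / np.sqrt(C)
w_v = rng.standard_normal((C, C)) / np.sqrt(C)
fused, attn = rgb_dem_cross_attention(rgb_tokens, dem_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (49, 32) (49, 49)
```

Each RGB token receives a convex combination of DEM values, because every row of `attn` sums to 1. This is one way to let terrain semantics reweight RGB features without replacing them.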
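T-PACE is described as a DEM-gated, pixel-adaptive hybrid of multi-dilation atrous convolutions. The sketch below illustrates that idea on a single-channel feature map, but it rests on an explicit assumption. The abstract does not say how the gate is computed, so a fixed heuristic stands in for it here. The heuristic turns DEM gradient magnitude into per-pixel softmax weights over the dilation branches. Flat terrain (homogeneous interiors) favors large dilations, and sharp terrain changes (boundaries) favor small ones. The helpers `dilated_conv3x3` and `t_pace_sketch`, the dilation set `(1, 2, 4)`, and `beta` are all hypothetical.

```python
import numpy as np


def dilated_conv3x3(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """'Same'-size 3x3 atrous cross-correlation of a 2D map with zero padding."""
    h, w = x.shape
    xp = np.pad(x.astype(np.float64), dilation)  # zero-pad by the dilation rate
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            di, dj = i * dilation, j * dilation  # taps spaced `dilation` pixels apart
            out += kernel[i, j] * xp[di:di + h, dj:dj + w]
    return out


def t_pace_sketch(feat, dem, kernels, dilations=(1, 2, 4), beta=4.0):
    """Hypothetical DEM-gated, pixel-wise mixture of multi-dilation branches.

    Returns the fused map (H, W) and the per-pixel gates (K, H, W).
    """
    branches = np.stack([dilated_conv3x3(feat, k, d) for k, d in zip(kernels, dilations)])
    gy, gx = np.gradient(dem.astype(np.float64))
    edge = np.hypot(gx, gy)
    if edge.max() > 0:
        edge = edge / edge.max()  # 0 = flat, 1 = sharpest terrain change
    scale = np.asarray(dilations, dtype=np.float64) / max(dilations)
    # Flat pixels push logits toward large dilations; edge pixels toward small ones.
    logits = beta * (1.0 - 2.0 * edge)[None] * scale[:, None, None]  # (K, H, W)
    logits -= logits.max(axis=0, keepdims=True)  # stable softmax over branches
    gates = np.exp(logits)
    gates /= gates.sum(axis=0, keepdims=True)
    return (gates * branches).sum(axis=0), gates


# Usage: a terrain step between columns 3 and 4 of an 8x8 tile.
feat = np.ones((8, 8))
dem = np.zeros((8, 8))
dem[:, 4:] = 5.0
kernels = [np.full((3, 3), 1.0 / 9.0)] * 3
out, gates = t_pace_sketch(feat, dem, kernels)
print(gates[:, 0, 0].argmax(), gates[:, 0, 3].argmax())  # 2 0
```

On flat ground the dilation-4 branch dominates, which widens the receptive field. At the terrain step the dilation-1 branch dominates, which keeps context local. In a trained model, a small learned network over DEM features would replace the hand-set `beta` rule.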
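The reported mIoU, Recall, and Precision can be computed from binary landslide masks as sketched below. This assumes that mIoU averages the IoU of the landslide and background classes, a common convention that the abstract does not confirm. Recall and Precision are computed for the landslide class. `segmentation_metrics` is a hypothetical helper.

```python
import numpy as np


def segmentation_metrics(pred, gt) -> dict:
    """mIoU (mean of landslide and background IoU), plus Recall and Precision
    for the landslide class, from binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)    # landslide pixels correctly detected
    fp = np.sum(pred & ~gt)   # background labeled as landslide
    fn = np.sum(~pred & gt)   # missed landslide pixels
    tn = np.sum(~pred & ~gt)  # background correctly rejected

    def ratio(num, den):
        # Empty denominator -> 0.0 (one convention among several).
        return float(num) / float(den) if den else 0.0

    iou_landslide = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fn + fp)
    return {
        "mIoU": (iou_landslide + iou_background) / 2.0,
        "Recall": ratio(tp, tp + fn),
        "Precision": ratio(tp, tp + fp),
    }


gt = [[1, 1], [0, 0]]
pred = [[1, 0], [1, 0]]
m = segmentation_metrics(pred, gt)
print(m)  # mIoU = 1/3, Recall = 0.5, Precision = 0.5
```

Missed detections (false negatives) lower Recall, which is the metric most directly tied to the abstract's claim that DSDF reduces missed landslide traces.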
Date: 2025
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0337412 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 37412&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0337412
DOI: 10.1371/journal.pone.0337412
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone.