HA-RoadFormer: Hybrid Attention Transformer with Multi-Branch for Large-Scale High-Resolution Dense Road Segmentation

Zhang, Zheng; Miao, Chunle; Liu, Changan; Tian, Qing; Zhou, Yongsheng

Additional contact information
Zheng Zhang: School of Information, North China University of Technology, Beijing 100144, China
Chunle Miao: School of Information, North China University of Technology, Beijing 100144, China
Changan Liu: School of Information, North China University of Technology, Beijing 100144, China
Qing Tian: School of Information, North China University of Technology, Beijing 100144, China
Yongsheng Zhou: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

Mathematics, 2022, vol. 10, issue 11, 1-15

Abstract: Road segmentation is an essential task in remote sensing. Large-scale high-resolution remote sensing images have larger pixel dimensions than natural images, while existing Transformer-based models have a computational cost that is quadratic in the number of tokens, leading to longer training and inference times. Inspired by long-text Transformer models, this paper proposes a novel hybrid attention mechanism to improve inference speed. By computing only several diagonals and random blocks of the attention matrix, hybrid attention achieves time complexity that is linear in the length of the token sequence. By superimposing adjacent and random attention, hybrid attention introduces an inductive bias similar to that of convolutional neural networks (CNNs) while retaining the ability to capture long-distance dependencies. In addition, dense road segmentation results for remote sensing images still suffer from insufficient continuity. Multiscale feature representation is an effective technique in CNN-based networks; inspired by this, we propose a multi-scale patch embedding module, which divides images into patches at different scales to obtain coarse-to-fine feature representations. Experiments on the Massachusetts dataset show that the proposed HA-RoadFormer effectively preserves the integrity of the road segmentation results, achieving an Intersection over Union (IoU) of 67.36% for road segmentation, higher than other state-of-the-art (SOTA) methods. Inference speed is also greatly improved compared with other Transformer-based models.
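
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the parameters (`band_width`, `block_size`, `n_random_blocks`) and the use of non-overlapping patches are all assumptions made for illustration. The attention itself is evaluated densely for readability, so the sketch shows the sparsity pattern and not the linear-time speedup.

```python
import numpy as np

def hybrid_attention_mask(n_tokens, band_width=1, block_size=2,
                          n_random_blocks=1, seed=0):
    """Boolean (n_tokens, n_tokens) mask; True marks entries that are computed.

    Hypothetical parameterisation: a diagonal band ("adjacent" attention)
    plus a few random key blocks per query block ("random" attention).
    """
    assert n_tokens % block_size == 0, "n_tokens must be a multiple of block_size"
    rng = np.random.default_rng(seed)
    idx = np.arange(n_tokens)
    # Adjacent attention: main diagonal plus band_width diagonals on each side.
    mask = np.abs(idx[:, None] - idx[None, :]) <= band_width
    # Random attention: each query block attends to n_random_blocks key blocks.
    n_blocks = n_tokens // block_size
    for qb in range(n_blocks):
        rows = slice(qb * block_size, (qb + 1) * block_size)
        for kb in rng.choice(n_blocks, size=n_random_blocks, replace=False):
            mask[rows, kb * block_size:(kb + 1) * block_size] = True
    return mask

def hybrid_attention(q, k, v, mask):
    """Softmax attention restricted to the True entries of mask.

    Dense evaluation for clarity only; a linear-time version would
    gather just the kept entries instead of forming the full matrix.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def multiscale_patches(image, patch_sizes=(4, 8)):
    """Split an (H, W, C) image into flattened non-overlapping patches per scale."""
    h, w, c = image.shape
    out = {}
    for p in patch_sizes:
        assert h % p == 0 and w % p == 0, "image size must be a multiple of p"
        grid = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
        out[p] = grid.reshape(-1, p * p * c)  # one row per patch
    return out

# Usage: 16 tokens of dimension 8, and a 16x16 RGB image at two patch scales.
rng = np.random.default_rng(1)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
mask = hybrid_attention_mask(16)
out = hybrid_attention(q, k, v, mask)
patches = multiscale_patches(rng.normal(size=(16, 16, 3)))
```

In this sketch each row of the mask keeps at most `(2 * band_width + 1) + n_random_blocks * block_size` entries regardless of `n_tokens`, which is why the number of computed attention scores grows linearly with sequence length.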

Keywords: dense road segmentation; transformer; multiscale patches; hybrid-attention
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/11/1915/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/11/1915/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:11:p:1915-:d:830854

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.