HyFormer: Hybrid Transformer and CNN for Pixel-Level Multispectral Image Land Cover Classification
Chuan Yan,
Xiangsuo Fan,
Jinlong Fan,
Ling Yu,
Nayi Wang,
Lin Chen and
Xuyang Li
Additional contact information
Chuan Yan: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China
Xiangsuo Fan: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China
Jinlong Fan: National Satellite Meteorological Center, China Meteorological Administration, Beijing 100081, China
Ling Yu: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China
Nayi Wang: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China
Lin Chen: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China
Xuyang Li: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China
IJERPH, 2023, vol. 20, issue 4, 1-25
Abstract:
Most convolutional neural networks used in remote sensing (RS) classification have two limitations: they cannot be applied to pixelwise input, and they cannot adequately represent spectral sequence information. To address both, we propose HyFormer, a new Transformer-based framework for multispectral RS image classification. First, we design a network that combines a fully connected (FC) layer with a convolutional neural network (CNN): the 1D pixelwise spectral sequences produced by the FC layers are reshaped into a 3D spectral feature matrix that serves as the CNN input. The FC layers raise the dimensionality of the features and increase their expressiveness, and the reshaping overcomes the inability of a 2D CNN to perform pixel-level classification. Second, features are extracted from three levels of the CNN and combined with the linearly transformed spectral information to strengthen the representation. The combined features are fed to the Transformer encoder, whose powerful global modelling capability refines the CNN features. Finally, skip connections between adjacent encoders enhance the fusion of information from different levels. An MLP head produces the pixel classification results. We focus on the ground feature distribution in the eastern part of Changxing County and the central part of Nanxun District, Zhejiang Province, and conduct experiments on Sentinel-2 multispectral RS images. In the Changxing County study area, HyFormer reaches an overall accuracy of 95.37%, compared with 94.15% for Transformer (ViT). In the Nanxun District study area, HyFormer reaches 95.4%, compared with 94.69% for Transformer (ViT). HyFormer therefore outperforms the Transformer on the Sentinel-2 dataset in both study areas.
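The first step of the abstract (an FC layer lifts a single pixel's 1D spectrum to a higher dimension, which is then reshaped into a 3D feature matrix for a 2D CNN) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the band count, the cube shape, the ReLU activation, the random weights, and the function name `pixel_to_feature_cube` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def pixel_to_feature_cube(spectrum, weight, bias, cube_shape):
    """Project one pixel's 1D spectrum with an FC layer, then reshape to 3D.

    spectrum:   (n_bands,) reflectance values of a single pixel
    weight:     (out_dim, n_bands) FC weights, out_dim == prod(cube_shape)
    bias:       (out_dim,) FC bias
    cube_shape: (channels, height, width) of the CNN input
    """
    hidden = weight @ spectrum + bias      # FC layer raises the dimensionality
    hidden = np.maximum(hidden, 0.0)       # ReLU; the activation is an assumption
    return hidden.reshape(cube_shape)      # 1D vector -> 3D spectral feature matrix


rng = np.random.default_rng(0)
n_bands = 10                  # assumed number of spectral bands per pixel
cube_shape = (4, 8, 8)        # assumed (channels, height, width)
out_dim = int(np.prod(cube_shape))

# Random weights stand in for trained parameters.
weight = rng.normal(scale=0.1, size=(out_dim, n_bands))
bias = np.zeros(out_dim)

spectrum = rng.random(n_bands)            # one pixel's spectral sequence
cube = pixel_to_feature_cube(spectrum, weight, bias, cube_shape)
print(cube.shape)  # (4, 8, 8)
```

Because the cube is built from one pixel only, a 2D CNN applied to it still yields a prediction for that single pixel, which is how the reshaping makes pixel-level classification possible with a 2D CNN.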
Keywords: pixelwise classification; transformer; CNN; multispectral RS image classification
JEL-codes: I I1 I3 Q Q5
Date: 2023
Downloads: (external link)
https://www.mdpi.com/1660-4601/20/4/3059/pdf (application/pdf)
https://www.mdpi.com/1660-4601/20/4/3059/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:20:y:2023:i:4:p:3059-:d:1063704
IJERPH is currently edited by Ms. Jenna Liu
Bibliographic data for series maintained by MDPI Indexing Manager.