MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification
A. Arun Solomon and S. Akila Agnes
Additional contact information
A. Arun Solomon: Department of Civil Engineering, GMR Institute of Technology, Rajam 532127, India
S. Akila Agnes: Department of Computer Science and Engineering, GMR Institute of Technology, Rajam 532127, India
Geographies, 2024, vol. 4, issue 3, 1-19
Abstract:
Recent advancements in deep learning have significantly improved the performance of remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), which employs the Swin Transformer, an advanced architecture that has demonstrated exceptional performance in a range of computer vision applications. The Swin Transformer leverages shifted window mechanisms to efficiently model long-range dependencies and local features in images, making it particularly suitable for the complex and varied textures in aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. A framework is developed that integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to effectively learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model’s performance is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where it demonstrates accuracy superior or comparable to that of state-of-the-art models. The MSCAC model’s adaptability to varying amounts of training data and its ability to improve with increased data make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep-learning architectures like the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for various remote sensing applications, including land cover mapping, urban planning, and environmental monitoring.
Keywords: aerial scene classification; Swin Transformer; deep learning in remote sensing; geospatial analysis; computer vision for aerial imagery; terrain mapping
JEL-codes: Q1 Q15 Q5 Q53 Q54 Q56 Q57
Date: 2024
Downloads: (external link)
https://www.mdpi.com/2673-7086/4/3/25/pdf (application/pdf)
https://www.mdpi.com/2673-7086/4/3/25/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jgeogr:v:4:y:2024:i:3:p:25-480:d:1445012
Geographies is currently edited by Ms. Fannie Xu
Bibliographic data for series maintained by MDPI Indexing Manager.