A Lightweight Hybrid CNN-ViT Network for Weed Recognition in Paddy Fields
Tonglai Liu,
Yixuan Wang,
Chengcheng Yang,
Youliu Zhang and
Wanzhen Zhang
Additional contact information
Tonglai Liu: College of Artificial Intelligence, Zhongkai University of Agriculture and Engineering, Guangzhou 510550, China
Yixuan Wang: Department of Architecture and Civil Engineering, City University of Hong Kong, Hong Kong SAR 999077, China
Chengcheng Yang: College of Artificial Intelligence, Zhongkai University of Agriculture and Engineering, Guangzhou 510550, China
Youliu Zhang: College of Engineering, South China Agricultural University, Guangzhou 510642, China
Wanzhen Zhang: College of Artificial Intelligence, Zhongkai University of Agriculture and Engineering, Guangzhou 510550, China
Mathematics, 2025, vol. 13, issue 17, 1-15
Abstract:
Accurate identification of weed species is a fundamental task for promoting efficient farmland management. Existing recognition approaches are typically based on either conventional Convolutional Neural Networks (CNNs) or the more recent Vision Transformers (ViTs). CNNs demonstrate strong capability in capturing local spatial patterns, yet they are often limited in modeling long-range dependencies. In contrast, ViTs can effectively capture global contextual information through self-attention, but they may neglect fine-grained local features. These inherent shortcomings restrict the recognition performance of current models. To overcome these limitations, we propose a lightweight hybrid architecture, termed RepEfficientViT, which integrates convolutional operations with Transformer-based self-attention. This design enables the simultaneous aggregation of both local details and global dependencies. Furthermore, we employ a structural re-parameterization strategy to enhance the representational capacity of convolutional layers without introducing additional parameters or computational overhead at inference time. Experimental evaluations reveal that RepEfficientViT consistently surpasses state-of-the-art CNN and Transformer baselines. Specifically, the model achieves an accuracy of 94.77%, a precision of 94.75%, a recall of 94.93%, and an F1-score of 94.84%. In terms of efficiency, RepEfficientViT requires only 223.54 M FLOPs and 1.34 M parameters, while attaining an inference latency of merely 25.13 ms on CPU devices. These results demonstrate that the proposed model is well-suited for deployment in edge-computing scenarios subject to stringent computational and storage constraints.
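The abstract does not spell out the RepMBConv block itself, but the structural re-parameterization it refers to is, in the commonly used RepVGG-style formulation, the algebraic merging of parallel training-time branches (e.g. a 3x3 convolution, a 1x1 convolution, and an identity shortcut) into a single 3x3 convolution for inference, exploiting the linearity of convolution. The following single-channel NumPy sketch, which is an illustration of that general technique and not the paper's exact implementation, demonstrates that the fused kernel reproduces the branched output exactly:

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D cross-correlation with zero padding, single channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
w3 = rng.standard_normal((3, 3))   # 3x3 branch
w1 = rng.standard_normal((1, 1))   # 1x1 branch

# Training-time topology: three parallel branches summed together.
y_branched = conv2d(x, w3) + conv2d(x, w1) + x

# Inference-time fusion: the 1x1 kernel and the identity map are both
# special cases of a 3x3 kernel, so they fold into its centre tap.
w_fused = w3.copy()
w_fused[1, 1] += w1[0, 0]   # embed the 1x1 branch
w_fused[1, 1] += 1.0        # embed the identity shortcut (delta kernel)
y_fused = conv2d(x, w_fused)

assert np.allclose(y_branched, y_fused)
```

Because the fusion is exact, the deployed network carries only the single 3x3 kernel per branch group, which is how extra representational capacity is gained during training without any parameter or FLOP increase at inference.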
Keywords: weed recognition; EfficientViT; RepMBConv; re-parameterization (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/17/2899/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/17/2899/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:17:p:2899-:d:1744745
Mathematics is currently edited by Ms. Emma He