DCFA-YOLO: A Dual-Channel Cross-Feature-Fusion Attention YOLO Network for Cherry Tomato Bunch Detection
Shanglei Chai,
Ming Wen,
Pengyu Li,
Zhi Zeng and
Yibin Tian
Affiliations:
Shanglei Chai: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Ming Wen: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Pengyu Li: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Zhi Zeng: School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Yibin Tian: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Agriculture, 2025, vol. 15, issue 3, 1-19
Abstract:
To better utilize multimodal information in agricultural applications, this paper proposes a cherry tomato bunch detection network based on dual-channel cross-feature fusion. It aims to improve detection performance by exploiting the complementary information in color and depth images. Taking the existing YOLOv8_n as the baseline framework, it incorporates a dual-channel cross-fusion attention mechanism for multimodal feature extraction and fusion. In the backbone network, a ShuffleNetV2 unit is adopted to make initial feature extraction more efficient. In the feature fusion stage, two modules that use re-parameterization, dynamic weighting, and efficient concatenation are introduced to strengthen the representation of multimodal information. Meanwhile, the Convolutional Block Attention Module (CBAM) is integrated at different feature extraction stages and combined with an improved SPPF_CBAM module to sharpen the network's focus on, and representation of, critical features. Experiments on a dataset collected in a commercial greenhouse show that DCFA-YOLO excels in cherry tomato bunch detection, achieving an mAP50 (mean average precision at an IoU threshold of 0.5) of 96.5%, a significant improvement over the baseline model, while drastically reducing computational complexity. Comparisons with other state-of-the-art YOLO models and other object detection models further validate its detection performance. Running at 52 fps on a regular computer, it provides an efficient multimodal fusion solution for real-time fruit detection in robotic harvesting.
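The abstract names a ShuffleNetV2 unit as the efficiency device in the backbone but gives no internals. As a rough orientation only, the sketch below shows the data flow of the generic stride-1 unit from the ShuffleNetV2 design: split the channels in half, pass one half through a branch, concatenate, then apply a channel shuffle with two groups. This is not the authors' code. It assumes the stride-1 variant with an even split, models channels as plain list items, and replaces the real 1x1 conv, 3x3 depthwise conv and 1x1 conv stack with a hypothetical placeholder callable `branch`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups, as in ShuffleNet.

    Equivalent to reshaping C channels to (groups, C // groups),
    transposing the two axes, and flattening back to C channels.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Emit the j-th channel of every group in turn.
    return [channels[g * per_group + j] for j in range(per_group) for g in range(groups)]


def shufflenet_v2_unit(channels: Sequence[T],
                       branch: Callable[[List[T]], List[T]]) -> List[T]:
    """Stride-1 ShuffleNetV2 unit data flow on a list of channels.

    `branch` is a hypothetical stand-in for the conv stack; it must
    return as many channels as it receives.
    """
    half = len(channels) // 2
    left, right = list(channels[:half]), list(channels[half:])  # channel split
    processed = branch(right)  # left half passes through unchanged
    if len(processed) != len(right):
        raise ValueError("branch must preserve the channel count")
    return channel_shuffle(left + processed, groups=2)  # concat, then shuffle


if __name__ == "__main__":
    print(channel_shuffle(list(range(8)), groups=2))
    print(shufflenet_v2_unit(list("abcdefgh"), lambda chs: [c.upper() for c in chs]))
```

With eight channels and two groups the shuffle yields `[0, 4, 1, 5, 2, 6, 3, 7]`, so the next unit's split receives a mix of pass-through and processed channels. This mixing lets the cheap half-channel branch still propagate information across the whole feature map.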
Keywords: cherry tomato bunch detection; robotic harvesting; multimodal image; feature extraction; feature fusion; YOLO network
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2025
Downloads:
https://www.mdpi.com/2077-0472/15/3/271/pdf (application/pdf)
https://www.mdpi.com/2077-0472/15/3/271/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:15:y:2025:i:3:p:271-:d:1577959
Agriculture is currently edited by Ms. Leda Xuan
More articles in Agriculture from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.