DCFA-YOLO: A Dual-Channel Cross-Feature-Fusion Attention YOLO Network for Cherry Tomato Bunch Detection

Shanglei Chai, Ming Wen, Pengyu Li, Zhi Zeng and Yibin Tian
Additional contact information
Shanglei Chai: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Ming Wen: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Pengyu Li: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
Zhi Zeng: School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Yibin Tian: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China

Agriculture, 2025, vol. 15, issue 3, 1-19

Abstract: To better utilize multimodal information in agricultural applications, this paper proposes DCFA-YOLO, a cherry tomato bunch detection network based on dual-channel cross-feature fusion. It aims to improve detection performance by exploiting the complementary information in color and depth images. Using YOLOv8_n as the baseline framework, it incorporates a dual-channel cross-fusion attention mechanism for multimodal feature extraction and fusion. In the backbone network, a ShuffleNetV2 unit is adopted to make initial feature extraction more efficient. In the feature fusion stage, two modules built on re-parameterization, dynamic weighting, and efficient concatenation are introduced to strengthen the representation of multimodal information. In addition, the CBAM attention mechanism is integrated at different feature extraction stages and combined with an improved SPPF_CBAM module to sharpen the focus on, and representation of, critical features. Experiments on a dataset collected in a commercial greenhouse show that DCFA-YOLO achieves an mAP50 of 96.5% for cherry tomato bunch detection, a significant improvement over the baseline model, while drastically reducing computational complexity. Comparisons with other state-of-the-art YOLO variants and other object detection models further validate its detection performance. The network runs at 52 fps on a regular computer, providing an efficient multimodal-fusion solution for real-time fruit detection in robotic harvesting.
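For readers unfamiliar with the mAP50 metric quoted in the abstract: a predicted box counts as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not the paper's evaluation code. The function names `iou` and `is_true_positive`, the `(x1, y1, x2, y2)` box format, and the example coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Matching rule behind mAP50: a prediction matches if IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes in pixel coordinates (not taken from the paper's dataset).
ground_truth = (100, 100, 200, 300)
prediction = (110, 120, 210, 310)
print(round(iou(prediction, ground_truth), 3))     # 0.711
print(is_true_positive(prediction, ground_truth))  # True
```

The full mAP50 figure additionally ranks predictions by confidence and averages precision over recall levels and classes; the sketch covers only the per-box matching step.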

Keywords: cherry tomato bunch detection; robotic harvesting; multimodal image; feature extraction; feature fusion; YOLO network
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2077-0472/15/3/271/pdf (application/pdf)
https://www.mdpi.com/2077-0472/15/3/271/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:15:y:2025:i:3:p:271-:d:1577959

Agriculture is currently edited by Ms. Leda Xuan

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-22
Handle: RePEc:gam:jagris:v:15:y:2025:i:3:p:271-:d:1577959