Automatic Correction of Labeling Errors Applied to Tomato Detection
Ángel Eduardo Zamora Suárez,
Gerardo Antonio Alvarez Hernandez,
Juan Irving Vasquez,
Hind Taud,
Abril Valeria Uriarte-Arcia and
Erik Zamora
Additional contact information
Ángel Eduardo Zamora Suárez: Unidad Profesional Interdisciplinaria de Biotecnología, Instituto Politécnico Nacional, Av. Acueducto S/N, La Laguna Ticoman, Gustavo A. Madero, Mexico City 07340, Mexico
Gerardo Antonio Alvarez Hernandez: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz S/N, Nueva Industrial Vallejo, Gustavo A. Madero, Mexico City 07340, Mexico
Juan Irving Vasquez: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz S/N, Nueva Industrial Vallejo, Gustavo A. Madero, Mexico City 07340, Mexico
Hind Taud: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz S/N, Nueva Industrial Vallejo, Gustavo A. Madero, Mexico City 07340, Mexico
Abril Valeria Uriarte-Arcia: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz S/N, Nueva Industrial Vallejo, Gustavo A. Madero, Mexico City 07340, Mexico
Erik Zamora: Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz S/N, Nueva Industrial Vallejo, Gustavo A. Madero, Mexico City 07738, Mexico
Agriculture, 2025, vol. 15, issue 12, 1-20
Abstract:
Accurate labeling is critical for training reliable deep learning models in agricultural applications. However, manual labeling is often error-prone, especially when performed by non-experts, and such errors (modeled as noise) can significantly degrade model performance. This study addresses the problem of correcting labeling errors in object detection datasets without human intervention. We hypothesize that label noise can be reduced by exploiting the feature space representation of the data, enabling automatic refinement through repeated model-based filtering. To test this, we propose a recursive methodology that employs a YOLOv5 detector to iteratively relabel a dataset of Prunaxx and Paipai tomato images captured in greenhouse environments. The correction process involves training the detector, predicting new labels, and replacing the existing labels over multiple iterations. Experimental results show substantial improvements: the mean Average Precision at an IoU threshold of 0.50 (mAP-50) increased from 0.8 to 0.86, the mean Average Precision across IoU thresholds from 0.50 to 0.95 (mAP-50:95) increased from 0.46 to 0.63, and Recall improved from 0.68 to 0.82. These results demonstrate that the model was able to detect more true positives after filtering, while also achieving more accurate bounding box predictions. Although a slight decrease in Precision was observed in later iterations due to false positives, the overall quality of the dataset improved consistently. In conclusion, the proposed filtering method effectively enhances label quality without manual intervention and offers a scalable solution for improving object detection datasets in precision agriculture.
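The recursive correction loop summarized in the abstract (train the detector, predict new labels, replace the existing labels, repeat) can be sketched as follows. This is a minimal, framework-agnostic sketch under stated assumptions, not the authors' implementation: the `train` and `predict` callables stand in for YOLOv5 training and inference, and the function name `relabel_recursively`, the confidence cut-off `conf_threshold`, and the fixed iteration count are hypothetical choices made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A bounding box as (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]
# A detector prediction: a box plus its confidence score.
Prediction = Tuple[Box, float]


def relabel_recursively(
    images: Sequence[object],
    labels: List[List[Box]],
    train: Callable[[Sequence[object], List[List[Box]]], object],
    predict: Callable[[object, object], List[Prediction]],
    iterations: int = 3,
    conf_threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[List[List[Box]]]:
    """Hypothetical sketch of model-based relabeling.

    Each round trains a detector on the current labels, predicts boxes on the
    same images, and replaces every image's labels with the predictions whose
    confidence is at least `conf_threshold`. Returns the label set from every
    round; index 0 is the original (possibly noisy) annotation.
    """
    history = [labels]
    current = labels
    for _ in range(iterations):
        # Fit a detector (e.g. a YOLOv5 model) on the current, possibly noisy, labels.
        model = train(images, current)
        new_labels: List[List[Box]] = []
        for image in images:
            predictions = predict(model, image)
            # Keep only sufficiently confident boxes as the new labels for this image.
            new_labels.append([box for box, score in predictions if score >= conf_threshold])
        current = new_labels
        history.append(current)
    return history
```

Keeping every round's labels makes it possible to evaluate each iteration separately (for example, with mAP-50 and Recall on a held-out set) and choose the best one, which matters because the abstract reports that Precision can drop slightly in later iterations.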
Keywords: YOLO detector; tomato detection; dataset labeling errors; Prunaxx tomatoes; Paipai tomatoes
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2025
Downloads:
https://www.mdpi.com/2077-0472/15/12/1291/pdf (application/pdf)
https://www.mdpi.com/2077-0472/15/12/1291/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:15:y:2025:i:12:p:1291-:d:1679485
Agriculture is currently edited by Ms. Leda Xuan
More articles in Agriculture from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.