Use of Unsupervised Machine Learning for Agricultural Supply Chain Data Labeling
Roberto F. Silva,
Gustavo M. Mostaço,
Fernando Xavier,
Antonio M. Saraiva and
Carlos E. Cugnasca
Additional contact information
Roberto F. Silva: Department of Computer Engineering and Digital Systems, Escola Politécnica da Universidade de São Paulo (USP)
Gustavo M. Mostaço: Department of Computer Engineering and Digital Systems, Escola Politécnica da Universidade de São Paulo (USP)
Fernando Xavier: Department of Computer Engineering and Digital Systems, Escola Politécnica da Universidade de São Paulo (USP)
Antonio M. Saraiva: Department of Computer Engineering and Digital Systems, Escola Politécnica da Universidade de São Paulo (USP)
Carlos E. Cugnasca: Department of Computer Engineering and Digital Systems, Escola Politécnica da Universidade de São Paulo (USP)
A chapter in Information and Communication Technologies for Agriculture—Theme II: Data, 2022, pp 267-288 from Springer
Abstract:
The heterogeneous data produced in agricultural supply chains can be divided into three main systems: (i) product identification and traceability, related to identifying production batches and locations of the product throughout the supply chain; (ii) environmental monitoring, considering environmental variables in production, storage and transportation; and (iii) process monitoring, related to the data describing the production processes and inputs used. Labeling the data in these systems can improve decision-making, traceability, and coordination in the chains. Nevertheless, this is a labor-intensive task. The objective of this chapter was to evaluate whether unsupervised machine learning techniques could be used to identify patterns and clusters in the data and to generate labels for an unlabeled agricultural supply chain dataset. A dataset was generated by merging seven datasets that contained information from the three systems, and the k-means and self-organizing maps (SOM) models were evaluated on clustering the data and generating labels. The use of principal component analysis (PCA) together with the k-means model was also evaluated. Several supervised and unsupervised learning metrics were evaluated. The SOM model with the Gaussian neighborhood function provided the best results, with an F1-score of 0.91 and a more clearly defined cluster map. A series of recommendations for the use of unsupervised learning techniques on supply chain data is discussed. The methodology used in this chapter can be applied to other supply chains and to other unsupervised machine learning research. Future work includes improving the dataset and implementing other clustering models and dimensionality reduction techniques.
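The evaluation idea in the abstract (cluster unlabeled records, turn clusters into labels, then score those labels against a reference labelling with an F1-score) can be illustrated with a minimal sketch. It is not the chapter's implementation: the toy points, the reference labels, the farthest-first initialisation, the majority-vote mapping from clusters to labels, and the macro-averaged F1 are all assumptions made for illustration, and only plain k-means is shown, not the SOM or PCA variants.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Deterministic farthest-first initialisation (an assumed simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster index per point."""
    centroids = init_centroids(points, k)
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def clusters_to_labels(assign, reference):
    """Name each cluster after the most common reference label inside it."""
    votes = {}
    for a, r in zip(assign, reference):
        votes.setdefault(a, Counter())[r] += 1
    mapping = {a: counts.most_common(1)[0][0] for a, counts in votes.items()}
    return [mapping[a] for a in assign]


def macro_f1(reference, predicted):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in sorted(set(reference)):
        pairs = list(zip(reference, predicted))
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of 2-D records.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
reference = ["A", "A", "A", "B", "B", "B"]

assign = kmeans(points, k=2)
predicted = clusters_to_labels(assign, reference)
score = macro_f1(reference, predicted)
```

A supervised metric such as F1 is only available when a reference labelling exists for at least part of the data; without one, only unsupervised metrics can be computed on the clusters.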
Keywords: Clustering; k-means; Self-organizing maps; Supply chains; Unsupervised learning
Date: 2022
Persistent link: https://EconPapers.repec.org/RePEc:spr:spochp:978-3-030-84148-5_11
Ordering information: This item can be ordered from
http://www.springer.com/9783030841485
DOI: 10.1007/978-3-030-84148-5_11
More chapters in Springer Optimization and Its Applications from Springer