Extracting Knowledge from Big Data for Sustainability: A Comparison of Machine Learning Techniques

Garg, Raghu; Aggarwal, Himanshu; Centobelli, Piera; Cerchione, Roberto

Author affiliations:
Raghu Garg: Department of Computer Engineering, Punjabi University, Patiala 147002, India
Himanshu Aggarwal: Department of Computer Engineering, Punjabi University, Patiala 147002, India
Piera Centobelli: Department of Industrial Engineering, University of Naples Federico II, P.le Tecchio 80, 80125 Naples, Italy
Roberto Cerchione: Department of Engineering, Centro Direzionale di Napoli, Isola C4, 80143 Naples, Italy

Sustainability, 2019, vol. 11, issue 23, 1-17

Abstract: At present, because natural resources are limited, society should take maximum advantage of data, information, and knowledge to achieve sustainability goals. Today, human existence is not possible without the proliferation of plants. Through photosynthesis, plants convert solar energy into chemical energy. This process sustains all life on earth, and the main factor controlling proper plant growth is soil, since it holds the water, air, and essential nutrients that nourish plants. However, soil becomes degraded through overexposure, so fertilizer is essential to maintain soil quality. In this regard, soil analysis is a suitable method for determining soil quality: soil is examined in laboratories, and the resulting reports contain unorganized data that carry little meaning on their own. In this study, several big data machine learning methods are used to extract knowledge from these data and to determine fertilizer recommendation classes based on the current soil nutrient composition. For this experiment, soil analysis reports were collected from the Tata soil and water testing center. The Mahout library is used to analyze the performance of stochastic gradient descent (SGD) and an artificial neural network (ANN) in a Hadoop environment. For a broader performance evaluation, single-machine experiments were also run for random forest (RF), K-nearest neighbors (K-NN), regression tree (RT), support vector machine (SVM) with a polynomial kernel, and SVM with a radial basis function (RBF) kernel. A detailed experimental analysis was carried out on the soil report dataset using overall accuracy, the area under the receiver operating characteristic curve (AUC–ROC), mean absolute prediction error (MAE), root mean square error (RMSE), and the coefficient of determination (R²).
The results compare the solution classes and show that SGD outperforms the other approaches. Finally, the results support selecting a solution, that is, recommending a class that suggests a suitable fertilizer for crops to maximize production.
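The abstract reports that SGD performed best. Apache Mahout's SGD classifiers train logistic-regression models online, one example at a time, so the sketch below illustrates that idea in plain Python: a multinomial (softmax) logistic regression whose weights are updated per example by stochastic gradient descent on the cross-entropy loss. It is not Mahout's implementation and not the authors' configuration; the function names, the two features (standing in for hypothetical normalized soil nutrient readings), the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of class scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_sgd(X, y, n_classes, lr=0.1, epochs=200, seed=0):
    """Hypothetical helper: fit softmax-regression weights W[k] (bias stored
    as the last entry) with plain SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit training examples in a new random order each epoch
        for i in order:
            x = X[i] + [1.0]  # append constant bias input
            scores = [sum(w * v for w, v in zip(W[k], x)) for k in range(n_classes)]
            probs = softmax(scores)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. W[k] is (p_k - 1[y == k]) * x
                g = probs[k] - (1.0 if y[i] == k else 0.0)
                for j in range(n_features + 1):
                    W[k][j] -= lr * g * x[j]
    return W


def predict(W, x):
    """Return the class index with the highest linear score."""
    x = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(range(len(W)), key=lambda k: scores[k])


if __name__ == "__main__":
    # Hypothetical normalized features (e.g., two nutrient levels) and class ids.
    X = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0], [0.0, 1.0], [0.1, 0.9]]
    y = [0, 0, 1, 1, 2, 2]
    W = train_sgd(X, y, n_classes=3, lr=0.5, epochs=500)
    print([predict(W, x) for x in X])
```

Updating on one example at a time is what lets SGD scale to large report collections: each step touches only a single record, which suits streaming over data stored in a distributed environment such as Hadoop.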
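The validation measures named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' evaluation code: the function names are hypothetical, the toy labels and scores are invented, and applying MAE, RMSE, and R² to class predictions assumes the fertilizer classes are integer-coded. The AUC function covers only the binary (one class versus the rest) case, using the standard interpretation of AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true class labels."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels, scores):
    """Binary AUC-ROC (label 1 = positive): share of positive/negative pairs
    in which the positive gets the higher score, counting ties as 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical integer-coded classes, invented for illustration only.
    y_true = [0, 1, 2, 1, 0]
    y_pred = [0, 1, 1, 1, 0]
    print("accuracy", accuracy(y_true, y_pred))    # 0.8
    print("MAE", mae(y_true, y_pred))              # 0.2
    print("RMSE", round(rmse(y_true, y_pred), 4))  # 0.4472
    print("R2", round(r2(y_true, y_pred), 4))      # 0.6429
    # Hypothetical binary labels (1 = target class) and model scores.
    print("AUC", auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of examples and is meant only to make the definition concrete; for real evaluations, scikit-learn's `accuracy_score`, `mean_absolute_error`, `mean_squared_error`, `r2_score`, and `roc_auc_score` compute the same quantities.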

Keywords: agriculture industry; artificial neural network (ANN); big data analytics; Hadoop framework; fertilizer recommendations; K-NN; stochastic gradient descent (SGD); SVM; random forest (RF); regression tree (RT); sustainability-oriented performance
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2019
Citations: 6 (EconPapers)

Downloads:
https://www.mdpi.com/2071-1050/11/23/6669/pdf (application/pdf)
https://www.mdpi.com/2071-1050/11/23/6669/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:11:y:2019:i:23:p:6669-:d:290792

Sustainability is currently edited by Ms. Alexandra Wu

More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.