Exploration of Biodegradable Substances Using Machine Learning Techniques
Alaa M. Elsayad (),
Medien Zeghid,
Hassan Yousif Ahmed and
Khaled A. Elsayad
Additional contact information
Alaa M. Elsayad: Department of Electrical Engineering, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir 11991, Saudi Arabia
Medien Zeghid: Department of Electrical Engineering, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir 11991, Saudi Arabia
Hassan Yousif Ahmed: Department of Electrical Engineering, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir 11991, Saudi Arabia
Khaled A. Elsayad: Pharmacy Department, Cairo University Hospitals, Cairo University, Cairo 11662, Egypt
Sustainability, 2023, vol. 15, issue 17, 1-22
Abstract:
The concept of being readily biodegradable is crucial in evaluating the potential effects of chemical substances on ecosystems and conducting environmental risk assessments. Substances that readily biodegrade are generally associated with lower environmental persistence and reduced risks to the environment compared to those that do not easily degrade. The accurate development of quantitative structure–activity relationship (QSAR) models for biodegradability prediction plays a critical role in advancing the design and creation of sustainable chemicals. In this paper, we report the results of our investigation into the utilization of classification and regression trees (CARTs) in classifying and selecting features of biodegradable substances based on 2D molecular descriptors. CARTs are a well-known machine learning approach renowned for their simplicity, scalability, and built-in feature selection capabilities, rendering them highly suitable for the analysis of large datasets. Curvature and interaction tests were employed to construct efficient and unbiased trees, while Bayesian optimization (BO) and repeated cross-validation techniques were utilized to improve the generalization and stability of the trees. The main objective was to classify substances as either readily biodegradable (RB) or non-readily biodegradable (NRB). We compared the performance of the proposed CARTs with support vector machine (SVM), K nearest neighbor (kNN), and regulated logistic regression (RLR) models in terms of overall accuracy, sensitivity, specificity, and receiver operating characteristics (ROC) curve. The experimental findings demonstrated that the proposed CART model, which integrated curvature–interaction tests, outperformed other models in classifying the test subset. It achieved accuracy of 85.63%, sensitivity of 87.12%, specificity of 84.94%, and a highly comparable area under the ROC curve of 0.87. In the prediction process, the model identified the top ten most crucial descriptors, with the SpMaxB(m) and SpMin1_Bh(v) descriptors standing out as notably superior to the remaining descriptors.
Keywords: quantitative structure–activity (QSAR); biodegradable substances; decision tree; Bayesian optimization; feature ranking; support vector machine; K-nearest neighbor; logistic regression (search for similar items in EconPapers)
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56 (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2071-1050/15/17/12764/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/17/12764/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:17:p:12764-:d:1223629
Access Statistics for this article
Sustainability is currently edited by Ms. Alexandra Wu
More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().