An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
Fabián Santos and
Nicole Acosta
Additional contact information
Fabián Santos: Centro de Investigación Para el Territorio y el Hábitat Sostenible (CITEHS), Universidad Indoamérica, Quito 170301, Ecuador
Nicole Acosta: Research Unit Sustainability and Climate Risks, Universität Hamburg, 20144 Hamburg, Germany
Agriculture, 2023, vol. 13, issue 5, 1-19
Abstract:
Ensuring food security requires the timely publication of data, but this information is often not properly documented or evaluated. Combining databases from multiple sources is therefore a common way to curate the data and corroborate results; however, the combined records are frequently incomplete. These tasks are often labor-intensive because they require a case-by-case review to obtain the required, complete information. To address these problems, an approach based on the Selenium web-scraping software and the multiple imputation with denoising autoencoders (MIDAS) algorithm is presented for a case study in Ecuador. The objective was to produce a multidimensional database, free of data gaps, covering 72 species of food crops, using data from 3 different open web databases. The methodology produced an analysis-ready dataset with 43 parameters describing plant traits, nutritional composition, and planted areas of food crops; the imputed data obtained an R-squared of 0.84 for a numerical control parameter selected for validation. The enriched dataset was then clustered with K-means to report unprecedented insights into food crops cultivated in Ecuador. The methodology is useful for users who need to collect and curate data from different sources in a semi-automatic fashion.
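The validation step mentioned in the abstract (scoring imputed values for a control parameter) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the function names `r_squared` and `validate_imputation` are hypothetical, and an ordinary least-squares fit stands in for the MIDAS imputer, which is not reproduced. The sketch hides part of a fully observed control column, imputes the hidden values from the remaining columns, and compares them with the true values.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def validate_imputation(data, control_col, holdout_frac=0.2, seed=0):
    """Hide part of a complete control column, impute it, and score with R^2."""
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    n_hidden = max(1, int(round(holdout_frac * n_rows)))
    hidden = rng.choice(n_rows, size=n_hidden, replace=False)
    visible = np.setdiff1d(np.arange(n_rows), hidden)

    # Design matrix: intercept plus every column except the control column.
    predictors = np.delete(data, control_col, axis=1)
    design = np.column_stack([np.ones(n_rows), predictors])

    # Stand-in imputer (NOT MIDAS): least squares fitted on visible rows only.
    coef, *_ = np.linalg.lstsq(
        design[visible], data[visible, control_col], rcond=None
    )
    imputed = design[hidden] @ coef
    return r_squared(data[hidden, control_col], imputed)


# Synthetic table: three "trait" columns and one control column that depends
# on them plus noise. These values are made up and are not the paper's data.
rng = np.random.default_rng(42)
traits = rng.normal(size=(72, 3))
control = 2.0 * traits[:, 0] - 1.0 * traits[:, 1] + rng.normal(scale=0.3, size=72)
table = np.column_stack([traits, control])

score = validate_imputation(table, control_col=3)
print(f"hold-out R^2 = {score:.2f}")
```

Because the hidden rows never enter the fit, the score reflects how well the imputer recovers values it has not seen; the same scoring could be applied to the output of any other imputer.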
Keywords: web scraping; denoising autoencoders; plant traits; food security; Ecuador
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2023
Downloads:
https://www.mdpi.com/2077-0472/13/5/1015/pdf (application/pdf)
https://www.mdpi.com/2077-0472/13/5/1015/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:13:y:2023:i:5:p:1015-:d:1140420
Agriculture is currently edited by Ms. Leda Xuan
Bibliographic data for series maintained by MDPI Indexing Manager.