
An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets

Fabián Santos and Nicole Acosta
Additional contact information
Fabián Santos: Centro de Investigación Para el Territorio y el Hábitat Sostenible (CITEHS), Universidad Indoamérica, Quito 170301, Ecuador
Nicole Acosta: Research Unit Sustainability and Climate Risks, Universität Hamburg, 20144 Hamburg, Germany

Agriculture, 2023, vol. 13, issue 5, 1-19

Abstract: Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also results in incomplete cases. These tasks are often labor-intensive since they require a case-wise review to obtain the requested and completed information. To address these problems, an approach based on Selenium web-scraping software and the multiple imputation denoising autoencoders (MIDAS) algorithm is presented for a case study in Ecuador. The objective was to produce a multidimensional database, free of data gaps, with 72 species of food crops based on the data from 3 different open data web databases. This methodology resulted in an analysis-ready dataset with 43 parameters describing plant traits, nutritional composition, and planted areas of food crops, whose imputed data obtained an R-square of 0.84 for a control numerical parameter selected for validation. This enriched dataset was later clustered with K-means to report unprecedented insights into food crops cultivated in Ecuador. The methodology is useful for users who need to collect and curate data from different sources in a semi-automatic fashion.
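
The validation step mentioned in the abstract (an R-square for a control numerical parameter) can be illustrated with a minimal sketch. It assumes that some known values of the control parameter are held out, imputed, and then compared with the originals using the coefficient of determination. The function name `r_squared` and the sample values are hypothetical, and the paper may compute its score differently (for example, as a squared correlation); MIDAS itself is not reproduced here.

```python
def r_squared(observed, imputed):
    """Coefficient of determination between held-out observed values
    and the values an imputation model produced for the same cases."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the imputed values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, imputed))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical illustration data, not taken from the paper.
held_out = [2.0, 4.0, 6.0, 8.0]   # true values hidden before imputation
imputed = [2.5, 3.5, 6.5, 7.5]    # values an imputer might return
score = r_squared(held_out, imputed)
print(round(score, 2))
```

A score near 1 indicates that the imputed values track the held-out values closely; the abstract reports 0.84 for its control parameter.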

Keywords: web scraping; denoising autoencoders; plant traits; food security; Ecuador
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2023
Downloads:
https://www.mdpi.com/2077-0472/13/5/1015/pdf (application/pdf)
https://www.mdpi.com/2077-0472/13/5/1015/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:13:y:2023:i:5:p:1015-:d:1140420

Agriculture is currently edited by Ms. Leda Xuan

More articles in Agriculture from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jagris:v:13:y:2023:i:5:p:1015-:d:1140420