Cereal and Rapeseed Yield Forecast in Poland at Regional Level Using Machine Learning and Classical Statistical Models
Edyta Okupska,
Dariusz Gozdowski,
Rafał Pudełko and
Elżbieta Wójcik-Gront
Additional contact information
Edyta Okupska: Seed and Agricultural Farm, “Bovinas” Ltd., Chodow 17, Chodow, 62-652 Poznań, Poland
Dariusz Gozdowski: Department of Biometry, Institute of Agriculture, Warsaw University of Life Sciences, Nowoursynowska 159, 02-776 Warsaw, Poland
Rafał Pudełko: Department of Bioeconomy and Systems Analysis, Institute of Soil Science and Plant Cultivation—State Research Institute (IUNG-PIB), Czartoryskich 8, 24-100 Pulawy, Poland
Elżbieta Wójcik-Gront: Department of Biometry, Institute of Agriculture, Warsaw University of Life Sciences, Nowoursynowska 159, 02-776 Warsaw, Poland
Agriculture, 2025, vol. 15, issue 9, 1-16
Abstract:
This study performed in-season yield prediction, about 2–3 months before the harvest, for cereals and rapeseed at the province level in Poland for 2009–2024. Various models were employed, including machine learning algorithms and multiple linear regression. The satellite-derived normalized difference vegetation index (NDVI) and climatic water balance (CWB), calculated using meteorological data, were treated as predictors of crop yield. The accuracy of the models was compared to identify the optimal approach. The strongest correlation coefficients with crop yield were observed for the NDVI at the beginning of March, ranging from 0.454 for rapeseed to 0.503 for rye. Depending on the crop, the highest R² values were observed for different prediction models, ranging from 0.654 for rapeseed based on the random forest model to 0.777 for basic cereals based on linear regression. The random forest model was best for rapeseed yield, while for cereals, the best prediction was observed for multiple linear regression or neural network models. For the studied crops, all models had mean absolute errors and root mean squared errors not exceeding 6 dt/ha, which is relatively small because it is under 20% of the mean yield. For the best models, in most cases, relative errors were not higher than 10% of the mean yield. The results proved that linear regression and machine learning models are characterized by similar predictions, likely due to the relatively small sample size (256 observations).
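The accuracy measures named in the abstract (mean absolute error, root mean squared error, R², and errors relative to the mean yield) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name error_metrics and the yield values are hypothetical, and R² is computed here as the coefficient of determination, which may differ from the definition used in the article.

```python
import math

def error_metrics(observed, predicted):
    """Return MAE, RMSE, R^2 and errors as a percentage of the mean observed yield."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Mean absolute error and root mean squared error, in the yield unit (dt/ha).
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # Coefficient of determination; assumes the observed yields are not all equal.
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {
        "mae": mae,
        "rmse": rmse,
        "r2": r2,
        # Relative errors: absolute error expressed as a share of the mean yield.
        "mae_pct": 100.0 * mae / mean_obs,
        "rmse_pct": 100.0 * rmse / mean_obs,
    }

# Made-up yields in dt/ha, for illustration only (not data from the study).
observed = [40.0, 45.0, 50.0, 55.0]
predicted = [42.0, 44.0, 47.0, 57.0]
metrics = error_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Expressing MAE and RMSE as a percentage of the mean yield is what allows the abstract to compare errors across crops with different yield levels, for example against the 10% and 20% thresholds it mentions.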
Keywords: grain yield; satellite data; remote sensing; random forest; neural networks
JEL-codes: Q1 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2077-0472/15/9/984/pdf (application/pdf)
https://www.mdpi.com/2077-0472/15/9/984/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jagris:v:15:y:2025:i:9:p:984-:d:1647723
Agriculture is currently edited by Ms. Leda Xuan
Bibliographic data for series maintained by MDPI Indexing Manager.