
LSTM-attention-guided graph neural networks for integrated genotype–environment modeling in maize yield prediction

Amir Morshedian and Mike Domaratzki

PLOS Computational Biology, 2026, vol. 22, issue 5, 1-20

Abstract: This paper presents a deep-learning framework that combines an LSTM, a graph neural network (GNN), and transformer-style attention to model genotype–environment (G×E) effects for maize yield prediction. Weather data for a growing season are summarized by an LSTM into a 21-dimensional embedding that serves as the environment node feature; 437,214 SNPs are reduced to 548 principal components that instantiate the genotype nodes. Multi-head attention dynamically weights the edges during message passing. Three architectures are compared: A (a fully bipartite graph), B (A plus intra-set top-k similarity edges among genotypes and among environments), and C (B plus a single learnable supernode readout that attends over all nodes after message passing). The joint representations feed a compact MLP for yield prediction. Using a forward-time split (2014–2021 for training; 2022 for testing, with unseen genotypes and unseen environments), performance improves monotonically from A to C: A (RMSE 2.7749, PCC 0.4115, R² 0.1693), B (2.3683, 0.6622, 0.4385), C (2.2120, 0.6945, 0.4823). Compared to A, C reduces RMSE by 0.5629 (∼20.3%) and increases PCC by 0.283 (∼68.8%), indicating that global, content-adaptive aggregation promotes local G×E propagation. Performance of the proposed approach remains consistent regardless of the number of genotypes per environment and stays strong under variable or unbalanced genotype sampling across environments. The proposed approach is compared with methods from the Global G×E Prediction Competition, and the comparison shows that two of the three architectures improve predictive performance, with the best architecture achieving a lower RMSE (2.2120) and a higher Pearson correlation (0.6945) than the competition-winning model.

Author summary: This paper considers how plant genomics and environmental effects jointly relate to yield. By studying a maize dataset that combines nearly 5,000 varieties of the crop in 280 location-year combinations, we predict the yield of a variety when grown in a particular environment. Environmental data used directly in the prediction include solar radiation, temperature, wind speed, and precipitation.
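
The relative improvements quoted in the abstract can be re-derived from the reported metrics alone. The short Python sketch below is only an arithmetic check on the published numbers; the dictionary layout and the helper name `relative_change` are illustrative assumptions and are not taken from the paper or its code.

```python
# Reported test-set metrics from the abstract: (RMSE, PCC, R^2) per architecture.
# The variable and function names here are hypothetical, chosen for illustration.
reported = {
    "A": (2.7749, 0.4115, 0.1693),
    "B": (2.3683, 0.6622, 0.4385),
    "C": (2.2120, 0.6945, 0.4823),
}


def relative_change(baseline, value):
    """Signed change of `value` relative to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


rmse_a, pcc_a, _ = reported["A"]
rmse_c, pcc_c, _ = reported["C"]

rmse_drop = rmse_a - rmse_c                        # absolute RMSE reduction, A -> C
pcc_gain = pcc_c - pcc_a                           # absolute PCC increase, A -> C
rmse_drop_pct = -relative_change(rmse_a, rmse_c)   # reduction as a positive percent
pcc_gain_pct = relative_change(pcc_a, pcc_c)

print(f"RMSE reduction: {rmse_drop:.4f} ({rmse_drop_pct:.1f}%)")
print(f"PCC increase:   {pcc_gain:.4f} ({pcc_gain_pct:.1f}%)")
```

Both percentages are taken relative to architecture A, which is the baseline the abstract compares against.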

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013729 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13729&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013729

DOI: 10.1371/journal.pcbi.1013729

More articles in PLOS Computational Biology from Public Library of Science
Page updated 2026-05-24
Handle: RePEc:plo:pcbi00:1013729