Novel feature selection methods for construction of accurate epigenetic clocks

Adam Li, Amber Mueller, Brad English, Anthony Arena, Daniel Vera, Alice E Kane and David A Sinclair

PLOS Computational Biology, 2022, vol. 18, issue 8, 1-18

Abstract: Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct epigenetic clocks. The most common approach is to create clocks using elastic net regression modelling of all measured CpG sites, without first identifying specific features or CpGs of interest. The addition of feature selection approaches provides the opportunity to optimise the identification of predictive CpG sites. Here, we apply novel feature selection methods and combinatorial approaches including newly adapted neural networks, genetic algorithms, and ‘chained’ combinations. Human whole blood methylation data of ~470,000 CpGs were used to develop clocks that predict age with R² correlation scores of greater than 0.73, the most predictive of which uses 35 CpG sites for an R² correlation score of 0.87. The five most frequent sites across all clocks were modelled to build a clock with an R² correlation score of 0.83. These two clocks were validated on two external datasets, where they maintained excellent predictive accuracy. When compared with three published epigenetic clocks (Hannum, Horvath, Weidner) applied to the same validation datasets, our clocks outperformed all three models. We identified gene regulatory regions associated with selected CpGs as possible targets for future aging studies. Thus, our feature selection algorithms build accurate, generalizable clocks with a low number of CpG sites, providing important tools for the field.

Author summary: Epigenetic clocks accurately predict a person’s age by measuring the levels of a chemical mark (methylation) at specific sites of the DNA. More of these clocks are being built all the time, and there is a need for tools to construct them well, particularly for picking the specific DNA sites to include. We propose several novel machine-learning tools, known as feature selection approaches, for the optimised selection of these DNA sites. We applied our approaches to a large human blood dataset to develop several clocks that predict age using 35 or fewer DNA sites and that are more accurate than previously published clocks when applied to other datasets for validation. Some of the DNA sites identified may be associated with interesting genes to explore further for their role in aging. These approaches should enable the building of more accurate, generalizable age prediction clocks from a low number of DNA sites.

Date: 2022

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009938 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 09938&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1009938

DOI: 10.1371/journal.pcbi.1009938

More articles in PLOS Computational Biology from Public Library of Science

 
Handle: RePEc:plo:pcbi00:1009938