Cluster-Localized Sparse Logistic Regression for SNP Data

Binder Harald, Müller Tina, Schwender Holger, Golka Klaus, Steffens Michael, Hengstler Jan G., Ickstadt Katja and Schumacher Martin
Additional contact information
Binder Harald: Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center Johannes Gutenberg University Mainz
Müller Tina: Global Drug Discovery Statistics, Bayer Pharma AG
Schwender Holger: Faculty of Statistics, TU Dortmund University
Golka Klaus: Department of Toxicology, IfADo - Leibniz Research Centre for Working Environment and Human Factors
Steffens Michael: Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center Johannes Gutenberg University Mainz
Hengstler Jan G.: Department of Toxicology, IfADo - Leibniz Research Centre for Working Environment and Human Factors
Ickstadt Katja: Faculty of Statistics, TU Dortmund
Schumacher Martin: Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg

Statistical Applications in Genetics and Molecular Biology, 2012, vol. 11, issue 4, 31 pages

Abstract: The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, these groups of individuals are identified using a clustering approach, where each group may be defined via different SNPs. This allows for representing complex interaction patterns, such as compositional epistasis, that might not be detected by a single main effects model. In a simulation study, the CLR approach results in improved prediction performance, compared to the main effects approach, and identification of important SNPs in several scenarios. Improved prediction performance is also obtained for an application example considering urinary bladder cancer. Some of the identified SNPs are predictive for all individuals, while others are only relevant for a specific group. Together with the sets of SNPs that define the groups, potential interaction patterns are uncovered.
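
The weighted componentwise boosting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes SNPs coded as 0/1/2 minor-allele counts, a score-statistic selection rule (one common likelihood-based variant), and a fixed step size; the function name `weighted_componentwise_boost`, the parameters `steps` and `nu`, the toy data, and the cluster weights are all hypothetical. Each boosting step updates the coefficient of only one SNP, so stopping early leaves most coefficients at exactly zero, which is the source of sparsity. Observation weights localize the fit to a group of individuals.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fitted_probs(X, intercept, beta):
    return [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]

def weighted_componentwise_boost(X, y, weights, steps=30, nu=0.1):
    """Weighted componentwise boosting for logistic regression (illustrative sketch).

    X: list of rows of SNP codes, y: 0/1 case status,
    weights: nonnegative observation weights that localize the fit.
    """
    p = len(X[0])
    # Start the intercept at the weighted log-odds of being a case.
    prev = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    prev = min(max(prev, 1e-6), 1.0 - 1e-6)
    intercept = math.log(prev / (1.0 - prev))
    beta = [0.0] * p
    for _ in range(steps):
        # One Newton step for the (unpenalized) intercept.
        prob = fitted_probs(X, intercept, beta)
        info0 = sum(w * pi * (1.0 - pi) for w, pi in zip(weights, prob))
        if info0 > 1e-12:
            intercept += sum(w * (yi - pi) for w, yi, pi in zip(weights, y, prob)) / info0
        prob = fitted_probs(X, intercept, beta)
        # Pick the single SNP with the largest weighted score statistic.
        best_j, best_stat, best_update = None, 0.0, 0.0
        for j in range(p):
            score = sum(w * row[j] * (yi - pi)
                        for w, row, yi, pi in zip(weights, X, y, prob))
            info = sum(w * row[j] ** 2 * pi * (1.0 - pi)
                       for w, row, pi in zip(weights, X, prob))
            if info <= 1e-12:
                continue  # SNP carries no information under these weights
            stat = score * score / info
            if stat > best_stat:
                best_j, best_stat, best_update = j, stat, score / info
        if best_j is None:
            break
        # Damped update of the selected coefficient only.
        beta[best_j] += nu * best_update
    return intercept, beta

# Toy data (made up for illustration): 12 individuals, 3 SNPs.
X = [[0, 1, 2], [0, 2, 0], [0, 0, 1], [1, 1, 0], [1, 2, 2], [1, 0, 1],
     [2, 1, 1], [2, 0, 2], [2, 2, 0], [0, 1, 0], [1, 0, 2], [2, 2, 1]]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

# Global main effects fit: every individual has weight 1.
_, beta_all = weighted_componentwise_boost(X, y, [1.0] * len(y), steps=5)

# Localized fit: a hypothetical cluster (third SNP equal to 2) gets full
# weight, everyone else is down-weighted rather than dropped.
cluster_weights = [1.0 if row[2] == 2 else 0.2 for row in X]
_, beta_cluster = weighted_componentwise_boost(X, y, cluster_weights, steps=5)
```

Fitting one such model per cluster, each with its own weight vector, yields one sparse coefficient vector per group. Comparing these vectors shows which SNPs are selected for all groups and which only for a specific one. How clusters are found and how weights are derived from them is left open here.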

Keywords: single nucleotide polymorphisms; weighted regression; clustering
Date: 2012

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:11:y:2012:i:4:n:13

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.1515/1544-6115.1694

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

Handle: RePEc:bpj:sagmbi:v:11:y:2012:i:4:n:13