Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data
Merrill D. Birkner,
Sandra E. Sinisi and
Mark J. van der Laan
Additional contact information
Merrill D. Birkner: University of California, Berkeley
Sandra E. Sinisi: University of California, Berkeley
Mark J. van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley
Statistical Applications in Genetics and Molecular Biology, 2005, vol. 4, issue 1, 30
Abstract:
Analysis of viral strand sequence data together with viral replication capacity could lead to biological insights into the replication ability of HIV-1. Identifying specific target codons on the viral strand would facilitate the manufacturing of target-specific antiretrovirals. Various algorithmic and analytic techniques can be applied to this problem. In this paper, we apply two techniques to a data set consisting of 317 patients, each with 282 sequenced protease and reverse transcriptase codons. The first is a set of recently developed multiple testing procedures used to find codons that have significant univariate associations with the replication capacity of the virus. A single-step multiple testing procedure (Pollard and van der Laan 2003) was used to control the family-wise error rate (FWER) at the five percent level, and augmentation multiple testing procedures were applied to control the generalized family-wise error rate (gFWER) or the tail probability of the proportion of false positives (TPPFP). The second is a data-adaptive multiple regression algorithm, used to obtain a prediction of viral replication capacity based on an entire mutant/non-mutant sequence profile. This is a loss-based, cross-validated Deletion/Substitution/Addition regression algorithm (Sinisi and van der Laan 2004), which builds candidate estimators for the prediction of a univariate outcome by minimizing an empirical risk. The two methods are separate techniques with distinct goals, both used to analyze viral data of this structure.
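As an illustration of the augmentation idea mentioned in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes that adjusted p-values from some FWER-controlling procedure are already available, one per codon, and it follows the commonly described augmentation rule: start from the FWER rejections, then add the next most significant hypotheses, either a fixed number k (gFWER) or as many as keep the added fraction at or below q (TPPFP). The function name `augment_rejections` and the example p-values are hypothetical.

```python
import math


def augment_rejections(adj_p, alpha=0.05, k=0, q=None):
    """Augment an FWER-controlling rejection set (illustrative sketch).

    adj_p : adjusted p-values from an FWER-controlling procedure (assumed given)
    alpha : nominal FWER level
    k     : allowed number of false positives for gFWER(k) control
    q     : allowed proportion of false positives for TPPFP(q) control;
            when q is given, it is used instead of k
    Returns the sorted indices of the rejected hypotheses.
    """
    # Order hypotheses from most to least significant.
    order = sorted(range(len(adj_p)), key=lambda i: adj_p[i])
    # Initial rejections made by the FWER-controlling procedure.
    r0 = sum(1 for p in adj_p if p <= alpha)
    remaining = len(adj_p) - r0

    if q is None:
        # gFWER(k): reject the next k most significant hypotheses.
        extra = min(k, remaining)
    else:
        # TPPFP(q): largest a with a / (a + r0) <= q,
        # which is equivalent to a <= q * r0 / (1 - q).
        bound = math.floor(q * r0 / (1.0 - q) + 1e-12)
        extra = min(int(bound), remaining)

    return sorted(order[: r0 + extra])


# Hypothetical adjusted p-values for eight codons.
example = [0.001, 0.01, 0.03, 0.2, 0.04, 0.5, 0.06, 0.8]
fwer_set = augment_rejections(example, alpha=0.05)        # FWER only
gfwer_set = augment_rejections(example, alpha=0.05, k=1)  # gFWER(1)
tppfp_set = augment_rejections(example, alpha=0.05, q=0.2)  # TPPFP(0.2)
```

The sketch treats the initial FWER step as a black box; in the paper that step is a bootstrap-based single-step procedure, which is not reproduced here.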
Keywords: Bootstrap; codon; generalized family wise error rate; HIV-1; multiple testing; prediction; tail probability of the proportion of false positives; type I error; variable selection
Date: 2005
Downloads: https://doi.org/10.2202/1544-6115.1110 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:4:y:2005:i:1:n:8
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.2202/1544-6115.1110
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
Bibliographic data for series maintained by Peter Golla.