Issues of Processing and Multiple Testing of SELDI-TOF MS Proteomic Data

Merrill D. Birkner, Alan E. Hubbard, Mark J. van der Laan, Christine F. Skibola, Christine M. Hegedus and Martyn T. Smith
Additional contact information
Merrill D. Birkner: Division of Biostatistics, School of Public Health, University of California, Berkeley
Alan E. Hubbard: Division of Biostatistics, School of Public Health, University of California, Berkeley
Mark J. van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley
Christine F. Skibola: Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley
Christine M. Hegedus: Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley
Martyn T. Smith: Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley

Statistical Applications in Genetics and Molecular Biology, 2006, vol. 5, issue 1, 24

Abstract: A new data filtering method for SELDI-TOF MS proteomic spectra data is described. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) of bone marrow cell lysate for two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the type of data processing as well as experimental variability can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a sequence of processing and multiple testing techniques: 1) correction for background drift; 2) filtering using smooth regression with cross-validated bandwidth selection; 3) peak finding; and 4) correction for multiple testing (van der Laan et al. (2005)). The result is a list of proteins (indexed by m/z) whose average expression differs significantly among disease (or treatment, etc.) groups. The procedures are intended to provide a sensible and statistically driven algorithm, which we argue yields a list of proteins with a significant difference in expression. Provided there are no sources of unmeasured bias (such as confounding of experimental conditions with disease status), proteins found to be statistically significant using this technique have a low probability of being false positives.
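
To make the four steps in the abstract concrete, the following is a minimal Python sketch of such a pipeline on a synthetic toy spectrum. It is not the authors' implementation. The rolling-minimum baseline, the Gaussian-kernel (Nadaraya-Watson) smoother with leave-one-out cross-validation, and the local-maximum peak rule are all illustrative assumptions. The Holm step-down adjustment is a simpler, swapped-in stand-in for the multiple testing procedures of van der Laan et al. (2005). All function names, parameter values, and p-values are hypothetical.

```python
import math

def subtract_baseline(y, half_window):
    """Subtract a rolling-minimum estimate of background drift (assumed estimator)."""
    n = len(y)
    return [y[i] - min(y[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]

def kernel_smooth(x, y, h):
    """Gaussian-kernel regression; returns (fitted values, leave-one-out CV error)."""
    fit, cv = [], 0.0
    for i, xi in enumerate(x):
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        total = sum(w)
        yhat = sum(wj * yj for wj, yj in zip(w, y)) / total
        denom = 1.0 - w[i] / total  # 1 minus the smoother's diagonal entry
        # Leave-one-out shortcut for linear smoothers: residual / (1 - S_ii).
        cv += ((y[i] - yhat) / denom) ** 2 if denom > 1e-12 else math.inf
        fit.append(yhat)
    return fit, cv / len(x)

def select_bandwidth(x, y, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out CV error."""
    return min(candidates, key=lambda h: kernel_smooth(x, y, h)[1])

def find_peaks(y, min_height):
    """Indices of local maxima at or above min_height (a simple assumed rule)."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height]

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (a stand-in, not the paper's procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted

# Synthetic toy spectrum (not real data): linear drift + ripple + one peak at m/z 50.
mz = [float(i) for i in range(100)]
raw = [5.0 + 0.02 * m + 0.3 * math.sin(7.0 * m)
       + 10.0 * math.exp(-0.5 * ((m - 50.0) / 3.0) ** 2) for m in mz]

corrected = subtract_baseline(raw, half_window=15)                      # step 1
best_h = select_bandwidth(mz, corrected, candidates=[1.0, 2.0, 4.0])    # step 2
smoothed, _ = kernel_smooth(mz, corrected, best_h)
peaks = find_peaks(smoothed, min_height=3.0)                            # step 3

# Hypothetical per-peak p-values from a two-group comparison (e.g. AML vs ALL).
adjusted = holm_adjust([0.01, 0.04, 0.03])                              # step 4
```

In a real analysis the p-values would come from a test of group differences at each detected peak, and the choice of error rate (for example, the tail probability of the proportion of false positives named in the keywords) would determine which adjustment is appropriate.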

Keywords: proteomics; mass-spectrometry; multiple testing; preprocessing; leukemia; tail probability
Date: 2006

Downloads: (external link)
https://doi.org/10.2202/1544-6115.1198 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:5:y:2006:i:1:n:11

Ordering information: This journal article can be ordered from
https://www.degruyte ... urnal/key/sagmb/html

DOI: 10.2202/1544-6115.1198

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
