Detection of Sparse and Weak Effects in High-Dimensional Feature Space, with an Application to Microbiome Data Analysis

Tatjana Pavlenko, Annika Tillander, Justine Debelius and Fredrik Boulund
Additional contact information
Tatjana Pavlenko: KTH Royal Institute of Technology, Department of Mathematics
Annika Tillander: Linköping University, Department of Statistics and Machine Learning
Justine Debelius: Karolinska Institutet, The Centre for Translational Microbiome Research (CTMR), Department of Microbiology, Tumor, and Cell Biology
Fredrik Boulund: Karolinska Institutet, The Centre for Translational Microbiome Research (CTMR), Department of Microbiology, Tumor, and Cell Biology

Chapter 17 in Recent Developments in Multivariate and Random Matrix Analysis, 2020, pp 287-311 from Springer

Abstract: We present a family of goodness-of-fit (GOF) test statistics specifically designed for detection of sparse-weak mixtures, where only a small fraction of the observational units are contaminated, that is, arise from a different distribution. The test statistics are constructed as sup-functionals of weighted empirical processes, where the weight functions employed are the Chibisov-O’Reilly functions of a Brownian bridge. The study recovers and extends a number of previously known results on sparse detection using a weighted GOF (wGOF) approach. In particular, the results obtained demonstrate the advantage of our approach over a common approach that utilizes a family of regularly varying weight functions. We show that the Chibisov-O’Reilly family has important advantages over better-known approaches, as it allows for optimally adaptive, fully data-driven test procedures. The theory is further developed to demonstrate that the entire family is a flexible device that adapts to many interesting situations in modern scientific practice, where the number of observations stays fixed or grows very slowly while the number of automatically measured features grows dramatically and only a small fraction of these features are useful. Numerical studies are performed to investigate the finite-sample properties of the theoretical results. We show that the Chibisov-O’Reilly family compares favorably to related test statistics over a broad range of sparsity and weakness regimes for the Gaussian and high-dimensional Dirichlet types of sparse mixture. Finally, an example based on a human gut microbiome data set is presented to illustrate the application of the family of tests to real-life sparse signal detection problems where the sample size is small relative to the feature dimension.

Date: 2020

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-030-56773-6_17

Ordering information: This item can be ordered from
http://www.springer.com/9783030567736

DOI: 10.1007/978-3-030-56773-6_17

Handle: RePEc:spr:sprchp:978-3-030-56773-6_17