The Two Sample Problem for Multiple Categorical Variables
DiRienzo A. G.
Additional contact information
DiRienzo A. G.: Harvard University
The International Journal of Biostatistics, 2006, vol. 2, issue 1, 24
Abstract:
Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.
Keywords: conditional independence; family-wise error rate; genomics; genotype sequence; HIV-1 genotype evolution; multiple hypothesis testing; simultaneous inference (search for similar items in EconPapers)
Date: 2006
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://doi.org/10.2202/1557-4679.1019 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bpj:ijbist:v:2:y:2006:i:1:n:5
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/ijb/html
DOI: 10.2202/1557-4679.1019
Access Statistics for this article
The International Journal of Biostatistics is currently edited by Antoine Chambaz, Alan E. Hubbard and Mark J. van der Laan
More articles in The International Journal of Biostatistics from De Gruyter
Bibliographic data for series maintained by Peter Golla ().