
A Set of Efficient Methods to Generate High-Dimensional Binary Data With Specified Correlation Structures

Wei Jiang, Shuang Song, Lin Hou and Hongyu Zhao

The American Statistician, 2021, vol. 75, issue 3, 310-322

Abstract: High-dimensional correlated binary data arise in many areas, such as observed genetic variations in biomedical research. Data simulation can help researchers evaluate the efficiency and explore the properties of different computational and statistical methods. In addition, some statistical methods, such as Monte Carlo methods, rely on data simulation. Lunn and Davies proposed linear time complexity methods to generate correlated binary variables with three common correlation structures. However, their methods do not allow unequal probabilities to be specified. In this article, we introduce several computationally efficient algorithms that generate high-dimensional binary data with specified correlation structures and unequal probabilities. Our algorithms have linear time complexity with respect to the dimension for three commonly studied correlation structures, namely the exchangeable, decaying-product, and K-dependent correlation structures. In addition, we extend our algorithms to generate binary data with specified nonnegative correlation matrices satisfying the validity condition, with quadratic time complexity. We provide an R package, CorBin, to implement our simulation methods. Compared with existing packages for binary data generation, the time cost to generate a 100-dimensional binary vector can be reduced by up to 10^5-fold for the common correlation structures and up to 10^3-fold for general correlation matrices, and the efficiency gain grows further as the dimension increases. The R package CorBin is available on CRAN at https://cran.r-project.org/.
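
As a rough illustration of the kind of linear-time construction the abstract refers to, the sketch below generates an exchangeable binary vector in the equal-probability setting associated with Lunn and Davies: each coordinate copies a shared Bernoulli(p) draw with probability sqrt(rho) and otherwise takes its own independent Bernoulli(p) draw, which gives mean p and pairwise correlation rho. It is not the article's unequal-probability algorithm and not the CorBin API; the function names `exchangeable_binary` and `empirical_corr` are hypothetical and exist only for this example.

```python
import random
from math import sqrt


def exchangeable_binary(dim, p, rho, rng=None):
    """Draw one binary vector of length `dim` with common mean `p` and
    common pairwise correlation `rho` (equal-probability case only).

    Hypothetical helper for illustration; runs in O(dim) time.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= rho <= 1.0):
        raise ValueError("p and rho must both lie in [0, 1]")
    rng = rng or random.Random()
    gamma = sqrt(rho)                      # mixing probability per coordinate
    shared = 1 if rng.random() < p else 0  # one draw shared by all coordinates
    vector = []
    for _ in range(dim):
        if rng.random() < gamma:
            vector.append(shared)          # copy the shared component
        else:
            vector.append(1 if rng.random() < p else 0)  # independent draw
    return vector


def empirical_corr(samples, i, j):
    """Sample Pearson correlation between coordinates i and j."""
    n = len(samples)
    mean_i = sum(s[i] for s in samples) / n
    mean_j = sum(s[j] for s in samples) / n
    cov = sum((s[i] - mean_i) * (s[j] - mean_j) for s in samples) / n
    var_i = sum((s[i] - mean_i) ** 2 for s in samples) / n
    var_j = sum((s[j] - mean_j) ** 2 for s in samples) / n
    return cov / sqrt(var_i * var_j)


if __name__ == "__main__":
    rng = random.Random(2021)
    draws = [exchangeable_binary(100, 0.3, 0.25, rng) for _ in range(5000)]
    print("empirical mean of coordinate 0:", sum(d[0] for d in draws) / len(draws))
    print("empirical corr of coordinates 0 and 99:", empirical_corr(draws, 0, 99))
```

Two coordinates agree through the shared draw only when both copy it, which happens with probability rho, so the covariance is rho * p * (1 - p). Allowing a different probability for each coordinate, as the article does, requires a different construction that this sketch does not attempt.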

Date: 2021

Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2020.1816213 (text/html)
Access to full text is restricted to subscribers.


Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:75:y:2021:i:3:p:310-322

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UTAS20

DOI: 10.1080/00031305.2020.1816213


The American Statistician is currently edited by Eric Sampson

Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:amstat:v:75:y:2021:i:3:p:310-322