Fast R Functions for Robust Correlations and Hierarchical Clustering

Peter Langfelder and Steve Horvath

Journal of Statistical Software, 2012, vol. 46, issue 11

Abstract: Many high-throughput biological data analyses require the calculation of large correlation matrices and/or the clustering of a large number of objects. The standard R function for calculating Pearson correlation handles data without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing values. We present an implementation of Pearson correlation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve additional speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA. The hierarchical clustering algorithm implemented in the R function hclust is an order n^3 (where n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust, which implements the original algorithm and in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
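
As a rough illustration of why missing entries complicate the calculation, the following Python sketch computes Pearson correlation from, for each pair of variables, only the observations present in both. This is an assumption about how missing values are handled, and it is not the WGCNA implementation (which is an R package). The names `is_missing`, `pairwise_pearson`, and `correlation_matrix`, and the use of `None` or NaN as missing markers, are hypothetical choices made for this example.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as missing entries."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pairwise_pearson(x, y):
    """Pearson correlation using only positions observed in both x and y."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a) and not is_missing(b)]
    n = len(pairs)
    if n < 2:
        return float("nan")  # too few joint observations

    # Means and sums of squares depend on which rows survive for THIS pair,
    # so they cannot be computed once per variable and reused.
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant variable has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Symmetric matrix of pairwise-complete correlations between columns."""
    k = len(columns)
    result = [[float("nan")] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            r = pairwise_pearson(columns[i], columns[j])
            result[i][j] = r
            result[j][i] = r
    return result


# Three variables observed on five samples; None marks a missing entry.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, None, 8.0, 10.0],
    [5.0, 4.0, 3.0, None, 1.0],
]
matrix = correlation_matrix(data)
```

Without missing entries, each variable can be centered and scaled once and the whole matrix obtained from a single matrix product. With missing entries, the per-pair bookkeeping above is what a naive approach has to repeat for every pair of variables.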

Date: 2012-03-07

Downloads: (external link)
https://www.jstatsoft.org/index.php/jss/article/view/v046i11/v46i11.pdf
https://www.jstatsoft.org/index.php/jss/article/do ... hClust_1.01-1.tar.gz
https://www.jstatsoft.org/index.php/jss/article/do ... 11/WGCNA_1.19.tar.gz
https://www.jstatsoft.org/index.php/jss/article/do ... ile/v046i11/v46i11.R
https://www.jstatsoft.org/index.php/jss/article/do ... 6i11-replication.zip

Persistent link: https://EconPapers.repec.org/RePEc:jss:jstsof:v:046:i11

DOI: 10.18637/jss.v046.i11

Journal of Statistical Software is currently edited by Bettina Grün, Edzer Pebesma and Achim Zeileis

More articles in Journal of Statistical Software from Foundation for Open Access Statistics
Bibliographic data for series maintained by Christopher F. Baum.

 
Page updated 2025-03-19
Handle: RePEc:jss:jstsof:v:046:i11