Mining strongly correlated item pairs in large transaction databases
Swarup Roy and Dhruba Kr Bhattacharyya
International Journal of Data Mining, Modelling and Management, 2013, vol. 5, issue 1, 76-96
Abstract:
Correlation mining is an approach to deriving statistical relationships between items from transaction data. Most existing techniques use Pearson's correlation coefficient as the measure of correlation, which may not always perform well when the data are noisy and binary in nature. Moreover, they require multiple passes over the database. This paper presents an effective and faster correlation mining technique to extract the most strongly correlated item pairs from large transaction databases. As an alternative to Pearson's correlation coefficient, it presents a method of computing Spearman's rank order correlation coefficient from transaction data. The proposed technique is found to perform satisfactorily in terms of execution time on several real and synthetic datasets when compared with other similar techniques. To justify its usefulness, an application of the proposed technique to extracting a yeast genetic network from gene expression data is also reported.
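As a rough illustration of the quantity the abstract refers to, the sketch below computes Spearman's rank order correlation between item pairs in a small transaction list, using the textbook definition (Pearson's correlation applied to average ranks, with ties sharing the mean rank). It is not the paper's technique, whose details are not given in this listing: the function names (`average_ranks`, `spearman`, `strongest_pairs`), the sample transactions, and the choice to return 0.0 for a constant column are all assumptions made for the example, and the brute-force loop over all pairs makes no attempt at the speed the abstract describes.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        # Assumption: rho is undefined for a constant column; report 0.0.
        return 0.0
    return cov / sqrt(vx * vy)


def strongest_pairs(transactions, top_k=1):
    """Score every item pair by rho and return the top_k (rho, item_a, item_b)."""
    items = sorted({item for t in transactions for item in t})
    # One binary column per item: 1 if the transaction contains it, else 0.
    columns = {i: [1 if i in t else 0 for t in transactions] for i in items}
    scored = [(spearman(columns[a], columns[b]), a, b)
              for a, b in combinations(items, 2)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample data, invented for this illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(strongest_pairs(transactions, top_k=1))
```

In this sample, "bread" and "butter" occur in exactly the same transactions, so that pair ranks first. The all-pairs loop is quadratic in the number of items and is only meant to make the definition concrete.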
Keywords: correlation mining; correlation coefficient; strongly correlated item pairs; support; Spearman; rank order correlation; large transaction databases; bioinformatics; yeast genetic networks; gene expression data.
Date: 2013
Downloads: (external link)
http://www.inderscience.com/link.php?id=51920 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:5:y:2013:i:1:p:76-96
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.