Parallel extraction of association rules from genomics data
Giuseppe Agapito, Pietro Hiram Guzzi and Mario Cannataro
Applied Mathematics and Computation, 2019, vol. 350, issue C, 434-446
Abstract:
High-throughput experimental platforms such as microarrays produce massive amounts of omics data for each analyzed sample. For example, the Affymetrix DMET (Drug Metabolizing Enzymes and Transporters) microarray platform can detect Single Nucleotide Polymorphisms (SNPs) in 225 human genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs, enabling large pharmacogenomics studies. Applying such platforms to large populations of subjects further increases the size of the experimental datasets produced in clinical studies. The production of big omics datasets is therefore a primary reason to use parallel computing in bioinformatics. Such datasets are usually analyzed with classical statistical methods and, more recently, with data mining methods that can extract knowledge hidden in the data, e.g. by highlighting multiple associations among its features. However, applying standard off-the-shelf data mining algorithms to large omics datasets, especially for association rule mining, poses two main issues: (i) very large main-memory requirements, which may prevent data mining software from running on personal/desktop computers; and (ii) very long response times, which may increase the time required to complete extensive pharmacogenomics studies. To overcome the limits of standard association rule mining algorithms when applied to omics datasets, we propose PARES (Parallel Association Rules Extractor from SNPs), a novel parallel algorithm for the efficient extraction of association rules from omics datasets. PARES is implemented as a multi-threaded version of an optimized Frequent Pattern Growth (FP-Growth) algorithm. It also includes a customized preprocessing strategy for SNP datasets, based on a Fisher’s Test Filter, that discards trivial transactions from the input dataset and so reduces the search space from which many independent FP-Trees are built. The experimental results show that PARES achieves good speedup and high memory-management efficiency compared with several association rule mining algorithms implemented in the main off-the-shelf data mining platforms.
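The two ideas named in the abstract, a Fisher's test filter followed by independent per-partition work in several threads, can be illustrated with a minimal Python sketch. This is not the PARES implementation, and every name in it is hypothetical. It assumes each SNP is summarized by a 2x2 case/control table (a, b, c, d) and that a SNP is kept when its two-sided Fisher exact p-value falls below a chosen threshold; the abstract does not state the actual filtering criterion. Simple pair counting per partition stands in for building one FP-Tree per partition.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def filter_snps(tables, alpha=0.05):
    """Keep SNP ids whose table is significant at level alpha (assumed criterion)."""
    return {snp for snp, t in tables.items() if fisher_exact_two_sided(*t) < alpha}


def count_pairs(transactions):
    """Count co-occurring item pairs in one partition (stand-in for one FP-Tree)."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return counts


def parallel_pair_counts(transactions, n_workers=2):
    """Split the transactions into partitions, count each in a thread, then merge."""
    chunks = [transactions[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_pairs, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical tables: (a, b, c, d) counts for two made-up SNP ids.
    tables = {"snp_A": (3, 0, 0, 3), "snp_B": (1, 1, 1, 1)}
    print(filter_snps(tables, alpha=0.2))
    # Hypothetical transactions: sets of items observed together in one subject.
    transactions = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}]
    print(parallel_pair_counts(transactions))
```

In CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so this sketch shows only the partition-and-merge structure and not a real speedup; a process pool or a language with native threads would be needed for that.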
Keywords: Frequent Pattern Growth algorithm; Association rules mining; Data mining; Genomics; Parallel computing
Date: 2019
Full text (ScienceDirect subscribers only): http://www.sciencedirect.com/science/article/pii/S0096300317306471
Persistent link: https://EconPapers.repec.org/RePEc:eee:apmaco:v:350:y:2019:i:c:p:434-446
DOI: 10.1016/j.amc.2017.09.026
Applied Mathematics and Computation is currently edited by Theodore Simos