H-CLAP: hierarchical clustering within a linear array with an application in genetics

Samiran Ghosh; Jeffrey P. Townsend

Additional contact information
Samiran Ghosh: Department of Family Medicine and Public Health Sciences and Center of Molecular Medicine and Genetics, Wayne State University School of Medicine, 3127 Scott Hall, 540 East Canfield, Detroit, MI, USA
Jeffrey P. Townsend: Department of Biostatistics and Program in Computational Biology and Bioinformatics, Yale University, 135 College Street, Suite 200, New Haven, CT 06510, USA

Statistical Applications in Genetics and Molecular Biology, 2015, vol. 14, issue 2, 125-141

Abstract: In most cases where clustering of data is desirable, the underlying data distribution to be clustered is unconstrained. However, clustering of site types in a discretely structured linear array, as is often desired in studies of linear sequences such as DNA, RNA or proteins, represents a problem where data points are not necessarily exchangeable and are directionally constrained within the array. Each position in the linear array is fixed, and could be either “marked” (i.e., of interest, such as polymorphic or substituted sites) or “non-marked.” Here we describe a method for clustering those marked sites. Since the cluster-generating process is constrained by discrete locality inside such an array, traditional clustering methods need adjustment to be appropriate. We develop a hierarchical Bayesian approach and adopt a Markov clustering algorithm that reveals any natural partitioning in the pattern of marked sites. The resulting recursive partitioning and clustering algorithm is named hierarchical clustering in a linear array (H-CLAP). It employs domain-specific directional constraints directly in the likelihood construction. Our method, being fully Bayesian, is more flexible in cluster discovery than a standard agglomerative hierarchical clustering algorithm. It provides not only a hierarchical clustering but also cluster boundaries, which may have their own biological significance. We have tested the efficacy of our method on two biological data sets and several simulated ones.
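
The problem setup in the abstract can be made concrete with a minimal sketch. It is not the H-CLAP method, whose Bayesian model is not specified in this record. It illustrates only the kind of baseline the abstract compares against: an agglomerative clustering of marked sites in which only neighbouring clusters along the array may merge. The function names, the 0/1 encoding of the array, and the gap-based (single-linkage) merge rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def marked_positions(array: Sequence[int]) -> List[int]:
    """Return the indices of marked sites (assumed encoding: 1 = marked, 0 = non-marked)."""
    return [i for i, flag in enumerate(array) if flag]


def adjacent_merge_clustering(positions: Sequence[int], n_clusters: int) -> List[List[int]]:
    """Contiguity-constrained agglomerative clustering (illustrative baseline only).

    Start with one cluster per marked site and repeatedly merge the two
    neighbouring clusters separated by the smallest gap, until n_clusters remain.
    Only clusters adjacent along the array can merge, so every cluster is a
    contiguous run of marked sites.
    """
    clusters = [[p] for p in sorted(positions)]
    while len(clusters) > max(n_clusters, 1):
        # Gap between the last site of each cluster and the first site of the next.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))  # ties are broken by taking the leftmost gap
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def cluster_boundaries(clusters: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return the (first, last) marked position of each cluster."""
    return [(c[0], c[-1]) for c in clusters]


if __name__ == "__main__":
    # Hypothetical toy array, not data from the article.
    array = [0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    clusters = adjacent_merge_clustering(marked_positions(array), n_clusters=3)
    print(clusters)                      # [[1, 2], [7, 9, 10], [16, 17]]
    print(cluster_boundaries(clusters))  # [(1, 2), (7, 10), (16, 17)]
```

A deterministic rule like this returns a single partition for a chosen number of clusters, whereas the abstract describes a fully Bayesian treatment with the directional constraints built into the likelihood.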

Keywords: constrained prior; genetics; hierarchical clustering; linear array; Markov clustering
Date: 2015

Downloads: (external link)
https://doi.org/10.1515/sagmb-2013-0076 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:14:y:2015:i:2:p:125-141:n:2

Ordering information: This journal article can be ordered from
https://www.degruyte ... urnal/key/sagmb/html

DOI: 10.1515/sagmb-2013-0076

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.