Quantification and Visualization of LD Patterns and Identification of Haplotype Blocks
Yan Wang and Sandrine Dudoit
Additional contact information
Yan Wang: Division of Biostatistics, School of Public Health, University of California, Berkeley
Sandrine Dudoit: Division of Biostatistics, School of Public Health, University of California, Berkeley
No 1150, U.C. Berkeley Division of Biostatistics Working Paper Series from Berkeley Electronic Press
Abstract:
Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution of alleles at these loci, present noisy patterns. In this paper, we propose a new distance-based LD measure, R, which takes into account multilocus haplotypes around the two loci in order to exploit information from neighboring loci. The LD measure R yields a matrix of pairwise distances between markers, based on the correlation between the lengths of shared haplotypes among chromosomes around these markers. Data analysis demonstrates that visualization of LD patterns through the R matrix reveals more deterministic patterns, with much less noise, than using classical LD measures. Moreover, the patterns are highly compatible with recently suggested models of haplotype block structure. We propose to apply the new LD measure to define haplotype blocks through cluster analysis. Specifically, we present a distance-based clustering algorithm, DHPBlocker, which performs hierarchical partitioning of an ordered sequence of markers into disjoint and adjacent blocks with a hierarchical structure. The proposed method integrates information on the two main existing criteria for defining haplotype blocks, namely, LD and haplotype diversity, through the use of silhouette width and description length as cluster validity measures, respectively. The new LD measure and clustering procedure are applied to single nucleotide polymorphism (SNP) datasets from the human 5q31 region (Daly et al. 2001) and the class II region of the human major histocompatibility complex (Jeffreys et al. 2001). Our results are in good agreement with published results. In addition, analyses performed on different subsets of markers indicate that the method is robust with regard to the allele frequency and density of the genotyped markers.
Unlike previously proposed methods, our new cluster-based method can uncover hierarchical relationships among blocks and can be applied to polymorphic DNA markers or amino acid sequence data.
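For readers unfamiliar with the "classical" two-locus measures that the abstract contrasts with R, the following is a minimal illustrative sketch of one such measure, the squared allelic correlation r^2, computed from phased biallelic haplotypes. It is not the paper's R measure or the DHPBlocker algorithm. The function names, the encoding of haplotypes as strings of "0"/"1", and the toy data are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence


def r_squared(haplotypes: Sequence[str], i: int, j: int) -> float:
    """Classical pairwise LD (r^2) between markers i and j.

    Assumes each haplotype is a string of '0'/'1' alleles (hypothetical
    encoding). Uses only the joint allele distribution at the two loci.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # allele frequency at locus i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # allele frequency at locus j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                              # undefined for a monomorphic marker
    return d * d / denom


def r_squared_matrix(haplotypes: Sequence[str]) -> List[List[float]]:
    """Matrix of pairwise r^2 values over all markers."""
    m = len(haplotypes[0])
    return [[r_squared(haplotypes, i, j) for j in range(m)] for i in range(m)]


# Toy data (invented): markers 0 and 1 always co-occur; marker 2 is independent of marker 0.
toy_haplotypes = ["000", "001", "110", "111"]
ld_matrix = r_squared_matrix(toy_haplotypes)
```

Because each entry depends only on the two markers involved, a matrix like `ld_matrix` ignores the surrounding haplotype context, which is the limitation the abstract attributes to classical measures.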
Keywords: Block; cluster analysis; distance; genetic mapping; haplotype; hierarchical; imputation; linkage disequilibrium; minimum description length; partitioning; silhouette width; single nucleotide polymorphism
Date: 2004-07-11
Note: oai:bepress.com:ucbbiostat-1150
Downloads: http://www.bepress.com/cgi/viewcontent.cgi?article=1150&context=ucbbiostat (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:bep:ucbbio:1150