EconPapers    
Economics at your fingertips  
 

Interactive Exploration of Large Dendrograms with Prototypes

Andee Kaplan and Jacob Bien

The American Statistician, 2023, vol. 77, issue 2, 201-211

Abstract: Hierarchical clustering is one of the standard methods taught for identifying and exploring the underlying structures that may be present within a dataset. Students are shown examples in which the dendrogram, a visual representation of the hierarchical clustering, reveals a clear clustering structure. However, in practice, data analysts today frequently encounter datasets whose large scale undermines the usefulness of the dendrogram as a visualization tool. Densely packed branches obscure structure, and overlapping labels are impossible to read. In this article we present a new workflow for performing hierarchical clustering via the R package called protoshiny that aims to restore hierarchical clustering to its former role of being an effective and versatile visualization tool. Our proposal leverages interactivity combined with the ability to label internal nodes in a dendrogram with a representative data point (called a prototype). After presenting the workflow, we provide three case studies to demonstrate its utility.

Date: 2023
References: Add references at CitEc
Citations:

Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2022.2087734 (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:77:y:2023:i:2:p:201-211

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UTAS20

DOI: 10.1080/00031305.2022.2087734

Access Statistics for this article

The American Statistician is currently edited by Eric Sampson

More articles in The American Statistician from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().

 
Page updated 2025-03-20
Handle: RePEc:taf:amstat:v:77:y:2023:i:2:p:201-211