EconPapers    
Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cheng Li, Santu Rana, Dinh Phung and Svetha Venkatesh
Affiliations:
Cheng Li: Deakin University
Santu Rana: Deakin University
Dinh Phung: Deakin University
Svetha Venkatesh: Deakin University

Annals of Data Science, 2016, vol. 3, issue 2, No 5, 205-223

Abstract: The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically and therefore has an advantage in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. Pairwise constraints come in two types, must-link (ML) and cannot-link (CL), and indicate the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, we introduce the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive the Gibbs sampler of the C-DPM based on chunklets. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. Our SC-DPM shows an advantage in data clustering. In addition, the SC-DPM can potentially be used for short-text clustering.
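To make the chunklet idea concrete, here is a minimal sketch, not the authors' implementation. It rests on several assumptions. First, ML pairs are compiled into chunklets with a union-find structure, and CL pairs are then lifted to pairs of chunklets. Second, one Gibbs step reassigns a whole chunklet under a Chinese restaurant process (CRP) prior. Any cluster that holds a chunklet the current one cannot link with is excluded. The function names `build_chunklets`, `chunklet_cannot_links` and `sample_chunklet_label`, as well as the user-supplied `log_lik` callback, are hypothetical. The paper's exact likelihood and sampler derivation are not reproduced here.

```python
import math
import random
from collections import defaultdict


def build_chunklets(n, must_links):
    """Compile must-link (ML) pairs into chunklets with union-find.

    Points 0..n-1 joined by a chain of ML constraints share one chunklet;
    unconstrained points become singleton chunklets."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def chunklet_cannot_links(chunklets, cannot_links):
    """Lift point-level cannot-link (CL) pairs to pairs of chunklet indices."""
    owner = {p: c for c, members in enumerate(chunklets) for p in members}
    pairs = set()
    for a, b in cannot_links:
        ca, cb = owner[a], owner[b]
        if ca == cb:
            raise ValueError(f"CL pair ({a}, {b}) lies inside one ML chunklet")
        pairs.add((min(ca, cb), max(ca, cb)))
    return pairs


def sample_chunklet_label(c, z, chunklets, cl_pairs, alpha, log_lik, rng):
    """Resample the cluster label of chunklet c, holding all others fixed.

    z[j] is the integer cluster label of chunklet j. log_lik(points, cluster_points)
    is a hypothetical user-supplied predictive log-likelihood of `points` given the
    points already in a cluster (an empty list means a fresh cluster)."""
    m = len(chunklets[c])
    clusters = defaultdict(list)
    for j, k in enumerate(z):
        if j != c:
            clusters[k].append(j)
    # Clusters holding a chunklet that c cannot link with are excluded outright.
    forbidden = {z[j] for j in range(len(z))
                 if j != c and (min(c, j), max(c, j)) in cl_pairs}

    labels, log_w = [], []
    for k, members in clusters.items():
        if k in forbidden:
            continue
        pts = [p for j in members for p in chunklets[j]]
        n = len(pts)
        # CRP weight for seating all m points at a table of size n:
        # n (n+1) ... (n+m-1) = Gamma(n+m) / Gamma(n)
        labels.append(k)
        log_w.append(math.lgamma(n + m) - math.lgamma(n) + log_lik(chunklets[c], pts))

    # New table: alpha * 1 * 2 * ... * (m-1) = alpha * Gamma(m)
    labels.append(max(z) + 1)
    log_w.append(math.log(alpha) + math.lgamma(m) + log_lik(chunklets[c], []))

    top = max(log_w)  # subtract the max before exponentiating, for stability
    weights = [math.exp(w - top) for w in log_w]
    return rng.choices(labels, weights=weights)[0]


chunklets = build_chunklets(5, must_links=[(0, 1), (1, 2)])  # [[0, 1, 2], [3], [4]]
cl = chunklet_cannot_links(chunklets, cannot_links=[(2, 3)])  # {(0, 1)}
flat = lambda pts, cluster_pts: 0.0  # prior-only stand-in for a real likelihood
label = sample_chunklet_label(1, [0, 1, 0], chunklets, cl, 1.0, flat, random.Random(0))
# label is always 2 here: cluster 0 is ruled out by the CL pair, leaving only a new cluster
```

Treating each chunklet as one unit guarantees that ML constraints can never be violated during sampling. The CL exclusion, meanwhile, keeps every sampled partition consistent with the CL constraints. The CRP terms above are simply the prior probability of seating all points of a chunklet together. A real model would replace `flat` with a conjugate predictive likelihood.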

Keywords: Bayesian nonparametric; Dirichlet process; Mixture models; Pairwise constraints; Constrained clustering; Short-text clustering
Date: 2016

Downloads: http://link.springer.com/10.1007/s40745-016-0082-z (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:3:y:2016:i:2:d:10.1007_s40745-016-0082-z

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-016-0082-z

Annals of Data Science is currently edited by Yong Shi

More articles in Annals of Data Science from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Handle: RePEc:spr:aodasc:v:3:y:2016:i:2:d:10.1007_s40745-016-0082-z