Association pattern discovery via theme dictionary models
Ke Deng,
Zhi Geng and
Jun S. Liu
Journal of the Royal Statistical Society Series B, 2014, vol. 76, issue 2, 319-347
Abstract:
type="main" xml:id="rssb12032-abs-0001">
Discovering patterns from a set of text or, more generally, categorical data is an important problem in many disciplines such as biomedical research, linguistics, artificial intelligence and sociology. We consider here the well-known ‘market basket’ problem that is often discussed in the data mining community, and is also quite ubiquitous in biomedical research. The data under consideration are a set of ‘baskets’, where each basket contains a list of ‘items’. Our goal is to discover ‘themes’, which are defined as subsets of items that tend to co-occur in a basket. We describe a generative model, i.e. the theme dictionary model, for such data structures and describe two likelihood-based methods to infer themes that are hidden in a collection of baskets. We also propose a novel sequential Monte Carlo method to overcome computational challenges. Using both simulation studies and real applications, we demonstrate that the new approach proposed is significantly more powerful than existing methods, such as association rule mining and topic modelling, in detecting weak and subtle interactions in the data.
Date: 2014
References: Add references at CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://hdl.handle.net/10.1111/rssb.2014.76.issue-2 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssb:v:76:y:2014:i:2:p:319-347
Ordering information: This journal article can be ordered from
http://ordering.onli ... 1111/(ISSN)1467-9868
Access Statistics for this article
Journal of the Royal Statistical Society Series B is currently edited by P. Fryzlewicz and I. Van Keilegom
More articles in Journal of the Royal Statistical Society Series B from Royal Statistical Society Contact information at EDIRC.
Bibliographic data for series maintained by Wiley Content Delivery ().