ClickClust: An R Package for Model-Based Clustering of Categorical Sequences
Volodymyr Melnykov
Journal of Statistical Software, 2016, vol. 074, issue i09
Abstract:
The R package ClickClust is a new piece of software devoted to finite mixture modeling and model-based clustering of categorical sequences. Categorical sequences, also known as categorical time series, are a special kind of time series; they exhibit a time-dependent nature and are traditionally modeled by means of Markov chains. Clustering categorical sequences is an important problem with multiple applications. One of the best known is grouping sequences of visited sites or web pages, also called clickstreams, which helps discover common navigation patterns and routes taken by users. This popular application is recognized in the package title ClickClust. The paper discusses the methodological and algorithmic foundations of the package, which are based on finite mixtures of Markov models. The number of Markov chain states can often be large, leading to high-dimensional transition probability matrices, and the resulting high number of model parameters can severely affect clustering performance. As a remedy, backward and forward selection algorithms are proposed for grouping states, which extends the original clustering problem to a biclustering framework. Other capabilities of ClickClust include estimation of the variance-covariance matrix of the model parameter estimates, prediction of future states visited, and construction of a display named click-plot that helps illustrate the obtained clustering solutions. All available functions and the utility of the package are thoroughly discussed and illustrated with multiple examples.
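As a rough illustration of the finite mixture of Markov chains mentioned in the abstract, the Python sketch below computes the posterior probability that one sequence belongs to each mixture component. It is not ClickClust code and does not use the package's API: the function names, the two-component setup, and all parameter values are hypothetical, and the parameters are taken as given rather than estimated (the package fits them from data). States are assumed to be coded as integers starting at 0.

```python
import math

def transition_counts(seq, n_states):
    """Count transitions i -> j in one sequence of integer-coded states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def log_lik_chain(counts, init_state, alpha, gamma):
    """Log-likelihood of one sequence under a single first-order Markov chain.

    alpha: initial state probabilities; gamma: transition probability matrix.
    Assumes every transition observed in the sequence has nonzero probability.
    """
    ll = math.log(alpha[init_state])
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:
                ll += n * math.log(gamma[i][j])
    return ll

def posterior_memberships(seq, n_states, weights, alphas, gammas):
    """Posterior probability of each mixture component given one sequence."""
    counts = transition_counts(seq, n_states)
    logs = [
        math.log(w) + log_lik_chain(counts, seq[0], a, g)
        for w, a, g in zip(weights, alphas, gammas)
    ]
    m = max(logs)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-state, two-component example:
# component 0 tends to stay in the current state, component 1 tends to switch.
weights = [0.5, 0.5]
alphas = [[0.5, 0.5], [0.5, 0.5]]
gammas = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.1, 0.9], [0.9, 0.1]],
]
sequence = [0, 0, 0, 0, 1, 1, 1]
print(posterior_memberships(sequence, 2, weights, alphas, gammas))
```

The sequence above changes state only once, so nearly all posterior mass falls on the "staying" component. Working from transition counts rather than the raw sequence also shows why a large number of states is costly: each component carries its own full transition matrix.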
Date: 2016-10-23
Downloads: (external link)
https://www.jstatsoft.org/index.php/jss/article/view/v074i09/v74i09.pdf
https://www.jstatsoft.org/index.php/jss/article/do ... ckClust_1.1.5.tar.gz
https://www.jstatsoft.org/index.php/jss/article/do ... ile/v074i09/v74i09.R
Persistent link: https://EconPapers.repec.org/RePEc:jss:jstsof:v:074:i09
DOI: 10.18637/jss.v074.i09
Journal of Statistical Software is currently edited by Bettina Grün, Edzer Pebesma and Achim Zeileis
More articles in Journal of Statistical Software from Foundation for Open Access Statistics
Bibliographic data for series maintained by Christopher F. Baum.