COWORDS: a probabilistic model for multiple word clouds
Luís G. Silva e Silva and Renato M. Assunção
Journal of Applied Statistics, 2018, vol. 45, issue 15, 2697-2717
Abstract:
Word clouds constitute one of the most popular statistical tools for the visual analysis of text documents because they provide users with a quick and intuitive understanding of the content. Despite their popularity for visualizing single documents, word clouds are poorly suited to comparing different text documents. Independently generating a word cloud for each document leads to configurations where the same word is typically located in widely different positions, which makes it very difficult to compare two or more word clouds. This paper introduces COWORDS, a new stochastic algorithm that creates multiple word clouds, one for each document. Words shared by multiple documents are placed in the same position in all clouds. Similar documents produce similar and compact clouds, making it easier to simultaneously compare and interpret several word clouds. The algorithm is based on a probability distribution in which the most probable configurations are those with a desirable visual aspect, such as a low value for the total distance between the words in all clouds. The algorithm output is a set of word clouds randomly drawn from this probability distribution, using a Markov chain Monte Carlo simulation method. We present several examples that illustrate the performance and visual results that can be obtained by our algorithm.
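The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact probability distribution, proposal moves, or treatment of word sizes, so everything below is an assumption made for illustration. Words occupy cells of a small square grid, each word has one position shared by every cloud it appears in, the "energy" is the total pairwise Euclidean distance between words summed over all clouds, and a Metropolis sampler targets a distribution proportional to exp(-beta * energy). The names `energy`, `sample_layout`, `grid`, and `beta` are hypothetical.

```python
import math
import random
from itertools import combinations


def energy(positions, documents):
    """Total pairwise distance between words, summed over all clouds."""
    total = 0.0
    for words in documents:
        for a, b in combinations(sorted(words), 2):
            total += math.dist(positions[a], positions[b])
    return total


def sample_layout(documents, grid=6, beta=2.0, steps=5000, seed=0):
    """Metropolis sampler over shared word positions (illustrative only).

    Assumes the vocabulary has at most grid * grid words.
    Returns (positions, initial_energy, final_energy).
    """
    rng = random.Random(seed)
    vocab = sorted(set().union(*documents))
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(cells)
    # One position per word, shared by every cloud containing that word.
    positions = dict(zip(vocab, cells))
    owner = {cell: word for word, cell in positions.items()}
    current = energy(positions, documents)
    initial = current

    for _ in range(steps):
        word = rng.choice(vocab)
        target = (rng.randrange(grid), rng.randrange(grid))
        old = positions[word]
        if target == old:
            continue
        other = owner.get(target)  # word already in the target cell, if any

        # Symmetric proposal: move to an empty cell, or swap with its occupant.
        positions[word] = target
        if other is not None:
            positions[other] = old
        proposed = energy(positions, documents)
        delta = proposed - current

        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            # Accept: update the cell-to-word bookkeeping.
            owner[target] = word
            if other is not None:
                owner[old] = other
            else:
                del owner[old]
            current = proposed
        else:
            # Reject: restore the previous positions.
            positions[word] = old
            if other is not None:
                positions[other] = target

    return positions, initial, current
```

Given documents as sets of words, for example `[{"model", "cloud", "word"}, {"model", "cloud", "text"}]`, each cloud is drawn by plotting only that document's words at their shared positions. Larger values of `beta` concentrate the samples on more compact layouts, while smaller values let the chain explore more freely.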
Date: 2018
Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2018.1435633 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:45:y:2018:i:15:p:2697-2717
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20
DOI: 10.1080/02664763.2018.1435633
Journal of Applied Statistics is currently edited by Robert Aykroyd
More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.