COWORDS: a probabilistic model for multiple word clouds

Luís G. Silva e Silva and Renato M. Assunção

Journal of Applied Statistics, 2018, vol. 45, issue 15, 2697-2717

Abstract: Word clouds constitute one of the most popular statistical tools for the visual analysis of text documents because they provide users with a quick and intuitive understanding of the content. Despite their popularity for visualizing single documents, word clouds are not appropriate to compare different text documents. Independently generating word clouds for each document leads to configurations where the same word is typically located in widely different positions. This makes it very difficult to compare two or more word clouds. This paper introduces COWORDS, a new stochastic algorithm to create multiple word clouds, including one for each document. The shared words in multiple documents are placed in the same position in all clouds. Similar documents produce similar and compact clouds, making it easier to simultaneously compare and interpret several word clouds. The algorithm is based on a probability distribution in which the most probable configurations are those with a desirable visual aspect, such as a low value for the total distance between the words in all clouds. The algorithm output is a set of word clouds that are randomly selected from this probability distribution. The selection procedure uses a Markov chain Monte Carlo simulation method. We present several examples that illustrate the performance and visual results that can be obtained by our algorithm.
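
The idea in the abstract (a probability distribution that favours compact layouts, sampled by Markov chain Monte Carlo, with shared words fixed at one position across all clouds) can be illustrated with a toy Metropolis sampler. This is a minimal sketch under stated assumptions, not the authors' COWORDS algorithm: the square grid of cells, the Manhattan distance, the inverse-temperature parameter `beta`, and the function names `energy` and `sample_layout` are all hypothetical choices made for illustration.

```python
import math
import random
from itertools import combinations


def energy(pos, docs):
    """Total pairwise Manhattan distance between words, summed over all clouds."""
    total = 0
    for words in docs:
        for a, b in combinations(sorted(words), 2):
            (xa, ya), (xb, yb) = pos[a], pos[b]
            total += abs(xa - xb) + abs(ya - yb)
    return total


def sample_layout(docs, grid=6, beta=1.0, steps=5000, seed=0):
    """Metropolis sampler targeting p(layout) proportional to exp(-beta * energy).

    Each word gets ONE grid cell that is reused in every cloud containing it,
    so shared words sit at the same position in all clouds (assumed setup).
    """
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    if len(vocab) > len(cells):
        raise ValueError("grid too small for the vocabulary")

    # Random initial layout: one distinct cell per word.
    start = cells[:]
    rng.shuffle(start)
    pos = dict(zip(vocab, start))
    occupant = {cell: None for cell in cells}
    for word, cell in pos.items():
        occupant[cell] = word

    e = energy(pos, docs)
    for _ in range(steps):
        # Symmetric proposal: move a random word to a random cell,
        # swapping with whatever word (if any) already sits there.
        w = rng.choice(vocab)
        target = rng.choice(cells)
        source = pos[w]
        if target == source:
            continue
        other = occupant[target]
        pos[w], occupant[target], occupant[source] = target, w, other
        if other is not None:
            pos[other] = source

        new_e = energy(pos, docs)
        # Metropolis rule: always accept improvements, otherwise accept
        # with probability exp(-beta * increase in energy).
        if new_e <= e or rng.random() < math.exp(-beta * (new_e - e)):
            e = new_e
        else:
            # Reject: restore the previous layout.
            pos[w], occupant[source], occupant[target] = source, w, other
            if other is not None:
                pos[other] = target
    return pos, e


if __name__ == "__main__":
    docs = [{"model", "cloud", "word"}, {"model", "chain", "word"}]
    layout, total = sample_layout(docs, grid=4, steps=2000, seed=1)
    for word, cell in sorted(layout.items()):
        print(word, cell)
    print("total distance:", total)
```

Because the proposal is symmetric, the acceptance rule needs only the change in energy. Larger values of `beta` concentrate the samples on more compact layouts, while smaller values let the chain explore more freely. A realistic implementation would also have to account for word sizes and overlap, which this sketch sidesteps by giving every word a single grid cell.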

Date: 2018

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2018.1435633 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:45:y:2018:i:15:p:2697-2717

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2018.1435633

Journal of Applied Statistics is currently edited by Robert Aykroyd

More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:45:y:2018:i:15:p:2697-2717