Dimensionality reduction framework for blog mining and visualisation
Flora S. Tsai
International Journal of Data Mining, Modelling and Management, 2012, vol. 4, issue 3, 267-285
Abstract:
The growing abundance of blogs and new forms of social media has created a critical need for new technologies to transform the digital realm of social media into a manageable form. Blog mining addresses the domain-specific problem of mining information from blog data. Although blogs share many similarities with web and text documents, existing data mining techniques need to be reevaluated and adapted for the multidimensional representation of blog data, which exhibit dimensions not present in traditional documents. In this paper, a new approach is presented for blog mining and visualisation based on dimensionality reduction techniques. The author-topic model based on latent Dirichlet allocation is extended for analysing and visualising blog authors, links, and time. A framework based on dimensionality reduction is proposed to visualise the blog dimensions of content, tags, authors, links, and time. This framework has been designed, implemented, and evaluated on real-world blog data.
Keywords: blog mining; dimensionality reduction; visualisation; multidimensional scaling; MDS; isometric feature mapping; Isomap; locally linear embedding; LLE; latent Dirichlet allocation; LDA; blogs; blogging; weblogs; data mining; blog data
Date: 2012
Downloads: (external link)
http://www.inderscience.com/link.php?id=48108 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:4:y:2012:i:3:p:267-285
Publisher: Inderscience Enterprises Ltd