Sequence clustering approach for clustering web user session
Pradeep Kumar
International Journal of Business Information Systems, 2018, vol. 28, issue 1, 67-78
Abstract:
Clustering web usage data is useful for discovering interesting patterns in user traversals, behaviour and usage characteristics. It is also useful for trend discovery and for building personalisation and recommendation engines. Since the web is dynamic, clustering web user transactions yields clusters of arbitrary shape. Moreover, users access web pages in the order that interests them, so incorporating the sequential nature of their usage is crucial for clustering web transactions. In this paper, we present an approach that clusters web usage sequence data and removes noise using the DBSCAN algorithm. We also study the impact on the clustering process when both sequence and content information are incorporated into the similarity measure. We use the sequence and set similarity (S3M) measure to capture both the order of occurrence of page visits and the page information itself, and compare the results with the Euclidean distance and Jaccard similarity measures. The inter-cluster and intra-cluster distances are computed using average Levenshtein distance (ALD) to demonstrate the usefulness of the proposed approach in the context of web usage mining.
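The measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: S3M is taken to be a weighted sum of an order-sensitive term (longest common subsequence length over the longer session length) and the Jaccard set similarity, with a hypothetical weight `p`; ALD is taken to be the mean Levenshtein distance over all cross pairs of sessions from two clusters. The function names and the sample sessions are likewise hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Set similarity: shared pages over all distinct pages (order ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def lcs_length(a, b):
    """Length of the longest common subsequence (order-preserving overlap)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def s3m(a, b, p=0.5):
    """ASSUMED form of S3M: p * LCS/max(len) + (1 - p) * Jaccard."""
    if not a and not b:
        return 1.0
    seq_sim = lcs_length(a, b) / max(len(a), len(b))
    return p * seq_sim + (1 - p) * jaccard(a, b)


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def average_levenshtein(cluster_a, cluster_b):
    """ASSUMED form of ALD: mean edit distance over all cross pairs."""
    pairs = [(s, t) for s in cluster_a for t in cluster_b]
    return sum(levenshtein(s, t) for s, t in pairs) / len(pairs)


# Two hypothetical sessions that visit the same pages in reverse order.
s1 = ["home", "news", "sports"]
s2 = ["sports", "news", "home"]
print(jaccard(s1, s2))       # 1.0: identical page sets, order ignored
print(s3m(s1, s2))           # about 0.667: the order mismatch lowers the score
print(levenshtein(s1, s2))   # 2: two substitutions
```

In this example the set-only measure treats the two sessions as identical, whereas the order-aware term separates them, which is the motivation the abstract gives for combining sequence and content information. A similarity of this kind could be converted to a distance (for instance `1 - s3m(a, b)`) and supplied to a density-based clusterer as a precomputed matrix; that wiring is again an assumption rather than a detail stated in the abstract.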
Keywords: sequence clustering; web usage data; similarity measures; average Levenshtein distance; ALD.
Date: 2018
Downloads: (external link)
http://www.inderscience.com/link.php?id=91163 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijbisy:v:28:y:2018:i:1:p:67-78
More articles in International Journal of Business Information Systems from Inderscience Enterprises Ltd