Data pre-processing for web log mining: Case study of commercial bank website usage analysis
Jozef Kapusta,
Anna Pilková,
Michal Munk and
Peter Švec
Additional contact information
Jozef Kapusta: Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia
Anna Pilková: Department of Strategy and Entrepreneurship, Comenius University in Bratislava, Šafárikovo nám. 6, 818 06 Bratislava, Slovakia
Michal Munk: Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia
Peter Švec: Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia
Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 2013, vol. 61, issue 4, 973-979
Abstract:
We use data cleaning, integration, reduction and data conversion methods in the pre-processing stage of data analysis. These techniques improve the overall quality of the patterns mined. The paper describes the use of standard pre-processing methods to prepare data from a commercial bank website, in the form of a log file obtained from the web server. Data cleaning, although the simplest step of data pre-processing, is non-trivial here because the analysed content is highly specific. We had to deal with frequent changes of the content and even frequent changes of the structure of the website. Regular changes in the structure make the use of a sitemap impossible. We present approaches for dealing with this problem: we were able to create the sitemap dynamically, based solely on the content of the log file. In this case study, we also examined only one part of the website rather than performing the standard analysis of the entire website, as we did not have access to all log files for security reasons. As a result, the traditional practices had to be adapted for this special case. Analysing only a small fraction of the website resulted in short session times for regular visitors. We were not able to use the recommended methods to determine the optimal value of session time. Therefore, in this paper we propose new methods based on outlier identification to raise the accuracy of the session length.
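To make the idea of an outlier-based session time concrete, the following is a minimal sketch and not the authors' published procedure. The abstract does not say which outlier rule was used, so the upper Tukey fence (Q3 + 1.5 × IQR) is an assumption made here for illustration, as are the record layout (visitor identifier plus a timestamp in seconds) and all function names.

```python
from collections import defaultdict
from statistics import quantiles


def group_times(records):
    """Group request timestamps (seconds) by visitor and sort each group."""
    by_visitor = defaultdict(list)
    for visitor, t in records:
        by_visitor[visitor].append(t)
    for times in by_visitor.values():
        times.sort()
    return by_visitor


def inter_request_gaps(records):
    """Collect the gaps between consecutive requests of every visitor."""
    gaps = []
    for times in group_times(records).values():
        gaps.extend(b - a for a, b in zip(times, times[1:]))
    return gaps


def outlier_threshold(gaps, k=1.5):
    """Upper Tukey fence Q3 + k * IQR (an assumed outlier rule)."""
    q1, _, q3 = quantiles(gaps, n=4)
    return q3 + k * (q3 - q1)


def split_sessions(records, timeout):
    """Start a new session whenever a gap exceeds the timeout."""
    sessions = []
    for visitor, times in group_times(records).items():
        current = [times[0]]
        for t in times[1:]:
            if t - current[-1] > timeout:
                sessions.append((visitor, current))
                current = [t]
            else:
                current.append(t)
        sessions.append((visitor, current))
    return sessions


# Hypothetical cleaned log records: (visitor id, request time in seconds).
records = [
    ("a", 0), ("a", 10), ("a", 25), ("a", 40), ("a", 4000), ("a", 4012),
    ("b", 5), ("b", 20), ("b", 38), ("b", 50),
]

timeout = outlier_threshold(inter_request_gaps(records))
sessions = split_sessions(records, timeout)
```

In this toy data, the long pause in visitor "a"'s requests is flagged as an outlier gap and therefore treated as a session boundary, while the short gaps stay within one session. On a real log the fence multiplier `k` would need to be tuned, and a different outlier criterion may suit the data better.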
Keywords: association rules; web log mining; business intelligence; financial regulation; market discipline; data preprocessing methodology
Date: 2013
Downloads: (external link)
http://acta.mendelu.cz/doi/10.11118/actaun201361040973.html (text/html)
http://acta.mendelu.cz/doi/10.11118/actaun201361040973.pdf (application/pdf)
free of charge
Persistent link: https://EconPapers.repec.org/RePEc:mup:actaun:actaun_2013061040973
DOI: 10.11118/actaun201361040973
Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis is currently edited by Markéta Havlásková
More articles in Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis from Mendel University Press
Bibliographic data for series maintained by Ivo Andrle.