Data pre-processing for web log mining: Case study of commercial bank website usage analysis

Jozef Kapusta, Anna Pilková, Michal Munk and Peter Švec
Additional contact information
Jozef Kapusta: Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia
Anna Pilková: Department of Strategy and Entrepreneurship, Comenius University in Bratislava, Šafárikovo nám. 6, 818 06 Bratislava, Slovakia
Michal Munk: Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia
Peter Švec: Department of Computer Science, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74 Nitra, Slovakia

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 2013, vol. 61, issue 4, 973-979

Abstract: Data cleaning, integration, reduction and conversion methods are used at the pre-processing stage of data analysis. These pre-processing techniques improve the overall quality of the patterns mined. The paper describes the use of standard pre-processing methods to prepare data from a commercial bank website, in the form of a log file obtained from the web server. Data cleaning, although usually the simplest step of data pre-processing, is non-trivial here because the analysed content is highly specific. We had to deal with frequent changes to the content and even frequent changes to the structure of the website. Regular changes in the structure make it impossible to use a sitemap. We present approaches for dealing with this problem; in particular, we were able to create the sitemap dynamically, based solely on the content of the log file. In this case study, we also examined only one part of the website rather than performing the standard analysis of the entire website, as we did not have access to all log files for security reasons. As a result, the traditional practices had to be adapted to this special case. Analysing only a small fraction of the website resulted in short session times for regular visitors, and we were not able to use the recommended methods to determine the optimal value of the session time. Therefore, in this paper we propose new methods based on outlier identification to improve the accuracy of the session length.
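The abstract does not state which outlier rule the authors apply, so the following Python sketch is only an illustration of the general idea: it derives a session timeout from the gaps between consecutive requests, treating unusually long gaps as outliers. The use of Tukey's upper fence (Q3 + 1.5 * IQR), the function names, and the sample timestamps are all assumptions made for this example and are not taken from the paper.

```python
from statistics import quantiles


def outlier_threshold(gaps, k=1.5):
    """Return the upper Tukey fence Q3 + k * IQR for a list of gaps (seconds).

    Assumption: the Tukey fence stands in for whatever outlier rule
    the paper actually uses; the abstract does not specify it.
    """
    q1, _, q3 = quantiles(gaps, n=4)
    return q3 + k * (q3 - q1)


def split_sessions(timestamps, threshold):
    """Split one visitor's sorted request times into sessions.

    A new session starts whenever the gap to the previous request
    exceeds the threshold.
    """
    sessions = [[timestamps[0]]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > threshold:
            sessions.append([cur])
        else:
            sessions[-1].append(cur)
    return sessions


# Hypothetical request times (seconds since first request) for one visitor.
times = [0, 10, 25, 40, 50, 65, 3000, 3010, 3020]
gaps = [b - a for a, b in zip(times, times[1:])]

threshold = outlier_threshold(gaps)
sessions = split_sessions(times, threshold)
print(threshold, len(sessions))
```

In this sketch the single long pause is flagged as an outlier and becomes the session boundary, so the timeout adapts to the short visits typical of a partial log rather than relying on a fixed value chosen in advance.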

Keywords: association rules; web log mining; business intelligence; financial regulation; market discipline; data preprocessing methodology
Date: 2013
Downloads: (external link)
http://acta.mendelu.cz/doi/10.11118/actaun201361040973.html (text/html)
http://acta.mendelu.cz/doi/10.11118/actaun201361040973.pdf (application/pdf)
free of charge

Persistent link: https://EconPapers.repec.org/RePEc:mup:actaun:actaun_2013061040973

DOI: 10.11118/actaun201361040973

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis is currently edited by Markéta Havlásková

More articles in Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis from Mendel University Press
Bibliographic data for series maintained by Ivo Andrle.

Handle: RePEc:mup:actaun:actaun_2013061040973