The Algorithm of Data Preprocessing in Web Log Mining Based on Cloud Computing

Zhang, Guanglu; Zhang, Mingxin

Guanglu Zhang: Northwest Normal University
Mingxin Zhang: Changshu Institute of Technology

A chapter in 2012 International Conference on Information Technology and Management Science (ICITMS 2012) Proceedings, 2013, pp 467-474, from Springer

Abstract: In a distributed cluster server architecture, Web log mining models based on a data warehouse suffer from network and computing bottlenecks and from transmission errors caused by large data transfers. This paper draws on the advantages of cloud computing, distributed processing, and virtualization technology to design a Web log analysis platform based on the Hadoop cluster framework, and proposes a new hybrid algorithm for distributed processing in the cloud computing environment. To further verify the efficiency of the platform, an improved data preprocessing algorithm is run on it against a large number of Web logs; experimental results show that it can improve the efficiency of Web data mining.

Keywords: Web log mining; Expected page; Website structure; Navigated path
Date: 2013

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-642-34910-2_54

Ordering information: This item can be ordered from
http://www.springer.com/9783642349102

DOI: 10.1007/978-3-642-34910-2_54

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.