WeDoCWT: A New Method for Web Document Clustering Using Discrete Wavelet Transforms

Hanan Al-Mofareji, Mahmoud Kamel and Mohamed Y. Dahab
Additional contact information
Hanan Al-Mofareji: Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 23218, Saudi Arabia
Mahmoud Kamel: Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 23218, Saudi Arabia
Mohamed Y. Dahab: Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 23218, Saudi Arabia

Journal of Information & Knowledge Management (JIKM), 2017, vol. 16, issue 01, 1-19

Abstract: Organizing web information is important for finding information easily and efficiently. We present a new method for web document clustering called WeDoCWT, which exploits the discrete wavelet transform and term signals to improve the document representation. We studied different methods of document segmentation for constructing the term signals. We used two datasets, UW-CAN and WebKB, to evaluate the proposed method. The experimental results indicate that dividing documents into fixed segments is preferable to dividing them into logical segments based on HTML features, because web pages do not share the same structure. The mean TF–IDF reduction technique gives the best results in most cases. WeDoCWT achieves a better F-measure than most of the previous approaches described in the literature. We used the Munkres assignment algorithm to assign each produced cluster to an original class in order to evaluate the clustering results.
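
To make the term-signal idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, eight fixed-width segments, and the Haar wavelet; the abstract specifies none of these, and the function names `term_signal` and `haar_dwt` are hypothetical.

```python
import math
from typing import List


def term_signal(tokens: List[str], term: str, n_segments: int = 8) -> List[int]:
    """Count occurrences of `term` in each of `n_segments` fixed-width segments.

    The segment count is an illustrative assumption, not a value from the paper.
    """
    signal = [0] * n_segments
    total = len(tokens)
    for pos, tok in enumerate(tokens):
        if tok == term:
            # Map the token position to a segment index in [0, n_segments - 1].
            seg = min(pos * n_segments // total, n_segments - 1)
            signal[seg] += 1
    return signal


def haar_dwt(signal: List[float]) -> List[float]:
    """Full orthonormal Haar decomposition of a signal whose length is a power of two.

    Returns [approximation, coarsest details, ..., finest details].
    The Haar wavelet is assumed here only because it is the simplest choice.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    sqrt2 = math.sqrt(2.0)
    approx = [float(x) for x in signal]
    details: List[float] = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        level_detail = [(a - b) / sqrt2 for a, b in pairs]
        approx = [(a + b) / sqrt2 for a, b in pairs]
        # Coarser levels are placed in front of finer ones.
        details = level_detail + details
    return approx + details
```

For example, `haar_dwt(term_signal("web page clustering web web data mining web".split(), "web"))` turns the positional counts of "web" into coefficients that describe where in the document the term occurs at several resolutions, which is the kind of information a plain term-frequency vector discards.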

Keywords: Term signal; document segmentation; web document representation; web document clustering; wavelet transform
Date: 2017

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649217500046
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:16:y:2017:i:01:n:s0219649217500046

DOI: 10.1142/S0219649217500046

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:16:y:2017:i:01:n:s0219649217500046