
Multi-Label Genre Classification of Web Pages Using an Adaptive Centroid-Based Classifier

Chaker Jebari
Chaker Jebari: IT Department, College of Applied Sciences, IBRI, BOX 516, Sultanate of Oman

Journal of Information & Knowledge Management (JIKM), 2016, vol. 15, issue 01, 1-21

Abstract: This paper proposes an adaptive centroid-based classifier (ACC) for multi-label genre classification of web pages. Using a multi-genre training dataset, ACC constructs a centroid for each genre. To deal with the rapid evolution of web genres, ACC implements an adaptive classification method in which web pages are classified one by one. For each web page, ACC calculates its similarity to every genre centroid. Based on this similarity, ACC either adjusts the genre centroid by including the new web page or discards the page. A web page is a complex object that can contain different sections belonging to different genres. To handle this complexity, ACC implements multi-label classification, in which a web page can be assigned to multiple genres at the same time. To improve the performance of genre classification, we propose aggregating the classifications produced using character n-grams extracted from the URL, title, headings and anchors. Experiments conducted on a known multi-label dataset show that ACC outperforms many other multi-label classifiers and has the lowest computational complexity.
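
The adaptive loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the similarity measure, the decision rule, or the centroid update, so cosine similarity over character n-gram counts, a fixed `threshold`, and a running-mean update are all assumptions made here for illustration. The class name `AdaptiveCentroidSketch` and its methods are hypothetical, and the aggregation over URL, title, headings and anchors is left out; a single text string stands in for the page.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AdaptiveCentroidSketch:
    """Hypothetical illustration of an adaptive, multi-label centroid classifier."""

    def __init__(self, threshold=0.3, n=3):
        self.threshold = threshold  # assumed fixed cut-off, not taken from the paper
        self.n = n
        self.centroids = {}  # genre -> mean n-gram vector
        self.counts = {}     # genre -> number of pages folded into the centroid

    def _add(self, genre, vec):
        """Fold one page vector into the running mean for a genre (assumed update rule)."""
        count = self.counts.get(genre, 0)
        centroid = self.centroids.setdefault(genre, {})
        for key in set(centroid) | set(vec):
            centroid[key] = (centroid.get(key, 0.0) * count + vec.get(key, 0)) / (count + 1)
        self.counts[genre] = count + 1

    def fit(self, pages):
        """pages: iterable of (text, genres) pairs; a page may carry several genres."""
        for text, genres in pages:
            vec = char_ngrams(text, self.n)
            for genre in genres:
                self._add(genre, vec)
        return self

    def classify(self, text):
        """Return every genre whose centroid is similar enough, and adapt those centroids."""
        vec = char_ngrams(text, self.n)
        assigned = set()
        for genre in list(self.centroids):
            if cosine(vec, self.centroids[genre]) >= self.threshold:
                assigned.add(genre)
                self._add(genre, vec)  # centroid absorbs the page; otherwise the page is ignored
        return assigned


clf = AdaptiveCentroidSketch().fit([
    ("shop buy cart price checkout", {"eshop"}),
    ("blog post comments diary", {"blog"}),
])
labels = clf.classify("buy cart checkout")
```

Because each page is compared against every centroid independently, the returned set can hold zero, one, or several genres, which is what makes the scheme multi-label. A page below the threshold for a genre leaves that centroid unchanged.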

Keywords: Multi-label classification; adaptive classification; genre centroid; aggregation
Date: 2016

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649216500088
Access to full text is restricted to subscribers


Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:15:y:2016:i:01:n:s0219649216500088

DOI: 10.1142/S0219649216500088

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.

 
Handle: RePEc:wsi:jikmxx:v:15:y:2016:i:01:n:s0219649216500088