Improving blog spam filters via machine learning
Weiwen Yang and
Linchi Kwok
International Journal of Data Analysis Techniques and Strategies, 2017, vol. 9, issue 2, 99-121
Abstract:
As an important platform of electronic commerce, blogs can greatly influence internet users' purchasing decisions. Spam, however, can substantially reduce blogs' positive impact on electronic commerce. This paper introduces SK, an alternative algorithm combining supervised learning (SVM) and unsupervised learning (K-means++) to detect blog spam. If either classifies a blog as spam, then the blog is assigned to the spam category. Feature selection includes term frequency, inverse document frequency, binary representation, stop words, outgoing links, advertiser content, and burst with keywords. Accuracy of each model was tested and compared in experiments with 3,000 blog pages from University of Maryland and 3,560 internet blogs. Findings suggest that combining the SVM algorithm and K-means++ clustering can increase accuracy of filtering spams by about 7% as compared to using just one of these methods. Strengths and weaknesses of various spam-filtering methods were discussed, providing considerations for businesses when choosing a spam filter.
Keywords: spam filter; support vector machine; SVM; K-means++; machine learning; neural network. (search for similar items in EconPapers)
Date: 2017
References: Add references at CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://www.inderscience.com/link.php?id=85901 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:9:y:2017:i:2:p:99-121
Access Statistics for this article
More articles in International Journal of Data Analysis Techniques and Strategies from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker ().