Effect of N-Grams Technique in Preprocessing of Email Spam Filtering
Aakanksha Sharaff and Naresh Kumar Nagwani
Additional contact information
Aakanksha Sharaff: National Institute of Technology Raipur, Raipur, India
Naresh Kumar Nagwani: National Institute of Technology Raipur, Raipur, India
International Journal of Applied Evolutionary Computation (IJAEC), 2017, vol. 8, issue 1, 26-37
Abstract:
This paper demonstrates a character-level, content-based approach to spam categorization using the N-gram technique. The usual practice of applying N-grams to words, which yields a “Bag of Words” representation of documents, is replaced by a “Bag of Characters”. The “Bag of Characters” is created by treating the whole email document as a single string and splitting it character-wise. Multiple N-gram sizes, i.e. bi-grams, tri-grams and quad-grams, are used simultaneously, so each email document is represented by a “Bag of Characters” containing N-grams of sizes 2, 3 and 4. This enhances the results by solving problems that occur with word N-grams. All experiments were performed on the Ling Spam Corpus.
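As an illustration of the representation described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The function name `char_ngrams` and the choice to count occurrences are assumptions made here for illustration. The abstract does not say whether whitespace, case or punctuation are normalized, so the sketch leaves the input string untouched.

```python
from collections import Counter

def char_ngrams(text, sizes=(2, 3, 4)):
    """Build a bag of character N-grams for one document.

    The document is treated as a single string, and every contiguous
    run of n characters is counted, for each n in `sizes`.
    Hypothetical helper: name and signature are illustrative only.
    """
    bag = Counter()
    for n in sizes:
        # A string of length L has L - n + 1 N-grams of size n
        # (none if the string is shorter than n).
        for i in range(len(text) - n + 1):
            bag[text[i:i + n]] += 1
    return bag

# Bi-, tri- and quad-grams are collected into the same bag.
bag = char_ngrams("free offer")
```

Because all three sizes share one bag, a single feature vector per email can be built from the resulting counts, which is one plausible reading of using the sizes "simultaneously".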
Date: 2017
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijaec.2017010102 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jaec00:v:8:y:2017:i:1:p:26-37
International Journal of Applied Evolutionary Computation (IJAEC) is currently edited by Sukhpal Singh Gill