EconPapers    
Economics at your fingertips  
 

Effect of N-Grams Technique in Preprocessing of Email Spam Filtering

Aakanksha Sharaff and Naresh Kumar Nagwani
Aakanksha Sharaff: National Institute of Technology Raipur, Raipur, India
Naresh Kumar Nagwani: National Institute of Technology Raipur, Raipur, India

International Journal of Applied Evolutionary Computation (IJAEC), 2017, vol. 8, issue 1, 26-37

Abstract: This paper demonstrates spam categorization using a character-level, content-based approach built on the N-gram technique. The usual practice of applying N-grams to words, which yields a “Bag of Words” representation of documents, is replaced by a “Bag of Characters”. The bag of characters is created by treating the whole email document as a single string and splitting it character-wise. Multiple N-gram sizes, i.e., bi-grams, tri-grams and quad-grams, are used simultaneously, so each email is represented as a bag of characters containing N-grams of sizes 2, 3 and 4. This representation improves results by addressing problems that occur with word N-grams. All experiments were performed on the Ling-Spam corpus.
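The bag-of-characters representation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bag_of_characters` is hypothetical, N-grams are assumed to be overlapping with raw counts, and no case folding, whitespace normalization or term weighting is applied, since the abstract does not specify these preprocessing details.

```python
from collections import Counter


def bag_of_characters(email_text: str, sizes=(2, 3, 4)) -> Counter:
    """Hypothetical helper: count overlapping character N-grams of every size in `sizes`.

    Assumption: the whole email is treated as one string (no word tokenization),
    and raw counts are used without case folding or weighting.
    """
    bag = Counter()
    for n in sizes:
        # Slide a window of width n across the string one character at a time.
        for i in range(len(email_text) - n + 1):
            bag[email_text[i:i + n]] += 1
    return bag


bag = bag_of_characters("spam spam")
print(bag["spam"])  # quad-gram "spam" occurs twice
```

Because N-grams of different lengths can never be equal strings, bi-, tri- and quad-grams can share a single `Counter` without collisions, which matches the idea of one bag holding N-grams of sizes 2, 3 and 4 together.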

Date: 2017

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijaec.2017010102 (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:igg:jaec00:v:8:y:2017:i:1:p:26-37


International Journal of Applied Evolutionary Computation (IJAEC) is currently edited by Sukhpal Singh Gill

More articles in International Journal of Applied Evolutionary Computation (IJAEC) from IGI Global
Bibliographic data for series maintained by Journal Editor.

 
Page updated 2025-03-19
Handle: RePEc:igg:jaec00:v:8:y:2017:i:1:p:26-37