Phish webpage classification using hybrid algorithm of machine learning and statistical induction ratios
Hiba Zuhair and Ali Selamat
International Journal of Data Mining, Modelling and Management, 2020, vol. 12, issue 3, 255-276
Abstract:
Although conventional machine learning-based anti-phishing techniques outperform their competitors in phishing detection, they remain vulnerable to zero-hour phish webpages because of the constraints on their phishing induction. Phishing induction must therefore be strengthened through the extraction of new features, the selection of robust subsets of decisive features, and the active learning of classifiers on a large webpage stream. In this paper, we propose a hybrid feature-based classification algorithm (HFBC) for decisive phish webpage classification. HFBC hybridises two statistical criteria, optimised feature occurrence (OFC) and phishing induction ratio (PIR), with the induction settings of the most salient machine learning algorithms, Naïve Bayes and decision tree. Additionally, we propose two constituent algorithms, one for feature extraction and one for feature selection, for holistic phish webpage characterisation. The superiority of the proposed approach is demonstrated through chronological, real-time, and comparative analyses against existing machine learning-based anti-phishing techniques.
Keywords: phish webpage; machine learning; optimised feature occurrence; OFC; phishing induction ratio; PIR; hybrid feature-based classifier; HFBC.
Date: 2020
Downloads: (external link)
http://www.inderscience.com/link.php?id=108727 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:12:y:2020:i:3:p:255-276
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd