Using Machine Learning for Web Page Classification in Search Engine Optimization
Goran Matošević, Jasminka Dobša and Dunja Mladenić
Additional contact information
Goran Matošević: Faculty of Economics and Tourism Dr. Mijo Mirković, University of Pula, 52100 Pula, Croatia
Jasminka Dobša: Faculty of Organization and Informatics Varaždin, University of Zagreb, 10000 Zagreb, Croatia
Dunja Mladenić: Jožef Stefan Institute, 1000 Ljubljana, Slovenia
Future Internet, 2021, vol. 13, issue 1, 1-20
Abstract:
This paper presents a novel approach that uses machine learning algorithms, informed by expert knowledge, to classify web pages into three predefined classes according to how closely their content follows search engine optimization (SEO) recommendations. Classifiers were built and trained to assign an unseen web page to one of the three classes and to identify the factors that most affect the degree of page adjustment. The training data were manually labeled by domain experts. The experimental results show that machine learning can predict the degree of adjustment of web pages to SEO recommendations: classifier accuracy ranges from 54.59% to 69.67%, which is higher than the baseline accuracy of assigning every sample to the majority class (48.83%). The practical significance of the approach is that it provides a core for building software agents and expert systems that automatically detect web pages, or parts of web pages, that need improvement to comply with SEO guidelines and could therefore gain higher rankings in search engines. The results also contribute to research on detecting optimal values of the ranking factors that search engines use to rank web pages. The experiments suggest that the important factors to consider when preparing a web page are the page title, meta description, H1 tag (heading), and body text, which is consistent with the findings of previous research. A further result of this research is a new data set of manually labeled web pages that can be used in future work.
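The comparison against the majority-class baseline mentioned in the abstract can be illustrated with a minimal sketch. The class names (`low`, `medium`, `high`), the toy labels, and the helper functions below are hypothetical and chosen only for illustration; they are not the paper's data set, classifiers, or code.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common(1) returns [(label, count)] for the largest class.
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical expert labels and classifier predictions (toy data only).
y_true = ["low", "medium", "medium", "high", "medium", "low", "medium", "high"]
y_pred = ["low", "medium", "medium", "medium", "medium", "low", "high", "high"]

baseline = majority_baseline_accuracy(y_true)
model_acc = accuracy(y_true, y_pred)

print(f"majority-class baseline: {baseline:.2%}")
print(f"classifier accuracy:     {model_acc:.2%}")
print("beats baseline" if model_acc > baseline else "does not beat baseline")
```

A classifier is only informative on an imbalanced three-class problem if its accuracy exceeds this baseline, which is the comparison the abstract reports (54.59% to 69.67% against 48.83%).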
Keywords: search engine optimization; SEO optimization; on-page optimization; classification; machine learning
JEL-codes: O3
Date: 2021
Downloads:
https://www.mdpi.com/1999-5903/13/1/9/pdf (application/pdf)
https://www.mdpi.com/1999-5903/13/1/9/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:13:y:2021:i:1:p:9-:d:473960
Future Internet is currently edited by Ms. Grace You