Predicting football outcomes from Spanish league using machine learning models

Michał Lewandowski and Marcin Chlebus
Additional contact information
Michał Lewandowski: Faculty of Economic Sciences, University of Warsaw

No 2021-22, Working Papers from Faculty of Economic Sciences, University of Warsaw

Abstract: High-quality predictive models for football can be both useful and profitable. In this research, we constructed machine learning models to predict the outcomes of matches in the Spanish LaLiga and compared them with historical forecasts extracted from bookmakers, whose knowledge is commonly considered deep and of high quality. The aim of the paper was to design models with the highest possible predictive performance and to obtain results close to, or even better than, those of the bookmakers. The work included detailed feature engineering based on previous achievements in this domain and on our own proposals. The constructed and selected set of variables was used with four machine learning methods: Random Forest, AdaBoost, XGBoost and CatBoost. The algorithms were compared using the Area Under the Curve (AUC) and the Ranked Probability Score (RPS). RPS served as the benchmark for comparing the probabilities estimated by the trained models with the forecasts implied by bookmakers' odds. To better understand and explain these methods, which are regarded as black-box approaches, Permutation Feature Importance (PFI) was used to evaluate the impact of individual variables. Features extracted from bookmakers' odds proved to be the most important in terms of PFI. Furthermore, XGBoost achieved the best results on the validation set (RPS of 0.1989), with predictive power similar to that of bookmakers' odds (RPS between 0.1977 and 0.1984). The results of the trained estimators were promising, and this article shows that competing with bookmakers is possible using the demonstrated techniques.
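
Illustration (not taken from the paper): the abstract compares model probabilities with bookmakers' forecasts using RPS, where a lower score indicates a better forecast. A minimal Python sketch of that comparison for a single match is given here, assuming three ordered outcomes (home win, draw, away win). The function names, the sample odds and the sample model probabilities are hypothetical, and the proportional normalization used to remove the bookmaker margin is an assumption; the authors may have used a different method.

```python
from typing import List, Sequence


def implied_probabilities(decimal_odds: Sequence[float]) -> List[float]:
    """Convert decimal odds into probabilities that sum to 1.

    Assumption: the bookmaker margin is removed by simple proportional
    normalization of the inverse odds; other approaches exist.
    """
    inverse = [1.0 / odd for odd in decimal_odds]
    total = sum(inverse)
    return [value / total for value in inverse]


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """RPS for one match with ordered outcomes.

    probs   -- forecast probabilities in order (home win, draw, away win)
    outcome -- index of the observed result in that same order
    """
    categories = len(probs)
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    score = 0.0
    # Sum squared differences of the cumulative distributions over the
    # first (categories - 1) outcomes; the last difference is always zero.
    for i in range(categories - 1):
        cumulative_forecast += probs[i]
        cumulative_observed += 1.0 if i == outcome else 0.0
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score / (categories - 1)


if __name__ == "__main__":
    # Hypothetical decimal odds and model output for one match.
    bookmaker = implied_probabilities([2.10, 3.40, 3.60])
    model = [0.50, 0.30, 0.20]
    observed = 0  # home win
    print("bookmaker RPS:", ranked_probability_score(bookmaker, observed))
    print("model RPS:", ranked_probability_score(model, observed))
```

Averaging the per-match scores over a validation set gives a single figure of the kind quoted in the abstract.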

Keywords: predicting football outcomes; machine learning; betting; adaboost; random forest; xgboost; catboost; ranked probability score; auc; permutation feature importance
JEL-codes: C13 C51 C52 C53 C61 L83 Z29
Pages: 35
Date: 2021
New Economics Papers: this item is included in nep-big, nep-cmp, nep-for and nep-spo

Page updated 2024-02-15
Handle: RePEc:war:wpaper:2021-22