Automated Machine Learning in Action: Performance Evaluation for Predictive Analytics Tasks

Leyh, Nicolas

Acta Informatica Pragensia, vol. preprint

Abstract:

Background: As organizations increasingly seek data-driven insights, demand for machine learning (ML) expertise outpaces the available workforce. Automated Machine Learning (AutoML) frameworks help close this gap by streamlining the ML pipeline, making advanced modeling accessible to non-specialists.

Objective: This study evaluates the performance of four open-source AutoML frameworks (Auto-Keras, Auto-Sklearn, H2O, and TPOT) on predictive analytics tasks, focusing on binary and multiclass classification. The goal is to identify their performance strengths and limitations under varying dataset conditions and to propose improvements for framework optimization.

Methods: A quantitative experimental research design was employed. Twenty-two publicly available datasets were selected from established benchmarking sources, covering diverse predictive analytics challenges. Framework performance was assessed across twelve data segments defined by characteristics such as sample size, feature count, and proportion of categorical features. Evaluation metrics included AUC for binary classification tasks and accuracy and F1 score for multiclass classification tasks. Standardized runtime constraints were applied to ensure comparability.

Results: H2O delivered strong results across diverse datasets, particularly for binary classification. However, no single framework achieved superior performance across all data segments. Auto-Sklearn performed well in multiclass classification, especially with higher feature counts, while Auto-Keras and TPOT showed variable outcomes depending on dataset complexity. Performance declined notably for datasets with a high proportion of categorical features, severe class imbalance, or extensive missing values.

Conclusion: AutoML frameworks can substantially support predictive analytics but exhibit distinct strengths and limitations under specific data conditions. While H2O proved the most robust overall, targeted refinements, such as enhanced feature selection in Auto-Keras and improved handling of categorical variables in Auto-Sklearn, could further optimize performance. The findings offer actionable insights both for practitioners selecting a framework and for developers improving AutoML design, and they highlight the need for continued innovation so that AutoML frameworks can adapt to complex predictive analytics tasks.

Downloads: http://aip.vse.cz/doi/10.18267/j.aip.288.html (text/html, free of charge)

Persistent link: https://EconPapers.repec.org/RePEc:prg:jnlaip:v:preprint:id:288

Ordering information: This journal article can be ordered from
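Editorial Office of Acta Informatica Pragensia (Redakce Acta Informatica Pragensia), Department of Systems Analysis (Katedra systémové analýzy), Prague University of Economics and Business (Vysoká škola ekonomická v Praze), nám. W. Churchilla 4, 130 67 Praha 3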
http://aip.vse.cz

DOI: 10.18267/j.aip.288

Acta Informatica Pragensia is currently edited by its Editorial Office.

Acta Informatica Pragensia is published by Prague University of Economics and Business.
Bibliographic data for the series are maintained by Stanislav Vojir.