A comparative study of machine learning models for taxi-demand prediction using a big data framework

Alam, Shafiq; Ayub, Muhammad Sohaib; Cui, Hao; Khan, Muhammad Asad

Shafiq Alam, Muhammad Sohaib Ayub, Hao Cui and Muhammad Asad Khan
Additional contact information
Shafiq Alam: Massey University
Muhammad Sohaib Ayub: Lahore University of Management Sciences
Hao Cui: Whitireia Community Polytechnic
Muhammad Asad Khan: Hazara University

Public Transport, 2025, vol. 17, issue 3, No 7, 803-833

Abstract: Growth in urban vehicle numbers and ownership worsens traffic congestion; road expansion often escalates the problem, and growing individual car usage contributes further. To address these challenges, the development of efficient and accurate demand predictors, particularly for taxi-passenger demand, has attracted significant attention from both industry and academia. This research addresses the question of how predictive models, evaluated within a big data framework, can be used effectively to forecast taxi-passenger demand and enhance urban mobility. The study is structured around two key objectives: first, to evaluate existing predictive models for analyzing traffic data and forecasting taxi-passenger demand using a dataset from the New York City Taxi and Limousine Commission (TLC); and second, to provide practical recommendations for improving demand prediction through a comparative analysis of machine learning models, such as multinomial logistic regression, generalized linear regression (GLR), random forest, and decision tree algorithms. Our results demonstrate that GLR outperforms both random forest and decision tree models in predicting taxi-passenger demand, achieving an $\text{R}^{2}$ score of approximately 90%. The multinomial logistic regression classifier predicts demand levels with over 70% accuracy. The lower performance of the random forest and decision tree algorithms is attributed to the highly imbalanced nature of the demand, leading to overfitting and local optimization issues. Applying these predictive models can contribute to optimizing fleet management, reducing passenger waiting times, and improving taxi distribution, potentially alleviating congestion and supporting better integration with other transport modes.

Keywords: Traffic prediction; Machine learning; Predictive models; Urban mobility; Big data framework; Classification
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s12469-025-00401-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:pubtra:v:17:y:2025:i:3:d:10.1007_s12469-025-00401-1

Ordering information: This journal article can be ordered from
https://www.springer ... search/journal/12469

DOI: 10.1007/s12469-025-00401-1

Public Transport is currently edited by Stefan Voß

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.