New perspectives on the performance of machine learning classifiers for mode-choice prediction
It appears to be a commonly held belief that Machine Learning (ML) classification algorithms should achieve substantially higher predictive performance than manually specified Random Utility Models (RUMs) for choice modelling. This belief is supported by several papers in the mode choice literature, which highlight stand-out performance of non-linear ML classifiers compared with linear models. However, many studies which compare ML classifiers with linear models have a fundamental flaw in how they validate models on out-of-sample data. This paper investigates the implications of this issue by comparing out-of-sample validation using two different sampling methods for panel data: (i) trip-wise sampling, where validation folds are sampled independently from all trips in the dataset, and (ii) grouped sampling, where validation folds are sampled grouped by household/person.

This paper includes two linked investigations: (i) a dataset investigation, which quantifies the proportion of matching trips across training and validation data when using trip-wise sampling for Out-Of-Sample (OOS) validation, and (ii) a modelling investigation, which compares OOS validation results obtained using trip-wise sampling and grouped sampling. These investigations make use of the data and methodologies of three published studies which explore ML classification of mode choice.

The results of the dataset investigation indicate that using trip-wise sampling with travel diary data results in significant data leakage, with up to 96% of the trips in typical trip-wise sampling validation folds having matching trips with the same mode choice in the training data. Furthermore, the modelling investigation demonstrates that this data leakage introduces substantial bias in model performance estimates, particularly for flexible non-linear classifiers. Grouped sampling is found to address the issues associated with trip-wise sampling and provides reliable estimates of true OOS predictive performance.

The use of trip-wise sampling with panel data has led to incorrect conclusions in two of the investigated studies, with the original results substantially overstating the performance of ML models compared with linear Logistic Regression (LR) models. Whilst the results from this study indicate that there is a slight predictive performance advantage of non-linear classifiers (in particular Ensemble Learning (EL) models) over linear LR models, this advantage is much more modest than has been suggested by previous investigations.
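To make the distinction between the two sampling methods concrete, the following is a minimal sketch (not the authors' code or data) of trip-wise versus grouped out-of-sample validation on synthetic panel-style mode-choice data, using standard scikit-learn cross-validation utilities. All column names, the feature set, and the choice of a random-forest classifier are illustrative assumptions, not the models or datasets used in the study.

```python
# Illustrative sketch, assuming synthetic data: contrasts trip-wise sampling
# (KFold over all trips) with grouped sampling (GroupKFold by household).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic travel-diary data: each household contributes several correlated
# trips, mimicking the repeated observations that cause leakage under
# trip-wise sampling. (household_id and both features are hypothetical.)
n_households, trips_per_household = 300, 8
household_id = np.repeat(np.arange(n_households), trips_per_household)
household_effect = rng.normal(size=n_households)[household_id]
features = np.column_stack([
    household_effect + rng.normal(scale=0.1, size=household_id.size),  # household-level proxy
    rng.normal(size=household_id.size),                                # trip-level attribute
])
mode = (household_effect + 0.3 * features[:, 1]
        + rng.normal(scale=0.5, size=household_id.size) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# (i) Trip-wise sampling: validation folds drawn independently from all trips,
# so trips from the same household appear on both sides of the split.
tripwise_scores = cross_val_score(
    clf, features, mode, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# (ii) Grouped sampling: folds grouped by household, so no household spans
# the training/validation boundary.
grouped_scores = cross_val_score(
    clf, features, mode, groups=household_id, cv=GroupKFold(n_splits=5))

print(f"Trip-wise CV accuracy: {tripwise_scores.mean():.3f}")
print(f"Grouped   CV accuracy: {grouped_scores.mean():.3f}")
```

Under trip-wise sampling a flexible non-linear classifier can exploit the near-duplicate trips shared between training and validation folds, which is the leakage mechanism the abstract describes; grouped sampling removes this overlap and yields the more conservative performance estimate.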