Ortelli, Nicola Marco; Cochon de Lapparent, Matthieu Marie; Bierlaire, Michel
2024-02-23
2024-02-23
2024-02-23
2024-03-01
10.1016/j.jocm.2023.100467
https://infoscience.epfl.ch/handle/20.500.14299/205436
WOS:001153682000001

In the context of discrete choice modeling, the extraction of potential behavioral insights from large datasets is often limited by the poor scalability of maximum likelihood estimation. This paper proposes a simple and fast dataset-reduction method that is specifically designed to preserve the richness of observations originally present in a dataset, while reducing the computational complexity of the estimation process. Our approach, called LSH-DR, leverages locality-sensitive hashing to create homogeneous clusters, from which representative observations are then sampled and weighted. We demonstrate the efficacy of our approach by applying it on a real-world mode choice dataset: the obtained results show that the samples generated by LSH-DR allow for substantial savings in estimation time while preserving estimation efficiency at little cost.

Discrete Choice Models; Maximum Likelihood Estimation; Dataset Reduction; Sample Size; Locality-Sensitive Hashing

Resampling estimation of discrete choice models

text::journal::journal article::research article
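
The abstract describes the method only in outline: hash observations into homogeneous clusters, then sample and weight representatives. A minimal sketch of that general idea is given below. It is not the authors' LSH-DR implementation. The hash family (sign random projections), the choice of one representative per cluster, the cluster-size weights, and the names `lsh_reduce`, `rows`, and `n_bits` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def lsh_reduce(rows, n_bits=4, seed=0):
    """Illustrative dataset reduction via locality-sensitive hashing.

    rows: list of equal-length numeric feature vectors (hypothetical input).
    Returns a list of (row_index, weight) pairs.
    Assumption: sign-random-projection hashing, one draw per bucket.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    # Each random hyperplane contributes one bit to the hash signature.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    # Observations with the same signature fall into the same bucket.
    buckets = defaultdict(list)
    for i, x in enumerate(rows):
        signature = tuple(
            sum(p_j * x_j for p_j, x_j in zip(p, x)) >= 0.0 for p in planes
        )
        buckets[signature].append(i)

    # Draw one representative per bucket and weight it by the bucket size,
    # so the weights sum to the original number of observations.
    return [(rng.choice(members), len(members)) for members in buckets.values()]


if __name__ == "__main__":
    data_rng = random.Random(1)
    rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]
    sample = lsh_reduce(rows, n_bits=4)
    print(len(sample), sum(w for _, w in sample))
```

In a weighted maximum likelihood setting, each retained observation's log-likelihood contribution would be multiplied by its weight. How the paper defines its hash functions, cluster sizes, and weights is not stated in this record.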