Model selection for partial least squares calibration and implications for analysis of atmospheric organic aerosol samples with mid-infrared spectroscopy

Takahama, Satoshi; Dillner, Ann M.

doi:10.1002/cem.2761

2015

Abstract

In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade-offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user-defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross-validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid-infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0-30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
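
The constraint described in the abstract, a bound on the fractional increase in RMSECV accepted in exchange for fewer latent variables, can be illustrated with a simple threshold rule. The sketch below is not the authors' penalized metric, whose form the abstract does not give; it is a simpler stand-in that expresses the same bound. The function name `select_parsimonious`, the parameter name `alpha`, and the RMSECV values are all hypothetical and chosen only for illustration.

```python
def select_parsimonious(rmsecv, alpha):
    """Return the smallest number of latent variables whose RMSECV lies
    within a fractional tolerance `alpha` of the minimum RMSECV.

    rmsecv[i] is the cross-validation error of the model built with
    i + 1 latent variables. This threshold rule is an illustrative
    assumption, not the metric proposed in the paper.
    """
    if not rmsecv:
        raise ValueError("rmsecv must not be empty")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")

    # Largest error we are willing to accept for a simpler model.
    threshold = (1.0 + alpha) * min(rmsecv)

    # Scan from the simplest model upward and stop at the first one
    # that satisfies the bound.
    for n_latent, error in enumerate(rmsecv, start=1):
        if error <= threshold:
            return n_latent

    # Unreachable for non-negative errors: the minimum always qualifies.
    return len(rmsecv)


# Made-up RMSECV curve for 1..7 latent variables (not data from the paper).
rmsecv_curve = [0.50, 0.25, 0.215, 0.21, 0.20, 0.205, 0.21]

for alpha in (0.0, 0.1, 0.3):
    print(f"alpha={alpha:.1f} -> {select_parsimonious(rmsecv_curve, alpha)} latent variables")
```

With `alpha = 0` the rule reduces to the conventional minimum-RMSECV choice; larger values trade a bounded increase in cross-validation error for a model with fewer latent variables.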

Details

Title Model selection for partial least squares calibration and implications for analysis of atmospheric organic aerosol samples with mid-infrared spectroscopy

Author(s) Takahama, Satoshi ; Dillner, Ann M.

Published in Journal of Chemometrics

Pagination 10

Volume 29

Issue 12

Pages 659-668

Date 2015

Publisher Hoboken, Wiley-Blackwell

ISSN 0886-9383

Keywords

partial least squares (PLS); multivariate calibration; bias/variance tradeoff; over-fitting; latent variable

DOI https://doi.org/10.1002/cem.2761

Laboratories APRL

Record creation date 2015-12-03
