Infoscience

research article

Model selection for partial least squares calibration and implications for analysis of atmospheric organic aerosol samples with mid-infrared spectroscopy

Takahama, Satoshi; Dillner, Ann M.
2015
Journal of Chemometrics

In developing partial least squares calibration models, selecting the number of latent variables used for their construction so as to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade-offs, but the effects of model parsimony and potential underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user-defined parameter that is formulated to constrain the fractional increase in root mean square error of cross-validation (RMSECV) incurred when a parsimonious model is selected over the conventional minimum-RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures analyzed with mid-infrared spectroscopy. Parametric variation of this penalty demonstrates that, for samples similar to the laboratory standards used for model training and validation, the increase in prediction errors due to underfitting is bounded by the magnitude of the penalty. Imposing an ensemble of penalties, corresponding to a 0-30% allowable increase in RMSECV, through sum of ranking differences leads to the selection of a model that increases the actual RMSECV by up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
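The abstract does not give the functional form of the proposed penalty. To illustrate only the general idea of trading a bounded fractional increase in RMSECV for a smaller model, the sketch below uses a simpler, swapped-in tolerance rule: choose the fewest latent variables whose RMSECV lies within a factor (1 + alpha) of the minimum RMSECV, and repeat this for an ensemble of alpha values spanning 0-30%. This is not the authors' penalized metric, and it does not reproduce their sum-of-ranking-differences step. The function names and the RMSECV curve in the example are hypothetical. In practice each RMSECV value would come from cross-validated PLS fits, for example with scikit-learn's `PLSRegression` and `cross_val_predict`, which are omitted here to keep the sketch dependency-free.

```python
import math
from typing import Dict, Sequence


def rmsecv(y_true: Sequence[float], y_cv_pred: Sequence[float]) -> float:
    """RMSE of cross-validated predictions.

    Each y_cv_pred[i] is assumed to come from a model that did not see
    sample i during training (e.g. leave-one-out or k-fold CV).
    """
    if not y_true or len(y_true) != len(y_cv_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_cv_pred))
    return math.sqrt(sq / len(y_true))


def select_n_components(rmsecv_curve: Sequence[float], alpha: float) -> int:
    """Fewest latent variables whose RMSECV is within (1 + alpha) of the minimum.

    Hypothetical tolerance rule, NOT the penalty proposed in the article.
    rmsecv_curve[k] is assumed to be the RMSECV of a PLS model with k + 1
    latent variables; alpha = 0 gives the conventional minimum-RMSECV model.
    """
    if not rmsecv_curve:
        raise ValueError("empty RMSECV curve")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    threshold = (1.0 + alpha) * min(rmsecv_curve)
    # Scan from the smallest model upward; the minimum itself always
    # satisfies the threshold, so next() always finds a value.
    return next(k + 1 for k, err in enumerate(rmsecv_curve) if err <= threshold)


def select_over_ensemble(rmsecv_curve: Sequence[float],
                         alphas: Sequence[float]) -> Dict[float, int]:
    """Apply the tolerance rule once per alpha in the ensemble."""
    return {a: select_n_components(rmsecv_curve, a) for a in alphas}


if __name__ == "__main__":
    # Hypothetical RMSECV values for models with 1..6 latent variables.
    curve = [1.00, 0.60, 0.45, 0.41, 0.40, 0.42]
    alphas = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
    for a, n in select_over_ensemble(curve, alphas).items():
        print(f"alpha={a:.2f}: {n} latent variables")
```

With the hypothetical curve, alpha = 0 selects 5 latent variables, alpha = 0.05 to 0.10 selects 4, and alpha = 0.15 to 0.30 selects 3. By construction the rule bounds the RMSECV increase on the cross-validation data to at most a fraction alpha. It says nothing about errors on samples unlike the training set, such as the ambient mixtures discussed above.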

File: TakahamaDillner_Model selection for partial least squares calibration and implications for_2015.pdf
Type: Publisher's Version
Version: Published version
Access type: open access
Size: 2.28 MB
Format: Adobe PDF
Checksum (MD5): 0452df874d84ef948137f3f18415d06e
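To confirm that a downloaded copy of the PDF matches the MD5 checksum listed above, the file can be hashed locally. The following is a minimal sketch using Python's standard `hashlib`. The local file path and the helper names are assumptions, so adjust the path to wherever the PDF was saved.

```python
import hashlib
from pathlib import Path

# MD5 checksum published in the repository record for the PDF.
EXPECTED_MD5 = "0452df874d84ef948137f3f18415d06e"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the PDF was saved.
    pdf = Path("TakahamaDillner_Model selection for partial least squares "
               "calibration and implications for_2015.pdf")
    if pdf.exists():
        print("checksum OK" if matches_record(pdf) else "checksum MISMATCH")
    else:
        print(f"file not found: {pdf}")
```

MD5 is adequate for detecting accidental corruption during download, but it should not be relied on as protection against deliberate tampering.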

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.