An automated baseline correction protocol for infrared spectra of atmospheric aerosols collected on polytetrafluoroethylene (Teflon) filters
A growing body of research on statistical applications for characterization of atmospheric aerosol Fourier transform infrared (FT-IR) samples collected on polytetrafluoroethylene (PTFE) filters (e.g., Russell et al., 2011; Ruthenburg et al., 2014) and a rising interest in analyzing FT-IR samples collected by air quality monitoring networks call for an automated PTFE baseline correction solution. The existing polynomial technique (Takahama et al., 2013) is not scalable to a project with a large number of aerosol samples because it contains many parameters and requires expert intervention. Therefore, the question of how to develop an automated method for baseline correcting hundreds to thousands of ambient aerosol spectra given the variability in both environmental mixture composition and PTFE baselines remains. This study approaches the question by detailing the statistical protocol, which allows for the precise definition of analyte and background subregions, applies nonparametric smoothing splines to reproduce sample-specific PTFE variations, and integrates performance metrics from atmospheric aerosol and blank samples alike in the smoothing parameter selection. Referencing 794 atmospheric aerosol samples from seven Interagency Monitoring of PROtected Visual Environment (IMPROVE) sites collected during 2011, we start by identifying key FT-IR signal characteristics, such as non-negative absorbance or analyte segment transformation, to capture sample-specific transitions between background and analyte. While referring to qualitative properties of PTFE background, the goal of smoothing splines interpolation is to learn the baseline structure in the background region to predict the baseline structure in the analyte region. We then validate the model by comparing smoothing splines baseline-corrected spectra with uncorrected and polynomial baseline (PB)-corrected equivalents via three statistical applications: (1) clustering analysis, (2) functional group quantification, and (3) thermal optical reflectance (TOR) organic carbon (OC) and elemental carbon (EC) predictions. The discrepancy rate for a four-cluster solution is 10 %. For all functional groups but carboxylic COH the discrepancy is <= 10 %. Performance metrics obtained from TOR OC and EC predictions (R-2 >= 0.94 %, bias <= 0.01 mu g m(-3), and error <= 0.04 mu g m(-3)) are on a par with those obtained from uncorrected and PB-corrected spectra. The proposed protocol leads to visually and analytically similar estimates as those generated by the polynomial method. More importantly, the automated solution allows us and future users to evaluate its analytical reproducibility while minimizing reducible user bias. We anticipate the protocol will enable FT-IR researchers and data analysts to quickly and reliably analyze a large amount of data and connect them to a variety of available statistical learning methods to be applied to analyte absorbances isolated in atmospheric aerosol samples.