On multivariate calibration with unlabeled data
In principal component regression (PCR) and partial least-squares regression (PLSR), using unlabeled data in addition to labeled data helps stabilize the latent subspaces in the calibration step, which typically lowers the prediction error. A non-sequential approach based on optimal filtering (OF) has been proposed in the literature for using unlabeled data with PLSR. In this work, a sequential version of the OF-based PLSR and a PCA-based PLSR (PLSR applied to data preprocessed by principal component analysis, PCA) are proposed. It is shown analytically that the sequential version of the OF-based PLSR is equivalent to the PCA-based PLSR, which leads to a new interpretation of OF. Simulated and experimental data sets are used to illustrate the benefits and pitfalls of using unlabeled data. Unlabeled data can replace labeled data to some extent, which yields an economic benefit. However, in the presence of drift, using unlabeled data can increase the prediction error relative to a model based on labeled data alone.
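As a minimal sketch of how unlabeled data can enter the calibration step, the NumPy code below fits a PCR model whose PCA subspace is estimated from the pooled labeled and unlabeled samples, while the regression step uses only the labeled samples. The function names (`fit_pcr`, `predict_pcr`, `simulate`), the centering by the pooled mean, and the synthetic two-factor data are illustrative assumptions, not the method or data of this work. The sketch covers plain PCR only and does not implement the OF-based or PCA-based PLSR.

```python
import numpy as np


def fit_pcr(X_lab, y_lab, X_unlab=None, n_components=2):
    """Fit PCR; the PCA subspace uses labeled plus (optional) unlabeled rows."""
    X_pool = X_lab if X_unlab is None else np.vstack([X_lab, X_unlab])
    x_mean = X_pool.mean(axis=0)  # assumption: center by the pooled mean
    # PCA loadings: leading right singular vectors of the centered pooled data.
    _, _, Vt = np.linalg.svd(X_pool - x_mean, full_matrices=False)
    P = Vt[:n_components].T  # shape (n_features, n_components)
    # Regression step: only labeled samples have reference values y.
    T = (X_lab - x_mean) @ P
    A = np.column_stack([np.ones(len(T)), T])  # intercept column + scores
    coef, *_ = np.linalg.lstsq(A, y_lab, rcond=None)
    return {"x_mean": x_mean, "P": P, "intercept": coef[0], "q": coef[1:]}


def predict_pcr(model, X):
    """Project new samples onto the stored loadings and apply the regression."""
    T = (X - model["x_mean"]) @ model["P"]
    return model["intercept"] + T @ model["q"]


# Synthetic illustration (hypothetical data): two latent factors, 50 channels.
rng = np.random.default_rng(0)
loadings = rng.normal(size=(2, 50))
weights = np.array([1.0, -0.5])


def simulate(n, noise=0.05):
    """Draw n samples; y depends linearly on the two latent scores."""
    scores = rng.normal(size=(n, 2))
    X = scores @ loadings + noise * rng.normal(size=(n, 50))
    return X, scores @ weights


X_lab, y_lab = simulate(8)       # few labeled samples
X_unlab, _ = simulate(200)       # many unlabeled samples (y discarded)
X_test, y_test = simulate(500)

for name, Xu in [("labeled only", None), ("labeled + unlabeled", X_unlab)]:
    model = fit_pcr(X_lab, y_lab, Xu)
    rmsep = np.sqrt(np.mean((predict_pcr(model, X_test) - y_test) ** 2))
    print(f"{name}: RMSEP = {rmsep:.4f}")
```

No drift is simulated in this sketch. As the abstract notes, if the unlabeled samples drift relative to the labeled ones, pooling them may increase the prediction error rather than reduce it.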