Infoscience

Thesis

Blowing in the Wind: Regularizations and Outlier Removal

Every day, tons of pollutants are emitted into the atmosphere all around the world. These pollutants are altering the equilibrium of our planet, causing profound changes in its climate, increasing global temperatures, and raising the sea level. The need to curb these emissions is clear and urgent. To do so, it is first necessary to estimate the quantity of pollutants being emitted. Hence the central challenge of this thesis: how can we estimate the spatio-temporal emissions of a pollutant from many subsequent observations of its concentration at different times and locations? Mathematically speaking, given such observations and an atmospheric dispersion model, this is a linear inverse problem. Using real datasets, we show that the main difficulties in solving this problem are ill-conditioning and outliers. Ill-conditioning amplifies the effect of additive noise, and each outlier strongly deflects our estimate from the ground truth. We proceed in two different ways to design new estimation methods that can handle these challenges. In the first approach, we enhance traditional estimators, which are already equipped to deal with ill-conditioning, with a preprocessing step that makes them robust against outliers. This preprocessing step blindly localizes outliers in the dataset in order to remove them completely or to reduce their influence. We propose two ways of localizing outliers: the first uses several transport models, while the second uses random sampling techniques. We show that our preprocessing step significantly improves the performance of traditional estimators, both on synthetic datasets and on real-world measurements. The second approach enhances existing robust estimators, which are already equipped to deal with outliers, with suitable regularizations, so that they remain stable when the problem is ill-conditioned.
We analyze the properties of our new estimators and compare them with those of existing estimators, showing the advantages of introducing the regularization. Our new estimators perform well both in the presence and in the absence of outliers, making them generally applicable. They perform well with up to 50% outliers in the dataset, and they remain stable when the problem is ill-conditioned. We demonstrate their performance using real-world measurements. We give two algorithms for computing the new estimators: one is based on iteratively reweighted least squares and the other on a proximal gradient method. Software implementations of all our proposed estimators, along with sample datasets, are provided as part of our commitment to reproducible results. In addition, we provide LinvPy, an open-source Python package that contains tested, documented, and user-friendly implementations of our regularized robust algorithms.
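
To make the idea of a regularized robust estimator computed by iteratively reweighted least squares more concrete, the following is a minimal sketch, not the thesis's estimators and not LinvPy's API. It assumes a Huber loss, a Tikhonov (ridge) penalty, and a median-absolute-deviation scale estimate; the function names `huber_weights` and `regularized_huber_irls` and the parameters `lam` and `clipping` are hypothetical choices made for this illustration only.

```python
import numpy as np


def huber_weights(scaled_residuals, clipping=1.345):
    """IRLS weights for the Huber loss: 1 inside the band, clipping/|r| outside."""
    abs_r = np.abs(scaled_residuals)
    weights = np.ones_like(abs_r)
    large = abs_r > clipping
    weights[large] = clipping / abs_r[large]
    return weights


def regularized_huber_irls(A, y, lam=1e-2, clipping=1.345, n_iter=50, tol=1e-8):
    """Illustrative Huber + Tikhonov estimate of x in y ~ A x, computed by IRLS.

    lam is the (assumed) Tikhonov weight that stabilizes ill-conditioned A;
    clipping is the (assumed) Huber threshold that limits outlier influence.
    """
    n_features = A.shape[1]
    ridge = lam * np.eye(n_features)
    # Start from the plain Tikhonov-regularized least squares solution.
    x = np.linalg.solve(A.T @ A + ridge, A.T @ y)
    for _ in range(n_iter):
        residuals = y - A @ x
        # Robust scale estimate (MAD), floored to avoid division by zero.
        mad = np.median(np.abs(residuals - np.median(residuals)))
        scale = max(1.4826 * mad, 1e-12)
        w = huber_weights(residuals / scale, clipping)
        # Solve the weighted, regularized normal equations.
        Aw = A * w[:, None]
        x_new = np.linalg.solve(A.T @ Aw + ridge, Aw.T @ y)
        converged = np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x))
        x = x_new
        if converged:
            break
    return x
```

Each iteration downweights measurements with large residuals and then solves a ridge-regularized weighted least squares problem, so the penalty addresses ill-conditioning while the weights limit the influence of outliers. The Huber loss is used here only because it is simple; it does not by itself tolerate the 50% outlier fraction reported above.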
