Infoscience

Thesis

# Functional data analysis by matrix completion

Traditional approaches to analysing functional data typically follow a two-step procedure: first smoothing, then carrying out a functional principal component analysis. The idea underlying this procedure is that functional data are well approximated by smooth functions, and that rough variations are due to noise. However, it may very well happen that localised features are rough at a global scale but still smooth at some finer scale. In this thesis we put forward a new statistical approach for functional data arising as the sum of two uncorrelated components: one smooth plus one rough. We give non-parametric conditions under which the covariance operators of the smooth and of the rough components are jointly identifiable on the basis of discretely observed data: the covariance operator corresponding to the smooth component must be of finite rank and have real analytic eigenfunctions, while the one corresponding to the rough component must have a banded covariance function. We construct consistent estimators of both covariance operators without assuming knowledge of the true rank or bandwidth. We then use them to estimate the best linear predictors of the smooth and the rough components of each functional datum. In both the identifiability and the inference parts, we do not follow the usual strategy in functional data analysis, which is to first employ smoothing and then work with a continuous estimate of the covariance operator. Instead, we work directly with the covariance matrix of the discretely observed data, which allows us to use results and tools from linear algebra. In fact, we show that the whole problem of uniquely recovering the covariance operator of the smooth component given that of the raw data can be seen as a low-rank matrix completion problem, and we make extensive use of a classical relation between the rank and the minors of a matrix to solve this matrix completion problem. The finite-sample performance of our approach is studied by means of a simulation study.
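The link between rank and minors mentioned in the abstract can be illustrated with a small population-level sketch. It is not the thesis's estimator: it assumes the rank `r` and bandwidth `b` are known (the thesis does not), works with exact covariance matrices rather than estimates, and uses hypothetical names (`spread`, `complete_band`) and an arbitrary choice of eigenfunctions and banded kernel. The idea is that entries of the observed covariance outside the band come from the smooth component alone, and every (r+1)×(r+1) minor of a rank-r matrix vanishes, so each entry inside the band can be solved for from off-band entries.

```python
import numpy as np

def spread(candidates, r):
    """Pick r indices spread evenly over the candidates (helps conditioning)."""
    pos = np.linspace(0, len(candidates) - 1, r).astype(int)
    return candidates[pos]

def complete_band(sigma, r, b):
    """Recover a rank-r matrix from sigma = low_rank + banded (bandwidth b).

    For each in-band entry (i, j), choose r rows and r columns such that every
    other entry of the (r+1)x(r+1) submatrix lies outside the band. The minor
    of the low-rank part vanishes, which gives
        L[i, j] = L[i, cols] @ inv(L[rows, cols]) @ L[rows, j].
    Assumes (hypothetically) that r and b are known and the grid is large
    enough for such rows and columns to exist.
    """
    K = sigma.shape[0]
    idx = np.arange(K)
    completed = sigma.copy()
    for i in range(K):
        for j in range(K):
            if abs(i - j) > b:
                continue  # off-band entries already equal the low-rank part
            # Columns where row i is free of the banded component.
            cols = spread(idx[np.abs(idx - i) > b], r)
            # Rows free of the banded component at column j and at all cols.
            far_from_cols = (np.abs(idx[:, None] - cols[None, :]) > b).all(axis=1)
            rows = spread(idx[(np.abs(idx - j) > b) & far_from_cols], r)
            block = sigma[np.ix_(rows, cols)]
            completed[i, j] = sigma[i, cols] @ np.linalg.solve(block, sigma[rows, j])
    return completed

# Toy setup (all values are illustrative assumptions).
K, r, b = 40, 2, 3
t = np.linspace(0.0, 1.0, K)
# Two real analytic, orthonormal functions on [0, 1].
phi = np.column_stack([np.ones(K), np.sqrt(2.0) * np.cos(np.pi * t)])
lam = np.array([2.0, 1.0])
smooth_cov = phi @ np.diag(lam) @ phi.T           # rank-2 smooth covariance
lags = np.abs(np.subtract.outer(np.arange(K), np.arange(K)))
rough_cov = 0.5 * np.clip(1.0 - lags / (b + 1), 0.0, None)  # zero when lag > b
sigma = smooth_cov + rough_cov                    # covariance of the raw data

smooth_hat = complete_band(sigma, r, b)
rough_hat = sigma - smooth_hat
print("max abs error (smooth part):", np.max(np.abs(smooth_hat - smooth_cov)))
```

In this sketch the invertibility of each `block` relies on the chosen functions taking distinct values at distinct grid points; with estimated covariances, or with unknown rank and bandwidth, a more careful procedure would be needed.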

Thesis, École polytechnique fédérale de Lausanne (EPFL), no. 7616 (2017)
Doctoral Program in Mathematics
School of Basic Sciences
Institute of Mathematics of Analysis and Applications
Chair of Mathematical Statistics
Jury: Prof. Stephan Morgenthaler (president); Prof. Victor Panaretos (thesis director); Prof. Anthony Davison, Prof. John Aston, Prof. Giles Hooker (examiners)

Public defense: 2017-06-13

#### Reference

Record created on 2017-06-08, modified on 2017-06-14