Distributional Regression and Autoregression via Optimal Transport

Ghodrati, Laya

doi:10.5075/epfl-thesis-9780

doctoral thesis

2023

We present a framework for performing regression when both covariate and response are probability distributions on a compact and convex subset of $\R^d$. Our regression model is based on the theory of optimal transport and links the conditional Fréchet mean of the response to the covariate via an optimal transport map. We define a Fréchet-least-squares estimator of this regression map, and establish its consistency and rate of convergence to the true map under full observation of the regression pairs.

For the specific case $d=1$, we obtain additional results: we establish the minimax rate of estimation of such a regression function by deriving a lower bound that matches the convergence rate attained by the Fréchet-least-squares estimator.
Additionally, we find an upper bound for the convergence rate of an estimator when only samples from the covariate and response distributions are observed. In this case too, the computation of the estimator is shown to reduce to a standard convex optimisation problem, so our regression model can be implemented with ease. We illustrate our methodology using real and simulated data.
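As background for the $d=1$ case, the following minimal sketch uses only the standard one-dimensional facts that the optimal transport map between two distributions is the target quantile function composed with the source distribution function, and that the squared 2-Wasserstein distance is the integrated squared difference of quantile functions. It is not the thesis's estimator; the function names `wasserstein2_sq` and `transport_map` are hypothetical, and both work from empirical samples with NumPy.

```python
import numpy as np

def wasserstein2_sq(sample_a, sample_b, n_levels=1000):
    """Approximate squared 2-Wasserstein distance between two 1-D samples.

    Uses the 1-D identity W_2^2 = integral over (0, 1) of the squared
    difference of quantile functions, approximated by a midpoint rule.
    """
    levels = (np.arange(n_levels) + 0.5) / n_levels
    qa = np.quantile(sample_a, levels)
    qb = np.quantile(sample_b, levels)
    return float(np.mean((qa - qb) ** 2))

def transport_map(source, target):
    """Empirical 1-D transport map x -> F_target^{-1}(F_source(x)).

    Hypothetical helper: F_source is the empirical CDF of `source` and
    F_target^{-1} is the empirical quantile function of `target`.
    """
    source_sorted = np.sort(source)

    def T(x):
        # Empirical CDF of the source evaluated at x (values in [0, 1]).
        u = np.searchsorted(source_sorted, x, side="right") / source_sorted.size
        # Empirical quantile function of the target evaluated at those levels.
        return np.quantile(target, np.clip(u, 0.0, 1.0))

    return T
```

Under these assumptions, `transport_map(a, b)(a)` pushes the sample `a` approximately onto `b`, and `wasserstein2_sq` gives a way to measure the remaining discrepancy between the pushed-forward sample and the target.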

We explore the problem of defining and fitting models of autoregressive time series of probability distributions on a compact interval of $\R$. In this context, an order-$1$ autoregressive model is a Markov chain that specifies a certain structure (regression) for the one-step conditional Fréchet mean with respect to a natural probability metric. We construct and investigate different models based on iterated random function systems of optimal transport maps. While the properties and interpretation of these models depend on how they relate to the iterated transport system, they can all be analysed theoretically in a unified way. We present such a theoretical analysis, including convergence rates, and illustrate our methodology using real and simulated data. Our models generalise or extend certain existing models of transportation-based regression and autoregression, and in doing so also provide some new insights on those previous models.
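To make the idea of an iterated random function system of increasing maps concrete, here is a toy simulation on $[0,1]$. It is an illustrative assumption and not necessarily any of the thesis's models. Distributions are represented by their quantile functions on a grid of levels, and one step is $Q_{t+1} = T_t \circ S \circ Q_t$, where $S$ is a fixed increasing map (the hypothetical argument `regression_map`) and $T_t$ is drawn from the family $x \mapsto x - \sin(\pi k x)/(\pi |k|)$ with $k$ uniform on the non-zero integers in $[-K, K]$. Because these random maps average to the identity and the one-dimensional Fréchet mean under the 2-Wasserstein metric has the averaged quantile function, the conditional Fréchet mean of step $t+1$ in this toy chain has quantile function $S \circ Q_t$.

```python
import numpy as np

def random_map(k):
    """Increasing map of [0, 1] onto itself: x -> x - sin(pi*k*x) / (pi*|k|)."""
    return lambda x: x - np.sin(np.pi * k * x) / (np.pi * abs(k))

def simulate_chain(q0, regression_map, n_steps, rng, max_k=3):
    """Simulate Q_{t+1} = T_t o S o Q_t on a fixed grid of quantile levels.

    q0             : initial quantile function values (nondecreasing, in [0, 1])
    regression_map : hypothetical fixed increasing map S of [0, 1] into itself
    max_k          : k is drawn uniformly from the non-zero integers in
                     [-max_k, max_k], so the random maps average to the identity
    """
    ks = [k for k in range(-max_k, max_k + 1) if k != 0]
    path = [np.asarray(q0, dtype=float)]
    for _ in range(n_steps):
        k = int(rng.choice(ks))
        # Composing a quantile function with increasing maps yields the
        # quantile function of the pushed-forward distribution.
        path.append(random_map(k)(regression_map(path[-1])))
    return np.array(path)
```

A call such as `simulate_chain(levels, lambda x: 0.25 + 0.5 * x, 20, np.random.default_rng(1))`, with `levels` a grid in $(0,1)$, returns an array whose rows are the successive quantile functions of the chain.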

Name: EPFL_TH9780.pdf

Type: n/a

Access type: openaccess

License Condition: copyright

Size: 6.47 MB

Format: Adobe PDF

Checksum (MD5): 42a47faabcfaa293e9ddebe7f3dc0e35