Abstract

Multiple generalized additive models are a class of statistical regression models in which the parameters of probability distributions depend on the predictors through additive smooth functions. Each function is represented by a basis function expansion whose coefficients are the regression parameters. Smoothness is induced by a quadratic roughness penalty on the functions' curvature, which is equivalent to a weighted $L_2$ regularization controlled by smoothing parameters. Model fitting relies on maximum penalized likelihood estimation of the regression coefficients, and smoothness selection relies on maximum marginal likelihood estimation of the smoothing parameters. Owing to their nonlinearity, flexibility and interpretability, generalized additive models are widely used in statistical modeling, but despite recent advances, reliable and fast methods for automatic smoothing in massive datasets remain unavailable. Existing approaches are either reliable but complex and slow, or simpler and fast but unreliable, so a compromise must be made. A bridge between these categories is needed to extend the use of multiple generalized additive models beyond the settings that existing software can handle. This thesis is one step in this direction. We adopt the marginal likelihood approach to develop approximate expectation-maximization methods for automatic smoothing, which avoid the evaluation of expensive and unstable terms. This results in simpler algorithms that do not sacrifice reliability and that achieve state-of-the-art accuracy and computational efficiency. We extend the proposed approach to big-data settings and produce the first reliable, high-performance, distributed-memory algorithm for fitting massive multiple generalized additive models. Furthermore, we develop the underlying generic software libraries and make them available to the open-source community.
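As an illustration of the two estimation problems described above, here is a minimal sketch under strong simplifying assumptions: a single smooth function, a Gaussian response, a piecewise-linear basis on equally spaced knots, and a second-order difference penalty $S = D^\top D$ as a discrete stand-in for the curvature penalty. The penalized fit solves $(X^\top X + \lambda S)\hat\beta = X^\top y$, which is exactly an $L_2$ regularization weighted by $S$. The smoothing parameter is chosen by a textbook EM algorithm for the marginal likelihood, treating $\beta$ as missing data with prior $\beta \sim N(0, \tau^2 S^-)$ (flat on the null space of $S$) and $\lambda = \sigma^2/\tau^2$. The function names (`hat_basis`, `difference_penalty`, `penalized_fit`, `em_smoothing`) are hypothetical. This is not the algorithm developed in the thesis: it forms a dense inverse explicitly, which is practical only for small problems.

```python
import numpy as np


def hat_basis(x, knots):
    """Piecewise-linear (degree-1 B-spline) basis on equally spaced knots."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)


def difference_penalty(m, order=2):
    """S = D^T D, with D the order-th difference matrix (a discrete curvature penalty)."""
    D = np.diff(np.eye(m), n=order, axis=0)
    return D.T @ D


def penalized_fit(X, y, S, lam):
    """Minimize ||y - X b||^2 + lam * b^T S b by solving (X^T X + lam S) b = X^T y."""
    A = X.T @ X + lam * S
    return np.linalg.solve(A, X.T @ y), A


def em_smoothing(X, y, S, lam=1.0, sigma2=1.0, iters=500, tol=1e-8):
    """Textbook EM for y | b ~ N(Xb, sigma2 I), b ~ N(0, tau2 S^-), lam = sigma2 / tau2.

    The coefficients b are treated as missing data; the null space of S gets a flat prior.
    """
    n = len(y)
    r = np.linalg.matrix_rank(S)  # number of penalized directions
    tau2 = sigma2 / lam
    for _ in range(iters):
        # E-step: posterior mean and covariance of b given the current variances.
        mu, A = penalized_fit(X, y, S, sigma2 / tau2)
        Sigma = sigma2 * np.linalg.inv(A)
        resid = y - X @ mu
        # M-step: closed-form updates from expected complete-data sums of squares.
        sigma2_new = (resid @ resid + np.sum((X @ Sigma) * X)) / n
        tau2_new = (mu @ S @ mu + np.trace(S @ Sigma)) / r
        done = (abs(np.log(sigma2_new / sigma2)) < tol
                and abs(np.log(tau2_new / tau2)) < tol)
        sigma2, tau2 = sigma2_new, tau2_new
        if done:
            break
    lam = sigma2 / tau2
    beta, _ = penalized_fit(X, y, S, lam)
    return beta, lam, sigma2
```

For example, with `x` in $[0, 1]$, `X = hat_basis(x, np.linspace(0, 1, 20))` and `S = difference_penalty(20)`, calling `em_smoothing(X, y, S)` returns the coefficients, the selected $\lambda$ and the residual variance estimate. Each EM step needs only a penalized fit and two trace terms, with no derivatives of the marginal likelihood.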