Computer-intensive statistical methods: saddlepoint approximations with applications in bootstrap and robust inference
The saddlepoint approximation was introduced into statistics in 1954 by Henry E. Daniels. His basic result on approximating the density function of the sample mean has since been generalized to many situations. Compared with normal or Edgeworth approximations, the saddlepoint approximation is very accurate, particularly in the tails of the distribution and for small sample sizes. Before applying saddlepoint approximations to the bootstrap, this thesis focuses on saddlepoint approximations for the distribution of quadratic forms in normal variables and for the distribution of the waiting time in the coupon collector's problem. Both developments illustrate the modern practice of statistics, which relies on the computer and combines numerical and analytic approximations. Saddlepoint approximations are extremely accurate in both cases. This is demonstrated in the first development by an extensive study and several applications to nonparametric regression, and in the second by several examples, including the exhaustive bootstrap seen from a collector's point of view.

The remaining part of this thesis is devoted to the use of saddlepoint approximations as a replacement for the computer-intensive bootstrap. Recent massive increases in computing power have led to an upsurge of interest in computer-intensive statistical methods. The bootstrap was the first computer-intensive method to become widely known. It found an immediate place in statistical theory and, more slowly, in practice. It seems to be gaining ground as the method of choice in a number of applied fields where classical approaches are known to be unreliable, and there is sustained interest from theoreticians in its development. It is known, however, that for accurate approximations in the tails the nonparametric bootstrap requires a large number of replicates of the statistic. Because this is time-intensive, other methods should be considered.
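To make the basic result on the density of the sample mean concrete, the following is a minimal sketch of the classical saddlepoint density approximation, f(x) ≈ sqrt(n / (2π K''(s))) exp(n (K(s) − s x)), where K is the cumulant generating function of a single observation and s solves K'(s) = x. The Exponential(1) example, the function names and the bisection bracket are illustrative assumptions and are not taken from the thesis; the example is chosen only because the exact density of the mean (a gamma density) is available for comparison.

```python
import math


def saddlepoint_density(x, n, K, K1, K2, lo, hi, iters=200):
    """Saddlepoint approximation to the density at x of the mean of n i.i.d. draws.

    K, K1, K2 are the cumulant generating function of one draw and its first
    two derivatives; (lo, hi) must bracket the root s of K1(s) = x.
    """
    # K is convex, so K1 is increasing and bisection locates the saddlepoint.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return math.sqrt(n / (2.0 * math.pi * K2(s))) * math.exp(n * (K(s) - s * x))


# Illustrative assumption: Exponential(1) observations, K(t) = -log(1 - t), t < 1.
def K(t):
    return -math.log(1.0 - t)


def K1(t):
    return 1.0 / (1.0 - t)


def K2(t):
    return 1.0 / (1.0 - t) ** 2


def exact_density(x, n):
    # The mean of n Exponential(1) draws has a Gamma(shape=n, rate=n) density.
    return math.exp(n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n))


if __name__ == "__main__":
    n = 5
    for x in (0.3, 1.0, 2.5):
        approx = saddlepoint_density(x, n, K, K1, K2, lo=-50.0, hi=1.0 - 1e-12)
        exact = exact_density(x, n)
        print(f"x={x}: saddlepoint={approx:.6f} exact={exact:.6f} ratio={approx / exact:.4f}")
```

In this particular example the ratio of approximate to exact density does not depend on x, including far into the tails, and for n = 5 it is within about 2% of one. The only discrepancy is Stirling's approximation to the gamma function, which renormalization would remove.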
Saddlepoint methods can provide extremely accurate approximations to resampling distributions. As a first step, I develop fast saddlepoint approximations to bootstrap distributions that work in the presence of an outlier, using a saddlepoint mixture approximation. I then consider robust M-estimates of location, such as Huber's M-estimate of location and its initially MAD-scaled version. One peculiarity of the current literature is that saddlepoint methods are often used to approximate the density or distribution functions of bootstrap estimators rather than those of related pivots, although it is the latter that are more relevant for inference. The aim of the final part of this thesis is therefore to apply saddlepoint approximations to the construction of studentized confidence intervals based on robust M-estimates. As examples, I consider the studentized versions of Huber's M-estimate of location, of its initially MAD-scaled version, and of Huber's proposal 2. Robust inference about a location parameter should achieve three types of robustness: robustness of performance for the estimator of location, and robustness of validity and robustness of efficiency for the resulting confidence interval method. I investigate these in more detail in the context of studentized bootstrap confidence intervals in order to give recommendations for practical use, supported by an extensive simulation study.
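For readers unfamiliar with the estimators named above, the following is a minimal sketch of Huber's M-estimate of location with the scale fixed in advance by the MAD. It covers only the point estimate, not the studentized intervals or their saddlepoint approximations. The tuning constant k = 1.345, the consistency factor 1.4826, the simple fixed-point iteration and the sample data are conventional or illustrative assumptions, not details taken from the thesis.

```python
import statistics


def huber_location(x, k=1.345, tol=1e-12, max_iter=200):
    """Huber M-estimate of location with an initial MAD scale held fixed.

    Solves sum(psi((x_i - mu) / scale)) = 0, where psi clips its argument
    to [-k, k]. The value k = 1.345 is an assumed, conventional choice.
    """
    med = statistics.median(x)
    mad = statistics.median([abs(xi - med) for xi in x])
    scale = 1.4826 * mad  # assumed factor making the MAD consistent at the normal
    if scale == 0:
        raise ValueError("MAD is zero; the scale estimate is degenerate")

    mu = med  # start from the median
    for _ in range(max_iter):
        psi = [max(-k, min(k, (xi - mu) / scale)) for xi in x]
        step = scale * sum(psi) / len(x)
        mu += step
        if abs(step) < tol:
            break
    return mu


if __name__ == "__main__":
    # Hypothetical sample with one gross outlier.
    data = [2.1, 1.9, 2.0, 2.2, 1.8, 2.05, 1.95, 50.0]
    print("mean :", sum(data) / len(data))
    print("huber:", huber_location(data))
```

On this hypothetical sample the single outlier pulls the ordinary mean to 8, while the Huber estimate stays close to 2 because the outlier's contribution is clipped. This bounded influence is the robustness of performance referred to above.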