Smoothing Spline Distribution Function Estimation: Validation and Application
The cumulative distribution function is classically estimated by the empirical distribution function. This estimator has excellent properties but is not continuous. Smooth versions of the empirical distribution function have been obtained by kernel methods. We apply the smoothing spline minimization criterion, known from regression, to the empirical distribution function $\edf$. The smoothing parameter is chosen by an approach that exploits the connection with the Anderson--Darling statistic. A small simulation study shows that the new estimator behaves similarly to the kernel distribution function estimator. We assess the estimator's usefulness in data analysis by applying it to several datasets. Finally, the estimation procedure is applied to smoothing the Kaplan--Meier survival function estimator.
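
As a point of reference, the two baseline estimators named above, the empirical distribution function and a kernel distribution function estimator, can be sketched as follows. This is only an illustrative sketch and not the smoothing spline estimator itself: the Gaussian kernel, the fixed bandwidth of 0.5, the toy sample, and the function names `edf` and `kernel_cdf` are all assumptions made for the example.

```python
import math
import numpy as np


def edf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the observations that are <= x
    return np.searchsorted(sample, x, side="right") / sample.size


# Standard normal CDF, applied elementwise (assumed Gaussian kernel).
_std_normal_cdf = np.vectorize(
    lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))), otypes=[float]
)


def kernel_cdf(sample, x, bandwidth):
    """Kernel distribution function estimate: mean of Phi((x - X_i) / h)."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - sample[None, :]) / bandwidth
    return _std_normal_cdf(z).mean(axis=1)


# Toy data (hypothetical); the bandwidth is an arbitrary illustrative choice.
sample = np.array([1.0, 2.0, 2.0, 3.0, 5.0])
grid = np.linspace(0.0, 6.0, 13)
F_edf = edf(sample, grid)                        # step function with jumps at the data
F_ker = kernel_cdf(sample, grid, bandwidth=0.5)  # continuous, nondecreasing curve
```

The step function `F_edf` jumps at each observation, while `F_ker` replaces every jump with a smooth normal CDF centred at that observation, which is the kind of continuity the empirical distribution function lacks.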