Saturday 21 July 2012

Efficient computation of nonparametric survival functions via a hierarchical mixture formulation

Yong Wang and Stephen Taylor have a new paper in Statistics and Computing. It develops a new algorithm for computing the non-parametric maximum likelihood estimate (NPMLE) for interval-censored (or partly interval-censored) survival data. The problem has a mixture representation (first noted by Bohning et al., Biometrika 1996), and hence methods for finding the NPMLE of a mixing distribution can be applied, for instance the constrained Newton method of Wang (2007, JRSS B). However, each iteration of the constrained Newton method requires the solution of a constrained least squares problem. If the overall sample size is large, and/or the data contain a proportion of exact as well as interval-censored observations, the set of candidate support intervals for the NPMLE will be large and the constrained least squares problem may be computationally expensive.

Wang and Taylor propose to re-formulate the problem as a hierarchy of blocks of intervals. The new algorithm is shown to work well regardless of the sample size and the proportion of exactly observed failure times (indeed, for small samples it essentially reduces to the standard constrained Newton algorithm). However, it is most useful for very large sample sizes and where the proportion of exact observations (and hence of unique candidate support points) is high.

One thing that looks slightly unfair about the comparison of algorithms is the use of the Icens package to implement some of the older algorithms. While algorithms like EM are obviously inferior, the fact that they are implemented entirely in R, whereas others like the support reduction algorithm in MLEcens are implemented in C, makes comparison of computation times difficult. However, it is obviously beyond expectations for an author to have to program all algorithms in an equally efficient way!
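To make the mixture representation concrete, here is a minimal sketch in plain Python. It is not the hierarchical algorithm from the paper, nor the constrained Newton method; it uses the simple EM (self-consistency) update, one of the older approaches mentioned above, purely to show how the candidate support intervals and the mixing weights arise. The function names `turnbull_intervals` and `npmle_em` are my own, and observations are assumed to be closed intervals [L, R], with L = R for an exact failure time.

```python
def turnbull_intervals(data):
    """Candidate support intervals for closed observations [L, R].

    A candidate is a left endpoint immediately followed by a right
    endpoint in the sorted list of all distinct endpoints.
    """
    # Tag left endpoints 0 and right endpoints 1 so that, at equal values,
    # a left endpoint sorts before a right endpoint.
    points = sorted({(l, 0) for l, _ in data} | {(r, 1) for _, r in data})
    return [(a, b)
            for (a, ta), (b, tb) in zip(points, points[1:])
            if ta == 0 and tb == 1]


def npmle_em(data, n_iter=500, tol=1e-10):
    """Estimate the mixing weights on the candidate intervals by EM."""
    support = turnbull_intervals(data)
    n, m = len(data), len(support)
    # A[i][j] = 1 if candidate interval j lies inside observation i.
    A = [[1.0 if l <= a and b <= r else 0.0 for a, b in support]
         for l, r in data]
    w = [1.0 / m] * m  # start from uniform weights
    for _ in range(n_iter):
        # Mixture likelihood contribution of each observation.
        mix = [sum(A[i][j] * w[j] for j in range(m)) for i in range(n)]
        # Self-consistency update of the weights.
        new_w = [w[j] * sum(A[i][j] / mix[i] for i in range(n)) / n
                 for j in range(m)]
        change = max(abs(x - y) for x, y in zip(new_w, w))
        w = new_w
        if change < tol:
            break
    return support, w


# Three interval-censored observations and one exact one at t = 2.
data = [(0, 2), (1, 3), (2, 2), (4, 5)]
support, weights = npmle_em(data)
print(support, weights)
```

Each exact observation contributes its own degenerate candidate interval, which is why a high proportion of exact failure times inflates the number of support points and hence the size of the problem solved at every iteration.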
