Wednesday, 22 August 2012

Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring

Ronald Butler and Douglas Bronson have a new paper in Biometrika. This develops approaches to non-parametric estimation of survival or hitting time distributions in multi-state semi-Markov models with particular emphasis on cases where (multiple) backward transitions are possible. The paper is an extension of the authors' previous JRSS B paper. Here the methodology additionally allows for right-censored observations.

The key property used throughout is that, under a semi-Markov model, the data can be partitioned into separate independent sets relating to the sojourn and destinations for each state.
A straightforward, but computationally intensive, approach to finding bootstrap confidence bands for the survival distribution of a semi-Markov process involves a nested re-sampling scheme, first sampling with replacement from the set of pairs of sojourn times and destination states for each state in the network, and for each set of data simulating a large number of walks through the network (starting from the initial state and ending at the designated 'hitting' state) from the empirical distributions implied from the first layer of re-sampling. An alternative, faster approach proposed involves replacing the inner step of the bootstrap with a step which computes saddlepoint approximations of the empirical sojourn distributions and then using flowgraph techniques to find the implied hitting time distributions. A drawback of both approaches is that they require all the mass of the sojourn distributions to be allocated. Hence, in the presence of right-censoring some convention for the mass at the tail of the distribution must be made. Here, the convention of "redistribute-to-the-right" is used, which effectively re-weights the probability mass of all the observed survival times so that it sums to 1. On the face of it this seems a rash assumption. However, right-censoring in the sample only occurs at the end of a sequence of sojourns in several states. As such, in most cases where this technique would be used, the censoring in any particular state is likely to be quite light, even if the overall observed hitting times are heavily censored. Alternative conventions on the probability mass (for instance assuming a constant hazard beyond some point) could be made, but in all cases would be arbitrary, but hopefully would have little impact on the overall estimates.

Unlike non-parametric estimates under a Markov assumption, for which the overall survival distribution will essentially equal the overall Kaplan-Meier estimate, with increasing uncertainty as time increases, under a (homogeneous) semi-Markov assumption the estimated hazard tends to a constant limit and can hence be estimated to a relatively large degree of precision at arbitrarily high times.

No comments: