Sunday 29 April 2012

Nonparametric multistate representations of survival and longitudinal data with measurement error


Bo Hu, Liang Li, Xiaofeng Wang and Tom Greene have a new paper in Statistics in Medicine. It develops approaches for summarizing longitudinal and survival data in terms of marginal prevalences. The authors' use of "multistate" is perhaps not entirely in line with its typical usage. They consider right-censored competing risks data plus additional continuous longitudinal measurements, which continue until a competing event or censoring occurs. For the purpose of creating a summary measure, the range of the longitudinal measurement can be partitioned into a set of discrete states. There are thus absorbing states corresponding to the competing risks plus a series of transient states corresponding to intervals of the longitudinal measurement. The aim of the paper is to develop a nonparametric estimate of the marginal probability of being in a particular state at a particular time.

The approach taken is firstly to use standard non-parametric estimates for competing risks data to get estimates of the probability of being in each of the absorbing states. For the longitudinal part, it is assumed that the "true" longitudinal process is not directly observed but is instead observed with measurement error. As a consequence, the authors propose to use smoothing splines to get an individual estimate of each subject's true trajectory. The combined state occupancy probability at time t for a longitudinal state then consists of the overall survival probability from the competing risks multiplied by the proportion of subjects still at risk at time t who are estimated (on the basis of their spline smooth) to be within the interval defining that state. The probability of being in an absorbing state is computed directly from the competing risks estimates. Overall, the result is a stacked probability plot consisting of the stacked CIFs for each of the competing risks plus the (not necessarily monotonic) partition of the overall survival probability among the longitudinal states.
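To make the construction concrete, here is a minimal sketch of the estimator as I read it, not the authors' code. The function names, the toy data and the cutpoints are all made up for illustration, and each subject's spline smooth is replaced by a constant function purely to keep the example short.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of overall (all-cause) survival at time t.
    event == 0 means censored; event > 0 codes the competing cause."""
    surv = 1.0
    for u in np.unique(time[(event > 0) & (time <= t)]):
        at_risk = np.sum(time >= u)
        d_all = np.sum((time == u) & (event > 0))
        surv *= 1.0 - d_all / at_risk
    return surv

def cif(time, event, cause, t):
    """Aalen-Johansen cumulative incidence of `cause` by time t."""
    total, surv_prev = 0.0, 1.0
    for u in np.unique(time[(event > 0) & (time <= t)]):
        at_risk = np.sum(time >= u)
        d_all = np.sum((time == u) & (event > 0))
        d_k = np.sum((time == u) & (event == cause))
        total += surv_prev * d_k / at_risk   # S(u-) times cause-specific hazard
        surv_prev *= 1.0 - d_all / at_risk
    return total

def longitudinal_occupancy(time, event, smoothed, cuts, t):
    """Occupancy probabilities of the longitudinal states at time t.
    smoothed: one callable per subject giving the estimated true trajectory
              (a smoothing spline in the paper; any callable here).
    cuts:     increasing cutpoints that partition the measurement scale."""
    at_risk = time > t                       # no event or censoring yet
    n_states = len(cuts) + 1
    if not at_risk.any():
        return np.zeros(n_states)
    values = np.array([f(t) for f, r in zip(smoothed, at_risk) if r])
    states = np.digitize(values, cuts)       # interval index for each subject
    props = np.bincount(states, minlength=n_states) / len(values)
    return km_survival(time, event, t) * props

# Hypothetical toy data: six subjects, two competing causes.
time = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
event = np.array([1, 0, 2, 1, 0, 0])
# Constant trajectories stand in for the individual spline smooths.
smoothed = [lambda t, v=v: v for v in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0)]
cuts = [35.0, 55.0]

t = 6.0
occ = longitudinal_occupancy(time, event, smoothed, cuts, t)
stack = occ.sum() + cif(time, event, 1, t) + cif(time, event, 2, t)
print(occ, stack)
```

The longitudinal-state probabilities sum to the overall survival estimate, so together with the CIFs the stacked plot fills the unit interval at every t.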

The use of individual smoothing splines seems to present practical problems. Firstly, it assumes that the true longitudinal process is itself in some way "smooth". In some cases a change of state in a biological process may manifest itself as a rapid collapse of a biomarker. Secondly, it seems to require a relatively large number of longitudinal measurements per person in order to get a reasonable estimate of their "true" process. Presumably the level of the longitudinal measure is likely to have a bearing on the cause-specific hazards of the competing risks. The occurrence of one of the competing risks is thus informative for the longitudinal process. The authors claim to have got around this by averaging only over people currently in the risk set at time t. However, the longitudinal measurements are intermittent. If they are sparse, then someone may be observed at, say, 1 year and 2 years and then die at 10 years. The method would estimate a smooth spline based on years 1 and 2 and extrapolate up to 10 years without using the fact that the subject died at 10 years. Similarly, for some patients there might be one or no longitudinal observations before a competing event, making estimation of the true trajectory near impossible. Also, the estimator as it stands attempts no weighting to take account of the relative uncertainties about different individuals' true trajectories at particular times. Overall, as a descriptive tool it may be useful in some circumstances, primarily if subjects have regular longitudinal measurements. In this respect it is similar to the "prevalence counts" (Gentleman et al., 1994) method of obtaining non-parametric prevalence estimates for interval-censored multi-state data.
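The extrapolation problem is easy to see numerically. The sketch below uses invented biomarker values and a straight-line fit as a stand-in for the smoothing spline; with only two observations a heavily penalized spline would behave much like a line, but that substitution is my assumption rather than anything taken from the paper.

```python
import numpy as np

# Hypothetical subject: biomarker measured at years 1 and 2, death at year 10.
obs_times = np.array([1.0, 2.0])
obs_values = np.array([60.0, 55.0])   # invented values for illustration

# Straight-line fit standing in for the individual smoothing spline.
slope, intercept = np.polyfit(obs_times, obs_values, 1)

# The fitted trajectory just before death, eight years beyond the last measurement.
extrapolated = np.polyval([slope, intercept], 10.0)
print(slope, extrapolated)
```

A drop of 5 units between years 1 and 2 is carried forward unchanged, so the fitted value just before death is about 15. The subject is assigned to whichever state contains that value for eight years in which nothing was observed, and the death itself plays no part in the fit.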

In the appendix, a brief description is given of an approach for calculating transition probabilities between states. The authors only illustrate the method for the case of going from a longitudinal state to an absorbing state (presumably the procedure for transitions between longitudinal states would be different). Nevertheless, there doesn't seem to be any guarantee that the estimated transition probabilities will lie in [0,1].
