Tuesday, 17 August 2010

A novel semiparametric regression method for interval-censored data




Seungbong Han, Adin-Cristian Andrei and Kam-Wah Tsui have a paper currently available as a University of Wisconsin Biostatistics and Medical Informatics Department working paper. It essentially extends the concept of pseudo-observations, which up to now have only been applied to right-censored data, to interval-censored survival data. The idea remains the same, except that S(t) is estimated using the NPMLE of the survival function (e.g. via the Turnbull self-consistent estimator or the iterative convex minorant algorithm).
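For reference, the usual jackknife definition of a pseudo-observation for the survival probability at a fixed time t is given below. The notation is assumed here for illustration and is not taken from the paper.

```latex
\hat{\theta}_i(t) \;=\; n\,\hat{S}(t) \;-\; (n-1)\,\hat{S}^{(-i)}(t),
\qquad i = 1,\dots,n,
```

Here \hat{S} is the NPMLE computed from all n subjects and \hat{S}^{(-i)} is the NPMLE recomputed with subject i left out. The \hat{\theta}_i(t), evaluated at a chosen set of time points, then play the role of responses in the regression on covariates.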

The paper is a little disappointing in providing only a very brief heuristic justification for using pseudo-observations in the interval-censoring case. For right-censored data, an estimate of the baseline survival function can be obtained as well as the regression parameters. No discussion is given of whether this is possible in the interval-censored case. However, the baseline estimates are likely to be highly unreliable (e.g. non-monotonic), because particular subjects may have extreme influence through their effect on where the mass points of the NPMLE occur. For example, the plot above is based on 1000 subjects with survival times generated from an exponential distribution with rate 0.25, subject to independent current status observation (observation times uniform on (0,10)). Estimating the baseline survival from the pseudo-observations (calculated at times 1,2,3,...,10) leads to a survivor function which increases at time 5. More consideration of this issue seems necessary, as does guidance on how many time points to evaluate the pseudo-observations at.
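An experiment of the kind described above can be sketched roughly as follows. This is not the code behind the plot. It assumes distinct observation times, uses the fact that for current status data the NPMLE of the distribution function is the isotonic regression of the event indicators on the ordered observation times, and takes the unadjusted mean of the pseudo-observations at each time as the baseline estimate. The function names and the random seed are illustrative choices.

```python
import numpy as np


def pava(y):
    """Least-squares non-decreasing fit to y (pool adjacent violators)."""
    means, counts = [], []
    for yi in np.asarray(y, dtype=float).tolist():
        means.append(yi)
        counts.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(means, counts)


def npmle_survival(c, delta, t):
    """NPMLE of S(t) from current status data (c = observation time,
    delta = 1 if the event occurred by c). Assumes distinct c."""
    order = np.argsort(c)
    c_sorted = c[order]
    F = pava(delta[order])  # NPMLE of F at the sorted observation times
    idx = np.searchsorted(c_sorted, t, side="right")  # number of c_i <= t
    F_at_t = np.where(idx > 0, F[np.maximum(idx - 1, 0)], 0.0)
    return 1.0 - F_at_t


def pseudo_observations(c, delta, t):
    """Jackknife pseudo-observations n*S(t) - (n-1)*S^{(-i)}(t)."""
    n = len(c)
    full = npmle_survival(c, delta, t)
    keep = np.ones(n, dtype=bool)
    po = np.empty((n, len(t)))
    for i in range(n):
        keep[i] = False  # leave subject i out
        po[i] = n * full - (n - 1) * npmle_survival(c[keep], delta[keep], t)
        keep[i] = True
    return po


# Simulation mirroring the description: exponential(rate 0.25) survival,
# current status observation at a Uniform(0, 10) time.
rng = np.random.default_rng(0)  # seed is arbitrary
n = 1000
T = rng.exponential(scale=1 / 0.25, size=n)
c = rng.uniform(0.0, 10.0, size=n)
delta = (T <= c).astype(float)
t = np.arange(1.0, 11.0)

po = pseudo_observations(c, delta, t)
baseline = po.mean(axis=0)  # assumed baseline estimate: mean pseudo-observation
increases = t[1:][np.diff(baseline) > 0]  # times where the estimate goes up
print("baseline estimate:", np.round(baseline, 3))
print("times at which the estimate increases:", increases)
```

Whether and where an increase appears depends on the simulated sample, so different seeds are worth trying. The NPMLE itself is monotone by construction, so any non-monotonicity comes from the jackknife step.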

The authors choose to transform the pseudo-observations before regressing on the covariates, rather than using a link function in a GLM. One problem with this approach is presumably that if the estimate of S(t) is 0 or 1, g(S(t))=-Inf or Inf.
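A small numerical illustration of the boundary problem follows. The complementary log-log function is used purely as an assumed example of g, and may not be the transformation used in the paper. The point is only that transforming each response individually breaks down at 0 and 1, whereas a link function is applied to the mean response, which typically lies strictly inside (0,1).

```python
import numpy as np


def cloglog(s):
    """Complementary log-log transform g(s) = log(-log(s))."""
    with np.errstate(divide="ignore"):  # silence log(0) warnings
        return np.log(-np.log(s))


s = np.array([0.0, 0.5, 1.0])   # hypothetical values of an estimate of S(t)
g_vals = cloglog(s)             # transform applied to each value
g_of_mean = cloglog(s.mean())   # link-style: transform applied to the mean

print(g_vals)     # infinite at s = 0 and s = 1
print(g_of_mean)  # finite
```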

On a practical level, the authors use Icens to calculate the NPMLE. As noted previously, the MLEcens package seems to perform considerably better than Icens and would presumably speed up computation of the pseudo-observations, which requires the NPMLE to be recomputed for every leave-one-out sample.

A natural next step would be to consider pseudo-observations for interval-censored multi-state data. The lack of non-parametric estimation methods for such data, except in a few simple cases, is an obvious barrier to development in this direction.

Update: The paper has now been published in Communications in Statistics - Simulation and Computation.
