Saturday, 18 August 2012

A semi-Markov model for stroke with piecewise-constant hazards in the presence of left, right and interval censoring


Venediktos Kapetanakis, Fiona Matthews and Ardo van den Hout have a new paper in Statistics in Medicine. This develops a progressive three-state illness-death model for strokes with interval-censored data. The proposed model is a time non-homogeneous Markov model. The main approach to computation is to assume a continuous effect of age (time) on the transition intensities, but to use a piecewise-constant approximation when actually fitting the model. The intensity to death from the stroke state additionally depends on the time since entry into the state (equivalently, given current age, on the age at stroke). Since the exact time of stroke is typically unknown, it is necessary to numerically integrate over the possible range of transition times (here using Simpson's rule).
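To make the computation concrete, here is a minimal Python sketch of the kind of integral involved, for a subject seen healthy at age a and seen alive in the stroke state at age b. It is an illustration under my own assumptions, not the authors' code: the function names, the cut points and the multiplicative effect exp(gamma * age at stroke) on the stroke-to-death hazard are all hypothetical choices. The unknown stroke age is integrated out with composite Simpson's rule, piece by piece between cut points so that the integrand is smooth on each piece.

```python
import bisect
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def piecewise_rate(cuts, rates, age):
    """Hazard at `age`; rates[k] applies on [cuts[k], cuts[k+1])."""
    k = bisect.bisect_right(cuts, age) - 1
    return rates[max(k, 0)]


def piecewise_cumhaz(cuts, rates, start, end):
    """Integral of a piecewise-constant hazard from `start` to `end`.
    The last rate is assumed to extend to infinity."""
    total = 0.0
    for k, rate in enumerate(rates):
        lo = cuts[k]
        hi = cuts[k + 1] if k + 1 < len(cuts) else math.inf
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            total += rate * overlap
    return total


def stroke_interval_likelihood(a, b, cuts, r12, r13, r23, gamma=0.0, n=200):
    """Probability of: healthy at age a, stroke at some unknown age u in (a, b),
    still alive at age b. ASSUMPTION: the stroke-to-death hazard is the
    piecewise-constant rate r23 multiplied by exp(gamma * u)."""

    def alive_path(u):
        # Stay healthy from a to u, then survive in the stroke state from u to b.
        h_out1 = piecewise_cumhaz(cuts, r12, a, u) + piecewise_cumhaz(cuts, r13, a, u)
        h_23 = math.exp(gamma * u) * piecewise_cumhaz(cuts, r23, u, b)
        return math.exp(-(h_out1 + h_23))

    # Split the range at the cut points so each piece has a smooth integrand.
    knots = [a] + [c for c in cuts if a < c < b] + [b]
    total = 0.0
    for lo, hi in zip(knots, knots[1:]):
        q12 = piecewise_rate(cuts, r12, 0.5 * (lo + hi))  # constant on this piece
        total += simpson(lambda u: q12 * alive_path(u), lo, hi, n)
    return total
```

With a single piece and gamma = 0 the model reduces to a homogeneous illness-death model, for which the same probability has a closed form, so that case makes a convenient check of the numerical integration.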

The data include subjects whose time to stroke is left-censored because they had already suffered a stroke before the baseline measurement. The authors state that they cannot integrate out the unknown time of stroke because the left endpoint of the interval (i.e. the last age at which the subject is known to have been healthy) is unknown. They then propose a seemingly unnecessary ad hoc EM-type approach based on estimating the stroke age for these individuals, which requires the arbitrary choice of an age at which the subject can be assumed to have been stroke-free. However, surely if we can assume such an age, we can just use it as the lower limit of the integral in the likelihood?

The real issue seems to be that all subjects are effectively left-truncated at the time of entry into the study (in the sense that they are only sampled because they have not died before their current age). For subjects who are healthy at baseline, this left-truncation is accounted for by simply integrating the hazards of transition out of state 1 from their age at baseline rather than from age 0. For subjects who have already had a stroke, things are more complicated because the fact that they have survived provides information on the timing of the stroke (e.g. if stroke increases the hazard of death, the fact that they have survived implies the stroke occurred more recently than one would assume if nothing were known about survival). Essentially, the correct likelihood is conditional on survival to the baseline age, and so the unconditional probability of the observed data needs to be divided through by the unconditional probability of survival to that age. For instance, a subject who is in state 2 at baseline and censored at a later age should have a likelihood contribution equal to the unconditional probability of being in state 2 at baseline and still alive at the censoring age, divided by the unconditional probability of being alive at baseline.

The authors claim that their convoluted EM approach has "bypassed the problem of left truncation". In reality, they have explicitly corrected for left-truncation (because the expected transition time is conditional on being in state 2 at baseline), but in a way that is seemingly much more computationally demanding than directly computing the left-truncated likelihood would be.
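For concreteness, the left-truncated contribution for a subject in state 2 at baseline can be sketched as follows. The notation is my own and not necessarily that of the paper: t_0 is the baseline age, t_1 the censoring age, a_0 an age at which the subject is assumed to be alive and stroke-free, q_{rs} the transition intensity from state r to state s, S_1 the probability of remaining in state 1, and S_2(t | u) the probability of surviving in state 2 to age t given a stroke at age u. Everything is conditional on being healthy at a_0.

```latex
L = \frac{\displaystyle \int_{a_0}^{t_0} S_1(a_0,u)\, q_{12}(u)\, S_2(t_1 \mid u)\, du}
         {\displaystyle S_1(a_0,t_0) + \int_{a_0}^{t_0} S_1(a_0,u)\, q_{12}(u)\, S_2(t_0 \mid u)\, du},
\qquad
S_1(a_0,u) = \exp\left\{-\int_{a_0}^{u} \bigl[q_{12}(s) + q_{13}(s)\bigr]\, ds\right\},
\quad
S_2(t \mid u) = \exp\left\{-\int_{u}^{t} q_{23}(s \mid u)\, ds\right\}.
```

The numerator is the unconditional probability of having a stroke before t_0 and surviving to t_1; the denominator is the unconditional probability of being alive at t_0, in either state 1 or state 2. Both integrals are of the same form as those already needed for interval-censored stroke times, so the same numerical integration could in principle be reused.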
