Monday, 22 February 2010

Non-Markov Multistate Modeling Using Time-Varying Covariates

Bacchetti et al have a new paper in The International Journal of Biostatistics. This considers modelling the progression of liver fibrosis due to hepatitis C following liver transplant using a 5-state progressive multi-state model. The data are panel observed at irregular time points, so it would be most natural to model the data in continuous time. In addition, the observed states are subject to classification error. To avoid having to make either Markov or time-homogeneity assumptions, the authors adopt a discrete-time approximation, assuming 4 time periods per year. They then model the transition probabilities as linear on the log-odds scale, depending on covariates such as medical center, donor age and year of transplant, as well as log time since entry into the current state (relaxing the Markov assumption). Potentially, time since transplant could also be included in the model (to allow non-homogeneity).
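As a rough illustration of this kind of model (this is not the authors' mspath code; the function, covariates and coefficient values below are invented for the sketch), each transition probability for a quarter can be written as an inverse-logit of a linear predictor:

```r
## Hypothetical sketch: probability of moving up one fibrosis stage in a
## quarter, modelled on the log-odds scale as a linear function of covariates
## including log time since entry into the current state.
transition_prob <- function(donor_age, log_time_in_state, beta) {
  ## beta = (intercept, coefficient for donor age, coefficient for log time in state)
  eta <- beta[1] + beta[2] * donor_age + beta[3] * log_time_in_state
  plogis(eta)  # inverse logit maps the linear predictor to (0, 1)
}

## Coefficient values chosen purely for illustration
beta <- c(-3.0, 0.02, 0.3)
transition_prob(donor_age = 50, log_time_in_state = log(2), beta)
```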

The main challenge in fitting the models is to enumerate all possible complete "paths" of true states at the discrete time points that could have given rise to the observed data, which requires an iterative algorithm. The computational burden depends on the complexity of the state transition matrix, the misclassification probability matrix and the fineness of the time discretization.
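For a latent chain observed with misclassification, summing over all compatible paths can be done with a forward-type recursion, as in the generic sketch below (assumed transition and misclassification matrices for a 3-state progressive model; this illustrates the idea, not the mspath algorithm itself):

```r
## Generic sketch: sum over all latent state paths for one subject.
## P is the one-quarter transition matrix, E[i, j] = P(observe j | true state i),
## obs gives the observed states and quarter the quarters at which they were seen.
path_likelihood <- function(P, E, obs, quarter, init) {
  alpha <- init                               # distribution of the true state at time 0
  q_prev <- 0
  for (k in seq_along(obs)) {
    for (q in seq_len(quarter[k] - q_prev))   # advance the latent chain quarter by quarter
      alpha <- alpha %*% P
    alpha <- alpha * E[, obs[k]]              # condition on the (possibly misclassified) observation
    q_prev <- quarter[k]
  }
  sum(alpha)
}

P <- matrix(c(0.90, 0.10, 0.00,
              0.00, 0.85, 0.15,
              0.00, 0.00, 1.00), 3, 3, byrow = TRUE)
E <- matrix(0.025, 3, 3); diag(E) <- 0.95
path_likelihood(P, E, obs = c(1, 2, 3), quarter = c(2, 5, 9), init = c(1, 0, 0))
```

The cost of this recursion grows with the number of states and the number of quarters between observations, which is why the fineness of the discretization matters.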

The main drawback of approximating a continuous-time process by a discrete-time process is the restriction that only one transition may occur between successive time points. While this can be acceptable for progressive models such as the one considered here, it may be more problematic when backward transitions are allowed.

The authors have developed an R package called mspath, designed to complement msm, the existing package for continuous-time Markov and hidden Markov models, by allowing non-Markov models through the discrete-time approximation.

Monday, 8 February 2010

A Note on Variance Estimation of the Aalen-Johansen Estimator of the Cumulative Incidence Function in Competing Risks

Allignol, Schumacher and Beyersmann have a new paper in Biometrical Journal. This considers variance estimation for the cumulative incidence functions in a cause-specific hazards competing risks analysis. They compare a Greenwood-type estimator of the variance with the estimator derived from counting-process theory, for left-truncated and right-censored data, and find that the Greenwood-type estimator is generally preferable in finite samples.
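As a reminder of the estimator in question, here is a minimal sketch of the Aalen-Johansen point estimate of the cause-1 cumulative incidence function for right-censored data (left truncation is ignored here, the data are simulated purely for illustration, and the variance estimators compared in the paper are not implemented):

```r
## Aalen-Johansen estimate of the cumulative incidence of cause 1:
## F1(t) = sum over event times s <= t of S(s-) * d1(s) / n(s),
## where S is the all-cause Kaplan-Meier estimate.
cuminc_aj <- function(time, status) {   # status: 0 = censored, 1 = cause 1, 2 = competing cause
  tt <- sort(unique(time[status != 0]))
  n_risk   <- sapply(tt, function(s) sum(time >= s))
  d_all    <- sapply(tt, function(s) sum(time == s & status != 0))
  d_cause1 <- sapply(tt, function(s) sum(time == s & status == 1))
  surv <- cumprod(1 - d_all / n_risk)   # all-cause Kaplan-Meier
  surv_minus <- c(1, head(surv, -1))    # S(s-) just before each event time
  data.frame(time = tt, cif1 = cumsum(surv_minus * d_cause1 / n_risk))
}

set.seed(1)
time <- rexp(200); status <- sample(0:2, 200, replace = TRUE)
head(cuminc_aj(time, status))
```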

Friday, 5 February 2010

Modelling the likely effect of the increase of the upper age limit from 70 to 73 for breast screening in the UK National Programme

Duffy, Sasieni, Olsen and Cafferty have a new paper in Statistical Methods in Medical Research. This considers the possible effect of increasing the upper age of breast cancer screening from 70 years to 73 years. Two approaches are taken. Firstly, a 4-state time-homogeneous Markov model is considered, with the states representing healthy, asymptomatic breast cancer (detectable by screening), symptomatic breast cancer and death. Secondly, a discrete-time model is used in which the incidence of breast cancer and mortality from other causes are assumed to be uniform within each year of age; the other rates are still taken to be time homogeneous. The benefit, in life years gained up to 88 years of age, is considered, with the intensities estimated from external data. Both approaches give similar results, with about 1 life-year gained per 1000 women screened.
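A sketch of the first approach is given below: a 4-state time-homogeneous Markov model with an intensity matrix Q, from which the transition probabilities over any period follow as the matrix exponential exp(Qt). The intensity values are invented for illustration and are not those estimated in the paper:

```r
## Assumed (illustrative) yearly transition intensities between
## healthy, asymptomatic screen-detectable cancer, symptomatic cancer and death.
states <- c("healthy", "asympt", "sympt", "dead")
Q <- matrix(c(-0.022,  0.002,  0.000, 0.020,
               0.000, -0.270,  0.250, 0.020,
               0.000,  0.000, -0.080, 0.080,
               0.000,  0.000,  0.000, 0.000),
            4, 4, byrow = TRUE, dimnames = list(states, states))

## P(t) = exp(Qt), computed here via an eigendecomposition of Q
prob_matrix <- function(Q, t) {
  e <- eigen(Q)
  Re(e$vectors %*% diag(exp(e$values * t)) %*% solve(e$vectors))
}

prob_matrix(Q, 3)   # 3-year transition probabilities
```

In the second approach the analogous calculation proceeds in discrete time, with incidence and other-cause mortality varying by year of age, accumulating the life years gained up to age 88.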

Wednesday, 3 February 2010

Estimating Transition Probabilities for Ignorable Intermittent Missing Data in a Discrete-Time Markov Chain

Yeh, Chan, Symanski and Davis have a new paper in Communications in Statistics - Simulation and Computation. This considers data from a discrete-time Markov model where some observations are missing in an ignorable way, a very common situation in longitudinal data. If the transition matrix for a single time period is P, then the transition matrix for 2, 3, ... time periods is simply P^2, P^3, etc. For some reason the authors regard the lack of a closed-form expression for the MLE as a problem. They consider a naive approach based only on the observed one-step transitions, an EM algorithm, and what they term a non-linear equations method. This latter approach essentially computes the maximum likelihood estimate. However, the authors have failed to consider the possibility of boundary maxima: they seek the point where the gradient is zero with respect to transition probabilities lying between 0 and 1. In some cases they run into problems simply because the MLE is at p = 0, so there is no solution to the gradient equations within the allowable limits and negative values are suggested.
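The likelihood that the non-linear equations approach is in effect maximizing can be written down and maximized directly, as in the toy sketch below (a hypothetical two-state chain with made-up data, not the authors' example); with these data the second state is never seen to return to the first, so the estimate of p21 is pushed to the boundary at 0:

```r
## Toy sketch: MLE for a 2-state discrete-time chain observed with ignorable gaps.
## Each observed pair (from, to) with a gap of k steps contributes P^k[from, to].
mat_pow <- function(P, k) Reduce(`%*%`, replicate(k, P, simplify = FALSE))

negloglik <- function(par, from, to, gap) {
  P <- matrix(c(1 - par[1], par[1],
                par[2], 1 - par[2]), 2, 2, byrow = TRUE)
  -sum(mapply(function(i, j, k) log(mat_pow(P, k)[i, j]), from, to, gap))
}

## Made-up observed transitions: state at a visit, state at the next visit, gap in steps
from <- c(1, 1, 2, 2); to <- c(1, 2, 2, 2); gap <- c(1, 2, 1, 3)
optim(c(0.1, 0.1), negloglik, from = from, to = to, gap = gap,
      method = "L-BFGS-B", lower = 1e-6, upper = 1 - 1e-6)$par
## The estimate of par[2] (= p21) heads to the lower bound: the maximum is on
## the boundary, so there is no interior solution to the gradient equations.
```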

A further problem with the paper is that the standard errors under missing data in the EM algorithm are estimated incorrectly, by plugging estimates of the expected counts into the complete-data formula. This amounts to using the complete-data information rather than the observed-data information; since the observed information equals the complete-data information minus the missing information, the resulting standard errors will be underestimates.

The paper has some topical similarities to Deltour et al (Biometrics, 1999). However, that paper considered the non-trivial problem of non-ignorable missing data, for which a stochastic EM algorithm was proposed.