Wednesday, 3 February 2010

Estimating Transition Probabilities for Ignorable Intermittent Missing Data in a Discrete-Time Markov Chain

Yeh, Chan, Symanski and Davis have a new paper in Communications in Statistics - Simulation and Computation. It considers data from a discrete-time Markov model in which some observations are missing in an ignorable way, a very common situation in longitudinal data. If the transition matrix for a single time period is P, then the transition matrix over 2, 3, ... time periods is simply P^2, P^3, etc., so the likelihood contribution of two observations separated by a gap is just an entry of the appropriate power of P. For some reason the authors regard the lack of a closed-form expression for the MLE as a problem. They consider a naive approach that uses only the observed one-step transitions, an EM algorithm approach, and what they term a non-linear equations method. The latter essentially amounts to computing the maximum likelihood estimate. However, the authors have failed to consider the possibility of boundary maxima: they only look for points where the gradient with respect to the transition probabilities is zero, with those probabilities lying between 0 and 1. They run into problems in some cases simply because the MLE is at p=0, so the gradient equation has no solution within the allowable limits and negative values are suggested instead.
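
To make the point concrete, here is a minimal Python sketch (my own illustration, not the authors' code) of the observed-data likelihood for a chain with ignorable gaps, maximised over the constrained parameter space. The function names `gap_counts`, `log_likelihood` and `grid_mle_two_state`, the two-state parameterisation (a = P[0,1], b = P[1,0]) and the toy sequence are all assumptions made for illustration. A crude grid search stands in for a proper constrained optimiser.

```python
import numpy as np

def gap_counts(seq, n_states):
    """Count transitions i -> j between consecutive observed states,
    grouped by the number of time steps k separating them.
    Missing observations are coded as None."""
    counts = {}
    last_state, last_time = None, None
    for t, s in enumerate(seq):
        if s is None:
            continue
        if last_state is not None:
            k = t - last_time
            N = counts.setdefault(k, np.zeros((n_states, n_states)))
            N[last_state, s] += 1
        last_state, last_time = s, t
    return counts

def log_likelihood(P, counts):
    """Observed-data log-likelihood: a k-step gap contributes log (P^k)[i, j]."""
    total = 0.0
    with np.errstate(divide="ignore"):  # allow log(0) = -inf on the boundary
        for k, N in counts.items():
            Pk = np.linalg.matrix_power(P, k)
            mask = N > 0
            total += np.sum(N[mask] * np.log(Pk[mask]))
    return total

def grid_mle_two_state(counts, n_grid=51):
    """Grid search over a = P[0,1], b = P[1,0] on [0, 1] x [0, 1],
    with the boundary points included."""
    grid = np.linspace(0.0, 1.0, n_grid)
    best_ll, best_a, best_b = -np.inf, None, None
    for a in grid:
        for b in grid:
            P = np.array([[1 - a, a], [b, 1 - b]])
            ll = log_likelihood(P, counts)
            if ll > best_ll:
                best_ll, best_a, best_b = ll, a, b
    return best_ll, best_a, best_b

# Toy data (hypothetical): the chain is only ever seen in state 0.
seq = [0, 0, None, 0, 0]
counts = gap_counts(seq, n_states=2)
best_ll, best_a, best_b = grid_mle_two_state(counts)
# The maximiser sits on the boundary a = 0 (b is not identified here),
# so no interior zero of the gradient exists for a.
print(best_a, best_ll)
```

In this toy case the log-likelihood is 2 log(1-a) + log((1-a)^2 + ab), which is strictly decreasing in a near zero, so a gradient-root search restricted to the interior has nothing to find. Any optimiser that respects the constraints 0 <= p <= 1 handles this without difficulty.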

A further problem with the paper is that the standard errors for the EM algorithm under missing data are estimated incorrectly: estimates of the marginal counts are simply plugged into the complete-data formula. This is equivalent to using the complete-data information rather than the observed information, so the standard errors will be underestimated.
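
The reason the errors come out too small can be written down via the standard missing information principle. The notation below (I_obs, I_com, I_mis, and y_obs for the observed data) is mine rather than the paper's, and this is a sketch of the general argument, not a derivation specific to their model.

```latex
I_{\mathrm{obs}}(\theta)
  = I_{\mathrm{com}}(\theta) - I_{\mathrm{mis}}(\theta),
\qquad
I_{\mathrm{com}}(\theta)
  = \mathrm{E}\!\left[ -\frac{\partial^2 \ell_{\mathrm{com}}(\theta)}
                             {\partial\theta\,\partial\theta^{\top}}
      \,\middle|\, y_{\mathrm{obs}} \right],
\qquad
I_{\mathrm{mis}}(\theta)
  = \mathrm{Var}\!\left[ \frac{\partial \ell_{\mathrm{com}}(\theta)}
                              {\partial\theta}
      \,\middle|\, y_{\mathrm{obs}} \right] \succeq 0 .
```

Since I_mis is positive semi-definite, I_com is at least as large as I_obs, and inverting I_com in place of I_obs gives variances that are too small whenever any information is actually lost to missingness.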

In topic, the paper has some similarities to Deltour et al., Biometrics (1999). However, that paper considered the non-trivial problem of non-ignorable missing data, for which a Stochastic-EM algorithm was proposed.
