Friday, 9 March 2012

Estimating Discrete Markov Models From Various Incomplete Data Schemes

Alberto Pasanisi, Shuai Fu and Nicolas Bousquet have a new paper in Computational Statistics & Data Analysis. It considers approaches to Bayesian inference for time-homogeneous discrete-time Markov models under incomplete observation. Firstly, they consider the case where some observations in a sequence of states are missing (under different missingness assumptions). Secondly, they consider the case of aggregate data, where all that is known is the number of subjects in each state at each time. A Bayesian approach is adopted throughout, which the authors claim is the most convenient in this situation.

The part involving incomplete data covers similar ground to Deltour et al. (Biometrics, 1999). The problem only becomes non-trivial if the missingness mechanism is non-ignorable.
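To illustrate why the ignorable case is straightforward: if an observation is missing for reasons unrelated to the state, the likelihood contribution across a gap of k time steps is simply the k-step transition probability, an entry of the k-th power of the transition matrix. The following is a minimal sketch of that idea, not the method of either paper. The function name `observed_loglik` and the use of `None` to mark a missing observation are assumptions made for illustration.

```python
import math
import numpy as np

def observed_loglik(seq, P):
    """Log-likelihood of a state sequence with gaps, conditional on the
    first observed state. `None` marks a missing observation, assumed
    ignorable (a hypothetical encoding chosen for this sketch)."""
    P = np.asarray(P, dtype=float)
    loglik = 0.0
    prev = None   # last observed state
    gap = 0       # time steps elapsed since the last observed state
    for state in seq:
        gap += 1
        if state is None:
            continue
        if prev is not None:
            # Across `gap` steps the contribution is the gap-step
            # transition probability, i.e. an entry of P to the power gap.
            step = np.linalg.matrix_power(P, gap)
            loglik += math.log(step[prev, state])
        prev = state
        gap = 0
    return loglik
```

Missing values before the first observed state or after the last one contribute nothing here, because the sketch conditions on the first observed state and the unobserved tail sums to one.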

The treatment of aggregate data is incorrect: the authors state the likelihood as a product of independent multinomial distributions, with probabilities given by the probability of being in each state at time t given the initial state distribution at time 0. As a result they claim the likelihood is proportional to that for current status data, where each subject or unit is only observed once. The reason that likelihood-based inference for aggregate data is so difficult is that we observe all units multiple times but don't know the number or nature of the transitions that occurred. Hence, the full likelihood would require summing over all possible sets of transitions consistent with the aggregate counts.

Kalbfleisch and Lawless (Canadian Journal of Statistics, 1984) derived the mean and covariance of the aggregated counts across times to establish a least-squares estimation procedure. Pasanisi et al.'s procedure is only relevant when the data consist of a series of independent cross-sectional surveys at different time points, each assumed to come from different units.

An MCMC or other simulation-based approach would be necessary to compute the exact likelihood or posterior distribution in the true aggregate data case, which the authors did not pursue. However, the gain in efficiency compared to the least-squares approach is probably not worth the trouble except for very small counts. Crowder and Stephens (2011) pursued an approach based on matching the coefficients of the probability generating function of the aggregate counts.
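To make the distinction between true aggregate data and independent cross-sectional surveys concrete, here is a toy sketch for a two-state chain in which the same N units are counted at every time point. With only two states, the sum over unobserved transitions reduces to a convolution of two binomials and can be written out directly; with more states or large counts the corresponding sum grows quickly. The function names, and the choice to condition on the first set of counts and to use the observed initial proportions as the initial distribution, are assumptions made for illustration and are not taken from any of the papers mentioned.

```python
import math
import numpy as np

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def exact_aggregate_loglik(counts, N, P):
    """Exact log-likelihood when the SAME N units are counted each time.
    counts[t] is the number of units in state 0 at time t; the result is
    conditional on counts[0]."""
    loglik = 0.0
    for n_now, n_next in zip(counts[:-1], counts[1:]):
        # Sum over k, the unobserved number of units that stay in state 0;
        # the remaining n_next - k must have moved from state 1 to state 0.
        prob = sum(
            binom_pmf(k, n_now, P[0][0])
            * binom_pmf(n_next - k, N - n_now, P[1][0])
            for k in range(n_now + 1)
        )
        loglik += math.log(prob)
    return loglik

def cross_sectional_loglik(counts, N, P):
    """Product-of-independent-binomials form, appropriate only if each
    time point is a fresh, independent sample of N units."""
    P = np.asarray(P, dtype=float)
    # Assumed initial distribution: the observed proportions at time 0.
    pi = np.array([counts[0] / N, 1.0 - counts[0] / N])
    loglik = 0.0
    for n_t in counts[1:]:
        pi = pi @ P   # marginal state distribution one step later
        loglik += math.log(binom_pmf(n_t, N, pi[0]))
    return loglik
```

The two functions agree when both rows of P are identical, so that the next state does not depend on the current one, and differ otherwise, because the product form ignores the dependence between successive counts on the same units.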
