Friday, 7 January 2011

Multi-State Models for Panel Data: The msm Package for R

The final paper in the special issue of the Journal of Statistical Software is by Chris Jackson and is about his package msm. msm has been around for many (over 8) years and has steadily built up functionality over time. As noted by Hein Putter in his introduction, msm differs from the other packages featured in the issue in that it deals with panel-observed (interval-censored) data and concentrates on parametric models. In particular, hidden Markov models with both discrete and continuous responses may be fitted.

The paper covers much old ground: the early part repeats similar themes from Jackson & Sharples 2002 (Statistics in Medicine) and Jackson et al 2003 (JRSS D), while the section on model diagnostics closely follows Titman and Sharples 2010 (SMMR). One error in msm is in the implementation of prevalence counts/plots. It is made clear by Gentleman et al (1994, Stat Med) and Titman and Sharples that the denominator for the counts at time t is the number "under observation". In particular, subjects who reach the absorbing state should stay under observation only until the time at which they would have been censored. msm instead assumes that subjects who reach the absorbing state remain under observation indefinitely, which leads to overestimation of the empirical prevalence in the absorbing state(s). This can be seen clearly in the bottom-right panel of Figure 4 of the paper, which suggests a spurious lack of fit. Patients either need to be removed from observation at a known administrative censoring time, or else the censoring distribution needs to be empirically estimated to allow time-dependent weighting of dead patients.
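To make the denominator issue concrete, here is a minimal sketch in Python (not msm's R code) comparing the two conventions on a made-up cohort. The function name, the tuple layout and the four subjects are all hypothetical, and it assumes the simpler of the two remedies: every subject has a known administrative censoring time.

```python
# Toy illustration only: hypothetical data and names, not msm's implementation.
# Each subject is (death_time or None, administrative_censoring_time).
subjects = [
    (1.0, 2.0),    # dies at t=1, would have been censored at t=2
    (None, 2.0),   # alive, censored at t=2
    (5.0, 10.0),   # dies at t=5, would have been censored at t=10
    (None, 10.0),  # alive, censored at t=10
]


def absorbing_prevalence(subjects, t, respect_censoring):
    """Observed proportion in the absorbing (dead) state at time t.

    respect_censoring=False keeps dead subjects in the denominator forever
    (the convention criticised above); True drops them once t passes the
    time at which they would have been censored.
    """
    under_observation = 0
    dead = 0
    for death_time, censor_time in subjects:
        is_dead = death_time is not None and death_time <= t
        if is_dead:
            if respect_censoring and t > censor_time:
                continue  # no longer under observation
            under_observation += 1
            dead += 1
        elif t <= censor_time:
            under_observation += 1  # alive and still followed up
    return dead / under_observation if under_observation else float("nan")


naive = absorbing_prevalence(subjects, 6.0, respect_censoring=False)
corrected = absorbing_prevalence(subjects, 6.0, respect_censoring=True)
print(naive, corrected)
```

At t = 6 the first convention counts both deaths against three subjects, while the second counts one death against the two subjects genuinely still under observation, so the first gives the larger absorbing-state prevalence.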

In Section 6 Jackson discusses model extensions which are generally not available in the package, but seems to suggest that, to maintain generality, these extensions (such as a wider range of time-inhomogeneous models, random effects models and semi-Markov models) will not be incorporated into msm.

In general msm is a very good package, and a lot of effort (e.g. C programming, use of first derivatives, direct coding of transition probabilities for more basic model structures) has gone into making it fast. However, some improvement in computation speed is no doubt possible, since msm uses the BFGS algorithm in optim to fit models rather than applying the well-known Fisher scoring algorithm.
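For readers unfamiliar with Fisher scoring in this setting, the following is a minimal Python sketch on a deliberately trivial model, not a description of how msm or any other package implements it. It assumes a two-state model with an absorbing second state, a single transition intensity q, and equally spaced observations dt apart; the function name and the counts are hypothetical. The point is that the expected information is built from first derivatives of the transition probabilities alone, so no second derivatives are needed.

```python
import math


def fisher_scoring_two_state(n_stay, n_move, dt, q0=0.1, tol=1e-10, max_iter=100):
    """Estimate the intensity q of a toy 2-state model by Fisher scoring.

    n_stay / n_move: numbers of observation intervals starting in state 1
    that end in state 1 / state 2. Transition probabilities over an interval
    of length dt are p_stay = exp(-q*dt) and p_move = 1 - exp(-q*dt).
    """
    q = q0
    n = n_stay + n_move
    for _ in range(max_iter):
        p_stay = math.exp(-q * dt)
        p_move = 1.0 - p_stay
        # First derivatives of the transition probabilities with respect to q.
        d_stay = -dt * p_stay
        d_move = dt * p_stay
        # Score: sum over observed transitions of (dp/dq) / p.
        score = n_stay * d_stay / p_stay + n_move * d_move / p_move
        # Expected information: uses first derivatives only.
        info = n * (d_stay ** 2 / p_stay + d_move ** 2 / p_move)
        step = score / info
        q += step
        if abs(step) < tol:
            break
    return q


q_hat = fisher_scoring_two_state(n_stay=80, n_move=20, dt=1.0)
q_closed_form = -math.log(1.0 - 20 / 100) / 1.0  # MLE available in closed form here
print(q_hat, q_closed_form)
```

In this toy case the maximum likelihood estimate has a closed form, which gives a check on the iteration; in realistic multi-state models the transition probabilities and their derivatives come from the matrix exponential of the intensity matrix instead.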
