Saturday, 12 February 2011

Flexible Nonhomogeneous Markov models for panel observed data

Andrew Titman has a new paper in Biometrics. This develops an approach to fitting non-homogeneous Markov models to panel data by direct numerical solution of the Kolmogorov forward equations. Existing approaches to non-homogeneity have concentrated on special cases where the forward equations have matrix analytic solutions (i.e. piecewise constant intensities or time transformation models), although numerical solutions have been used in Bayesian analyses, mainly to accommodate use in WinBUGS (see e.g. Welton and Ades or Pan and Chen). The approach is clearly somewhat more computationally intensive than matrix analytic methods. However, a couple of computational tricks are used to improve the situation. In particular, the ability to use a Fisher scoring algorithm is retained by solving an extended system of ODEs that incorporates the first derivatives of the transition probabilities with respect to the model parameters. The real point at which the method struggles is when there are continuous covariates, because a separate system of ODEs must be integrated for each distinct covariate value in the data. For large datasets an exact approach becomes untenable. An approximate method is proposed for these situations: a clustering algorithm is used to reduce the number of unique covariate patterns, and each patient is then assumed to have a covariate pattern equal to the mean value within their cluster. This approach gives pretty close results to the exact method with 10 clusters and is even better with 50 or 100 clusters, where the method is often still practical.
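To make the idea of solving the forward equations numerically concrete, here is a minimal sketch in Python. It is not the paper's R/deSolve code: the 3-state illness-death structure, the intensity values, the log-linear time trend `beta` and the fixed-step Runge-Kutta integrator are all assumptions chosen for illustration. It integrates dP(s,t)/dt = P(s,t)Q(t) from P(s,s) = I.

```python
import math

def generator(t, beta):
    """Hypothetical 3-state illness-death generator matrix Q(t).

    The onset intensity (state 0 -> 1) changes log-linearly with time;
    the other intensities are constant. All numbers are illustrative.
    """
    q01 = 0.10 * math.exp(beta * t)
    q02 = 0.02
    q12 = 0.15
    return [
        [-(q01 + q02), q01, q02],
        [0.0, -q12, q12],
        [0.0, 0.0, 0.0],  # state 2 is absorbing
    ]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def forward_rhs(t, p, beta):
    # Kolmogorov forward equations: dP(s,t)/dt = P(s,t) Q(t)
    return matmul(p, generator(t, beta))

def add_scaled(p, k, h):
    # Elementwise p + h * k
    n = len(p)
    return [[p[i][j] + h * k[i][j] for j in range(n)] for i in range(n)]

def transition_matrix(s, t, beta, steps=200):
    """Approximate P(s,t) with classical fourth-order Runge-Kutta steps."""
    n = 3
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = (t - s) / steps
    u = s
    for _ in range(steps):
        k1 = forward_rhs(u, p, beta)
        k2 = forward_rhs(u + h / 2, add_scaled(p, k1, h / 2), beta)
        k3 = forward_rhs(u + h / 2, add_scaled(p, k2, h / 2), beta)
        k4 = forward_rhs(u + h, add_scaled(p, k3, h), beta)
        p = [[p[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j]
                                 + 2 * k3[i][j] + k4[i][j])
              for j in range(n)] for i in range(n)]
        u += h
    return p

if __name__ == "__main__":
    for row in transition_matrix(0.0, 2.0, beta=0.3):
        print([round(x, 4) for x in row])
```

In a likelihood for panel data, one such matrix would be needed for each pair of consecutive observation times and each distinct covariate pattern, which is why continuous covariates are costly.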

B-spline functions for the transition intensities, based on a known set of knot points, are proposed. This gives a model which can be viewed both as a generalization of the flexible time transformation approach of Hubbard et al., and also as a smooth alternative to piecewise constant intensities. One downside of the great flexibility is that it is quickly possible to run into models with identifiability problems and singular Fisher information matrices. Titman proposes to limit the spline to a maximum time and assume constant intensities beyond this range. Similarly, he suggests there often won't be enough data to allow inhomogeneity on all transition intensities, and the method is thus most useful when one or two intensities are of most interest. For the CAV data analyzed, disease onset is of most importance, and the method performs better at picking up the increasing hazard than a time transformation model that requires the inhomogeneity to be proportional between intensities.
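A spline-based intensity that is held constant beyond a maximum time can be sketched as follows. The details here are assumptions rather than the paper's exact parameterization: the spline is placed on the log intensity (to keep the intensity positive), the knot vector is clamped on [0, t_max], the default degree is cubic, and the function names are made up for this example.

```python
import math

def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:  # skip terms with zero-width support (repeated knots)
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = ((knots[i + k + 1] - t) / d2
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right

def spline_intensity(t, coefs, interior_knots, t_max, degree=3):
    """Intensity q(t) = exp(sum_i coefs[i] * B_i(t)) for 0 <= t < t_max,
    held constant at its boundary value for t >= t_max."""
    # Clamped knot vector: boundary knots repeated degree + 1 times
    knots = ([0.0] * (degree + 1) + list(interior_knots)
             + [t_max] * (degree + 1))
    n_basis = len(knots) - degree - 1
    if len(coefs) != n_basis:
        raise ValueError("expected %d coefficients" % n_basis)
    if t >= t_max:
        # At the right boundary of a clamped spline only the last basis
        # function is non-zero (and equals 1), so the value is coefs[-1].
        return math.exp(coefs[-1])
    log_q = sum(c * bspline_basis(i, degree, t, knots)
                for i, c in enumerate(coefs))
    return math.exp(log_q)

if __name__ == "__main__":
    coefs = [-3.0, -2.5, -2.0, -1.5, -1.2, -1.0]  # illustrative values
    for t in (0.0, 1.0, 4.0, 8.0, 10.0, 25.0):
        print(t, spline_intensity(t, coefs, [2.0, 5.0], 10.0))
```

With two interior knots and a cubic spline this already uses six coefficients for a single intensity, which gives some feel for how quickly such models can become hard to identify from panel data.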

As part of the supplementary materials, some fairly general R code is provided, working in conjunction with the R package deSolve for the ODE solver. This is flexible in allowing user-defined forms for the generator matrix, but there is no easy interface for someone to specify that they want, say, a 5-state model with a certain set of allowable intensities and inhomogeneity in a particular set of states, in the way that msm allows for piecewise constant intensities.
