Tuesday 29 June 2010

A multistate model for events defined by prolonged observation

Vern Farewell and Li Su have a new paper in Biostatistics. It models remission in psoriatic arthritis using panel data in which joints are assessed at clinic visits. Existing approaches use a two-state model. However, a spell in remission should last a discernible amount of time. Rather than specify an artificial length of time (e.g. 6 months), i.e. a guarantee time in a state, Farewell and Su adopt a model with two remission states, which they call "early stage remission" and "established remission". It is assumed that an individual must progress through both early and established remission before returning to active disease. A subject is observed to be in early stage remission if they have no active joints at a visit that is not preceded by at least 2 other zero-count visits. They are observed to be in established remission if there is a zero count preceded by at least 2 zero counts. State misclassification enters the model through misclassification of the active joint count, i.e. patients may have 0 active joints without being in remission. Misclassification to early stage remission is assumed possible, but misclassification to established remission is not. In the example the misclassification probability is also allowed to depend on whether the previous observed count was zero or not.
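The observation rule is easy to state in code, which also makes its dependence on visit history explicit. The following is only a sketch of the rule as summarised above, not the authors' implementation: the function name and state labels are made up for illustration, and it assumes that the "previous" zero counts must be at the immediately preceding, consecutive visits.

```python
def classify_visits(active_joint_counts):
    """Map a sequence of active joint counts to observed states.

    Assumption (not confirmed by the paper): the two previous zero
    counts must occur at the two immediately preceding visits.
    """
    states = []
    zero_run = 0  # length of the current run of consecutive zero counts
    for count in active_joint_counts:
        if count > 0:
            zero_run = 0
            states.append("active")
        else:
            zero_run += 1
            # Third or later zero in a row: established remission.
            states.append("established" if zero_run >= 3 else "early")
    return states
```

Note that under this rule the label at a visit is a deterministic function of the counts at that visit and the two before it, so "established" can never appear without two "early" labels directly in front of it.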

The basic problem with the method is that defining the states by the pattern of previous observations makes it essentially impossible for the observed data (in terms of the three-state model) to have come from the claimed Markov model. In the observed data, an established remission observation must be preceded by two early stage remission observations. Yet the Markov model itself allows the passage time from active disease to established remission to be arbitrarily close to zero (it is just the sum of two independent exponential, or perhaps piecewise exponential, random variables). As a result it is not clear how to interpret the resulting transition intensity estimates, since the estimated process will not reproduce the original data.
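To see how much probability the Markov model puts on passages that the observation rule rules out, one can evaluate the distribution function of the passage time directly. This is a sketch assuming time-homogeneous intensities (the plain exponential case rather than the piecewise one); the function and argument names are hypothetical.

```python
import math


def passage_time_cdf(t, rate_to_early, rate_to_established):
    """P(T <= t) for T = Exp(rate_to_early) + Exp(rate_to_established).

    T is the time from entering active disease to entering established
    remission in a progressive model with constant intensities.
    """
    if t <= 0:
        return 0.0
    a, b = rate_to_early, rate_to_established
    if math.isclose(a, b):
        # Equal rates: the sum is Erlang with two phases.
        return 1.0 - math.exp(-a * t) * (1.0 + a * t)
    # Unequal rates: hypoexponential distribution.
    return 1.0 - (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)
```

For any positive intensities this is strictly positive for every t > 0, however small, whereas in the observed data the corresponding passage can never be shorter than the time spanned by the required zero-count visits.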

Misclassification is effectively dealt with twice in the model: first in an ad hoc way, through the rules that define early and established remission, and then by allowing these observed states to be subject to classification error relative to some true states. But in fitting a hidden Markov model the misclassification is assumed to be independent conditional on the underlying state. There is then an inherent contradiction: on the one hand the model says P(Observed zero | Active disease) > 0, but at the same time P(Observed zero and two previous zeros | Active disease) = 0.
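To make the contradiction concrete: if zero counts really are emitted independently given the underlying state, a patient who has active disease throughout three consecutive visits produces three zero counts in a row with strictly positive probability, and the observation rule would then label the third visit as established remission. A minimal sketch, with hypothetical parameter names and allowing (as in the paper's example) the zero-count probability to depend on whether the previous observed count was zero:

```python
def prob_zero_run_while_active(p_zero_after_nonzero, p_zero_after_zero, run_length=3):
    """P(run_length consecutive zero counts | active disease throughout).

    Assumes the count before the run was non-zero and that counts are
    conditionally independent given the true state and the previous
    observed count. Parameter names are illustrative only.
    """
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    # First zero follows a non-zero count; the rest each follow a zero.
    return p_zero_after_nonzero * p_zero_after_zero ** (run_length - 1)
```

Whenever both probabilities are positive the result is positive, which is at odds with a model that gives zero probability to observing established remission from the active disease state.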

An approach using a guarantee time (e.g. Kang and Lagakos (2007)) or perhaps an Erlang distribution through latent states would be far more satisfactory even if it might require "special software". Potentially the guarantee time could be dependent on covariates.
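For comparison, here is a rough sketch of the sojourn time distributions these two alternatives imply. It is not the formulation of Kang and Lagakos, just a generic shifted exponential for the guarantee time and a standard Erlang distribution for the latent-state version; all names are hypothetical.

```python
import math


def erlang_cdf(t, phases, mean):
    """P(sojourn <= t) when the sojourn is a sum of `phases` i.i.d.
    exponential latent phases with overall mean `mean`."""
    if t <= 0:
        return 0.0
    x = (phases / mean) * t  # per-phase rate times t
    term, total = 1.0, 1.0   # n = 0 term of the Poisson sum
    for n in range(1, phases):
        term *= x / n        # x**n / n!
        total += term
    return 1.0 - math.exp(-x) * total


def guarantee_time_cdf(t, guarantee, rate):
    """P(sojourn <= t) when the sojourn is `guarantee` plus an
    exponential time with intensity `rate`."""
    if t <= guarantee:
        return 0.0
    return 1.0 - math.exp(-rate * (t - guarantee))
```

With the mean held fixed, adding latent phases makes very short spells progressively less likely, while the guarantee time excludes them outright; either way the requirement of a minimum spell length is part of the process model rather than of the observation rule.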
