van de Kassteele, Hoogenveen, Engelfriet, van Baal and Boshuizen have a new paper in Statistics in Medicine. It considers estimation of the transition probabilities in a non-homogeneous discrete-time Markov model when the only available information is cross-sectional data, i.e. for each time (or age) we have only a sample of individuals and their state occupancy, from which the prevalence at that time can be estimated. Note that this type of observation is more extreme than aggregate data, considered for instance by Crowder and Stephens, where we again only have prevalences at a series of times but the state occupation counts correspond to the same set of subjects.

The authors take a novel, if slightly quirky, approach to estimation. They first use P-splines to smooth the observed prevalences. Having obtained the smoothed prevalences, they then need to translate them into transition probabilities. This is not straightforward, since there are more parameters to estimate than degrees of freedom. To get around this problem, the authors restrict their estimate to be the solution of a transportation problem. Essentially, this assigns a "cost" to each transition, penalizing transitions between states that are further apart and giving zero cost to remaining in the same state. The transition probabilities are then chosen to minimize the total cost. This gives a solution that aims to maximize the diagonals of the transition probability matrices whilst constraining the prevalences to take their P-spline smoothed values.
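To make the transportation idea concrete, here is a minimal sketch of a single time step. It is not the authors' implementation: it assumes ordered states and a cost of |i - j| per unit of probability mass moved from state i to state j, which is only one cost consistent with the description above. Under that assumption no general linear programming solver is needed, because leaving as much mass as possible in place and then matching the remaining surplus to the remaining deficit in state order yields one minimizer (the minimizer need not be unique). The names `transport_step`, `prev_now` and `prev_next` are hypothetical.

```python
def transport_step(prev_now, prev_next, tol=1e-12):
    """One-step transition matrix P with prev_now @ P == prev_next.

    Assumption (not from the paper): ordered states and cost |i - j|
    per unit of mass moved, so staying in the same state costs nothing.
    """
    n = len(prev_now)
    flow = [[0.0] * n for _ in range(n)]  # joint mass moving from i to j
    supply, demand = [], []

    # Keep as much mass as possible on the diagonal (zero cost).
    for i in range(n):
        stay = min(prev_now[i], prev_next[i])
        flow[i][i] = stay
        supply.append(prev_now[i] - stay)   # surplus that must leave state i
        demand.append(prev_next[i] - stay)  # deficit that must enter state i

    # Match surplus to deficit in state order (monotone, north-west corner fill).
    i = j = 0
    while i < n and j < n:
        moved = min(supply[i], demand[j])
        flow[i][j] += moved
        supply[i] -= moved
        demand[j] -= moved
        if supply[i] <= tol:
            i += 1
        if demand[j] <= tol:
            j += 1

    # Convert joint flows into conditional transition probabilities.
    P = []
    for i in range(n):
        if prev_now[i] > tol:
            P.append([f / prev_now[i] for f in flow[i]])
        else:
            # Empty state: the row is not identified, use identity (assumption).
            P.append([1.0 if k == i else 0.0 for k in range(n)])
    return P


# Illustrative smoothed prevalences at two consecutive times (made-up numbers).
P = transport_step([0.7, 0.2, 0.1], [0.6, 0.25, 0.15])
for row in P:
    print([round(x, 4) for x in row])
```

In the full problem this step would be repeated for every pair of consecutive times, with the P-spline smoothed prevalences as inputs. A different cost matrix would in general require a proper linear programming solver rather than this greedy fill.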

What is absent from the paper is a formal justification for the approach. Presumably a similar outcome could be achieved by applying a penalized likelihood approach, possibly formulating the problem in continuous time and setting the penalty to be the magnitude of the transition intensities (and possibly their derivatives). However, this would require some calibration to choose the penalty weights, and it is not clear how this would be done (the usual approach of cross-validation would not work here).
