Monday, 6 February 2012

A mixed non-homogeneous hidden Markov model for categorical data, with application to alcohol consumption

Antonello Maruotti and Roberto Rocci have a new paper in Statistics in Medicine. It develops a hidden Markov model for longitudinal data on alcohol consumption in discrete time. The observed data consist of a three-level ordinal variable recording whether no drinking (0 drinks), light drinking (1–c drinks) or intense drinking (c+ drinks) occurred in each time period. The model is both time non-homogeneous and mixed, in the sense that additional patient-level heterogeneity remains after accounting for covariates. Rather than specifying a continuous distribution for the random effects, the authors adopt the non-parametric mixing distribution approach. Computationally, a finite mixture random effect is much simpler than a continuous random effect when the random effect is multi-dimensional.

However, computing the full non-parametric maximum likelihood estimate (NPMLE) of the mixing distribution is not in itself straightforward. The authors adopt the approach of Aitkin (Statistics and Computing, 1996), which is essentially to work up from a small number of components, running an EM algorithm with the number of mixture components held fixed. The best model, with m components, is assumed to have been reached once taking m+1 components does not produce a better model in terms of AIC or BIC. The main issue with this approach is that the EM algorithm is typically very sensitive to the initial parameter values chosen and prone to fail to find a global maximum. More generally, EM-based approaches to obtaining the NPMLE of a mixing distribution are known to perform badly, and approaches using directional derivatives are preferred (see for instance Wang 2007, JRSS B). A further danger with these models is ascribing too great a physical significance to the estimated mixture components.
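The step-up strategy described above (fit m components by EM, then try m+1 until the information criterion stops improving) can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: it uses a plain finite mixture of Poisson distributions on simulated counts rather than the authors' mixed hidden Markov model, it uses BIC only, and every function name (em_fit, select_components, sample_poisson) is made up for this post. Several random starts are run for each m because of the sensitivity to initial values noted above.

```python
import math
import random


def log_poisson(y, rate):
    """Log of the Poisson probability mass function."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def log_likelihood(data, weights, rates):
    """Mixture log-likelihood, using log-sum-exp for numerical stability."""
    total = 0.0
    for y in data:
        logs = [math.log(w) + log_poisson(y, r) for w, r in zip(weights, rates)]
        top = max(logs)
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total


def em_fit(data, m, rng, n_iter=100):
    """One EM run for an m-component Poisson mixture from a random start."""
    mean = sum(data) / len(data)
    rates = [mean * rng.uniform(0.2, 2.0) for _ in range(m)]
    weights = [1.0 / m] * m
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for y in data:
            logs = [math.log(w) + log_poisson(y, r) for w, r in zip(weights, rates)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            norm = sum(probs)
            resp.append([p / norm for p in probs])
        # M-step: weighted updates (small floors guard against log(0)).
        for k in range(m):
            n_k = sum(row[k] for row in resp)
            weighted_sum = sum(row[k] * y for row, y in zip(resp, data))
            weights[k] = max(n_k / len(data), 1e-12)
            rates[k] = max(weighted_sum / max(n_k, 1e-12), 1e-8)
    return log_likelihood(data, weights, rates), weights, rates


def select_components(data, max_m=5, n_starts=5, seed=0):
    """Increase m until BIC stops improving; keep the best start for each m."""
    rng = random.Random(seed)
    best = None
    for m in range(1, max_m + 1):
        fits = [em_fit(data, m, rng) for _ in range(n_starts)]
        loglik, weights, rates = max(fits, key=lambda fit: fit[0])
        n_params = 2 * m - 1  # m rates plus m - 1 free weights
        bic = -2.0 * loglik + n_params * math.log(len(data))
        if best is not None and bic >= best["bic"]:
            break  # m components did no better than m - 1: stop
        best = {"m": m, "bic": bic, "weights": weights, "rates": rates}
    return best


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Simulated counts from two well-separated groups (illustrative values only).
sim_rng = random.Random(1)
data = [sample_poisson(1.0, sim_rng) for _ in range(100)]
data += [sample_poisson(15.0, sim_rng) for _ in range(100)]
best = select_components(data)
print(best["m"], [round(r, 2) for r in best["rates"]])
```

Taking the best of several starts reduces, but does not remove, the risk of stopping at a local maximum, and the stopping rule can halt too early if the fit at m+1 is a poor local solution. The sketch does not attempt the directional-derivative methods mentioned above.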

To reach the final model, choices have to be made regarding: the categorization of the responses (observed level of drinking), the number of latent Markov states for the HMM (e.g. 2, 3 or 4), the number of mixture components for the random effect (how many "archetypes" of longitudinal behavior) and the degree of time non-homogeneity of the transition probabilities. With so many choices available, the final model is likely to explain the observed data reasonably well, but a leap of faith is required to believe it is an accurate representation of the process of binge drinking/alcoholism.
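To make these ingredients concrete, here is a minimal sketch of how latent states and mixture components combine in the likelihood of one subject's sequence: a forward recursion within each component, then a weighted sum over components. It is a simplification and not the paper's specification. The transition matrices are time-homogeneous with no covariates, the random effect is assumed to act only through a component-specific transition matrix, and all names and parameter values are invented for illustration.

```python
import math


def forward_log_likelihood(obs, init, trans, emit):
    """Log-likelihood of a categorical sequence under an HMM (scaled forward recursion)."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for y in obs[1:]:
        scale = sum(alpha)
        log_lik += math.log(scale)
        # Propagate the normalised forward probabilities one step, then weight
        # by the probability of emitting category y from each latent state.
        alpha = [
            sum(alpha[r] / scale * trans[r][s] for r in range(n)) * emit[s][y]
            for s in range(n)
        ]
    return log_lik + math.log(sum(alpha))


def mixed_hmm_log_likelihood(obs, weights, components):
    """Finite-mixture random effect: weighted sum of component HMM likelihoods."""
    logs = [
        math.log(w) + forward_log_likelihood(obs, init, trans, emit)
        for w, (init, trans, emit) in zip(weights, components)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


# Two latent states, three response categories (0 = none, 1 = light, 2 = intense).
init = [0.6, 0.4]
emit = [[0.70, 0.25, 0.05],   # latent state 0 mostly emits "none"
        [0.10, 0.30, 0.60]]   # latent state 1 mostly emits "intense"
trans_a = [[0.9, 0.1], [0.3, 0.7]]  # component A: tends to stay in state 0
trans_b = [[0.6, 0.4], [0.1, 0.9]]  # component B: tends to stay in state 1
weights = [0.5, 0.5]
components = [(init, trans_a, emit), (init, trans_b, emit)]

obs = [0, 0, 1, 2, 2, 2]
ll = mixed_hmm_log_likelihood(obs, weights, components)
print(round(ll, 4))
```

Because the random effect takes only finitely many values, integrating it out is a sum over a handful of forward recursions, whatever the dimension of the random effect. Each extra latent state or mixture component adds parameters, which is why the number of candidate models grows quickly.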
