## Friday, 7 September 2012

### Effect of vitamin A deficiency on respiratory infection: Causal inference for a discretely observed continuous time non-stationary Markov process

Mingyuan Zhang and Dylan Small have a paper to appear in The Canadian Journal of Statistics, currently available here. The paper uses a multi-state model approach to obtain estimates of the causal effects of vitamin A deficiency on respiratory infection.

The observed data consist of observations of respiratory infection status, vitamin A deficiency status and whether the child is stunted at time t. Each of these is a binary variable, leading to 8 possible observation patterns.

The data are assumed to be generated from an underlying non-homogeneous Markov chain on 32 states, consisting of a latent 4-level definition of vitamin A deficiency, the stunting variable, the observed respiratory infection status and additionally a counter-factual respiratory infection status, defined as the status at time t that would have occurred had the child maintained the lowest level of vitamin A deficiency from time 0 to time t.
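As a quick check on the two counts quoted above (8 observation patterns, 32 latent states), the state spaces can be enumerated directly. This is only an illustrative sketch: the variable names and the 0/1 and 0-3 codings are my own assumptions, not the paper's notation.

```python
from itertools import product

# Hypothetical codings; the paper's own labels may differ.
DEFICIENCY_LEVELS = range(4)  # latent deficiency level, 0 = lowest
BINARY = (0, 1)

# Latent state: (deficiency level, stunted, observed infection, counter-factual infection)
latent_states = list(product(DEFICIENCY_LEVELS, BINARY, BINARY, BINARY))

# Observed pattern: (infection, observed deficiency indicator, stunted)
observed_patterns = list(product(BINARY, BINARY, BINARY))

print(len(latent_states), len(observed_patterns))
```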

Let $Y_t, A^{*}_t, Y_t^{\bar{A}^{*}_{t-},0}$ represent the observed infection status, the underlying vitamin deficiency and the counter-factual infection status. The assumed relationship between them is given by $P(Y_t = 1 | Y_t^{\bar{A}^{*}_{t-},0} =1 , A^{*}_t) = 1$, $P(Y_t = 1 | Y_t^{\bar{A}^{*}_{t-},0} =0 , A^{*}_t = 0) = 0$ and $P(Y_t = 1 | Y_t^{\bar{A}^{*}_{t-},0} =0 , A^{*}_t = j) = \delta_{j}$ for $j>0$. $\delta_j$ then measures the additional risk of having a respiratory infection at a particular time, given a current vitamin A deficiency at level $j>0$.
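The three conditional probabilities above amount to a small lookup rule, sketched below. The function name and the numerical values of $\delta_j$ are hypothetical and chosen purely for illustration; they are not estimates from the paper.

```python
def prob_observed_infection(counterfactual, level, delta):
    """P(Y_t = 1 | counter-factual infection status, current latent deficiency level).

    counterfactual: 0 or 1, the counter-factual infection status
    level:          current latent deficiency level (0 = lowest)
    delta:          mapping from level j > 0 to delta_j
    """
    if counterfactual == 1:
        return 1.0  # infected under the counter-factual implies infected in fact
    if level == 0:
        return 0.0  # no current deficiency, so no additional risk
    return delta[level]  # additional risk from current deficiency at level j > 0


# Illustrative values only, not taken from the paper.
delta = {1: 0.05, 2: 0.10, 3: 0.20}
```

For example, `prob_observed_infection(0, 2, delta)` returns the assumed $\delta_2$, while any call with `counterfactual=1` returns 1 regardless of the deficiency level.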

The overall model is a pretty innovative use of a hidden Markov model structure to obtain those causal estimates. The true process is assumed to occur in continuous time. However, the authors want the underlying transition intensities to vary with time. As a result, they choose to approximate the process by one in discrete time (with some similarities to the approach of Bacchetti et al. 2010).

In practical terms, the weakness of the model seems to be the assumption that the relative effect of a current vitamin deficiency, compared to a perfect record of vitamin A levels, is both constant in time and independent of the past history of vitamin A deficiency. The latter assumption is essentially the Markov assumption and would be quite difficult to relax. The observed infection status is effectively a misclassified version of the counter-factual infection status. As a result, the former assumption could be relaxed by letting the $\delta_j$ depend on time, perhaps in a piecewise constant fashion.
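A piecewise constant $\delta_j(t)$ of the kind suggested here could be represented as a simple step function. The sketch below is only one possible parameterisation; the breakpoints, values and function name are all hypothetical.

```python
from bisect import bisect_right


def piecewise_delta(t, breakpoints, values):
    """Step function for delta_j(t).

    breakpoints: sorted change points, e.g. [6, 12]
    values:      len(breakpoints) + 1 values; values[k] applies on
                 [breakpoints[k-1], breakpoints[k]), with open-ended first and last pieces
    """
    return values[bisect_right(breakpoints, t)]


# Illustrative only: delta_j changes at t = 6 and t = 12 (arbitrary time units).
breakpoints = [6, 12]
values = [0.05, 0.08, 0.12]
```

Each piece would add one parameter per deficiency level, so the number of breakpoints would need to be kept small relative to the amount of data.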

The definition of the process as initially being continuous is a little artificial and ill-specified in places. For instance, the transition intensities are defined in terms of a logit transformation from the outset. Also, it is stated that the underlying 32-state process (including both of $Y_t ,Y_t^{\bar{A}^{*}_{t-},0}$) is a continuous-time Markov process. However, if $Y_t$ is defined only through the misclassification equations, there is no limiting intensity for $P(Y_{t + \Delta t} | X_t)$ as $\Delta t \rightarrow 0$. Once the process is in discrete time this is not a problem, but it would be more sensible to define the underlying continuous-time process on 16 states, not including $Y_t$, and then specify that the observed $Y_t$ are observations in a hidden Markov model.
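To make the hidden Markov formulation concrete, here is a minimal sketch of the standard forward recursion for a discrete-time hidden Markov model whose transition matrices may change from one time step to the next. It is the generic textbook recursion, not the authors' implementation; the function and argument names are hypothetical, and the toy example uses two latent states rather than 16 to keep it short.

```python
def forward_likelihood(init, transitions, emissions):
    """Likelihood of an observation sequence under a discrete-time HMM.

    init:        init[i] = P(X_0 = i)
    transitions: transitions[t][i][j] = P(X_{t+1} = j | X_t = i); may vary with t
    emissions:   emissions[t][i] = P(observation at time t | X_t = i)
    """
    n = len(init)
    # alpha[i] = P(observations up to the current time, current state = i)
    alpha = [init[i] * emissions[0][i] for i in range(n)]
    for t in range(1, len(emissions)):
        P = transitions[t - 1]
        alpha = [
            sum(alpha[i] * P[i][j] for i in range(n)) * emissions[t][j]
            for j in range(n)
        ]
    return sum(alpha)


# Toy example with two latent states and two observation times.
init = [0.5, 0.5]
transitions = [[[0.9, 0.1], [0.2, 0.8]]]
emissions = [[1.0, 0.0], [0.0, 1.0]]  # observations pin down state 0, then state 1
likelihood = forward_likelihood(init, transitions, emissions)
```

In the setting discussed here, the emission probabilities for $Y_t$ would come from the misclassification equations, with the $\delta_j$ entering only through the emission step.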