Saturday 28 May 2011

Lie Markov Models

Jeremy Sumner, Jesus Fernandez-Sanchez and Peter Jarvis have a paper recently made available on the arXiv. The paper is theoretical in nature but gives an interesting application of group theory to Markov models.

The practical problem addressed is determining the conditions under which a non-homogeneous continuous-time Markov model can be represented by a "time-averaged" homogeneous Markov model, i.e. what constraints are required to ensure that a rate matrix $Q$ exists such that

$$e^{Q_1 t_1} e^{Q_2 t_2} = e^{Q (t_1 + t_2)}$$

for rate matrices $Q_1$ and $Q_2$. This has application for phylogenetic methods, where typically a single rate matrix is fitted to an evolutionary history. However, rates may in fact change over time. The question is then under what conditions the single rate matrix can still be valid, in some sense, as a summary of the time-averaged process.

The authors show that the model for Q, i.e. the set of rate matrices it permits, must form a Lie algebra: a linear space of matrices that is closed under the commutator AB - BA. They also give the possible model forms for three- and four-state models under symmetry constraints. Update: This paper has now been published in the Journal of Theoretical Biology.
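The Lie algebra condition can be checked mechanically for a model written as the linear span of a few basis rate matrices: every pairwise commutator of basis matrices has to fall back inside the span. The sketch below is only an illustration of that check. The function names and the two example models (the general 2-state model and the 3-state model with symmetric rate matrices) are my own choices rather than anything taken from the paper, and it looks only at the linear span, ignoring the non-negativity constraints on the rates.

```python
from fractions import Fraction
from itertools import combinations


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """Lie bracket [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def flatten(m):
    return [x for row in m for x in row]


def is_lie_closed(basis):
    """True if every pairwise commutator lies in the span of the basis."""
    base = [flatten(b) for b in basis]
    base_rank = rank(base)
    for a, b in combinations(basis, 2):
        # Adding a vector already in the span leaves the rank unchanged.
        if rank(base + [flatten(commutator(a, b))]) > base_rank:
            return False
    return True


# Illustrative models (rows sum to zero); chosen here, not taken from the paper.
general_2state = [
    [[-1, 1], [0, 0]],
    [[0, 0], [1, -1]],
]
symmetric_3state = [
    [[-1, 1, 0], [1, -1, 0], [0, 0, 0]],
    [[-1, 0, 1], [0, 0, 0], [1, 0, -1]],
    [[0, 0, 0], [0, -1, 1], [0, 1, -1]],
]

print(is_lie_closed(general_2state))
print(is_lie_closed(symmetric_3state))
```

The general 2-state model passes the check, whereas the 3-state symmetric model fails it, because the commutator of two symmetric matrices is antisymmetric and so cannot lie in a span of symmetric matrices unless it is zero.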

Tuesday 10 May 2011

A proportional hazards regression model for the subdistribution with right-censored and left-truncated competing risks data

Xu Zhang, Mei-Jie Zhang and Jason Fine have a new paper in Statistics in Medicine. This covers the same ground as the paper by Geskus in Biometrics, in developing an approach to fitting the Fine-Gray proportional subdistribution hazard model for competing risks data with left-truncated and right-censored observations by using inverse probability weights (IPW). Bizarrely, the paper makes no reference at all to the Geskus paper. Presumably this is because the paper was first submitted in 2009, before Geskus's work was published (April 2010). However, it is strange that neither the authors nor the referees became aware of the work in the interim (the paper was not accepted until March 2011).

What is interesting are the differences between the approach taken in this paper and that of Geskus. The authors work on the basis that since X = min(T,C) is only observable if X > L, where T is the time of failure, L the time of left truncation and C the time of right censoring, the IPW should be calculated conditional on L < X. Zhang et al use a stabilised weight rather than the raw IPW, to reduce the variability of the original weight. The weights they derive seem quite different to Geskus's, as they depend on an estimate of overall survival, which will have to depend on the covariates if the semi-parametric model for the subdistribution hazard is to apply. The authors suggest using Aalen additive hazard models for the overall survival (thus allowing for time-varying covariate effects that can ensure the weights are consistent with the proportional subdistribution hazard model).

Zhang et al start from the general case where the truncation and censoring distributions depend on covariates (but are independent conditional on these covariates), though they only detail non-parametric estimates of the weights. Geskus argued that even if the censoring or truncation distribution depends on covariates, it is not necessarily required to include these covariates in the weights.

Given these discrepancies it would be of interest to contrast and compare the two approaches to the same problem. If both approaches are effective, Geskus's seems preferable because the weights are much easier to calculate.

Wednesday 4 May 2011

Discrete-time semi-Markov modeling of human papillomavirus persistence

Mitchell, Hudgens, King, Cu-Uvin, Lo, Rompalo, Sobel and Smith have a new paper in Statistics in Medicine. This considers a non-parametric estimator for a 2-state discrete-time semi-Markov process. Following Kang and Lagakos who assume one of the two states (e.g. state 0) is Markov, the process can be characterised by

where is the probability of making a transition from 1 to 0 given i time units spent in state 1,
is the maximum length of state sequence observable in the data and
Extensions to the model, allowing the transition probability out of state 0 to depend on time in state and adding a further state corresponding to disease-free with no past disease, are also proposed. Missing observations can be dealt with by summing over all possible states at the missing times. Estimation of the transition probabilities is by maximum likelihood. An acknowledged limitation is the inability to cope with unknown initiation times if either the sojourn distribution of state 0 is non-geometric or observation can start in the disease state.
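To make the parametrisation concrete, here is a minimal sketch of how a set of duration-dependent exit probabilities from state 1 translates into a sojourn distribution. The name `exit_probs` and the example values are hypothetical, not taken from the paper: `exit_probs[i - 1]` stands for the probability of moving from 1 to 0 given i time units already spent in state 1.

```python
def sojourn_pmf(exit_probs):
    """P(S = k) for k = 1..M, where S is the sojourn time in state 1.

    exit_probs[k - 1] is the (hypothetical) probability of leaving state 1
    given that k time units have been spent there.
    """
    pmf = []
    still_in_state = 1.0
    for p in exit_probs:
        pmf.append(still_in_state * p)   # leave exactly at this duration
        still_in_state *= 1.0 - p        # otherwise stay one more unit
    return pmf


def prob_at_least(exit_probs, j):
    """P(S >= j): no exit at any of the first j - 1 opportunities."""
    prob = 1.0
    for p in exit_probs[: j - 1]:
        prob *= 1.0 - p
    return prob


# Made-up values with M = 3; the final 1.0 forces an exit by time 3.
example = [0.5, 0.5, 1.0]
print(sojourn_pmf(example))
print(prob_at_least(example, 3))
```

With constant exit probabilities this reduces to a geometric sojourn distribution, which is the Markov case assumed for state 0.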

Of particular interest to the authors is an estimate of 'persistence' of the disease state. This is defined as spending j time units in the disease state, counting single disease free (negative) observations surrounded by positive observations as time spent in the disease. The probability of persistence is just a function of the transition probabilities and so readily estimable.
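As a rough illustration of the persistence definition applied to a single observed sequence (this is the empirical counting rule only, not the authors' model-based estimator), the sketch below treats an isolated negative flanked by two positives as a positive and then looks for a run of at least j positives. The function names are my own.

```python
def fill_single_negatives(obs):
    """Recode a lone 0 with a 1 on either side as 1 (uses the original sequence)."""
    filled = list(obs)
    for t in range(1, len(obs) - 1):
        if obs[t] == 0 and obs[t - 1] == 1 and obs[t + 1] == 1:
            filled[t] = 1
    return filled


def longest_positive_run(obs):
    """Length of the longest run of consecutive 1s."""
    best = current = 0
    for x in obs:
        current = current + 1 if x == 1 else 0
        best = max(best, current)
    return best


def is_persistent(obs, j):
    """True if, after recoding, at least j consecutive time units are positive."""
    return longest_positive_run(fill_single_negatives(obs)) >= j


visits = [1, 0, 1, 1, 0, 0, 1]  # made-up sequence of test results
print(fill_single_negatives(visits))
print(is_persistent(visits, 4), is_persistent(visits, 5))
```

Here the single negative at the second visit is absorbed into the infection, while the two consecutive negatives later on are not.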

The authors claim that their discrete-time model does not require a "guarantee time", unlike Kang and Lagakos. This is obviously ridiculous: the discrete-time model requires a guarantee time of 1 time unit for all transitions! While adopting a discrete-time model simplifies the problem of inference to something quite trivial, one has to question how realistic it is to model something that is clearly a continuous-time process as discrete time. Bacchetti et al's more general approach is along similar lines. Similarly, while the estimation is nominally non-parametric, the discrete-time assumption is in many respects more severe than, say, constraining sojourn distributions to be Weibull distributed.

The clinical definition of persistence, which assumes that a negative observation between two positive observations counts as a positive, is easily accommodated by the discrete-time model. However, a more satisfactory approach would be to adopt a more formal definition based in continuous time, e.g. persistence if the disease-free period is less than, say, 6 months. This would have parallels with the approach taken by Mandel (2010) in defining a hitting time in terms of having a sojourn of more than some length in the disease state. Farewell and Su also dealt with a similar problem but their approach seems to be best avoided.
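A continuous-time definition of this kind could be sketched as follows, under the strong and unrealistic assumption that disease episodes are observed exactly as (start, end) intervals in months; with panel data the episode times would of course be interval censored. The function names and the 6-month threshold are purely illustrative.

```python
def merge_episodes(episodes, max_gap):
    """Merge (start, end) disease episodes whose disease-free gap is shorter than max_gap."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start - merged[-1][1] < max_gap:
            # Gap too short to count as clearance: extend the previous episode.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def longest_persistence(episodes, max_gap):
    """Length of the longest merged episode (0 if there are none)."""
    return max((end - start for start, end in merge_episodes(episodes, max_gap)),
               default=0)


# Made-up episodes in months: a 3-month gap, then an 8-month gap.
episodes = [(0, 4), (7, 12), (20, 22)]
print(merge_episodes(episodes, max_gap=6))
print(longest_persistence(episodes, max_gap=6))
```

The first two episodes are joined because the intervening disease-free period is under 6 months, whereas the third stays separate.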