Friday, 26 October 2012
Estimating parametric semi-Markov models from panel data using phase-type approximations
Andrew Titman has a new paper in Statistics and Computing. This extends previous work on fitting semi-Markov models to panel data using phase-type distributions. Here, rather than assuming each state has a Coxian phase-type sojourn distribution, the model assumes a standard parametric sojourn distribution (e.g. Weibull or Gamma). The computational tractability of phase-type distributions is exploited by approximating the parametric semi-Markov model by a model in which each state has a 5-phase phase-type sojourn distribution. To achieve this, a family of approximations to the parametric distribution, with scale parameter 1, is developed by solving a relatively large one-off optimization in which the optimal phase-type parameters are assumed to evolve as B-spline functions of the shape parameter. The resulting phase-type approximations can then be rescaled to give an approximation for any combination of scale and shape parameters and embedded into the overall semi-Markov process. The resulting approximate likelihood appears to be very close to the exact likelihood, both in shape and in magnitude.
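The rescaling step relies on a general property of phase-type distributions: if T ~ PH(alpha, S) then bT ~ PH(alpha, S/b), so a unit-scale approximation can serve any scale parameter. A minimal sketch using the Erlang distribution, the simplest phase-type law (my own toy illustration, not the paper's 5-phase Coxian):

```python
import math

def erlang_pdf(k, rate, t):
    """Density of an Erlang(k, rate) distribution: k exponential
    phases in series, all with the same rate -- a special case of a
    phase-type distribution."""
    return rate**k * t**(k - 1) * math.exp(-rate * t) / math.factorial(k - 1)

# If T ~ Erlang(k, rate), then b*T ~ Erlang(k, rate/b): dividing the
# sub-generator (here, the common rate) by b rescales the random
# variable by b, so a fitted unit-scale approximation covers all scales.
b, k, rate, t = 2.5, 3, 1.7, 0.8
scaled = erlang_pdf(k, rate / b, b * t)   # density of b*T evaluated at b*t
direct = erlang_pdf(k, rate, t) / b       # change-of-variables check
assert abs(scaled - direct) < 1e-12
```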
In general a 2-phase Coxian phase-type model is likely to give similar results to a Weibull or Gamma model. The main advantages of the Weibull or Gamma model are that its parameters are identifiable under the null (Markov) model, so standard likelihood ratio tests can be used to compare models (unlike for the 2-phase model), and that it requires fewer parameters, which may be useful for smaller sample sizes.
Wednesday, 22 August 2012
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Ronald Butler and Douglas Bronson have a new paper in Biometrika. This develops approaches to non-parametric estimation of survival or hitting time distributions in multi-state semi-Markov models with particular emphasis on cases where (multiple) backward transitions are possible. The paper is an extension of the authors' previous JRSS B paper. Here the methodology additionally allows for right-censored observations.
The key property used throughout is that, under a semi-Markov model, the data can be partitioned into separate independent sets relating to the sojourn times and destinations for each state. A straightforward, but computationally intensive, approach to finding bootstrap confidence bands for the survival distribution of a semi-Markov process involves a nested re-sampling scheme: first sample with replacement from the set of pairs of sojourn times and destination states for each state in the network, and then, for each re-sampled dataset, simulate a large number of walks through the network (starting from the initial state and ending at the designated 'hitting' state) from the empirical distributions implied by the first layer of re-sampling. An alternative, faster approach replaces the inner step of the bootstrap with one that computes saddlepoint approximations of the empirical sojourn distributions and then uses flowgraph techniques to find the implied hitting time distributions. A drawback of both approaches is that they require all the mass of the sojourn distributions to be allocated. Hence, in the presence of right-censoring, some convention for the mass at the tail of the distribution must be adopted. Here, the "redistribute-to-the-right" convention is used, which effectively re-weights the probability mass of all the observed survival times so that it sums to 1. On the face of it this seems a rash assumption. However, right-censoring in the sample only occurs at the end of a sequence of sojourns in several states. As such, in most cases where this technique would be used, the censoring in any particular state is likely to be quite light, even if the overall observed hitting times are heavily censored. Alternative conventions on the probability mass (for instance assuming a constant hazard beyond some point) could be adopted; all would be somewhat arbitrary, but hopefully would have little impact on the overall estimates.
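The redistribute-to-the-right convention can be sketched directly: each censored observation passes its remaining mass equally to all later observations, and any mass left on a censored tail is renormalised over the observed event times. A toy implementation (my own sketch, assuming simple lists of sojourn times and event indicators, and at least one observed event):

```python
def redistribute_to_the_right(times, events):
    """Empirical sojourn distribution under the redistribute-to-the-right
    convention. times: sojourn times; events: True if observed, False if
    right-censored. Returns a dict mapping observed times to masses that
    sum to 1."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    mass = [1.0 / n] * n
    for pos, i in enumerate(order):
        if not events[i]:                 # censored: push mass rightwards
            later = order[pos + 1:]
            if later:
                share = mass[i] / len(later)
                for j in later:
                    mass[j] += share
            mass[i] = 0.0                 # tail mass handled by renormalising
    # renormalise over observed events so the masses sum to 1
    total = sum(mass[i] for i in range(n) if events[i])
    return {times[i]: mass[i] / total for i in range(n) if events[i]}
```

With times [1, 2, 3, 4] and events [True, False, True, False], the censored observation at 2 passes its mass to the later points and the censored tail at 4 is renormalised away, giving masses 0.4 at time 1 and 0.6 at time 3, matching the Kaplan-Meier jumps rescaled to sum to 1.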
Unlike non-parametric estimates under a Markov assumption, for which the overall survival distribution will essentially equal the overall Kaplan-Meier estimate, with increasing uncertainty as time increases, under a (homogeneous) semi-Markov assumption the estimated hazard tends to a constant limit and can hence be estimated to a relatively large degree of precision at arbitrarily high times.
Saturday, 18 August 2012
A semi-Markov model for stroke with piecewise-constant hazards in the presence of left, right and interval censoring
Venediktos Kapetanakis, Fiona Matthews and Ardo van den Hout have a new paper in Statistics in Medicine. This develops a progressive three-state illness-death model for strokes with interval-censored data. The proposed model is a time non-homogeneous Markov model. The main approach to computation is to assume a continuous effect of age (time) on transition intensities but to use a piecewise constant approximation to actually fit it. The intensity to death from the stroke state additionally depends on the time since entry into the state (i.e. age at stroke) and since the exact time is typically unknown, it is necessary to numerically integrate over the possible range of transition times (here using Simpson's rule).
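The numerical integration over the unobserved transition time can be sketched with composite Simpson's rule (a generic sketch; the integrand below is a placeholder, not the paper's fitted piecewise-constant intensity model):

```python
import math

def simpson(f, a, b, n=10):
    """Composite Simpson's rule on [a, b] with an even number n of
    subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Toy likelihood contribution: integrate a placeholder density for the
# unknown stroke age over the interval (l, r) in which the transition
# is known to have occurred.
l, r = 60.0, 62.0
contribution = simpson(lambda u: 0.05 * math.exp(-0.05 * u), l, r, n=20)
```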
The data include subjects whose time to stroke is left-censored because they have already suffered a stroke before the baseline measurement. The authors state that they cannot integrate out the unknown time of stroke because the left-interval (i.e. the last age at which the subject is known to have been healthy) is unknown. They then proceed to propose a seemingly unnecessary ad-hoc EM-type approach based on estimating the stroke age for these individuals, which requires the arbitrary choice of an age
The real issue seems to be that all subjects are effectively left-truncated at the time of entry into the study (in the sense that they are only sampled due to not having died before their current age). For subjects who are healthy at baseline this left-truncation is accounted for by just integrating the hazards of transition out of state 1 from their age at baseline rather than age 0. For subjects who have already had a stroke things are more complicated because the fact they have survived provides information on the timing of the stroke (e.g. if stroke increases hazard of death, the fact they have survived implies the stroke occurred sooner than one would assume if no information on survival were known). Essentially the correct likelihood is conditional on survival to time
Wednesday, 11 July 2012
Bayesian analysis of a disability model for lung cancer survival
Friday, 6 July 2012
Fitting and Interpreting Continuous-Time Latent Markov Models for Panel Data
Wednesday, 20 June 2012
Investigating hospital heterogeneity with a multi-state frailty model
Benoit Liquet, Jean Francois Timsit and Virginie Rondeau have a new paper in BMC Medical Research Methodology. They consider data on the evolution of patients' status in intensive care units. The data originate from 16 distinct ICUs and there is therefore inherent clustering in the data. A progressive four-state model is considered, with admission as state 0, ventilator-associated pneumonia infection (VAP) as state 1, death as state 2 and discharge as state 3. To account for the clustering, shared Gamma frailty terms are included in the intensities for particular transitions. To keep the method simple, the authors consider cases where either each transition intensity has an ICU-related frailty that is assumed independent of the frailties for other intensities, or there are individual-level frailties that act in common across intensities. In each case there is only one level of clustering (i.e. either independent frailties on each intensity common to each centre, or a subject-specific frailty affecting multiple intensities). In the first case, a non-homogeneous Markov or semi-Markov assumption allows separate models to be fitted for each transition using methods applicable to univariate survival analysis. In the second case, multiple intensities can be fitted in a single survival model by specifying separate "strata" for each intensity. In either case the presence of only one level of clustering makes estimation, at least if a Gamma frailty is assumed, relatively straightforward. The authors use the R package frailtypack, which Rondeau maintains. This allows fully parametric models (via either Weibull or piecewise constant intensities) or semi-parametric models using spline intensities fitted by penalized likelihood. The emphasis on simple(r) approaches, for which software already exists, is a good one. But some mention of the existing capacity of the more general survival package to fit Cox models with gamma frailties, via frailty() in the model formula, would have been welcome.
The wider issue of what to do in the case of nested frailties, i.e. where there are multiple layers of clustering, is an interesting one, but is not discussed, which is surprising given that frailtypack has some capability for fitting such models. It is also questionable whether the assumption of independent frailties across intensities is realistic. To some extent this doesn't matter: if the data are Markov or semi-Markov conditional on the frailties, then, even if the frailties are dependent, assuming independence should produce unbiased estimates of the marginal frailty distribution (provided it is Gamma). Similarly, estimates of the individual frailties should be reasonably unbiased (empirical Bayes estimates aren't fully unbiased anyway) but will be inefficient if the frailty terms across intensities are actually correlated. It would be relatively straightforward to look at the correlation in the empirical Bayes estimates of the frailties associated with the intensities under the independence model as a semi-informal diagnostic. The difficult part would be fitting the multivariate frailty model if the independence model appeared inadequate.
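The semi-informal diagnostic suggested above amounts to no more than correlating the empirical Bayes frailty estimates across intensities. A minimal sketch (plain Pearson correlation; the frailty vectors below are hypothetical, one estimate per ICU for each of two transition intensities):

```python
def pearson_corr(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical empirical Bayes frailty estimates for two transition
# intensities, one pair per ICU; a strong correlation would suggest the
# independence-across-intensities assumption is inadequate.
frailty_01 = [0.8, 1.1, 0.9, 1.3, 1.0]
frailty_12 = [0.7, 1.2, 1.0, 1.4, 0.9]
r = pearson_corr(frailty_01, frailty_12)
```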
Monday, 28 November 2011
Frailties in multi-state models: Are they identifiable? Do we need them?
The authors consider two main approaches to fitting a frailty; assuming a shared Gamma frailty which acts as a multiplier on all the intensities of the multi-state model and a two-point mixture frailty, where there are two mixture components with a corresponding set of multiplier terms for the intensities for each component. Both approaches admit a closed form expression for the marginal transition intensities and so are reasonably straightforward to fit, but the mixture approach has the advantage of permitting a greater range of dependencies, e.g. it can allow negative as well as positive correlations between sojourn times.
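The closed-form marginal quantities under a shared Gamma frailty follow from the Laplace transform of the Gamma distribution: for frailty variance theta and conditional cumulative hazard Lambda(t), the marginal survivor function is (1 + theta*Lambda(t))^(-1/theta). A sketch of this standard formula (not the authors' code); note a shared Gamma frailty can only induce positive association between sojourn times, which is why the mixture model's ability to capture negative correlation matters:

```python
import math

def gamma_frailty_marginal_survival(cum_hazard, theta):
    """Marginal survivor function when a Gamma(mean 1, variance theta)
    frailty multiplies the conditional hazard: the Laplace transform of
    the frailty evaluated at the conditional cumulative hazard."""
    if theta == 0:                       # no frailty: plain survivor function
        return math.exp(-cum_hazard)
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)

# As theta -> 0 the marginal survivor recovers exp(-Lambda); for theta > 0
# the marginal hazard is dragged down over time by selective survival of
# low-frailty subjects.
s_frail = gamma_frailty_marginal_survival(1.0, 1.0)    # = 1/2
s_limit = gamma_frailty_marginal_survival(1.0, 1e-8)   # approx exp(-1)
```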
In the data example the authors consider the predictive power (via K-fold cross-validation) of a series of models on a forward (progressive) multi-state model. In particular they compare models which allow sojourn times in later states to depend on the time to first event, with frailty models and find the frailty model performs a little better. Essentially, these two models do the same thing and as the authors note, a more flexible model for the effect of time of first event on the second sojourn time may well give a better fit than the frailty model.
Use of frailties in models with truly recurrent events seems relatively uncontroversial. However, for models where few intermediate events are possible, the choice between frailties and a non-homogeneous model allowing dependence on both time since entry into the state and time since initiation is largely a matter of which is more interpretable for the application at hand, or possibly which is more computationally convenient.
Thursday, 20 October 2011
Estimation of the Expected Number of Earthquake Occurrences Based on Semi-Markov Models
The dataset used is quite small, involving only 33 events.
The observed matrix of transitions is
Clearly something of basic interest would be whether there is any evidence of dependence on the previous earthquake type. A simple
The authors instead proceed with a fully non-parametric method which uses the limited number of observations per transition type (i.e. between 1 and 6) to estimate the conditional staying time distributions. While a semi-Markov process of this type may well be appropriate for modelling earthquakes, the limited dataset available is insufficient to obtain usable estimates.
There are some odd features to the paper. Figure 3 in particular is puzzling as the 95% confidence intervals for the expected number of magnitude [6.1,7.2] earthquakes are clearly far too narrow. It also seems strange that there is essentially no mention of the flowgraph/saddlepoint approximation approach to analyzing semi-Markov processes of this nature.
Friday, 29 July 2011
Parametric inference for time-to-failure in multi-state semi-Markov models: A comparison of marginal and process approaches
Wednesday, 4 May 2011
Discrete-time semi-Markov modeling of human papillomavirus persistence
where
Extensions to the model, allowing
Of particular interest to the authors is an estimate of 'persistence' of the disease state. This is defined as spending j time units in the disease state, counting single disease-free (negative) observations surrounded by positive observations as time spent in the disease state. The probability of persistence is just a function of the transition probabilities
The authors claim that, unlike Kang and Lagakos, their discrete-time model does not require a "guarantee time". This is obviously ridiculous: the discrete-time model requires a guarantee time of 1 time unit for all transitions! While adopting a discrete-time model simplifies the problem of inference to something quite trivial, one has to question how realistic it is to model something that is clearly a continuous-time process in discrete time. Bacchetti et al.'s more general approach is along similar lines. Similarly, while the estimation is nominally non-parametric, the discrete-time assumption is in many respects more severe than, say, constraining the sojourn distributions to be Weibull.
The clinical definition of persistence, under which a negative observation between two positive observations counts as positive, is easily accommodated via the discrete-time model. However, a more satisfactory approach would be to adopt a more formal definition based in continuous time, e.g. counting a spell as persistent unless a disease-free period exceeds, say, 6 months. This would have parallels with the approach taken by Mandel (2010) in defining a hitting time in terms of having a sojourn of more than some length in the disease state. Farewell and Su also dealt with a similar problem, but their approach seems best avoided.
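A continuous-time definition of this sort is simple to operationalise: merge disease episodes separated by a disease-free gap shorter than the chosen threshold. A sketch (my own illustration, with times in years and a 6-month threshold):

```python
def merge_episodes(episodes, gap=0.5):
    """episodes: list of (start, end) times spent in the disease state,
    sorted by start time. Episodes separated by a disease-free gap
    shorter than `gap` (default 0.5 years = 6 months) are merged into a
    single persistent spell."""
    merged = []
    for s, e in episodes:
        if merged and s - merged[-1][1] < gap:
            merged[-1] = (merged[-1][0], e)   # gap too short: extend spell
        else:
            merged.append((s, e))             # long gap: new spell
    return merged
```

For example, episodes (0, 1), (1.3, 2) and (3, 4) collapse to the spells (0, 2) and (3, 4): the 0.3-year gap is bridged, the 1-year gap is not.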
Monday, 21 March 2011
Joint model with latent state for longitudinal and multistate data
For the PAQUID dataset on cognitive decline, the baseline transition intensities are of Weibull form for healthy to pre-diagnosis and pre-diagnosis to illness (the latter being Weibull with respect to time since entry into pre-diagnosis). The hazard of death is assumed to depend only on age (via piecewise constant intensities) and the value of the longitudinal marker, Y(t), but not explicitly on the current disease state. This assumption seems to be primarily for computational reasons.
A limitation of the approach taken in the paper is that dementia is only diagnosed at clinic visits, so in effect is interval censored between the last and the current clinic visit. The authors just assume entry into the dementia state occurred at the midpoint between clinic visits. Through simulation in the supplementary materials the authors show this doesn't cause serious bias.
However, a further issue is that the dependency of the dementia age on the observed value of the marker at a clinic visit would presumably mean the assumption of independence between dementia age and observation errors conditional on the random effects (slopes, intercepts and change-time) would be inappropriate. To what extent is the apparent increase in the decline of cognition before diagnosis due to such an artifact?
Monday, 21 February 2011
Testing Markovianity in the Three-state Progressive Model via future-past Association
Unlike the Cox proportional hazards approach, where a single statistic is produced, here the p-value varies depending on the choice of t. Superior power to the Cox-PH approach was obtained in simulations, but only for a good choice of t. An omnibus statistic based on some weighted integral of the absolute value (or square) of tau over the observation range would be useful.
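Such an omnibus statistic could be as simple as a trapezoidal integral of |tau(t)| over a grid of t values (a sketch of the suggestion above, with an unweighted integrand):

```python
def omnibus_stat(ts, taus):
    """Integrate |tau(t)| over the observation range by the trapezoidal
    rule, folding the time-varying statistic into a single summary.
    ts: increasing grid of time points; taus: tau evaluated at each."""
    total = 0.0
    for i in range(1, len(ts)):
        total += 0.5 * (abs(taus[i - 1]) + abs(taus[i])) * (ts[i] - ts[i - 1])
    return total
```

Its null distribution would of course still need to be obtained, e.g. by permutation, since the tau values at different t are dependent.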
The paper only deals with the progressive 3-state case. It's unclear how the method would be extended to incorporate complicated past history in a more general multi-state model. However, there is scope to extend it to testing whether a particular state within a model has semi-Markov dependencies. In its current form, while the test is an interesting concept, using a simple Cox-PH seems a much more attractive prospect in practice. Update: This paper is now published in Biometrical Journal.
Friday, 7 January 2011
mstate: An R Package for the Analysis of Competing Risks and Multi-State Models
mstate uses the existing survival package to fit the Cox proportional hazards models. Much of the infrastructure of mstate consists of functions for preparing data to be in the correct form. To use mstate for a Cox-Markov or Cox-semi-Markov model, a single coxph() object is required. This results in somewhat messy commands, because separate strata are needed and the same covariate must appear multiple times to allow it to have different effects for each transition intensity. For instance the call to coxph in the paper requires 16 lines. While there is perhaps some pedagogical advantage to this complication, in ensuring the user really understands what they are fitting, there is surely scope to automate the process so that the user need only specify which covariates are required for each transition intensity, with the call to coxph constructed behind the scenes.
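The data expansion involved — one row per transition for which a subject is at risk, with transition-specific copies of each covariate and a stratum indicator — is mechanical, which is why automation seems feasible. A schematic of the layout (hypothetical Python, sketching the shape of what mstate's msprep/expand.covs produce, not their actual implementation):

```python
def stack_transitions(subjects, transitions, covariates):
    """Expand wide-format multi-state data into the long 'stacked' format
    a stratified Cox fit expects: one row per subject per transition at
    risk, with a transition-specific copy of each covariate (zero for
    other transitions) so each transition gets its own coefficient."""
    rows = []
    for subj in subjects:
        for trans_id, (frm, to) in enumerate(transitions, start=1):
            if subj["state"] != frm:
                continue                      # not at risk for this transition
            row = {"id": subj["id"], "trans": trans_id, "from": frm, "to": to}
            for cov in covariates:
                for t in range(1, len(transitions) + 1):
                    row[f"{cov}.{t}"] = subj[cov] if t == trans_id else 0
            rows.append(row)
    return rows

# A healthy subject in an illness-death model is at risk for two
# transitions, so contributes two stacked rows.
transitions = [("healthy", "ill"), ("healthy", "dead"), ("ill", "dead")]
rows = stack_transitions([{"id": 1, "state": "healthy", "age": 50}],
                         transitions, ["age"])
```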
Finally, as has been noted elsewhere, mstate currently does virtually all calculations within R itself. As a result computation times are sometimes disappointing, especially for models on large datasets.
Thursday, 6 January 2011
p3state.msm: Analyzing Survival Data from an Illness-Death model
Friday, 17 December 2010
Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods
where S(t) is the time from seroconversion to biomarker crossing. Here we see there is a clear assumption that T is independent of calendar time d. This is consistent with the framework first proposed by De Gruttola and Lagakos (Biometrics, 1989), where the time to first event is assumed to be independent of the time between first and second events; the data essentially arise from a three-state progressive semi-Markov model where interest lies in estimating the sojourn distribution in state 2. Alternatively, the methods of Frydman (JRSSB, 1992) are based on a Markov assumption; here the marginal distribution of the sojourn in state 2 could be estimated by integrating over the estimated distribution of times to entry into state 2. Sweeting et al, however, treat it as standard bivariate survival data (where X denotes the time to seroconversion and Z denotes time to biomarker crossing), using MLEcens for estimation. It's not clear whether this is sensible if the aim is to estimate P(d) as defined above. Moreover, Betensky and Finkelstein (Statistics in Medicine, 1999) suggested an alternative algorithm would be required for doubly-censored data because of the inherent ordering (i.e. Z > X). Presumably, as long as the intervals for seroconversion and crossing are disjoint, the NPMLE for S is guaranteed to have support only at non-negative times.
Sweeting et al make a great deal of the fact that, since the NPMLE of (X,Z) is non-unique (the NPMLE can only ascribe mass to regions of the (X,Z) plane) and T = Z - X, there will be huge uncertainty over the distribution of S between the extremes of assuming mass only in the lower left corner or only in the upper right corner of the support rectangles. They provide a plot to show this in their data example. The original approach to doubly-censored data of De Gruttola and Lagakos assumes a set of discrete time points of mass for S (the generalization to higher order models was given by Sternberg and Satten). This largely avoids any problems of non-uniqueness (though some problems remain; see e.g. Griffin and Lagakos, 2010).
For the non-parametric approach, Sweeting et al seem extremely reluctant to make any assumptions whatsoever. In contrast, they go completely to town on assumptions once they get onto their preferred Bayesian method. It should firstly be mentioned that the outcome time is time to crossing of a biomarker and there is considerable auxiliary data available in the form of intermediate measurements. Thus, under any kind of monotonicity assumptions, we can see that extra modelling of the growth process of the biomarker has merit. Sweeting et al use a parametric, mixed effects growth model, to model the growth of the biomarker. The growth is measured in terms of time since seroconversion, which is unknown. They consider two methods: a naive method that assumes seroconversion occurs at the midpoint of the time interval and a uniform prior method that assumes a priori that the seroconversion time is distributed uniformly within the times at which it is interval censored (i.e. last time before seroconversion and the first biomarker measurement time). The first method is essentially like imputing the interval midpoint for X. The second method is like assuming the marginal distribution for X is uniform.
Overall, the paper comes across as a straw man argument against non-parametric methods appended onto a Bayesian analysis that makes multiple (unverified) assumptions.
Friday, 3 December 2010
Interpretability and importance of functionals in competing risks and multi-state models
1. Do not condition on the future.
2. Do not regard individuals at risk after they have died.
3. Stick to this world.
They identify several existing ideas that violate these principles. Unsurprisingly, the latent failure times model for competing risks rightly comes under fire for violating (3): to say anything about hypothetical survival distributions in the absence of the other risks requires making untestable assumptions. Semi-competing risks analysis, where one seeks the survival distribution for illness in the absence of death, has the same problem.
The subdistribution hazard from the Fine-Gray model violates principle 2 because it involves the form
The conditional probability function recently proposed by Allignol et al has similar problems with principle 2.
Principle 1 is violated in the pattern-mixture parametrisation. This is where we consider the distribution of event times conditional on the event type, e.g. the sojourn in state i given that the subject moved to state j. This is used, for instance, in flowgraph semi-Markov models.
A distinction that is perhaps needed, and isn't really made clear in the paper, is between violating the principles for mathematical convenience, e.g. for model fitting, and violating them in the actual inferential output. Functionals to be avoided should perhaps be those where no easily interpretable transformation to a sensible measure is available. Thus a pattern-mixture parametrisation for a semi-Markov model without covariates seems unproblematic, since we can retrieve the transition intensities. However, when covariates are present the transition intensities will have complicated relationships to the covariates, without an obvious interpretation.
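Retrieving the intensities from a pattern-mixture parametrisation is direct in the no-covariate case: with destination probabilities p_ij and conditional sojourn densities f_ij and survivor functions S_ij, the cause-specific intensity out of state i is lambda_ij(t) = p_ij f_ij(t) / sum_k p_ik S_ik(t). A sketch of this generic relationship (exponential conditionals used purely for illustration):

```python
import math

def transition_intensity(p, dens, surv, j, t):
    """Cause-specific transition intensity lambda_ij(t) recovered from a
    pattern-mixture parametrisation: destination probabilities p[k] with
    conditional sojourn densities dens[k] and survivors surv[k]."""
    return p[j] * dens[j](t) / sum(p[k] * surv[k](t) for k in p)

# Illustration with exponential conditional sojourns. If both destinations
# share the same rate, the process is Markov and lambda_ij(t) = p[j]*rate.
rate = 2.0
p = {1: 0.4, 2: 0.6}
dens = {k: (lambda t, r=rate: r * math.exp(-r * t)) for k in p}
surv = {k: (lambda t, r=rate: math.exp(-r * t)) for k in p}
lam = transition_intensity(p, dens, surv, 1, 0.7)   # approx 0.4 * 2.0 = 0.8
```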
**** UPDATE: The paper is now published in Statistics in Medicine. ****
Monday, 8 November 2010
Accounting for bias due to a non-ignorable tracing mechanism in a retrospective breast cancer cohort study
Tuesday, 26 October 2010
Multiple imputation for estimating the risk of developing dementia and its impact on survival
In essence this paper is a generalization of the method for interval-censored survival data proposed by Pan (2000; Biometrics). The idea is to start by imputing the mid-point of the interval for unknown event times and calculate the parameter estimates treating these times as known. Then, based on the new estimates, impute e.g. 20 different complete datasets, estimate the parameters from these complete datasets, and then re-impute. This process continues until the parameter estimates converge.
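A deterministic caricature of the scheme, for interval-censored exponential data, replaces the random imputations with conditional means, which turns it into an EM iteration (a toy sketch of the iterate-and-re-impute idea, not Yu et al.'s semi-parametric method):

```python
import math

def fit_interval_censored_exponential(intervals, tol=1e-10, max_iter=500):
    """Iteratively re-impute interval-censored exponential event times.
    Start from interval midpoints, then alternate: (1) rate MLE treating
    imputed times as exact; (2) re-impute each time as its conditional
    mean within (l, r) under the current rate (i.e. an EM step)."""
    times = [(l + r) / 2 for l, r in intervals]
    rate = len(times) / sum(times)
    for _ in range(max_iter):
        new_times = []
        for l, r in intervals:
            el, er = math.exp(-rate * l), math.exp(-rate * r)
            # E[T | l < T < r] for an exponential(rate) variable
            new_times.append(1 / rate + (l * el - r * er) / (el - er))
        new_rate = len(new_times) / sum(new_times)
        if abs(new_rate - rate) < tol:
            break
        rate = new_rate
    return rate
```

With the stochastic version, the convergence question raised below is real: each iteration re-draws the imputed datasets, so the estimates fluctuate randomly around the limit rather than settling at it.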
The authors consider data regarding mortality and the onset of dementia. They model post-dementia hazard as being proportional to pre-dementia hazard and additionally allow onset age to affect hazard (for the semi-Markov model).
A drawback of the approach, which was mentioned in the paper by Pan, but not seemingly referred to by Yu et al, is that the support points for the baseline intensities will depend entirely on the initial values chosen, i.e. the final baseline intensities will have support at a subset of the original support points. For small datasets, there may be a decay in the number of support points down to 1. Pan suggested using a Link estimate of the baseline hazard (i.e. linear smoothed Breslow estimate) and to impute estimates from this to avoid the problem.
A further issue is determining what exactly constitutes convergence of the parameter estimates. After all, at each iteration a random imputation of M datasets takes place. Hence, we can expect the new parameter estimates to have a certain degree of random variation compared to the previous iteration.
The authors should be commended for having provided code and a subset of the full data in the Supporting Material of the paper. However, there seems to be an error in the code which results in an overestimation of the cumulative disease onset hazard. The code obtains the Breslow estimate of the cumulative baseline hazard. It then estimates the hazard between event times to be
The analysis seems to be incorrect in that it treats all patients as left-truncated (i.e. healthy) at age 50 (presumably time 0 in the data is age 50). No deaths occur before age 71.6 (or 21.6 on the time scale used in the data). Even prevalent cases enter the study conditional on survival until their entry age.
A comparison with existing methods can be made by computing the NPMLE under a Markov assumption. For this we use methods developed by Gauzere (2000; PhD Thesis) and Frydman and Szarek (2009; Biometrics). For data removing the prevalent cases, a corrected version of the imputation code gives estimates that are reasonably similar to the NPMLE (see figure), whereas the uncorrected version substantially overestimates the cumulative incidence of dementia.

The results of the analysis of the data in the paper are close to those using the subsample data with the code, suggesting the same code was used for the full data. It is unclear whether the same code was used for the simulation data.
Finally, it should be noted that the method, while undoubtedly more straightforward to implement than either a semi-parametric maximum likelihood estimate or the penalized likelihood approach of Joly and Commenges, is nevertheless still quite slow. Moreover, as with the other methods, the complexity increases rapidly as the number of states in the model increases.
Thursday, 30 September 2010
Flexible hazard ratio curves for continuous predictors in multi-state models: an application to breast cancer data
The principal difficulty in fitting these models lies in obtaining appropriate values for the penalization parameters. However, since for right-censored data, both the Cox-Markov and Cox-semi-Markov models allow factorization of the likelihood into parts relating to the constituent transition intensities, methods from univariate survival analysis can be used. The R packages survival and BayesX are both used in the analysis.