Friday, 26 October 2012
Estimating parametric semi-Markov models from panel data using phase-type approximations
Andrew Titman has a new paper in Statistics and Computing. This extends previous work on fitting semi-Markov models to panel data using phase-type distributions. Here, rather than assuming each state has a Coxian phase-type sojourn distribution, the model assumes a standard parametric sojourn distribution (e.g. Weibull or Gamma). The computational tractability of phase-type distributions is exploited by approximating the parametric semi-Markov model by a model in which each state has a 5-phase phase-type sojourn distribution. To achieve this, a family of approximations to the parametric distribution, with scale parameter 1, is developed by solving a relatively large one-off optimization in which the optimal phase-type parameters are assumed to evolve as B-spline functions of the shape parameter. The resulting phase-type approximations can then be rescaled to give an approximation for any combination of scale and shape parameters and embedded into the overall semi-Markov process. The resulting approximate likelihood appears to be very close to the exact likelihood, both in shape and in magnitude.
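The rescaling step relies on a general property of phase-type distributions: if T ~ PH(alpha, S) then bT ~ PH(alpha, S/b), so a unit-scale approximation can serve any scale parameter. A minimal sketch using the Erlang distribution, the simplest phase-type law (my own toy illustration, not the paper's 5-phase Coxian):

```python
import math

def erlang_pdf(k, rate, t):
    """Density of an Erlang(k, rate) distribution: k exponential
    phases in series, all with the same rate -- a special case of a
    phase-type distribution."""
    return rate**k * t**(k - 1) * math.exp(-rate * t) / math.factorial(k - 1)

# If T ~ Erlang(k, rate), then b*T ~ Erlang(k, rate/b): dividing the
# sub-generator (here, the common rate) by b rescales the random
# variable by b, so a fitted unit-scale approximation covers all scales.
b, k, rate, t = 2.5, 3, 1.7, 0.8
scaled = erlang_pdf(k, rate / b, b * t)   # density of b*T evaluated at b*t
direct = erlang_pdf(k, rate, t) / b       # change-of-variables check
assert abs(scaled - direct) < 1e-12
```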
In general a 2-phase Coxian phase-type model is likely to give similar results to a Weibull or Gamma model. The main advantages of the Weibull or Gamma model are that its parameters are identifiable under the null (Markov) model, so standard likelihood ratio tests can be used to compare models (unlike for the 2-phase model), and that it requires fewer parameters, which may be useful for smaller sample sizes.
Wednesday, 22 August 2012
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Ronald Butler and Douglas Bronson have a new paper in Biometrika. This develops approaches to non-parametric estimation of survival or hitting time distributions in multi-state semi-Markov models with particular emphasis on cases where (multiple) backward transitions are possible. The paper is an extension of the authors' previous JRSS B paper. Here the methodology additionally allows for right-censored observations.
The key property used throughout is that, under a semi-Markov model, the data can be partitioned into separate independent sets relating to the sojourn times and destinations for each state. A straightforward, but computationally intensive, approach to finding bootstrap confidence bands for the survival distribution of a semi-Markov process involves a nested re-sampling scheme: first sample with replacement from the set of pairs of sojourn times and destination states for each state in the network, and then, for each re-sampled dataset, simulate a large number of walks through the network (starting from the initial state and ending at the designated 'hitting' state) from the empirical distributions implied by the first layer of re-sampling. An alternative, faster approach replaces the inner step of the bootstrap with one that computes saddlepoint approximations of the empirical sojourn distributions and then uses flowgraph techniques to find the implied hitting time distributions. A drawback of both approaches is that they require all the mass of the sojourn distributions to be allocated. Hence, in the presence of right-censoring, some convention for the mass at the tail of the distribution must be adopted. Here, the "redistribute-to-the-right" convention is used, which effectively re-weights the probability mass of all the observed survival times so that it sums to 1. On the face of it this seems a rash assumption. However, right-censoring in the sample only occurs at the end of a sequence of sojourns in several states. As such, in most cases where this technique would be used, the censoring in any particular state is likely to be quite light, even if the overall observed hitting times are heavily censored. Alternative conventions on the probability mass (for instance assuming a constant hazard beyond some point) could be adopted; all would be somewhat arbitrary, but hopefully would have little impact on the overall estimates.
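The redistribute-to-the-right convention can be sketched directly: each censored observation passes its remaining mass equally to all later observations, and any mass left on a censored tail is renormalised over the observed event times. A toy implementation (my own sketch, assuming simple lists of sojourn times and event indicators, and at least one observed event):

```python
def redistribute_to_the_right(times, events):
    """Empirical sojourn distribution under the redistribute-to-the-right
    convention. times: sojourn times; events: True if observed, False if
    right-censored. Returns a dict mapping observed times to masses that
    sum to 1."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    mass = [1.0 / n] * n
    for pos, i in enumerate(order):
        if not events[i]:                 # censored: push mass rightwards
            later = order[pos + 1:]
            if later:
                share = mass[i] / len(later)
                for j in later:
                    mass[j] += share
            mass[i] = 0.0                 # tail mass handled by renormalising
    # renormalise over observed events so the masses sum to 1
    total = sum(mass[i] for i in range(n) if events[i])
    return {times[i]: mass[i] / total for i in range(n) if events[i]}
```

With times [1, 2, 3, 4] and events [True, False, True, False], the censored observation at 2 passes its mass to the later points and the censored tail at 4 is renormalised away, giving masses 0.4 at time 1 and 0.6 at time 3, matching the Kaplan-Meier jumps rescaled to sum to 1.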
Unlike non-parametric estimates under a Markov assumption, for which the overall survival distribution will essentially equal the overall Kaplan-Meier estimate, with increasing uncertainty as time increases, under a (homogeneous) semi-Markov assumption the estimated hazard tends to a constant limit and can hence be estimated to a relatively large degree of precision at arbitrarily high times.
Saturday, 18 August 2012
A semi-Markov model for stroke with piecewise-constant hazards in the presence of left, right and interval censoring
Venediktos Kapetanakis, Fiona Matthews and Ardo van den Hout have a new paper in Statistics in Medicine. This develops a progressive three-state illness-death model for strokes with interval-censored data. The proposed model is a time non-homogeneous Markov model. The main approach to computation is to assume a continuous effect of age (time) on transition intensities but to use a piecewise constant approximation to actually fit it. The intensity to death from the stroke state additionally depends on the time since entry into the state (i.e. age at stroke) and since the exact time is typically unknown, it is necessary to numerically integrate over the possible range of transition times (here using Simpson's rule).
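The numerical integration over the unobserved transition time can be sketched with composite Simpson's rule (a generic sketch; the integrand below is a placeholder, not the paper's fitted piecewise-constant intensity model):

```python
import math

def simpson(f, a, b, n=10):
    """Composite Simpson's rule on [a, b] with an even number n of
    subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Toy likelihood contribution: integrate a placeholder density for the
# unknown stroke age over the interval (l, r) in which the transition
# is known to have occurred.
l, r = 60.0, 62.0
contribution = simpson(lambda u: 0.05 * math.exp(-0.05 * u), l, r, n=20)
```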
The data include subjects whose time to stroke is left-censored because they have already suffered a stroke before the baseline measurement. The authors state that they cannot integrate out the unknown time of stroke because the left-interval (i.e. the last age at which the subject is known to have been healthy) is unknown. They then proceed to propose a seemingly unnecessary ad-hoc EM-type approach based on estimating the stroke age for these individuals, which requires the arbitrary choice of an age
The real issue seems to be that all subjects are effectively left-truncated at the time of entry into the study (in the sense that they are only sampled due to not having died before their current age). For subjects who are healthy at baseline this left-truncation is accounted for by just integrating the hazards of transition out of state 1 from their age at baseline rather than age 0. For subjects who have already had a stroke things are more complicated because the fact they have survived provides information on the timing of the stroke (e.g. if stroke increases hazard of death, the fact they have survived implies the stroke occurred sooner than one would assume if no information on survival were known). Essentially the correct likelihood is conditional on survival to time
Wednesday, 11 July 2012
Bayesian analysis of a disability model for lung cancer survival
Friday, 6 July 2012
Fitting and Interpreting Continuous-Time Latent Markov Models for Panel Data
Wednesday, 20 June 2012
Investigating hospital heterogeneity with a multi-state frailty model
Benoit Liquet, Jean Francois Timsit and Virginie Rondeau have a new paper in BMC Medical Research Methodology. They consider data on the evolution of patients' status in intensive care units. The data originate from 16 distinct ICUs and there is therefore inherent clustering in the data. A progressive four-state model is considered, with admission as state 0, ventilator-associated pneumonia infection (VAP) as state 1, death as state 2 and discharge as state 3. To account for the clustering, shared Gamma frailty terms are included in the intensities for particular transitions. To keep the method simple, the authors consider cases where either each transition intensity has an ICU-related frailty that is assumed independent of the frailties for other intensities, or there are individual-level frailties that act in common across intensities. In each case there is only one level of clustering (i.e. either independent frailties on each intensity common to each centre, or a subject-specific frailty affecting multiple intensities). In the first case, a non-homogeneous Markov or semi-Markov assumption allows separate models to be fitted for each transition using methods applicable to univariate survival analysis. In the second case, multiple intensities can be fitted in a single survival model by specifying separate "strata" for each intensity. In either case the presence of only one level of clustering makes estimation, at least if a Gamma frailty is assumed, relatively straightforward. The authors use the R package frailtypack, which Rondeau maintains. This allows fully parametric models (via either Weibull or piecewise constant intensities) or semi-parametric models using spline intensities fitted by penalized likelihood. The emphasis on simple(r) approaches, for which software already exists, is a good one. But some mention of the existing capacity of the more general survival package to fit Cox models with gamma frailties, via frailty() in the model formula, would have been welcome.
The wider issue of what to do in the case of nested frailties, i.e. where there are multiple layers of clustering, is an interesting one, but is not discussed, which is surprising given that frailtypack has some capability for fitting such models. It is also questionable whether the assumption of independent frailties across intensities is realistic. To some extent this doesn't matter: if the data are Markov or semi-Markov conditional on the frailties, then, even if the frailties are dependent, assuming independence should produce unbiased estimates of the marginal frailty distribution (provided it is Gamma). Similarly, estimates of the individual frailties should be reasonably unbiased (empirical Bayes estimates aren't fully unbiased anyway) but will be inefficient if the frailty terms across intensities are actually correlated. It would be relatively straightforward to look at the correlation in the empirical Bayes estimates of the frailties associated with the intensities under the independence model as a semi-informal diagnostic. The difficult part would be fitting the multivariate frailty model if the independence model appeared inadequate.
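The semi-informal diagnostic suggested above amounts to no more than correlating the empirical Bayes frailty estimates across intensities. A minimal sketch (plain Pearson correlation; the frailty vectors below are hypothetical, one estimate per ICU for each of two transition intensities):

```python
def pearson_corr(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical empirical Bayes frailty estimates for two transition
# intensities, one pair per ICU; a strong correlation would suggest the
# independence-across-intensities assumption is inadequate.
frailty_01 = [0.8, 1.1, 0.9, 1.3, 1.0]
frailty_12 = [0.7, 1.2, 1.0, 1.4, 0.9]
r = pearson_corr(frailty_01, frailty_12)
```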
Monday, 28 November 2011
Frailties in multi-state models: Are they identifiable? Do we need them?
The authors consider two main approaches to fitting a frailty; assuming a shared Gamma frailty which acts as a multiplier on all the intensities of the multi-state model and a two-point mixture frailty, where there are two mixture components with a corresponding set of multiplier terms for the intensities for each component. Both approaches admit a closed form expression for the marginal transition intensities and so are reasonably straightforward to fit, but the mixture approach has the advantage of permitting a greater range of dependencies, e.g. it can allow negative as well as positive correlations between sojourn times.
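The closed-form marginal quantities under a shared Gamma frailty follow from the Laplace transform of the Gamma distribution: for frailty variance theta and conditional cumulative hazard Lambda(t), the marginal survivor function is (1 + theta*Lambda(t))^(-1/theta). A sketch of this standard formula (not the authors' code); note a shared Gamma frailty can only induce positive association between sojourn times, which is why the mixture model's ability to capture negative correlation matters:

```python
import math

def gamma_frailty_marginal_survival(cum_hazard, theta):
    """Marginal survivor function when a Gamma(mean 1, variance theta)
    frailty multiplies the conditional hazard: the Laplace transform of
    the frailty evaluated at the conditional cumulative hazard."""
    if theta == 0:                       # no frailty: plain survivor function
        return math.exp(-cum_hazard)
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)

# As theta -> 0 the marginal survivor recovers exp(-Lambda); for theta > 0
# the marginal hazard is dragged down over time by selective survival of
# low-frailty subjects.
s_frail = gamma_frailty_marginal_survival(1.0, 1.0)    # = 1/2
s_limit = gamma_frailty_marginal_survival(1.0, 1e-8)   # approx exp(-1)
```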
In the data example the authors consider the predictive power (via K-fold cross-validation) of a series of models on a forward (progressive) multi-state model. In particular they compare models which allow sojourn times in later states to depend on the time to first event, with frailty models and find the frailty model performs a little better. Essentially, these two models do the same thing and as the authors note, a more flexible model for the effect of time of first event on the second sojourn time may well give a better fit than the frailty model.
Use of frailties in models with truly recurrent events seems relatively uncontroversial. However, for models where few intermediate events are possible, the choice between frailties and a non-homogeneous model allowing dependence on both time since entry into the state and time since initiation is largely a matter of which is more interpretable for the application at hand, or possibly which is more computationally convenient.
Thursday, 20 October 2011
Estimation of the Expected Number of Earthquake Occurrences Based on Semi-Markov Models
The dataset used is quite small, involving only 33 events.
The observed matrix of transitions is
Clearly something of basic interest would be whether there is any evidence of dependence on the previous earthquake type. A simple
The authors instead proceed with a fully non-parametric method which uses the limited number of observations per transition type (i.e. between 1 and 6) to estimate the conditional staying time distributions. While a semi-Markov process of this type may well be appropriate for modelling earthquakes, the limited dataset available is insufficient to obtain usable estimates.
There are some odd features to the paper. Figure 3 in particular is puzzling as the 95% confidence intervals for the expected number of magnitude [6.1,7.2] earthquakes are clearly far too narrow. It also seems strange that there is essentially no mention of the flowgraph/saddlepoint approximation approach to analyzing semi-Markov processes of this nature.
Friday, 29 July 2011
Parametric inference for time-to-failure in multi-state semi-Markov models: A comparison of marginal and process approaches
Wednesday, 4 May 2011
Discrete-time semi-Markov modeling of human papillomavirus persistence
where
Extensions to the model, allowing
Of particular interest to the authors is an estimate of 'persistence' of the disease state. This is defined as spending j time units in the disease state, counting single disease-free (negative) observations surrounded by positive observations as time spent in the disease state. The probability of persistence is just a function of the transition probabilities
The authors claim that, unlike Kang and Lagakos, their discrete-time model does not require a "guarantee time". This is obviously ridiculous: the discrete-time model requires a guarantee time of 1 time unit for all transitions! While adopting a discrete-time model simplifies the problem of inference to something quite trivial, one has to question how realistic it is to model something that is clearly a continuous-time process in discrete time. Bacchetti et al.'s more general approach is along similar lines. Similarly, while the estimation is nominally non-parametric, the discrete-time assumption is in many respects more severe than, say, constraining the sojourn distributions to be Weibull.
The clinical definition of persistence, under which a negative observation between two positive observations counts as positive, is easily accommodated via the discrete-time model. However, a more satisfactory approach would be to adopt a more formal definition based in continuous time, e.g. counting a spell as persistent unless a disease-free period exceeds, say, 6 months. This would have parallels with the approach taken by Mandel (2010) in defining a hitting time in terms of having a sojourn of more than some length in the disease state. Farewell and Su also dealt with a similar problem, but their approach seems best avoided.
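A continuous-time definition of this sort is simple to operationalise: merge disease episodes separated by a disease-free gap shorter than the chosen threshold. A sketch (my own illustration, with times in years and a 6-month threshold):

```python
def merge_episodes(episodes, gap=0.5):
    """episodes: list of (start, end) times spent in the disease state,
    sorted by start time. Episodes separated by a disease-free gap
    shorter than `gap` (default 0.5 years = 6 months) are merged into a
    single persistent spell."""
    merged = []
    for s, e in episodes:
        if merged and s - merged[-1][1] < gap:
            merged[-1] = (merged[-1][0], e)   # gap too short: extend spell
        else:
            merged.append((s, e))             # long gap: new spell
    return merged
```

For example, episodes (0, 1), (1.3, 2) and (3, 4) collapse to the spells (0, 2) and (3, 4): the 0.3-year gap is bridged, the 1-year gap is not.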
Monday, 21 March 2011
Joint model with latent state for longitudinal and multistate data
For the PAQUID dataset on cognitive decline, the baseline transition intensities are of Weibull form for healthy to pre-diagnosis and pre-diagnosis to illness (the latter being Weibull with respect to time since entry into pre-diagnosis). The hazard of death is assumed to depend only on age (via piecewise constant intensities) and the value of the longitudinal marker, Y(t), but not explicitly on the current disease state. This assumption seems to be primarily for computational reasons.
A limitation of the approach taken in the paper is that dementia is only diagnosed at clinic visits, so in effect is interval censored between the last and the current clinic visit. The authors just assume entry into the dementia state occurred at the midpoint between clinic visits. Through simulation in the supplementary materials the authors show this doesn't cause serious bias.
However, a further issue is that the dependency of the dementia age on the observed value of the marker at a clinic visit would presumably mean the assumption of independence between dementia age and observation errors conditional on the random effects (slopes, intercepts and change-time) would be inappropriate. To what extent is the apparent increase in the decline of cognition before diagnosis due to such an artifact?
Monday, 21 February 2011
Testing Markovianity in the Three-state Progressive Model via future-past Association
Unlike the Cox proportional hazards approach, where a single statistic is produced, here the p-value varies depending on the choice of t. Superior power to the Cox-PH approach was obtained in simulations, but only for a good choice of t. An omnibus statistic based on some weighted integral of the absolute value (or square) of tau over the observation range would be useful.
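Such an omnibus statistic could be as simple as a trapezoidal integral of |tau(t)| over a grid of t values (a sketch of the suggestion above, with an unweighted integrand):

```python
def omnibus_stat(ts, taus):
    """Integrate |tau(t)| over the observation range by the trapezoidal
    rule, folding the time-varying statistic into a single summary.
    ts: increasing grid of time points; taus: tau evaluated at each."""
    total = 0.0
    for i in range(1, len(ts)):
        total += 0.5 * (abs(taus[i - 1]) + abs(taus[i])) * (ts[i] - ts[i - 1])
    return total
```

Its null distribution would of course still need to be obtained, e.g. by permutation, since the tau values at different t are dependent.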
The paper only deals with the progressive 3-state case. It's unclear how the method would be extended to incorporate complicated past history in a more general multi-state model. However, there is scope to extend it to testing whether a particular state within a model has semi-Markov dependencies. In its current form, while the test is an interesting concept, using a simple Cox-PH seems a much more attractive prospect in practice. Update: This paper is now published in Biometrical Journal.
Friday, 7 January 2011
mstate: An R Package for the Analysis of Competing Risks and Multi-State Models
mstate uses the existing survival package to fit the Cox proportional hazards models. Much of the infrastructure of mstate consists of functions for preparing data to be in the correct form. To use mstate for a Cox-Markov or Cox-semi-Markov model, a single coxph() object is required. This results in somewhat messy commands, because separate strata are needed and the same covariate must appear multiple times to allow it to have different effects for each transition intensity. For instance the call to coxph in the paper requires 16 lines. While there is perhaps some pedagogical advantage to this complication, in ensuring the user really understands what they are fitting, there is surely scope to automate the process so that the user need only specify which covariates are required for each transition intensity, with the call to coxph constructed behind the scenes.
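The data expansion involved — one row per transition for which a subject is at risk, with transition-specific copies of each covariate and a stratum indicator — is mechanical, which is why automation seems feasible. A schematic of the layout (hypothetical Python, sketching the shape of what mstate's msprep/expand.covs produce, not their actual implementation):

```python
def stack_transitions(subjects, transitions, covariates):
    """Expand wide-format multi-state data into the long 'stacked' format
    a stratified Cox fit expects: one row per subject per transition at
    risk, with a transition-specific copy of each covariate (zero for
    other transitions) so each transition gets its own coefficient."""
    rows = []
    for subj in subjects:
        for trans_id, (frm, to) in enumerate(transitions, start=1):
            if subj["state"] != frm:
                continue                      # not at risk for this transition
            row = {"id": subj["id"], "trans": trans_id, "from": frm, "to": to}
            for cov in covariates:
                for t in range(1, len(transitions) + 1):
                    row[f"{cov}.{t}"] = subj[cov] if t == trans_id else 0
            rows.append(row)
    return rows

# A healthy subject in an illness-death model is at risk for two
# transitions, so contributes two stacked rows.
transitions = [("healthy", "ill"), ("healthy", "dead"), ("ill", "dead")]
rows = stack_transitions([{"id": 1, "state": "healthy", "age": 50}],
                         transitions, ["age"])
```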
Finally, as has been noted elsewhere, mstate currently does virtually all calculations within R itself. As a result computation times are sometimes disappointing, especially for models on large datasets.
Thursday, 6 January 2011
p3state.msm: Analyzing Survival Data from an Illness-Death model
Friday, 17 December 2010
Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods
where S(t) is the time from seroconversion to biomarker crossing. Here we see there is a clear assumption that T is independent of calendar time d. This is consistent with the framework first proposed by De Gruttola and Lagakos (Biometrics, 1989), where the time to first event is assumed to be independent of the time between first and second events; the data essentially arise from a three-state progressive semi-Markov model where interest lies in estimating the sojourn distribution in state 2. Alternatively, the methods of Frydman (JRSSB, 1992) are based on a Markov assumption; here the marginal distribution of the sojourn in state 2 could be estimated by integrating over the estimated distribution of times to entry into state 2. Sweeting et al, however, treat it as standard bivariate survival data (where X denotes the time to seroconversion and Z denotes time to biomarker crossing), using MLEcens for estimation. It's not clear whether this is sensible if the aim is to estimate P(d) as defined above. Moreover, Betensky and Finkelstein (Statistics in Medicine, 1999) suggested an alternative algorithm would be required for doubly-censored data because of the inherent ordering (i.e. Z > X). Presumably, as long as the intervals for seroconversion and crossing are disjoint, the NPMLE for S is guaranteed to have support only at non-negative times.
Sweeting et al make a great deal of the fact that, since the NPMLE of (X,Z) is non-unique (the NPMLE can only ascribe mass to regions of the (X,Z) plane) and T = Z - X, there will be huge uncertainty over the distribution of S between the extremes of assuming mass only in the lower left corner or only in the upper right corner of the support rectangles. They provide a plot to show this in their data example. The original approach to doubly-censored data of De Gruttola and Lagakos assumes a set of discrete time points of mass for S (the generalization to higher order models was given by Sternberg and Satten). This largely avoids any problems of non-uniqueness (though some problems remain; see e.g. Griffin and Lagakos, 2010).
For the non-parametric approach, Sweeting et al seem extremely reluctant to make any assumptions whatsoever. In contrast, they go completely to town on assumptions once they get onto their preferred Bayesian method. It should firstly be mentioned that the outcome time is time to crossing of a biomarker and there is considerable auxiliary data available in the form of intermediate measurements. Thus, under any kind of monotonicity assumptions, we can see that extra modelling of the growth process of the biomarker has merit. Sweeting et al use a parametric, mixed effects growth model, to model the growth of the biomarker. The growth is measured in terms of time since seroconversion, which is unknown. They consider two methods: a naive method that assumes seroconversion occurs at the midpoint of the time interval and a uniform prior method that assumes a priori that the seroconversion time is distributed uniformly within the times at which it is interval censored (i.e. last time before seroconversion and the first biomarker measurement time). The first method is essentially like imputing the interval midpoint for X. The second method is like assuming the marginal distribution for X is uniform.
Overall, the paper comes across as a straw man argument against non-parametric methods appended onto a Bayesian analysis that makes multiple (unverified) assumptions.
Friday, 3 December 2010
Interpretability and importance of functionals in competing risks and multi-state models
1. Do not condition on the future.
2. Do not regard individuals at risk after they have died.
3. Stick to this world.
They identify several existing ideas that violate these principles. Unsurprisingly, the latent failure times model for competing risks rightly comes under fire for violating (3): to say anything about hypothetical survival distributions in the absence of the other risks requires making untestable assumptions. Semi-competing risks analysis, where one seeks the survival distribution for illness in the absence of death, has the same problem.
The subdistribution hazard from the Fine-Gray model violates principle 2 because it involves the form
The conditional probability function recently proposed by Allignol et al has similar problems with principle 2.
Principle 1 is violated in the pattern-mixture parametrisation. This is where we consider the distribution of event times conditional on the event type, e.g. the sojourn in state i given that the subject moved to state j. This is used, for instance, in flowgraph semi-Markov models.
A distinction that is perhaps needed, and isn't really made clear in the paper, is between violating the principles for mathematical convenience, e.g. for model fitting, and violating them in the actual inferential output. Functionals to be avoided should perhaps be those where no easily interpretable transformation to a sensible measure is available. Thus a pattern-mixture parametrisation for a semi-Markov model without covariates seems unproblematic, since we can retrieve the transition intensities. However, when covariates are present the transition intensities will have complicated relationships to the covariates, without an obvious interpretation.
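Retrieving the intensities from a pattern-mixture parametrisation is direct in the no-covariate case: with destination probabilities p_ij and conditional sojourn densities f_ij and survivor functions S_ij, the cause-specific intensity out of state i is lambda_ij(t) = p_ij f_ij(t) / sum_k p_ik S_ik(t). A sketch of this generic relationship (exponential conditionals used purely for illustration):

```python
import math

def transition_intensity(p, dens, surv, j, t):
    """Cause-specific transition intensity lambda_ij(t) recovered from a
    pattern-mixture parametrisation: destination probabilities p[k] with
    conditional sojourn densities dens[k] and survivors surv[k]."""
    return p[j] * dens[j](t) / sum(p[k] * surv[k](t) for k in p)

# Illustration with exponential conditional sojourns. If both destinations
# share the same rate, the process is Markov and lambda_ij(t) = p[j]*rate.
rate = 2.0
p = {1: 0.4, 2: 0.6}
dens = {k: (lambda t, r=rate: r * math.exp(-r * t)) for k in p}
surv = {k: (lambda t, r=rate: math.exp(-r * t)) for k in p}
lam = transition_intensity(p, dens, surv, 1, 0.7)   # approx 0.4 * 2.0 = 0.8
```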
**** UPDATE: The paper is now published in Statistics in Medicine. ****
Monday, 8 November 2010
Accounting for bias due to a non-ignorable tracing mechanism in a retrospective breast cancer cohort study
Tuesday, 26 October 2010
Multiple imputation for estimating the risk of developing dementia and its impact on survival
In essence this paper is a generalization of the method for interval-censored survival data proposed by Pan (2000; Biometrics). The idea is to start by imputing the mid-point of the interval for unknown event times and calculate the parameter estimates treating these times as known. Then, based on the new estimates, impute e.g. 20 different complete datasets, estimate the parameters from these complete datasets, and then re-impute. This process continues until the parameter estimates converge.
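A deterministic caricature of the scheme, for interval-censored exponential data, replaces the random imputations with conditional means, which turns it into an EM iteration (a toy sketch of the iterate-and-re-impute idea, not Yu et al.'s semi-parametric method):

```python
import math

def fit_interval_censored_exponential(intervals, tol=1e-10, max_iter=500):
    """Iteratively re-impute interval-censored exponential event times.
    Start from interval midpoints, then alternate: (1) rate MLE treating
    imputed times as exact; (2) re-impute each time as its conditional
    mean within (l, r) under the current rate (i.e. an EM step)."""
    times = [(l + r) / 2 for l, r in intervals]
    rate = len(times) / sum(times)
    for _ in range(max_iter):
        new_times = []
        for l, r in intervals:
            el, er = math.exp(-rate * l), math.exp(-rate * r)
            # E[T | l < T < r] for an exponential(rate) variable
            new_times.append(1 / rate + (l * el - r * er) / (el - er))
        new_rate = len(new_times) / sum(new_times)
        if abs(new_rate - rate) < tol:
            break
        rate = new_rate
    return rate
```

With the stochastic version, the convergence question raised below is real: each iteration re-draws the imputed datasets, so the estimates fluctuate randomly around the limit rather than settling at it.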
The authors consider data regarding mortality and the onset of dementia. They model post-dementia hazard as being proportional to pre-dementia hazard and additionally allow onset age to affect hazard (for the semi-Markov model).
A drawback of the approach, which was mentioned in the paper by Pan, but not seemingly referred to by Yu et al, is that the support points for the baseline intensities will depend entirely on the initial values chosen, i.e. the final baseline intensities will have support at a subset of the original support points. For small datasets, there may be a decay in the number of support points down to 1. Pan suggested using a Link estimate of the baseline hazard (i.e. linear smoothed Breslow estimate) and to impute estimates from this to avoid the problem.
A further issue is determining what exactly constitutes convergence of the parameter estimates. After all, at each iteration a random imputation of M datasets takes place. Hence, we can expect the new parameter estimates to have a certain degree of random variation compared to the previous iteration.
The authors should be commended for having provided code and a subset of the full data in the Supporting Material of the paper. However, there seems to be an error in the code which results in an overestimation of the cumulative disease onset hazard. The code obtains the Breslow estimate of the cumulative baseline hazard. It then estimates the hazard between event times to be
The analysis seems to be incorrect in that it treats all patients as left-truncated (i.e. healthy) at age 50 (presumably time 0 in the data is age 50). No deaths occur before age 71.6 (or 21.6 on the time scale used in the data). Even prevalent cases enter the study conditional on survival until their entry age.
A comparison with existing methods can be made by computing the NPMLE under a Markov assumption. For this we use methods developed by Gauzere (2000; PhD Thesis) and Frydman and Szarek (2009; Biometrics). For data removing the prevalent cases, a corrected version of the imputation code gives estimates that are reasonably similar to the NPMLE (see figure), whereas the uncorrected version substantially overestimates the cumulative incidence of dementia.

The results of the analysis of the data in the paper are close to those using the subsample data with the code, suggesting the same code was used for the full data. It is unclear whether the same code was used for the simulation data.
Finally, it should be noted that the method, while undoubtedly more straightforward to implement than either a semi-parametric maximum likelihood estimate or the penalized likelihood approach of Joly and Commenges, is nevertheless still quite slow. Moreover, as with the other methods, the complexity increases rapidly as the number of states in the model increases.
Thursday, 30 September 2010
Flexible hazard ratio curves for continuous predictors in multi-state models: an application to breast cancer data
The principal difficulty in fitting these models lies in obtaining appropriate values for the penalization parameters. However, since for right-censored data, both the Cox-Markov and Cox-semi-Markov models allow factorization of the likelihood into parts relating to the constituent transition intensities, methods from univariate survival analysis can be used. The R packages survival and BayesX are both used in the analysis.