Multi-state modelling: competing risks

Showing posts with label competing risks. Show all posts

Wednesday, 9 January 2013

Book Review of: Competing Risks and Multistate Models with R.

Ross Maller has written a book review of Beyersmann, Schumacher and Allignol's recent Springer book on Competing Risks and Multistate Models with R, published in Australian & NZ Journal of Statistics. This is primarily a rant against the cause-specific hazard approach to modelling competing risks. For instance cause specific hazards "do not explicitly take into account the obvious mixture of distributions inherent in the data." Moreover, the fact that assuming proportionality in cause-specific-hazards (CSHs) can lead to non-proportional, even crossing, relationships for cumulative incidence functions (CIFs), is painted as a terminal weakness to the approach.

Maller's main contribution to survival analysis is through models for cure fractions (see e.g. Maller and Zhou 1995) an approach that he is evidently very taken with. Apparently the correct approach to take in modelling competing risks data is to assume a finite mixture model, such that individuals in a particular class are only at risk of one particular failure rate. Moreover, the problem of covariates is claimed to be entirely solved by allowing proportional hazards within failure types, which Maller says is the approach taken by Fine and Gray (1999).

The entire nature of survival and event history analysis is in modelling the dynamics of the process. In most circumstances it is much more useful to be able to describe the process at time t given no event has occurred by time t than to describe the process conditional on a latent class membership. Moreover, in the vast majority of competing risks data, at least in medical contexts, all patients are at risk of all event types until experiencing an event. A mixture model could therefore only ever be viewed as a mathematical convenience. The fact that in practice a CSH method is actually substantially more convenient, particularly if a non- or semi-parametric approach is to be adopted, hardly aids the case for mixture models.

Maller is also misrepresenting the Fine-Gray approach which does not assume proportional hazards within failure types. The Larson-Dinse (1985) paper that Maller also cites does involves that approach. But that can lead to the same crossing cumulative incidence curves Maller takes issue with in the context of CSH. Fine-Gray assumes proportionality of the sub-distribution hazard for a particular cause. This does allow proportionality for that cause's corresponding CIF but, as a consequence, is unable to provide a covariate model for other CIFs that is guaranteed to lead to a feasible set of CIFs for all covariate values (ie. we can fit a Fine-Gray model to each cause of failure but the resulting models will be contradictory).

Fundamentally, whatever model that is postulated, we can find the implied cause-specific hazards. Assuming proportionality of the cause-specific hazards is obviously only a modelling assumption but in nearly all cases it will be a better starting point than assuming the existence of cure fractions.

Friday, 26 October 2012

Constrained parametric model for simultaneous inference of two cumulative incidence functions

Haiwen Shi, Yu Cheng and Jong-Hyeon Jeong have a new paper in Biometrical Journal. This paper is somewhat similar in aims to the pre-print by Hudgens, Li and Fine in that it is concerned with parametric estimation in competing risks models. In particular, the focus is on building models for the cumulative incidence functions (CIFs) but ensuring that the CIFs sum to less than 1 at the asymptote as time tends to infinity. Hudgens, Li and Fine dealt with interval censored data but without covariates. Here, the data are assumed to be observed up to right-censoring but the emphasis is on simultaneously obtaining regression models directly for each CIF in a model with two risks.

The approach taken in the current paper is to assume that the CIFs will sum to 1 at the asymptote, to model the cause 1 CIF using a modified three-parameter logistic function with covariates via an appropriate link function. The CIF for the second competing risk is assumed to also have a three-parameter logistic form, but covariates only affect this CIF through the probability of this risk ever occurring.

When a particular risk in a competing risks model is of primary interest, the Fine-Gray model is attractive because it makes interpretation of the covariate effects straightforward. The model of Shi et al seems to be for cases where both risks are considered important, but still seems to require that one risk be considered more important. The main danger of the approach seems to be that the model for the effect of covariates on the second risk may be unrealistic, but will have an effect on the estimates for the first risk. If we only care about the first risk the Fine-Gray model would be a safer bet. If we care about both risks it might be wiser to choose a model based on the cause-specific hazards, which are guaranteed to induce a model with well behaved CIFs albeit at the expense of some interpretability of the resulting CIFs.

Obtaining a model with a direct CIF effect for each cause seems an almost impossible task because, if we allow a covariate to effect the CIF in such a way that a sufficiently extreme covariate leads to a CIF arbitrarily close to 1, it must be having a knock-on effect on the other CIF. The only way around this would be to have a model that assigns maximal asymptote probabilities to the CIFs at infinity that are independent of any covariates e.g. $F_k(t;z) = p_k [1 - \{1-F^{*}_{0k}(t)\}^{\exp(\beta_k z)}]$ where $\inline F^{*}_{0k}(t)$ are increasing functions taking values in [0,1] and $\inline \sum p_k \leq 1$ . The need to restrict the $\inline p_k$ to be independent of covariates would make the model quite inflexible however.

Friday, 12 October 2012

Nonparametric estimation of the cumulative intensities in an interval censored competing risks model

Halina Frydman and Jun Liu have a new paper in Lifetime Data Analysis. This concerns non-parametric estimation for competing risks models under interval censoring. The problem of estimating the cumulative incidence functions (or sub-distribution functions) under interval censoring has been considered by Hudgens et al (2001) and involves an extension of the NPMLE for standard survival data under interval censoring.

The resulting estimates of the cumulative incidence functions are only defined up to increments on intervals. Moreover, the intervals by which the CIFs are defined are not the same for each competing risk. This causes problems if one wants to convert the CIFs into estimates of the cumulative cause-specific hazards. Frydman and Liu propose estimating the cumulative cause-specific hazards by first constraining the NPMLEs of the CIFs to have the same intervals of support (NB: this is just a sub-set of the set of all NPMLEs involving sub-dividing the intervals) and adopting a convention to distribute the increment within the resulting sub-intervals (they assume an equal distribution across sub-interval).

In addition they show that an ad-hoc estimator of the cumulative hazards based on the convention that the support of each interval of the NPMLE for each CIF occurs at its midpoint leads to biased results. They also show that their estimator has standard N^0.5 convergence when the support of the observation time distribution is discrete and finite.

Tuesday, 2 October 2012

Applying competing risks regression models: an overview

Bernhard Haller, Georg Schmidt and Kurt Ulm have a new paper in Lifetime Data Analysis. This reviews approaches to building regression models for competing risks data. In particular, they consider cause specific hazard regression, subdistribution hazard regression (via both the Fine-Gray model and pseudo-observations), mixture models and vertical modelling. The distinction between mixture models and vertical modelling is the order of the conditioning. In mixture models, P(D,T) = P(D)P(T|D)

imply a separate time to event model is developed for each cause of death. Whereas in vertical modelling, P(D,T) = P(T)P(D|T)

meaning there is an overall "all cause" model for survival with a time dependent model for the conditional risk of different causes. Vertical modelling fits in much nearer to the standard hazard based formulation used in classical competing risks. Haller et al also prefer it to mixture modelling for computational reasons. The authors conclude however that vertical modelling's main purpose is as an exploratory tool to check modelling assumptions which may be made in a more standard competing risks model. They suggest that in a study, particularly a clinical trial, it would be more appropriate to use a Cox model either on the cause-specific hazards or on the sub-distribution hazard. The choice between these two models would depend on the particular research question of interest.

Tuesday, 14 August 2012

Absolute risk regression for competing risks: interpretation, link functions, and prediction

Thomas Gerds, Thomas Scheike and Per Andersen have a new paper in Statistics in Medicine. To a certain extent this is a review paper and considers models for direct regression on the cumulative incidence function for competing risks data. Specifically models of the form $g\{F_1(t|X)\} = \beta_0(t) + \beta_1 X_1 + \ldots + \beta_K X_K$ where $\inline g(\cdot)$ is a known link function and $\inline F_1(t | X) = P(T \leq t, D = 1 | X)$ is the cumulative incidence function for event 1 given covariates X. The Fine-Gray model is a special case of this class of models, where a complementary log-log link is adopted. Approaches to estimation based on inverse probability of censoring weights and jackknife based pseudo-observations are considered. Model comparison based on predictive accuracy as measured through Brier score and model diagnostics based on extended models allowing time dependent covariate effects are also discussed.
The discussion gives a clear account of the various pros and cons of direct regression of the cumulative incidence functions. In particular, an obvious, although perhaps not always sufficiently emphasized issue is that if, in a model with two causes, a Fine-Gray (or other direct model) is fitted to the first cause, and another to the second cause, the resulting predictions will not necessarily have the property that $\hat{F}_1(t | X) + \hat{F}_2(t | X) \leq 1$ an issue that is not problematic if the second cause is essentially a nuisance issue, but obviously problematic if both causes are of interest. In such cases regression of the cause-specific-hazards is preferable even if it makes interpreting the effect on the cumulative intensity functions more difficult.

Tuesday, 31 July 2012

A multistate modelling approach for pancreatic cancer development in genetically high risk families

Kolamunnage-Dona, Vitone, Greenhalf, Henderson and Williamson have a new paper in Applied Statistics. This uses a competing risks model with shared frailties to model data on the progression of pancreatic cancer in the presence of clustering and informative censoring. Clustering is present due to data being available on patients from the same family groups. A patient having a resection causes censoring of the main event, time to pancreatic cancer, but is likely to be informative. This is dealt with by allowing the cause-specific hazards to depend on the same frailty term. The methodology in the paper is very similar to that of Huang and Wolfe (Biometrics, 2002), the only extension being that the current formulation allows for the possibility of time dependent covariates. It isn't clear what if any complication this adds to the original procedure in Huang and Wolfe. An MCEM algorithm is used where the E-step is approximated by using Metropolis-Hastings in order to calculate the required expected quantities. It's not clear what the authors mean when they say the Metropolis-Hastings step use "a vague prior for the frailty variance." Hopefully they mean an improper uniform prior as otherwise they would be pointlessly adding Bayesian features to an otherwise frequentist estimating procedure. On a related computational point, since the random effect for each cluster is one dimensional, I suspect using a Laplace approximation to compute the required integrals at each step would perform quite well and be a lot faster than using Metropolis-Hastings

In the analysis there does seem evidence for a shared frailty within clusters, but it appears that the parameter which links the frailty in the time to pancreatic cancer to the time to resection intensity is hard to identify having a very wide 95% confidence interval encompassing strong negative dependence through independence to strong positive dependence. The typical cluster size in the data is quite small (e.g. median is 3) and this is probably insufficient as you would ideally need some subjects to fail and some to be informatively censored in each cluster to assess their association. The point estimate is negative implying a counter intuitive negative association between resection and pancreatic cancer. The authors suggest as a model extension to allow a specific bivariate frailty linking competing risks (presumably within individuals?) - which is unlikely to be helpful.

Wednesday, 18 April 2012

Bayesian inference of the fully specified subdistribution model for survival data with competing risks

Miaomiao Ge and Ming-Hui Chen have a new paper in Lifetime Data Analysis. This considers methods for Bayesian inference in the Fine-Gray model for competing risks. In order to perform the Bayesian analysis it is necessary to fully specify the model for the competing risk(s) that are not of direct interest in the analysis. Ge and Chen propose a model in which for failure time $\inline T^{*}$ ,
$F_2(t) = P(T^{*} \leq t, \delta = 2) = M_2(t)P(\delta = 2)$ and a proportional hazards model affects $\inline M_2(t)$ via
$M_2(t | \mathbf{x}) = 1 - \exp{\{-H_{20}(t)\exp{(\mathbf{x}^{T}\beta_2)}\}}$ .
Note that $\inline \beta_2$ does not affect $\inline F_1(t)$ or $\inline P(\delta = 2)$ .
The authors consider approaches to non-parametric Bayesian inference using Gamma process priors, but also use piecewise constant hazard models (with fixed cut points) where the hazard in each time period has an independent gamma prior.

Monday, 26 March 2012

A note on the decomposition of number of life years lost according to causes of death

Per Kragh Andersen has a new paper available as a Department of Biostatistics, Copenhagen Research Report. He shows that the integral of the cumulative incidence function of a particular risk has an interpretation as the expected number of life years lost due to this cause, i.e.
$L_j(0,\tau) = \int_{0}^{\tau} F_j(t) dt = E(\tau - T_{(j)} \wedge \tau)$
It is argued that this is a more appropriate quantification of the effect of a cause-of-death than using a hypothetical estimate of life expectancy without a particular cause of death, which is reliant on an (untestable) assumption of independent competing risks.

Regression models based around expected "life years lost" are proposed using the pseudo-observations method.

Sunday, 20 November 2011

Isotonic estimation of survival under a misattribution of cause of death

Jinkyung Ha and Alexander Tsodikov have a new paper in Lifetime Data Analysis. This considers the problem of estimation of the cause specific hazard of death from a particular cause in the presence of competing risks and misattribution of cause of death. They assume they have right-censored data for which there is an associated cause of death, but that there is some known probability r(t) of misattributing the cause of death from a general cause to a specific cause (in this case pancreatic cancer) at time t.

The authors consider four estimators for the true underlying cause-specific hazards. Firstly they consider a naive estimator which obtains Nelson-Aalen estimates of the observed CSHs and transforms them to true hazards by solving the implied equations

$\begin{matrix} d\hat{\Lambda}^{Obs}_{1}(t) &=& r(t)d\Lambda_{2}(t) + d\Lambda_{1}(t)\\ d\hat{\Lambda}^{Obs}_{2}(t) &=& (1-r(t))d\Lambda_{2}(t)\\ \end{matrix}$

This estimator is unbiased but has the drawback that there are negative increments to the cause-specific hazards.
The second approach is to apply a (constrained) NPMLE estimate for instance via an EM algorithm. The authors show that, unless the process is in discrete time (such that the number of failures at a specific time point increases as the sample size increases), this estimator is asymptotically biased.
The third and fourth approaches take the naive estimates and apply post-hoc algorithms to ensure monotonicity of the cumulative hazards, by using the maximum observed naive cumulative hazard up to time t (sup-estimator) or by applying the pool-adjacent-violators algorithm to the naive cumulative hazard. These estimators have the advantage of being both consistent and guaranteed to be monotonic.

Wednesday, 24 August 2011

Maximum likelihood analysis of semicompeting risks data with semiparametric regression models

Yi-Hau Chen has a new paper in Lifetime Data Analysis which extends his 2010 JRSS B paper from competing risks to semi-competing risks data. Essentially in both cases the main idea is to model the dependence between the competing risks by assuming their event time distributions are related via some family of copulas. Mathematically this approach is quite elegant as it allows regression models to be built on the marginal distributions of each failure time, with the inherent dependency in the censoring accounted for through the copula. From a practical perspective, particularly with semi-competing risks data and medical applications one has to question the sensibleness of the model and the objective of modelling marginal distributions.

It seems most useful to follow Xu, Kalbfleisch and Tai and view semi-competing risks as an illness-death model. After accounting for covariates, a patient's illness time and death time can be related either due to a shared frailty term, which it may be sensible to assume is determined from the outset, or through onset of illness causing death to occur sooner than it would have done. In the copula model these two distinct factors get pooled together. It is questionable how well the copula model would perform when the true process has a more event determined dependence.

More importantly the question has to be asked why you would want to try and estimate the "illness free" survival distribution? This breaks Andersen and Keiding's guideline to "Stick to this world". Illness (or relapse) is never going to be eliminated. More sensible measures like the cumulative incidence function of death (without illness having occurred) can of course be derived from Chen's copula model, although analogously to the case of semi-parametric models on cause-specific hazards, the effect of covariates on the CIF may be complicated.

Monday, 8 August 2011

Joint modelling of longitudinal outcome and interval-censored competing risk dropout in a schizophrenia clinical trial

Ralitza Gueorguieva, Robert Rosenheck and Haiqun Lin have a new paper in JRSS A. The paper concerns the joint modelling of a longitudinal outcome and an interval censored competing risks outcome that explains drop-out. As is common with these joint longitudinal and survival types of models the two processes are linked via a normally distributed vector of random effects. The novelty of the paper is in the survival part is a competing risks process and the event time is interval censored. The authors adopt a parametric model for the competing risks, using the family of distributions proposed by Sparling et al (Biostatistics, 2006). This makes inference somewhat more straightforward than it would be if a non-parametric baseline cause-specific hazards were used. As recently noted, parametric treatment of competing risks data is surprisingly rare. One problem faced by the authors is that the hazard family of Sparling, while allowing closed form expressions for interval censored univariate survival data, do not result in closed form expressions for interval censored competing risks data (except in special cases). Instead a numerical integral has to be competed. The presence of the overall random effects would mean the likelihood requires nested integration. To avoid this problem the authors adopt an approximation to the true likelihood for competing risks data. If a patient is known to have had a failure of type j in the interval [t0,t1] the authors assume that the patient is censored of all risks except risk j at time t0. It is clear that this approximation will lead to systematic bias as the time at risk from each failure type will be underestimated so the hazards will tend to be overestimated. The amount of bias will depend on the typical length of the intervals [t0,t1].

For the CATIE data example the proposed approximation is probably not an issue. The drop out (competing risks) part of the model is not the primary focus of the inference, and it is really the relative hazards of different types of drop out rather than their absolute values that is important in determining the trajectories of the longitudinal measure without drop out. For instance the estimates for simulated data of a similar type are close to unbiased.
However in extreme cases like current status competing risks data the approximation will do extremely badly.

Friday, 10 June 2011

Comparison of prediction models for competing risks with time-dependent covariates

Giuliana Cortese, Thomas Gerds and Per Kragh Andersen have a new paper available as a University of Copenhagen Department of Biostatistics technical report. The paper concerns the development of models for prediction for competing risks in the presence of internal time dependent covariates. This is a follow-up to Cortese and Andersen's 2010 Biometrical Journal paper. Like the previous paper, the authors compare a multi-state modelling approach that explicitly models the progression of the (categorical) time dependent covariate and its effect on the cause specific hazards, and a landmarking approach that sets (arbitrarily chosen) time points $\inline t_1, \ldots, t_n$ and performs separate regressions to estimate the hazards for $\inline t \in [t_{r},t_{r+1}]$ conditional on the value of the time dependent covariate at time $\inline t_r$ . The authors consider two modelling approaches under landmarking, one based on Cox regression of cause-specific hazards and the other based on Fine-Gray subdistribution hazards.

They compare the predictive ability of the models to predict the outcome by landmark $\inline t_{r+1}$ given data up to $\inline t_r$ . This is assessed by using a time dependent Brier score (Gerds & Schumacher, 2006). Rather than use inverse probability weighting, the authors instead use a pseudo-value to estimate outcomes when a subject is lost to follow-up between $\inline t_r$ and $\inline t_{r+1}$ . The authors perform the comparison using a bone marrow transplant study where the competing events are relapse and death, and the internal time dependent covariate is the development of Graft versus Host Disease (GvHD). The predictive abilities are estimated via cross-validation involving randomly choosing 2/3 of patients as training data and using the remainder as test data, repeating the process 100 times. For the data considered the three methods performed equally well in terms of prediction error. As might be expected, there was significantly improved predictive ability of these models compared to one that ignored GvHD (i.e. only considered baseline time constant covariates).

As the authors note, there are advantages and disadvantages to both approaches. The multi-state modelling approach requires modelling of the covariate process (e.g. Markov or semi-Markov assumptions and proportional hazard assumptions on the effect of baseline covariates on transition rates through covariate states) and requires a categorical covariate. Landmarking can accommodate continuous covariates but relies on an arbitrary set of landmark times and requires fitting regressions at each landmark. The extra modelling required for the multi-state approach may either be a blessing, in terms of having the potential to give a greater insight into the whole process, or a curse (questions of robustness to incorrect modelling assumptions).

Tuesday, 10 May 2011

A proportional hazards regression model for the subdistribution with right-censored and left-truncated competing risks data

Xu Zhang, Mei-Jie Zhang and Jason Fine have a new paper in Statistics in Medicine. This covers the same ground as the paper by Geskus in Biometrics, in developing an approach to fitting the Fine-Gray proportional subdistribution hazard model for competing risks data with left truncated and right censored observations by using inverse probability weights (IPW). Bizarrely, the paper makes no reference at all to the Geskus paper. Presumably this is because the paper was first submitted in 2009 before Geskus's work was published (April 2010). However, it is strange that neither the authors nor the referees became aware of the work in the interim (i.e. acceptance of the paper wasn't until March 2011).

What is interesting is the differences between the approach taken in this paper compared to Geskus. The authors work on the basis that since X = min(T,C) is only observable if X > L, where T is the time of failure, L the time of left truncation and C the time of right censoring, the IPW should be calculated conditional on L < X. Zhang et al use a stabilised weight rather than the IPW to reduce the variability in the original weight. The weights they derive seem quite different to Geskus's as they depend on an estimate of overall survival, which will have to depend on the covariates if the semi-parametric model for the subdistribution hazard is to apply.
The authors suggest using Aalen additive hazard models for the overall survival (thus allowing for time varying covariate effects that can ensure the weights are consistent with the proportional subdistribution hazard model).

Zhang et al start from the general case where the truncation and censoring distributions depend on covariates (but are independent conditional on these covariates), though they only detail non-parametric estimates of the weights. Geskus argued that even if the censoring/truncation distribution depended on covariates that didn't imply it was necessary to include these covariates in the weightings.

Given these discrepancies it would be of interest to contrast and compare the two approaches to the same problem. If both approaches are effective, Geskus's seems preferable because the weights are much easier to calculate.

Wednesday, 23 February 2011

Estimating and testing for center effects in competing risks

Sandrine Katsahian and Christian Boudreau have a new paper in Statistics in Medicine. This develops methods for including frailty terms within a Fine-Gray competing risks model in order to account for clustering, e.g. effects of different centres.

Since the Fine-Gray model is essentially just a standard Cox proportional hazards regression model with additional time dependent weights, based on the censoring distribution, for individuals who have had a competing event, methods appropriate for standard Cox frailty models can be readily adapted.

Katsahian and Boudreau closely follow the approach taken by Ripatti and Palmgren (Biometrics, 2000). They assume a Gaussian frailty. Computation of the likelihood requires integrating out the frailty terms. Here this is performed using a Laplace approximation. A difficulty with the Laplace approximation is that it still requires the modal value of the frailty distribution conditional on the data and current values of the parameters. The authors therefore take a profile likelihood approach in which they fix the frailty variance $\inline \theta$ and maximize the likelihood term with respect to both the regression parameters $\inline \beta$ and the frailty terms, $\inline u$ . Having obtained, $\inline \hat\beta$ and $\inline \hat u$ they can then plug $\inline \hat u$ into the Laplace approximation to get the profile likelihood for $\inline \theta$ . The procedure gives a local approximation for $\inline \theta$ which can be used to suggest the updated estimate. Thus the process involves alternating between two Newton-Raphson algorithms until convergence.

Friday, 7 January 2011

Parametric Estimation of Cumulative Incidence Functions for Interval Censored Competing Risks Data

Peter Nyangweso[1], Michael Hudgens and Jason Fine have a new paper currently available as a University of North Carolina at Chapel Hill Department of Biostatistics Technical Report. This considers parametric modelling of cumulative incidence functions in the case of interval censored competing risks data. It is somewhat curious that while there has been quite a lot of work on interval-censored competing risks by modelling non-parametrically, little seems to have been done regarding the (presumably easier) problem of parametric modelling (although lots have work has been done for more general multi-state models).

Nyangweso and colleagues consider a parameterisation based on the cumulative incidence functions. This makes sense since the interval-censored likelihood can be directly written in terms of the CIFs. They consider a Gompertz model for each CIF taking the form:

$F_{k}(t;\Theta_k) = 1 - \exp[\beta_k \{1-\exp(\alpha_k t)\}/\alpha_k]$
Note that this allows for a proportion $\inline 1 - \exp(\beta_k/\alpha_k)$ who will never experience event k. An obvious complication is that a valid set of CIFs needs to have the property that $\inline \sum_{k}^{n_k} F_k(t;\Theta_k) \leq 1$ . It is noted that in practice the unconstrained MLE will be such that the sum of the CIFs will be less than 1 at the greatest time at which someone was right censored as otherwise the likelihood involves taking the log of a non-positive number. Some discussion of constrained optimization, where $\inline \lim_{t \rightarrow \infty }\sum_{k}^{n_k} F_k(t;\Theta_k) = 1$ , so that it is assumed that all patients must eventually experience on of the events (i.e. no cure fraction).

In addition to full likelihood estimation, the parametric analogue of the naive estimator investigated in Jewell et al (2003, Biometrika) is also considered. This just involves modelling each CIF separately as if it were univariate interval censored survival data. Obviously there is an even greater risk of CIFs summing to more than 1 for the naive estimates.

The paper does not get as far as considering models for covariates. The main complication with adding covariates to the Gompertz model seems to be that we would then be almost guaranteed to have covariate patterns where the CIFs sum to more than 1 even at times of interest. There is surely advantage in modelling the cause specific hazards as this way the CIFs are guaranteed to be valid at all times for all covariate values. While for most parametric hazards computation of the CIF requires numerical integration, the extra computation required shouldn't be prohibitive.

Update: The original paper appears to have been superseded by a new version with a slightly different set of authors (Hudgens, Li & Fine).

Thursday, 6 January 2011

Analyzing Competing Risk Data Using the R timereg Package

The first paper in the special issue of Journal of Statistical Software is by Thomas Scheike and Mei-Jie Zhang and illustrates the use of the R package timereg to fit the flexible direct binomial regression based competing risks models proposed by Scheike et al, Biometrika 2008. These models, which rely on inverse probability of censoring weights allow both proportional and additive models for the cumulative incidence functions. In particular, this offers a goodness-of-fit test for the Fine-Gray model, testing for time dependence in the covariate effects on the subdistribution hazard.
timereg offers some interesting functionality, for instance confidence bands on the cumulative incidence functions can be computed (albeit via bootstrap resampling).

The package timereg has other very useful functions not directly featured in the paper. In particular, it can fit Aalen additive hazard models which have yet to be used much for modelling transition intensities for multi-state models (Shu & Klein, Biometrika 2005 being the exception).

Friday, 3 December 2010

Interpretability and importance of functionals in competing risks and multi-state models

Per Kragh Andersen and Niels Keiding have a new paper currently available as a Department of Biostatistics, Copenhagen research report. They argue that three principles should be adhered to when constructing functionals of the transition intensities in competing risks and illness-death models. The principles are:

1. Do not condition on the future.
2. Do not regard individuals at risk after they have died.
3. Stick to this world.

They identify several existing ideas that violate these principles. Unsurprising, the latent failure times model for competing risks rightly comes under fire for violating (3), i.e. to say anything about hypothetical survival distributions in the absence of the other risks requires making untestable assumptions. Semi-competing risks analysis where one seeks the survival distribution for illness in the absence of death has the same problem.

The subdistribution hazard from the Fine-Gray model violates principle 2 because it involves the form $\inline P(X(t + dt) = j | X(t) \neq j)$ . Andersen and Keiding say this makes interpretation of regression parameters difficult because they are log(subdistribution hazard ratios). The problem seems to be that many practitioners interpret the coefficients as if they are standard hazard ratios. The authors go on to say that linking covariates directly to cumulative incidence functions is useful. The distinction between this and the Fine-Gray model is rather subtle as in the Fine-Gray model (when covariates are not time dependent): $\inline CIF(t; Z) = 1 - \exp{(-H(t)\exp{(b^{T}Z)})}$ i.e. b is essentially interpreted as a parameter in a cloglog model.
The conditional probability function recently proposed by Allignol et al has similar problems with principle 2.

Principle 1 is violated in the pattern-mixture parametrisation. This is where we consider the distribution of event times conditional on the event type, e.g. the sojourn in state i given the subject moved to state j. This is used for instance in flow-graph Semi-Markov models.

A distinction that is perhaps needed that isn't really made clear in the paper is that there is a difference between violating the principles for mathematical convenience e.g. for model fitting and violating the principles in terms of the actual inferential output. Functionals to be avoided should perhaps be those where no easily interpretable transformation to a sensible measure is available. Thus a pattern-mixture parametrisation for a semi-Markov model without covariates seems unproblematic, since we can retrieve the transition intensities. However, when covariates are present the transition intensities will have complicated relationships to the covariates without an obvious interpretation.

**** UPDATE : The paper is now published in Statistics in Medicine. *****

Monday, 1 November 2010

A regression model for the conditional probability of a competing event: application to monoclonal gammopathy of unknown significance

Arthur Allignol, Aurélien Latouche, Jun Yan and Jason Fine have a new paper in Applied Statistics (JRSS C). The paper concerns competing risks data and develops methods for regression analysis of the probability of a competing event conditional on no competing event having occurred. In terms of the cumulative incidence functions, for the case of two competing events, this can be written as $\inline CPF_1 = CIF_1/(1-CIF_2)$ . In some applications this quantity may be more useful than either the cause-specific hazards or the cumulative incidence functions themselves. One approach to regression is this scenario might be to compute pseudo-observations and perform the regression using those. The authors instead propose use of temporal process regression (Fine, Yan and Kosorok 2004), allowing estimation of time dependent regression parameters, by considering the cross-sectional data at each event time.

Monday, 14 June 2010

Regression analysis of censored data using pseudo-observations

Erik Parner and Per Kragh Andersen have a new paper available as a research report at the Department of Biostatistics, University of Copenhagen. This develops STATA routines for implementing the pseudo-observations method of performing direct regression modelling of complicated outcome measures (such as cumulative incidence functions or overall survival times) for multi-state models subject to right censoring. The paper is essentially the STATA equivalent of the 2008 paper by Klein et al which developed similar routines for SAS and R.

[Update: The paper is now published in The STATA Journal]

Friday, 9 April 2010

Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring

Ronald Geskus has a new paper in Biometrics. This extends the Fine and Gray model, for regression of the subdistribution hazards for competing risks models, to the case of left truncated (and right censored) data. Essentially it is shown that the standard estimator of the cumulative incidence function (CIF) can be derived as a inverse-probability-weighting estimator. Thus the Fine-Gray model for left-truncated and right-censored data can be obtained as a weighted Cox-model where individuals who are censored or experience a competing risk at time s, have weights at time t>s given by $\inline G(t)H(t)/G(s)H(s)$ where G(t) is the empirical CDF of the censoring distribution and H(t) is the empirical survivor distribution of the truncation distribution. This form appears to imply that independence is assumed between the censoring and truncation distributions. However, the equivalence of estimates of the CIF in the no-covariates case holds regardless of the relationship between censoring and truncation - only independence with failure times is required.

Links

Followers

Blog Archive