Multi-state modelling: semi-parametric

Showing posts with label semi-parametric. Show all posts

Wednesday, 9 January 2013

Book Review of: Competing Risks and Multistate Models with R.

Ross Maller has written a book review of Beyersmann, Schumacher and Allignol's recent Springer book on Competing Risks and Multistate Models with R, published in Australian & NZ Journal of Statistics. This is primarily a rant against the cause-specific hazard approach to modelling competing risks. For instance cause specific hazards "do not explicitly take into account the obvious mixture of distributions inherent in the data." Moreover, the fact that assuming proportionality in cause-specific-hazards (CSHs) can lead to non-proportional, even crossing, relationships for cumulative incidence functions (CIFs), is painted as a terminal weakness to the approach.

Maller's main contribution to survival analysis is through models for cure fractions (see e.g. Maller and Zhou 1995) an approach that he is evidently very taken with. Apparently the correct approach to take in modelling competing risks data is to assume a finite mixture model, such that individuals in a particular class are only at risk of one particular failure rate. Moreover, the problem of covariates is claimed to be entirely solved by allowing proportional hazards within failure types, which Maller says is the approach taken by Fine and Gray (1999).

The entire nature of survival and event history analysis is in modelling the dynamics of the process. In most circumstances it is much more useful to be able to describe the process at time t given no event has occurred by time t than to describe the process conditional on a latent class membership. Moreover, in the vast majority of competing risks data, at least in medical contexts, all patients are at risk of all event types until experiencing an event. A mixture model could therefore only ever be viewed as a mathematical convenience. The fact that in practice a CSH method is actually substantially more convenient, particularly if a non- or semi-parametric approach is to be adopted, hardly aids the case for mixture models.

Maller is also misrepresenting the Fine-Gray approach which does not assume proportional hazards within failure types. The Larson-Dinse (1985) paper that Maller also cites does involves that approach. But that can lead to the same crossing cumulative incidence curves Maller takes issue with in the context of CSH. Fine-Gray assumes proportionality of the sub-distribution hazard for a particular cause. This does allow proportionality for that cause's corresponding CIF but, as a consequence, is unable to provide a covariate model for other CIFs that is guaranteed to lead to a feasible set of CIFs for all covariate values (ie. we can fit a Fine-Gray model to each cause of failure but the resulting models will be contradictory).

Fundamentally, whatever model that is postulated, we can find the implied cause-specific hazards. Assuming proportionality of the cause-specific hazards is obviously only a modelling assumption but in nearly all cases it will be a better starting point than assuming the existence of cure fractions.

Friday, 23 November 2012

Ties between event times and jump times in the Cox model

Xin, Horrocks and Darlington have a new paper in Statistics in Medicine. This considers approaches for dealing with ties in Cox proportional hazard models, not between event times but between event times and changes to time dependent covariates.

If a change in a time-dependent covariate coincides with a failure time there is ambiguity over which value of the time dependent covariate, z(t+) or z(t-), should be taken for the risk set at time t. By convention, it is usually assumed that z(t-) should be taken, i.e. that the change in the covariate occurs after the failure time. The authors demonstrate that for small sample sizes and/or a large proportion of ties, the estimates can be sensitive to the convention chosen. The authors also only consider cases where z(t) is a binary indicator that jumps from 0 to 1 at some point and cannot make the reverse jump. Obviously this will magnify the potential for bias because the "change after" convention will always underestimate the true risk whereas the "change before" will always overestimate the true risk.

The authors consider some simple adjustments for the problem: compute the "change before" and "change after" estimates and take their average or use random jittering. A problem with the averaging approach is estimating the standard error of the resulting estimator. An upper bound can be obtained by assuming the two estimators have perfect correlation. The jittering estimator obviously has the problem that different random jitters will give different results, though in principle the jittering could be repeated multiple times and combined in a fashion akin to multiple imputation.

It is surprising that the further option of adopting an method akin to the Efron method for ties. Essentially at each failure time there is an associated risk set. It could be argued that every tied covariate jump time had a 50% chance of occurring before or after the failure time. The expected contribution from a particular risk set could then be $\sum_{r \in \mathcal{R}} 0.5 \exp\{\beta z_r(t+)\} + 0.5\exp\{\beta z_r(t-)\}$
It should also be possible to apply this approach using standard software, e.g. coxph() in R. It is simply necessary to replace any (start,stop) interval that ends with a tied "stop" with two intervals (start, stop - 0.00001) and (start, stop + 0.00001) each of which are associated with a weight of 0.5.

Sunday, 14 October 2012

Assessing age-at-onset risk factors with incomplete covariate current status data under proportional odds models

Chi-Chung Wen and Yi-Hau Chen have a new paper in Statistics in Medicine. This considers estimation of a proportional odds regression model for current status data in cases where a subset of the covariates may be missing at random for a subset of the patient population.

It is assumed that the probability that a portion of the covariates is missing depends on all the other observable outcomes (the failure status, the survey time and the rest of the covariate vector). The authors propose to fit a logistic regression model, involving all subjects in the dataset, for this probability of missingness. To fit the regression model for the current status data itself, they propose to use what they term a "validation likelihood estimator." This involves only working with the subset of patients with complete data but maximizing a likelihood that conditions on the fact that the whole covariate vector was observed. An advantage of using the proportional odds model over other candidate models (e.g. proportional hazards) is that the resulting likelihood remains of the proportional odds form.

Clearly a disadvantage of this "validation likelihood estimator" is that the data from subjects who have incomplete covariates is not used directly in the regression model. As a result the estimator is likely to be less efficient than approaches that effectively attempt to impute the missing covariate values. The authors argue that the validation likelihood approach will tend to be more robust since it is not necessary to make (parametric) assumptions about the conditional distribution of the missing covariates.

Tuesday, 2 October 2012

Effect of an event occurring over time and confounded by health status: estimation and interpretation. A study based on survival data simulations with application on breast cancer

Alexia Savignoni, David Hajage, Pascale Tubert-Bitter and Yann De Ryckea have a new paper in Statistics in Medicine. This considers developing illness-death type models to investigate the effect of pregnancy on the risk of recurrence of cancer amongst breast cancer patients. The authors give a fairly clear account of different potential models with particular reference to the hazard ratio $HR(t) = \frac{\lambda_{23}(t,\mathbf{z})}{\lambda_{13}(t,\mathbf{z})}$ The simplest model to consider is a Cox model with a single time dependent covariate representing pregnancy, here $\inline HR(t) = \exp(\delta)$ . This can be extended by assuming non-proportional hazards which effectively makes the effect time dependent i.e. $\inline HR(t) = \exp(\delta(t))$ . Alternatively, an unrestricted Cox-Markov model could be fitted with separate covariate effects and non-parametric hazards from each pregnancy state, yielding: $HR(t) = \exp[(\beta_{23} - \beta_{13})^{T} \mathbf{z}] \frac{\lambda_{23}(t)}{\lambda_{13}(t)}$ This model can be restricted by allowing a shared baseline hazard for $\inline \lambda_{13}, \lambda_{23}$ giving either $\inline HR(t) = \exp{\[(\beta_{23} - \beta_{13})^{T} \mathbf{z} + \delta]}$ under a Cox model with a fixed effect or $\inline HR(t) = \exp{\[(\beta_{23} - \beta_{13})^{T} \mathbf{z} + \delta(t)]}$ for a time dependent effect.

If we were only interested in $\inline HR(t)$ and any of these models seems feasible, there doesn't actual seem that much point in formulating the model as an illness-death model. Note that the transition rate $\inline \lambda_{12}(t,z)$ does not feature in any of the above equations but would be estimated in the illness-death model. The above models can be fitted by a Cox model with a time dependent covariate (representing pregnancy) that has an interaction with the time fixed covariates. The real power of a multi-state model approach would only become apparent if we were interested in the overall survival for different covariates, treating pregnancy as a random event.

The time dependent effects $\inline \delta(t)$ are represented simply via a piecewise constant time indicator in the model. The authors do acknowledge that a spline model would have been better. The other issue that could have been considered is whether the effect of pregnancy depends on time since initiation of pregnancy (i.e. a semi-Markov effect). An issue in their data example is that pregnancy is only determined via a successful birth meaning there may be some truncation in the sample (through births prevented due to relapse/death).

Applying competing risks regression models: an overview

Bernhard Haller, Georg Schmidt and Kurt Ulm have a new paper in Lifetime Data Analysis. This reviews approaches to building regression models for competing risks data. In particular, they consider cause specific hazard regression, subdistribution hazard regression (via both the Fine-Gray model and pseudo-observations), mixture models and vertical modelling. The distinction between mixture models and vertical modelling is the order of the conditioning. In mixture models, P(D,T) = P(D)P(T|D)

imply a separate time to event model is developed for each cause of death. Whereas in vertical modelling, P(D,T) = P(T)P(D|T)

meaning there is an overall "all cause" model for survival with a time dependent model for the conditional risk of different causes. Vertical modelling fits in much nearer to the standard hazard based formulation used in classical competing risks. Haller et al also prefer it to mixture modelling for computational reasons. The authors conclude however that vertical modelling's main purpose is as an exploratory tool to check modelling assumptions which may be made in a more standard competing risks model. They suggest that in a study, particularly a clinical trial, it would be more appropriate to use a Cox model either on the cause-specific hazards or on the sub-distribution hazard. The choice between these two models would depend on the particular research question of interest.

Tuesday, 14 August 2012

Absolute risk regression for competing risks: interpretation, link functions, and prediction

Thomas Gerds, Thomas Scheike and Per Andersen have a new paper in Statistics in Medicine. To a certain extent this is a review paper and considers models for direct regression on the cumulative incidence function for competing risks data. Specifically models of the form $g\{F_1(t|X)\} = \beta_0(t) + \beta_1 X_1 + \ldots + \beta_K X_K$ where $\inline g(\cdot)$ is a known link function and $\inline F_1(t | X) = P(T \leq t, D = 1 | X)$ is the cumulative incidence function for event 1 given covariates X. The Fine-Gray model is a special case of this class of models, where a complementary log-log link is adopted. Approaches to estimation based on inverse probability of censoring weights and jackknife based pseudo-observations are considered. Model comparison based on predictive accuracy as measured through Brier score and model diagnostics based on extended models allowing time dependent covariate effects are also discussed.
The discussion gives a clear account of the various pros and cons of direct regression of the cumulative incidence functions. In particular, an obvious, although perhaps not always sufficiently emphasized issue is that if, in a model with two causes, a Fine-Gray (or other direct model) is fitted to the first cause, and another to the second cause, the resulting predictions will not necessarily have the property that $\hat{F}_1(t | X) + \hat{F}_2(t | X) \leq 1$ an issue that is not problematic if the second cause is essentially a nuisance issue, but obviously problematic if both causes are of interest. In such cases regression of the cause-specific-hazards is preferable even if it makes interpreting the effect on the cumulative intensity functions more difficult.

Friday, 10 August 2012

Semiparametric transformation models for current status data with informative censoring

Chyong-Mei Chen, Tai-Fang Lu, Man-Hua Chen and Chao-Min Hsu have a new paper in Biometrical Journal. In common with the recent paper by Wang et al, this considers the estimation of current status data under informative censoring. Here, rather than using a copula to describe the dependence between censoring and failure time distributions, a shared frailty is assumed between the censoring intensity and the failure intensity. The frailty is assumed to be log-normal such that $\lambda_{Ti}(t) = \lambda_{T}(t)\exp(b_i)$ and $\lambda_{Ci}(t) = \lambda_{C}(t)\exp(b_i)$ Covariate effects are also allowed for via a general class of transformation models. For estimation, the authors approximate the semi-parametric maximum likelihood estimate by assuming that the conditional intensities, for the censoring and failure events, are piecewise constant functions with an arbitrarily chosen set of change points. Since maximization of the likelihood requires estimation of a large number of unknown parameters and integration of the frailty distribution, the authors propose an EM algorithm. The method attempts to non-parametrically estimate the failure and censoring time distributions and also the variance of the frailty term. While the assumed dependence between T & C is reasonably restrictive, ie. the frailty could feasible have appeared as $\inline \exp(\psi b_i)$ within the intensity for C allowing other types of dependence. Nevertheless, even with the restrictions it is not clear how the overall model is identifiable. We can only observe $\inline (X_i, \delta_i)$ where $\inline \delta_i$ is an indicator for whether the failure has occurred by censoring time $\inline X_i$ . Log-normal frailties are not particularly nice computationally, whereas a Gamma frailty would allow some tractability. In the case of a shared $\inline \Gamma(v,v)$ frailty it can be shown that the marginal distribution of censoring times is $P(X = x) = \lambda_C(x) \left\[ \frac{v}{\Lambda_C(x) + v}\right\]^{v+1}$ and the conditional probability of a failure by time X is given by $P(T \leq X | X = x) = 1 - \left( \frac{\Lambda_C(x) + v}{\Lambda_C(x) + \Lambda_T(x) + v} \right )^{v+1}$ The problem is that we can vary the value of v and find new cumulative intensity functions which will result in the same distribution functions. The addition of covariates via a particular model facilitates some degree of identifiability but, in a similar way to frailty terms in univariate Cox-proportional hazard models, this could just as easily be describing misspecification of the covariate model rather than true dependence.

Monday, 2 July 2012

Nonparametric estimation of current status data with dependent censoring

Chunjie Wang, Jianguo Sun, Liuquan Sun, Jie Zhou and Dehui Wang have a new paper in Lifetime Data Analysis. This considers estimation of the survivor distribution for current status data when there is dependence between the observation and survival times. Standard current status data models assume that the observation time X is independent of the survival time T. The authors note that from current status data it is possible to estimate the distribution of the observation times $G(x) = P(X \leq x)$ and p_1(x) = P(X>x,T<x)

(or with an analogous quantity: p_1(x) = P(X<x,T<x)

) and that these quantities uniquely define the marginal distribution of T, the (somewhat large) caveat being that the copula linking F and G must be fully specified. To estimate F(x) from observed data, they suggest considering the identity: $P(T < X, X < x) = \int_{0}^{x} C_{v}(F(y),G(y)) dG(y)$ where $\inline C_{v}(u,v) = \frac{\partial}{\partial v} C(u,v)$ and replacing the left hand side and G(x) with empirical quantities to obtain $\sum_{l=1}^{j} C_{v}[F(x_l),\hat{G}(x_l)]\hat{g}(x_l) = \hat{p}_2(x_j)$ where $\inline \hat{p}_2(t) = \frac{1}{n} \sum_{i} I(X_i < t, \delta_i = 1)$ , which is then solved for F(x). This approach to estimation seems a little bit clunky particularly because the resultant F(x) are not guaranteed to be monotonically increasing in x and do not seem to be guaranteed to be in [0,1] either. While they suggest a modification to let $\tilde{F}_1(x_j) = \max\{\hat{F}(x_l); l=1,\ldots,j\}$ to coerce the estimate to be monotonic, it seems that a more efficient estimator would use some variant of the pool-adjacent-violators algorithm at some juncture. The need to fully specify the copula is similar to the situation with misclassified current status data where it is necessary to know the error probabilities. As a sensitivity analysis it has some similarities to the approach for assessing dependent censoring in right-censored parametric survival models by Siannis et al. In the discussion the authors mention the possibility of extension to more general interval censored data. Once there are repeated observations from an individual there may be greater scope to estimate the degree of dependency between observations and the failure time, although an increased amount of modelling of the observation process would probably be required.

Wednesday, 25 April 2012

Use of alternative time scales in Cox proportional hazard models: implications for time-varying environmental exposures

Beth Griffin, Garnet Anderson, Regina Shih and Eric Whitsel have a new paper in Statistics in Medicine. The paper investigates the use of different time scales (e.g. other than either study time or patient age) in cohort studies analysed via Cox proportional hazard models. Of particular focus is the use of calendar time as an alternative time scale, with a motivation in the variation of environmental exposures over time. They perform a simulation study considering two scenarios for the relationship between a time varying environmental exposure variable and calendar year. In the first scenario these are made independent, while the second scenario assumes a linear relationship. As one might expect, when there is no correlation between calendar time and the time dependent environmental exposure, estimates are unbiased regardless of the choice of time scale. When a linear relationship exists then models that account for calendar time, either as the primary time scale or as additional covariates in the model. Again, this isn't necessarily surprising because the model is effectively attempting to include a year effect twice, once in the baseline hazard and again as a large component of the time dependent covariate, e.g. you are fitting a model with "mean environmental exposure in year t" and "environmental exposure" as covariates and expecting the latter to have the correct coefficient. The paper only gives a simulation study, I don't think it would have been that hard to have given some basic theoretical results in addition to the simulations.
The conclusion of the paper, albeit with caveats, is that attempting to adjust for calendar time because you suspect the environmental exposure may be correlated with time is not useful. Clearly if there are other reasons to suspect that calendar year may be important to the hazard in a study then there is an inherent lack of information in the study to establish whether the environmental exposure is directly affecting the hazard or whether it is an indirect effect due to the association with calendar time. Ideally, one would look for other calendar time dependent covariates (e.g. prevailing treatment policy regimes etc.) and perhaps try directly adjusting for them rather than calendar time itself.

Tuesday, 27 March 2012

Modeling Left-truncated and right-censored survival data with longitudinal covariates

Yu-Ru Su and Jane-Ling Wang have a new paper in the Annals of Statistics. This considers the problem of modelling survival data in the presence of intermittently observed time varying covariates when the survival times are both left truncated as well as right-censored. They consider a joint model which involves assuming there exists a random effect which influences both the longitudinal covariate values (which are assumed to be a function of the random effects plus Gaussian error) and the survival hazard. Considerably work has been done in this area in cases where the survival times are merely right-censored (e.g. Song, Davidian and Tsiatis, Biometrics 2002). The authors show that the addition of left-truncation complicates inference quite considerably; firstly because the parameters affecting the longitudinal component may not be identifiable and secondly because the score equations for the regression and baseline hazard parameters become much more complicated than in the right-censoring case. To alleviate this problem, the authors propose to use a modified likelihood rather than either the full or conditional likelihood. The full likelihood can be expressed in terms of an integral over the conditional distribution of the random effect, given the event time occurred after the truncation time. The proposed modification is to instead integrate over the unconditional random effect distribution. Heuristically this is justified by noting that
$f_{A^{*}}(a | Y^{*} \geq t) = f_{A^{*}}(a)S_{Y}(t|A^{*}=a)/S_{Y}(t)$ and
$E\left(S_{Y}(t|A^{*}=a)/S_{Y}(t) \right) = 1, \forall t \geq 0$ where $\inline A^{*}$ is the random effect. The authors also show inference based on this modified likelihood gives consistent and asymptotically efficient estimators of the regression parameters and the baseline survival hazard.

An EM algorithm to obtain the MMLE is outlined, in which the E-step involves a multi-dimensional integral which the authors evaluate through Monte Carlo approximation. The implementation of the EM algorithm is simplified if the random effect is assumed to have a multivariate Normal distribution.

Wednesday, 14 March 2012

Regression analysis based on conditional likelihood approach under semi-competing risks data

Jin-Jian Hsieh and Yu-Ting Huang have a new paper in Lifetime Data Analysis. This develops a conditional likelihood approach to fitting regression models with time-dependent covariate effects for the time to the non-terminal event in a semi-competing risks model. In line with some other recent treatments of competing and semi-competing risks (e.g. Chen, 2012), the authors use a copula to model the dependence between the times to the competing events. The authors express the data at a particular time point t in terms of indicator functions
$\inline I_{x}(t) = I(X > t)$ and $\inline I_{Y}(t) = I(Y > t)$ where $\inline X = T_1 \wedge T_2 \wedge C$ refers to the time of the non-terminal event (or its censoring time) and $\inline Y = T_2 \wedge C$ refers to the terminal event (or its censoring time). The authors show that the likelihood for the data at t can be expressed in terms of a term relating solely to the $\inline I_{Y}(t)$ , which depends only on the covariate function for the terminal event, and a term based on $\inline I_{X}(t) | I_{Y}(t)$ which contains all the information on the covariate function of interest for the non-terminal event. They therefore propose to base estimation on maximization on a conditional likelihood based on this conditional term only. The authors allow the copula itself to have a time specific copula dependence parameter. Solving the score function at a particular value of t gives consistent estimates of the parameters so the authors adopt a "working independence" model to obtain estimates across the sequence of times. The resulting estimates are step-functions that only change at observed event times.

Presumably allowing the copula dependence to be time varying could lead to situations where, for instance, $\inline P(T_1 > s, T_2 > t | Z)$ , is not a decreasing function in s for fixed t and Z. So whilst allowing the copula dependence to vary is convenient computationally, it is unclear how the model would be interpreted if the dependence parameter was estimated to vary considerably (perhaps that the chosen copula family were inappropriate?).

As usual, with these models that ascribe an explicit dependence structure between the competing event times, one has to ask whether the marginal distribution of the non-terminal event is what we are really interested in and whether we should not instead be sticking to observable quantities like the cumulative incidence function?

Sunday, 1 January 2012

Bayesian analysis of multistate event history data: beta-Dirichlet process prior

Yongdai Kim, Lancelot James and Rafael Weissbach have a new paper in Biometrika. This develops a conjugate prior process suitable for non-parametric and semi-parametric Bayesian modelling of right-censored multi-state Markov data. The model is parametrised in terms of the sum of the intensities out of each state
$A_{h.}(t) = \sum_{j \neq h} A_{hj}(t)$
and instantaneous transition probabilities
$p_{hj}(t) = dA_{hj}(t)/dA_{h.}(t)$
A possible choice for a prior process is a Dirichlet distribution but this is not independent in the limit of a continuous time process. Instead the authors propose a new beta-Dirichlet process consisting of a beta distributed part which determines the increment in $\inline A_{h.}$ (between 0 and 1) and a Dirichlet part determining the instantaneous transition probabilities for each particular transition. The authors prove this prior process is conjugate in the continuous limit.

A semi-parametric regression model is proposed, which the authors term as a semi-proportional intensities model. This consists of a proportional intensities model for the all-cause hazard of exiting state h and a multinomial type model for the instantaneous transition probabilities out of state h and bears some resemblance to the vertical modeling parametrization for competing risks regression.

In an aside the authors claim that interval censoring can easily be dealt with by treating the unknown transition time as missing data that can be accounted for in the Gibbs sampling. This only works under the assumption that only one transition can have occurred between examination times. While other authors have made this assumption (e.g. Foucher et al 2007) it is dubious to say the least and likely to result in biased estimates. Similarly, the authors claim right-censoring can be dealt with by treating a censoring event as an additional state. While this will obviously allow the observed process to be modelled, it is not clear how this approach would allow the underlying process (without censoring) is estimated?

Friday, 14 October 2011

Shape constrained nonparametric estimators of the baseline distribution in Cox proportional hazards model

Hendrik Lopuhaa and Gabriela Nane have a preprint on the Arxiv that considers estimators for the baseline hazard of the Cox proportional hazards model, subject to monotonicity constraints.

The basic idea is fairly straightforward. Using a standard NPMLE argument, the estimated hazard takes the form of a step function which is 0 before the first event time, and then piecewise constant between subsequent event times. For a fixed value of the regression coefficients, $\inline \beta$ , the baseline hazards can be found by performing a weighted pooled adjacent violators algorithm taking the weights as the (covariate corrected) time at risk in the next period and response as the empirical hazard in the next period i.e. 1/(time at risk) if the event was a failure and 0 if it was a censoring.

The authors argue that since Cox regression will give a consistent estimate of $\inline \beta$ regardless of whether the baseline hazard is monotone or not, they propose a two-stage approach where one estimates beta using a standard Cox partial likelihood and then uses this value of $\inline \beta$ to obtain the monotone baseline hazard. Obviously this estimator will have the same asymptotic properties as one based on maximizing the full likelihood jointly. Naively, a profile likelihood approach would seem possible since calculating the likelihood conditional on $\inline \beta$ is straightforward (though it is not clear whether it would be differentiable). Interestingly some quick simulations on Weibull data with shape>1 seem to suggest the full likelihood estimator of $\inline \beta$ (using the monotonicity constraint) is more biased and less efficient for small samples.

A substantial proportion of the paper is dedicated to obtaining the asymptotic properties of the estimators, which are non-standard and require empirical process theory. There is also some discussion of obtaining baseline hazards based on an increasing density constraint via analogous use of the Grennader estimator. Update: This paper has now been published in the Scandinavian Journal of Statistics.

Multi-state modelling

Links

Followers

Blog Archive