Showing posts with label Lifetime Data Analysis.

Friday, 12 October 2012

Nonparametric estimation of the cumulative intensities in an interval censored competing risks model


Halina Frydman and Jun Liu have a new paper in Lifetime Data Analysis. This concerns non-parametric estimation for competing risks models under interval censoring. The problem of estimating the cumulative incidence functions (or sub-distribution functions) under interval censoring has been considered by Hudgens et al (2001) and involves an extension of the NPMLE for standard survival data under interval censoring.

The resulting estimates of the cumulative incidence functions are only defined up to increments on intervals. Moreover, the intervals by which the CIFs are defined are not the same for each competing risk. This causes problems if one wants to convert the CIFs into estimates of the cumulative cause-specific hazards. Frydman and Liu propose estimating the cumulative cause-specific hazards by first constraining the NPMLEs of the CIFs to have the same intervals of support (NB: this is just a subset of the set of all NPMLEs, obtained by sub-dividing the intervals) and then adopting a convention to distribute the increments within the resulting sub-intervals (they assume an equal distribution across the sub-intervals).
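As a rough illustration of this convention (the helper below is hypothetical, not the authors' code), the refinement-and-splitting step might look like the following: each interval's CIF mass is split over the sub-intervals induced by a common set of cut points, with an equal share per sub-interval.

```python
def refine_increments(intervals, increments, cuts):
    """intervals: list of (left, right) support intervals for one risk;
    increments: CIF mass assigned to each interval;
    cuts: sorted common cut points covering all intervals."""
    refined = []
    for (l, r), mass in zip(intervals, increments):
        # sub-intervals of (l, r] induced by the common cut points
        inner = [c for c in cuts if l < c < r]
        pts = [l] + inner + [r]
        subs = list(zip(pts[:-1], pts[1:]))
        share = mass / len(subs)  # equal distribution across sub-intervals
        refined.extend(((a, b), share) for a, b in subs)
    return refined

# Example: mass 0.4 on (0, 3], refined at the common cut point 1
print(refine_increments([(0, 3)], [0.4], [0, 1, 3]))
# → [((0, 1), 0.2), ((1, 3), 0.2)]
```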

In addition they show that an ad hoc estimator of the cumulative hazards, based on the convention that the mass within each interval of the NPMLE for each CIF is placed at its midpoint, leads to biased results. They also show that their estimator has standard √n convergence when the support of the observation time distribution is discrete and finite.

Tuesday, 2 October 2012

Applying competing risks regression models: an overview

Bernhard Haller, Georg Schmidt and Kurt Ulm have a new paper in Lifetime Data Analysis. This reviews approaches to building regression models for competing risks data. In particular, they consider cause-specific hazard regression, subdistribution hazard regression (via both the Fine-Gray model and pseudo-observations), mixture models and vertical modelling. The distinction between mixture models and vertical modelling is the order of the conditioning: in mixture models a separate time-to-event model is developed for each cause of death, whereas in vertical modelling there is an overall "all cause" model for survival together with a time-dependent model for the conditional risk of the different causes. Vertical modelling fits in much more naturally with the standard hazard-based formulation used in classical competing risks, and Haller et al also prefer it to mixture modelling for computational reasons. The authors conclude, however, that vertical modelling's main purpose is as an exploratory tool to check modelling assumptions that may be made in a more standard competing risks model. They suggest that in a study, particularly a clinical trial, it would be more appropriate to use a Cox model either on the cause-specific hazards or on the subdistribution hazard, the choice between the two depending on the particular research question of interest.
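To give a feel for the vertical factorisation (purely a sketch with made-up hazard and cause-probability functions, not the authors' code), the cause-k cumulative incidence is built up as F_k(t) = ∫ π_k(u) λ(u) S(u) du, where λ is the all-cause hazard and π_k the conditional probability of cause k at time u:

```python
import numpy as np

def vertical_cifs(lam, pis, grid):
    """lam: all-cause hazard function; pis: list of functions pi_k(t)
    summing to 1; grid: increasing time grid starting at 0."""
    dt = np.diff(grid)
    mid = (grid[:-1] + grid[1:]) / 2
    haz = lam(mid)
    # all-cause survival S(t) on the grid
    surv = np.exp(-np.cumsum(np.concatenate([[0.0], haz * dt])))
    cifs = []
    for pi in pis:
        inc = pi(mid) * haz * surv[:-1] * dt  # pi_k * lam * S on each step
        cifs.append(np.concatenate([[0.0], np.cumsum(inc)]))
    return surv, cifs

# With a constant hazard and a constant 0.3/0.7 cause split, the two CIFs
# approach 0.3 and 0.7 as t grows large.
grid = np.linspace(0, 50, 5001)
surv, (F1, F2) = vertical_cifs(lambda t: 0.2 * np.ones_like(t),
                               [lambda t: 0.3 * np.ones_like(t),
                                lambda t: 0.7 * np.ones_like(t)], grid)
```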

Monday, 2 July 2012

Nonparametric estimation of current status data with dependent censoring

Chunjie Wang, Jianguo Sun, Liuquan Sun, Jie Zhou and Dehui Wang have a new paper in Lifetime Data Analysis. This considers estimation of the survivor distribution for current status data when there is dependence between the observation and survival times. Standard current status data models assume that the observation time X is independent of the survival time T. The authors note that from current status data it is possible to estimate the distribution G of the observation times and the sub-distribution function of subjects observed to have failed, P(T ≤ X, X ≤ x) (or, analogously, P(T > X, X ≤ x)), and that these quantities uniquely determine the marginal distribution F of T, the (somewhat large) caveat being that the copula linking F and G must be fully specified. To estimate F(x) from the observed data, they suggest an identity expressing the estimable sub-distribution function in terms of the copula, F and G, replacing the left-hand side and G(x) with their empirical counterparts, and then solving for F(x). This approach to estimation seems a little clunky, particularly because the resulting estimates of F(x) are not guaranteed to be monotonically increasing in x and do not seem to be guaranteed to lie in [0,1] either. While they suggest a modification to coerce the estimate to be monotonic, it seems that a more efficient estimator would use some variant of the pool-adjacent-violators algorithm at some juncture.

The need to fully specify the copula is similar to the situation with misclassified current status data, where it is necessary to know the error probabilities. As a sensitivity analysis it has some similarities to the approach for assessing dependent censoring in right-censored parametric survival models by Siannis et al. In the discussion the authors mention the possibility of extension to more general interval censored data. Once there are repeated observations from an individual there may be greater scope to estimate the degree of dependency between the observation and failure times, although an increased amount of modelling of the observation process would probably be required.
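To give a feel for the general recipe, not the paper's actual identity: suppose, purely for illustration, that the estimating equation at a point x takes the form C_θ(F(x), G(x)) = H(x), with C_θ a fully specified copula (here Clayton, an assumption) and H(x), G(x) replaced by empirical estimates. F(x) is then recovered by one-dimensional root-finding at each x, which also shows why nothing forces the pointwise solutions to be monotone in x:

```python
def clayton(u, v, theta):
    # Clayton copula C_theta(u, v), increasing in u for fixed v
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def solve_F(H_hat, G_hat, theta, tol=1e-10):
    """Find f in (0, 1) with C_theta(f, G_hat) = H_hat by bisection."""
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if clayton(mid, G_hat, theta) < H_hat:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# With theta = 1, C(f, 0.5) = 0.2 has the exact solution f = 0.25
print(solve_F(H_hat=0.2, G_hat=0.5, theta=1.0))
```

Because each x is solved separately, a pooling step (such as pool-adjacent-violators) would be needed afterwards to enforce monotonicity, which is the issue raised above.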

Wednesday, 14 March 2012

Regression analysis based on conditional likelihood approach under semi-competing risks data

Jin-Jian Hsieh and Yu-Ting Huang have a new paper in Lifetime Data Analysis. This develops a conditional likelihood approach to fitting regression models with time-dependent covariate effects for the time to the non-terminal event in a semi-competing risks model. In line with some other recent treatments of competing and semi-competing risks (e.g. Chen, 2012), the authors use a copula to model the dependence between the times to the competing events. The authors express the data at a particular time point t in terms of indicator functions of whether the non-terminal event (or its censoring time) and the terminal event (or its censoring time) have occurred by t. They show that the likelihood for the data at t can be expressed as the product of a term relating solely to the terminal event indicators, which depends only on the covariate function for the terminal event, and a conditional term which contains all the information on the covariate function of interest for the non-terminal event. They therefore propose to base estimation on maximizing a conditional likelihood comprising this conditional term only. The authors allow the copula itself to have a time-specific dependence parameter. Solving the score function at a particular value of t gives consistent estimates of the parameters, so the authors adopt a "working independence" model to obtain estimates across the sequence of times. The resulting estimates are step functions that only change at observed event times.

Presumably allowing the copula dependence to be time varying could lead to situations where, for instance, the implied conditional survivor function of the non-terminal event time is not a decreasing function of s for fixed t and Z. So whilst allowing the copula dependence to vary is convenient computationally, it is unclear how the model would be interpreted if the dependence parameter was estimated to vary considerably (perhaps that the chosen copula family is inappropriate?).

As usual with these models that ascribe an explicit dependence structure between the competing event times, one has to ask whether the marginal distribution of the non-terminal event is really what we are interested in, or whether we should instead stick to observable quantities like the cumulative incidence function.

Sunday, 20 November 2011

Isotonic estimation of survival under a misattribution of cause of death

Jinkyung Ha and Alexander Tsodikov have a new paper in Lifetime Data Analysis. This considers the problem of estimation of the cause specific hazard of death from a particular cause in the presence of competing risks and misattribution of cause of death. They assume they have right-censored data for which there is an associated cause of death, but that there is some known probability r(t) of misattributing the cause of death from a general cause to a specific cause (in this case pancreatic cancer) at time t.

The authors consider four estimators of the true underlying cause-specific hazards. Firstly they consider a naive estimator which obtains Nelson-Aalen estimates of the observed cause-specific hazards and transforms them to the true hazards by solving the implied equations. In the notation used here, misattribution with probability r(t) implies observed hazards λ₁*(t) = λ₁(t) + r(t)λ₂(t) for the specific cause and λ₂*(t) = (1 − r(t))λ₂(t) for the general cause, which invert to give λ₂(t) = λ₂*(t)/(1 − r(t)) and λ₁(t) = λ₁*(t) − r(t)λ₂*(t)/(1 − r(t)). This estimator is unbiased but has the drawback that there may be negative increments to the estimated cumulative cause-specific hazards.
The second approach is to compute a (constrained) NPMLE, for instance via an EM algorithm. The authors show that, unless the process is in discrete time (such that the number of failures at a specific time point increases as the sample size increases), this estimator is asymptotically biased.
The third and fourth approaches take the naive estimates and apply post-hoc algorithms to ensure monotonicity of the cumulative hazards, by using the maximum observed naive cumulative hazard up to time t (sup-estimator) or by applying the pool-adjacent-violators algorithm to the naive cumulative hazard. These estimators have the advantage of being both consistent and guaranteed to be monotonic.
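A minimal sketch of the pool-adjacent-violators step used in the fourth estimator (generic isotonic regression with equal weights, not the authors' exact implementation): a run of decreasing values in the naive cumulative hazard path is repeatedly pooled into its average until the fitted path is non-decreasing.

```python
def pava(y):
    """Isotonic (non-decreasing) fit to a sequence, equal weights."""
    vals, cnts = [], []  # current block means and block sizes
    for x in y:
        vals.append(float(x))
        cnts.append(1)
        # pool adjacent violators until block means are non-decreasing
        while len(vals) > 1 and vals[-2] > vals[-1]:
            pooled = (vals[-2] * cnts[-2] + vals[-1] * cnts[-1]) / (cnts[-2] + cnts[-1])
            cnt = cnts[-2] + cnts[-1]
            vals.pop(); cnts.pop()
            vals[-1], cnts[-1] = pooled, cnt
    out = []
    for v, c in zip(vals, cnts):
        out.extend([v] * c)
    return out

# A naive cumulative hazard with a negative increment at the third step:
# the violating pair (0.3, 0.25) is pooled into its average, 0.275.
print(pava([0.1, 0.3, 0.25, 0.4]))
```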

Wednesday, 24 August 2011

Maximum likelihood analysis of semicompeting risks data with semiparametric regression models

Yi-Hau Chen has a new paper in Lifetime Data Analysis which extends his 2010 JRSS B paper from competing risks to semi-competing risks data. Essentially in both cases the main idea is to model the dependence between the competing risks by assuming their event time distributions are related via some family of copulas. Mathematically this approach is quite elegant, as it allows regression models to be built on the marginal distributions of each failure time, with the inherent dependency in the censoring accounted for through the copula. From a practical perspective, particularly with semi-competing risks data and medical applications, one has to question how sensible the model is and whether the marginal distributions are the right target of inference.

It seems most useful to follow Xu, Kalbfleisch and Tai and view semi-competing risks as an illness-death model. After accounting for covariates, a patient's illness time and death time can be related either through a shared frailty term, which it may be sensible to assume is determined from the outset, or through the onset of illness causing death to occur sooner than it otherwise would have. In the copula model these two distinct mechanisms get pooled together. It is questionable how well the copula model would perform when the true process has a more event-driven dependence.

More importantly the question has to be asked why you would want to try and estimate the "illness free" survival distribution? This breaks Andersen and Keiding's guideline to "Stick to this world". Illness (or relapse) is never going to be eliminated. More sensible measures like the cumulative incidence function of death (without illness having occurred) can of course be derived from Chen's copula model, although analogously to the case of semi-parametric models on cause-specific hazards, the effect of covariates on the CIF may be complicated.

Wednesday, 13 July 2011

Combined survival analysis of cardiac patients by a Cox PH model and a Markov chain

Michal Shauly, Gad Rabinowitz, Harel Gilutz and Yisrael Parmet have a new paper in Lifetime Data Analysis. This considers methods for modelling the effect of a mixture of time-dependent and time-constant covariates on overall survival, with the complication that the time-dependent covariates are only observed at a discrete set of time points. They propose to first fit a Cox proportional hazards model assuming all covariates are fixed at their baseline values. The main modelling approach is to assume a discrete-time homogeneous Markov model with states corresponding to the combinations of the time-dependent covariates (which are categorical or need to be categorized) and death. Transitions between all covariate states are assumed to be possible between consecutive time points. The Cox model is used to determine which of the constant covariates should be considered in the Markov model. For these, the authors propose to again categorize the covariates and consider a separate Markov model for each level of the covariates. Having obtained the estimates from the Markov models, it is then possible to calculate expected survival times for patients conditional on their baseline characteristics.
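The final calculation can be sketched as follows (the transition probabilities are made up for illustration): for a discrete-time homogeneous Markov chain with transient covariate states and an absorbing death state, the expected number of time steps until death, starting from each transient state, is (I − Q)⁻¹1, where Q is the transient block of the one-step transition matrix.

```python
import numpy as np

def expected_survival_steps(Q):
    """Q: transient-to-transient block of the one-step transition matrix.
    Returns the expected number of steps to absorption from each state."""
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Two covariate states plus death; each row of the full transition matrix
# sums to 1, the remaining mass being the one-step probability of death.
Q = np.array([[0.7, 0.2],   # state 1 -> states 1, 2 (death prob 0.1)
              [0.3, 0.5]])  # state 2 -> states 1, 2 (death prob 0.2)
print(expected_survival_steps(Q))
```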

In general the approach proposed is reasonably sensible. However, there is panel data available for the time dependent covariates. It therefore seems possible to fit a time continuous Markov model to the data using methods appropriate for panel data (e.g. Kalbfleisch and Lawless, 1985). This approach has the advantage that the exact time of the death events can still be used.
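The panel-data likelihood for such a continuous-time model is built from transition probabilities of the form P(t) = exp(Qt), where Q is the transition intensity matrix. A small sketch (the intensities below are invented for illustration):

```python
import numpy as np
from scipy.linalg import expm

# Intensity matrix for two transient covariate states and absorbing death;
# rows sum to zero, off-diagonal entries are transition intensities.
Q = np.array([[-0.5,  0.4, 0.1],   # state 1 -> state 2 or death
              [ 0.2, -0.6, 0.4],   # state 2 -> state 1 or death
              [ 0.0,  0.0, 0.0]])  # death is absorbing

# Transition probability matrix over an observation gap of length 2,
# the building block of a Kalbfleisch-Lawless panel-data likelihood.
P = expm(Q * 2.0)
print(P.round(3))  # rows sum to 1; the death row stays (0, 0, 1)
```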

The authors rely on categorization throughout. While this seems necessary for the time-dependent covariates, there seems to be scope for using multinomial logit models for the other covariates. Similarly, by allowing a different mortality probability for each combination of covariates they are effectively fitting covariate models with interactions (i.e. the effect of being in covariate level 2 compared to 1 is different depending on which level(s) of the other time-dependent covariate(s) a subject is in). While such interactions may be necessary, it might be better to allow simpler models where only the evolution of the time-dependent covariates is kept general. This is another advantage of a continuous-time model, since covariate effects could remain on the hazard (transition intensity) scale as in the Cox PH model.

Finally, the authors give a partial justification of the use of a time homogeneous Markov model through the Cox PH model having an approximately constant baseline hazard. It should be noted that a time homogeneous Markov model does not imply a constant absorption hazard (unless the model begins in the quasi-stationary distribution). Conversely, while a constant hazard might suggest homogeneity more than non-homogeneity, it is nevertheless possible to construct non-homogeneous processes with constant (or near constant) marginal absorption hazards. The authors do however report a statistic which gives a better justification of homogeneity.
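A quick numerical check of this point (with a made-up transition matrix): a time-homogeneous discrete-time chain started away from its quasi-stationary distribution has a per-step marginal death hazard that visibly changes over time before settling down, so homogeneity and a constant absorption hazard are not the same thing.

```python
import numpy as np

Q = np.array([[0.7, 0.2],
              [0.3, 0.5]])   # transient block; death probs 0.1 and 0.2
p = np.array([1.0, 0.0])     # everyone starts in covariate state 1

surv = [p.sum()]
for _ in range(20):
    p = p @ Q                # sub-distribution over transient states
    surv.append(p.sum())

# per-step marginal death hazard: 1 - S(t+1)/S(t)
hazard = [1 - s1 / s0 for s0, s1 in zip(surv, surv[1:])]
print([round(h, 4) for h in hazard[:5]])  # rises from 0.1, then stabilises
```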

Wednesday, 11 November 2009

Computation of the asymptotic null distribution of goodness-of-fit tests for multi-state models

Andrew Titman has a new paper in Lifetime Data Analysis. This is essentially a continuation of previous papers by Aguirre-Hernandez and Farewell and by Titman and Sharples on Pearson-type goodness-of-fit tests for Markov and hidden Markov models fitted to panel observed data. A practical problem with these tests is that the null distribution depends on the true parameter values and the observation scheme, and a chi-squared approximation can perform inadequately. A parametric bootstrap could be used to find the upper 95% point of the distribution, but for many models the repeated re-fitting required may take an unacceptable amount of time. Titman shows that, conditional on a fixed observation scheme, the asymptotic distribution can be expressed as a weighted sum of independent chi-squared(1) random variables, where the weights depend on the true parameter values. A simulation study shows that computing the weights based on the maximum likelihood estimates of the parameters gives tests close to the appropriate size for realistic sample sizes. The method can be applied to both Markov and misclassification-type hidden Markov models, but only when all transitions are interval-censored.
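Once the weights are known, the critical value is cheap to approximate, for instance by simulation (the weights below are invented for illustration; numerical inversion along the lines of Imhof's method would also work):

```python
import numpy as np

rng = np.random.default_rng(1)
weights = np.array([1.0, 0.8, 0.5, 0.2])  # hypothetical estimated weights

# draws from the weighted sum of independent chi-squared(1) variables
draws = (rng.standard_normal((200_000, weights.size)) ** 2) @ weights

# approximate upper 95% point of the asymptotic null distribution
crit = np.quantile(draws, 0.95)
print(round(crit, 2))
```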

Tuesday, 28 July 2009

Nonparametric inference and uniqueness for periodically observed progressive disease models

Beth Griffin and Stephen Lagakos have a new paper in Lifetime Data Analysis. They consider panel observed progressive disease model (chain-of-events) data. The NPMLE estimator under a discrete-time semi-Markov assumption was developed by Sternberg and Satten (Biometrics, 1999). For datasets where individuals are observed at different times, some discretization of the data is required. An issue with the NPMLE is that it is not guaranteed to be unique and therefore reporting a single NPMLE may be misleading. The paper develops procedures for determining which components of the NPMLE are unique based on considering various re-parameterizations of the likelihood. The method is demonstrated on three example datasets including one on bronchiolitis obliterans syndrome in post-lung transplantation patients and one on primary HIV infection. In addition, the authors also provide a more intuitive algorithm for obtaining the NPMLE than the self-consistency algorithm of Sternberg and Satten.