Tuesday, 1 March 2011

Modelling time to event with observations made at arbitrary times

Matthew Sperrin and Iain Buchan have a new paper available on arXiv. It concerns the interesting problem of developing survival models with time-dependent covariates that are observed only at baseline, for data with delayed entry (i.e. left truncation).

A large proportion of the paper is devoted to arguing that, in studies where the entry time does not involve an intervention and follow-up is over a long period, it is more appropriate to use age as the primary timescale than to use study time and include age at entry as a covariate. The latter approach can lead to counterintuitive results, such as a 50-year-old being more likely to survive to 70 than a 55-year-old whose other covariate values are the same.
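
To make the counterintuitive result concrete, here is a small numerical sketch. It assumes, purely for illustration, a study-time Cox model with a constant baseline hazard and a log-linear effect of age at entry, and compares it with an age-timescale model using a hypothetical Gompertz hazard. None of the parameter values (LAMBDA, GAMMA, the Gompertz a and b) come from the paper.

```python
import math

# Hypothetical parameters, chosen only for illustration.
LAMBDA = 1e-4   # constant baseline hazard on the study-time scale
GAMMA = 0.08    # log hazard ratio per year of age at entry


def surv_study_time(entry_age: float, target_age: float) -> float:
    """P(survive to target_age) under a study-time Cox model.

    The baseline hazard is constant in study time s, so H0(s) = LAMBDA * s,
    and age at entry multiplies the hazard by exp(GAMMA * entry_age).
    """
    s = target_age - entry_age
    return math.exp(-LAMBDA * s * math.exp(GAMMA * entry_age))


def gompertz_cum_hazard(age: float, a: float = 1e-5, b: float = 0.09) -> float:
    """Cumulative hazard from birth to `age` for a hypothetical Gompertz hazard a*exp(b*age)."""
    return a / b * (math.exp(b * age) - 1.0)


def surv_age_scale(entry_age: float, target_age: float) -> float:
    """P(survive to target_age | alive at entry_age) with age as the timescale."""
    return math.exp(-(gompertz_cum_hazard(target_age) - gompertz_cum_hazard(entry_age)))


p50, p55 = surv_study_time(50, 70), surv_study_time(55, 70)
print(f"study time: 50-year-old {p50:.4f}, 55-year-old {p55:.4f}")
q50, q55 = surv_age_scale(50, 70), surv_age_scale(55, 70)
print(f"age scale:  50-year-old {q50:.4f}, 55-year-old {q55:.4f}")
```

With these values the study-time model gives the 50-year-old the higher probability of reaching 70 (about 0.897 versus 0.885). On the age timescale the ordering cannot be reversed, whatever hazard is assumed, because the 55-year-old has already survived the hazard between ages 50 and 55.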

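The paper is less convincing when it comes to possible solutions to the problem of time-dependent covariates observed only at baseline. Their approach is essentially to assume that the covariate value of subject i at age t is

$$x_i(t) = f(t) + \epsilon_i, \qquad \epsilon_i \text{ constant in } t,$$
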
for some arbitrary function f(t). Hence the time-dependent covariates are assumed to vary deterministically: a subject who has lower than average blood pressure at age 30 will see their blood pressure increase with age but will stay at the same (age-specific) quantile. An obvious first issue is how realistic this is, especially for internal covariates. Assuming the model is correct, Sperrin and Buchan propose a two-step procedure: they first regress the observed baseline covariate values on t to estimate f(t), and then use the residuals from this regression as covariates in standard proportional hazards or accelerated failure time models. A problem they freely acknowledge is that, with delayed entry, the observed baseline covariate values will be a biased sample if the covariate, for instance, both increases with age and is associated with increased hazard, because subjects with high values are more likely to have died before they could enter the study. They suggest, but do not implement, some form of iterative procedure to accommodate this.

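They seem to have overlooked a trivial way around this problem. For a Cox proportional hazards model in which f(t) is assumed linear with unknown slope v (as in their example), if we observe the covariate value $x_i(t_0)$ at entry age $t_0$, then $x_i(t) = x_i(t_0) + v(t - t_0)$, and this implies

$$h_i(t) = h_0(t)\exp\{\beta x_i(t)\} = \underbrace{h_0(t)\exp\{\beta v t\}}_{h_0^*(t)}\exp\{\beta x_i(t_0) - \beta v t_0\},$$

i.e. the deterministic part of the time-dependent covariate is simply absorbed into the baseline hazard, and it is only necessary to include age at entry as an additional covariate (whose coefficient estimates $-\beta v$). For non-linear f(t) we have $x_i(t) = x_i(t_0) - f(t_0) + f(t)$, so

$$h_i(t) = \underbrace{h_0(t)\exp\{\beta f(t)\}}_{h_0^*(t)}\exp\{\beta x_i(t_0) - \beta f(t_0)\}.$$
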
Here the latter term, which depends only on age at entry, could be accommodated, for instance, by including a spline function of age at entry in the Cox regression. A similar argument applies to accelerated failure time models such as the Buckley-James model; there the f(t) term can be absorbed into the residual survivor distribution. The argument also extends easily to multiple covariates.
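
The absorption argument can be checked numerically. The sketch below assumes a hypothetical baseline hazard h0(t), coefficient BETA and two hypothetical trends f(t) (one linear, one non-linear), and compares the log hazard ratio between two subjects at several ages with the value predicted from baseline quantities alone, β[x_1(t_0) - f(t_0)] - β[x_2(t_0) - f(t_0)] evaluated at each subject's own entry age.

```python
import math

BETA = 0.03  # hypothetical log hazard ratio per unit of the covariate


def h0(t: float) -> float:
    """Hypothetical baseline hazard (Gompertz-shaped); any positive function would do."""
    return 1e-5 * math.exp(0.09 * t)


def hazard(t: float, x_entry: float, t_entry: float, f) -> float:
    """Hazard at age t when x(t) = x_entry - f(t_entry) + f(t) (deterministic trend)."""
    x_t = x_entry - f(t_entry) + f(t)
    return h0(t) * math.exp(BETA * x_t)


def linear_f(t: float) -> float:
    return 1.0 * t  # f(t) = v t with hypothetical v = 1


def nonlinear_f(t: float) -> float:
    return 0.02 * t ** 2 + 5.0 * math.sin(t / 10.0)  # hypothetical non-linear trend


subj1 = (140.0, 45.0)  # (x observed at entry, age at entry), hypothetical values
subj2 = (150.0, 60.0)
ages = [65.0, 70.0, 75.0, 80.0]  # ages at which both subjects are at risk

results = {}
for name, f in [("linear", linear_f), ("non-linear", nonlinear_f)]:
    lhr = [math.log(hazard(t, subj1[0], subj1[1], f) / hazard(t, subj2[0], subj2[1], f))
           for t in ages]
    # Prediction from baseline quantities only: beta*x(t0) - beta*f(t0) per subject.
    predicted = BETA * (subj1[0] - f(subj1[1])) - BETA * (subj2[0] - f(subj2[1]))
    results[name] = (lhr, predicted)
    print(name, [round(v, 6) for v in lhr], round(predicted, 6))
```

The log hazard ratio is the same at every age and matches the prediction, so proportional hazards holds in terms of the baseline value and age at entry. In the linear case the entry-age term is exactly -βv t_0; in the non-linear case -βf(t_0) is what a spline in age at entry would need to capture.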

If a parametric model is used under this formulation, it is valid only if the assumed parametric family holds for the hazard after the deterministic part of the time-dependent covariates has been absorbed into it. However, the same is true of Sperrin and Buchan's approach (assuming f(t) has been estimated correctly).
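
As an illustration of this caveat (our own example, not one from the paper): with a linear trend f(t) = v t the absorbed baseline is h0*(t) = h0(t) exp(βvt). A Gompertz baseline a exp(bt) stays Gompertz (with rate b + βv), whereas a Weibull baseline does not stay Weibull. The sketch checks this by testing whether log h0*(t) is linear in t (the Gompertz form) or linear in log t (the Weibull form); all parameter values are hypothetical.

```python
import math

BETA, V = 0.03, 1.0  # hypothetical covariate effect and linear trend slope


def absorbed(h0):
    """Baseline hazard after absorbing the deterministic trend: h0(t) * exp(BETA * V * t)."""
    return lambda t: h0(t) * math.exp(BETA * V * t)


def gompertz(t: float, a: float = 1e-5, b: float = 0.09) -> float:
    return a * math.exp(b * t)


def weibull(t: float, k: float = 3.0, scale: float = 90.0) -> float:
    return (k / scale) * (t / scale) ** (k - 1)


def max_second_difference(g, xs):
    """Largest |second difference| of g over equally spaced xs (zero iff g is linear there)."""
    ys = [g(x) for x in xs]
    return max(abs(ys[i - 1] - 2 * ys[i] + ys[i + 1]) for i in range(1, len(ys) - 1))


ts = [40.0 + 5.0 * i for i in range(9)]                 # equally spaced ages 40..80
log_ts = [math.log(40.0) + 0.1 * i for i in range(9)]   # equally spaced log-ages

g_star = absorbed(gompertz)
w_star = absorbed(weibull)
# Gompertz form: log h*(t) should be linear in t.
gomp_curv = max_second_difference(lambda t: math.log(g_star(t)), ts)
# Weibull form: log h*(t) should be linear in log t.
weib_curv = max_second_difference(lambda u: math.log(w_star(math.exp(u))), log_ts)
print(f"Gompertz check, curvature: {gomp_curv:.2e}")
print(f"Weibull check, curvature:  {weib_curv:.2e}")
```

The Gompertz curvature is zero up to rounding error, while the Weibull curvature is clearly non-zero, so a Weibull model fitted under this formulation would be misspecified even if the original baseline hazard were Weibull.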

Even with these issues resolved, it is still unclear when such a model would be appropriate. A useful prediction model based only on covariates measured at baseline seems possible only if the covariates vary over time in a deterministic way. For truly time-varying covariates, studies with relatively short follow-up may give reasonable estimates of the effect of the covariate on survival, but prediction requires a longitudinal set of covariate observations. Landmarking or multi-state modelling are then possible approaches to account for the time-varying covariates.

Update: This paper has now been published in Statistics in Medicine (highlighting the quality of their peer reviewers!).
