Sunday, 28 October 2012

Survival analysis with time varying covariates measured at random times by design

Stephen Rathbun, Xiao Song, Benjamin Neustiftler and Saul Shiffman have a new paper in Applied Statistics (JRSS C). This considers estimation of a proportional hazards survival model in the presence of time dependent covariates which are only intermittently sampled at random time points. Specifically they are interested in examples relating to ecological momentary assessment where data collection may be via electronic devices like smart phones and the decision on sampling times can be automated. They consider a self-correcting point-process sampling design, where the intensity of the sampling process depends on the past history of sampling times, which allows an individual to have random sampling times that are more regular than would be achieved from a Poisson process.

The proposed method of estimation is to use inverse intensity weighting to obtain an estimate of an individual's integrated hazard up to the event time. Specifically the estimator is $\hat{H}_i(T_i) = \sum_{j=1}^{N_i} h_i(u_{ij}) \pi^{-1}_i(u_{ij})$ for a individual with $\inline N_i$ sampling times at times $\inline 0 \leq u_{ij} \leq T_i$ and point process sampling intensity $\inline \pi(t)$. This then replaces the integrated hazard in an approximate log-likelihood.

In part of the simulation study and in an application the exact point process intensity is unknown and taken from empirical estimates from the sample. Estimating the sampling intensity didn't seem to have major consequences on the integrity of the model estimates. This seems to suggest the approach might be applicable in other survival models where the covariates are sampled longitudinally in a subject specific manner, provided a reasonable model for sampling can be devised.

Drawbacks of the method seem to be that covariates measured at baseline (which is not a random time point) cannot be incorporated in the estimate and that it seems that the covariates must be measured at the event time which may not be the case in medical contexts. The underlying hazard also needs to be specified parametrically, but as stated flexible spline modelled can be used.