Monday, 2 July 2012

Nonparametric estimation of current status data with dependent censoring

Chunjie Wang, Jianguo Sun, Liuquan Sun, Jie Zhou and Dehui Wang have a new paper in Lifetime Data Analysis. It considers estimation of the survivor distribution for current status data when the observation and survival times are dependent. Standard current status models assume that the observation time X is independent of the survival time T. The authors note that from current status data it is possible to estimate the distribution of the observation times, $G(x) = P(X \leq x)$, and $p_1(x) = P(X > x, T < X)$ (or an analogous quantity, $p_2(x) = P(X < x, T < X)$), and that these quantities uniquely define the marginal distribution F of T, the (somewhat large) caveat being that the copula C linking F and G must be fully specified.
To estimate F(x) from observed data, they suggest considering the identity $P(T < X, X < x) = \int_{0}^{x} C_{v}(F(y),G(y)) dG(y)$, where $C_{v}(u,v) = \frac{\partial}{\partial v} C(u,v)$, and replacing the left-hand side and G(x) with empirical quantities to obtain $\sum_{l=1}^{j} C_{v}[F(x_l),\hat{G}(x_l)]\hat{g}(x_l) = \hat{p}_2(x_j)$, where $\hat{p}_2(t) = \frac{1}{n} \sum_{i} I(X_i < t, \delta_i = 1)$ and $\delta_i = 1$ indicates that $T_i < X_i$. This equation is then solved for F(x).
This approach to estimation seems a little clunky, particularly because the resulting estimates of F(x) are not guaranteed to be monotonically increasing in x and do not seem to be guaranteed to lie in [0,1] either. While they suggest the modification $\tilde{F}_1(x_j) = \max\{\hat{F}(x_l); l=1,\ldots,j\}$ to coerce the estimate to be monotonic, it seems that a more efficient estimator would use some variant of the pool-adjacent-violators algorithm at some juncture.
The need to fully specify the copula is similar to the situation with misclassified current status data, where it is necessary to know the error probabilities. As a sensitivity analysis it has some similarities to the approach for assessing dependent censoring in right-censored parametric survival models by Siannis et al.
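To make the estimating equation concrete, here is a minimal Python sketch of how it could be solved. It is not the authors' code and rests on several assumptions of mine: the copula is taken to be Clayton with a known parameter `theta > 0` (chosen only because $C_v(u,v) = r$ can then be inverted in closed form), observation times are assumed to have ties so that the proportions at each distinct time are informative, and the empirical quantities use $\leq$ rather than $<$ so that the mass at $x_j$ is included in the sum. Differencing consecutive equations gives $C_{v}[F(x_j),\hat{G}(x_j)] = r_j$, where $r_j$ is the proportion of subjects observed at $x_j$ with $\delta_i = 1$. The function names `clayton_cv` and `clayton_current_status` are hypothetical.

```python
import numpy as np

def clayton_cv(u, v, theta):
    """Partial derivative dC/dv of the Clayton copula C(u, v), for theta > 0."""
    return v ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)

def clayton_current_status(x, delta, theta):
    """Solve C_v(F(x_j), G_hat(x_j)) = r_j for F at each distinct observation time.

    x     : observation times (ties expected)
    delta : 1 if the event had occurred by the observation time, else 0
    theta : assumed-known Clayton dependence parameter (> 0)
    """
    if theta <= 0:
        raise ValueError("theta must be positive for this Clayton sketch")
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)

    times, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    G_hat = np.cumsum(counts) / x.size                 # empirical CDF of X
    events = np.bincount(inverse, weights=delta, minlength=times.size)
    r = events / counts                                # proportion with delta = 1 at x_j

    # Closed-form inverse of C_v(u, v) = r in u; r = 0 maps to u = 0.
    F_hat = np.zeros_like(r)
    pos = r > 0
    v = G_hat[pos]
    u_pow = r[pos] ** (-theta / (1.0 + theta)) * v ** -theta - v ** -theta + 1.0
    F_hat[pos] = u_pow ** (-1.0 / theta)

    # Running-maximum modification to force monotonicity.
    F_tilde = np.maximum.accumulate(F_hat)
    return times, G_hat, F_hat, F_tilde

# Small made-up data set with four subjects at each of three observation times.
x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
delta = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
times, G_hat, F_hat, F_tilde = clayton_current_status(x, delta, theta=2.0)
print(times, F_hat, F_tilde)
```

As `theta` tends to zero the Clayton copula approaches independence and the raw estimate reduces to the proportions $r_j$, which is the quantity that the pool-adjacent-violators algorithm would smooth in the standard independent case. In this toy example the raw estimate is not monotone, so the running maximum changes the last value.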
In the discussion the authors mention the possibility of extension to more general interval-censored data. Once there are repeated observations from an individual there may be greater scope to estimate the degree of dependence between the observation times and the failure time, although more extensive modelling of the observation process would probably be required.