Monday, 10 September 2012

Practicable confidence intervals for current status data

Byeong Yeob Choi, Jason P. Fine and M. Alan Brookhart have a new paper in Statistics in Medicine. Essentially the paper clarifies the practical implications of results relating to the asymptotic theory for current status data. In particular, it is known that the nonparametric bootstrap is inconsistent for current status data when the distribution of sampling times is continuous. The authors note that a previous study by Ghosh et al concluded that confidence intervals based on inversion of the likelihood ratio statistic (as originally proposed in Banerjee and Wellner (2001)) gave the best results, particularly for smaller sample sizes. However, they also note that this approach is difficult to implement (e.g. because of a lack of available software). They therefore pursue approaches that use the limiting Chernoff distribution to construct Wald-type confidence intervals, with cloglog or logit transformations to improve coverage, and also look at the performance of the simple nonparametric bootstrap.
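To make the transformed Wald idea concrete, here is a minimal Python sketch of a logit-scale interval. It assumes the standard limiting result that n^(1/3)(F_n(t) - F(t)) converges to {4 F(t)(1 - F(t)) f(t)/g(t)}^(1/3) Z, where Z follows Chernoff's distribution, f is the failure time density and g is the observation time density. The function name `logit_wald_ci` and its arguments are hypothetical, the plug-in estimates `f_hat` and `g_hat` are assumed to come from elsewhere (e.g. kernel smoothing), and the Chernoff quantile is the commonly tabulated value, so this may well differ in detail from what the authors actually implemented.

```python
import math

# Approximate 0.975 quantile of Chernoff's distribution, as commonly
# tabulated (Groeneboom and Wellner, 2001). Treat as an assumed constant
# and check it against the published tables before relying on it.
CHERNOFF_Q975 = 0.998181


def logit_wald_ci(F_hat, f_hat, g_hat, n, q=CHERNOFF_Q975):
    """Logit-transformed Wald-type interval for F(t) (illustrative sketch).

    F_hat : NPMLE of F at the time point t (must lie strictly in (0, 1))
    f_hat : assumed plug-in estimate of the failure time density at t
    g_hat : assumed plug-in estimate of the observation time density at t
    n     : sample size
    q     : upper quantile of Chernoff's distribution
    """
    if not 0.0 < F_hat < 1.0:
        raise ValueError("F_hat must lie strictly between 0 and 1")
    if f_hat <= 0.0 or g_hat <= 0.0 or n <= 0:
        raise ValueError("f_hat, g_hat and n must be positive")

    # Scale constant of the cube-root limiting distribution.
    scale = (4.0 * F_hat * (1.0 - F_hat) * f_hat / g_hat) ** (1.0 / 3.0)
    half_width = q * scale / n ** (1.0 / 3.0)

    # Delta method: d/dF logit(F) = 1 / (F (1 - F)).
    half_width_logit = half_width / (F_hat * (1.0 - F_hat))
    centre = math.log(F_hat / (1.0 - F_hat))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Back-transforming keeps the interval inside (0, 1).
    return expit(centre - half_width_logit), expit(centre + half_width_logit)
```

Because the interval is built on the logit scale and then back-transformed, its endpoints cannot fall outside (0, 1), which is one reason a transformation can help coverage when F(t) is near 0 or 1. A low `g_hat` inflates the scale constant and so widens the interval.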

Perhaps unsurprisingly, they find that, when sample sizes are relatively small, the performance of all methods depends on the quantile of the failure time distribution at which the confidence interval is computed (e.g. performance is much better when t is close to the median) and on the observation density at the time point considered (performance is poorer at times with a lower observation density). Using cloglog or logit transformations is found to improve coverage, but the nonparametric bootstrap tended to outperform this approach for smaller sample sizes, suggesting that the nonparametric bootstrap is admissible in practice despite its inconsistency in theory.
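For reference, the simple nonparametric bootstrap can be sketched in a few lines of Python. The NPMLE of F for current status data is the isotonic regression of the status indicators on the observation times, computed here with the pool-adjacent-violators algorithm. The function names `npmle_at` and `bootstrap_percentile_ci` are hypothetical, and a percentile interval is assumed purely for illustration; the bootstrap variant used in the paper may differ.

```python
import numpy as np


def npmle_at(times, status, t):
    """NPMLE of F(t) from current status data via pool-adjacent-violators.

    times  : observation (monitoring) times
    status : 1 if failure occurred by the observation time, else 0
    """
    times = np.asarray(times, dtype=float).ravel()
    status = np.asarray(status, dtype=float).ravel()

    # Aggregate tied observation times (ties are common in bootstrap samples).
    uniq, inv = np.unique(times, return_inverse=True)
    counts = np.bincount(inv).astype(float)
    props = np.bincount(inv, weights=status) / counts

    # Pool adjacent violators on the weighted proportions.
    means, weights, sizes = [], [], []
    for m, w in zip(props, counts):
        means.append(m)
        weights.append(w)
        sizes.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            w_new = weights[-2] + weights[-1]
            m_new = (means[-2] * weights[-2] + means[-1] * weights[-1]) / w_new
            s_new = sizes[-2] + sizes[-1]
            means[-2:] = [m_new]
            weights[-2:] = [w_new]
            sizes[-2:] = [s_new]

    fitted = np.repeat(means, sizes)  # one fitted value per unique time
    k = np.searchsorted(uniq, t, side="right")  # unique times <= t
    return 0.0 if k == 0 else float(fitted[k - 1])


def bootstrap_percentile_ci(times, status, t, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for F(t), resampling (time, status) pairs."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float).ravel()
    status = np.asarray(status, dtype=float).ravel()
    n = len(times)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample n pairs with replacement
        estimates[b] = npmle_at(times[idx], status[idx], t)
    alpha = 1.0 - level
    lo, hi = np.quantile(estimates, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)
```

Unlike the Wald-type intervals, this needs no estimate of the failure time or observation time densities, which is much of its practical appeal.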

An apparent omission in the paper is any mention of the altered asymptotics in the case where the observation distribution has support at a finite set of time points (or indeed where the rate of increase of points is less than $n^{1/3}$). This issue is most comprehensively discussed in Tang et al, which didn't come out until after the paper was apparently submitted. However, the basic issue of standard asymptotics (and by implication a consistent bootstrap) when there is a finite set of observation points is discussed in Maathuis and Hudgens (2011), which the authors cite. For instance, in the Hoel and Walburg mice dataset used as illustration, the resolution of the data is to the nearest day. It is therefore reasonable to assume that in this case, were the sample size to increase, the number of unique observation times would either be bounded by a fixed value (e.g. ~1000) or else would increase at a rate much less than $n^{1/3}$.
