Sunday, 14 October 2012

Assessing age-at-onset risk factors with incomplete covariate current status data under proportional odds models

Chi-Chung Wen and Yi-Hau Chen have a new paper in Statistics in Medicine. The paper considers estimation of a proportional odds regression model for current status data when some of the covariates are missing at random for a subset of the patients.

It is assumed that the probability that a portion of the covariates is missing depends only on the other observable quantities (the failure status, the survey time and the rest of the covariate vector). The authors propose to fit a logistic regression model for this probability of missingness, using all subjects in the dataset. To fit the regression model for the current status data itself, they propose what they term a "validation likelihood estimator." This uses only the subset of patients with complete data, but maximizes a likelihood that conditions on the fact that the whole covariate vector was observed. An advantage of the proportional odds model over other candidate models (e.g. proportional hazards) is that this conditional likelihood retains the proportional odds form.
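
To see why the proportional odds form survives the conditioning, a short sketch of the argument may help. The notation is assumed for illustration and may differ from the paper's: T is the onset time, C the survey time, \Delta = 1(T \le C) the failure status, Z = (Z_obs, Z_mis) the covariate vector, R the indicator that Z is fully observed, \alpha(c) the baseline log-odds function and \pi the missingness model. It also assumes that C carries no information about T given Z.

```latex
% Proportional odds model for the current status outcome
\operatorname{logit} P(\Delta = 1 \mid C, Z) = \alpha(C) + \beta^{\top} Z

% Missing-at-random selection: depends only on observed quantities
P(R = 1 \mid \Delta, C, Z) = \pi(\Delta, C, Z_{\mathrm{obs}})

% Bayes' rule: conditioning on R = 1 multiplies the odds by \pi(1,\cdot)/\pi(0,\cdot)
\frac{P(\Delta = 1 \mid C, Z, R = 1)}{P(\Delta = 0 \mid C, Z, R = 1)}
  = \frac{P(\Delta = 1 \mid C, Z)}{P(\Delta = 0 \mid C, Z)}
    \cdot \frac{\pi(1, C, Z_{\mathrm{obs}})}{\pi(0, C, Z_{\mathrm{obs}})}

% Hence the complete-case model is again linear in \beta^{\top} Z on the logit scale
\operatorname{logit} P(\Delta = 1 \mid C, Z, R = 1)
  = \alpha(C) + \beta^{\top} Z
    + \log \frac{\pi(1, C, Z_{\mathrm{obs}})}{\pi(0, C, Z_{\mathrm{obs}})}
```

The extra term depends only on (C, Z_obs), so once \pi has been estimated from the logistic missingness model it enters as a known offset, and \beta keeps its log odds ratio interpretation. Under a proportional hazards model the same multiplicative factor on the odds would not reduce to an additive offset on the complementary log-log scale, which is presumably the advantage the authors have in mind.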

Clearly, a disadvantage of this "validation likelihood estimator" is that the data from subjects with incomplete covariates are not used directly in the regression model; they contribute only through the missingness model. As a result, the estimator is likely to be less efficient than approaches that effectively attempt to impute the missing covariate values. The authors argue that the validation likelihood approach will tend to be more robust, since it does not require (parametric) assumptions about the conditional distribution of the missing covariates.
