Showing posts with label Applied Statistics. Show all posts

Sunday, 28 October 2012

Survival analysis with time varying covariates measured at random times by design


Stephen Rathbun, Xiao Song, Benjamin Neustifter and Saul Shiffman have a new paper in Applied Statistics (JRSS C). This considers estimation of a proportional hazards survival model in the presence of time dependent covariates which are only intermittently sampled at random time points. Specifically they are interested in examples relating to ecological momentary assessment, where data collection may be via electronic devices like smartphones and the decision on sampling times can be automated. They consider a self-correcting point-process sampling design, where the intensity of the sampling process depends on the past history of sampling times, which allows an individual to have random sampling times that are more regular than would be achieved from a Poisson process.
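To illustrate why a self-correcting design gives more regular sampling times than a Poisson process, here is a minimal simulation sketch. The exponential form of the intensity, exp(a + b(t - rho N(t))), the function names and the parameter values are all assumptions chosen for illustration (a common textbook form of a self-correcting process), not necessarily the design used in the paper.

```python
import math
import random


def simulate_self_correcting(n_events, a=0.0, b=4.0, rho=1.0, seed=1):
    """Event times of a process with intensity exp(a + b * (t - rho * N(t))).

    Hypothetical illustration: the intensity rises while no sample is taken
    and drops by a factor exp(-b * rho) each time a sample occurs.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    for n in range(n_events):
        lam = math.exp(a + b * (t - rho * n))  # intensity right after event n
        e = rng.expovariate(1.0)
        # Between events the intensity is lam * exp(b * s); solve
        # lam * (exp(b * gap) - 1) / b = e exactly for the next gap.
        gap = math.log1p(b * e / lam) / b
        t += gap
        times.append(t)
    return times


def simulate_poisson(n_events, rate=1.0, seed=2):
    """Event times of a homogeneous Poisson process, for comparison."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def gap_cv(times):
    """Coefficient of variation of the gaps between successive event times."""
    gaps = [t1 - t0 for t0, t1 in zip([0.0] + times[:-1], times)]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / (len(gaps) - 1)
    return math.sqrt(var) / mean
```

Comparing `gap_cv(simulate_self_correcting(5000))` with `gap_cv(simulate_poisson(5000))` should show a coefficient of variation well below the Poisson value of about 1, which is the sense in which the sampling times are "more regular".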

The proposed method of estimation is to use inverse intensity weighting to obtain an estimate of an individual's integrated hazard up to the event time. Specifically the estimator is $\hat{\Lambda}_i = \sum_{j} \lambda(t_{ij}) / \rho_i(t_{ij})$ for an individual $i$ with hazard $\lambda(t)$, sampling times $t_{ij}$ up to the event time and point process sampling intensity $\rho_i(t)$. This then replaces the integrated hazard in an approximate log-likelihood.
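As a rough sketch of the inverse intensity weighting idea: each sampled time contributes the hazard at that time divided by the sampling intensity at that time, so that the sum is an unbiased estimate of the integrated hazard. The function names, the homogeneous Poisson sampling and the linear hazard below are assumptions made purely for illustration; the paper's design is self-correcting rather than Poisson.

```python
import random


def ipw_integrated_hazard(sample_times, hazard, intensity, event_time):
    """Inverse-intensity-weighted estimate of the integrated hazard on [0, event_time].

    hazard(t) and intensity(t) are callables; in practice hazard(t) would
    depend on the covariate value observed at the sampling time t.
    """
    return sum(hazard(t) / intensity(t) for t in sample_times if t <= event_time)


def poisson_times(rate, horizon, rng):
    """Sampling times from a homogeneous Poisson process on [0, horizon]."""
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def average_estimate(n_rep=4000, seed=0):
    """Monte Carlo mean of the estimator for hazard 0.5 * t on [0, 3].

    The true integrated hazard is 0.25 * 3 ** 2 = 2.25.
    """
    rng = random.Random(seed)
    estimates = [
        ipw_integrated_hazard(
            poisson_times(2.0, 3.0, rng), lambda t: 0.5 * t, lambda t: 2.0, 3.0
        )
        for _ in range(n_rep)
    ]
    return sum(estimates) / n_rep
```

Any single estimate is noisy because it rests on a handful of sampling times; it is only the average over replicates that settles near the true integrated hazard.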

In part of the simulation study and in an application the exact point process intensity is unknown and taken from empirical estimates from the sample. Estimating the sampling intensity didn't seem to have major consequences on the integrity of the model estimates. This seems to suggest the approach might be applicable in other survival models where the covariates are sampled longitudinally in a subject specific manner, provided a reasonable model for sampling can be devised.

Drawbacks of the method seem to be that covariates measured at baseline (which is not a random time point) cannot be incorporated in the estimate, and that the covariates apparently must be measured at the event time, which may not be the case in medical contexts. The underlying hazard also needs to be specified parametrically, but as stated, flexible spline models can be used.

Tuesday, 31 July 2012

A multistate modelling approach for pancreatic cancer development in genetically high risk families

Kolamunnage-Dona, Vitone, Greenhalf, Henderson and Williamson have a new paper in Applied Statistics. This uses a competing risks model with shared frailties to model data on the progression of pancreatic cancer in the presence of clustering and informative censoring. Clustering is present due to data being available on patients from the same family groups. A patient having a resection causes censoring of the main event, time to pancreatic cancer, but this censoring is likely to be informative. This is dealt with by allowing the cause-specific hazards to depend on the same frailty term. The methodology in the paper is very similar to that of Huang and Wolfe (Biometrics, 2002), the only extension being that the current formulation allows for the possibility of time dependent covariates. It isn't clear what, if any, complication this adds to the original procedure in Huang and Wolfe. An MCEM algorithm is used where the E-step is approximated by using Metropolis-Hastings in order to calculate the required expected quantities. It's not clear what the authors mean when they say the Metropolis-Hastings step uses "a vague prior for the frailty variance." Hopefully they mean an improper uniform prior, as otherwise they would be pointlessly adding Bayesian features to an otherwise frequentist estimating procedure. On a related computational point, since the random effect for each cluster is one dimensional, I suspect using a Laplace approximation to compute the required integrals at each step would perform quite well and be a lot faster than using Metropolis-Hastings.
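To give a feel for how well a Laplace approximation can do for a one-dimensional frailty integral, here is a small sketch. It assumes, purely for illustration, a gamma frailty with mean 1 and shape `a`, and a cluster likelihood contribution proportional to u^d exp(-u H) (d events, cumulative baseline hazard H). This is not the model in the paper, but it has the advantage that the exact answer is available for comparison.

```python
import math


def exact_log_marginal(d, H, a):
    """Exact log of the integral of u^d exp(-u H) against a Gamma(a, a) density."""
    return (
        math.lgamma(a + d) - math.lgamma(a)
        + a * math.log(a) - (a + d) * math.log(a + H)
    )


def laplace_log_marginal(d, H, a):
    """Laplace approximation to the same integral, applied on the scale w = log(u).

    On that scale the log integrand is
    h(w) = (a + d) * w - (a + H) * exp(w) + a * log(a) - lgamma(a).
    """
    w_hat = math.log((a + d) / (a + H))      # mode of h
    h_hat = (
        (a + d) * w_hat - (a + H) * math.exp(w_hat)
        + a * math.log(a) - math.lgamma(a)
    )
    curvature = (a + H) * math.exp(w_hat)    # equals -h''(w_hat) = a + d
    return h_hat + 0.5 * math.log(2 * math.pi / curvature)
```

In this toy setting the error on the log scale is that of Stirling's formula for the gamma function at a + d, so it is already small for modest cluster sizes and shrinks as the number of events grows. Whether the same accuracy carries over to the integrals needed in the MCEM E-step is something that would need checking.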

In the analysis there does seem to be evidence for a shared frailty within clusters, but the parameter which links the frailty in the time to pancreatic cancer to the time to resection intensity appears hard to identify, having a very wide 95% confidence interval encompassing strong negative dependence, through independence, to strong positive dependence. The typical cluster size in the data is quite small (e.g. median is 3) and this is probably insufficient, as you would ideally need some subjects to fail and some to be informatively censored in each cluster to assess their association. The point estimate is negative, implying a counterintuitive negative association between resection and pancreatic cancer. The authors suggest as a model extension to allow a specific bivariate frailty linking competing risks (presumably within individuals?), which is unlikely to be helpful.

Thursday, 6 October 2011

A case-study in the clinical epidemiology of psoriatic arthritis: multistate models and causal arguments

Aidan O'Keeffe, Brian Tom and Vern Farewell have a new paper in Applied Statistics. They model data on the status of 28 individual hand joints in psoriatic arthritis patients. The damage to joints for a particular patient is expected to be correlated. Particular questions of interest are whether the damage process has 'symmetry', meaning that damage to a specific joint in one hand increases the hazard of the corresponding joint in the other hand becoming damaged, and whether activity, meaning inflammation of a joint, causes the joint damage.

Two approaches to analysing the data are taken. In the first each pair of joints (from the left and right hands) is modelled by a 4 state model where the initial state is no damage to either joint, the terminal state is damage to both joints and progression to the terminal state may be through either a state representing damage only to the left joint or through a state representing damage only to the right joint. The correlation between joints within a patient is incorporated by allowing a common Gamma frailty which affects all transition intensities for all 14 pairs of joints. Symmetry can then be assessed by comparing the hazard of damage to one joint with or without damage having occurred to the other joint. Under this approach, activity is incorporated only as a time dependent covariate. There are two limitations to this. Firstly, panel observation means that some assumption about the status of activity between clinic visits has to be made (e.g. it keeps the status observed at the last clinic visit until the current visit). Secondly, from a causal inference perspective, activity is an internal time dependent covariate, so treating it as fixed makes it harder to infer causality.

The second approach seeks to address these problems by jointly modelling activity and joint damage as linked stochastic processes. In this model each of the 28 joints is represented by a three-state model where the first two states represent presence and absence of activity for an undamaged joint, whilst state three represents joint damage. For this model, rather than explicitly fitting a random effects model, a GEE type approach is taken where the parameter point estimates are based on maximizing the likelihood assuming independence of the 28 joint processes, but standard errors are calculated using a sandwich estimator based on patient level clustering. This approach is valid provided the point estimates are consistent, which will be the case if the marginal processes are Markov. For instance, this can hold when the transition intensities of type {ij} are linked via a random effect u, but crucially, if other transition intensities also have associated random effects, these must be independent of u.
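The GEE type idea (point estimates from an independence working likelihood, standard errors from a cluster-level sandwich) can be sketched in a deliberately simple setting. The example below assumes a single constant event rate and hypothetical data in the form of (exposure time, event indicator) pairs grouped by patient; it is not the multistate model of the paper, only the same recipe applied to a one-parameter exponential likelihood.

```python
def exponential_rate_with_sandwich(clusters):
    """Independence MLE of a constant rate with naive and cluster-robust variances.

    clusters: list of clusters, each a list of (exposure_time, event) pairs,
    where event is 1 if the exposure ended in an event and 0 if censored.
    """
    n_events = sum(ev for c in clusters for _, ev in c)
    exposure = sum(t for c in clusters for t, _ in c)
    rate = n_events / exposure                   # maximizes the independence likelihood

    information = n_events / rate ** 2           # minus second derivative of the log-likelihood
    # Score contribution of each cluster, evaluated at the estimate
    scores = [
        sum(ev for _, ev in c) / rate - sum(t for t, _ in c) for c in clusters
    ]
    meat = sum(u * u for u in scores)

    naive_var = 1.0 / information                # valid only if all units are independent
    robust_var = meat / information ** 2         # sandwich: bread * meat * bread
    return rate, naive_var, robust_var
```

The point estimate is identical under both variance calculations; only the standard error changes, which is why consistency of the independence estimate is the key requirement.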

The basic model considered is (conditionally) Markov. The authors attempt to relax this assumption by allowing the transition intensities to joint damage from the non-active state to additionally depend on a time dependent covariate indicating whether activity has ever been observed. Ideally the intensity should depend on whether the patient has ever been active rather than whether they have been observed to be active. This could be incorporated very straightforwardly by modelling the process as a 4 state model where state 1 represents never active, no damage, state 2 represents active, no damage, state 3 represents not currently active (but have been in the past) and state 4 represents damage. Clearly, we cannot always determine whether the current occupied state is 1 or 3 so the observed data would come from a simple hidden Markov model.

On a practical level, the authors fit the likelihood for the gamma random effects model by using the integrate function in R, for each patient's likelihood contribution for each likelihood evaluation in the (derivative-free) optimization. Presumably this is very slow. Using a simpler more explicit quadrature formulation would likely improve speed (at a very modest cost to accuracy) because the contributions for all patients for each quadrature point could be calculated at the same time and the overall likelihood contributions could then be calculated in the same operation. Alternatively, the likelihood for a single shared Gamma frailty is in fact available in closed form. This follows from arguments extending the results in the tracking model of Satten (Biometrics, 1999). Essentially, we can write every term in the conditional likelihood as some weighted sum of exponentials:

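$$L_m(u) = \sum_k a_{mk} \exp(-b_{mk} u),$$

where $u$ is the frailty; the product of these terms is then still in this form:

$$\prod_m L_m(u) = \sum_l c_l \exp(-d_l u).$$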

Calculating the marginal likelihood then just involves computing a series of Laplace transforms.
Provided the models are progressive, keeping track of the separate terms is feasible, although the presence of time dependent covariates makes this approach less attractive.
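A minimal sketch of this closed-form calculation, under the assumption that each conditional likelihood term has already been written as a list of (coefficient, rate) pairs representing a sum of a * exp(-b * u), and that the frailty u is Gamma distributed with mean 1 and variance theta. The function names and the representation are hypothetical choices for illustration.

```python
from itertools import product


def multiply_terms(*factors):
    """Expand a product of sums of exponentials into a single sum.

    Each factor is a list of (a, b) pairs standing for sum_k a * exp(-b * u).
    Returns a list of (coefficient, rate) pairs in the same form.
    """
    result = [(1.0, 0.0)]
    for factor in factors:
        expanded = {}
        for (a1, b1), (a2, b2) in product(result, factor):
            # exp(-b1 * u) * exp(-b2 * u) = exp(-(b1 + b2) * u); merge equal rates
            expanded[b1 + b2] = expanded.get(b1 + b2, 0.0) + a1 * a2
        result = [(a, b) for b, a in sorted(expanded.items())]
    return result


def gamma_laplace_transform(s, theta):
    """E[exp(-s * u)] for a Gamma frailty u with mean 1 and variance theta."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def marginal_likelihood(factors, theta):
    """Integrate the product of the factors over the Gamma frailty distribution."""
    return sum(
        c * gamma_laplace_transform(d, theta) for c, d in multiply_terms(*factors)
    )
```

The number of terms in the expanded sum can grow multiplicatively with the number of factors, which is the bookkeeping cost referred to above; merging terms with equal rates keeps it manageable in simple progressive models.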

Monday, 1 November 2010

A regression model for the conditional probability of a competing event: application to monoclonal gammopathy of unknown significance

Arthur Allignol, Aurélien Latouche, Jun Yan and Jason Fine have a new paper in Applied Statistics (JRSS C). The paper concerns competing risks data and develops methods for regression analysis of the probability of an event of interest conditional on the competing event not having occurred. In terms of the cumulative incidence functions, for the case of two competing events, this can be written as $CP_1(t) = F_1(t) / \{1 - F_2(t)\}$, where $F_k(t)$ is the cumulative incidence function of event $k$. In some applications this quantity may be more useful than either the cause-specific hazards or the cumulative incidence functions themselves. One approach to regression in this scenario might be to compute pseudo-observations and perform the regression using those. The authors instead propose use of temporal process regression (Fine, Yan and Kosorok 2004), allowing estimation of time dependent regression parameters, by considering the cross-sectional data at each event time.
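For concreteness, the conditional probability function can be estimated nonparametrically by plugging estimated cumulative incidence functions into the ratio F1(t) / (1 - F2(t)). The sketch below uses the standard Aalen-Johansen estimator for two causes with right censoring; the function name and the coding of causes (0 for censored, 1 and 2 for the two event types) are assumptions for illustration, and this is the plug-in estimate rather than the regression method proposed in the paper.

```python
def conditional_probability(times, causes, t_eval):
    """Plug-in estimate of CP1(t) = F1(t) / (1 - F2(t)) at t_eval.

    times: observed times; causes: 0 = censored, 1 or 2 = event type.
    Returns (F1, F2, CP1) using Aalen-Johansen cumulative incidence estimates.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv, f1, f2 = 1.0, 0.0, 0.0   # overall event-free survival and the two CIFs
    i = 0
    while i < len(data) and data[i][0] <= t_eval:
        t = data[i][0]
        j = i
        while j < len(data) and data[j][0] == t:
            j += 1
        tied = [c for _, c in data[i:j]]      # everything observed at time t
        d1, d2 = tied.count(1), tied.count(2)
        # Each CIF increment is S(t-) times the cause-specific hazard estimate
        f1 += surv * d1 / at_risk
        f2 += surv * d2 / at_risk
        surv *= 1.0 - (d1 + d2) / at_risk
        at_risk -= len(tied)
        i = j
    return f1, f2, f1 / (1.0 - f2)
```

Note that the ratio is undefined once the estimated F2 reaches 1, so in practice the evaluation times would need to be restricted accordingly.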

Monday, 14 June 2010

An application of hidden Markov models to French variant Creutzfeldt-Jakob disease epidemic

Chadeau-Hyam et al. have a new paper in Applied Statistics (JRSS C). This is concerned with modelling vCJD in France. A 5 state multi-state model is assumed, with states representing susceptible to infection, asymptomatic infection, clinical vCJD, death from vCJD and death from causes other than vCJD. The data available are extremely sparse since no reliable test is available to distinguish susceptible from asymptomatic individuals. Indeed the only data actually observed are the yearly transitions from infected to clinical vCJD and from clinical vCJD to death. As a result, pseudo-observed quantities, estimated in previous studies or from general population data, are used to obtain quantities such as the numbers susceptible. Various approximations in terms of the number and type of transitions possible by an individual in one year are also made. Some simulations are performed which suggest the results are reasonably robust to these approximations.

The most interesting methodological aspect of the paper is the use of an (approximately) Erlang distribution for the incubation time (rather than an Exponential). This is achieved by assuming that the incubation state is made up of 11 latent phases.
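The effect of splitting a state into latent phases can be seen from the Erlang distribution itself: passing through k exponential phases with a common rate gives a sojourn time whose coefficient of variation is 1 / sqrt(k), compared with 1 for a single exponential. A small sketch follows; the rate used in any call is a hypothetical value, not an estimate from the paper.

```python
import math


def erlang_survival(t, phases, rate):
    """P(T > t) when T is the total time to pass through `phases` exponential
    phases, each left at the same rate `rate`."""
    x = rate * t
    return math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(phases))


def erlang_mean(phases, rate):
    """Mean total sojourn time across all phases."""
    return phases / rate


def erlang_cv(phases):
    """Coefficient of variation of the total sojourn time."""
    return 1.0 / math.sqrt(phases)
```

With 11 phases the coefficient of variation is about 0.3, so the incubation time is far more concentrated around its mean than an exponential time with the same mean would be.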

Tuesday, 24 March 2009

Estimating life expectancy in health and ill health by using a hidden Markov model

Van den Hout, Jagger and Matthews have a paper to appear in JRSS C. The paper applies the misclassification hidden Markov model, developed by Satten and Longini and by Jackson and Sharples, to modelling of data on cognitive impairment in the elderly and its effect on mortality. Patients with a cognition score (MMSE) below 22 were considered impaired. However, cognitive decline is considered to be progressive, so backwards transitions in the dataset are explained through misclassification.

The main aim of the paper is to estimate life expectancies in the non-impaired and impaired states. As mortality will be highly dependent on age, non-homogeneous transition intensities are required. Rather than employ the standard approach of piecewise constant intensities, the authors instead include age as a log-linear time dependent covariate and assume that an individual observed at ages t and u, for t < u, has constant intensity Q(t) for the interval (t,u). This will clearly result in some degree of bias, particularly if observation times are widely spaced. Life expectancy is then calculated by assuming intensities are constant in 1 year intervals. As this differs from the assumption under which the model was fitted, the bias may be further compounded.
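The size of this bias can be gauged in a simplified setting with a single log-linear intensity exp(b0 + b1 * age), comparing the exact cumulative intensity over (t, u) with the value obtained by freezing the intensity at its level at age t. The single-transition setting and the function names are assumptions for illustration; the paper's model has several states.

```python
import math


def exact_cumulative_intensity(t, u, b0, b1):
    """Integral of exp(b0 + b1 * s) for s from t to u (requires b1 != 0)."""
    return math.exp(b0) * (math.exp(b1 * u) - math.exp(b1 * t)) / b1


def frozen_cumulative_intensity(t, u, b0, b1):
    """Cumulative intensity when the intensity is held at its value at age t."""
    return math.exp(b0 + b1 * t) * (u - t)


def frozen_to_exact_ratio(t, u, b0, b1):
    """Ratio of the frozen to the exact cumulative intensity over (t, u)."""
    return frozen_cumulative_intensity(t, u, b0, b1) / exact_cumulative_intensity(
        t, u, b0, b1
    )
```

The ratio depends only on b1 * (u - t): for an intensity that increases with age it is always below 1, and it moves further from 1 as the gap between observations widens, which is the sense in which widely spaced observation times make the approximation worse.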

Rudimentary goodness-of-fit is carried out by comparing estimated survival curves from the HMM with a Cox-regression performed directly on the survival data. It is worth noting that this approach could be problematic in certain circumstances because the HMM is not nested within the Cox-regression model, so there might be discrepancies between the curves even if the HMM is correctly specified.

Monday, 10 November 2008

Analysis of interval-censored data from clustered multistate processes: application to joint damage in psoriatic arthritis

Rinku Sutradhar and Richard J. Cook have a new paper in JRSS C. This builds on previous work on random effects multi-state models by Cook, Yi, Lee and Gladman (Biometrics, 2004). Here rather than using a discrete random effects distribution and assuming time homogeneity, they use an MCEM algorithm to allow multivariate normal random effects. In addition, piecewise constant transition intensities allow the assumption of time homogeneity to be relaxed.