Thursday, 28 April 2011
Presanis, De Angelis, Goubar, Gill and Ades have a new paper in Biostatistics. The paper attempts to use several disparate sources of data on prevalence and incidence to estimate transition rates and prevalences in a multi-state model of transmission of HIV among men who have sex with men (ironically abbreviated as MSM) via Bayesian evidence synthesis. The multi-state model has 4 transient states relating to "Eligible (non MSM)", "Susceptible (MSM)", "Undiagnosed" and "Diagnosed". Subjects enter aged 15 (or later due to migration) and may exit due to death or reaching age 45.
It should be noted that the multi-state model is deterministic rather than stochastic, i.e. the proportions in each state at a given time, conditional on the parameters and starting conditions, are the solution of an ordinary differential equation. While this is unproblematic in terms of getting estimates of the expected proportions in each state at a given time, there must be some level of under-representation of the uncertainty. In particular, the likelihood assumes that the number of deaths from HIV in the SOPHID dataset at time $t$ is $\mathrm{Bin}(D(t), p_t)$, where $D(t)$ is the number in the diagnosed state at time $t$ and $p_t$ is the probability of death from HIV within one year for year $t$. In reality $D(t)$ should be stochastic rather than deterministic, and as such we would expect the number of deaths to have a higher variance than assumed by the binomial model.
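To make the variance point concrete (notation mine, not the paper's): if $D(t)$ were acknowledged to be random, the law of total variance gives

$$\mathrm{Var}\{\mathrm{deaths}(t)\} = \mathrm{E}\{D(t)\}\,p_t(1-p_t) + p_t^2\,\mathrm{Var}\{D(t)\} \;\geq\; \mathrm{E}\{D(t)\}\,p_t(1-p_t),$$

so any variability in $D(t)$ inflates the variance of the death counts beyond what a binomial likelihood with a deterministic denominator allows.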
In principle the covariance matrix of the numbers in each state in each year could be derived: given the Markov model, we have $N$ independent individuals, each of whose state occupancy at time $t$ is multinomial given their occupancy at time $t-1$. A more realistic model would then assume that, conditional on the parameters, the state occupation counts have a multivariate normal distribution (e.g. analogous to the approaches taken in the estimation of aggregate Markov data) and, for instance, that the number of deaths from HIV is binomial with a denominator that is itself normally distributed. Quite possibly this extra source of uncertainty is negligible compared to the vast existing uncertainties, but it nevertheless ought to be explored.
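As a rough sketch of the moment calculations involved (entirely my own construction, ignoring the entries, exits and continuous-time structure of the actual model), the mean and covariance of the aggregate state counts can be propagated recursively using the standard conditional multinomial moments:

import numpy as np

def propagate_moments(mu0, Sigma0, P, n_steps):
    # Propagate the mean vector and covariance matrix of aggregate state
    # counts through a discrete-time Markov chain with transition matrix P,
    # assuming individuals move independently (conditional multinomial steps).
    mu = np.asarray(mu0, dtype=float)
    Sigma = np.asarray(Sigma0, dtype=float)
    for _ in range(n_steps):
        # E[n(t)] = P' E[n(t-1)]
        mu_new = P.T @ mu
        # Var(n(t)) = E[Var(n(t) | n(t-1))] + Var(E[n(t) | n(t-1)])
        #           = sum_i E[n_i(t-1)] (diag(p_i) - p_i p_i') + P' Var(n(t-1)) P
        within = sum(mu[i] * (np.diag(P[i]) - np.outer(P[i], P[i]))
                     for i in range(len(mu)))
        Sigma = within + P.T @ Sigma @ P
        mu = mu_new
    return mu, Sigma

# Toy 4-state example: 1000 individuals all starting in the first state
P = np.array([[0.95, 0.04, 0.01, 0.00],
              [0.00, 0.97, 0.02, 0.01],
              [0.00, 0.00, 0.98, 0.02],
              [0.00, 0.00, 0.00, 1.00]])
mu, Sigma = propagate_moments([1000, 0, 0, 0], np.zeros((4, 4)), P, 10)
print(mu)
print(np.sqrt(np.diag(Sigma)))  # standard deviations of the state counts

A binomial likelihood for the deaths with a normally distributed denominator, as suggested above, could then draw on the relevant diagonal element of this covariance matrix.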
Friday, 15 April 2011
On inference from Markov chain macro-data using transforms
Martin Crowder and David Stephens have a new paper in Journal of Statistical Planning and Inference. This considers estimation of a discrete-time homogeneous Markov chain from aggregate data (which they term macro-data), i.e. where only the overall state counts for the $N$ patients are known at each observation time, while the individual transition counts are unknown. The likelihood for such data is intractable because it involves summing over the vast number of possible sets of transitions that are consistent with the observed aggregate counts. As a result, inference methods have focused on moment-based estimation (see e.g. Kalbfleisch, Lawless and Vollmer, Biometrics 1983), which is reasonably effective in practice.
Crowder and Stephens note that the probability generating function of the observed aggregate counts has a fairly simple form. This motivates an estimation procedure based on matching empirical quantities of the form $\prod_k s_k^{n_k(t_j)}$ with their expectations (i.e. the pgfs). An obvious practical issue is the choice of argument vectors $s$ at which to compare the pgf with the corresponding sample quantities. The authors propose choices that avoid computational problems for large sample sizes.
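For intuition (in my notation, for the one-step conditional distribution rather than necessarily the authors' exact formulation): given the counts $n(t-1) = (n_1(t-1), \dots, n_K(t-1))$, each individual in state $i$ independently moves to state $k$ with probability $p_{ik}$, so

$$\mathrm{E}\left[\prod_{k=1}^{K} s_k^{n_k(t)} \,\middle|\, n(t-1)\right] = \prod_{i=1}^{K}\left(\sum_{k=1}^{K} p_{ik}\, s_k\right)^{n_i(t-1)},$$

a simple product over states that involves no sum over the unobserved transition counts.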
Through a series of simulations, the authors demonstrate that an improvement in efficiency over methods based on second moments is possible for small sample sizes (e.g. $n \le 25$). A comparison is also made with the efficiency attainable from micro-data (i.e. where the transition counts are known). However, for such small samples computation of the full likelihood for the aggregate data should itself be a viable option. The practical questions are whether the pgf-based approach gives any real improvement in efficiency over second-moment approaches for, say, $n = 100$ (the second-moment approaches are asymptotically efficient with appropriate weights), and whether the pgf method outperforms (or matches) the full likelihood for small sample sizes (i.e. $n \le 25$) where the full likelihood is calculable. These issues aren't really addressed in the paper.
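To illustrate why the full likelihood is computable for such small samples, here is a brute-force sketch (my own, purely illustrative) of the one-step aggregate-data likelihood, obtained by summing over all transition-count tables consistent with the observed margins; by the Markov property, the likelihood of a whole series of aggregate counts is just the product of such one-step terms.

from itertools import product
from math import comb, prod

def compositions(total, parts):
    # All non-negative integer vectors of length `parts` summing to `total`.
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    # Multinomial probability of `counts` given cell probabilities `probs`.
    coef, remaining = 1, sum(counts)
    for c in counts:
        coef *= comb(remaining, c)
        remaining -= c
    return coef * prod(p ** c for p, c in zip(probs, counts))

def aggregate_step_likelihood(n_prev, n_curr, P):
    # P(n(t) = n_curr | n(t-1) = n_prev) under transition matrix P, summing
    # over every transition-count table consistent with the observed margins.
    K = len(n_prev)
    total = 0.0
    for rows in product(*(compositions(n_prev[i], K) for i in range(K))):
        if all(sum(rows[i][k] for i in range(K)) == n_curr[k] for k in range(K)):
            total += prod(multinomial_pmf(rows[i], P[i]) for i in range(K))
    return total

# Toy example: 3 states, 10 individuals
P = [[0.7, 0.2, 0.1],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
print(aggregate_step_likelihood((6, 3, 1), (4, 4, 2), P))

For small numbers of states and small $n$ the number of consistent tables remains enumerable, though it grows quickly with both.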
Thursday, 14 April 2011
Non-homogeneous Markov process models with informative observations with an application to Alzheimer's disease
Baojiang Chen and Xiao-Hua Zhou have a new paper in Biometrical Journal. The main methodological development is to extend the methods of Chen, Yi and Cook (Stat Med, 2010) to the case of a non-homogeneous Markov model, using the time transformation model of Hubbard et al. (Biometrics, 2008) rather than the piecewise constant intensities used in Chen, Yi and Cook. The authors claim that the time transformation model is more appealing than piecewise constant intensities because it requires fewer parameters. However, this parsimony comes at the cost of flexibility, as the time transformation model assumes the same temporal trend for all intensities.
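To make the flexibility point explicit, a time-transformed homogeneous model can be written (in my notation, which need not match Hubbard et al.'s exact parameterisation) as

$$q_{rs}(t) = q_{rs}\, h'(t; \theta), \qquad P(t_0, t_1) = \exp\left\{\big(h(t_1; \theta) - h(t_0; \theta)\big) Q\right\},$$

where $h(\cdot\,; \theta)$ is a monotone increasing transformation of time common to all transitions and $Q$ is the generator of a homogeneous chain. Every intensity is scaled by the same factor $h'(t; \theta)$, which is precisely the sense in which the temporal trend is shared, whereas piecewise constant intensities allow each transition its own trend at the cost of extra parameters.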
The assumption of non-informative examination times is an ever-present spectre for multi-state models fitted to panel data. Chen et al.'s approach applies when a complete set of planned examination times is known and some of these examinations are simply missed, meaning the problem can be dealt with within Rubin's MAR/MNAR framework. A more general situation is where the multi-state process and the process generating the examination times are dependent. Here the only option seems to be to model the two processes jointly and explicitly. A starting point might be a model in which the intensities of the counting process generating the examination times and the intensities of the multi-state model are linked through a shared frailty, analogous to joint models for longitudinal and (informative) drop-out (survival) processes.
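A minimal formalisation of such a joint model (notation entirely my own) might give each subject $i$ a frailty $u_i \sim \mathrm{Gamma}(\phi^{-1}, \phi^{-1})$ and set

$$\lambda_i^{\mathrm{exam}}(t \mid u_i) = u_i\, \lambda_0(t), \qquad q_{rs,i}(t \mid u_i) = u_i^{\gamma}\, q_{rs}(t),$$

so that $\gamma = 0$ recovers non-informative examination times, while $\gamma \neq 0$ induces dependence between the rate of examinations and the rate of transitions, in the spirit of shared-parameter models for longitudinal outcomes with informative drop-out.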
Tuesday, 5 April 2011
Progression of liver cirrhosis to HCC: an application of hidden Markov model.
Nicola Bartolomeo, Paolo Trerotoli and Gabriella Serio have a new paper in BMC Medical Research Methodology. This applies a three-state hidden Markov model to data on the progression of liver cirrhosis to hepatocellular carcinoma (HCC). A time-homogeneous, continuous-time progressive model is fitted with death as the absorbing state. Covariate effects are included via a proportional intensities model.
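For concreteness, with states 1 (cirrhosis), 2 (HCC) and 3 (death), a progressive model with proportional intensities takes the form (my notation; I'm assuming here that death is allowed from both living states)

$$Q(z) = \begin{pmatrix} -(q_{12}(z)+q_{13}(z)) & q_{12}(z) & q_{13}(z) \\ 0 & -q_{23}(z) & q_{23}(z) \\ 0 & 0 & 0 \end{pmatrix}, \qquad q_{rs}(z) = q_{rs}^{(0)} \exp(\beta_{rs}^{T} z),$$

with the hidden-Markov element being that occupancy of states 1 and 2 is subject to misclassification.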
Schoenfeld residuals, which are appropriate for right-censored data, are applied here as a test of proportionality. It isn't made clear precisely how this is done. If the times of death are known exactly, then a Schoenfeld-type residual could be defined at the death times by replacing the standard formulation

$$r_i = z_i - \frac{\sum_{j \in R(t_i)} z_j \exp(\beta^{T} z_j)}{\sum_{j \in R(t_i)} \exp(\beta^{T} z_j)}$$

with a version in which membership of the risk set $R(t_i)$ is replaced by the model-based probability of occupying the state from which death can occur,

$$r_i = z_i - \frac{\sum_{j} \pi_j(t_i)\, z_j \exp(\beta^{T} z_j)}{\sum_{j} \pi_j(t_i)\, \exp(\beta^{T} z_j)},$$

where $\pi_j(t_i)$ is the probability that subject $j$ occupies the at-risk state at time $t_i$ given the observed data. If the times of death are interval censored then this approach is inappropriate.
On a more trivial level the matrix of misclassification probabilities is missing a 1 in the third row corresponding to the absorbing state.
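For reference, with the death state recorded without error the misclassification matrix should take a form along the lines of (illustrative notation, assuming the two living states can only be misclassified as each other)

$$E = \begin{pmatrix} 1 - e_{12} & e_{12} & 0 \\ e_{21} & 1 - e_{21} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

with the 1 in the bottom-right corner being the entry in question.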