Linkage of Viral Sequences among HIV-Infected Village Residents in Botswana: Estimation of Linkage Rates in the Presence of Missing Data
Nicole Bohme Carnegie,
Rui Wang,
Vladimir Novitsky and
Victor De Gruttola
PLOS Computational Biology, 2014, vol. 10, issue 1, 1-16
Abstract:
Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for the presence of correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40–100%, and changes the relative rates between groups. 
Bias in inferences regarding HIV viral linkage that arises from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data.
Author Summary:
The analysis of viral genomes has great potential for investigating transmission of disease, including the identification of risk factors and transmission clusters, and can thereby aid in targeting interventions. To make use of genetic data in this way, it is necessary to make inferences about population-level patterns of viral linkage. As with any rigorous statistical inference from sampled data to a population, it is important to consider the effect of the sampling strategy and the occurrence of missing data on the final inferences made. In this paper we highlight the effects of missing data on the resulting estimates of population-level linkage rates and develop methods for adjusting for the presence of missing data. As an example, we consider comparing the rates of linkage of HIV sequences from subjects with high viral load, low viral load, or on antiretroviral treatment, and show that comparative inferences are compromised when adjustment is not made for missing sequences and that bias in inferences can be reduced with proper adjustment.
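The independence-based step described in the abstract can be illustrated with a short sketch. It assumes that the pairwise linkage probability p is estimated as the fraction of observed sequence pairs that link, and that the probability of linking to at least one of N members of another group is 1 - (1 - p)^N when pairs are independent. The function names and all counts below are hypothetical and chosen only for illustration; they are not taken from the article, and the article's own estimators and resampling procedure may differ.

```python
def pairwise_link_prob(n_linked_pairs, n_observed_pairs):
    """Fraction of observed sequence pairs that are linked (hypothetical helper)."""
    if n_observed_pairs <= 0:
        raise ValueError("need at least one observed pair")
    return n_linked_pairs / n_observed_pairs


def prob_links_to_at_least_one(p_pair, n_targets):
    """P(link to >= 1 of n_targets), assuming independence across pairs."""
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must be a probability")
    return 1.0 - (1.0 - p_pair) ** n_targets


# Hypothetical counts, for illustration only (not data from the article).
p = pairwise_link_prob(n_linked_pairs=6, n_observed_pairs=3000)

n_sequenced = 60   # target-group members with a sequence (assumed)
n_total = 100      # all target-group members, sequenced or not (assumed)

# Unadjusted: counts only the sequenced members as possible links.
unadjusted = prob_links_to_at_least_one(p, n_sequenced)
# Adjusted: applies the same pairwise probability to the whole group.
adjusted = prob_links_to_at_least_one(p, n_total)

print(f"pairwise p = {p:.4f}")
print(f"unadjusted = {unadjusted:.4f}, adjusted = {adjusted:.4f}")
```

With these assumed counts the adjusted rate exceeds the unadjusted one, because ignoring members without a sequence removes potential links from the calculation. The sketch does not address correlation across pairs, which the abstract handles with a separate resampling approach.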
Date: 2014
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003430 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 03430&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1003430
DOI: 10.1371/journal.pcbi.1003430
More articles in PLOS Computational Biology from Public Library of Science