It is widely believed that epidemics in new hosts diminish in virulence over time, with natural selection favoring pathogens that cause minimal disease. However, a tradeoff frequently exists between high virulence shortening host survival on the one hand but allowing faster transmission on the other. This is the case in HIV infection, where high viral loads increase transmission risk per coital act but reduce host longevity. We here investigate the impact on HIV virulence of HIV adaptation to HLA molecules that protect against disease progression, such as HLA-B∗57 and HLA-B∗58:01. We analyzed cohorts in Botswana and South Africa, two countries severely affected by the HIV epidemic. In Botswana, where the epidemic started earlier and adult seroprevalence has been higher, HIV adaptation to HLA including HLA-B∗57/58:01 is greater compared with South Africa (P = 7 × 10-82), the protective effect of HLA-B∗57/58:01 is absent (P = 0.0002), and population viral replicative capacity is lower (P = 0.03). These data suggest that viral evolution is occurring relatively rapidly, and that adaptation of HIV to the most protective HLA alleles may contribute to a lowering of viral replication capacity at the population level, and a consequent reduction in HIV virulence over time. The potential role in this process played by increasing antiretroviral therapy (ART) access is also explored. Models developed here suggest distinct benefits of ART, in addition to reducing HIV disease and transmission, in driving declines in HIV virulence over the course of the epidemic, thereby accelerating the effects of HLA-mediated viral adaptation. HLA, HIV, adaptation, antiretroviral therapy, virulence.
We studied antenatal women with chronic, ART-naïve HIV-1 infection recruited from two cohorts as follows: (i) Durban, South Africa (n = 328), enrolled between 2002 and 2005; and (ii) Gaborone, Botswana (n = 514), enrolled between 2007–2008, from the Mma Bana Study as previously described (10, 44–46). A third maternal cohort was enrolled from the same site in Durban, South Africa (n = 264) in 2012–2013, comprising 38 ART-naïve antenatal mothers and 226 postnatal mothers attending a baby immunization clinic [enrolled a median of 186 d postdelivery, interquartile range (IQR) of 71 to 375 d]. These mothers had received AZT from booking to delivery and then single dose of nevirapine during labor according to the South African national guidelines, and no ART postdelivery. Analyses were also undertaken using data from an extended Durban cohort (n = 1,218) that included subjects from outpatient clinics enrolled up to 2007 as previously described (25). Viral load was measured from plasma using the identical Roche Amplicor Version 1.5 assay in both cohorts. Samples from study subjects were HLA-A–, -B–, and -C–sequenced based typed in the Clinical Laboratory Improvement Amendments/American Society of Histocompatibility and Immunogenetics-accredited laboratory of William Hildebrand (University of Oklahoma Health Sciences Center, Oklahoma City) using a locus-specific PCR amplification strategy and a heterozygous DNA-sequencing methodology for exons 2 and 3 of the class I PCR amplicon. Relevant ambiguities (47) were resolved by homozygous sequencing. DNA sequence analysis and HLA allele assignment were performed with the software Assign-SBT Version 3.5.1 (Conexio Genomics). Ethics approval was given by the Office of Human Research Administration, Harvard School of Public Health and the Health Research Development Committee, Botswana Ministry of Health, University of KwaZulu-Natal Review Board, and Oxford Research Ethics Committee. Written informed consent was obtained from all individuals. Gag, Pol, and Nef sequences were generated from genomic DNA extracted from peripheral blood mononuclear cells, amplified by nested PCR using previously published primers to obtain population sequences, as previously described (48). Sequencing was undertaken using Big Dye Ready Reaction Terminator Mix Version 3 (Applied Biosystems). Sequences were analyzed using Sequencher Version 4.8 (Gene Codes Corporation). HLA-associated viral polymorphisms and viral amino acid covariations were identified from proviral DNA using a previously described method that corrects for phylogeny, HLA-linkage disequilibrium, and codon covariation (29, 49). Briefly, a maximum likelihood phylogenetic tree was constructed for each gene and for every HLA allele and amino acid, two generative models of the observed presence or absence of the amino acid in each sequence were created—one representing the null hypothesis that the observations are generated by the phylogenetic tree alone and the other representing the alternative hypothesis that additional escape or reversion takes place due to HLA pressure, as estimated using a modified logistic regression model (25). The likelihood of the observations was then maximized over the parameters of both models with an expectation maximization algorithm, and a P value was computed with a likelihood ratio test based on those likelihoods. To increase power, the tests were made binary such that the presence or absence of a given HLA allele was correlated with the presence or absence of a given amino acid. In addition, HLA polymorphism pairs were analyzed only when both the amino acid and the HLA were independently observed in at least 10 individuals. For every amino acid at each position, the HLA allele with the strongest association was added to the model and the analysis was repeated to identify the next most significant HLA, conditioned on those previously added to the model. This procedure was iterated until no HLA allele yielded an association with a P value of less than 0.05. A q value statistic, estimating the proportion of false positives among the associations identified at a given P value threshold, was estimated using the method of Storey and Tibshirani (50). Statistical significance was reported using q values of ≤ 0.05 (5% false-discovery rate) for each P value threshold. Associations were learned using a previously defined, multicohort set of 2,066 individuals from various regions in Southern Africa, including South Africa (n = 1,254), Botswana (n = 326), Zambia (n = 326), and n = 66 individuals of Southern African descent who enrolled in the Thames Valley clinic in the United Kingdom (26). High-resolution HLA typing was available for each individual, as were sequences for Gag (n = 1,897; n = 1,135 for p15), Pol (n = 1365; n = 1,315, 1,364, and 698 for Pr, RT, and Int, respectively), and Nef (n = 1,336). Gag, Pol, and Nef alignments were constructed using HIVAlign (51), then hand edited. Maximum likelihood trees were constructed separately for Gag, Pol, and Nef using PHYML (52). The extent to which an HIV sequence was adapted to a given HLA allele was measured as the proportion of sites associated with that allele that were adapted, defined as the presence of a polymorphism (either by itself or as a called mixture) that is positively correlated with the HLA or the presence of a polymorphism that is different from the polymorphism that is negatively associated with the HLA (if such a polymorphism exists). Sites containing indel characters are treated as missing data. In Fig. 2, for example, each point represents the HLA repertoire of each study subject (i.e., the six HLA class I molecules expressed) and the average, over all sequences in the cohort, of the number of sites in Gag, Pol, and Nef that are adapted to those HLA class I molecules, divided by the total number of sites associated with selection pressure mediated by those HLA class I molecules. This calculation therefore represents the average degree of adaptation of all sequences in that cohort to that individual HLA repertoire. Patient gag-protease isolated from plasma RNA was inserted into a NL4-3 gag-protease–deleted plasmid to generate recombinant viruses, as previously described (17, 18). Titration of virus stocks and replication assays were performed as described (12, 13), using a multiplicity of infection of 0.003. The mean slope of exponential growth from days 2 to 7 was calculated in Excel (Microsoft) using the LOGEST function and converted to natural logs. This was then divided by the slope of growth of the WT NL4-3 control included in each replication assay, to generate a normalized replication capacity for comparison between different assays. Replication assays were performed in duplicate and the results averaged. The samples used for the VRC assays were selected at random from subjects in each cohort whose absolute CD4 counts were matched in the range of 300 to 500 cells per mm3 and retrospectively compared and confirmed to be not significantly different (385 per mm3 and mean of 389 per mm3 in Gaborone and Durban, respectively). The VRC had previously been evaluated in >400 HIV-infected study subjects in Durban (17). The subset of viruses selected at random from the Durban cohort were established as representative of the entire sample group of 119 Durban subjects whose absolute CD4 counts were in the range of 300 to 500 per mm, in having the same mean VRC: The mean VRC as determined in Durban in this subset of 16 viruses was 0.62 (SD 0.09), compared with a mean VRC of 0.62 (SD 0.1) for the whole group. Viral titres and replication capacities for these 16 viruses were determined at the University of Oxford as described above and normalized to the same viral stock of the NL4-3 control used for the 63 Gaborone samples. The VRC determined at the University of Oxford for the 16 Durban Gag-protease recombinant viruses was strongly correlated with the VRCs determined for these same viruses in Durban in the previously published study (12) (r2 = 0.94 P < 0.0001; Pearson’s correlation). Gag-Protease recombinant viruses generated by this method have been shown to be representative of original plasma quasispecies (17, 18). In this study, Gag p17+p24 was sequenced from 27 randomly selected recombinant viruses from the Gaborone study cohort and compared with the original plasma HIV RNA sequence. Viral RNA was isolated from recombinant virus supernatants using a QIAamp Viral RNA mini kit (Qiagen). RT-PCR was performed and the product sequenced along with the gag-protease PCR product from plasma viral RNA, as previously described (17, 18), to obtain gag p17 and p24 population sequences from both plasma viral RNA- and recombinant virus RNA-derived cDNA. Sequences were analyzed and edited in Sequencher 4.8 and aligned to the HXB2 reference sequence (GenBank accession no. {"type":"entrez-nucleotide","attrs":{"text":"K03455","term_id":"1906382"}}K03455) in Se-Al Version 2.0a11. Plasma viral RNA-derived cDNA sequences and recombinant virus RNA-derived cDNA sequences from the Gaborone subjects were directly compared with ascertain similarity. Twenty-seven pairs of sequences were randomly chosen and percentage pairwise distances calculated using HyPhy Version 1.0β (53). The median number of nucleotide differences between the compared sequences was 2.9 (IQR 0.9–5.2), with mixed bases included as differences, resulting in an average nucleotide similarity of 0.97 between pairs, in support of previous findings (12, 13). The model shown diagrammatically in Fig. S3 is formally described by the following set of coupled, first order, ordinary differential equations: These equations were solved numerically using the deSolve package in R and the lsoda method (54). The initial values given were and time steps of size 0.01 y were used. For the first 35 y ν was set to 0 and then this was increased to 0.4 from t = 35 onward, so that from this point 40% of new infections end up being treated when they pass the treatment threshold. All parameters and their given values are detailed in Table S3.