Background: The genome-wide association study (GWAS) techniques that have been used for genetic mapping in other organisms have not been successfully applied to mosquitoes. Mosquitoes have high nucleotide diversity, low linkage disequilibrium and complex population stratification, which render population-based GWAS essentially unfeasible at realistic sample sizes and marker densities. Methods: We designed a novel mapping strategy for the mosquito system that combines the power of linkage mapping with the resolution afforded by genetic association. We established founder colonies from West Africa that were controlled for diversity, linkage disequilibrium and population stratification. Colonies were challenged by feeding on the infectious stage of the human malaria parasite, Plasmodium falciparum. Mosquitoes were phenotyped for parasite load, and DNA pools of phenotypically similar mosquitoes were sequenced on the Illumina platform. Phenotype-genotype mapping was carried out in two stages, coarse and fine. Results: In the first mapping stage, pooled sequences were analysed genome-wide for intervals displaying a relative reduction in diversity between phenotype pools, and candidate genomic loci influencing parasite infection levels were identified. In the second mapping stage, SNPs from the first mapping stage were genotyped in unpooled individual mosquitoes and in replicates. The second stage confirmed significant SNPs in a locus encoding two Toll-family proteins. RNAi-mediated gene silencing and infection challenge revealed that TOLL11 protects mosquitoes against P. falciparum infection. Conclusions: We present an efficient and cost-effective method for genetic mapping that uses natural variation segregating in defined, recently established Anopheles founder colonies, and we demonstrate its applicability for mapping in a complex non-model genome. 
This approach is a practical and preferred alternative to population-based GWAS for first-pass mapping of phenotypes in Anopheles. This design should facilitate mapping of other traits involved in physiology, epidemiology, and behaviour.
Wild-caught A. gambiae s.l. females originated from either Burkina Faso (Goundry region) or Mali (Bancoumana region). Gravid females were captured indoors by aspiration. This ensured that, at the time of capture, they had already mated assortatively under natural conditions and taken a blood meal. They were then placed individually in oviposition tubes with wet filter paper. Females that laid eggs were collected and stored in ethanol before genomic DNA extraction. Eggs from each individual female were placed in a pan of water with Tetramin fish food. Emerged adults were reared under standard conditions (26 °C, 80 % humidity, 12 h light/dark cycle) with access to cotton soaked in 10 % sucrose solution. Females that laid eggs were typed for species, molecular form and the molecular karyotype of the 2La chromosomal inversion [16]. The maternal genotype was determined by genotyping each mother directly. Because mating occurred in nature, the paternal genotype was inferred by genotyping the F1 offspring. Isofemale families identified as A. coluzzii (M form) with the 2La/a karyotype were used to initiate colonies. No hybrid families resulting from M × S form crosses were observed. Founder colony 03 (Fd03) was started with the F1 offspring of six mated females from Mali. Founder colonies 5 (Fd05) and 9 (Fd09) were created with the offspring of 10 and 11 females from Burkina Faso, respectively. Colonies are routinely monitored for species and 2La inversion karyotype. Individual mosquitoes from founder colonies Fd03, Fd05 and Fd09 were genotyped for five microsatellite markers (2L.17686896, 2L.19444747, 2L.41431233, 2L.40133863 and H603) using previously described primers and methods [3]. Microsatellites are named by chromosome arm followed by nucleotide coordinate. Microsatellite data were used to estimate pairwise Fst among colonies and the wild source population. 
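To illustrate what a pairwise Fst between two colonies measures, the sketch below computes a simple Nei-style estimator, Fst = (HT − HS)/HT, from microsatellite allele calls. The estimator is combined across loci as a ratio of sums. This is an assumption for illustration and is not necessarily the estimator Genepop implements. The function names and the input layout (a dict of locus name to a list of allele calls) are hypothetical. Pooled frequencies are averaged without weighting by sample size.

```python
from collections import Counter


def allele_freqs(alleles):
    """Allele frequencies from a list of allele calls at one locus."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return {a: c / n for a, c in counts.items()}


def expected_het(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(pop1, pop2):
    """Simplified Nei-style Fst = (HT - HS) / HT, combined over loci as a ratio of sums.

    pop1, pop2: dict mapping locus name -> list of allele calls
    (two calls per diploid mosquito, e.g. microsatellite fragment lengths).
    Hypothetical helper; not the Genepop implementation.
    """
    num = den = 0.0
    for locus in sorted(pop1.keys() & pop2.keys()):
        f1 = allele_freqs(pop1[locus])
        f2 = allele_freqs(pop2[locus])
        hs = (expected_het(f1) + expected_het(f2)) / 2  # mean within-sample diversity
        pooled = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in f1.keys() | f2.keys()}
        ht = expected_het(pooled)  # diversity of the combined (unweighted) sample
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0
```

As a usage example, `pairwise_fst({"H603": [150, 152]}, {"H603": [152, 156]})` compares two hypothetical samples at one marker. Identical allele frequencies give 0, and samples fixed for different alleles give 1.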
Pairwise Fst values were calculated using Genepop [33], and neighbour-joining trees were constructed using Mega 2.1 [34]. Founder colony diversity was estimated by pooled sequencing of mosquito samples collected 3–5 months before mapping. The diversity and divergence shown in Fig. 1 should therefore accurately represent the diversity present at the mapping stage. P. falciparum isolate NF54 was cultured using an automated tipper-table system [35] as implemented in the CEPIA mosquito infection facility of the Institut Pasteur [24]. For infection, mature gametocytes were added to fresh erythrocytes in AB human serum, mixed gently, and transferred to a membrane feeder prewarmed to 37 °C. Mosquitoes were allowed to feed for 15 min, and only fully engorged females were used for further analysis. The infection phenotypes were oocyst infection prevalence and intensity. Oocyst prevalence is the fraction of mosquitoes carrying at least one oocyst. Intensity is the number of oocysts per mosquito, determined only in the subset of mosquitoes with ≥1 oocyst. Midguts of bloodfed females were dissected 7–8 days post-infection and stained in 1× PBS with 0.4 % mercury dibromofluorescein (Sigma). The number of oocysts per midgut was then determined by light microscopy. Carcasses of the dissected mosquitoes were stored at −20 °C until DNA extraction. Genomic DNA was extracted from individual female mosquitoes by homogenization in 100 µl DNAzol (Invitrogen, CA, USA) with a disposable pestle, following the manufacturer's protocol. Based on the observed number of P. falciparum oocysts, mosquitoes were assigned to one of three phenotype categories. Phenotype pools of ≥14 mosquitoes each were constituted: i) a "Zero" pool of bloodfed mosquitoes carrying no oocysts, ii) a "Low" pool with 1–6 oocysts, and iii) a "High" pool with ≥10 oocysts. Thresholds for phenotype pools were determined empirically for each colony. 
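The two infection phenotypes and the pool assignment rule can be sketched as follows. Intensity is summarised here as a mean over infected mosquitoes, which is an assumption; the text defines intensity per mosquito but does not name a summary statistic. The cutoffs are parameters because thresholds differed between colonies. The function names are hypothetical.

```python
from statistics import mean


def infection_phenotypes(oocyst_counts):
    """Return (prevalence, mean intensity) for one infection.

    Prevalence: fraction of bloodfed mosquitoes with >= 1 oocyst.
    Intensity: oocysts per mosquito over infected mosquitoes only
    (summarised as a mean here; the summary statistic is an assumption).
    """
    infected = [c for c in oocyst_counts if c >= 1]
    prevalence = len(infected) / len(oocyst_counts)
    intensity = mean(infected) if infected else float("nan")
    return prevalence, intensity


def assign_pool(count, low_max=6, high_min=10):
    """Assign one mosquito to the Zero / Low / High pool, or None if intermediate."""
    if count == 0:
        return "Zero"
    if count <= low_max:
        return "Low"
    if count >= high_min:
        return "High"
    return None  # e.g. 7-9 oocysts with the default cutoffs: not pooled
```

For integer counts, a cutoff stated as ">29 oocysts" corresponds to `high_min=30`. A Low pool of 1–5 oocysts corresponds to `low_max=5`.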
Specifically, for Fd03 the Zero pool comprised 20 mosquitoes, the Low pool (1–5 oocysts) 17 mosquitoes and the High pool (>10 oocysts) 14 mosquitoes. The entire infection comprised 93 individuals, of which the pools included 51 (55 %). For Fd09, each pool comprised 20 individuals, with the Low pool defined as 1–6 oocysts and the High pool as >29 oocysts. The complete infection had 102 individuals, so the pools comprised 59 % of the entire infection. DNA concentrations were determined by the PicoGreen method [36]. DNAs of individual mosquitoes were combined at equal molarity to obtain a total of 700 ng per phenotype pool. The pooled DNAs were sequenced on the Illumina platform to an average depth of 40× per pool, or ~2× per mosquito. Illumina reads were aligned to the AgamP3 genome [20] using Bowtie version 0.12.7 [37]. Reads with low mapping quality (MQ < 40) were removed, and allele frequencies were called using samtools mpileup [38]. Apparent low-frequency variants could be either true low-frequency alleles or sequencing errors. They were not resolved, because they are irrelevant to a windowed analysis of pooled samples. Pooled heterozygosity was calculated in sliding windows (10 kb windows, 1 kb steps) for each phenotype pool individually and for the whole founder colony combined, using the Hp metric proposed by Qanbari et al. [39]. Relative diversity (HpR) was calculated as the proportion of heterozygosity in a phenotype pool relative to total heterozygosity within the whole founder colony, after normalising for overall read depth in each pool. Standard deviations of HpR values (SHpR) were used to identify regions with over-represented haplotypes relative to the whole founder colony. High-SHpR regions separated by ≤5 Mb were combined into a single locus. To establish significance thresholds, random resampling with 1000 permutations was performed for each window. 
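The windowed diversity statistics can be sketched under several assumptions:

- Hp uses the pooled-heterozygosity form 2·ΣnMAJ·ΣnMIN/(ΣnMAJ + ΣnMIN)², with major and minor alleles defined per SNP within each pool.
- HpR is the ratio of pool Hp to whole-colony Hp in the same window.
- SHpR expresses HpR in standard-deviation units (z-scores) across windows. This is one reading of "standard deviation of HpR values", not a confirmed definition.

The read-depth normalisation and permutation steps are not reproduced. The function names and the input layout of (position, allele-1 reads, allele-2 reads) are hypothetical.

```python
import math
from statistics import mean, pstdev


def hp(snps):
    """Pooled heterozygosity for one window.

    snps: list of (position, count_allele1, count_allele2) read counts.
    Hp = 2 * sum(n_MAJ) * sum(n_MIN) / (sum(n_MAJ) + sum(n_MIN))**2,
    with major/minor alleles defined per SNP within this pool.
    """
    n_maj = sum(max(a, b) for _, a, b in snps)
    n_min = sum(min(a, b) for _, a, b in snps)
    total = n_maj + n_min
    return 2.0 * n_maj * n_min / total ** 2 if total else math.nan


def sliding_hp(snps, chrom_len, win=10_000, step=1_000):
    """Hp in windows of `win` bp moved in `step` bp increments; returns [(start, Hp)].

    Naive O(windows x SNPs) scan, adequate for a sketch.
    """
    return [
        (start, hp([s for s in snps if start <= s[0] < start + win]))
        for start in range(0, max(chrom_len - win, 0) + 1, step)
    ]


def relative_hp(pool_windows, colony_windows):
    """HpR = Hp(pool) / Hp(whole colony) per shared window; skips undefined windows."""
    colony = dict(colony_windows)
    out = []
    for start, h in pool_windows:
        c = colony.get(start, math.nan)
        if not (math.isnan(h) or math.isnan(c) or c == 0):
            out.append((start, h / c))
    return out


def standardize(hpr_windows):
    """Express HpR in standard-deviation units (z-scores) across windows (assumed SHpR)."""
    values = [v for _, v in hpr_windows]
    m, sd = mean(values), pstdev(values)
    return [(start, (v - m) / sd) for start, v in hpr_windows]
```

Windows where a phenotype pool carries over-represented haplotypes show reduced HpR, so they appear as strongly negative z-scores in this formulation.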
SHpR values were then segmented using the Bioconductor package fastseg [40] to identify clearly differing regions. Regions below 1e−4 probability according to the permutation analysis were removed. Three regions were selected for subsequent fine mapping: two from Fd09 and one from Fd03. Loci identified from pooled sequences during the genome-wide mapping phase were filtered on the basis of between-pool differences in the proportion of reads showing the alternate allele, used here as a proxy for minor allele frequency. The SNPs with the greatest read-count differences between phenotype pools were used to design SNP plexes for genotyping on the Sequenom MassARRAY platform. A single plex (20–25 individual SNP assays) was designed for each locus. Individual DNAs from the same experimental infection that had been pool-sequenced were typed with the SNPs specific to that founder colony. These included the individuals used to generate the pools and additional samples that had not contributed to the phenotype pools. For each of Fd03 and Fd09, 42 individuals that had not been part of the original extreme pools were SNP genotyped. These individuals had either zero oocysts or phenotypes intermediate between the Low and High pools. In addition, a second, completely independent experimental infection of the same founder colony, which had not been subjected to pooled sequencing, was genotyped in the same way. This independent infection of Fd03 comprised 41 individual mosquitoes with infection levels ranging from 0 to 23 oocysts. The correlation between allele frequencies derived from pooled sequencing and from individual Sequenom genotyping is presented in Additional file 6. Individual mosquitoes were categorized into binary phenotypes for infection prevalence (uninfected/infected) and infection intensity (low infected/high infected) using the same oocyst cutoffs employed for pooling. 
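The SNP selection step, which ranks candidates by the between-pool difference in the proportion of alternate-allele reads, can be sketched as below. The function name and the `min_depth` cutoff are hypothetical. `top_n=25` only mirrors the upper end of the stated 20–25 assay plex size. The quality filters actually applied are not reproduced.

```python
def rank_snps(pool_a, pool_b, min_depth=20, top_n=25):
    """Rank SNPs by the absolute difference in alternate-allele read proportion.

    pool_a, pool_b: dict mapping SNP id -> (ref_reads, alt_reads) in each phenotype pool.
    min_depth is an illustrative cutoff, not a value from the study.
    """
    scored = []
    for snp in pool_a.keys() & pool_b.keys():
        ref_a, alt_a = pool_a[snp]
        ref_b, alt_b = pool_b[snp]
        depth_a, depth_b = ref_a + alt_a, ref_b + alt_b
        if depth_a < min_depth or depth_b < min_depth:
            continue  # too few reads to estimate the allele proportion reliably
        diff = abs(alt_a / depth_a - alt_b / depth_b)
        scored.append((diff, snp))
    scored.sort(key=lambda t: (-t[0], t[1]))  # largest difference first; ties broken by id
    return [snp for _, snp in scored[:top_n]]
```

For example, a SNP with 25 % alternate reads in the Zero pool and 87.5 % in the High pool (difference 0.625) ranks above one with 50 % versus 45 % (difference 0.05).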
Association with phenotype was tested by logistic regression using PLINK [41], and all statistics were corrected for multiple testing. Replicate infections were tested for significance both individually and across replicates. Pooled sequencing is a relatively recent variant of whole-genome sequencing. Selecting candidate SNPs from pooled data for individual genotyping assays is challenging, which may limit how efficiently pooled-sequence loci can be replicated by individual genotyping [42]. Putative variants were filtered for sequencing quality. Variant consequences were then called for both colonies using the Ensembl Variant Effect Predictor (v2.3) [43] against VectorBase genebuild AgamP3.5 [44], with Ensembl API 65.3 (Dec 2011). Enrichment for Gene Ontology terms was calculated by Fisher's exact test using custom R scripts and topGO from the Bioconductor suite [45]. dN:dS ratios were assessed by locus counting using custom R scripts. Because codon substitution data are not available for this species, multiple substitutions and codon bias could not be taken into account in the dN:dS results. Molecular karyotyping of the 2Rb inversion in Fd09 was carried out by a published method [46]. The results were confirmed against a panel of individuals previously karyotyped by polytene chromosome analysis (not shown). Double-stranded RNAs were synthesized from PCR amplicons using the T7 MEGAscript Kit (Life Technologies) as described previously [24]. Primers are listed in Additional file 7. For each targeted gene, 500 ng of dsRNA (in a volume of no more than 207 nl) was injected into the thorax of cold-anesthetized 1-day-old A. gambiae females using a nanoinjector (Nanoject II; Drummond). Gene silencing efficiency was monitored 4 days after dsRNA injection as follows. cDNA was synthesized using M-MLV reverse transcriptase with random hexamers (Invitrogen). In each case, 1 μg of total RNA was used in triplicate reactions. 
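To show what a per-SNP logistic regression test involves, the sketch below fits logit P(case) = b0 + b1·g, with the genotype g coded additively as 0, 1 or 2 copies of one allele. It uses Newton-Raphson and reports the odds ratio and a two-sided Wald p-value. This is a stand-alone illustration, not PLINK's implementation; covariates and multiple-testing correction are omitted. The function name is hypothetical. With completely separated data the maximum likelihood estimate does not exist, and this sketch will not converge.

```python
import math


def logistic_additive(genotypes, phenotypes, iters=25):
    """Fit logit P(y=1) = b0 + b1*g by Newton-Raphson; return (b1, odds_ratio, wald_p).

    genotypes: allele counts (0, 1, 2); phenotypes: binary outcomes (0/1),
    e.g. uninfected/infected. Hypothetical helper, not the PLINK code path.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for g, y in zip(genotypes, phenotypes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            g0 += y - p            # score for b0
            g1 += (y - p) * g      # score for b1
            h00 += w               # Fisher information entries
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^-1 * score, using the explicit 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    se_b1 = math.sqrt(h00 / det)   # Var(b1) = [I^-1]_11 at (near) convergence
    z = b1 / se_b1
    wald_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b1, math.exp(b1), wald_p
```

With four mosquitoes per genotype class and infection proportions of 0.25, 0.5 and 0.75, the observed logits are exactly linear in g. The fit therefore returns an odds ratio of 3 per allele copy.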
The triplicates were pooled, and the mixture was used as template for PCR analysis. Primers used for PCR verification of gene silencing are listed in Additional file 7, and verification of gene silencing is shown in Additional file 8. Midgut oocysts were counted as described above and analysed for the same two phenotypes, infection prevalence and oocyst intensity. Oocyst infection values for gene silencing experiments were calculated from replicates of ≥30 dissected mosquitoes. All replicates per condition were analysed for oocyst infection prevalence. In contrast, the analysis of oocyst intensity considers only mosquitoes carrying ≥1 oocyst. Therefore, a threshold of ≥30 % oocyst infection prevalence per replicate was imposed for the analysis of differences in oocyst intensity [2, 32]. Infection prevalence was compared using the chi-square test. Oocyst intensity (excluding mosquitoes with zero oocysts) was compared using the non-parametric Wilcoxon-Mann-Whitney test. At least two independent replicate infections were performed per condition, and replicates were analysed independently using the tests described above. If at least one replicate met the significance criterion of p ≤ 0.05, a third replicate was performed. The p-values from the independent tests were combined using Fisher's meta-analytical method [47], and this combined p-value is reported here. The threshold for combined significance was set at p = 0.01.
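Fisher's method combines k independent p-values through the statistic X = −2 Σ ln pᵢ. Under the null hypothesis, X follows a chi-square distribution with 2k degrees of freedom. Because the degrees of freedom are even, the upper tail has the closed form exp(−X/2) · Σ_{i=0}^{k−1} (X/2)^i / i!. The minimal sketch below uses that form so it needs only the standard library. The function name is hypothetical, and all p-values are assumed to be strictly positive.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0; for even df the
    survival function is exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)**i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total
```

As a usage example, two replicates each at p = 0.05 combine to roughly 0.0175. That value meets the per-replicate criterion but not a combined threshold of p = 0.01, so `fisher_combined_p([0.05, 0.05]) <= 0.01` evaluates to False.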