Short- and long-read metagenomics of urban and rural South African gut microbiomes reveal a transitional composition and undescribed taxa


Study Justification:
– The study aims to fill the gap in gut microbiome research on transitioning populations, which fall between the well-studied populations of high-income countries and the less-studied non-urban agriculturalist and hunter-gatherer societies.
– Understanding the gut microbiota in these populations is important for comprehending the relationship between the microbiome and health and disease in the majority of the world’s population.
Study Highlights:
– The study analyzed stool samples from adult females living in rural Bushbuckridge and urban Soweto in South Africa.
– The gut microbiomes of these populations were found to be taxonomically intermediate between those of individuals living in high-income countries and traditional communities.
– The study revealed that reference collections used to characterize microbiomes are incomplete for individuals living outside high-income countries, leading to artificially low diversity measurements.
– The study generated complete genomes of previously undescribed taxa, including Treponema, Lentisphaerae, and Succinatimonas.
– The findings suggest that the gut microbiome of South Africans does not conform to a simple “western-nonwestern” axis and contains previously unknown microbial diversity.
Recommendations:
– Conduct further research on the gut microbiomes of transitioning populations, as well as non-urban agriculturalist and hunter-gatherer societies, to gain a more comprehensive understanding of the global diversity of gut microbiota.
– Expand reference collections to include microbial genomes from individuals living in low- and middle-income countries to improve the characterization of microbiomes in these populations.
– Investigate the potential health implications of the transitional gut microbiome composition observed in South African populations.
Key Role Players:
– Researchers and scientists specializing in microbiology, genomics, and public health.
– Funding agencies and organizations supporting research on the gut microbiome and global health.
– Policy makers and government officials responsible for public health initiatives and interventions.
Cost Items for Planning Recommendations:
– Research funding for sample collection, DNA extraction, sequencing, and data analysis.
– Laboratory equipment and supplies for sample processing and analysis.
– Personnel costs for researchers, technicians, and support staff.
– Travel and logistics for fieldwork and sample collection.
– Publication and dissemination of research findings.
– Collaboration and partnership costs with local institutions and communities.

The strength of evidence for this abstract is 8 out of 10.
The evidence in the abstract is strong because it is based on a comprehensive study that includes both short- and long-read metagenomics. The study analyzes stool samples from 169 participants (118 in rural Bushbuckridge and 51 in urban Soweto). The research design includes human subjects research approval and informed consent. The methods used for DNA extraction, sequencing, and data analysis are well-documented. The study identifies transitional composition and undescribed taxa in the gut microbiomes of South Africans. To improve the evidence, the abstract could provide more details on the statistical analyses performed and the specific findings related to health and disease.

Human gut microbiome research focuses on populations living in high-income countries and, to a lesser extent, on non-urban agriculturalist and hunter-gatherer societies. The scarcity of research between these extremes limits our understanding of how the gut microbiota relates to health and disease in the majority of the world’s population. Here, we evaluate gut microbiome composition in transitioning South African populations using short- and long-read sequencing. We analyze stool from adult females living in rural Bushbuckridge (n = 118) or urban Soweto (n = 51) and find that these microbiomes are taxonomically intermediate between those of individuals living in high-income countries and traditional communities. We demonstrate that reference collections are incomplete for characterizing microbiomes of individuals living outside high-income countries, yielding artificially low beta diversity measurements, and generate complete genomes of undescribed taxa, including Treponema, Lentisphaerae, and Succinatimonas. Our results suggest that the gut microbiome of South Africans does not conform to a simple “western-nonwestern” axis and contains undescribed microbial diversity.

Stool samples were collected from women aged 40–72 years in Soweto, South Africa and Bushbuckridge Municipality, South Africa. Participants were recruited on the basis of participation in AWI-Gen [56], a previous study in which genotype and extensive health and lifestyle survey data were collected. Human subjects research approval was obtained (Stanford IRB 43069, University of the Witwatersrand Human Research Ethics Committee M160121, Mpumalanga Provincial Health Research Committee MP_2017RP22_851) and informed consent was obtained from participants for all samples collected. Participants were not compensated for participation. Stool samples were collected and preserved in OmniGene Gut OMR-200 collection kits (DNA Genotek). Samples were frozen within 60 days of collection as per manufacturer’s instructions, followed by long-term storage at −80 °C. As the enrollment criteria for our study included previous participation in a larger human genomics project [56], we had access to self-reported ethnicity for each participant (BaPedi, Ndebele, Sotho, Tsonga, Tswana, Venda, Xhosa, Zulu, Other, or Unknown). Samples from participants who tested HIV-positive or who did not consent to an HIV test were not analyzed.
DNA was extracted from stool samples using the QIAamp PowerFecal DNA Kit (QIAGEN Cat. No. 12830) according to the manufacturer’s instructions except for the lysis step, in which samples were lysed using the TissueLyser LT (QIAGEN Cat. No. 85600) (30 s oscillations/3 min at 30 Hz). DNA concentration of all DNA samples was measured using Qubit Fluorometric Quantitation (DS DNA High-Sensitivity Kit, ThermoFisher Cat. No. Q32851). DNA sequencing libraries were prepared using the Nextera XT DNA Library Prep Kit (Illumina Cat. No. FC-131-1096).
Final library concentration was measured using Qubit Fluorometric Quantitation and library size distributions were analyzed with the Bioanalyzer 2100 (Agilent G2939BA). Libraries were multiplexed and 150 bp paired-end reads were generated on the HiSeq 4000 platform (Illumina).
Samples with greater than ~300 ng remaining mass and a peak fragment length of greater than 19,000 bp (with minimal mass under 4000 bp) as determined by a TapeStation 2200 (Agilent G2964AA) were selected for nanopore sequencing. Nanopore sequencing libraries were prepared using the 1D Genomic DNA by Ligation protocol (Oxford Nanopore Technologies SQK-LSK109) following standard instructions. Each library was sequenced with a full FLO-MIN106D R9 Version Rev D flow cell on a MinION sequencer for at least 60 h.
Literature review criteria based on Brewster et al. [4] were employed: PubMed, EMBASE, SCOPUS, and Web of Science were queried for observational and interventional research involving the human gut microbiome through January 2021. Terms including “gut microbiome” and “gut microbiota” and names of each of the 54 African countries were included in the search. Primary reports on the gut microbiome in African children and/or adults, utilizing either 16S rRNA or shotgun metagenomic sequencing and written in English, were included. Abstracts, secondary reports, poster presentations, reviews or editorials, and in vivo and in vitro studies were excluded. The list of relevant articles yielded by this search strategy was manually reviewed.
Stool metagenomic sequencing reads were trimmed using TrimGalore v0.6.5 [83] with a minimum quality score of 30 for trimming (-q 30) and minimum read length of 60 (--length 60). Trimmed reads were deduplicated to remove PCR and optical duplicates using htstream SuperDeduper v1.2.0 with default parameters. Reads aligning to the human genome (hg19) were removed using BWA v0.7.17-r1188 [84].
Taxonomy profiles were created with Kraken v2.0.9-beta with default parameters [85] and (1) a comprehensive custom reference database containing all bacterial and archaeal genomes in GenBank assembled to “complete genome,” “chromosome,” or “scaffold” quality as of January 2020, and (2) the pre-built Struo [50] GTDB release 95 database containing one genome per species. Bracken v2.2.0 was then used to re-estimate abundance at each taxonomic rank [86]. MetaPhlAn3 [52] taxonomy profiles were also generated.
Published data from additional adult populations were downloaded from the NCBI Sequence Read Archive or European Nucleotide Archive (Supplementary Table 4) and preprocessed and taxonomically classified as described above. The study by Backhed et al. sampled both mothers and infants: only the maternal samples were retained in this study. For datasets containing longitudinal samples from the same individual, one unique sample per individual was chosen (the first sample from each individual was chosen from the United States Human Microbiome Project cohort).
K-mer sketches were computed using sourmash v2.0.0 [60]. Low abundance k-mers were trimmed using the “trim-low-abund.py” script from the khmer package v3.0.0 [87] with a k-mer abundance cutoff of 3 (-C 3) and trimming coverage of 18 (-Z 18). Signatures were computed for each sample using the command “sourmash compute” with a compression ratio of 1000 (--scaled 1000) and k-mer lengths of 21, 31, and 51 (-k 21,31,51). Two signatures were computed for each sample: one signature tracking k-mer abundance (--track-abundance flag) for angular distance comparisons, and one without this flag for Jaccard distance comparisons. Signatures at each length of k were compared using “sourmash compare” with default parameters and the correct length of k specified with the -k flag.
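The two k-mer comparisons described above can be illustrated with a minimal sketch. It is not the sourmash implementation: it works on exact k-mer sets from toy sequences and omits the MinHash-style subsampling implied by the scaled parameter. The function names and the scaling of the angular distance to the range 0 to 1 are assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_distance(a, b):
    """Presence/absence comparison: 1 - |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def angular_distance(a, b):
    """Abundance-aware comparison: angle between count vectors.

    Assumption: the angle is divided by pi/2 so that non-negative
    count vectors give a value between 0 (same direction) and 1
    (no shared k-mers).
    """
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    cosine = min(1.0, dot / (norm_a * norm_b))  # guard against rounding above 1
    return 2.0 * math.acos(cosine) / math.pi


# Toy sequences standing in for two samples (hypothetical data).
sample_x = kmers("ACGTACGT", 3)
sample_y = kmers("ACGTTTTT", 3)
print(jaccard_distance(sample_x, sample_y))
print(angular_distance(sample_x, sample_y))
```

The Jaccard distance ignores how often each k-mer occurs, whereas the angular distance weights shared k-mers by abundance, which is why two separate signatures per sample are needed.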
Unassembled metagenomic reads were functionally profiled using ShortBRED [88] v0.9.3 with a pre-built antibiotic resistance database based on the Comprehensive Antibiotic Resistance Database [89]. Features were pre-filtered for >10% prevalence and statistical analysis was performed using MaAsLin v2 [90] using the compound Poisson linear model (CPLM) and total sum scaling normalization with “site” as a fixed effect.
Pangenomes were calculated with PanPhlAn v3.1 [52] using parameters for increased sensitivity recommended by the authors of the tool: “--min_coverage 1 --left_max 1.70 --right_min 0.30”.
MetaCyc pathways were profiled with HUMAnN v3.0.0 [52] with default parameters, using the mpa_v30_CHOCOPhlAn_201901 database. Forward and reverse reads were concatenated into one file per sample prior to processing. Pathway abundances were normalized to copies per million and statistical analysis was performed using MaAsLin v2 using the CPLM and total sum scaling normalization with “site” as a fixed effect.
Short-read metagenomic data were assembled with SPAdes v3.15 [91] and binned into draft genomes using a publicly available workflow (https://github.com/bhattlab/bhattlab_workflows/blob/master/binning/bin_das_tool_manysamp.snakefile, commit version bbe6511 as of Apr 20, 2021). Briefly, short reads were aligned to assembled contigs with BWA v0.7.17 [84] and contigs were subsequently binned into draft genomes with MetaBAT v2.15 [92], CONCOCT v1.1.0 [93], and MaxBin v2.2.7 [94]. Default parameters were used for each binner, with the following exceptions: For the jgi_summarize_bam_contig_depths step of MetaBAT, minimum contig length was set at 1000 bp (--minContigLength 1000), minimum contig depth of coverage of 1 (--minContigDepth 1), and a minimum end-to-end percent identity of reads of 50 (--percentIdentity 50). Bins were aggregated and refined with DASTool v1.1.1 [95].
Bins were evaluated for size, contiguity, completeness, and contamination with QUAST v5.0.2 [96], CheckM v1.0.13 [97], Prokka v1.14.6 [98], Aragorn v1.2.38 [99], and Barrnap v0.9 (https://github.com/tseemann/barrnap/). We referred to published guidelines to designate genome quality [66]. Individual contigs from all assemblies were assigned taxonomic classifications with Kraken v2.0.9 [66,85].
To create de-replicated genome collections, genomes with completeness greater than 75% and contamination less than 10% (as evaluated by CheckM) were de-replicated using dRep v3.2.0 [100] with ANI threshold to form secondary clusters (-sa) at 0.99 (strain-level) or 0.95 (species-level). For comparison to UHGG species representatives, secondary ANI was set to 0.95. dRep chooses the genome with the highest score as the cluster representative according to the following formula:
dRep score = A × Completeness − B × Contamination + C × (Contamination × (Strain heterogeneity/100)) + D × log(N50) + E × log(size) + F × (centrality − secondary ani)
A through F are values which can be tuned by the user to change the relative importance of each parameter in choosing representative genomes. Default parameters (A = 1, B = 5, C = 1, D = 0.5, E = 0, F = 1) were used herein.
Long-read data were assembled with Lathe v1 [65]. Briefly, Lathe implements basecalling with Guppy v2.3.5, assembly with Flye v2.4.2 [101], and short-read polishing with Pilon v1.23 [102]. Contigs greater than 1000 bp were subsequently binned into draft genomes with MetaBAT v2.13 using minimum contig depth coverage of 1, minimum end-to-end percent identity of reads of 50, and otherwise using default parameters, then classified, and de-replicated as described above. Additional long-read polishing was performed using four iterations of polishing with Racon v1.4.10 [103] and long-read alignment using minimap2 v2.17-r941 [104], followed by one round of polishing with Medaka v0.11.5 (https://github.com/nanoporetech/medaka).
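As a worked illustration of the quality filter and the dRep scoring formula described earlier in this section, the sketch below filters a toy cluster of genomes and picks the highest-scoring one. It is not dRep's code. The genome values are hypothetical, the logarithm is assumed to be base 10, and centrality is assumed to be a genome's mean ANI to the other members of its cluster on a 0 to 1 scale.

```python
import math
from dataclasses import dataclass


@dataclass
class Genome:
    name: str
    completeness: float          # percent (CheckM-style estimate)
    contamination: float         # percent
    strain_heterogeneity: float  # percent
    n50: int                     # bp
    size: int                    # bp
    centrality: float            # assumed: mean ANI to cluster members (0-1)


def passes_quality_filter(g, min_completeness=75.0, max_contamination=10.0):
    """Keep genomes with completeness > 75% and contamination < 10%."""
    return g.completeness > min_completeness and g.contamination < max_contamination


def drep_score(g, secondary_ani=0.95, A=1.0, B=5.0, C=1.0, D=0.5, E=0.0, F=1.0):
    """Score from the formula in the text, with the stated default weights.

    Assumption: log means log10.
    """
    return (
        A * g.completeness
        - B * g.contamination
        + C * (g.contamination * (g.strain_heterogeneity / 100.0))
        + D * math.log10(g.n50)
        + E * math.log10(g.size)
        + F * (g.centrality - secondary_ani)
    )


def pick_representative(cluster, secondary_ani=0.95):
    """Return the highest-scoring genome that passes the quality filter."""
    candidates = [g for g in cluster if passes_quality_filter(g)]
    return max(candidates, key=lambda g: drep_score(g, secondary_ani), default=None)


# Hypothetical cluster of three bins; bin_b fails the contamination cutoff.
cluster = [
    Genome("bin_a", 95.0, 2.0, 0.0, 100_000, 3_000_000, 0.98),
    Genome("bin_b", 99.0, 12.0, 0.0, 500_000, 3_100_000, 0.99),
    Genome("bin_c", 80.0, 1.0, 50.0, 10_000, 2_000_000, 0.97),
]
best = pick_representative(cluster)
print(best.name, round(drep_score(best), 2))
```

With the default weights, contamination is penalized five times more heavily than completeness is rewarded, and genome size (E = 0) does not influence the choice at all.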
Single-contig genomes were analyzed for GC skew using SkewIT v1 [105]. Genomes of interest were plotted with the DNAPlotter GUI v18.1.0 [106]. Draft genomes were additionally classified with GTDBtk v1.4.1 (classify_wf) [107] using release 95 reference data.
Direct comparisons between nMAGs and corresponding MAGs were performed by de-replicating high- and medium-quality nMAGs with MAGs assembled from the same sample. MAGs sharing at least 99% ANI with an nMAG were aligned to the nMAG regions using nucmer v3.1 and uncovered regions of the nMAG were annotated with Prokka v1.14.6, VIBRANT v1.2.1 [108], and ResFams v1.2 [109].
Phylogenetic trees for all de-replicated short- and long-read MAGs were constructed with GTDBtk v1.4.1 and visualized with iTOL v6 [110]. To construct phylogenetic trees for taxa of interest, reference 16S rRNA sequences were downloaded from the Ribosomal Database Project (Release 11, update 5, September 30, 2016) [111] and 16S rRNA sequences were identified from nanopore genome assemblies using Barrnap v0.9 (https://github.com/tseemann/barrnap/). Sequences were aligned with MUSCLE v3.8.1551 [112] with default parameters. Maximum-likelihood phylogenetic trees were constructed from the alignments with FastTree v2.1.10 [112,113] with default settings (Jukes-Cantor + CAT model). Support values for branch splits were calculated using the Shimodaira-Hasegawa test with 1000 resamples (default). Trees were visualized with FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
Statistical analyses were performed using R v4.0.2 [114] with packages MASS v7.3-53 [115], stats v4.0.2 [114], ggsignif v0.6.0 [116], and ggpubr v0.4.0 [117]. Alpha and beta diversity were calculated using the vegan package v2.6.0 [118]. Two-sided Wilcoxon rank-sum tests were used to compare alpha and beta diversity between cohorts. Count data were rarefied and normalized via cumulative sum scaling and log2 transformation [119] prior to MDS.
Data separation in MDS was assessed via PERMANOVA (permutation test with pseudo F ratios) using the adonis function from the vegan package. Differential microbial features between individuals living in Soweto and Bushbuckridge were identified from unnormalized count data output from Kraken 2 classification and Bracken abundance re-estimation (filtered for 20% prevalence and at least 500 sequencing reads per sample) using DESeq2 with the formula “~site” [120].
Plots were generated in R using the following packages: cowplot v1.0.0 [121], DESeq2 v1.28.0 [120], genefilter v1.70.0 [122], ggplot2 v3.3.2 [123], ggpubr v0.4.0, ggrepel v0.8.2 [124], ggsignif v0.6.0, gtools v3.8.2 [125], harrietr v0.2.3 [126], MASS v7.3-53, reshape2 v1.4.4 [127], tidyverse v1.3.0 [128], and vegan v2.6.0. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
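To make the alpha and beta diversity calculations mentioned above more concrete, here is a minimal sketch. The text does not state which indices were computed with vegan, so the Shannon index (alpha) and Bray-Curtis dissimilarity (beta) used here are assumptions chosen as common examples, and the count vectors are hypothetical.

```python
import math


def shannon_index(counts):
    """Alpha diversity: -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Beta diversity between two samples with taxa in the same order."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(u) + sum(v)
    return numerator / denominator if denominator else 0.0


# Hypothetical taxon counts for two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
print(shannon_index(even_sample), shannon_index(skewed_sample))
print(bray_curtis(even_sample, skewed_sample))
```

A taxon that is absent from the reference database contributes nothing to either count vector, so two samples that differ mainly in unclassified organisms can look more similar than they are. This is the mechanism behind the artificially low beta diversity noted in the abstract.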

Based on the provided information, it is not clear how the described research study relates to innovations for improving access to maternal health. The information provided focuses on the collection and analysis of gut microbiome data from South African populations. To provide recommendations for innovations to improve access to maternal health, it would be helpful to have more specific information about the challenges or areas of improvement needed in maternal health access.
AI Innovations Description
The provided description is a scientific research paper that focuses on analyzing the gut microbiome composition of transitioning South African populations. While the information provided is detailed and specific to the research study, it does not directly address the request for a recommendation to improve access to maternal health.

To develop an innovation to improve access to maternal health, it is important to consider the specific challenges and barriers that exist in the context of maternal health in the target population. Some potential recommendations to improve access to maternal health could include:

1. Strengthening healthcare infrastructure: This could involve improving the availability and quality of healthcare facilities, ensuring the presence of skilled healthcare providers, and enhancing the capacity to provide comprehensive maternal health services.

2. Increasing awareness and education: Implementing awareness campaigns and educational programs to inform women and their families about the importance of maternal health, the available services, and how to access them. This can help overcome cultural and social barriers that may prevent women from seeking care.

3. Enhancing transportation and logistics: Improving transportation networks and systems to ensure that pregnant women can easily access healthcare facilities. This could involve providing transportation vouchers or subsidies, establishing mobile clinics, or utilizing telemedicine to reach remote areas.

4. Addressing financial barriers: Implementing strategies to reduce financial barriers to maternal healthcare, such as providing free or subsidized services, health insurance coverage, or conditional cash transfer programs for pregnant women.

5. Empowering women and communities: Promoting women’s empowerment and community engagement in maternal health through initiatives such as community health workers, peer support groups, and community-based education programs.

6. Strengthening data collection and monitoring: Developing robust data collection systems to monitor maternal health indicators, identify gaps in access, and inform evidence-based decision-making for targeted interventions.

It is important to note that the specific recommendations may vary depending on the local context and the unique challenges faced by the population.
AI Innovations Methodology
The provided text is a detailed description of a scientific study on the gut microbiome composition of South African populations. It includes information about the study design, data collection, and analysis methods. However, it does not directly address the request for innovations to improve access to maternal health or a methodology to simulate the impact of these innovations.

To provide recommendations for improving access to maternal health, it would be necessary to review existing literature, consult with experts in the field, and consider the specific context and challenges faced in the target population. Some potential innovations that could improve access to maternal health include:

1. Telemedicine and mobile health technologies: These technologies can enable remote consultations, monitoring, and access to healthcare services for pregnant women in rural or underserved areas.

2. Community health worker programs: Training and deploying community health workers who can provide basic maternal healthcare services, education, and support to women in their communities.

3. Mobile clinics and outreach programs: Bringing healthcare services, including prenatal care and maternal health services, directly to remote or underserved areas through mobile clinics or outreach programs.

4. Maternal health education and awareness campaigns: Developing and implementing educational programs and campaigns to raise awareness about maternal health issues, promote healthy behaviors, and provide information on available healthcare services.

To simulate the impact of these recommendations on improving access to maternal health, a methodology could involve the following steps:

1. Define the target population: Identify the specific population or region for which access to maternal health needs to be improved.

2. Collect baseline data: Gather relevant data on the current state of maternal health in the target population, including indicators such as maternal mortality rates, access to prenatal care, and availability of healthcare facilities.

3. Define simulation parameters: Determine the specific variables and parameters that will be used to simulate the impact of the recommendations. This could include factors such as the number of community health workers deployed, the coverage of mobile clinics, or the reach of educational campaigns.

4. Develop a simulation model: Create a mathematical or computational model that incorporates the baseline data and simulation parameters. This model should simulate the impact of the recommendations on maternal health outcomes, such as changes in maternal mortality rates or improvements in access to prenatal care.

5. Run simulations: Use the simulation model to run multiple scenarios, varying the parameters to assess the potential impact of different combinations of recommendations on maternal health outcomes.

6. Analyze results: Analyze the simulation results to evaluate the effectiveness of the recommendations in improving access to maternal health. This could involve comparing different scenarios, identifying key factors that contribute to positive outcomes, and assessing the cost-effectiveness of the interventions.

7. Refine and iterate: Based on the analysis of the simulation results, refine the recommendations and simulation model as needed. Iterate the process to further optimize the interventions and improve the accuracy of the simulations.

It’s important to note that the specific methodology for simulating the impact of recommendations on improving access to maternal health may vary depending on the available data, resources, and expertise.
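The simulation steps listed above can be sketched as a small deterministic scenario model. Everything in it is hypothetical: the baseline access level, the per-intervention effect sizes, and the assumption that each intervention closes a share of the remaining access gap are illustrative placeholders, not estimates from this study or any dataset.

```python
from itertools import product


def simulate_access(baseline, chw_per_10k, clinic_coverage,
                    chw_effect=0.02, clinic_effect=0.3):
    """Project the fraction of women with access to prenatal care.

    baseline        -- current access fraction (step 2: baseline data)
    chw_per_10k     -- community health workers per 10,000 people (step 3)
    clinic_coverage -- fraction of the region reached by mobile clinics (step 3)
    The effect sizes are hypothetical; each intervention is assumed to
    close part of whatever access gap remains (step 4: the model).
    """
    gap = 1.0 - baseline
    closed_by_chw = 1.0 - (1.0 - chw_effect) ** chw_per_10k
    closed_by_clinics = clinic_effect * clinic_coverage
    remaining_gap = gap * (1.0 - closed_by_chw) * (1.0 - closed_by_clinics)
    return 1.0 - remaining_gap


def run_scenarios(baseline, chw_levels, clinic_levels):
    """Step 5: evaluate every combination of intervention levels."""
    return {
        (chw, clinic): simulate_access(baseline, chw, clinic)
        for chw, clinic in product(chw_levels, clinic_levels)
    }


# Step 6: compare scenarios against the hypothetical baseline of 60% access.
results = run_scenarios(0.6, chw_levels=[0, 10, 20], clinic_levels=[0.0, 0.5, 1.0])
best_scenario = max(results, key=results.get)
for scenario, access in sorted(results.items()):
    print(scenario, round(access, 3))
```

In practice the effect sizes would be estimated from baseline data and refined over repeated iterations (step 7), and a cost term per scenario would be needed before drawing any cost-effectiveness conclusion.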
