Assessment of gestational age (GA) is key to providing optimal care during pregnancy. However, its accurate determination remains challenging in low- and middle-income countries, where access to obstetric ultrasound is limited. Hence, there is an urgent need to develop clinical approaches that allow accurate and inexpensive estimation of GA. We investigated the ability of urinary metabolites to predict GA at the time of collection in a diverse multi-site cohort of healthy and pathological pregnancies (n = 99) using a broad-spectrum liquid chromatography coupled with mass spectrometry (LC–MS) platform. Our approach detected a myriad of steroid hormones and their derivatives, including estrogens, progesterones, corticosteroids, and androgens, which were associated with pregnancy progression. We developed a restricted model that predicted GA with high accuracy using three metabolites (rho = 0.87, RMSE = 1.58 weeks) and validated it in an independent cohort (n = 20). The predictions were more robust in pregnancies that went to term than in pregnancies that ended prematurely. Overall, we demonstrated the feasibility of implementing urine metabolomics analysis in large-scale multi-site studies and report a predictive model of GA with potential clinical value.
The study involves five cohorts from Asia and Africa as part of an international consortium, ‘Multi-omics for Mothers and Infants’ (MOMI). The three cohort sites from the Alliance for Maternal and Neonatal Health Improvement (AMANHI) biorepository study are located in Sylhet (Bangladesh), Karachi (Pakistan), and Pemba (Tanzania). The two cohorts from the Global Alliance to Prevent Prematurity and Stillbirth (GAPPS) consortium are located in Matlab (Bangladesh, Preterm and Stillbirth Study [PreSSMat]) and Lusaka (Zambia, Zambian Preterm Birth Prevention Study [ZAPPS]). The primary objective of the AMANHI study is to establish a biorepository for the discovery of biomarkers of adverse pregnancy-related outcomes [23]. PreSSMat is a prospective cohort study designed to assess biological, environmental, and social determinants of adverse pregnancy outcomes [24], while ZAPPS is a prospective cohort study and biorepository designed to characterize the factors associated with preterm birth and outcomes in Zambia [25]. The AMANHI study received ethical approval from the World Health Organization (WHO) Ethics Review Committee as well as local and institutional ethics committees for all three sites: icddr,b and Johns Hopkins University for Bangladesh, Aga Khan University for Pakistan, and Zanzibar Medical Research and Ethics Committee (ZAMREC) and Johns Hopkins University for Tanzania. The ZAPPS cohort was approved by relevant authorities at both the University of Zambia School of Medicine and the University of North Carolina at Chapel Hill. PreSSMat received approval from the Research and Ethical Review Committees of the International Centre for Diarrhoeal Disease Research in Bangladesh (PR-14067). Informed consent for participation in the original study and for future research use of specimens was obtained from each woman prior to enrollment. The study was also approved by the Stanford Institutional Review Board (IRB 21956).
All experiments were performed in accordance with relevant guidelines and regulations. Ninety-nine pregnant women were selected for the study, with 20 participants from each site, half delivering preterm (< 37 weeks’ GA) and half delivering at term (≥ 37 weeks’ GA). Only 9 samples were provided from term pregnancies at the Zambia site. Women with multiple births, congenital malformations, stillbirth, or induction of labor for any cause were excluded. Outcomes were assessed through either study procedures on the labor ward or, among those delivering elsewhere, through participant interview via direct phone calls, household visits, and/or medical record review at a postnatal visit. The study comprised a single urine sample for each participant (n = 99), collected at a prenatal visit after GA had been confirmed by ultrasound at < 20 weeks of gestation. Ultrasound imaging was performed by trained sonologists, and GA was estimated following guidelines from the American College of Obstetricians and Gynecologists [3] (Bangladesh GAPPS), using INTERGROWTH-21st equations [26] (Zambia), or using Hadlock's formulas [23,27] (AMANHI sites: Bangladesh, Pakistan, Tanzania). GA was reported in weeks. All study sites employed a uniform method for urine collection and handling. Urine samples were collected at any time of the day, aliquoted, and frozen at −80 °C within 2 h of collection. Deidentified urine aliquots were shipped on dry ice from each biorepository to Stanford University as a single batch and under continuous temperature monitoring. Urine samples from 20 uncomplicated pregnancies, collected between 8 and 19 weeks of gestation at the Lucile Packard Children’s Hospital at Stanford University, served as the validation cohort. LC–MS-grade solvents and mobile phase modifiers were obtained from Fisher Scientific (water, acetonitrile, methanol) and Sigma-Aldrich (acetic acid, ammonium acetate).
Urine samples were analyzed using a broad-spectrum metabolomics platform consisting of hydrophilic interaction chromatography (HILIC) and reverse phase liquid chromatography (RPLC)–MS [12]. Frozen urine samples were thawed on ice and centrifuged at 17,000g for 10 min at 4 °C. Supernatants (25 µl) were then diluted 1:4 with 75% acetonitrile for HILIC-MS experiments and with 100% water for RPLC-MS experiments. Each sample was spiked with 15 analytical-grade internal standards (IS). Samples for HILIC-MS experiments were further centrifuged at 21,000g for 10 min at 4 °C to precipitate proteins. Metabolic extracts were analyzed using HILIC and RPLC separations in both positive and negative ionization modes as previously described [12]. Data were acquired on a Thermo Q Exactive HF mass spectrometer equipped with a Heated Electrospray Ionization probe (HESI-II) and operating in full MS scan mode. MS/MS data were acquired at different fragmentation energies (NCE 25, 35 and 50) on pooled quality control (QC) samples consisting of an equimolar mixture of all the samples in the study. HILIC experiments were performed using a ZIC-HILIC column 2.1 × 100 mm, 3.5 μm, 200 Å (Merck Millipore) and mobile phase solvents consisting of 10 mM ammonium acetate in 50/50 acetonitrile/water (A) and 10 mM ammonium acetate in 95/5 acetonitrile/water (B). RPLC experiments were performed using a Hypersil GOLD column 2.1 × 150 mm, 1.9 µm, 175 Å (Thermo Scientific) and mobile phase solvents consisting of 0.06% acetic acid in water (A) and 0.06% acetic acid in methanol (B).
Data quality was ensured by: (1) sample randomization for metabolite extraction and data acquisition, (2) multiple injections of a pooled sample to equilibrate the LC–MS system prior to running the sequence (12 and 6 injections for HILIC and RPLC methods, respectively), (3) spiking labeled IS during sample preparation to control for extraction efficiency and evaluate LC–MS performance, (4) checking mass accuracy, retention time and peak shape of the IS in each sample and (5) injection of a pooled sample every 10 injections to control for signal deviation over time. Data from each mode were independently processed using Progenesis QI software (v2.3) (Nonlinear Dynamics) as recently described [28]. Metabolic features detected in blanks, or that did not show sufficient linearity upon dilution in QC samples (r […]), were discarded; only features present in […] 2/3 of the samples were kept for further analysis. Data were acquired in four batches for HILIC and RPLC modes. Inter- and intra-batch variations were corrected by applying locally estimated scatterplot smoothing (LOESS) regression to the pooled samples injected repeatedly along the batches (span = 0.75). Dilution effects were corrected using probabilistic quotient normalization (PQN) [29]. Missing values were imputed by drawing from a random distribution of low values in the corresponding sample. Multiple aliquots (1 to 4) were analyzed for each sample (n = 172 from 99 unique samples). Data from replicates were aggregated by taking the mean (n = 2) or median (n = 3 to 4). Data from each mode were then merged, producing a dataset containing 6630 metabolic features. Metabolite abundances were reported as spectral counts. Peak annotation was first performed by matching experimental m/z, retention time and MS/MS spectra to an in-house library of analytical-grade standards [11].
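The dilution correction mentioned above can be illustrated with a minimal sketch of probabilistic quotient normalization. This is not the pipeline used in the study: the function name `pqn_normalize`, the choice of the per-feature median across samples as the reference spectrum, and the toy intensities are all assumptions made for illustration, and the integral normalization that often precedes PQN is omitted.

```python
from statistics import median


def pqn_normalize(samples):
    """Probabilistic quotient normalization (illustrative sketch).

    samples: list of equal-length lists of positive feature intensities,
    one inner list per urine sample.
    Returns (normalized_samples, dilution_factors).
    """
    n_features = len(samples[0])
    # Reference spectrum: per-feature median across samples (assumed choice).
    reference = [median(s[j] for s in samples) for j in range(n_features)]
    normalized, factors = [], []
    for s in samples:
        # Feature-wise quotients of this sample against the reference.
        quotients = [x / r for x, r in zip(s, reference) if x > 0 and r > 0]
        # The median quotient is taken as the most probable dilution factor.
        factor = median(quotients)
        factors.append(factor)
        normalized.append([x / factor for x in s])
    return normalized, factors


# Hypothetical toy data: sample b is sample a diluted two-fold.
a = [100.0, 200.0, 50.0, 400.0]
b = [50.0, 100.0, 25.0, 200.0]
norm, factors = pqn_normalize([a, b])
```

In this toy case the two samples differ only by dilution, so their normalized profiles coincide. Using the median quotient rather than the total signal makes the estimated factor less sensitive to a few metabolites that change strongly for biological reasons.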
Remaining peaks were identified by matching experimental m/z and fragmentation spectra to publicly available databases including HMDB (http://www.hmdb.ca/), MoNA (http://mona.fiehnlab.ucdavis.edu/) and MassBank (http://www.massbank.jp/) using the R package ‘metID’ (v0.2.0) [30]. Briefly, metabolic feature tables from Progenesis QI were matched to fragmentation spectra with an m/z window of ± 15 ppm and a retention time window of ± 30 s (HILIC) or ± 20 s (RPLC). When multiple MS/MS spectra matched a single metabolic feature, all matched MS/MS spectra were used for the identification. Next, MS1 and MS2 pairs were searched against public databases and a similarity score was calculated using the forward dot-product algorithm, which considers both fragments and intensities [31]. Metabolites were reported if the similarity score was above 0.4. Spectra from metabolic features that were important in the random forest models (see below) were further investigated manually to confirm identification and are reported in Table S3. We used the Metabolomics Standards Initiative (MSI) levels of confidence to grade metabolite annotations (level 1 to level 4). Level 1 represents formal identifications where the biological signal matches the accurate mass, retention time and fragmentation spectra of an authentic standard run on the same platform. For level 2 identifications, the biological signal matches the accurate mass and fragmentation spectra available in one of the public databases listed above. Level 3 represents putative identifications that are the most likely name based on previous knowledge of urine composition. Level 4 consists of unknown metabolites. Some metabolites eluted in multiple peaks and are listed with a number in parentheses following the metabolite name, indicating the order of elution.
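To make the matching criteria above concrete, the following sketch shows a ppm-tolerance check and a simplified cosine-style score between an experimental and a library fragmentation spectrum. It is not the scoring formula implemented in ‘metID’, which also weights fragments by m/z; the function names, the pairing rule for unmatched fragments, and the example spectra are assumptions made for illustration only.

```python
import math


def within_ppm(mz_a, mz_b, tol_ppm=15.0):
    """True if two m/z values agree within tol_ppm parts per million."""
    return abs(mz_a - mz_b) / mz_b * 1e6 <= tol_ppm


def forward_dot_product(experimental, library, tol_ppm=15.0):
    """Cosine-style similarity in [0, 1] between two fragment spectra.

    Each spectrum is a list of (mz, intensity) pairs. 'Forward' here means
    every experimental fragment is scored; an experimental fragment with no
    library fragment inside the ppm window is paired with intensity 0.
    """
    exp_int, lib_int = [], []
    for mz, intensity in experimental:
        matches = [i for m, i in library if within_ppm(mz, m, tol_ppm)]
        exp_int.append(intensity)
        lib_int.append(max(matches) if matches else 0.0)
    numerator = sum(e * l for e, l in zip(exp_int, lib_int))
    denominator = math.sqrt(sum(e * e for e in exp_int)) * math.sqrt(
        sum(l * l for l in lib_int)
    )
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical spectra: two shared fragments, one unmatched experimental peak.
experimental = [(97.0284, 100.0), (145.0506, 40.0), (271.1704, 10.0)]
library = [(97.0285, 100.0), (145.0505, 50.0)]
score = forward_dot_product(experimental, library)
reported = score > 0.4  # reporting threshold stated in the text
```

Both fragment m/z agreement and relative intensities contribute to the score, so a spectrum that shares fragments with the library entry but in very different proportions scores lower than one with matching proportions.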
A random forest algorithm was used to build multivariate prediction models to estimate GA at the time of sample collection using all samples (n = 99), samples from term deliveries (n = 49) and samples from preterm deliveries (n = 50). The parameters of the models were optimized using internal cross-validation, and an external leave-one-out cross-validation strategy was implemented to test the predictions on the excluded sample. The final results were reported as an aggregate of all blinded predictions. A restricted model containing 3 metabolites was developed and validated using an independent cohort (n = 20, Stanford cohort). The importance of metabolic features was derived from the models, while P-values were calculated from Spearman correlations. Superclass-level classification was performed using International Chemical Identifier (InChI) keys for unique metabolic features (n = 2192) with the ClassyFire Batch search (https://cfb.fiehnlab.ucdavis.edu/) [32] (Table S1). We used the Mummichog 1 algorithm [33] in the web tool MetaboAnalyst 4 [34] to search for enriched pathways. Mummichog leverages the organization of metabolic networks to predict functional activity directly from metabolic feature tables, bypassing metabolite identification. Significance of pathways was determined by the one-sided Fisher's exact test using KEGG pathways [35–37]. P-values ≤ 0.05 were considered significant. Visualization of metabolites belonging to significant pathways on the KEGG map was generated using the network explorer tool in MetaboAnalyst 4. Pairwise Spearman’s rank correlations were calculated using the R package ‘Hmisc’ (v3.15–0), and weighted, undirected networks were plotted with ‘igraph’ (v0.7.1). Correlations with Bonferroni-adjusted P-values ≤ 0.01 were included and displayed via the Fruchterman-Reingold method. Nodes were color-coded by significance in the term and preterm models, with node size representing the betweenness centrality.
The images in Fig. 1a were obtained as follows: the world map was downloaded from https://www.creativeswall.com/25-free-vector-world-maps/ and edited using Adobe Illustrator CS6 (v16.0.0), the drawing of the mass spectrometer was obtained from Thermo Scientific, the drawing of a computer was downloaded from https://www.netclipart.com/isee/oTmR_desktop-computer-png-clipart-computer-logo-free-download/ and the silhouette of a pregnant woman was downloaded from http://clipart-library.com/free/pregnant-woman-silhouette-clipart.html.
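The external leave-one-out evaluation described for the GA models can be sketched as follows. To keep the example dependency-free, a one-predictor least-squares line stands in for the random forest, and the internal cross-validation used to tune model parameters is omitted. The held-out predictions are aggregated and scored with Spearman's rho and RMSE, the two metrics reported for the restricted model. All function names and the toy abundance and GA values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loocv_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)  # blinded prediction
    return preds


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))


def rmse(a, b):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical toy data: one log-abundance value and the ultrasound GA (weeks).
log_abundance = [1.0, 1.4, 2.1, 2.5, 3.2, 3.6, 4.1, 4.8]
ga_weeks = [8.0, 9.5, 11.0, 12.0, 14.5, 15.0, 17.0, 19.0]
predicted = loocv_predictions(log_abundance, ga_weeks)
rho = spearman(ga_weeks, predicted)
error = rmse(ga_weeks, predicted)
```

Because every prediction comes from a model that never saw the corresponding sample, the aggregated rho and RMSE estimate out-of-sample performance rather than goodness of fit.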