Background: Preterm birth is a major global health challenge, the leading cause of death in children under 5 years of age, and a key measure of a population’s general health and nutritional status. Current clinical methods of estimating fetal gestational age are often inaccurate. For example, between 20 and 30 weeks of gestation, the width of the 95% prediction interval around the actual gestational age is estimated to be 18–36 days, even when the best ultrasound estimates are used. The aims of this study are to improve estimates of fetal gestational age and provide personalised predictions of future growth.
Methods: Using ultrasound-derived fetal biometric data, we developed a machine learning approach to accurately estimate gestational age. The accuracy of the method is determined by reference to exactly known facts pertaining to each fetus (specifically, intervals between ultrasound visits) rather than the date of the mother’s last menstrual period. The data stem from a sample of healthy, well-nourished participants in a large, multicentre, population-based study, the International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st). The generalisability of the algorithm is shown with data from a different and more heterogeneous population (INTERBIO-21st Fetal Study).
Findings: In the context of two large datasets, we estimated gestational age between 20 and 30 weeks of gestation with 95% confidence to within 3 days, using measurements made in a 10-week window spanning the second and third trimesters. Fetal gestational age can thus be estimated in the 20–30 weeks gestational age window with a prediction interval 3–5 times narrower than with any previous algorithm. This will enable improved management of individual pregnancies. Six-week forecasts of the growth trajectory for a given fetus are accurate to within 7 days. This will help identify at-risk fetuses more accurately than currently possible. At population level, the higher accuracy is expected to improve fetal growth charts and population health assessments.
Interpretation: Machine learning can circumvent long-standing limitations in determining fetal gestational age and future growth trajectory, without recourse to often inaccurately known information, such as the date of the mother’s last menstrual period. Using this algorithm in clinical practice could facilitate the management of individual pregnancies and improve population-level health. Upon publication of this study, the algorithm for gestational age estimates will be provided for research purposes free of charge via a web portal.
Funding: Bill & Melinda Gates Foundation, Office of Science (US Department of Energy), US National Science Foundation, and National Institute for Health Research Oxford Biomedical Research Centre.
The accuracy of gestational age estimation algorithms is commonly determined by comparison with other estimation methods.3, 5, 17, 18 Because these methods rely directly or indirectly on Naegele’s rule, this tends to propagate error, rather than quantify uncertainty. This problem can be circumvented by recourse to accurately known observables for each fetus. To establish the accuracy of our approach, we used three independent methods. For method A, the algorithm is provided with two sets of ultrasound measures from a previously unseen (test) fetus and asked to determine the time interval separating them. No timing information is provided to the algorithm. Deviations from the accurately known time interval quantify the uncertainty in the information extracted from the data, including gestational age. For method B, the algorithm is given a single set of previously unseen ultrasound measures obtained at one visit and asked to estimate gestational age. No timing information is provided to the algorithm. Gestational age estimates based on measures made during a single visit are possible in the majority of cases, because the estimate is often insensitive to the choice of the growth trajectory identified as characteristic of a specific fetus. The error in such estimates is defined as the discrepancy between the gestational age predicted from biometric measures made during one visit, and the gestational age estimated using measures from two visits, because the latter is deduced by comparison with the accurately known time elapsed between the two visits. In some cases, the gestational age estimate is sensitive to the choice of growth trajectory selected, causing the algorithm to return that “an estimate with accuracy better than the typical LMP-based estimates requires additional data”. For method C, the algorithm is given fetal biometric measures from two visits without timing information and is asked to forecast the time of a subsequent scan of the fetus. Error is defined as the discrepancy between the forecast and the actual time of a subsequent visit.
To be useful, a machine-learning algorithm must be statistically accurate, and able to generalise from training data to previously unseen data, ideally from a different population. Using methods A, B, and C, we show the accuracy and generalisability of our approach with reference to data from two large, multicentre studies (appendix pp 16–18).
Dataset 1 pertains to 4607 healthy, well-nourished women with singleton pregnancies at low risk of adverse maternal and perinatal outcomes, who participated in the Fetal Growth Longitudinal Study (FGLS), one of the main components of the International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st), a large, multicentre, longitudinal, population-based project conducted between 2009 and 2016, in eight delimited, diverse, geographical urban areas.19, 20 The data used to train and test our algorithm were collected during the FGLS. Briefly, the study involved performing serial examinations with the same ultrasound machine (Philips HD9; Philips Healthcare, Andover, MA, USA) every 5 weeks (within 1 week either side) after an initial scan at less than 14 weeks of gestation that confirmed the certain LMP-based gestational age. Hence, the possible ranges of scan visits were at 14–18, 19–23, 24–28, 29–33, 34–38, and 39–42 weeks of gestation. The fetal anthropometric measures obtained at each visit after 14 weeks of gestation included head circumference, abdominal circumference, and femur length. Each parameter was measured in triplicate from three separately obtained ultrasound images of each structure.
The measurement protocol (including masking of the ultrasonographer to the values) and the training, standardisation, and quality control procedures have been reported elsewhere.19, 21, 22, 23
The generalisability of the algorithm, ie, its ability to yield accurate estimates using fetal biometric measures from a different dataset (no part of which was used for training), was established using dataset 2, from the INTERBIO-21st Study (phase 2 of the INTERGROWTH-21st Project).24 The protocol in the longitudinal component of INTERBIO-21st (the Fetal Study) was almost identical to that used in FGLS. However, the population was much more heterogeneous and women were at higher risk of small-for-gestational-age and preterm birth, in keeping with the study’s aim of improving the functional classification of preterm birth and fetal growth restriction. The flowchart we used to select healthy FGLS participants for analysis (figure 1) is similar to that used by Papageorghiou and colleagues,5 thus allowing direct comparison of the results of previous analysis with the results obtained with the algorithm presented here. A total of 3076 participants in the INTERBIO-21st Fetal Study24 with complete data were included.
In both datasets 1 and 2, the distribution of ultrasound data displays peaks at about monthly intervals. To prevent this non-uniform distribution from biasing our analyses, each train-and-test run was done on a randomly selected, uniform distribution of data. No participant was used for testing more than once in the study. We verified that changing the number of analysed scans per day from 20 to 40 changed the 95% half-intervals by no more than 1 day. The most accurate results were obtained with 20 scans per day.
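The random thinning of the scan data to a distribution uniform in time can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function name thin_to_uniform, the 1-day bin width, and the choice to keep sparsely populated days whole are all assumptions made for illustration.

```python
import numpy as np

def thin_to_uniform(ga_days, per_day=20, seed=None):
    """Return sorted indices of the scans kept after random thinning.

    ga_days : gestational age of each scan, in days (hypothetical input).
    per_day : maximum number of scans retained per 1-day bin (assumed bin width).
    """
    rng = np.random.default_rng(seed)
    day_bins = np.floor(np.asarray(ga_days, dtype=float)).astype(int)
    keep = []
    for day in np.unique(day_bins):
        idx = np.flatnonzero(day_bins == day)
        if idx.size > per_day:
            # Overpopulated bin: randomly remove the excess scans.
            idx = rng.choice(idx, size=per_day, replace=False)
        # Assumption: bins with fewer than per_day scans are kept whole.
        keep.append(idx)
    if not keep:
        return np.array([], dtype=int)
    return np.sort(np.concatenate(keep))
```

Calling thin_to_uniform(ga_days, per_day=20) on scan times that cluster around the scheduled visits flattens the peaks while leaving the sparse days untouched. The text does not state how days with fewer scans were handled, so that part of the sketch is a guess.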
Figure 1: Flowchart used to select a subset of the participants in the INTERGROWTH-21st Fetal Growth Longitudinal Study for analysis. The procedure closely follows that used by Papageorghiou and colleagues.5 INTERGROWTH-21st=International Fetal and Newborn Growth Consortium for the 21st Century. AC=abdominal circumference. FL=femur length. HC=head circumference.
The accuracy of our algorithm was assessed by a train-and-test approach with the FGLS dataset (dataset 1),20 using the analytical pipeline shown in the appendix (p 7). Briefly, participants were randomly divided into N subgroups. Each of the N subgroups was reserved in turn to serve later as the test data, ie, to measure the performance of the gestational age estimation algorithm with data not used in training. The participants in the other N–1 groups were pooled. Data vectors were randomly removed from each time bin to obtain a distribution of measures uniform in time. The resulting data were used for training. The performance of the algorithm was measured using the reserved test set. This train-and-test procedure was repeated until each of the N subgroups was used as the test dataset once, with the other N–1 subgroups used for training. The procedure resulted in N sets of test results, which were pooled to assess the statistical accuracy of the algorithm. The following values of N were used: 3, 4, 5, and 10. The 95% half-intervals obtained with different values of N differed by a fraction of 1 day. The results presented in this paper pertain to N=4, with 20 scans per day, but they were not sensitive to the choice of N over the range we have explored. To show generalisability, the algorithm produced by training with FGLS data20 was used to estimate gestational age using data from the INTERBIO-21st Fetal Study (dataset 2).24
The accuracy of our approach could be fully explored only over the period spanning 20 to 30 weeks of gestation, for two reasons. First, head circumference, abdominal circumference, and femur length data were available only after 14 weeks of gestation. This data truncation led to reduced estimation accuracy before about 16 weeks of gestation. Second, our algorithm analyses a series of measures at a time.15 In the present study, each series consisted of 1024 measures. This reduced the total accessible timespan by about 8 weeks on each flank, which was further limited by the need for suitable measures within the truncated range. In principle, the accessible timespan can be extended by analysing shorter series of measures, or by using data more uniformly distributed in time, but the former can impose a noise penalty.
All statistical results presented here were obtained using MATLAB (releases 2015b and 2019a). The training step, which needs to be done only once, can be accomplished in about 2 h on a Linux computer with a 12-core, 3 GHz Intel Xeon CPU and 256 GB RAM. For field or clinical applications, the outcome of training can be pre-stored in software or hardware, requiring no more than a few megabytes of memory or storage. We plan to make the tool generally accessible for research purposes free of charge.
The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
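The N-fold train-and-test procedure described in the methods can be outlined with a short Python sketch. It only illustrates the splitting and pooling logic. The callables fit and evaluate are hypothetical placeholders for the gestational age algorithm, which is not reproduced here, and the definition of the 95% half-interval as half the width of the central 95% range of the pooled errors is an assumption.

```python
import numpy as np

def half_interval_95(errors):
    """Half the width of the central 95% range of the errors (assumed definition)."""
    lo, hi = np.percentile(errors, [2.5, 97.5])
    return (hi - lo) / 2.0

def pooled_train_and_test(participant_ids, fit, evaluate, n_folds=4, seed=0):
    """Split participants (not scans) into n_folds subgroups and pool test errors.

    fit(train_ids) -> model and evaluate(model, test_ids) -> errors are
    hypothetical callables supplied by the caller; any thinning of the
    training data to a uniform distribution in time would happen inside fit.
    """
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.unique(participant_ids))
    folds = np.array_split(ids, n_folds)
    pooled = []
    for k, test_ids in enumerate(folds):
        # Pool the other N-1 subgroups for training.
        train_ids = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = fit(train_ids)
        # Each subgroup serves as test data exactly once.
        pooled.append(np.asarray(evaluate(model, test_ids), dtype=float))
    return np.concatenate(pooled)
```

Splitting by participant rather than by scan keeps all visits of a given fetus on one side of the split, which matches the statement that no participant was used for testing more than once. Applying half_interval_95 to the pooled errors gives one summary figure per choice of N.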