Achieving accurate estimates of fetal gestational age and personalised predictions of fetal growth based on data from an international prospective cohort study: a population-based machine learning study

The Lancet Digital Health, Volume 2, No. 7, Year 2020

Interpretation

Background: Preterm birth is a major global health challenge, the leading cause of death in children under 5 years of age, and a key measure of a population’s general health and nutritional status. Current clinical methods of estimating fetal gestational age are often inaccurate. For example, between 20 and 30 weeks of gestation, the width of the 95% prediction interval around the actual gestational age is estimated to be 18–36 days, even when the best ultrasound estimates are used. The aims of this study are to improve estimates of fetal gestational age and provide personalised predictions of future growth.

Methods: Using ultrasound-derived, fetal biometric data, we developed a machine learning approach to accurately estimate gestational age. The accuracy of the method is determined by reference to exactly known facts pertaining to each fetus (specifically, intervals between ultrasound visits) rather than the date of the mother’s last menstrual period. The data stem from a sample of healthy, well-nourished participants in a large, multicentre, population-based study, the International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st). The generalisability of the algorithm is shown with data from a different and more heterogeneous population (INTERBIO-21st Fetal Study).

Findings: In the context of two large datasets, we estimated gestational age between 20 and 30 weeks of gestation with 95% confidence to within 3 days, using measurements made in a 10-week window spanning the second and third trimesters. Fetal gestational age can thus be estimated in the 20–30 weeks gestational age window with a prediction interval 3–5 times better than with any previous algorithm. This will enable improved management of individual pregnancies. 6-week forecasts of the growth trajectory for a given fetus are accurate to within 7 days. This will help identify at-risk fetuses more accurately than currently possible.
At population level, the higher accuracy is expected to improve fetal growth charts and population health assessments.

Interpretation: Machine learning can circumvent long-standing limitations in determining fetal gestational age and future growth trajectory, without recourse to often inaccurately known information, such as the date of the mother’s last menstrual period. Using this algorithm in clinical practice could facilitate the management of individual pregnancies and improve population-level health. Upon publication of this study, the algorithm for gestational age estimates will be provided for research purposes free of charge via a web portal.

Funding: Bill & Melinda Gates Foundation, Office of Science (US Department of Energy), US National Science Foundation, and National Institute for Health Research Oxford Biomedical Research Centre.

The accuracy of gestational age estimation algorithms is commonly determined by comparison with other estimation methods [3, 5, 17, 18]. Because these methods rely directly or indirectly on Naegele’s rule, this tends to propagate error, rather than quantify uncertainty. This problem can be circumvented by recourse to accurately known observables for each fetus. To establish the accuracy of our approach, we used three independent methods.

For method A, the algorithm is provided with two sets of ultrasound measures from a previously unseen (test) fetus and asked to determine the time interval separating them. No timing information is provided to the algorithm. Deviations from the accurately known time interval quantify the uncertainty in the information extracted from the data, including gestational age.

For method B, the algorithm is given a single set of previously unseen ultrasound measures obtained at one visit and asked to estimate gestational age. No timing information is provided to the algorithm.
Gestational age estimates based on measures made during a single visit are possible in the majority of cases, because the estimate is often insensitive to the choice of the growth trajectory identified as characteristic of a specific fetus. The error in such estimates is defined as the discrepancy between the gestational age predicted from biometric measures made during one visit, and the gestational age estimated using measures from two visits, because the latter is deduced by comparison with the accurately known time elapsed between the two visits. In some cases, the gestational age estimate is sensitive to the choice of growth trajectory selected, causing the algorithm to return that “an estimate with accuracy better than the typical LMP-based estimates requires additional data”.

For method C, the algorithm is given fetal biometric measures from two visits without timing information and is asked to forecast the time of a subsequent scan of the fetus. Error is defined as the discrepancy between the forecast and the actual time of a subsequent visit.

To be useful, a machine-learning algorithm must be statistically accurate, and able to generalise from training data to previously unseen data, ideally from a different population. Using methods A, B, and C, we show the accuracy and generalisability of our approach with reference to data from two large, multicentre studies (appendix pp 16–18).

Dataset 1 pertains to 4607 healthy, well-nourished women with singleton pregnancies at low risk of adverse maternal and perinatal outcomes, who participated in the Fetal Growth Longitudinal Study (FGLS), one of the main components of the International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st), a large, multicentre, longitudinal, population-based project conducted between 2009 and 2016, in eight delimited, diverse, geographical urban areas [19, 20]. The data used to train and test our algorithm were collected during the FGLS.
Briefly, the study involved performing serial examinations with the same ultrasound machine (Philips HD9; Philips Healthcare, Andover, MA, USA) every 5 weeks (within 1 week either side) after an initial scan at less than 14 weeks of gestation that confirmed the certain LMP-based gestational age. Hence, the possible ranges of scan visits were at 14–18, 19–23, 24–28, 29–33, 34–38, and 39–42 weeks of gestation. The fetal anthropometric measures obtained at each visit after 14 weeks of gestation included head circumference, abdominal circumference, and femur length. Each parameter was measured in triplicate from three separately obtained ultrasound images of each structure. The measurement protocol (including masking of the ultrasonographer to the values) and the training, standardisation, and quality control procedures have been reported elsewhere [19, 21, 22, 23].

The generalisability of the algorithm, ie, its ability to yield accurate estimates using fetal biometric measures from a different dataset (no part of which was used for training), was established using dataset 2, from the INTERBIO-21st Study (phase 2 of the INTERGROWTH-21st Project) [24]. The protocol in the longitudinal component of INTERBIO-21st (the Fetal Study) was almost identical to that used in FGLS. However, the population was much more heterogeneous and women were at higher risk of small for gestational age and preterm birth, with the aim of improving the functional classification of preterm birth and fetal growth restriction.

The flowchart we used to select healthy FGLS participants for analysis (figure 1) is similar to that used by Papageorghiou and colleagues [5], thus allowing direct comparison of the results of previous analysis with the results obtained with the algorithm presented here. A total of 3076 participants in the INTERBIO-21st Fetal Study [24] with complete data were included.

In both datasets 1 and 2, the distribution of ultrasound data displays peaks at about monthly intervals.
To prevent this non-uniform distribution from biasing our analyses, each train-and-test run was done on a randomly selected, uniform distribution of data. No participant was used for testing more than once in the study. We ensured that changing the number of analysed scans per day from 20 to 40 changed the 95% half-intervals by no more than 1 day. The most accurate results were obtained with 20 scans per day.

Figure 1: Flowchart used to select a subset of the participants in the INTERGROWTH-21st Fetal Growth Longitudinal Study for analysis. The procedure closely follows that used by Papageorghiou and colleagues [5]. INTERGROWTH-21st=International Fetal and Newborn Growth Consortium for the 21st Century. AC=abdominal circumference. FL=femur length. HC=head circumference.

The accuracy of our algorithm was assessed by a train-and-test approach with the FGLS dataset (dataset 1) [20], using the analytical pipeline shown in the appendix (p 7). Briefly, participants were randomly divided into N subgroups. Each of the N subgroups was reserved in turn to serve later as the test data, ie, to measure the performance of the gestational age estimation algorithm with data not used in training. The participants in the other N–1 groups were pooled. Data vectors were randomly removed from each time bin to obtain a distribution of measures uniform in time. The resulting data were used for training. The performance of the algorithm was measured using the reserved test set. This train-and-test procedure was repeated until each of the N subgroups was used as the test dataset once, with the other N–1 subgroups used for training. The procedure resulted in N sets of test results, which were pooled to assess the statistical accuracy of the algorithm. The following values of N were used: 3, 4, 5, and 10. The 95% half-intervals obtained with different values of N differed by a fraction of 1 day.
The results presented in this paper pertain to N=4, with 20 scans per day, but they were not sensitive to the choice of N over the range we have explored. To show generalisability, the algorithm produced by training with FGLS data [20] was used to estimate gestational age using data from the INTERBIO-21st Fetal Study (dataset 2) [24].

The accuracy of our approach could be fully explored only over the period spanning 20 to 30 weeks of gestation, for two reasons. First, head circumference, abdominal circumference, and femur length data were available only after 14 weeks of gestation. This data truncation led to reduced estimation accuracy before about 16 weeks of gestation. Second, our algorithm analyses a series of measures at a time [15]. In the present study, each series consisted of 1024 measures. This reduced the total accessible timespan by about 8 weeks on each flank, which was further limited by the need for suitable measures within the truncated range. In principle, the accessible timespan can be extended by analysing shorter series of measures, or by using data more uniformly distributed in time, but the former can impose a noise penalty.

All statistical results presented here were obtained using MATLAB (releases 2015b and 2019a). The training step, which needs to be done only once, can be accomplished in about 2 h on a Linux computer with a 12-core, 3 GHz Intel Xeon CPU and 256 GB RAM. For field or clinical applications, the outcome of training can be pre-stored in software or hardware, requiring no more than a few megabytes of memory or storage. We plan to make the tool generally accessible for research purposes free of charge.

The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

AI Digest

Study Justification:

– Preterm birth is a significant global health challenge and a leading cause of death in children under 5 years of age.
– Current methods of estimating fetal gestational age are often inaccurate, leading to challenges in managing individual pregnancies and assessing population health.
– This study aims to improve estimates of fetal gestational age and provide personalized predictions of future growth using a machine learning approach.

Study Highlights:

– The study developed a machine learning algorithm to accurately estimate gestational age using ultrasound-derived fetal biometric data.
– The algorithm was tested on two large datasets, showing that gestational age can be estimated with 95% confidence to within 3 days between 20 and 30 weeks of gestation.
– The algorithm also provided accurate 6-week forecasts of fetal growth trajectory, helping to identify at-risk fetuses more accurately.
– The higher accuracy of gestational age estimation is expected to improve fetal growth charts and population health assessments.

Study Recommendations:

– Implement the machine learning algorithm for gestational age estimation in clinical practice to facilitate the management of individual pregnancies and improve population-level health.
– Provide the algorithm for research purposes free of charge via a web portal upon publication of the study.

Key Role Players:

– Researchers and scientists involved in developing and validating the machine learning algorithm.
– Obstetricians and gynecologists who will use the algorithm in clinical practice.
– Policy makers and public health officials responsible for implementing the algorithm at a population level.

Cost Items for Planning Recommendations:

– Development and maintenance of the web portal for providing the algorithm for research purposes.
– Training and education for obstetricians and gynecologists on how to use the algorithm in clinical practice.
– Infrastructure and resources for integrating the algorithm into existing healthcare systems.
– Ongoing research and monitoring to assess the impact of the algorithm on pregnancy management and population health.

Strength of evidence

The strength of evidence for this abstract is 9 out of 10.
The evidence in the abstract is strong and supported by a large, multicenter study. The machine learning approach accurately estimates gestational age and provides personalized predictions of fetal growth. The algorithm was tested on two independent datasets, demonstrating its accuracy and generalizability. The study provides specific details about the methodology and data collection process. To further improve the evidence, it would be beneficial to include information about the sample size and demographics of the study participants.

Materials

https://efashare.b-cdn.net/materials/681f108af3a3e3171711f202447c6ebdbe2e39b8158f58cfb57ec73bd39c2995/mmc1.pdf

Innovations

One potential innovation to improve access to maternal health is a machine learning algorithm that accurately estimates fetal gestational age and provides personalized predictions of future growth. The algorithm uses ultrasound-derived fetal biometric data and estimates gestational age between 20 and 30 weeks of gestation to within 3 days with 95% confidence. According to the study, this prediction interval is 3–5 times better than with any previous algorithm. The algorithm can also provide 6-week forecasts of the growth trajectory for a given fetus, accurate to within 7 days. This can help identify at-risk fetuses more accurately and improve the management of individual pregnancies. At population level, the higher accuracy is expected to improve fetal growth charts and population health assessments. The algorithm will be made available for research purposes free of charge via a web portal. The study was funded by the Bill & Melinda Gates Foundation, Office of Science (US Department of Energy), US National Science Foundation, and National Institute for Health Research Oxford Biomedical Research Centre.

AI Innovations Description

The recommendation to improve access to maternal health is to develop and implement a machine learning algorithm for accurate estimation of fetal gestational age and personalized predictions of fetal growth. This algorithm uses ultrasound-derived fetal biometric data and can provide more accurate estimates of gestational age and future growth trajectory compared to current methods. The algorithm has been tested and validated using data from two large, multicenter studies, including a diverse population. By improving the accuracy of gestational age estimation, this innovation can facilitate better management of individual pregnancies and improve population-level health assessments. The algorithm will be made available for research purposes free of charge through a web portal. This recommendation is supported by funding from the Bill & Melinda Gates Foundation, Office of Science (US Department of Energy), US National Science Foundation, and National Institute for Health Research Oxford Biomedical Research Centre.

AI Innovations Methodology

The study focuses on improving estimates of fetal gestational age and providing personalized predictions of future growth using a machine learning approach. To establish the accuracy of that approach, the study used three independent methods: A, B, and C.

Method A involves providing the algorithm with two sets of ultrasound measures from a previously unseen fetus and asking it to determine the time interval separating them. This helps quantify the uncertainty in the information extracted from the data, including gestational age.
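The Method A check can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names `interval_errors` and `half_interval` are hypothetical, the numbers are made up, and the "95% half-interval" is read here as the smallest value that bounds the absolute error in at least 95% of cases, which is an assumption about the definition.

```python
import math


def interval_errors(estimated_days, known_days):
    """Signed error (days) between estimated and known visit-to-visit intervals."""
    if len(estimated_days) != len(known_days):
        raise ValueError("inputs must have the same length")
    return [e - k for e, k in zip(estimated_days, known_days)]


def half_interval(errors, coverage=0.95):
    """Smallest h such that at least `coverage` of the errors satisfy |error| <= h.

    Assumption: this is one plausible reading of a "95% half-interval".
    """
    if not errors:
        raise ValueError("need at least one error")
    ranked = sorted(abs(e) for e in errors)
    index = math.ceil(coverage * len(ranked)) - 1
    return ranked[index]


# Made-up numbers for illustration only (not study data).
estimated = [35.0, 33.5, 36.0, 34.0, 37.5]  # intervals estimated by an algorithm
known = [35.0, 35.0, 35.0, 35.0, 35.0]      # intervals known from visit dates
errors = interval_errors(estimated, known)
print(half_interval(errors))
```

Because the reference here is the recorded time between visits, the error measure does not depend on any LMP-based gestational age.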

Method B involves giving the algorithm a single set of previously unseen ultrasound measures obtained at one visit and asking it to estimate gestational age. Single-visit estimates are possible in the majority of cases because the estimate is often insensitive to the choice of the growth trajectory identified as characteristic of a specific fetus.
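The full text notes that when a single-visit estimate is sensitive to the choice of growth trajectory, the algorithm reports that additional data are required. The sketch below illustrates that kind of guard only. How the study's algorithm measures sensitivity is not described in this text, so the spread test, the `tolerance_days` threshold, and the function name `single_visit_estimate` are all hypothetical.

```python
from statistics import median

# Message quoted from the study text.
NEEDS_MORE_DATA = (
    "an estimate with accuracy better than the typical LMP-based estimates "
    "requires additional data"
)


def single_visit_estimate(candidate_ga_days, tolerance_days=3.0):
    """Return (estimate_in_days, message) for one visit.

    candidate_ga_days: gestational age estimates (days), one per candidate
    growth trajectory. Both the input shape and the tolerance are assumptions
    made for this illustration.
    """
    if not candidate_ga_days:
        raise ValueError("need at least one candidate estimate")
    spread = max(candidate_ga_days) - min(candidate_ga_days)
    if spread <= tolerance_days:
        # Estimate barely depends on the trajectory chosen: report it.
        return median(candidate_ga_days), None
    # Estimate depends strongly on the trajectory chosen: decline to answer.
    return None, NEEDS_MORE_DATA


print(single_visit_estimate([175.0, 176.0, 176.5]))
print(single_visit_estimate([170.0, 176.0, 183.0]))
```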

Method C involves providing the algorithm with fetal biometric measures from two visits without timing information and asking it to forecast the time of a subsequent scan of the fetus. The error is defined as the discrepancy between the forecast and the actual time of the subsequent visit.

To assess the accuracy and generalizability of the algorithm, two large datasets were used: Dataset 1 from the Fetal Growth Longitudinal Study (FGLS) and Dataset 2 from the INTERBIO-21st Study. Dataset 1 included 4607 healthy, well-nourished women with singleton pregnancies, while Dataset 2 included 3076 participants at higher risk of small for gestational age and preterm birth.

The accuracy of the algorithm was assessed using a train-and-test approach with the FGLS dataset. Participants were randomly divided into N subgroups (N = 3, 4, 5, and 10 were tried), with one subgroup reserved as the test data and the others pooled, made uniform in time by randomly removing data, and used for training. The procedure was repeated until each subgroup had served once as the test set, and the N sets of test results were pooled to assess the statistical accuracy of the algorithm.
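The train-and-test procedure summarised above can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's pipeline: scans are represented as hypothetical `(participant_id, day)` tuples, each day is treated as one time bin, and bins are capped at `per_bin` scans (bins with fewer scans are left unchanged). The defaults of 4 folds and 20 scans per day mirror the values reported in the study.

```python
import random
from collections import defaultdict


def split_participants(participant_ids, n_folds, rng):
    """Randomly partition participant IDs into n_folds disjoint subgroups."""
    ids = list(participant_ids)
    rng.shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def uniform_in_time(scans, per_bin, rng):
    """Randomly drop scans so that no day holds more than per_bin scans."""
    by_day = defaultdict(list)
    for scan in scans:
        by_day[scan[1]].append(scan)
    kept = []
    for day in sorted(by_day):
        group = by_day[day]
        kept.extend(group if len(group) <= per_bin else rng.sample(group, per_bin))
    return kept


def train_test_runs(scans, n_folds=4, per_bin=20, seed=0):
    """Yield (training_scans, test_scans) once per fold.

    Splitting is done by participant, so no participant appears in both
    the training and the test data of a run.
    """
    rng = random.Random(seed)
    ids = sorted({pid for pid, _ in scans})
    for held_out in split_participants(ids, n_folds, rng):
        test_ids = set(held_out)
        pooled = [s for s in scans if s[0] not in test_ids]
        test = [s for s in scans if s[0] in test_ids]
        yield uniform_in_time(pooled, per_bin, rng), test


# Synthetic example: 12 participants, each scanned on days 140, 175, and 210.
scans = [(pid, day) for pid in range(12) for day in (140, 175, 210)]
runs = list(train_test_runs(scans, n_folds=4, per_bin=5))
print(len(runs), [len(train) for train, _ in runs])
```

Splitting by participant rather than by scan keeps every scan of a held-out fetus out of training, which matches the requirement that test data be previously unseen.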

The generalizability of the algorithm was demonstrated by using the algorithm trained with FGLS data to estimate gestational age using data from the INTERBIO-21st Fetal Study.

Overall, this methodology allows for the accurate estimation of gestational age and personalized predictions of fetal growth, improving the management of individual pregnancies and potentially improving population-level health assessments.

Authors & Co-authors

Statistics:

Citations: 24

Authors: 98

Identifiers:

DOI: 10.1016/S2589-7500(20)30131-X

Research Areas:

Health System and Policy, Maternal and Child Health, Sexual and Reproductive Health, Social Determinants, Technology and Innovations

Study Design:

Cohort Study, Cross Sectional Study, Grounded Theory

Study Approach:

Quantitative

Participants Gender:

Female
