External validation of inpatient neonatal mortality prediction models in high-mortality settings


Study Justification:
– The study aims to evaluate the predictive accuracy of two neonatal mortality prediction models in high-mortality, low-resource settings.
– External validation of prediction models is important to ensure their reliability and generalizability.
– The study addresses the need for rigorous external validation of prediction models in neonatal care.
Highlights:
– At initial external validation, the original prediction models markedly overestimated mortality risks and were poorly calibrated.
– After model updating, the calibration of the models improved, and they showed good discrimination.
– The updated models showed better calibration than any existing neonatal in-hospital mortality prediction model externally validated for comparable settings.
Recommendations:
– The study recommends the use of the updated prediction models for predicting in-hospital neonatal mortality in low-resource settings.
– The models can also be used for case-mix adjustment when comparing similar hospital settings.
Key Role Players:
– Researchers and statisticians for model development, validation, and updating.
– Clinicians and data clerks for data collection and quality assurance.
– Hospital administrators and policymakers for implementing the recommendations.
Cost Items for Planning Recommendations:
– Research and data collection costs, including personnel salaries, data entry tools, and quality assurance procedures.
– Training and capacity building for data clerks and clinicians.
– Implementation costs for integrating the prediction models into routine neonatal care.
– Monitoring and evaluation costs to assess the impact of the models on neonatal mortality rates.

The strength of evidence for this abstract is 7 out of 10.
The evidence in the abstract is moderately strong. The study used retrospectively collected routine clinical data from 16 Kenyan hospitals to externally validate and update two neonatal mortality prediction models. The models showed good discrimination and improved calibration after updating. The study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines and obtained approval from the Kenya Medical Research Institute. However, the abstract does not provide information on potential limitations or recommendations for future research.

Background: Two neonatal mortality prediction models, the Neonatal Essential Treatment Score (NETS), which uses treatments prescribed at admission, and the Score for Essential Neonatal Symptoms and Signs (SENSS), which uses basic clinical signs, were derived in high-mortality, low-resource settings to utilise data more likely to be available in these settings. In this study, we evaluate the predictive accuracy of these two models for all-cause in-hospital mortality.

Methods: We used retrospectively collected routine clinical data, recorded by duty clinicians at admission in 16 Kenyan hospitals, to externally validate and update the SENSS and NETS models, which were initially developed from data from the largest Kenyan maternity hospital to predict in-hospital mortality. Model performance was evaluated by assessing discrimination and calibration. Discrimination, the ability of the model to differentiate between those with and without the outcome, was measured using the c-statistic. Calibration, the agreement between predictions from the model and what was observed, was measured using the calibration intercept and slope (with values of 0 and 1, respectively, denoting perfect calibration).

Results: At initial external validation, the estimated mortality risks from the original SENSS and NETS models were markedly overestimated, with calibration intercepts of −0.703 (95% CI −0.738 to −0.669) and −1.109 (95% CI −1.148 to −1.069), and too extreme, with calibration slopes of 0.565 (95% CI 0.552 to 0.577) and 0.466 (95% CI 0.451 to 0.480), respectively. After model updating, the calibration of the models improved. The updated SENSS and NETS models had calibration intercepts of 0.311 (95% CI 0.282 to 0.350) and 0.032 (95% CI −0.002 to 0.066) and calibration slopes of 1.029 (95% CI 1.006 to 1.051) and 0.799 (95% CI 0.774 to 0.823), respectively, while showing good discrimination, with c-statistics of 0.834 (95% CI 0.829 to 0.839) and 0.775 (95% CI 0.768 to 0.782), respectively. The overall calibration performance of the updated SENSS and NETS models was better than that of any existing neonatal in-hospital mortality prediction model externally validated for settings comparable to Kenya.

Conclusion: Few prediction models undergo rigorous external validation. We show how external validation using data from multiple locations enables model updating, improving their performance and potential value. The improved models indicate it is possible to predict in-hospital mortality using either treatments or signs and symptoms derived from routine neonatal data in low-resource hospital settings, also making their use possible for case-mix adjustment when comparing similar hospital settings.
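To make these performance measures concrete, the sketch below computes the c-statistic, calibration intercept (calibration-in-the-large), calibration slope, and Brier score from a vector of predicted risks. It is a minimal illustration in Python using statsmodels and scikit-learn, not the authors' code (the study's analyses were done in R), and the synthetic data are purely illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score, brier_score_loss

def calibration_metrics(y, p):
    """Calibration intercept and slope of predicted risks p for outcomes y."""
    lp = np.log(p / (1 - p))  # linear predictor = logit of predicted risk
    # Calibration slope: coefficient from regressing the outcome on the
    # linear predictor (perfect calibration gives a slope of 1).
    slope = sm.GLM(y, sm.add_constant(lp),
                   family=sm.families.Binomial()).fit().params[1]
    # Calibration-in-the-large: intercept of a logistic model with the
    # linear predictor as a fixed offset (perfect calibration gives 0).
    intercept = sm.GLM(y, np.ones((len(y), 1)),
                       family=sm.families.Binomial(),
                       offset=lp).fit().params[0]
    return intercept, slope

# Synthetic, well-calibrated predictions: intercept ~0 and slope ~1 expected.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.01, 0.6, 50_000)   # model-predicted mortality risks
y = rng.binomial(1, p_hat)               # observed outcomes (0/1)

intercept, slope = calibration_metrics(y, p_hat)
c_stat = roc_auc_score(y, p_hat)         # discrimination (c-statistic)
brier = brier_score_loss(y, p_hat)       # overall accuracy (lower is better)
print(f"intercept={intercept:.3f} slope={slope:.3f} "
      f"c={c_stat:.3f} brier={brier:.3f}")
```

In the study, confidence intervals for these quantities were obtained by bootstrapping, i.e. recomputing the metrics over repeated samples drawn with replacement from the validation data.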

The reporting of this study follows the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines, a set of recommendations for reporting studies that develop, validate, or update prediction models for prognostic purposes [17]. The Scientific and Ethics Review Unit of the Kenya Medical Research Institute (KEMRI) approved the collection of the de-identified data that provides the basis for this study as part of the Clinical Information Network (CIN). The CIN is run in partnership with the Ministry of Health (MoH) and participating hospitals. Individual consent for access to the de-identified patient data was not required.

The study used data on all patients admitted to the New-Born Units (NBUs) of 16 public hospitals representative of different malaria transmission zones in Kenya, purposefully selected in partnership with the MoH. As shown in Fig. 1, the hospitals clustered in the west of the map are in a moderate-to-high malaria transmission zone, while those clustered at the centre are in moderate-to-low transmission zones. These hospitals largely provide maternal care services to immediately surrounding populations, including accepting referrals from smaller rural clinics. They were purposefully selected to have moderately sized NBUs, with an interquartile range of annual NBU inpatient admissions of 550 to 1640 (Fig. 1, in which the dots represent the hospitals providing data for model derivation and external validation).

De-identified patient-level data were obtained after being recorded by clinicians as part of routine care. This data collection system, linked to the CIN, includes data quality assurance procedures and is described in detail elsewhere [11, 18, 19]. In brief, structured paper newborn admission record (NAR) and NBU exit forms endorsed by the Kenyan MoH are the primary data sources for the CIN. CIN supports one data clerk in each hospital to abstract data from the paper hospital records each day for all patients after discharge, with the data entered directly into a non-proprietary Research Electronic Data Capture (REDCap) tool [20] with inbuilt range and validity checks. Data entry is guided by a standard operating procedure manual that forms the basis of the data clerks' training, with automated error-checking systems. To ensure no record is missed, the research team benchmarks the admission numbers entered in the CIN database against the aggregate statistics submitted to the MoH. External data quality assurance is done by KEMRI research assistants, who perform an on-site concordance check every 3 months by comparing 5% of randomly selected records, which they re-enter into REDCap, against the data clerks' entries. The overall concordance of these external data quality audits has ranged between 87 and 92% over time, with feedback given to the data clerks and any challenges addressed for continuous improvement of data quality.

This study included neonates admitted to the NBUs between August 2016 and March 2020 from the 16 hospitals representing different regions of the country, with 15 hospitals providing the external validation dataset (n = 53,909) (Fig. 1) and the 16th hospital's dataset used for model derivation and temporal validation.
For objective 2, the data used for model updating (i.e. re-estimating all the original SENSS and NETS regression coefficients) consisted of the derivation-stage dataset (April 2014 to December 2015: n = 5427), the temporal validation-stage dataset (January 2016 to July 2016: n = 1627), and additional data collected from August 2016 to December 2020 (n = 8848), all from the same hospital (the 16th hospital). Model updating is typically required where there is observed deterioration in model performance in a new population (e.g. during external validation) [21]. We provide explanations of the meaning and significance of the different datasets in Additional file 1: Table S1.

The outcome was all-cause in-hospital neonatal unit mortality. Outcome assessment was blind to predictor distribution, as the hospital data clerks were unaware of the study [9]. No new predictors were considered for the SENSS and NETS models' external validation and updating; only those used in the derivation and temporal validation study were included [13, 21]. For the NETS model, the predictors were the use/non-use of supplementary oxygen, enteral feeds, intravenous fluids, first-line intravenous antibiotics (penicillin and gentamicin), and parenteral phenobarbital [10]. For the SENSS model, the predictors were the presence or absence of difficulty feeding, convulsions, indrawing, central cyanosis, and floppiness/inability to suck, as assessed at admission [10, 13]. The neonate's birth weight by category (< 1 kg, 1.0 to < 1.5 kg, 1.5 to 4 kg) and sex were also included in both models. Weight was treated as a categorical predictor rather than as continuous, despite categorisation likely causing information loss, based on a priori clinical consensus [9, 10]. Detailed descriptions and arguments for the selection of these variables are provided in the derivation study [13] and in Additional file 1: Tables S2 and S3. The proportion of predictor missingness is consistent with previous work in Kenyan hospitals [22].

Sample size guidance for external validation of prediction models suggests a minimum of 100 events and 100 non-events [23]. For the SENSS and NETS models, there were 7486/53,909 (13.89%) and 6482/45,090 (14.38%) events (deaths), respectively, and 46,358/53,909 (85.99%) and 38,576/45,090 (85.55%) non-events (survived), respectively. Based on outcome prevalences of 508/5427 (9.36%) and 447/4840 (9.24%) for the SENSS and NETS derivation datasets, respectively, 10 predictor parameters, and R-squared values of 0.453 and 0.380, the required sample sizes for SENSS and NETS model updating, calculated using the pmsampsize library in R, were 323 and 341 patients with 31 and 32 deaths, respectively [24]. There were 7486 deaths (from 53,909 patients) and 6482 deaths (from 45,090 patients) observed for the SENSS and NETS models, respectively, which exceeds the required sample sizes [24, 25].

Predictor missingness in the SENSS external validation dataset (Additional file 1: Table S4) ranged from 1.19% (sex) to 14.63% (floppiness/inability to suck). The derivation study assumed a missing at random (MAR) mechanism for the observed missingness and performed multiple imputation using the chained equations (MICE) approach [26]; for external validation before updating, the same mechanism was therefore assumed. As in the derivation study, mode of delivery, outborn status, Apgar score at 5 min, HIV exposure, and the outcome were used as auxiliary variables in the imputation process [13, 27].
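As a rough illustration of this multiple-imputation step, the sketch below runs a MICE-style analysis with statsmodels, imputing a partially missing predictor and pooling logistic regression estimates with Rubin's rules. The column names and data are hypothetical stand-ins, not the CIN schema, and the study's own imputation was done with a different (R-based) toolchain.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation.mice import MICE, MICEData

# Synthetic stand-in data: hypothetical column names, not the CIN schema.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "difficulty_feeding": rng.binomial(1, 0.3, n).astype(float),
    "convulsions": rng.binomial(1, 0.1, n).astype(float),
    "apgar5": rng.integers(0, 11, n).astype(float),  # auxiliary variable
})
lp = -2.5 + 1.2 * df.difficulty_feeding + 1.5 * df.convulsions - 0.1 * df.apgar5
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-lp))).astype(float)

# Introduce ~15% missingness in one predictor (MAR assumed, as in the study).
mask = rng.random(n) < 0.15
df.loc[mask, "difficulty_feeding"] = np.nan

imp = MICEData(df)  # chained-equations imputation engine
mice = MICE("died ~ difficulty_feeding + convulsions + apgar5",
            sm.GLM, imp, init_kwds={"family": sm.families.Binomial()})
# 33 imputations mirrors the study's choice (33% of rows had missingness);
# estimates across imputed datasets are pooled via Rubin's rules.
results = mice.fit(n_burnin=10, n_imputations=33)
print(results.summary())
```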
Consistent with the NETS model derivation approach, 8819 (16.36%) observations in the external dataset with missing treatment sheets in the patient files were excluded, leaving 45,090 observations with 6482 (14.38%) in-hospital deaths. Multiple imputation was considered inappropriate, and therefore not done, for NETS where entire treatment sheets were missing (i.e. no information on any of the treatment predictors was available), because such missing treatment data were judged to be systematically missing due to factors not reflected in the dataset. Therefore, all patients with no treatment sheets (8819/53,909 in the external dataset and 2238/8848 in the model updating dataset) and those missing data on any treatment variable in the resultant NETS dataset (9440/45,090 in the NETS external dataset and 941/6610 in the NETS model updating dataset) were dropped from the NETS model analyses (Additional file 1: Table S5). Consequently, NETS analyses were complete-case analyses with respect to any remaining missingness in the patient's sex and birth weight.

The overall recommended process of predictive modelling is well articulated in the scientific literature [9, 14, 17, 24, 28]. To externally validate the performance of the original SENSS and NETS models, these models were applied to the CIN dataset from 15 hospitals (geographical external validation). For external validation before updating (i.e. objective one), validation was done by applying the model coefficients obtained at the model derivation stage to the external validation data [13]. The models and coefficients are presented in Table 1. The SENSS model was fit on each of the 33 imputed datasets (33 being based on 33% of observations missing at least one variable [29]), with parameter estimates combined using Rubin's rule [30].

[Table 1: Logistic regression models for NETS and SENSS from the derivation study. For each variable, the presence of the indicator takes a value of 1 and its absence a value of 0. The coefficients are summed to give the linear predictor, which is then converted to the predicted probability of in-hospital mortality [13]. ELBW, extremely low birth weight; LBW, low birth weight; LP, linear predictor; NETS, Neonatal Essential Treatment Score; SENSS, Score of Essential Neonatal Symptoms and Signs; VLBW, very low birth weight.]

Model calibration was assessed both by plotting the predicted probability of in-hospital death against the observed proportion and by calculating the calibration slope and calibration-in-the-large [16]. Discrimination was assessed by the c-statistic (equivalent to the area under the receiver operating characteristic curve) [23, 28]. The confidence intervals for the c-statistic and the calibration slope and intercept were calculated through bootstrapping (i.e. iterative sampling with replacement). Additionally, to facilitate a comparison of SENSS and NETS model performance with the Neonatal Mortality Rate (NMR)-2000 score findings [12], we also report the Brier score, which reflects combined model discrimination and calibration. These metrics are briefly described in Table 2 and explained in detail elsewhere [31].

[Table 2: Measures for model performance assessment (definitions adapted from Riley et al. [31]).]

For objective 2 (i.e. model updating), given that simple recalibration did not resolve the poor model performance (Additional file 2 [21, 32]), we refit the SENSS and NETS models and re-estimated the coefficients while applying regularisation (a technique for reducing model overfitting), using data from the 16th hospital (i.e. the models' derivation study site).
Model overfitting occurs when a model fits too closely to the training dataset, making it unable to generalise well to new datasets. We used elastic net regularisation, which combines L1 regularisation (introducing sparsity by shrinking the coefficients of less important covariates towards zero) and L2 regularisation (minimising biased estimates due to highly correlated independent variables) [33]. Also, to minimise model overfitting from the selection of the elastic net tuning parameters, we applied tenfold internal cross-validation repeated 20 times [34]. Cross-validation is a re-sampling procedure in which the model development dataset is randomly split into a number of equally sized partitions (i.e. folds); one partition is left out during model fitting for use as the internal validation dataset, the model is built on the remaining portion of the development dataset, and predictive performance is evaluated on the left-out partition. This process is repeated, with each iteration using a different partition as the validation data source, and can include optional extra iterations that repeat the random splitting of the development dataset to generate different folds [34]. The goal of cross-validation is to assess how accurately a predictive model might perform in practice given, for example, the different elastic net thresholds used during model fitting, thereby aiding the selection of the optimal hyperparameters, such as the regularisation parameters [34].

The SENSS and NETS models were fit on data collected between August 2016 and December 2020 from the 16th hospital. The updated SENSS and NETS model performance was evaluated on data from the other 15 hospitals (Additional file 1: Table S6). All cases included in the NETS model are a subset of those included in the SENSS model, but with a treatment sheet present; because the models are developed independently of each other, this has no substantive implication for the interpretation of findings. We provide explanations of the meaning and significance of the different datasets in Additional file 1: Table S1.

To examine heterogeneity in model performance, we compared the updated models' internal-external cross-validation performance: we omitted one hospital at a time to use as the validation dataset, built the model on the remaining hospitals, and evaluated the model's discrimination and calibration performance on the hospital left out. We repeated this process, with each iteration using a different hospital as the validation data source [35].
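The sketch below illustrates the two procedures just described: tuning an elastic net logistic regression with tenfold cross-validation repeated 20 times, and internal-external (leave-one-hospital-out) cross-validation. It uses scikit-learn with synthetic binary predictors as hypothetical stand-ins for the study's indicators; it is not the authors' code, and the grid of tuning values is illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, LeaveOneGroupOut,
                                     RepeatedStratifiedKFold)

# Hypothetical stand-ins: binary predictors, 0/1 death outcome, hospital IDs.
rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)
true_beta = rng.normal(0, 1, p)
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta - 1.5))))
hospital = rng.integers(0, 15, n)  # 15 external validation hospitals

# Elastic net logistic regression: l1_ratio mixes L1 (sparsity) with L2
# (stabilises correlated coefficients); tuned via tenfold CV repeated 20 times.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=20, random_state=42)
search = GridSearchCV(
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
    param_grid={"C": np.logspace(-3, 2, 6), "l1_ratio": [0.1, 0.5, 0.9]},
    scoring="neg_log_loss", cv=cv, n_jobs=-1,
)
best = search.fit(X, y).best_estimator_

# Internal-external cross-validation: hold out one hospital, refit on the
# rest, and check discrimination on the held-out hospital.
for train, test in LeaveOneGroupOut().split(X, y, groups=hospital):
    m = clone(best).fit(X[train], y[train])
    c = roc_auc_score(y[test], m.predict_proba(X[test])[:, 1])
    print(f"held-out hospital {hospital[test][0]}: c-statistic = {c:.3f}")
```

In the study, calibration intercept and slope would also be computed for each held-out hospital, along the lines of the earlier calibration sketch.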

The innovation described in the text above is the external validation and updating of neonatal mortality prediction models in high-mortality settings. These models, the Neonatal Essential Treatment Score (NETS) and the Score for Essential Neonatal Symptoms and Signs (SENSS), were initially developed using data from a Kenyan maternity hospital. In this study, the models were externally validated using data from 16 Kenyan hospitals, and their predictive accuracy for in-hospital neonatal mortality was evaluated. The models were found to overestimate mortality risks at initial validation, but after updating, their calibration improved. The updated models showed good discrimination and better calibration than other existing neonatal in-hospital mortality prediction models. This innovation allows in-hospital neonatal mortality to be predicted using routine neonatal data from low-resource hospital settings, which can be valuable for case-mix adjustment and for improving access to maternal health.
AI Innovations Description
The recommendation from this study is to develop and validate prediction models for neonatal mortality in high-mortality, low-resource settings. These models, such as the Neonatal Essential Treatment Score (NETS) and the Score for Essential Neonatal Symptoms and Signs (SENSS), utilize data that is more likely to be available in these settings, such as treatments prescribed at admission and basic clinical signs.

To improve access to maternal health, these prediction models can be used to identify high-risk neonates and allocate resources accordingly. By accurately predicting in-hospital mortality, healthcare providers can prioritize care for those at greatest risk and ensure that appropriate interventions are provided. This can help reduce neonatal mortality rates and improve overall maternal health outcomes.

It is important to note that these prediction models should be externally validated and updated using data from multiple locations to ensure their accuracy and applicability in different settings. This process allows for model improvement and better performance, making them valuable tools for case-mix adjustment and resource allocation in similar hospital settings.

Overall, developing and utilizing prediction models for neonatal mortality can contribute to improving access to maternal health by identifying high-risk neonates and providing targeted interventions and resources.
AI Innovations Methodology
The study described above focuses on the external validation of neonatal mortality prediction models in high-mortality settings. While it does not directly address innovations to improve access to maternal health, some potential recommendations can be suggested based on the information provided.

1. Strengthening Health Information Systems: Implementing robust health information systems that capture accurate and comprehensive data on maternal health can help identify areas of improvement and monitor progress. This can include the use of electronic medical records, data collection tools, and standardized reporting mechanisms.

2. Telemedicine and Telehealth: Utilizing telemedicine and telehealth technologies can improve access to maternal health services, especially in remote or underserved areas. This can involve virtual consultations, remote monitoring of maternal health indicators, and tele-education for healthcare providers.

3. Mobile Health (mHealth) Solutions: Leveraging mobile technology can enhance access to maternal health information and services. This can include mobile apps for tracking pregnancy progress, sending reminders for prenatal care appointments, and providing educational resources.

4. Community-based Interventions: Implementing community-based interventions can improve access to maternal health services, particularly in areas with limited healthcare infrastructure. This can involve training community health workers, establishing mobile clinics, and conducting outreach programs to raise awareness about maternal health.

To simulate the impact of these recommendations on improving access to maternal health, a methodology could be developed using the following steps:

1. Define the Objectives: Clearly define the objectives of the simulation, such as assessing the potential impact of the recommendations on maternal health outcomes or evaluating the cost-effectiveness of the interventions.

2. Identify Key Variables: Identify the key variables that would be affected by the recommendations, such as the number of women accessing prenatal care, the rate of maternal mortality, or the cost of implementing the interventions.

3. Collect Baseline Data: Gather baseline data on the current state of maternal health in the target population, including relevant indicators and outcomes. This can involve reviewing existing data sources, conducting surveys, or analyzing health records.

4. Develop a Simulation Model: Develop a simulation model that incorporates the identified variables and their relationships. This can be done using statistical software or specialized simulation tools. The model should reflect the population characteristics, healthcare system infrastructure, and the potential impact of the recommendations.

5. Validate the Model: Validate the simulation model by comparing its outputs to real-world data or expert opinions. This can help ensure the accuracy and reliability of the model.

6. Implement Scenarios: Implement different scenarios within the simulation model to assess the impact of the recommendations. This can involve adjusting variables related to the recommendations, such as the coverage of telemedicine services or the number of community health workers deployed.

7. Analyze Results: Analyze the results of the simulation to evaluate the potential impact of the recommendations on improving access to maternal health. This can include assessing changes in maternal health outcomes, cost-effectiveness, or other relevant indicators.

8. Interpret and Communicate Findings: Interpret the findings of the simulation and communicate them to stakeholders, such as policymakers, healthcare providers, and community members. This can help inform decision-making and guide the implementation of interventions to improve access to maternal health.

It is important to note that the methodology for simulating the impact of recommendations on improving access to maternal health may vary depending on the specific context and available data.
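As a toy illustration of steps 3 to 7, the sketch below runs a minimal Monte Carlo simulation comparing a baseline scenario against a hypothetical intervention that raises prenatal-care coverage. Every number in it (population size, coverage levels, risk reduction) is an invented placeholder rather than an estimate from any study; a real model would be parameterised from the baseline data gathered in step 3.

```python
import numpy as np

rng = np.random.default_rng(7)
N_WOMEN, N_RUNS = 10_000, 500

def simulate_mmr(coverage, baseline_risk=0.004, covered_risk=0.002):
    """Simulated maternal deaths per 100,000 across Monte Carlo runs.

    coverage: assumed proportion of women receiving prenatal care.
    The risk values are hypothetical placeholders for step-3 estimates.
    """
    results = np.empty(N_RUNS)
    for i in range(N_RUNS):
        covered = rng.random(N_WOMEN) < coverage
        risk = np.where(covered, covered_risk, baseline_risk)
        results[i] = rng.binomial(1, risk).sum() / N_WOMEN * 100_000
    return results

baseline = simulate_mmr(coverage=0.45)  # current state (step 3)
scenario = simulate_mmr(coverage=0.70)  # e.g. telemedicine scenario (step 6)

for name, runs in [("baseline", baseline), ("scenario", scenario)]:
    lo, hi = np.percentile(runs, [2.5, 97.5])
    print(f"{name}: mean MMR {runs.mean():.0f} per 100,000 "
          f"(95% of runs: {lo:.0f}-{hi:.0f})")
```

Step 7 would then compare such distributions across scenarios, and step 5 would check that the baseline run reproduces the observed indicators before any scenario results are trusted.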
