Background Inequalities in the quality of and access to healthcare services persist at subnational levels despite declines in maternal and perinatal mortality rates at national levels, which calls for geographical analysis of these conditions. The new targets stated in the Sustainable Development Goals place emphasis on subnational monitoring of maternal and perinatal health progress, so the accuracy of global demographic distribution datasets needs to be assessed at all subnational levels.

Methods The analysis involved comparison of four models generated using Worldpop methods, incorporating region-specific input data, as measured through the Community Level Intervention for Pre-eclampsia (CLIP) project. Normalised root mean square error was used to determine and compare the models' prediction errors at different administrative unit levels.

Results The models' prediction errors were lower at higher administrative unit levels. All datasets showed the same pattern for both the live birth and pregnancy estimates. The effect of improving the spatial resolution and accuracy of input data was more prominent at higher administrative unit levels.

Conclusion The validation highlighted the impact of the spatial resolution and accuracy of maternal and perinatal health data on modelled estimates of pregnancies and live births. More data collection efforts that conduct comprehensive censuses, like the CLIP project, are needed. Such projects should also take advantage of the mapping tools at their disposal to fill the gaps in the availability of datasets for populated areas.
Figure 1 shows the study sites in southern Mozambique. Data were collected in parts of two provinces, Gaza and Maputo. The administrative unit divisions shown in the inset are the neighbourhood units (referred to as admin 5 units in this paper). The CLIP study was a household census of all households with women of reproductive age (WRA; 12–49 years) in 12 villages, conducted from March to October 2014 in Maputo and Gaza provinces of southern Mozambique. Each region had to contain a minimum population of 25 000 inhabitants, which would result in at least one maternal death per year according to data from the 2007 national census.33 34 To be included, WRA had to have lived in the household for more than 30 days before the date of the census and to intend to live in the household as a permanent resident for at least 6 months following the census.33 A total of 50 493 households and 80 483 WRA (mean age 26.9 years) were surveyed.

Admin 5 level data on the age-specific number of WRA, pregnancies and live births, together with GPS coordinates of the households with WRA, were collected as part of the baseline work for the CLIP trial.33 Admin 5 boundaries were generated by creating Thiessen polygons around GPS points with the same neighbourhood name. Higher level administrative boundaries (admins 4, 3, 2 and 1) were then derived from these lower level data, and the corresponding age structure data (http://www.ine.gov.mz/estatisticas/estatisticas-demograficas-e-indicadores-sociais/populacao/relatorio-de-indicadores-distritais-2007) were joined to each layer. To the authors' knowledge, the CLIP data on pregnancies and live births are the most granular dataset available for this region of Mozambique.
We also anticipate that, because of the rigorous attempts to identify all WRA by visiting all households in the study area, the data are likely the most accurate representation of pregnancies and live births in the study area, hence the choice to use them in the data creation and comparison processes.

Figure 1 Study sites, Maputo and Gaza provinces in southern Mozambique.

Two models of live births and pregnancies were created, one using admin 5 level data and the other using admin 3 level data. Births and pregnancy datasets were generated using the Worldpop methods described in James et al,35 with the addition of region-specific data obtained through the CLIP project, including age-specific fertility rates (ASFRs), births-to-pregnancy ratios and the numbers of births, pregnancies and WRA. Spreadsheets of ASFRs for admin 3 and admin 5 were generated by dividing age-specific births by age-specific WRA. The pregnancy-to-birth multiplier for the study region was created by dividing the total number of pregnancies by the total number of births for each admin 5 unit (and each admin 3 unit) and averaging these multipliers to obtain a single value for the whole region. The Worldpop adjusted 2010–2015 population dataset36 was clipped to the extent of the study region and used to generate the age-specific WRA raster layers. These region-specific births and pregnancy datasets were created at varying spatial scales to determine the effect of input spatial resolution on model performance.

To eliminate the error introduced by inaccurate census data, the births raster dataset was adjusted by multiplying it by the CLIP births raster at each admin 5 unit. This step ensured that the error in the adjusted births dataset would be due to disaggregation only. The three datasets used to create the WRA dataset were themselves created from census data, which, as stated above, can be inaccurate. Because the ASFR dataset used is the CLIP dataset, the dataset that needs adjusting is the WRA dataset, and this can be done by adjusting the births dataset.
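The ASFR and pregnancy-to-birth multiplier calculations described in this section can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names, the age-group labels and all numbers are hypothetical, and skipping units with zero births or age groups with zero WRA is an assumption the text does not state.

```python
from statistics import mean


def asfr_by_age(births, wra):
    """Age-specific fertility rate: age-specific births / age-specific WRA.

    Age groups with no WRA are skipped (an assumption, to avoid dividing by zero).
    """
    return {age: births[age] / wra[age] for age in births if wra.get(age, 0) > 0}


def regional_multiplier(units):
    """Average the per-unit pregnancies/births ratios into one regional value.

    Units with zero births are skipped (an assumption, not stated in the text).
    """
    ratios = [u["pregnancies"] / u["births"] for u in units if u["births"] > 0]
    return mean(ratios)


# Hypothetical counts for one administrative unit, for illustration only.
births = {"15-19": 30, "20-24": 50}
wra = {"15-19": 200, "20-24": 250}
asfr = asfr_by_age(births, wra)

# Hypothetical pregnancy and birth totals for two administrative units.
units = [
    {"pregnancies": 12, "births": 10},
    {"pregnancies": 30, "births": 20},
]
multiplier = regional_multiplier(units)
```

Averaging per-unit ratios, as the text describes, weights every unit equally regardless of its size; dividing the regional total of pregnancies by the regional total of births would give a different, size-weighted value.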
Adjusting the births dataset was therefore the method used to eliminate the error due to inaccurate input census data. An adjustment factor was computed for each admin 5 unit and applied to the births raster to give the adjusted births dataset. This was possible because the ASFR values used to create the dataset were computed from the CLIP data, so adjusting the dataset using the number of births at each admin 5 unit amounted to adjusting the WRA computed from the age structure data and the Worldpop population dataset. The error remaining in the resulting dataset was therefore due to disaggregation. The process of recreating the datasets is shown in figure 2.

Figure 2 Data generation process for model comparison. CLIP, Community Level Intervention for Pre-eclampsia.

The analysis involved comparison of four models: (1) CLIP-only model (thematic maps with corresponding values for live births and pregnancies generated from the household survey); (2) admin 5 Worldpop-CLIP model (Worldpop methods incorporating region-specific input data at admin 5 level, as measured through the CLIP project); (3) admin 3 Worldpop-CLIP model (Worldpop methods incorporating region-specific input data at admin 3 level, as measured through the CLIP project) and (4) Worldpop-only model, using standardised input data as published through the Worldpop project.29

To quantify the impact of the model performance on actual births/pregnancy estimates, we converted the Worldpop model outputs to centroid points of the 1 km grid cells and joined them to the admin 5 polygons by summing the values of the centroid points falling within each polygon. This resulted in a thematic map of estimated live births and pregnancies, aggregated to admin 5 level.
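The aggregation of grid-cell centroids to admin 5 polygons can be illustrated with a small standard-library sketch. It is not the GIS workflow used in the study: the ray-casting test, the function names, the square polygons and the values are all hypothetical stand-ins for a spatial join with a sum.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sum_centroids_by_unit(centroids, units):
    """Sum centroid values per unit; each centroid counts towards one unit at most."""
    totals = {name: 0.0 for name in units}
    for x, y, value in centroids:
        for name, polygon in units.items():
            if point_in_polygon(x, y, polygon):
                totals[name] += value
                break
    return totals


# Hypothetical admin 5 units (two adjacent squares) and grid-cell centroids
# given as (x, y, modelled births); the last centroid lies outside both units.
units = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
centroids = [(0.5, 0.5, 1.5), (1.5, 1.5, 2.0), (2.5, 0.5, 4.0), (5.0, 5.0, 9.0)]
totals = sum_centroids_by_unit(centroids, units)
```

Because each 1 km cell is assigned whole to the polygon containing its centroid, cells that straddle a boundary contribute all of their value to one unit, which matters more for small units such as admin 5.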
The CLIP values of births and pregnancies in the Excel sheet were also joined to the polygons, resulting in a layer with the following attributes: name of admin 5 unit, Model 1 (CLIP only) births, Model 1 (CLIP only) pregnancies, Model 2 (Admin 5 Worldpop-CLIP) births, Model 2 (Admin 5 Worldpop-CLIP) pregnancies, Model 3 (Admin 3 Worldpop-CLIP) births, Model 3 (Admin 3 Worldpop-CLIP) pregnancies, Model 4 (Worldpop) births and Model 4 (Worldpop) pregnancies. For these analyses, we compared modelled birth outputs, as pregnancy outputs are dependent on birth estimates. These polygons were dissolved into admin 4 level polygons, creating a map of localities with the corresponding births and pregnancy values of each admin 4 unit for all models. The same was done to create a map of admin 3 units with corresponding values of live births. The process is shown in figure 3.

Figure 3 Data preparation process for validation. CLIP, Community Level Intervention for Pre-eclampsia.

To compare model prediction errors, we computed the root mean square error (RMSE) across the three administrative unit levels. To enable comparison of the prediction errors across datasets and administrative unit levels, the normalised root mean square error (NRMSE) was used. The formulae for both error statistics are shown below:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}}$$

where $e_i$ is the difference between the $i$th predicted ($P$) and observed ($O$) values ($e_i = P_i - O_i$) and $n$ is the number of units.

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\bar{O}}$$

where $\bar{O}$ is the mean of the observed values.

To determine the impact of input data on model performance, we calculated the difference in NRMSE between model 4 and each of models 2 and 3. The percentage decrease in prediction error was calculated by dividing these differences by the NRMSE of model 4 at each administrative unit level and expressing the result as a percentage. To quantify the contribution of spatial resolution to the prediction error (expressed as a percentage), the differences in percentage error decrease between models 2 and 3 were averaged.
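The error statistics and the percentage decrease relative to model 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the numbers are made up, and the CLIP-only values (model 1) are treated as the observed values.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error, with e_i = P_i - O_i over n units."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmse(observed, predicted):
    """RMSE normalised by the mean of the observed values."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def pct_decrease(nrmse_reference, nrmse_model):
    """Percentage decrease in NRMSE relative to a reference model (here model 4)."""
    return 100.0 * (nrmse_reference - nrmse_model) / nrmse_reference


# Hypothetical births for two units, for illustration only.
observed = [10, 30]   # model 1 (CLIP only), treated as observed
model_2 = [11, 37]    # admin 5 Worldpop-CLIP
model_4 = [12, 44]    # Worldpop only

nrmse_2 = nrmse(observed, model_2)
nrmse_4 = nrmse(observed, model_4)
decrease_2 = pct_decrease(nrmse_4, nrmse_2)
```

Dividing by the mean of the observed values makes the error unitless, which is what allows comparison across administrative levels whose unit totals differ by orders of magnitude.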
This average percentage value was interpreted as the proportion of the prediction error due to the spatial resolution of the input data.

Each head of household and each WRA who participated provided informed consent, confirmed by their signature or fingerprint, before data collection.33