Background: It is well known that safe delivery in a health facility reduces the risks of maternal and infant mortality resulting from perinatal complications. Less well understood are the factors associated with safe delivery practices. We investigate factors influencing health facility delivery practices while simultaneously adjusting for multiple other factors, spatial heterogeneity, and trends over time. Methods: We fitted a logistic regression model to Lot Quality Assurance Sampling (LQAS) data from Uganda in a framework that considered individual-level covariates, geographical features, and variations over five time points. We accounted for all two-covariate interactions and for those three-covariate interactions in which two of the covariates already had a significant interaction, quantified uncertainty in outputs using computationally intensive cluster bootstrap methods, and displayed outputs using a geographical information system. Finally, we investigated what information could be predicted about districts at future time points, before the next LQAS survey is carried out. To do this, we applied the model to project a confidence interval for the district-level coverage of health facility delivery at future time points, using the lower and upper end values of known demographics to construct a confidence range for the prediction and to define priority groups. Results: We show that ease of access, maternal age and education are strongly associated with delivery in a health facility; after accounting for these factors, there remains a significant trend towards greater uptake over time. We use this model together with known demographics to formulate a nascent early warning system that identifies candidate districts expected to have low prevalence of facility-based delivery in the immediate future.
Conclusions: Our results support the hypothesis that increased development, particularly related to education and access to health facilities, will act to increase facility-based deliveries, a factor associated with reduced perinatal mortality. We provide a statistical method for using inexpensive, routinely collected monitoring and evaluation data to answer complex epidemiological and public health questions in a resource-poor setting. We produced a model based on these data that explained the spatial distribution of facility-based delivery in Uganda. Finally, we used this model to predict the future priority of districts, a prediction that was validated by monitoring and evaluation data collected in the following year.
The study was conducted by the USAID STAR E-LQAS project, which is implemented by Management Sciences for Health with the Liverpool School of Public Health as a technical partner for LQAS. Trained district health managers collected data from individuals through household surveys conducted in 19–64 districts of Uganda at seven points in time during 2003–2012, using the Lot Quality Assurance Sampling (LQAS) methodology [35]. The surveys were financed by the World Bank and USAID [36], with questions adapted from accepted sources such as the Uganda Demographic Surveys. The District Health Management Team divided each district into 4–6 administrative subdistrict strata called supervision areas (SA) and randomly selected 19 mothers of children aged 0–11 months from each SA (24 if the district had only 4 SAs). The SA sample size was chosen so that, when subdistrict (SA) data are aggregated, the resulting district-level coverage proportion estimates for key indicators have a 95 % confidence interval not exceeding ±10 %. Villages were selected using probability proportional to size (PPS) sampling, with a comprehensive village population list supplied by each district serving as the sampling frame from which the villages, and in turn the individual respondents, were drawn. There were on average 88 villages in the sampling frame of each SA. PPS sampling ensures that each village's probability of selection is proportional to its share of the total population. Usually a sample of 19 villages was identified, sometimes fewer if some villages had a large population relative to others in the same SA. Individual respondents were then randomly selected from the PPS-selected villages using a randomizing technique [35]. The main approach used was segmentation sampling. Segmentation was recommended because it was found to be a more rigorous second-stage sampling technique [37], and it is now advocated in several survey guidelines [38–40].
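The Python sketch below performs a rough check on the sample sizes above and shows why PPS selection can yield fewer than 19 distinct villages. It computes the worst-case 95 % confidence interval half-width for a district sample and implements systematic PPS selection from a cumulative population list. Both pieces rest on assumptions. The half-width uses the normal approximation under simple random sampling with no design effect. Systematic PPS is one common way of implementing PPS, not necessarily the exact field procedure. The function names are hypothetical.

```python
import bisect
import math
import random


def ci_half_width(n, p=0.5, z=1.96):
    """Normal-approximation 95% CI half-width for a proportion.

    Assumes simple random sampling with no design effect; p = 0.5 gives
    the widest (worst-case) interval.
    """
    return z * math.sqrt(p * (1 - p) / n)


def pps_systematic(populations, n_draws, start=None, rng=random):
    """Systematic probability-proportional-to-size selection (illustrative).

    populations: village population sizes from the sampling frame.
    Returns the village index hit by each of the n_draws selection points.
    A village larger than the sampling interval can be hit more than once,
    so there may be fewer distinct villages than n_draws.
    """
    cumulative = []
    total = 0
    for pop in populations:
        total += pop
        cumulative.append(total)
    interval = total / n_draws
    if start is None:
        start = rng.random() * interval  # random start in [0, interval)
    hits = []
    for k in range(n_draws):
        point = start + k * interval
        # Village i covers the cumulative population range [cum[i-1], cum[i]).
        hits.append(bisect.bisect_right(cumulative, point))
    return hits


if __name__ == "__main__":
    # 5 SAs x 19 mothers and 4 SAs x 24 mothers per district.
    print(round(ci_half_width(5 * 19), 4))  # 0.1005
    print(round(ci_half_width(4 * 24), 4))  # 0.1
    # One very large village absorbs every selection point.
    print(pps_systematic([50, 900, 50], 3, start=100))  # [1, 1, 1]
```

Under these assumptions both district designs give a worst-case half-width of about ±10 percentage points. A design effect from clustering would widen the interval.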
District Health Officers also requested that a second approach be offered, namely simple random sampling from an updated village listing of households. This approach was recommended only where a recently updated list existed and could be verified. With either approach, once a reference house was selected, the next closest house was selected for interview. This addition reduced the chance of a house having a zero probability of selection. The former approach, segmentation, was recommended in the trainings and used most frequently. Table 1 shows the number of districts in each Ugandan region that were surveyed in each year and the number of mothers interviewed in those regions. A total of 18,471 randomly selected mothers of children aged 0–11 months were interviewed, the inclusion criterion being that mothers had been present in the village for at least 3 months prior to the survey. Each maternal questionnaire covered demographic characteristics and various health-related behaviours. Respondents with missing or erroneous responses were removed, leaving a total of 18,098 (98 %) records with complete information. These data were integrated into a superset, and in this study we analysed mothers’ responses to the question “Where did you give birth?”, their age at the time of the survey (in years) and their education level (none, primary, secondary, post-secondary). Uganda LQAS data reliability studies are available for review [41, 42].
Table 1: Number of districts and mothers surveyed within each region of Uganda for each survey year
We obtained district-level data from a variety of sources, including geospatial road and population data from 2009 [34] and 2010 Geographical Information System (GIS) locations of health centres. We calculated the number of health facilities per capita (per 100,000 inhabitants) based on the number of health facilities with in-patient beds (level III and above), since mothers are referred to these higher-level facilities for FBD.
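The removal of incomplete records and the per-capita facility covariate can be sketched as below. This is a minimal illustration with several hypothetical choices: the facility level labels (HC III, HC IV, Hospital) standing for "level III and above", the record layout, and the representation of missing values as None.

```python
from collections import Counter

# Hypothetical labels for facilities with in-patient beds (level III and above).
INPATIENT_LEVELS = {"HC III", "HC IV", "Hospital"}


def complete_cases(records, fields):
    """Keep only records in which every listed field is present (not None)."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def facilities_per_100k(facilities, populations):
    """In-patient facilities per 100,000 inhabitants, by district.

    facilities: iterable of (district, level) pairs.
    populations: dict mapping district -> population.
    Districts with no qualifying facility get 0.
    """
    counts = Counter(d for d, level in facilities if level in INPATIENT_LEVELS)
    return {d: 100_000 * counts[d] / pop for d, pop in populations.items()}


if __name__ == "__main__":
    facilities = [("A", "HC II"), ("A", "HC III"), ("A", "Hospital"), ("B", "HC IV")]
    print(facilities_per_100k(facilities, {"A": 200_000, "B": 50_000}))
    # {'A': 1.0, 'B': 2.0}
```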
Household assets data from DHS 2011 [20] were used to stratify responses by economic quintiles. Altitude data were obtained from the US Geological Survey [43].
Our analysis consists of 3 phases: FBD mapping, model construction, and prediction of priority districts and the population strata within them. Phases 1–2 used the 2003–2011 data, while phase 3 also included the 2012 data. All analyses were done using the statistical software R version 2.15 [44]; we used the R package ‘maptools’ [45] to construct the maps.
We classified mothers as giving birth either at home or in a health facility and mapped the percentage of mothers with FBD in each surveyed district. One map was produced for each cluster of survey years: 2003–2004, 2006, 2009–2010, and 2011. Survey years were combined so that a similar number of surveyed districts was included in each map. We calculated 95 % confidence intervals (CI) using clustered bootstrapping [46], a non-parametric error estimation method that takes into account residual spatial correlation of the indicator (see Appendix 1 for a detailed description of how the maps and confidence intervals were constructed). We used a clustered bootstrap because the survey samples were clustered within supervision areas. The total population size of each supervision area was not available, so this analysis gives equal weight to each supervision area.
Using all 2003–2011 data, we fitted a logistic regression model to investigate factors simultaneously associated with FBD. The individual-level factors included in the model were age, education and the year in which the mother was surveyed. We also included district-level covariates: each mother was assigned her district's values for the number of health facilities per capita, population density, road density, wealth index, and the mean and standard deviation of altitude.
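The clustered bootstrap for a district-level FBD indicator can be sketched as follows. Whole supervision areas, rather than individual mothers, are resampled with replacement, and each supervision area receives equal weight, as described for the mapping phase. This is a simplified illustration that assumes a percentile interval and uses hypothetical function names. It captures only the clustering within supervision areas. The exact construction used for the maps is given in Appendix 1.

```python
import random


def district_fbd(sa_groups):
    """Equal-weighted mean of supervision-area FBD proportions.

    sa_groups: one list of 0/1 outcomes (1 = facility-based delivery)
    per supervision area in the district.
    """
    return sum(sum(g) / len(g) for g in sa_groups) / len(sa_groups)


def cluster_bootstrap_ci(sa_groups, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI obtained by resampling whole supervision areas."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw as many SAs as the district has, with replacement.
        resampled = [rng.choice(sa_groups) for _ in sa_groups]
        stats.append(district_fbd(resampled))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical district with four SAs of 10 mothers each.
    groups = [[1] * 2 + [0] * 8, [1] * 4 + [0] * 6, [1] * 6 + [0] * 4, [1] * 8 + [0] * 2]
    print(district_fbd(groups))
    print(cluster_bootstrap_ci(groups))
```

Resampling at the SA level lets the interval reflect variation between supervision areas, which a mother-level bootstrap would understate.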
Mothers were also assigned a categorical variable specifying whether or not they lived in Kampala, because Kampala's district-level covariates differed so markedly from those of all other districts that it should be considered separately. Covariates with significant nonlinearity were base-2-log-transformed before being incorporated into the model (see “Appendix 2” for the reasoning). All covariates were included as continuous variables, except for education, which was categorical. We used forward selection [47] based on the Akaike Information Criterion (AIC) to include interaction terms between covariates if they improved the model; this is a standard model selection procedure. Tables 2 and 3 display information about each covariate: the distribution of ages and educational categories for the mothers, and the average values and range of the district-level covariates calculated over all 112 districts in Uganda.
Table 2: Characteristics of individual-level covariates (sample sizes)
Table 3: Characteristics of district-level covariates, over all the districts surveyed
As a first stage in validating our selected model, we compared it with a null spatial model, in which the predicted probability of FBD for a mother is the average value for her district. This null model represents a situation in which the differences between district indicators are not captured by any covariates and are assumed to be random. The model with the lower AIC is preferred. As a second stage of model validation, we constructed a Receiver Operating Characteristic (ROC) curve. The ROC curve plots the true positive rate (the probability that a true outcome is correctly predicted to be true) against the false positive rate (the probability that a false outcome is predicted to be true) for different classification cutoffs. The accuracy can be summarised by the area under the ROC curve (AUC).
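The forward-selection step and the null spatial model can be sketched in Python as below. The function forward_select greedily adds whichever candidate interaction term lowers the AIC most and stops when no candidate improves it. Its argument aic_of is a hypothetical callable that stands in for refitting the logistic regression with a given set of terms. The function null_spatial_aic computes the AIC of the null model that gives each mother her district's observed FBD proportion, with one free parameter per district. AIC values are comparable only when both models are fitted to the same records.

```python
import math


def forward_select(base_terms, candidates, aic_of):
    """Greedy forward selection of extra model terms by AIC.

    aic_of(terms) must return the AIC of the model fitted with `terms`
    (a hypothetical stand-in for refitting the logistic regression).
    """
    selected = list(base_terms)
    best_aic = aic_of(selected)
    remaining = list(candidates)
    while remaining:
        trials = [(aic_of(selected + [term]), term) for term in remaining]
        trial_aic, term = min(trials)
        if trial_aic >= best_aic:  # no candidate improves the AIC
            break
        selected.append(term)
        remaining.remove(term)
        best_aic = trial_aic
    return selected, best_aic


def null_spatial_aic(district_counts):
    """AIC of the null spatial model.

    district_counts: dict mapping district -> (n_fbd, n_mothers). Each
    mother's FBD probability is her district's observed proportion, so the
    model has one parameter per district.
    """
    log_lik = 0.0
    for n_fbd, n in district_counts.values():
        p = n_fbd / n
        if 0 < p < 1:  # p of 0 or 1 contributes log(1) = 0
            log_lik += n_fbd * math.log(p) + (n - n_fbd) * math.log(1 - p)
    return 2 * len(district_counts) - 2 * log_lik


if __name__ == "__main__":
    # Toy AIC surface: "age:edu" helps most, "age:roads" a little, "edu:roads" hurts.
    def toy_aic(terms):
        return 100 - 5 * ("age:edu" in terms) - 3 * ("age:roads" in terms) + 2 * ("edu:roads" in terms)

    print(forward_select(["age", "edu", "roads"], ["age:edu", "age:roads", "edu:roads"], toy_aic))
    # (['age', 'edu', 'roads', 'age:edu', 'age:roads'], 92)
    print(round(null_spatial_aic({"A": (3, 4), "B": (0, 5)}), 4))  # 8.4987
```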
An AUC of 1.0 indicates a perfect prediction: all data points are correctly classified. An AUC of 0.5 indicates a test with no discriminative ability, equivalent to assigning outcomes at random [48]. As a third stage, we used two-fold Monte Carlo cross-validation [49] to estimate the prediction error for unseen data: the model was repeatedly fitted to a randomly chosen half of the 2003–2011 data and then used to predict the FBD values of the other half. For each iteration, we calculated the squared error between the observed and predicted district-level FBD indicator, and took the mean over all 1000 iterations. The square root of the resulting mean squared error defines a prediction error for each district in the same units as the original indicator, and can thus be read as a typical absolute difference between the prediction and the indicator.
Our selected model gave an estimate of the odds ratio (OR) for FBD for each covariate. For our model, the OR for a covariate is the ratio between the odds of FBD for two mothers who differ only in the covariate being examined, with all other covariates set to their average values. If the covariate is categorical, such as education level, the ratio is between each level and the lowest level, which in this example is ‘no formal education’. If a base-2-log-transformed covariate was used in the model, the ratio compares the odds at double the covariate value with the odds at the original value. For the other continuous covariates, the ratio compares the odds at the covariate value plus one unit with the odds at the original value. The OR therefore estimates how strongly each covariate is associated with the odds of FBD.
Finally, we used the model to classify unsurveyed districts into ‘priority’ groups to flag districts predicted to have particularly low indicator values.
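The validation quantities and the OR definition can be sketched in plain Python. The auc function relies on the fact that the area under the ROC curve equals the probability that a randomly chosen positive outcome receives a higher score than a randomly chosen negative one, with ties counting one half. The odds_ratio function compares the odds at the doubled value (for a log2-transformed covariate) or the value plus one unit with the odds at the mean, holding all other covariates at their means. The mc_two_fold_rmse function repeats random half splits and returns a root-mean-squared error per district, with the district prediction taken as the mean predicted probability of its held-out mothers. Several elements are hypothetical: the function names, the record layout (district, covariates, outcome), the fit interface, and the toy fit_district_mean model, which is the null spatial model. The pairwise AUC loop is quadratic and meant only for illustration.

```python
import math
import random
from collections import defaultdict


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(log_odds, means, name, log2_transformed):
    """OR for covariate `name`, all other covariates held at their means.

    log_odds(covariates) returns the model's log-odds of FBD on the raw
    covariate scale (any log2 transform is applied inside it).
    """
    shifted = dict(means)
    shifted[name] = 2 * means[name] if log2_transformed else means[name] + 1
    return math.exp(log_odds(shifted) - log_odds(means))


def fit_district_mean(train):
    """Toy `fit`: the null spatial model, predicting each district's mean."""
    totals = defaultdict(lambda: [0, 0])
    for district, _, y in train:
        totals[district][0] += y
        totals[district][1] += 1
    overall = sum(y for _, _, y in train) / len(train)
    means = {d: s / n for d, (s, n) in totals.items()}
    return lambda x, district: means.get(district, overall)


def mc_two_fold_rmse(records, fit, n_iter=1000, seed=0):
    """Monte Carlo two-fold cross-validation error for each district."""
    rng = random.Random(seed)
    sq_errors = defaultdict(list)
    for _ in range(n_iter):
        shuffled = records[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        predict = fit(train)
        groups = defaultdict(list)
        for district, x, y in test:
            groups[district].append((predict(x, district), y))
        for district, pairs in groups.items():
            predicted = sum(p for p, _ in pairs) / len(pairs)
            observed = sum(y for _, y in pairs) / len(pairs)
            sq_errors[district].append((observed - predicted) ** 2)
    return {d: math.sqrt(sum(e) / len(e)) for d, e in sq_errors.items()}


if __name__ == "__main__":
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
    records = [("A", {}, 1)] * 20 + [("B", {}, 0)] * 15 + [("B", {}, 1)] * 5
    print(mc_two_fold_rmse(records, fit_district_mean, n_iter=200))
```

With interaction terms in the model, the OR for a covariate depends on the values of the covariates it interacts with. Fixing them at their means, as the text does, makes the OR well defined.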
Since we do not know the distribution of age and education in these unsurveyed districts, we predicted the upper and lower limits of a range of values for the indicator in each district rather than an average value. We identified the age and education values most strongly positively associated with FBD and those most strongly negatively associated with it, and used the model to predict the probability of FBD for a mother with her age and education set to these values and the survey year set to 2012. To obtain the upper limit for the indicator in each district, we applied the model to the most strongly positively associated age and education values; to account for uncertainty in the model parameters, we took the upper bound of the 95 % CI obtained from the model with bootstrap clustering as a conservative estimate of this limit. For the lower limit, the same procedure was carried out with the negatively associated values, taking the lower bound of the 95 % CI. Together, the lower and upper limits define the predicted range for each district, and the priority groups were assigned on the basis of these limits. The low-priority group, defined as districts with lower limits between 50–100 % FBD and upper limits between 80–100 % FBD, contained districts likely to have high indicator values. The mild-priority group, defined as districts with lower limits between 0–30 % and upper limits between 60–80 %, contained districts likely to have fairly low indicator values. The high-priority group, defined as districts with lower limits between 0–30 % and upper limits between 30–60 %, contained districts likely to have very low indicator values. All other scenarios were classified as an unclear-priority group. We then validated the projections by checking that the 2012 values lay within their predicted ranges.
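The range construction, priority classification and validation check can be summarised in a short Python sketch. The function predicted_range takes the lower bound of the bootstrap 95 % interval for the least favourable age and education profile and the upper bound for the most favourable profile. The function priority_group then applies the thresholds above. The text does not say which group boundary values belong to, such as an upper limit of exactly 60 % or 80 %. As an assumption, this sketch assigns an upper limit of 60 % or 80 % to the mild group unless the low-priority conditions are met. All names are hypothetical, and the percentile is a simple empirical one without interpolation.

```python
def percentile(values, q):
    """Simple empirical percentile (no interpolation), q in [0, 1]."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def predicted_range(boot_best, boot_worst, alpha=0.05):
    """(lower, upper) limits in % FBD.

    boot_best / boot_worst: bootstrap predictions for the most / least
    favourable age and education profiles in a district.
    """
    return percentile(boot_worst, alpha / 2), percentile(boot_best, 1 - alpha / 2)


def priority_group(lower, upper):
    """Assign a priority group from the predicted range (percent FBD)."""
    if 50 <= lower <= 100 and 80 <= upper <= 100:
        return "low"
    if 0 <= lower <= 30 and 60 <= upper <= 80:
        return "mild"
    if 0 <= lower <= 30 and 30 <= upper < 60:
        return "high"
    return "unclear"


def within_range(observed, lower, upper):
    """Validation check: does the observed 2012 value fall inside the range?"""
    return lower <= observed <= upper


if __name__ == "__main__":
    lower, upper = predicted_range(boot_best=[70, 72, 75, 78], boot_worst=[12, 15, 18, 20])
    print(lower, upper, priority_group(lower, upper))  # 12 78 mild
```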