Service environment link and false discovery rate correction: Methodological considerations in population and health facility surveys


Study Justification:
This study focuses on the methodological considerations in population and health facility surveys, specifically in analyzing maternal health service data in Ethiopia. The study aims to explore the issues associated with geographic data linkage and the spatial and multilevel analyses that can be used to analyze maternal health service use. By examining healthcare problems spatially and hierarchically, this study can assist in efficient resource allocation, monitoring and evaluating service efficacy, and identifying areas that need special emphasis for intervention purposes.
Highlights:
– The study uses data from the 2016 Ethiopia Demographic and Health Survey and the 2014 Ethiopia Service Provision Assessment to analyze maternal health service use.
– Two geographic data linking methods, administrative boundary link and Euclidean buffer link, are used to link health facility data with population survey data.
– The study compares the two linking methods and identifies their strengths and limitations.
– The use of a False Discovery Rate correction enables the identification of true spatial clusters of maternal health service use.
– The study emphasizes the importance of examining maternal health service use both spatially and hierarchically for effective intervention strategies.
Recommendations:
– The study recommends using a service environment link to minimize methodological issues associated with geographic data linkage.
– The use of a False Discovery Rate correction is recommended to account for multiple and dependent testing in local spatial statistics.
– The study highlights the importance of spatial and multilevel analyses in identifying geographic areas that need special emphasis for intervention purposes.
Key Role Players:
– Researchers and analysts familiar with geographic data linkage and spatial analysis methods.
– Public health officials and policymakers responsible for resource allocation and intervention strategies.
– Data managers and statisticians skilled in data linking and analysis.
Cost Items for Planning Recommendations:
– Training and capacity building for researchers and analysts in geographic data linkage and spatial analysis methods.
– Data management and analysis software, such as ArcGIS and SAS.
– Time and resources for data collection and linking.
– Personnel costs for researchers, analysts, and data managers.
– Costs associated with dissemination of study findings, such as conference presentations and publication fees.

The strength of evidence for this abstract is 8 out of 10.
The evidence in the abstract is strong because it provides detailed information about the methods used, the data sources, and the statistical analyses conducted. The abstract clearly explains the purpose of the study and the findings. However, to improve the evidence, the abstract could include more information about the sample size and the specific results of the statistical analyses. Additionally, it would be helpful to provide information about the limitations of the study and suggestions for future research.

Background: Geospatial data are important in monitoring many aspects of healthcare development. Geographically linking health facility data with population data is an important area of public health research. Examining healthcare problems spatially and hierarchically assists with efficient resource allocation and the monitoring and evaluation of service efficacy at different levels. This paper explored methodological issues associated with geographic data linkage, and the spatial and multilevel analyses that could be considered in analysing maternal health service data.
Methods: The 2016 Ethiopia Demographic and Health Survey and the 2014 Ethiopia Service Provision Assessment data were used. Two geographic data linking methods were used to link these two datasets. Administrative boundary link was used to link a sample of health facilities data with population survey data for analysing three areas of maternal health service use. Euclidean buffer link was used for a census of hospitals to analyse caesarean delivery use in Ethiopia. The Global Moran’s I and the Getis-Ord Gi* statistics were applied in ArcGIS software to identify hot spots of maternal health service use. In addition, since the two datasets contain hierarchical data, a multilevel analysis was carried out to identify key determinants of maternal health service use in Ethiopia.
Results: Administrative boundary link gave more types of health facilities and more maternal health services than the Euclidean buffer link. Administrative boundary link is the method of choice for sampled health facilities. However, for a census of health facilities, the Euclidean buffer link is the appropriate choice as it provides cluster-level service environment estimates, which the administrative boundary link does not. Applying a False Discovery Rate correction enables the identification of true spatial clusters of maternal health service use.
Conclusions: A service environment link minimizes the methodological issues associated with geographic data linkage. A False Discovery Rate correction needs to be used to account for multiple and dependent testing while carrying out local spatial statistics. Examining maternal health service use both spatially and hierarchically is important for identifying geographic areas that need special emphasis and for intervention purposes.

Data from the Federal Democratic Republic of Ethiopia were used for this analysis. The DHS and SPA surveys, which were conducted within a 19-month window, were used. Geographic coordinates were available for both datasets. The 2016 EDHS used the 2007 Ethiopian Population and Housing Census sampling frame. The census frame has a list of 84,915 Enumeration Areas (EAs) that were prepared for the 2007 national census [31]. In general terms, an EA is a geographic area that covers an average of 181 households. The sampling frame contains information on EA location, residence (rural or urban) and the estimated number of households. The 2016 EDHS survey was a cross-sectional household study; it is the main source of data on population healthcare utilization. The survey used a stratified sampling procedure in two stages. Each region was stratified into urban and rural areas, which yielded 21 sampling strata [31]. At stage one, 645 EAs (202 in urban and 443 in rural areas) were sampled with probability proportional to EA size. Before the actual data collection, a list of households was made in the sampled EAs. At stage two, households were selected using a systematic sampling technique from the list of households in each of the EAs. A fixed number of 28 households were sampled per EA using an equal probability allocation. All women aged 15–49 years were eligible for individual interviews. A total of 15,683 women of reproductive age were interviewed out of the 16,583 eligible women identified [31]. The 2014 ESPA+ survey was a health facility-based cross-sectional study, and is the main source of data on the availability of health services. This survey used a list of 23,102 formal health facilities operating in the country. The list was obtained from the Federal Ministry of Health. Two hundred and two hospitals, 3,292 health centres, 15,618 health posts and 3,990 clinics (higher, medium and lower clinics) were included in the list.
These facilities were managed by the government, private for-profit organizations and non-governmental organizations. A combination of census and simple random sampling techniques was used to select health facilities [32]. Because of their importance and limited numbers, all hospitals, including all newly identified hospitals, were included in the survey. However, a representative sample of health centres and clinics was selected from a master health facility list. Health posts were selected independently. In total, 1,327 health facilities, including 321 health posts and 10 newly identified hospitals, were included in the survey. For various reasons (security issues in Somali region, inability to obtain consent at military hospitals, and duplicate facility names), data were collected from 1,165 health facilities, representing 88% of sampled facilities [32]. The EDHS provides data on utilization of health services as well as respondents’ socio-demographic characteristics, while the ESPA+ survey provides information on service availability and facilities’ readiness to provide services. The geographic coordinates and region identification codes collected in both surveys were used to link each DHS cluster with the SPA facility scores. Clusters and health facilities with missing geographic coordinates were excluded. The administrative polygons of Ethiopia, which were obtained from Natural Earth [33], were also used. Two geographic linking methods were used for directly linking clusters with health facilities: administrative boundary link and Euclidean buffer link. In Ethiopia, family planning, antenatal and delivery care services are provided at all levels of health facilities, that is, at health posts, clinics, health centres and hospitals. However, with the exception of a census of hospitals, the SPA survey collected these data from sampled health facilities.
In this case, using a geographic linking method to link sampled health facilities with DHS clusters is challenging. For instance, geographic linking based on the nearest sampled health facility would be problematic as the nearest health facility to each DHS cluster might not be included in the SPA survey, which could result in misclassification error. In this situation, an administrative boundary link is an appropriate choice for directly linking sampled health facility data with DHS survey data [2, 3]. This method links all DHS clusters with all health facilities found within the respective administrative boundaries. In this study, the city administrations and administrative regions of Ethiopia were used as the boundaries for the administrative boundary link. This data linking approach was used for contraceptive, antenatal care and health facility delivery data analysis. As a result, this method did not miss any health facility that falls within the respective administrative boundaries. In Ethiopia, caesarean delivery is provided at emergency obstetric care (EmOC) facilities. The SPA survey collected data from all hospitals. In this case, geographically linking a census of health facilities with DHS clusters provides a good picture of the service environment at cluster level. Euclidean buffer link is used for linking a census of health facilities data with population survey data [2, 3]. This method links all EDHS clusters with all hospitals found within a defined buffer distance; in this case, the closest hospital to each cluster was linked. This approach was used for caesarean delivery analysis. This method avoids an unnecessary merging of health facilities that can result in loss of information at cluster level, which is the shortcoming of administrative boundary link. The links between SPA facilities and DHS clusters were defined by creating healthcare service environment variables.
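The two linking rules described above can be sketched in a few lines of Python. This is only an illustration, not the authors' SAS or ArcGIS workflow: the record layout (`id`, `region`, `x`, `y`), the function names and the optional `buffer_km` cut-off are all hypothetical, and coordinates are assumed to be already projected to kilometres so that straight-line (Euclidean) distance is meaningful.

```python
import math

def link_by_boundary(clusters, facilities):
    """Administrative boundary link: each cluster is linked to every
    facility located in the same region (hypothetical 'region' field)."""
    by_region = {}
    for f in facilities:
        by_region.setdefault(f["region"], []).append(f["id"])
    return {c["id"]: by_region.get(c["region"], []) for c in clusters}

def link_nearest_within_buffer(clusters, hospitals, buffer_km=None):
    """Euclidean buffer link, reduced to the closest hospital per cluster.
    Assumes x/y are projected coordinates in km (an assumption of this sketch).
    Returns (hospital id, distance) or None if nothing lies within the buffer."""
    links = {}
    for c in clusters:
        best_id, best_d = None, math.inf
        for h in hospitals:
            d = math.hypot(c["x"] - h["x"], c["y"] - h["y"])
            if d < best_d:
                best_id, best_d = h["id"], d
        if best_id is None or (buffer_km is not None and best_d > buffer_km):
            links[c["id"]] = None
        else:
            links[c["id"]] = (best_id, best_d)
    return links
```

The boundary link keeps every facility in the region, which suits sampled facilities; the nearest-facility link only makes sense when the facility list is complete, as with the hospital census.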
In this analysis, maternal health facilities are defined as any healthcare facility providing family planning, antenatal care, and basic and comprehensive obstetric care services. The following four health service environment variable scores, taken from the SPA survey, were created: average distance to the nearest maternal health facilities, maternal health service availability score, readiness to provide maternal health services score, and a general health facilities readiness score. The maternal health indices (family planning, antenatal care, basic obstetric care and comprehensive obstetric care indices) were created using the World Health Organization’s ‘Service Availability and Readiness Indicators’ [34, 35]. Average distance to the nearest maternal health facility was calculated after linking each DHS cluster with the SPA facilities. PROC SQL was used to link the two data sets using their geographic coordinates in SAS. Since the SPA facilities, except a census of hospitals, were sampled, taking the nearest health facility to each cluster would have been problematic. In the ESPA survey, for instance, the nearest health facility to every EDHS cluster might not have been included. Therefore, to have a representative distance across the nine regions and the two city administrations, regional average distances were calculated for contraceptive, antenatal care and health facility delivery data analysis. On the other hand, since all the hospitals were included in the SPA survey, the nearest hospital providing caesarean delivery in each cluster was used for caesarean delivery analysis. A principal component analysis was used to compute all service availability and readiness scores for health facilities. General service readiness for all health facilities was computed. Six general service readiness dimensions were used for caesarean delivery, eight for family planning, and nine for both antenatal care and health facility delivery services. 
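As a rough illustration of how a principal component score can be derived from dichotomous indicators, the sketch below extracts the first principal component by power iteration on the covariance matrix. It is not the authors' procedure: the use of the covariance (rather than correlation) matrix, the function name `first_pc_scores` and the sign convention are assumptions made for this example, and power iteration can fail to converge on the leading component for degenerate inputs.

```python
import math

def first_pc_scores(rows, iters=500):
    """Return (loadings, scores) for the first principal component.
    rows: list of equal-length lists of 0/1 indicators, one list per facility."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    centred = [[r[k] - means[k] for k in range(p)] for r in rows]
    # Sample covariance matrix of the indicators.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector for power iteration
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("indicators have no variance")
        v = [x / norm for x in w]
    # Assumed sign convention: orient loadings so that their sum is positive.
    if sum(v) < 0:
        v = [-x for x in v]
    scores = [sum(c[k] * v[k] for k in range(p)) for c in centred]
    return v, scores
```

With loadings that are all positive, a facility offering more of the listed services receives a higher score, which is the behaviour one would expect from an availability index.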
For each maternal health service, the first two principal components (health facility management system and infrastructure) in the principal component analysis were used to compute two general service readiness scores. With regard to service specific scores, for those health facilities which reported as providing family planning services, indices of family planning availability and readiness were created. Family planning availability scores were created using seven variables (combined oral contraceptive pills, progestin-only contraceptives pills, progestin-only injectable contraceptives, intrauterine device, emergency contraceptive pills, male sterilization, and female sterilization). For each indicator, in order to measure the availability of family planning services, health facilities were given one point for services available and zero for unavailable services. Thus, two family planning availability scores (long acting and short-term contraceptives methods) were created using the principal component analysis. Similarly, a family planning service readiness score was computed using seven dichotomous variables (family planning training, family planning checklists and/ or job-aids, combined oral contraceptive pills, progestin-only injectable contraceptives, intrauterine device, implants, and emergency contraceptive pills). The principal component analysis resulted in two family planning readiness scores (readiness to provide long acting and short-term contraceptives) that were used to measure a facility’s readiness to provide family planning services. For antenatal care providing facilities, indices of antenatal care availability and readiness were created. One antenatal care availability score (antenatal care supplements) was created using four variables (iron and folic acid supplements, tetanus toxoid vaccination, and a combination of iron and folic acid supplements). 
Similarly, an antenatal care service readiness score was computed using six dichotomous variables (ANC guideline, ANC checklists and/or job aids, staff trained in ANC service, urine dipstick/protein and haemoglobin test, and tetanus toxoid vaccine). The principal component analysis resulted in two antenatal care readiness scores (readiness to provide diagnostic services and skilled care), which were used to measure a facility’s readiness to provide antenatal care services. Furthermore, for those health facilities reported as providing basic obstetric care services, indices of basic obstetric care availability and readiness were created. Basic obstetric care availability score was created using seven variables (parenteral administration of antibiotics, parenteral administration of uterotonic drugs, parenteral administration of anticonvulsants, assisted vaginal delivery, manual removal of placenta, manual removal of retained products, and neonatal resuscitation). For each indicator, in order to measure the availability of basic obstetric care services, health facilities were given one point for services available and zero for unavailable services. Thus, one basic obstetric care availability score (BEmOC signal functions) was created using the principal component analysis. Similarly, a basic obstetric care readiness score was computed using twelve dichotomous variables (staff trained in delivery & newborn care, skilled delivery care provider [24 hour coverage], examination light, delivery pack, suction apparatus [mucus extractor], manual vacuum extractor, vacuum aspiration [D&C kit], neonatal bag and mask, blank partograph, antibiotic eye ointment for newborn [e.g., Tetracycline], injectable antibiotic [e.g., Ceftriaxone], and IV solution [Ringer lactate & Normal saline] with infusion set). 
This analysis resulted in three basic obstetric care readiness scores (skilled personnel, medicine and commodities, and delivery equipment) that were used to measure a facility’s readiness to provide basic obstetric care services. Indices of comprehensive obstetric care availability and readiness were created for comprehensive obstetric care providing facilities. Two comprehensive obstetric care availability scores (basic and comprehensive components) were created using seven variables (parenteral administration of antibiotics, parenteral administration of uterotonic drugs, parenteral administration of anticonvulsants, assisted vaginal delivery, manual removal of retained products, neonatal resuscitation and blood transfusion). Comprehensive obstetric care readiness scores were computed using nine dichotomous variables (staff trained in delivery & newborn care, anaesthesia equipment, resuscitation table or neonatal resuscitation kit, oxygen, Spinal needle, blood typing, cross match testing, blood supply sufficiency, and caesarean section set). The analysis resulted in two comprehensive obstetric care readiness scores (equipment and supplies, and skilled personnel) that were used to measure a facility’s readiness to provide comprehensive obstetric care services. With regard to measuring outcome variables, a woman was considered to be using modern contraception if she used any modern contraceptive methods including female sterilization, male sterilization, oral contraceptive pills, intrauterine device (IUD), injectables, implants, or the lactational amenorrhea method. Male condom use was excluded since women could obtain condoms from shops that the SPA survey did not capture. For the antenatal care analysis, a woman’s use of antenatal care for her most recent birth in the five years preceding the survey was measured based on the number of antenatal visits. 
Pregnant women were grouped into three categories: those who had no ANC visits; one to three ANC visits; and four or more ANC visits. Regarding health facility delivery, a pregnant woman was considered to be using facility delivery if she reported that her most recent birth (within the five years preceding the survey) was at a health facility. Lastly, a woman was considered to have used caesarean delivery if her most recent birth (within the five years preceding the survey) was via caesarean section. The two data sets were linked using SAS software. The spatial analysis can be carried out using ArcGIS software. The Ethiopian Polyconic Projected Coordinate System, based on the World Geodetic System 84 (WGS84) coordinate reference system (CRS), was used to produce a flattened map of the country. The spatial statistics can be used to identify statistically significant spatial clusters (hot/cold spots) of maternal health service use. The GLIMMIX procedure in SAS can be used to estimate hierarchical models for categorical data, in this case, maternal health service use. The Global Moran’s I statistic or global spatial autocorrelation is the first step to be carried out in identifying spatial patterns of observations. It is used to measure the overall clustering and test the null hypothesis that there was complete spatial randomness (no spatial clustering) of observations [36]. It is used to measure the correlation between neighbouring observations and to find out spatial patterns and level of spatial clustering among neighbouring features [37]. 
The Global Moran’s I statistic is calculated by [38]:

$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where $n$ is the number of features (the number of clusters in this study), $w_{ij}$ is the spatial weight between features $i$ and $j$, $x_i$ and $x_j$ are the attribute values for features $i$ and $j$, respectively, with mean $\bar{x}$, and $S_0$ is the aggregate of all the spatial weights:

$$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$$

The $Z_I$-score for the statistic is computed as

$$Z_I = \frac{I - E[I]}{\sqrt{\mathrm{Var}[I]}}$$

where $E[I] = -1/(n-1)$ and $\mathrm{Var}[I] = E[I^2] - E[I]^2$.

However, the Global Moran’s I (the measure of overall spatial autocorrelation) answers only the question “Is there spatial clustering?”; it does not answer the question “Where are the clusters (hot spots/cold spots)?” [39]. Therefore, local measures of spatial autocorrelation, that is Hot Spot Analysis (Getis-Ord Gi* statistic), are necessary to identify the type of spatial correlation and test the significance of local spatial patterns.

The next step in identifying spatial clusters is to carry out incremental spatial autocorrelation. The incremental spatial autocorrelation is important to determine the scale, that is the critical distance or distance bandwidth at which there is maximum clustering. It measures spatial autocorrelation for a series of distances and creates a line graph with corresponding z-scores. The z-scores reflect the intensity of spatial clustering; statistically significant peak z-scores indicate the distances at which clustering is most pronounced [40]. Before running the incremental spatial autocorrelation, the average distance at which a feature has at least one neighbour needs to be calculated using the Calculate Distance Band from Neighbor Count tool in the Spatial Statistics Tools toolbox in ArcMap. Then, the maximum distance at which clustering of maternal health service use peaked can be obtained with the corresponding z-score after running the incremental spatial autocorrelation.
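To make the Global Moran's I formula concrete, the following minimal sketch computes I and its expected value E[I] = -1/(n-1) for a small weights matrix. The function names and the dense list-of-lists weight layout are hypothetical choices for this example; the variance term needed for the z-score is left out because its closed form is not spelled out in the text.

```python
def morans_i(values, weights):
    """Global Moran's I for attribute values x_i and spatial weights w_ij
    (weights is an n x n list of lists; w_ii is normally 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # aggregate of all spatial weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

def expected_i(n):
    """Expected value of Moran's I under complete spatial randomness."""
    return -1.0 / (n - 1)

def chain_weights(n):
    """Hypothetical helper: binary weights for n features arranged in a line,
    where each feature neighbours the one before and the one after it."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]
```

For four features in a line, a smooth gradient such as `[1, 2, 3, 4]` gives a positive I (similar values sit next to each other), while an alternating pattern such as `[1, 0, 1, 0]` gives a negative I.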
Lastly, the Getis-Ord Gi* statistic uses this maximum distance to identify statistically significant spatial clusters of hot spots (areas of high maternal health service use rates) and cold spots (low maternal health service use rates). The Getis-Ord Gi* statistic (local G-statistic) is used to test the statistical significance of local clusters and to determine the spatial extent of these clusters [41]. It is useful for identifying clusters by determining spatial dependence and relative magnitude between an observation and its neighbouring observations. The Getis-Ord local statistic [42] is given as:

$$G_i^* = \frac{\sum_{j=1}^{n} w_{ij}x_j - \bar{x}\sum_{j=1}^{n} w_{ij}}{S\sqrt{\dfrac{n\sum_{j=1}^{n} w_{ij}^2 - \left(\sum_{j=1}^{n} w_{ij}\right)^2}{n-1}}}$$

where $x_j$ is the attribute value for feature $j$, $w_{ij}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features and $\bar{x}$ is the mean maternal health service use:

$$\bar{x} = \frac{\sum_{j=1}^{n} x_j}{n} \qquad \text{and} \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - (\bar{x})^2}$$

The Gi* statistic is a z-score, which means no further calculations are required. The Gi* is assumed to be normally distributed [41]. In other words, it can be treated as a standard normal variate with a probability from the z-score distribution [43]. Clusters significant at the 95% confidence level from a two-tailed normal distribution indicate significant clustering. A z-score near zero and a p-value greater than 0.05 indicate complete spatial randomness within the study area. A positive z-score along with a p-value less than 0.05 indicates clustering of high values, while a negative z-score along with a p-value less than 0.05 indicates clustering of low values.

Assessing the significance of local statistics of spatial association gets more complex as the number of spatial features/locations increases. In spatial analysis, it is fundamentally important to account for multiple and dependent comparisons. A False Discovery Rate (FDR) correction method can be applied to account for multiple and dependent tests in Local Statistics of Spatial Association [44]. A comparison of local statistic results was made with and without applying the False Discovery Rate correction in ArcGIS.
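The sketch below shows how Gi* z-scores, two-tailed p-values and an FDR decision could be computed by hand. It is an illustration under stated assumptions: the function names are hypothetical, the weights are a dense matrix that includes each feature itself (w_ii = 1), and the FDR step uses the Benjamini-Hochberg step-up rule, which is a common choice but is not confirmed by the text to be the exact procedure ArcGIS applies.

```python
import math

def getis_ord_gi_star(values, weights):
    """Gi* z-score for every feature. weights[i][j] should include the
    feature itself (w_ii = 1). A feature whose neighbourhood covers all
    n features makes the denominator zero and is not handled here."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = []
    for i in range(n):
        wi = weights[i]
        sw = sum(wi)
        sw2 = sum(w * w for w in wi)
        num = sum(w * x for w, x in zip(wi, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z.append(num / den)
    return z

def two_sided_p(z):
    """Two-tailed p-value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumed here as the FDR method).
    Returns a list of booleans: True where the test stays significant."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= alpha * rank / m:
            cutoff = rank  # largest rank meeting the threshold
    keep = [False] * m
    for rank, k in enumerate(order, start=1):
        keep[k] = rank <= cutoff
    return keep
```

With the p-values `[0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]`, five tests fall below 0.05 when judged one at a time, but only the first two survive this correction, which illustrates how the comparison with and without FDR can change the set of reported hot and cold spots.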
After linking women in the respective cluster to the health facility variables, a multilevel regression analysis can be carried out. The EDHS survey employed a multistage cluster sampling technique where women in the survey were nested within regions. Due to the hierarchical nature of the data, analysis can be done using a two-level Hierarchical Generalized Linear Model (HGLM), which is appropriate for categorical, non-normally distributed response variables including binary data. The GLIMMIX procedure (PROC GLIMMIX) in SAS can be used to estimate the hierarchical generalized linear models [45]. The equation necessary for estimating this two-level model is presented below.

$$Y_{ij} = \gamma_{00} + \gamma_{10}X_{ij} + \gamma_{01}W_j + \mu_{0j} + \mu_{1j}X_{ij}$$

where $Y_{ij}$ represents the log odds of using maternal health service for woman $i$ in region $j$, $\gamma_{00}$ provides the log odds of using maternal health service in a typical region, $W_j$ is a region-level predictor for region $j$, $\gamma_{01}$ is the slope associated with this predictor, $\mu_{0j}$ is the level-2 error term representing a unique effect associated with region $j$, $\gamma_{10}$ is the average effect of the individual-level predictor, $X_{ij}$ is an individual-level predictor for woman $i$ in region $j$, and $\mu_{1j}$ is a random slope for the level-1 predictor variable $X_{ij}$, which allows the relationship between the individual-level predictor ($X_{ij}$) and the outcome ($Y_{ij}$) to vary across level-2 units.

This analysis procedure enabled the identification of potential factors associated with the utilization of maternal health service with a 95% confidence interval and p-value < 0.05. A common maximum likelihood estimation technique available with PROC GLIMMIX in SAS (the Laplace estimation) can be used to estimate the best-fit model [45]. The model building process should start with the unconditional model (a model containing no predictors), and more complex models can be gradually built by checking improvements in model fit after each model is estimated.
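To show how the terms of the two-level model combine, the short sketch below turns a set of coefficients and region-specific random effects into a predicted probability through the inverse logit. The function name, argument names and any coefficient values passed to it are hypothetical; this does not estimate the model, which the text assigns to PROC GLIMMIX.

```python
import math

def predicted_probability(x_ij, w_j, g00, g10, g01, u0j=0.0, u1j=0.0):
    """Probability of service use for woman i in region j.
    g00, g10, g01 are fixed effects; u0j and u1j are the region's random
    intercept and random slope (all values are illustrative inputs)."""
    log_odds = g00 + g10 * x_ij + g01 * w_j + u0j + u1j * x_ij
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For example, with a log odds of zero the function returns 0.5, and a positive random intercept for a region raises the predicted probability for every woman in that region.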
A likelihood ratio test that examines the difference in the -2 log likelihood (-2LL) can be used to assess the best fitting model [45]. The unconditional (empty) model is used to calculate the intra-class correlation coefficient (ICC), which estimates how much variation in the use of maternal health service exists between regions (level-2 units). In HGLMs, it is assumed that there is no level-1 error variance; to calculate the intra-class correlation coefficient, a slight modification is made. The level-1 residual ($\varepsilon_{ij}$) is assumed to follow a logistic distribution, standardized with a mean of zero and a variance of $\pi^2/3$ [46]. Therefore, for a two-level random intercept HGLM with an intercept variance of $\sigma_{\mu_0}^2$, the intra-class correlation coefficient ($\rho$) is given by [46]:

$$\rho = \frac{\sigma_{\mu_0}^2}{\sigma_{\mu_0}^2 + \pi^2/3}$$

Ethical approval was obtained from the Human Research Ethics Committee, The University of Newcastle. We also obtained approval from the Ethiopian Public Health Institute (EPHI) and the MEASURE DHS program to access the datasets.
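The intra-class correlation formula for the two-level random intercept model can be evaluated directly once the intercept variance has been estimated. The helper below is a minimal sketch; its name and the example variance values are hypothetical.

```python
import math

def icc_logistic(intercept_variance):
    """ICC for a two-level random intercept logistic model, using the
    fixed level-1 variance of pi^2 / 3 from the logistic distribution."""
    if intercept_variance < 0:
        raise ValueError("variance cannot be negative")
    return intercept_variance / (intercept_variance + math.pi ** 2 / 3)
```

An intercept variance of 1.0, for instance, corresponds to an ICC of about 0.23, meaning roughly 23% of the variation in service use would lie between regions.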


Based on the information provided, here are some potential innovations that can be used to improve access to maternal health:

1. Geospatial Data Linkage: Use geospatial data to link health facility data with population data. This can help in monitoring and evaluating maternal health services at different levels and assist with efficient resource allocation.

2. Administrative Boundary Link: Use administrative boundary link to directly link sampled health facility data with population survey data. This method ensures that all health facilities within the respective administrative boundaries are included in the analysis.

3. Euclidean Buffer Link: Use Euclidean buffer link to link a census of health facilities data with population survey data. This method provides cluster-level service environment estimates, which can be useful for analyzing specific maternal health services like caesarean delivery.

4. False Discovery Rate Correction: Apply a False Discovery Rate correction to account for multiple and dependent testing while carrying out local spatial statistics. This correction method helps in identifying true spatial clusters of maternal health service use.

5. Multilevel Analysis: Conduct a multilevel analysis to identify key determinants of maternal health service use. This analysis takes into account the hierarchical nature of the data and can provide insights into factors influencing maternal health service utilization.

These innovations can help in improving access to maternal health by providing a better understanding of service utilization patterns, identifying areas that need special emphasis, and informing targeted interventions.
AI Innovations Description
The recommendation to improve access to maternal health based on the provided information is to utilize a service environment link and apply a False Discovery Rate (FDR) correction.

The service environment link involves geographically linking health facility data with population data. This allows for efficient resource allocation, monitoring and evaluation of service efficacy at different levels. By linking the two datasets, it becomes possible to analyze maternal health service data and identify key determinants of maternal health service use.

The False Discovery Rate (FDR) correction is important when conducting local spatial statistics to account for multiple and dependent testing. This correction helps identify true spatial clusters of maternal health service use and ensures accurate results.

By implementing these recommendations, policymakers and healthcare providers can gain insights into the spatial patterns of maternal health service use and identify areas that require special emphasis and intervention. This can lead to targeted efforts to improve access to maternal health services and ultimately improve maternal health outcomes.
AI Innovations Methodology
The paper discusses methodological considerations in population and health facility surveys for improving access to maternal health services. It explores the use of geospatial data linkage and spatial analysis techniques to identify hot spots and determine the impact of various factors on maternal health service use in Ethiopia.

To improve access to maternal health, the paper recommends using a service environment link to minimize methodological issues associated with geographic data linkage. This involves linking health facility data with population data using administrative boundaries or Euclidean buffer distances. The administrative boundary link is suitable for sampled health facilities, while the Euclidean buffer link is appropriate for a census of health facilities.

To simulate the impact of these recommendations on improving access to maternal health, the paper suggests using spatial analysis techniques in ArcGIS software. The Global Moran’s I statistic can be used to measure overall clustering and test for spatial randomness. The Getis-Ord Gi* statistic can identify statistically significant hot spots and cold spots of maternal health service use.

To account for multiple and dependent testing in local spatial statistics, the paper recommends applying a False Discovery Rate (FDR) correction. This correction method helps assess the significance of local clusters of maternal health service use.

Finally, the paper suggests conducting a multilevel regression analysis using a Hierarchical Generalized Linear Model (HGLM) to identify factors associated with the utilization of maternal health services. The GLIMMIX procedure in SAS can be used to estimate the best-fit model and assess the impact of individual and region-level predictors.

Overall, the methodology described in the paper provides a comprehensive approach to simulate the impact of recommendations on improving access to maternal health services. It combines geospatial data linkage, spatial analysis techniques, and multilevel regression analysis to identify areas of focus and potential interventions for improving maternal health outcomes.
