Background: Many sub-Saharan countries are confronted with persistently high levels of infant mortality because of the impact of a range of biological and social determinants. In particular, infant mortality has increased in sub-Saharan Africa in recent decades due to the HIV/AIDS epidemic. Knowledge of the geographic distribution of health problems and their relationship to potential risk factors can be invaluable for cost-effective intervention planning. The objective of this paper is to determine and map the spatial nature of infant mortality in South Africa at a sub-district level in order to inform policy intervention. In particular, the paper identifies and maps high-risk clusters of infant mortality, and examines the impact of a range of determinants on infant mortality. A Bayesian approach is used to quantify the spatial risk of infant mortality, as well as significant associations (given spatial correlation between neighbouring areas) between infant mortality and a range of determinants. The most attributable determinants in each sub-district are calculated based on a combination of prevalence and model risk factor coefficient estimates. This integrated small area approach can be adapted and applied in other high burden settings to assist intervention planning and targeting.

Results: Infant mortality remains high in South Africa, with seemingly little reduction since previous estimates in the early 2000s. Results showed marked geographical differences in infant mortality risk both between and within provinces, with significantly higher risk in specific sub-districts and provinces. A number of determinants were found to have a significant adverse influence on infant mortality at the sub-district level. Following multivariable adjustment, increasing maternal mortality, antenatal HIV prevalence, previous sibling mortality and male infant gender remained significantly associated with increased infant mortality risk.
Of these, antenatal HIV sero-prevalence, previous sibling mortality and maternal mortality were found to be the most attributable, in that order.

Conclusions: This study demonstrates the usefulness of advanced spatial analysis to quantify excess infant mortality risk at the lowest administrative unit, and of Bayesian modelling to quantify determinant significance given spatial correlation. The “novel” integration of sub-district determinant prevalence and coefficient estimates to estimate attributable fractions further elucidates the “high impact” factors in particular areas and has considerable potential to be applied in other locations. The paper therefore suggests not only where to intervene geographically, but also which specific interventions policy makers should prioritise in order to reduce the infant mortality burden in specific administrative areas.

© 2011 Sartorius et al; licensee BioMed Central Ltd.
South Africa is administratively divided into nine provinces responsible for health service delivery (Figure 4), and further divided into 53 districts (47 district municipalities and 6 metropolitan districts). These districts are then disaggregated into 248 local municipalities. Service delivery for water and sanitation is a municipal function in most instances.

Figure 4. Map of South Africa, with provinces and neighbouring countries.

The nine provinces vary in a number of ways. The Western Cape has the highest human development index (HDI), followed by Gauteng [48], while the ten most deprived districts in 2007/2008 were located in KwaZulu-Natal (6), Eastern Cape (3) and Limpopo (1), all of which are classified as rural development districts [49]. Conversely, all the districts within the Western Cape were classified as the least deprived, as were three of the six metros, namely the City of Cape Town, the Nelson Mandela metro (Eastern Cape) and the City of Johannesburg (Gauteng).

The data were drawn from the Community Survey run by Statistics South Africa in 2007. These data included information regarding demographic indicators (fertility, mortality and migration) and socio-economic data that included poverty indicators, access to facilities and services and levels of unemployment [50]. The 2007 Community Survey randomly sampled enumeration areas (EAs) and then dwelling units within each EA. An enumeration area is defined as the smallest geographical unit (piece of land) into which the country is divided for enumeration purposes. Enumeration areas contain between 100 and 250 households. The survey indicated 80,787 EAs countrywide, of which 1,321 were excluded as they were designated as institutions or recreational areas. The EAs within each municipality were ordered by land use and human settlement type, and selection was done using systematic random sampling.
The second level of the sampling frame consisted of re-listing the dwelling units (which could potentially contain one or more households) within the selected EAs. Random selection of dwelling units was based on a fixed proportion of 10% of the total listed dwellings in an EA. The survey sample covered 274 348 dwelling units across all the provinces and attained a response rate of 93.9% [51]. Person weights were recalculated to address sampling errors and so provide more credible estimates of the population at national and provincial levels. Data based on these weights were used in the analysis in this paper. The South African Statistics Council [52] found the reported demographic data (fertility and mortality proportions) to be entirely plausible when compared to other censuses.

Certain limitations and potential errors were identified by Statistics SA and the South African Statistics Council when reviewing the survey. The following systematic errors were observed in the data:
- Underestimate of men relative to women;
- Underestimate of children younger than 10 years;
- Excess of people aged 10-24 in Western Cape and Gauteng; and
- Deficit of women aged 20-34 in Free State, KwaZulu-Natal and Limpopo.

The following aggregated (ecological) sub-district level data were extracted (Nesstar) from the primary Community Survey 2007 database: infant population and deaths; maternal (deaths, fertility, and whether a previous sibling of the current infant had died); paternal (deaths); sub-district education level, employment status and household income; household services (access to water, water type and distance to nearest water source; household toilet facilities; household refuse removal). We also calculated the Gini coefficient, a commonly used measure of inequality, for each of the sub-districts based on the dispersion of annual household income within that sub-district.
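As an illustration of the inequality measure described above, the following minimal sketch computes a Gini coefficient from a list of annual household incomes in one sub-district. It is an unweighted calculation, which is an assumption made for brevity (the analysis in this paper used weighted survey data), and the function name `gini` and the example incomes are hypothetical.

```python
def gini(incomes):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical household incomes for two sub-districts (illustration only).
equal_area = [10, 10, 10, 10]
unequal_area = [0, 0, 0, 100]
print(gini(equal_area), gini(unequal_area))
```

With only four households, the most unequal case (one household holding all income) gives (n - 1)/n = 0.75 rather than 1, which is a known property of the unadjusted sample estimate.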
Additional district-level data on antenatal HIV sero-prevalence in 2007 and the number of clinics in each district were taken from the District Health Barometer for 2007/2008 [49]. Finally, a national shape file containing all 248 sub-districts was imported into MapInfo Professional 9.5 to create the necessary areal and geospatial data. Centroids of each sub-district, as well as an adjacency matrix of all neighbouring sub-district combinations, were extracted using functions within this software package. These centroids and the adjacency matrix were needed for the various spatial and multivariable analyses (autocorrelation, clustering and Bayesian conditional autoregressive approaches) described in detail below.

The infant mortality proportions were calculated for each district by dividing the observed number of deaths by the total population in district i (i = 1,…,52) based on the weighted 2007 Community Survey. To identify districts in which the mortality proportion was significantly above average, we constructed the exact 95% confidence intervals for each rate using the Poisson distribution of the observed number of deaths [53]. District mortality was considered significantly above average for that year if the overall proportion for the given year was below the lower confidence limit (α = 0.025) of the mortality proportion for that district [54]. This approach does not allow conclusions for districts close to the reference value (SMR = 1), which are equally important to policy makers. The combined approach of difference and equivalence testing has recently emerged as a way to improve the interpretability of areal spatial data [55].
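The exact Poisson interval and the comparison against the overall proportion described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the limits are found by bisection on the Poisson distribution function, and the function names, the bracketing bound and the example counts are all assumptions introduced here.

```python
import math


def poisson_cdf(k, mu):
    """Return P(X <= k) for X ~ Poisson(mu), summing terms in log space."""
    if k < 0:
        return 0.0
    if mu <= 0.0:
        return 1.0
    log_mu = math.log(mu)
    total = sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)


def _root_of_decreasing(func, low, high, steps=200):
    """Bisection for a function that falls from positive to negative on [low, high]."""
    for _ in range(steps):
        mid = (low + high) / 2.0
        if func(mid) > 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def exact_poisson_ci(deaths, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the mean of a Poisson count."""
    upper_bracket = 2.0 * deaths + 20.0  # assumed generous bound on the upper limit
    if deaths == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= deaths) equals alpha / 2.
        lower = _root_of_decreasing(
            lambda mu: alpha / 2.0 - (1.0 - poisson_cdf(deaths - 1, mu)),
            0.0, upper_bracket)
    # Upper limit: the mean at which P(X <= deaths) equals alpha / 2.
    upper = _root_of_decreasing(
        lambda mu: poisson_cdf(deaths, mu) - alpha / 2.0, 0.0, upper_bracket)
    return lower, upper


def smr_with_ci(deaths, infants, overall_rate, alpha=0.05):
    """Standardised mortality ratio (observed / expected) with exact limits."""
    expected = overall_rate * infants  # expected deaths under the overall proportion
    lower, upper = exact_poisson_ci(deaths, alpha)
    return deaths / expected, lower / expected, upper / expected


# Hypothetical district: 15 infant deaths, 100 infants, overall proportion 0.05.
smr, smr_low, smr_high = smr_with_ci(15, 100, 0.05)
print(smr, smr_low > 1.0)  # a lower limit above 1 flags significant excess mortality
```

Requiring the lower limit of the SMR to exceed 1 is equivalent to requiring the overall proportion to fall below the lower limit of the district's own mortality proportion.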
Thus, for districts which were not significantly different from the reference value but that were greater than 1 (SMR>1), we also performed equivalence testing using a commonly used critical value of Δ = 0.2 [56], which leads to an equivalence range of (0.8, 1.25). We used the twice-the-smaller-tail (TST) method [57], which is a computation of the equivalence test statistic for discrete distributions (i.e. Poisson in this case).

Various spatial analysis techniques and models were employed in this study to compare and identify significant infant mortality “hotspots”, namely Moran’s I spatial autocorrelation coefficient [58], Kulldorff spatial scan statistic [59], a standard Bayesian convolution conditional autoregressive approach [60] and lastly a Bayesian augmented zero-inflated Poisson approach [61,62]. The first three each have inherent strengths and weaknesses which are extensively detailed in the literature. Further detail regarding the Bayesian approaches is provided in this paper. Given the similarity of the output for infant mortality risk from these various approaches, we only present results for the final Bayesian approach (see Appendix 1 for details of model assessment). We did, however, use Moran’s I both to test for significance of values within a sub-district and as a measure of the strength of clustering or dispersion of the various indicator variables [63]. Exceedance probabilities (i.e. the posterior probability that the smoothed standardised mortality ratio in a given area is greater than 1) from the Bayesian spatial modelling approach were used to identify sub-districts with significant excess infant mortality risk in the attributable fraction analysis. This is further detailed in Appendix 1.

In order to address the problems associated with small area analysis and spatial correlation, we finally used Bayesian hierarchical modelling. Small area studies have better interpretability than larger scale studies and are less susceptible to ecological fallacy or bias.
However, the drawbacks include data that may be very sparse, with a large number of event-free (zero count) areas, and over-dispersion of the data [26]. Correlation or interdependence of observations in neighbouring or adjoining areas also poses a problem. Objects (in this case sub-districts) in close proximity are often more alike. Consequently, it is important to include the effects of spatial proximity when performing statistical inference on such processes. Moreover, the standard errors of the covariate estimates are underestimated if this spatial correlation is not taken into account, thereby overstating the significance of the risk factors. The estimates of the outcomes, such as mortality, are also incorrect at the locations where data are missing. Bayesian areal or geostatistical models relax the assumption of independence, assume that spatial correlation is a function of neighbouring locations or distance between locations, and also allow prediction at unsampled locations [64]. Lastly, measurement errors for both numerators and denominators also represent a problem associated with small area studies [27].

Bayesian hierarchical models are the most commonly used framework to address the problems posed by small area analysis [65]. Bayesian estimators are also widely used to obtain reliable estimates of the relative risk when there are sub-areas with small populations and traditional estimates of relative risk are unreliable or unstable [66]. With the development of Markov Chain Monte Carlo (MCMC) methods and software such as OpenBUGS, Bayesian approaches are being increasingly applied to the analysis of many social and health problems in addition to disease mapping and modelling. Two different Bayesian spatial model formulations were tested and used in this study.
These models were based on fitting spatial Poisson models with two random-effects terms that took the following into account: (1) sub-district contiguity (spatial term); and (2) sub-district heterogeneity. We firstly used the Besag, York and Mollié [60] or convolution conditional autoregressive (CAR) model, which is discussed in more technical detail in Appendix 1. For the spatial risk map we used a formulation of the above which included no covariates (only a constant and the convolution conditional autoregressive terms). To calculate expected outcomes (Ei), the overall infant mortality for 2007 was multiplied by each sub-district's infant population to give the expected number of infant deaths.

The following indicator variables were tested against infant mortality: maternal mortality; previous sibling(s) outcome; education, household income and Gini coefficient derived from income; household services (access to water; household toilet facilities; household refuse removal); and the ratio of infants to sub-district clinics. In order to assess the relationship between infant mortality and the various predictors, preliminary univariate zero-inflated Poisson regressions were run in Stata 10.0 SE. Covariates significant at the 10% level were then incorporated into the multivariable Bayesian spatial model. Details of the multivariable model are provided in Appendix 2.

We wanted to assess the degree to which sub-district exposure to a particular variable (e.g. access to water and sanitation) impacted on infant mortality. This could provide an indication for policy makers about which intervention(s) to prioritise. To do this, we linked the risk estimates associated with the indicators in the multivariable model with the actual prevalence of exposure to those indicators within the various high-risk sub-districts identified through our spatial analysis.
The following standard formula for calculating an attributable fraction (AF) for each determinant, based on its prevalence of exposure (pe) in a given sub-district and the model coefficient (IRR) for that determinant, was used:

$$AF = \frac{p_e\,(IRR - 1)}{1 + p_e\,(IRR - 1)}$$

Finally, the analysis was carried out in Stata 10.0 SE, SaTScan and OpenBUGS. Maps were developed in Stata 10.0 SE and MapInfo Professional 9.5.
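As an illustration of the attributable fraction step, the following minimal sketch ranks determinants within one sub-district. It assumes the standard (Levin) form AF = pe(IRR - 1) / (1 + pe(IRR - 1)); the function names, the determinant labels and all prevalence and IRR values below are hypothetical and are not results from this study.

```python
def attributable_fraction(prevalence, irr):
    """Attributable fraction from exposure prevalence (0..1) and an incidence rate ratio."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if irr <= 0.0:
        raise ValueError("IRR must be positive")
    excess = prevalence * (irr - 1.0)
    return excess / (1.0 + excess)


def rank_determinants(exposures):
    """Sort determinants from most to least attributable.

    `exposures` maps a determinant name to a (prevalence, IRR) pair.
    """
    fractions = {name: attributable_fraction(pe, irr)
                 for name, (pe, irr) in exposures.items()}
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sub-district: prevalence of exposure and model IRR per determinant.
example_sub_district = {
    "determinant A": (0.30, 2.0),
    "determinant B": (0.10, 1.5),
    "determinant C": (0.50, 1.2),
}
for name, fraction in rank_determinants(example_sub_district):
    print(f"{name}: {fraction:.3f}")
```

Because the fraction depends on both prevalence and IRR, a determinant with a modest IRR can still rank highly in a sub-district where exposure is common, which is why the ranking is computed separately for each area.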