Background: During the millennium development goals period, reduction in under-five mortality (U5M) and increases in child health intervention coverage were characterised by sub-national disparities and inequities across Kenya. The contribution of changing risk factors and intervention coverage on the sub-national changes in U5M remains poorly defined. Methods: Sub-national county-level data on U5M and 43 factors known to be associated with U5M spanning 1993 and 2014 were assembled. Using a Bayesian ecological mixed-effects regression model, the relationships between U5M and significant intervention and infection risk ecological factors were quantified across 47 sub-national counties. The coefficients generated were used within a counterfactual framework to estimate U5M and under-five deaths averted (U5-DA) for every county and year (1993–2014) associated with changes in the coverage of interventions and disease infection prevalence relative to 1993. Results: Nationally, the stagnation and increase in U5M in the 1990s were associated with rising human immunodeficiency virus (HIV) prevalence and reduced maternal autonomy while improvements after 2006 were associated with a decline in the prevalence of HIV and malaria, increase in access to better sanitation, fever treatment-seeking rates and maternal autonomy. Reduced stunting and increased coverage of early breastfeeding and institutional deliveries were associated with a smaller number of U5-DA compared to other factors while a reduction in high parity and fully immunised children were associated with under-five lives lost. Most of the U5-DA occurred after 2006 and varied spatially across counties. The highest number of U5-DA was recorded in western and coastal Kenya while northern Kenya recorded a lower number of U5-DA than western. Central Kenya had the lowest U5-DA. The deaths averted across the different regions were associated with a unique set of factors. Conclusion: Contributions of interventions and risk factors to changing U5M vary sub-nationally. This has important implications for targeting future interventions within decentralised health systems such as those operated in Kenya. Targeting specific factors where U5M has been high and intervention coverage poor would lead to the highest likelihood of sub-national attainment of sustainable development goal (SDG) 3.2 on U5M in Kenya.
Our analysis involved four main steps. First, data on U5M rates and factors associated with child survival were synthesised and estimated at the county level from population censuses and household sample surveys [1]. Second, a set of parsimonious factors significantly associated with U5M in the Kenyan context were selected. In the third step, a Bayesian ecological space–time mixed-effects regression model was fitted to quantify the relationship between the parsimonious set of factors and U5M. In the fourth step, counterfactual analysis, an approach widely used to assess causal attribution [25–34] in health applications [33–38], was used to determine how much changes in U5M (deaths averted (U5-DA) or lives lost (U5-LL)) were attributable to the changes in intervention coverage and disease infection prevalence between 1993 and 2014. The outcome variable was U5M available for each county across 22 years from 1993 to 2014 generated using demographic and spatio-temporal models detailed elsewhere [1]. In brief, ten household surveys and three population censuses with birth histories were assembled and spatially aligned to county boundaries. Five demographic methods were applied to estimate U5M per county by survey and smoothed using a Bayesian spatio-temporal Gaussian process regression (GPR) accounting for spatio-temporal relatedness, sample size and demographic methods [1]. The candidate list of predictor variables included 43 factors known to be associated with U5M (Additional file 1) available at county level spanning across 22 years (Table 1 and Additional file 1). The 43 factors were identified from existing frameworks of child survival [39–42] (Additional file 1), relevance to Kenya’s health priorities and data availability and defined based on household survey guidelines while ensuring temporal comparability. Estimates for 39 factors were generated using data from 20 household surveys and three population censuses via a Bayesian spatio-temporal GPR model while four factors were available from disparate sources [7, 43–45]. Table 1 outlines the factors, while detailed definitions and the specific data sources for each factor are presented in Additional file 1. Our analyses included data up to 2014 when the last household sample survey was conducted. Alternative data sources are limited; the coverage and completeness of data from civil registration and vital statistics systems remains low in Kenya while routine data does not capture those who do not interact with the health systems and misses majority of deaths that happen in the communities. Forty-three factors associated with child survival and thematic groups as used in the current analysis. Their definitions and respective data sources are detailed in Additional file 1 Model development aimed to select a parsimonious set of factors strongly associated with U5M in Kenya to reduce overfitting and fluctuating regression coefficients [46, 47]. Before formal statistical model development, factors whose contribution was captured by other factors among the 43 were excluded to reduce any potential collinearity, circularity and confounding [3, 48, 49] (Additional file 2). The role of insecticide-treated bed nets (ITNs) and antimalarial medicine is captured by malaria prevalence [7, 50, 51]; intermittent preventive treatment in pregnancy (IPTp) role is partially captured by low birth weight (LBW) [52–55] and was available in 13 malaria-endemic counties only [56] while the effect of three doses of diphtheria–tetanus–pertussis (DTP3) vaccine, polio (Polio3), measles and BCG vaccines was captured by fully immunised status. Likewise, factors measuring the same intervention and have an overlapping impact (for example, maternal education and maternal literacy) were grouped, their relationship with U5M quantified via a simple regression and the best fitting factor included based on a lower Akaike Information Criterion (AIC) [3, 48, 49] (Additional file 2). All the factors under consideration are associated with U5M (Additional file 2). To validate the findings against the Kenyan context, a simple regression model was fitted to explore the bivariate association between U5M, and factors retained in the preceding stage. Factors with a p-value < 0.2 were retained and were considered in the elastic net regression (ENR) model. ENR, a rigorous penalisation regression, was used to reduce dimensionality by selecting factors that explain most of the variation in U5M [57–60] as applied in child survival studies [61, 62]. It was implemented via the glmnet R statistical package [60] with factors having non-zero coefficients forming the base model. Further simplification of the base model was explored through the Deviance Information Criterion (DIC) and model predictive capability using out of sample validation [46, 47, 63]. The presence of multicollinearity was assessed using the variance inflation factor (VIF) with a cut-off of four with the collinear and interpretable factors combined through principal component analysis [64–66]. The final list of factors from the model development phase was included in a Bayesian ecological space–time mixed-effects regression model to estimate adjusted association with U5M (Eq. 1). The model included an intercept, fixed effects, spatial and temporal random effects and a space–time interaction term. The random variables were assigned prior distributions that borrowed the strength of information across space and time to capture better the underlying structure of U5M, changes in time-varying factors that affect all counties and unchanging factors of U5M within each county [33, 36]. Equation 1 Bayesian ecological space–time mixed-effects regression for quantifying the association between U5M and factors from the model development process. where ln(5q0)i, t is the natural logarithm of U5M in county i and year t, βo the intercept, βj regression coefficients for the fixed effects, ∑j=1nxj, (μi) the structured spatial random effect, vi the unstructured spatial random effect, (γt) the temporal random effect and δi, t the type 1 space–time interaction specified for parsimony and concerns on identifiability with highly structured interactions. The spatial dependence was defined by a neighbourhood matrix through the queen adjacency with the value of a parameter in one county influenced by the average value of its neighbouring counties with some additional variability. The convolution Besag, York and Mollié (BYM) conditional autoregressive (CAR) model was used to express spatial dependence [67, 68]. Similarly, temporal neighbours were defined by the adjacent period points (preceding and post) while the space–time interaction parameter δi, t accounted for any departure from predictable patterns based on the overall temporal and spatial effects [68]. The variance parameters for the random effects were assigned non-informative priors due to lack of prior corresponding data to inform the choice of such specification and allow the data to drive the model results [69]. The hyper prior distributions followed inverse gamma distributions with parameter values of 0.5 and 0.0005 [68]. The inference was made via Markov Chain Monte Carlo and posterior distributions of parameters summarised by the mean and the 95% credible intervals (CI). The estimated regression coefficients from the space–time model were used to compute annual counterfactual U5M for every county between 1993 and 2014; U5M predictions assuming intervention coverage and disease prevalence of each factor had stayed constant at its 1993 value over 22 years. The differences between the observed U5M and counterfactual U5M were multiplied by the annual number of under-fives to compute the annual number of U5-DA and/or U5-LL between 1993 and 2014 based on census data including the corresponding CI. In summary, the set-up allowed for the estimation of the annual number of child deaths averted associated with changes in disease prevalence and intervention coverage relative to 1993 values. Model convergence was evaluated via trace plots and Gelman–Rubin statistic [70] while the model accuracy was assessed using the MC error, the standard deviation and their ratio [71]. Data preparation and pre-processing were done in StataCorp. 2014 [Stata Statistical Software: Release 14. College Station, TX: StataCorp LP]; model development was conducted in R statistical (V.3·4·1) while the final Bayesian ecological space–time mixed-effects regression was fitted in WinBUGS Package (version 1.4.3) [72]. All the cartographies were done in ArcMap 10.5 (ESRI Inc., Redlands, CA, USA).