Background: Over 155 million children under five suffer from stunting, and it is responsible for over one million deaths and 54.9 million Disability Adjusted Life Years (DALYS) of under-five children worldwide. These predominantly occurred in low-and middle-income countries like sub- Saharan Africa. Stunted children begin their lives at a marked disadvantage. Some of these are; poor cognition and educational performance, low adult wages, lost productivity and, when accompanied by excessive weight gain later in childhood, an increased risk of nutrition- related chronic diseases in adult life and the devastating effects of stunting can last a lifetime and even affect the next generation. Despite this, its magnitude rises in the past 25 years in sub-Saharan Africa. Studies that capture the pooled prevalence and associated factors of stunting among children aged 6-59 months in sub-Saharan Africa are limited. Therefore, this study was carried out on the basis of the Bayesian approach to determine the pooled prevalence and predictors of stunting among children aged 6-59 months in Sub-Saharan Africa. Objective: To assess the pooled prevalence of stunting and associated factors among children aged 6-59 months in Sub-Saharan Africa Methods: For this study a total of 173,483 weighted samples from the demography and health survey data set of 35 sub-Saharan African countries from 2008 to 2020 were used. After checking Variation between cluster by computing Intraclass Correlation Coefficient, binary logistic regression model was conducted based on hierarchical Bayesian statistical approach to account the hierarchical nature of demography and health survey data and to get reliable estimates by using additional information from the prior distribution. Adjusted odds ratio with 95% credible interval of the best fitted model was used to ascertain the predictors. Results: The pooled prevalence of stunting in Sub-Saharan Africa was about 35% (95%CI: 34.87, 35.31). Of the sub-regions, the highest prevalence of stunting was in East Africa, 37% (95%, CI: 36.96, 37.63) followed by Central Africa, 35% (95%CI: (34.93, 35.94). Being male (AOR = 1.27, 95% CrI 1.25, 1.30), small birth size (AOR = 1.29, CrI 1.25, 1.32), home delivery (AOR = 1.17, CrI 1.14, 1.20), and no education of mothers (AOR = 3.07, CrI 2.79, 3.39) were some of the significant predictors of stunting of children. Conclusion and recommendation: The prevalence of stunting of children in sub-Saharan Africa is among the highest in the world. Predictors such as being male, being small at birth, a child delivered at home, and, low level of maternal education were some of the predictors of childhood stunting. Stakeholders and non-governmental organizations should consider those contributing factors of stunting when they plan and design nutritional improvement programs.
A secondary data analysis was conducted based on 35 SSA countries’ DHS data set from 2008 to 2020. SSA has four sub-regions namely; Central Africa, East Africa, West Africa and Southern Africa. According to global trends in 2020, the population of SSA was 1.1 billion [39]. In 2015, Sub-Saharan Africa was home to 27 of the world’s 28 poorest countries and had more extremely poor people than in the rest of the world combined [40]. Recent DHS data set of 35 SSA countries were appended together to investigate the pooled prevalence and determinants of stunting among children aged 6–59 months in SSA. The DHSs were a nationally representative survey that collected data on basic health indicators like mortality, morbidity, family planning service utilization, fertility, maternal and child health-related indicators including height and weight of under-five children. The data was derived from the measure DHS program, and the detailed information about the surveys can be found in each country’s’ DHS report. Demography and health survey sample designs are usually two-stage stratified probability samples drawn from an existing sample frame, generally the most recent census frame. Typically, DHS samples are stratified by geographic region and by urban/rural areas within each region. Within each stratum, the sample design specifies an allocation of households to be selected. In the first stage of selection, the primary sampling units (PSUs) are selected with probability proportional to size (PPS) within each stratum. The PSU forms the survey cluster. In the second stage, a complete household listing is conducted in each of the selected clusters. Following the listing of the households, a fixed number of households is selected by equal probability systematic sampling in the selected cluster. For this study, we used the Kids Record dataset (KR file). Using the KR file, we extract the dependent and independent variables for each country and then we appended, and pooled the DHS survey data of the 35 sub-Saharan Africa countries. Standard DHS Surveys have large sample sizes (usually between 5,000 and 30,000 households) and typically are conducted about every 5 years, to allow comparisons over time. For this study, a total weighted sample of 173,483 of under-five children aged 6–59 months from the DHS data set of 35 sub-Saharan African countries were used S1 Table in S1 File. Stunting status of under-five children: height for age of children dichotomized as normal (not stunted) if height for age ≥ -2 SD and stunted if height of age < -2 SD form WHO child growth reference. Weight measurements were obtained using lightweight SECA mother-infant scales with a digital screen designed and manufactured under the guidance of UNICEF. Height measurements were obtained using the Shorr measuring board. Children younger than 24-months were measured for their height while lying down, and children older than 24 months were measured while standing. In line with the study’s objectives and due to the hierarchical nature of the DHS data where children and mothers were nested within a cluster, two level independent variables were considered. Community-level or second level variables: sub-regions of SSA, residence, distance to a health facility. Categorized as socio demographic and economic factors including; child’s age, child’s sex, breastfeeding (ever breastfeed an currently breastfeeding), maternal age, marital status, maternal educational level, wealth index, media exposure, sex of household head, type of water source, type of toilet facility, and obstetrical and/ health-related factors include; place of delivery, a recent history of diarrhea, recent history of cough, deworming of child, recent history of fever, immunization, type of birth, size of the baby at birth and subsequent birth interval. A secondary data analysis was done based on the most recent DHS datasets of 35 SSA countries from 2008 to 2020. First, we have requested the DHS measure program permission for the data access by explaining the purpose of the study. Once accessed and downloaded the data set, we have appended the 35 countries’ DHS data set in to one dataset using the Stata command “append using”. Then we have developed a structured checklist based on literature. After that we have kept plausible variables using our checklist. Then after we had removed missing records in the dependent variable, and categorized the outcome, height for age, as normal and stunted based on the WHO growth reference. We had also categorized, recategorized and recoded our independent variables based on literature and according to DHS guide 7. Furthermore, we have made all the data management process using Stata version 14 and the data was weighted using sampling weight, primary sampling unit, and strata before any statistical analysis done to restore the representativeness of the survey. We had conducted meta-analysis, and Stata version 14 software was used to compute the pooled prevalence and construct the forest plot [41]. Since there was heterogeneity between countries’ surveys, we had computed sub-group prevalence using random effect methods, and the variance estimation method used was Restricted Maximum Likelihood (REML). Finally, the data was exported to R version 4.1.0 statistical software [42], and we used the Bayesian Regression Model Using Stan (BRMS) of R package to conduct the binary logistic regression analysis based on Bayesian hierarchical statistical approach to identify associated factors of stunting [43]. Results were presented in the form of texts, tables and figures. Since the outcome variable is dichotomized (normal and stunted), a binary logistic regression model was employed. Given the DHS data has hierarchical nature, participants are nested within a cluster, and it is natural to assume that study subjects in the same cluster may share similar characteristics to participants in another cluster. This violates the independence observations and equal variance assumptions between clusters of the logistic regression model. This implies the need to take into account the heterogeneity between clusters by using an appropriate model. The data has two levels with a group of J EAs and within-group j (j = 1, 2…, J), a random sample nj of level-one units (individual children). The response variable is denoted by; Therefore, a multilevel binary logistic model was fitted to get a reliable estimate and standard error in such data that has nested nature so as to make appropriate inferences and conclusions [44]. Four models were constructed for the multilevel binary logistic regression analysis. The first model was an empty model without any explanatory variables, to calculate the extent of cluster variation on stunting status. Variation between clusters (EAs) was assessed by computing Intraclass Correlation Coefficient (ICC), Median Odds Ratio (MOR) and proportional change in variance (PCV) (the equations and interpretations of these entities can be found in the supplementary material). The second model was fitted with individual level variables; the third model was adjusted for community level variables while the fourth was fitted by incorporating both individual and community level variables together. The statistical analysis was conducted based on the Bayesian statistical approach which considers the population parameters as unknown and random entities that follow a certain probability distribution. This approach has three components which include the likelihood function, prior distribution, and posterior distribution [45]. The likelihood function is one component of the Bayesian statistical approach that exploits information about the parameters contained in the data at hand. In our case, the data to be used has a binary response with two hierarchies which needs to employ multilevel binary logistic regression analysis. It is the probability distribution that expresses prior uncertainty associated with the parameter of interest. There are two common types of priors in Bayesian statistics (Informative and Non-informative priors). An informative prior is a prior distribution that is used when information about the parameter of interest is available before the data is collected and the information can be obtained from previous studies, expert knowledge or experience and a combination of both but prior information should not be obtained from the currently collected data. On the other hand, when there is not enough prior information non informative (uniform or flat) prior, which gives less weight to the prior knowledge can be used. For this study, non-informative priors; Normal flat (0, 1000) prior distribution for the fixed (population and intercept) parameters and uniform (0, 1000) prior distribution for random effect (group level) were used. The posterior distribution is a method to summarize what we know about uncertain quantities after the data has been observed in the Bayesian statistical analysis. It can be obtained by multiplying the prior distribution normal for β parameters and the binary logistic regression likelihood function. The posterior distribution is derived from the well-known Bayes’ theorem (the formulation is supplemented). Since Bayesian estimation needs complex calculation, approximation using simulation is required. For this study, Simulation was applied by using a BRMS package with two chains each of 3000 with 1000 warm up iterations, and all the parameters were left to initiate randomly. In our case, a variant of Markov chain Monte Carlo (MCMC) methods called No-U turn sampler (NUTS) was used. NUTS sampler improves the drawback of Hamiltonian Monte Carlo’s (HMC) by introducing the slice variable sampled uniform distribution in the sampling procedure. The results from a given distribution are not deemed reliable until the chain has reached its stationary assumption. But the inference becomes appropriate when target distributions become well converged. Therefore, monitoring its convergence is essential for producing reliable results from the posterior distribution. The convergence of the targeted distribution was assessed by trace plot, density plot, effective sample size and R hat statistics. Model selection criteria such as Leave-one-out cross-validation (LOO) and Watanabe Akaike information criterion (WAIC) are considered as appropriate for selecting the best fitted model in Bayesian approach using BRMS. WAIC is calculated as log predictive density for each data point minus estimated effective number of parameters, and it becomes unreliable if any estimated effective number parameter exceeds 0.4, which holds true for our case. As a result, LOOIC was used to compare the four models fitted and the full model containing both individual level and community level variables was selected as best fitted model because it had the smallest LOOIC than the other three models. Adjusted odds ratio (AOR) with 95% credible interval (CrI) from best fitted model was used to assure variables which have significant association with stunting status among under five children.