Background: Determining the spatial patterns of infection among young children living in a malaria-endemic area may provide a means of locating high-risk populations who could benefit from additional resources for treatment and improved access to healthcare. The objective of this secondary analysis of baseline data from a cluster-randomized trial among 1943 young Ghanaian children (6-35 months of age) was to determine the geo-spatial factors associated with malaria and non-malaria infection status. Methods: Spatial analyses were conducted using a generalized linear geostatistical model with a Matern spatial correlation function and four definitions of infection status using different combinations of inflammation (C-reactive protein, CRP > 5 mg/L) and malaria parasitaemia (with or without fever). Potentially informative variables were included in a final model through a series of modelling steps, including: individual-level variables (Model 1); household-level variables (Model 2); and, satellite-derived spatial variables (Model 3). A final (Model 4) and maximal model (Model 5) included a set of selected covariates from Models 1 to 3. Results: The final models indicated that children with inflammation (CRP > 5 mg/L) and/or any evidence of malaria parasitaemia at baseline were more likely to be under 2 years of age, stunted, wasted, live further from a health facility, live at a lower elevation, have less educated mothers, and higher ferritin concentrations (corrected for inflammation) compared to children without inflammation or parasitaemia. Similar results were found when infection was defined as clinical malaria or parasitaemia with/without fever (definitions 3 and 4). Conversely, when infection was defined using CRP only, all covariates were non-significant with the exception of baseline ferritin concentration. 
In Model 5, all infection definitions that included parasitaemia demonstrated a significant interaction between normalized difference vegetation index and land cover type. Maps of the predicted infection probabilities and spatial random effect showed defined high- and low-risk areas that tended to coincide with elevation and cluster around villages. Conclusions: The risk of infection among young children in a malaria-endemic area may have a predictable spatial pattern which is associated with geographical characteristics, such as elevation and distance to a health facility. Original trial registration clinicaltrials.gov (NCT01001871)
The data used in these analyses were generated from the baseline survey of a community-based, cluster-randomized trial conducted in 2010 in Wenchi and Tain districts of the Brong-Ahafo region, a substantially rural area of Ghana [6]. At the time, there were an estimated 7.2 million cases of malaria per year in Ghana, and the prevalence of anaemia among preschool-aged children was 76.1 % (95 % CI 73.9–78.2 %) [21, 22]. Briefly, the aim of the randomized trial was to determine the effect of providing iron with other micronutrients in powder form for 5 months during the rainy season (March–November) on the incidence of malaria among 1958 children aged 6–35 months (representing 1552 clusters and 22 villages) (Fig. 1) [6]. A village was eligible for inclusion in the study if its households included at least one child between 6 and 35 months of age. Potentially eligible participants were screened, beginning with villages near the north-east border of Wenchi, then moving to adjacent villages along the main road network. Eligible children were aged 6–35 months, eating solid foods, and expected to remain in the study area for at least the following 6 months. Exclusion criteria included severe anaemia (haemoglobin <7.0 g/dL), severe malnutrition (weight-for-length z-score [...] Infection status was defined in four ways: (1) inflammation (CRP > 5 mg/L) and/or malaria parasitaemia; (2) inflammation (CRP > 5 mg/L) without parasitaemia; (3) parasitaemia with measured concurrent fever (axillary temperature >37.5 °C) or reported history of fever within 48 h (i.e., clinical malaria); and (4) parasitaemia with or without concurrent fever or history of fever. All dependent variables were binary-valued (coded as ‘1’ for positive infection status), and analysed using a logistic model. The four outcomes were modelled separately in order to explore whether observed geo-spatial associations were influenced by the way infection was defined, and how much of this influence may have been driven by malaria versus non-malaria infection types.
Geo-spatial and non-spatial variables were chosen for inclusion in the final models based on expert opinion and a review of the literature pertaining to spatial risk factors of malaria and anaemia among young children in low- and middle-income countries [18]. Variables were eligible for inclusion if they were considered to be direct or indirect antecedent factors associated with infection (e.g., elevation), and excluded if they were potential outcomes of infection (e.g., anaemia). The models were fit using Bayesian inference via an integrated nested Laplace approximation (INLA) algorithm [29]. Given the exploratory nature of the analyses, weak or uninformative priors were used for all model parameters with the exception of the Matern shape parameter, which was fixed at 2. Spatial predictions were made on a 100-cell grid covering the study area. The Matern correlation, approximated by a Markov random field [30], extended an additional 3000 m in each direction. Infection probabilities, after transformation with a logit link function, were modelled as the sum of the contributions of the explanatory variables, as well as spatially correlated and compound-level random effect terms. The posterior medians of the odds of infection were computed, assuming baseline values for individual-level covariates and location-specific values for the spatial covariates. A spatially continuous (or geostatistical) model was used for the spatial random effect term, where the correlation between the log-odds of infection of two individuals was given by a Matern spatial correlation function applied to the distance separating their respective compounds. All spatial modelling was conducted using the glgm function from the ‘geostatsp’ package in R [31, 32]. In order to gain additional insight into the variable relationships of interest, five different combinations of selected candidate variables were modelled separately for each outcome.
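The model structure described above can be written compactly. The following is a sketch only: the indices (compound i at location s_i, child j), the symbols U and V, and the particular Matern parameterization are assumptions introduced for illustration, and software packages may scale the range parameter φ differently.

```latex
% Linear predictor: covariates + spatial random effect + compound-level random effect
\operatorname{logit}(p_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + U(s_i) + V_i

% Spatial correlation depends only on the distance h between two compounds
\operatorname{corr}\{U(s), U(s')\} = \rho(\lVert s - s' \rVert)

% One common Matern form, with shape parameter \kappa fixed at 2 as in the text
\rho(h) = \frac{2^{1-\kappa}}{\Gamma(\kappa)}
          \left(\frac{h}{\phi}\right)^{\kappa}
          K_{\kappa}\!\left(\frac{h}{\phi}\right), \qquad \kappa = 2
```

Here K_κ is the modified Bessel function of the second kind, and exponentiating the linear predictor gives the odds of infection.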
Models 1–3 included independent variables grouped by measurement level. Model 1 included individual-level variables only: baseline child age, sex, weight-for-length z-score and length-for-age z-score, and baseline iron status (ferritin concentration). Age in months was calculated using the reported date of birth and trial enrolment date. The age variable was included in all models with a change point at 24 months, as this was the closest half-year to the mean age of those children who were no longer receiving breast milk (mean = 26.8 months ± 5.8, n = 746). Similar age variable definitions have been used in other studies of iron deficiency and anaemia in children [33, 34]. Model 2 included only household-level variables: asset score, maternal education, and distance from each compound to the nearest health facility. Household asset score was generated using a principal component analysis of six economic indicators (farm ownership, size and type of crops grown, type of toilet facility, house ownership). For descriptive purposes, asset score was dichotomized at the median; however, it was modelled as a continuous variable. Maternal education was included as a binary variable, representing ‘none’ (0) versus ‘any’ (1) level of education (e.g., primary, middle, secondary or higher). Distance to the nearest health facility (an indicator of access to the health care system) was measured ‘as the crow flies’ (straight-line or Euclidean distance) using the near table tool in ArcMap (ArcGIS 10.2, Environmental Systems Research Institute, Redlands, CA, USA). Five satellite-derived variables were included in Model 3: elevation, land cover type (LC), NDVI, and two NDVI-LC interaction terms. Elevation was included as a proxy for temperature [35], and ranged across the trial area from 116 to 530 m. Elevation values were centred by subtracting 250 m before including them in the analyses.
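Two of the covariate preparations described above, the straight-line distance to the nearest health facility and the centring of elevation, can be illustrated with a minimal sketch. This is not the ArcMap workflow used in the study: the function names and example coordinates are hypothetical, and coordinates are assumed to be in a projected system measured in metres.

```python
import math

ELEVATION_CENTRE_M = 250.0  # centring constant stated in the text


def nearest_facility_distance(compound_xy, facilities_xy):
    """Straight-line (Euclidean) distance from a compound to its closest facility."""
    cx, cy = compound_xy
    return min(math.hypot(cx - fx, cy - fy) for fx, fy in facilities_xy)


def centre_elevation(elevation_m):
    """Centre elevation by subtracting the fixed constant of 250 m."""
    return elevation_m - ELEVATION_CENTRE_M


# Hypothetical projected coordinates in metres (not study data)
compound = (1000.0, 2000.0)
facilities = [(1300.0, 2400.0), (5000.0, 5000.0)]

distance_m = nearest_facility_distance(compound, facilities)
centred_max = centre_elevation(530.0)  # highest elevation reported in the text
```

Centring at 250 m means the model intercept refers to a compound at that elevation rather than at sea level.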
Land cover type was a discrete categorical variable consisting of three values: woody savannah (LC = 8, n = 21/1943 observations), urban and built-up land (LC = 13, n = 243/1943 observations), and cropland/natural vegetation mosaic (LC = 14, n = 1679/1943 observations). In all analyses, the largest category (cropland/natural vegetation mosaic) was used as the reference. Given that the Ghana trial was conducted during the rainy season, rainfall was not expected to vary substantially across the study area, and thus was not included as a spatial variable. Rather, NDVI (a measure of ‘greenness’) was included as an indicator of water accumulation potential or soil moisture [16]. NDVI values were averaged over the year that the study was conducted (2010) in a single raster file, and ranged from 0.22 to 0.62. An interaction term for NDVI and LC was created by, first, using the NDVI raster to mask the LC raster except in areas where LC had a cell value of 8 (woody savannah). The unmasked cells were then given a value of 0. The same method was also used to create the NDVI-LC interaction term for LC values of 13 (urban and built-up land). The new rasters for the interaction terms were then included in the analyses to investigate whether the association between the dependent variable (infection status) and vegetation (or soil moisture) varied across areas with or without a woody savannah or urban/built-up land cover type. The final model (Model 4) combined selected variables from Models 1–3, including age, sex, weight-for-length z-score, length-for-age z-score, baseline iron status (serum ferritin corrected for CRP using the regression method and re-scaled by multiplying each corrected value by the inverse of the inter-quartile range), asset score, distance to the nearest health facility, and elevation. Variable selection was informed by exploratory descriptive analyses using generalized additive models, linear regression modelling, and simulation analyses. 
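The construction of the NDVI-LC interaction rasters can be sketched as follows, under one reading of the masking step: each interaction raster holds the NDVI value in cells with the land cover class of interest and 0 elsewhere. The function name and the small example grids are hypothetical, and plain nested lists stand in for raster files.

```python
WOODY_SAVANNAH = 8    # land cover codes as given in the text
URBAN_BUILT_UP = 13


def ndvi_lc_interaction(ndvi, land_cover, lc_value):
    """Keep NDVI where land cover equals lc_value; set all other cells to 0."""
    return [
        [n if lc == lc_value else 0.0 for n, lc in zip(ndvi_row, lc_row)]
        for ndvi_row, lc_row in zip(ndvi, land_cover)
    ]


# Hypothetical 2 x 2 rasters (not study data)
ndvi = [[0.22, 0.40], [0.55, 0.62]]
land_cover = [[14, 8], [13, 14]]

ndvi_woody = ndvi_lc_interaction(ndvi, land_cover, WOODY_SAVANNAH)
ndvi_urban = ndvi_lc_interaction(ndvi, land_cover, URBAN_BUILT_UP)
```

With cropland/natural vegetation mosaic as the reference class, each interaction raster lets the NDVI slope differ within one non-reference land cover type.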
As a confirmatory modelling step, a ‘maximal’ model (Model 5) was also developed and included the same variables as the final model with the addition of maternal education, NDVI, LC, and the two NDVI-LC interaction terms. The maximal model provided an opportunity to investigate variable relationships of interest that had been left out of the final model to preserve statistical power. As such, there was a higher risk of over-parameterization, and thus the findings from Model 5 were interpreted with caution and used mainly for hypothesis generation. In all models with individual-level variables, ferritin concentration was corrected for CRP using a regression-based method (Namaste et al., pers. comm.). The advantage of the regression method is that it can correct ferritin for CRP without requiring the use of pre-determined cut-offs (which can vary across the literature partly due to the detection limits of analytical equipment used) and, therefore, better accounts for the linear relationship between inflammation and ferritin. The first step in the correction approach was to natural logarithm (ln)-transform the ferritin and CRP concentrations to approximate a normal distribution. Zero values for CRP were replaced with a constant, near-zero value (0.02 mg/L) before ln transformation. A linear regression coefficient for CRP was obtained using univariate modelling with ferritin as the outcome. A reference value of 0.104 mg/L, representing little or no inflammation, was subtracted from the ln-CRP concentrations in the regression equation. The reference value was obtained from a meta-analysis of data from the Biomarkers Reflecting Inflammation and Nutrition Determinants of Anemia (BRINDA) study, involving 27,865 pre-school aged children across 15 countries [36]. The correction was then applied only to ln-CRP values that were greater than the ln-CRP reference in order to avoid over-adjustments.
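The correction steps described above can be sketched in a few lines. This is an illustration rather than the authors' code, and it rests on two assumptions: the adjustment is applied on the ln scale as ln(ferritin) minus β1 times (ln CRP minus ln CRP reference), and the 0.104 mg/L reference is ln-transformed before subtraction. The function names and example values are hypothetical.

```python
import math

CRP_ZERO_REPLACEMENT = 0.02  # mg/L, replaces zero CRP values (from the text)
CRP_REFERENCE = 0.104        # mg/L, reference value (from the text)


def ln_crp(crp_mg_l):
    """ln-transform CRP after replacing zeros with the near-zero constant."""
    value = crp_mg_l if crp_mg_l > 0 else CRP_ZERO_REPLACEMENT
    return math.log(value)


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def adjust_ferritin(ferritin, crp, beta1):
    """Subtract the CRP effect on the ln scale (assumed form), then back-transform."""
    excess = ln_crp(crp) - math.log(CRP_REFERENCE)
    if excess <= 0:
        # No correction at or below the reference, to avoid over-adjustment
        return ferritin
    return math.exp(math.log(ferritin) - beta1 * excess)


# Hypothetical values (not study data): ferritin in ug/L, CRP in mg/L
ferritin = [20.0, 35.0, 60.0, 90.0]
crp = [0.0, 0.5, 5.0, 20.0]

beta1 = ols_slope([ln_crp(c) for c in crp], [math.log(f) for f in ferritin])
adjusted = [adjust_ferritin(f, c, beta1) for f, c in zip(ferritin, crp)]
```

In this sketch the first child, whose CRP falls below the reference after zero replacement, keeps the unadjusted ferritin value, while children with elevated CRP have their ferritin lowered.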
The adjusted ferritin equation was calculated by subtracting the influence of CRP as follows (with ferritin and CRP on the ln scale):

\[ \text{Ferritin}_{\text{adjusted}} = \text{Ferritin}_{\text{NB}} - \beta_1 \left( \text{CRP}_{\text{obs}} - \text{CRP}_{\text{ref}} \right) \]

where ‘NB’ is the actual value of ferritin, β1 is the CRP coefficient, ‘obs’ is the raw observations for CRP, and ‘ref’ is the reference value. Maps of predicted infection probabilities (odds ratios) and residual spatial variation from the final model (Model 4) were plotted and overlaid with a base map of the trial area. The residual spatial variation plot represented the posterior mean of the spatial random effect, corresponding to the difference between the predicted and expected odds of infection at each location (given the spatial covariates at each location). Individual-level non-spatial variables and effect sizes did not contribute to the plots. For example, an odds ratio of 1.5 indicated that all individuals living at a particular location had a 50 % higher risk of infection compared to similar individuals (e.g. in terms of age, sex, iron status) living in an area where the relative risk was 1.0. On the other hand, if two dissimilar individuals (e.g. with different ages) lived at the same location, they had different infection risks; however, both ratios (e.g. risk divided by ‘typical risk’ for their respective ages) were identical. All model output plots had a spatial resolution of 380 m by 380 m per cell. These plots were visually compared to each other and to relevant satellite-derived maps (e.g., elevation) in order to generate potential explanations for the spatial patterns observed.
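The interpretation of the mapped ratios can be checked with a small numerical sketch. It assumes that odds are the exponential of a linear predictor in which the location effect is added to the individual-level contribution; the variable names and values are hypothetical.

```python
import math


def odds(linear_predictor):
    """Odds of infection implied by a log-odds (logit-scale) linear predictor."""
    return math.exp(linear_predictor)


# Hypothetical location whose mapped odds ratio is 1.5
location_effect = math.log(1.5)

# Hypothetical individual-level contributions (log-odds) for two dissimilar children
eta_child_a = -1.0
eta_child_b = 0.4

# Odds at this location divided by the 'typical' odds for a similar child
ratio_a = odds(eta_child_a + location_effect) / odds(eta_child_a)
ratio_b = odds(eta_child_b + location_effect) / odds(eta_child_b)
```

The two children have different odds of infection, but both ratios equal the location's odds ratio, which is why individual-level variables do not contribute to the plots.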