Using Open-Access Data to Explore Relations between Urban Landscapes and Diarrhoeal Diseases in Côte d’Ivoire

listen audio

Study Justification:
– Landscape features are often overlooked as predictors of diarrhoeal diseases, despite their potential impact on health outcomes.
– Understanding the relationship between landscape patterns and diarrhoea can provide valuable insights for urban planners and public health managers in rapidly urbanizing areas like Côte d’Ivoire.
– This study aimed to build a framework using open-access data and open-source software to investigate diarrhoea risk factors originating from the physical environment and to explore the association between different types of urban settlements and diarrhoea prevalence.
Study Highlights:
– The study used a large ensemble of open-access data, including health, land cover, and basic infrastructure data, to conduct an ecological analysis.
– Geospatial data on diarrhoea occurrence among children under five years old were obtained from the Demographic and Health Surveys (DHS) program.
– Other data sources included weather conditions from Terraclimate, night illumination from NASA’s Earth Observatory, land cover data from the European Space Agency’s Land Cover Climate Change Initiative Project, and population estimates from WorldPop.
– Landscape metrics were calculated using PyLandStats to quantify and describe spatial patterns of urban settlements.
– Regression models were used to assess the association between landscape metrics and diarrhoea prevalence, while controlling for access to water, sanitation, education, and spatial autocorrelation.
Recommendations for Lay Readers:
– The study found a significant association between a specific urban landscape pattern and diarrhoea prevalence.
– Improving water, sanitation, and hygiene infrastructures is crucial for preventing diarrhoeal diseases, but the overall physical environment also plays a role in health outcomes.
– Urban planners and public health managers should consider the impact of landscape features on diarrhoea risk and incorporate this knowledge into their decision-making processes.
– Open-access data and open-source software can be valuable tools for investigating the relationship between landscape patterns and health outcomes.
Recommendations for Policy Makers:
– Policy makers should prioritize the improvement of water, sanitation, and hygiene infrastructures to prevent diarrhoeal diseases.
– However, efforts to improve infrastructure may be less effective if the overall physical environment remains precarious.
– Policy interventions should consider the impact of landscape features on health outcomes and aim to create healthier urban environments.
– Collaboration between urban planners, public health managers, and other key stakeholders is essential to address the complex relationship between landscape patterns and diarrhoea prevalence.
Key Role Players:
– Urban planners
– Public health managers
– Researchers
– Data analysts
– Government officials
– Non-governmental organizations (NGOs)
– Community leaders
– Health practitioners
Cost Items for Planning Recommendations:
– Data collection and analysis
– Research personnel and expertise
– Software and technology infrastructure
– Training and capacity building
– Stakeholder engagement and collaboration
– Implementation of infrastructure improvements
– Monitoring and evaluation of interventions
– Public awareness campaigns and education initiatives

The strength of evidence for this abstract is 8 out of 10.
The evidence in the abstract is strong because the study used a large ensemble of open-access data and conducted an ecological analysis to identify specific landscape features associated with diarrhoea. The study also used multiple regression models to test the association between landscape patterns and the prevalence of diarrhoea. However, to improve the evidence, the abstract could provide more information on the sample size, statistical significance of the findings, and potential limitations of the study.

Unlike water and sanitation infrastructures or socio-economic indicators, landscape features are seldomly considered as predictors of diarrhoea. In contexts of rapid urbanisation and changes in the physical environment, urban planners and public health managers could benefit from a deeper understanding of the relationship between landscape patterns and health outcomes. We conducted an ecological analysis based on a large ensemble of open-access data to identify specific landscape features associated with diarrhoea. Designed as a proof-of-concept study, our research focused on Côte d’Ivoire. This analysis aimed to (i) build a framework strictly based on open-access data and open-source software to investigate diarrhoea risk factors originating from the physical environment and (ii) understand whether different types and forms of urban settlements are associated with different prevalence rates of diarrhoea. We advanced landscape patterns as variables of exposure and tested their association with the prevalence of diarrhoea among children under the age of five years through multiple regression models. A specific urban landscape pattern was significantly associated with diarrhoea. We conclude that, while the improvement of water, sanitation, and hygiene infrastructures is crucial to prevent diarrhoeal diseases, the health benefits of such improvements may be hampered if the overall physical environment remains precarious.

This cross-sectional, ecological study combined open-access data on health, land cover, and basic infrastructures, readily obtained from different sources (Table 1). Geospatial data on the occurrence of diarrhoea among children under the age of five years can be obtained from the Demographic and Health Surveys (DHS) programme in vector format [28]. The DHS provides anonymised survey data at the individual and household levels that include cases of diarrhoea that had occurred in the 2 weeks preceding their survey. The DHS also provides data on access to WASH facilities and education, which were relevant to the current analysis. DHS data can be georeferenced by linking the observations to the point locations of survey clusters. To ensure anonymity, these locations do not correspond to the precise locations of participating households, but rather to the mean location of households belonging to a same cluster—i.e., for every cluster of surveyed households, there is one single GPS position that is attributed to all households belonging to the same cluster. For the datasets used in this analysis [29,30,31,32], there are 351 clusters distributed across Côte d’Ivoire, for a total of 9686 surveyed households. These clusters contain a median number of 27 households, or 134 people. Models of weather conditions can be obtained from Terraclimate in raster format, at a spatial resolution of ~4 × 4 km [33], and these data were included in our study because climatic conditions can be associated with diarrhoea [34]. Night illumination was used as a proxy for the presence of urban infrastructures, and can be obtained in raster format (500 × 500 m) from the National Aeronautics and Space Administration’s (NASA) Earth Observatory [35]. Land cover data can be obtained in raster format (300 × 300 m) from the European Space Agency’s Land Cover Climate Change Initiative Project (ESA Land Cover CCI) [36]. Population estimates at high spatial resolutions are provided by WorldPop [37], also in raster format (100 × 100 m). Finally, vector data on mobility infrastructures can be obtained from OpenStreetMap. List of open-access datasets used to explore relations between landscape and diarrhoea in Côte d’Ivoire. 1 shapefile. Combining health and environmental data is often a challenge, considering the differences in terms of the spatial and temporal resolutions of openly available data [38]. In this study, the areal units used to aggregate data were based on the geolocations of DHS clusters, which had the coarsest spatial resolution. All data processing steps were done in Python language, using the Jupyter computing environment. The following packages were used to process and visualise the data: PyLandStats 2.3.0 [39], rasterstats 0.15.0 [40], rasterio 1.2.9 [41], earthpy 0.9.2 [42], Fiona 1.8.20 [43], pandas 1.3.4 [44], geopandas 0.9.0 [45], numpy 1.20.3 [46], statsmodels 0.13.0 [47], pysal 2.5.0 [48], matplotlib 3.4.3 [49], and seaborn 0.11.2 [50]. The different data layers were harmonised spatially and temporally based on the buffer areas generated from DHS clusters, hereinafter called “spatial units”. Based on previous studies [51], we generated circular buffer zones originating from each cluster centroid, with radii based on the geographic blur determined by the DHS data anonymisation protocol: urban clusters had a buffer radius of 2 km, while rural clusters had a buffer radius of 5 km (Figure 1). The different data layers were aggregated at the cluster level: environmental data were aggregated based on the respective buffer areas, while DHS survey data were aggregated based on the clusters’ unique identifiers. In this way, for each spatial unit we obtained the following: (i) the prevalence of diarrhoea among children under the age of five; (ii) the proportion of the cluster’s population with access to at least “basic” water and sanitation, as defined by the World Health Organisation (WHO) and the United Nations Children’s Fund (UNICEF) Joint Monitoring Programme [25]; (iii) the proportion of the cluster’s female population who never attended school; (iv) local climatic conditions; (v) a list of landscape metrics derived from remotely sensed data (NASA and ESA Land Cover CCI), hybrid models (WorldPop), and volunteered geographic information (OpenStreetMap). Table 2 shows the variables obtained from this pre-processing in more detail. Buffer areas (red circles generated from DHS clusters’ centroids) were used as reference areas to calculate landscape metrics. Elaborated by the authors with QGIS, from: DHS [28] and ESA Land Cover CCI [36]. © ESA Climate Change Initiative—Land Cover led by UCLouvain (2017). List of variables calculated for each spatial unit (variables aggregated by buffer area). 1 As defined by the WHO and UNICEF Joint Monitoring Programme. 2 Aged between 15 and 49 years. 3 Applied to all land cover categories. Landscape metrics have been widely used to quantify and describe spatial patterns of “patches” of similar land cover categories [11,14,39], and thus are useful to analyse spatial patterns of urban settlements. These metrics were key features to our study, allowing us to relate the prevalence of diarrhoea (dependent variable) to indicators describing the form and composition of urban settlements (independent variables). We used the Python package PyLandStats [39] to calculate a series of landscape metrics for each land cover class contained within each spatial unit’s perimeter (Table 2), based on the data provided by ESA’s Land Cover Climate Change Initiative Project [36]. The metrics employed here—i.e., the proportion, edge and shape index of land cover patches—were based on previous studies that referred to land cover data to analyse spatial patterns of urban settlements [14] and environmental determinants of disease [18]. We added a level of detail to our landscape metrics by reclassifying patches originally categorised as “urban”, based on the levels of night illumination and demographic density (Figure 2 and Figure 3). Our study used the intensity of night illumination as a proxy for the presence of urban infrastructures—or, in other words, for the “quality” of urbanisation. Moreover, the quality of illumination affects the use of WASH amenities, especially by females [52], and thus may impact the risk of diarrhoea. We defined “precarious” and “regular” urban areas, building on the hypothesis that, if basic infrastructures are present, the level of illumination follows the level of demographic density. In this sense, urban pixels were considered “precarious” if they presented a high demographic density but a low (or relatively low) intensity of night illumination; on the other hand, “regular” urban patches had night illumination levels that matched the demographic density. This categorisation was done based on quantiles (Figure A1, Appendix A): for each urban pixel, if its density value was situated in a higher quantile than its night illumination value, it was considered “precarious”. Layers used to reclassify urban patches based on demographic density and night illumination (zoom on Abidjan’s metropolitan area). Elaborated by the authors with QGIS, from: NASA [35], ESA Land Cover CCI [36] and WorldPop [37]. © ESA Climate Change Initiative—Land Cover led by UCLouvain (2017). Reclassified urban patches (zoom on Abidjan’s metropolitan area). Elaborated by the authors with QGIS, from: NASA [35], ESA Land Cover CCI [36] and WorldPop [37]. © ESA Climate Change Initiative—Land Cover led by UCLouvain (2017). Another indicator of the quality of urbanisation was the presence of roads, for which we calculated two indicators: road availability (km of roads per ha of built-up area) and linearity (ratio between edge length and linear distance between the two vertices of the same edge). The latter also served as an indicator of the urban form. Finally, based on the literature, a selection of features that have been associated with diarrhoea were included as control variables. Basic water and sanitation services are key to prevent diarrhoea [2], and were therefore included. Access to these services was measured using the DHS household datasets [30]. Maternal education has been associated with a lower risk of diarrhoea [53]; at the same time, it has been related with reporting bias, as households with higher levels of maternal education have shown increased rates of reported child diarrhoea [54]. A proxy variable of maternal education (i.e., women’s educational attainment) was therefore added, based on individual DHS data [31]. Climatic conditions also have been associated with diarrhoea [34], but here they showed no significant correlation (see Table A3, Appendix D). Hence, these data were discarded (for more details, see the computer code section at the end of this article). To preserve some level of detail in the data, the variables resulting from the aggregation at the cluster level consisted of proportions (percentage) rather than simple means or medians. For example, demographic density in each spatial unit was given by the ratio between the number of built-up pixels classified as “dense” (with a value higher than the statistical series’ median) and the total number of built-up pixels contained in the respective spatial unit. By the end of the pre-processing, we obtained a geo-dataset with 351 spatial units. Each spatial unit had three control variables (water, sanitation, and women’s educational attainment) and a total of 90 independent variables, i.e., the landscape metrics described in Table 2 (the full list of variables is given in Table S1, see Supplementary Materials). We used four different regression models to assess the significance of the associations between the calculated landscape metrics and diarrhoea, while accounting for the effects of access to water, sanitation, and education, as well as spatial autocorrelation. The input data were standardised with a min/max scaler function, so that the effects of the different features could be compared through their coefficients. We started by running a feature selection algorithm to identify the most important variables to be included in the regression models. Given the high number of independent variables that were derived from the landscape metrics (n = 90), a preliminary feature selection was necessary to avoid multicollinearity and to clarify the scope of the analysis. The feature selection algorithm was composed of two “filters”. The first filter was a stepwise selection, which is a process that adds variables from a predefined list to the model, one-by-one, rechecking at each step the importance of all previously included variables [55]. In other words, the stepwise process combines forward and backward feature selection processes, consisting of iterative linear regressions allowing to identify the “best” features based on predefined significance thresholds (maximum p-value was set to 0.1) and model performance (residual sum of squares). The second and final filter was based on Spearman’s rank correlations (ρ): we used the Python package statsmodels [47] to calculate bivariate correlations between each feature selected with the stepwise method and the dependent variable (i.e., prevalence of diarrhoea at cluster level); only features with a p-value smaller than 0.1 were kept. We purposively opted for relative high thresholds of p-values because of the exploratory nature of this study. Once we concluded the feature selection, we ran both weighted and unweighted regression models. First, we built an unweighted ordinary least squares (OLS) model containing the dependent, control, and independent variables, as well as a constant. Then, we built a weighted model with the same features using the cluster weights given by the DHS. In fact, when conducting country-level analyses, the DHS suggests using cluster weights to adjust for eventual biases resulting from their sampling method. Given the infectious nature of diarrhoeal diseases, the analysis also needed to account for spatial dependence [56]. To this end, we used the Python package pysal [48] to run two models of spatial regression with the same features, spatial lag and spatial error, as explained in Section 2.4. Spatial dependence, or spatial autocorrelation, is the phenomenon by which values of observations are associated with each other through geographic distance (e.g., high values close to other high values) [57]. Accounting for spatial dependence is essential because linear regressions assume a normal, random distribution of error terms and the absence of spatial autocorrelation in the dependent variable. We estimated the probability of spatial autocorrelation in the dependent variable by calculating the global Moran’s I, which indicated whether the observed values of prevalence of diarrhoea were clustered, or randomly distributed, in space. As for the error terms in the OLS regression, we detected the probability of spatial dependence through the Lagrange multiplier test for spatial error. Contrary to an OLS model, spatial regressions can account for the spatial autocorrelation of the dependent variable (spatial lag dependence) and of the error term (spatial error dependence) [58]. The spatial lag model used in this study incorporates the spatial autocorrelation of the dependent variable by introducing the average values of neighbours as an additional variable into the regression specification (Equation (1)): where y is an N × 1 vector of observations on a dependent variable taken at each of N locations, α is the intercept, ρ is a scalar spatial lag parameter, W is an N × N matrix of weights indicating the spatial framework of neighbourhood effects among the dependent variable, X is an N × k matrix of explanatory variables, β is a k × 1 vector of parameters, and ε is an N × 1 vector of error terms. Similarly, the spatial error model used in this study also incorporates spatial autocorrelation by introducing the average values of neighbours as an additional variable into the regression specification, but this time using the values of the error terms (Equation (2)): where u is the vector of spatially autocorrelated residuals with constant variance and covariance terms, specified by an N × N matrix of weights indicating the spatial framework of neighbourhood effects among the error terms (W) and a spatial error coefficient (λ). The unit of analysis was the buffer area generated from each DHS cluster’s centroid (spatial unit). Out of the 351 spatial units, 10 were excluded as they did not have valid geographic coordinates. Furthermore, because our analysis focused on human settlements, we opted to keep only those spatial units with at least 1 “urban” pixel (300 × 300 m). Hence, we excluded 74 spatial units where no human occupation was detected—including units with settlements not sufficiently large to be detected at the spatial resolution used here. In the end, 267 spatial units (out of 351) were included in our regression analyses. Details about the discarded units are given in Table A1 (Appendix B). To determine whether the size and proportion of urban areas affected the association between landscape features and diarrhoea, the processes described in Section 2.3 were stratified into two levels. First, we conducted the regression analyses with all the 267 spatial units that met our inclusion criteria. Then, we conducted the same analyses with an “urban” subset, which contained 105 spatial units. The criterion for a spatial unit to be classified as “urban” was to have a proportion of urbanised area (ratio between the surface of “urban” pixels and the spatial unit’s total area) that was above the average of the retained 267 spatial units. Figure 4 shows the location of the 267 spatial units included in the analysis, specified by subset. Location of DHS clusters, by subset. Elaborated by the authors with QGIS, from: DHS [28], GADM and OpenStreetMap.

Based on the provided information, here are some potential innovations that can be used to improve access to maternal health:

1. Open-access data analysis: Utilize open-access data to explore the relationship between urban landscapes and maternal health outcomes. This can help urban planners and public health managers gain a deeper understanding of how landscape patterns impact maternal health.

2. Integration of multiple data sources: Combine data from various sources, such as Demographic and Health Surveys (DHS), Terraclimate, NASA’s Earth Observatory, European Space Agency’s Land Cover Climate Change Initiative Project (ESA Land Cover CCI), WorldPop, and OpenStreetMap. This integration allows for a comprehensive analysis of factors affecting maternal health.

3. Geospatial analysis: Use geospatial analysis techniques to georeference and analyze the data. This can help identify specific geographic areas with higher prevalence rates of maternal health issues, allowing for targeted interventions and resource allocation.

4. Landscape metrics: Calculate landscape metrics derived from remotely sensed data to quantify and describe spatial patterns of urban settlements. These metrics can provide insights into the form and composition of urban areas and their association with maternal health outcomes.

5. Regression modeling: Apply regression models, such as ordinary least squares (OLS) and spatial regression models, to assess the significance of associations between landscape metrics and maternal health outcomes. These models can account for control variables, spatial autocorrelation, and potential biases in the data.

6. Feature selection algorithms: Use feature selection algorithms to identify the most important variables to be included in the regression models. This helps avoid multicollinearity and provides clarity on the scope of the analysis.

7. Weighted regression models: Consider weighted regression models that account for cluster weights given by the DHS. This adjustment helps address potential biases resulting from the sampling method used in the surveys.

8. Stratification of analysis: Stratify the analysis based on the size and proportion of urban areas. This allows for a more focused examination of the association between landscape features and maternal health outcomes in urban settings.

By implementing these innovations, policymakers and healthcare professionals can gain valuable insights into the relationship between urban landscapes and maternal health, leading to targeted interventions and improved access to maternal healthcare services.
AI Innovations Description
The recommendation that can be developed into an innovation to improve access to maternal health based on the provided description is to use open-access data and open-source software to investigate the relationship between landscape patterns and the prevalence of diarrhoea among children under the age of five. This approach can provide urban planners and public health managers with a deeper understanding of how different types and forms of urban settlements are associated with different rates of diarrhoea. By analyzing landscape features as variables of exposure, it is possible to identify specific patterns that are significantly associated with diarrhoea. This information can then be used to inform interventions and improve the overall physical environment to prevent diarrhoeal diseases.
AI Innovations Methodology
Based on the provided description, here are some potential recommendations for improving access to maternal health:

1. Enhance urban planning: Urban planners and public health managers should consider the relationship between landscape patterns and health outcomes, particularly in rapidly urbanizing areas. By understanding how specific landscape features are associated with maternal health issues, such as diarrhoeal diseases, planners can design urban environments that promote better health outcomes.

2. Improve water, sanitation, and hygiene (WASH) infrastructures: While the study highlights the importance of considering landscape patterns, it also emphasizes that improvements in WASH infrastructures are crucial for preventing diarrhoeal diseases. Therefore, investing in the development and maintenance of clean water sources, proper sanitation facilities, and hygiene education can significantly improve maternal health.

3. Address the overall physical environment: The study suggests that even with improvements in WASH infrastructures, the health benefits may be limited if the overall physical environment remains precarious. This recommendation highlights the need to address broader environmental factors, such as access to healthcare facilities, transportation, and safe living conditions, to ensure comprehensive improvements in maternal health.

To simulate the impact of these recommendations on improving access to maternal health, a methodology could be developed as follows:

1. Data collection: Gather relevant data on maternal health indicators, landscape patterns, WASH infrastructures, and other environmental factors. This data can be obtained from various sources, such as surveys, open-access databases, and remote sensing technologies.

2. Data preprocessing: Clean and preprocess the collected data to ensure consistency and compatibility. This may involve georeferencing, aggregating data at appropriate spatial units, and standardizing variables for analysis.

3. Feature selection: Use a feature selection algorithm to identify the most important variables related to maternal health outcomes. This step helps avoid multicollinearity and focuses the analysis on the most relevant factors.

4. Regression modeling: Build regression models to assess the significance of the associations between the selected variables and maternal health outcomes. Consider both unweighted and weighted models, accounting for the effects of access to WASH infrastructures and education, as well as spatial autocorrelation.

5. Spatial analysis: Incorporate spatial analysis techniques to account for spatial autocorrelation and assess the impact of landscape patterns on maternal health outcomes. This may involve spatial lag and spatial error models, which consider the spatial relationships between neighboring areas.

6. Evaluation and interpretation: Evaluate the regression models and spatial analysis results to understand the impact of the recommendations on improving access to maternal health. Interpret the coefficients and statistical significance of the variables to identify the most influential factors.

7. Scenario analysis: Conduct scenario analysis to simulate the potential impact of implementing the recommendations. This can involve manipulating variables related to urban planning, WASH infrastructures, and the overall physical environment to assess their effects on maternal health outcomes.

8. Policy recommendations: Based on the simulation results, provide policy recommendations for improving access to maternal health. These recommendations should consider the identified influential factors and their potential impact on maternal health outcomes.

By following this methodology, policymakers and stakeholders can gain insights into the potential impact of different recommendations on improving access to maternal health. This information can guide decision-making and resource allocation to effectively address maternal health challenges.

Share this:
Facebook
Twitter
LinkedIn
WhatsApp
Email