Background: Commercial geospatial data resources are frequently used to understand healthcare utilisation. Although there is widespread evidence of a digital divide for other digital resources and infrastructure, it is unclear how commercial geospatial data resources are distributed relative to health need.
Methods: To examine the distribution of commercial geospatial data resources relative to health needs, we assembled coverage and quality metrics for commercial geocoding, neighbourhood characterisation, and travel time calculation resources for 183 countries. We developed a country-level composite index of commercial geospatial data quality/availability and examined its distribution relative to age-standardised all-cause and cause-specific mortality (for three main groups of causes of death) using two inequality metrics, the slope index of inequality and the relative concentration index. In two sub-national case studies, we also examined geocoding success rates versus area deprivation by district in Eastern Region, Ghana and Lagos State, Nigeria.
Results: Internationally, commercial geospatial data resources were inversely related to all-cause mortality. This relationship was more pronounced when examining mortality due to communicable diseases. Commercial geospatial data resources for calculating patient travel times were more equitably distributed relative to health need than resources for characterising neighbourhoods or geocoding patient addresses. Countries such as South Africa have comparatively high commercial geospatial data availability despite high mortality, whilst countries such as South Korea have comparatively low data availability and low mortality. Sub-nationally, evidence was mixed as to whether geocoding success was lowest in more deprived districts.
Conclusions: To our knowledge, this is the first global analysis of commercial geospatial data resources in relation to health outcomes.
In countries such as South Africa, where there is high mortality but also comparatively rich commercial geospatial data, these data are a potential resource for examining healthcare utilisation that requires further evaluation. In countries such as Sierra Leone, where there is high mortality but minimal commercial geospatial data, alternative approaches such as the use of open data are needed for quantifying patient travel times, geocoding patient addresses, and characterising patients’ neighbourhoods.
In this paper, we aim to quantify the extent to which the same perverse relationship with health needs applies to geospatial data availability as to healthcare provision. We explore this at two scales using a cross-sectional, ecological study design. We first examine the relationship between geospatial data availability and health need, as measured by all-cause mortality and mortality due to three groups of causes, globally at national level. We then consider the relationship between health need and geospatial data availability in two sub-national case studies from Ghana and Nigeria.
At international level, we examine the availability, by country, of three sets of commercial data resources that are central to understanding population demand for healthcare and spatial patterns of healthcare utilisation. These are geocoding tools for locating patients’ residences; transportation network resources for computing patient travel from place of residence to health facility; and area statistics for characterising the neighbourhoods where patients live. We excluded other commercial geospatial data resources not directly related to healthcare-seeking behaviour, such as remotely sensed imagery. To identify relevant resources, we used the search strategy in Additional file 1: Table S1. We included only geospatial data resources that met the following criteria:
Where necessary, we contacted data providers to request permission to use data availability or quality statements in our analysis, only including those where such permission was granted. The geospatial resources that met all these criteria and were included in our analysis are shown in Table 1 (Additional file 1: Tables S2–S4 document the data resources that were excluded and the reasons for their exclusion).
Table 1. Commercial geospatial data resources for geocoding patient addresses, estimating travel times, and characterising patients’ neighbourhoods
Alongside these resources, we used all-cause mortality by country for the most recent period (2000–2015) reported by the World Health Organisation (WHO) [22] as a general health outcome measure and thereby a metric of healthcare need. We also separately examined the three major WHO categories of mortality for 183 countries: non-communicable diseases; injuries; and communicable diseases, maternal, perinatal, and nutritional conditions. National mortality data from WHO were age-standardised to account for differences in population structure between countries. As dependent territories are not reported separately in WHO mortality data, these were excluded from our analysis. We then generated commercial geospatial resource indicators by country as follows:
To examine the availability of these geospatial resources relative to healthcare need, as measured by standardised all-cause mortality and cause-specific mortality, we computed relative concentration indices and slope indices of inequality [23] for each of these measures of geospatial data availability using a tool from Public Health England [24]. In this context, the slope index of inequality measured the change in mortality relative to ranked geospatial data availability/quality, whilst the relative concentration index measured the mortality gradient against relative geospatial data availability/quality.
We also created a composite index of commercial geospatial resource quality/availability (geospatial resource index) by combining these various indicators. For each of the three index domains (geocoding resources, patient travel, and neighbourhood characterisation), we ranked each country from highest to lowest based on each of the above indicators, then summed these ranks, dividing the total by the maximum possible summed rank to give an index for each domain between 0 and 1.
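The rank-based domain index described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the input layout, and the averaging of tied ranks are assumptions made for illustration. Because rank 1 goes to the highest indicator value, a lower index in this sketch corresponds to better data availability; the orientation used in the study is not stated here.

```python
from typing import Dict


def rank_descending(values: Dict[str, float]) -> Dict[str, float]:
    """Rank countries from highest (rank 1) to lowest value.

    Tied countries share the average of the ranks they span
    (an assumption; the tie rule is not given in the text).
    """
    ordered = sorted(values, key=lambda c: values[c], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every country tied with position i.
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def domain_index(indicators: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum each country's ranks over a domain's indicators and divide by
    the maximum possible summed rank (countries x indicators).

    `indicators` maps a hypothetical indicator name to {country: value};
    every indicator must cover the same set of countries.
    """
    countries = list(next(iter(indicators.values())))
    max_summed_rank = len(countries) * len(indicators)
    summed = {c: 0.0 for c in countries}
    for values in indicators.values():
        ranks = rank_descending(values)
        for c in countries:
            summed[c] += ranks[c]
    return {c: summed[c] / max_summed_rank for c in countries}
```

Each domain index produced this way lies in the interval (0, 1], since the smallest possible summed rank is one per indicator rather than zero.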
To avoid the index being dominated by indicator availability at domain level, we then summed the three domain index values. We regressed logged standardised mortality against the geospatial resource index, identifying as outliers in terms of data availability those countries with studentised residuals greater than two. We also calculated the correlation of the geospatial resource index with the percentage of internet users and gross domestic product (GDP) per capita for 2016 in each country [25].
To examine sub-national geospatial commercial resource availability and quality, two sub-national case studies were conducted, one in Eastern Region, Ghana and the other in Lagos State, Nigeria. Both focussed on success rates for geocoding facility locations (health facilities and schools respectively). In the absence of robust district-level mortality estimates, both studies examined geocoding success rates relative to area deprivation at administrative level 2 (districts in Ghana or local government areas in Nigeria). In this context, we consider area deprivation to reflect ‘an area’s potential for health risk from ecological concentration of poverty, unemployment, economic disinvestment, and social disorganisation’ [26].
In Eastern Region, 984 health facility place-names from 25 districts were obtained from the Ghana Health Service routine data repository (DHIMS2) and geocoded via an interface to the Google Maps API Version 2 [27]. Geocoding success was measured as the proportion of facilities per district for which a location within Eastern Region was returned. District deprivation was assessed firstly via the 2017 UNICEF District League Table (DLT) [28], a composite index of district development based on indicators of education, sanitation, rural water, health, security and governance. Secondly, district deprivation was assessed via a bespoke district deprivation index.
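The geocoding success measure defined above (the proportion of facilities per district for which a location within the region was returned) can be sketched as follows. This is an illustrative sketch only: the record layout and function names are hypothetical, and a simple bounding box stands in for the regional boundary, which is an assumption rather than the method used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float]               # (latitude, longitude)
BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def within_bbox(point: Point, bbox: BBox) -> bool:
    """Return True if the point falls inside the bounding box (edges included)."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def geocoding_success_by_district(
    records: Iterable[Tuple[str, Optional[Point]]],
    region_bbox: BBox,
) -> Dict[str, float]:
    """Proportion of facilities per district geocoded to a point in the region.

    Each record is (district, point), where point is None when the geocoder
    returned nothing. A returned point outside the region counts as a failure.
    """
    attempts: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for district, point in records:
        attempts[district] += 1
        if point is not None and within_bbox(point, region_bbox):
            successes[district] += 1
    return {d: successes[d] / attempts[d] for d in attempts}
```

In practice a point-in-polygon test against the regional boundary would be stricter than a bounding box, which can accept points that lie just outside an irregularly shaped region.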
The bespoke deprivation index was created from 12 indicators representing six domains: information access, education, energy, employment, water and sanitation, and living conditions, adapting an approach used in South Africa [29]. Indicator values were drawn from 2010 census data [30]. Within each domain, each indicator was standardised by conversion to a z-score, with z-scores averaged for each domain. The average scores for the six domains were then summed to give a composite deprivation score.
Similarly, in Lagos State, 310 schools, both private and public, from 20 Local Government Areas (LGAs) were obtained from online news media [31]. These were then geocoded using the Google Maps API Version 2 via BatchGeo [32]. A deprivation index with the same six domains as for Ghana was created for the LGAs, but with 9 indicators drawn from 2006 census data acquired from the National Population Commission. These were then standardised and combined using the same method as for Eastern Region.
For both case studies, geocoding success per district/LGA was then plotted against deprivation. Relative concentration indices and slope indices of inequality were computed for district-level geocoding success rates versus the deprivation measures.
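The deprivation score and the two inequality metrics can be illustrated with a minimal Python sketch. It is not the Public Health England tool or the authors' code. It assumes that every indicator is oriented so that higher values mean greater deprivation, that z-scores use the population standard deviation, and that all districts carry equal weight (no population weighting). It uses the standard textbook forms: the slope index as the least-squares slope of the outcome on fractional rank, and the relative concentration index as twice the covariance of outcome and fractional rank divided by the mean outcome. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_scores(values: List[float]) -> List[float]:
    """Standardise values using the population standard deviation (assumed)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def deprivation_score(domains: Dict[str, List[List[float]]]) -> List[float]:
    """Average indicator z-scores within each domain, then sum across domains.

    `domains` maps a domain name to a list of indicators; each indicator is a
    list of district values in the same district order.
    """
    n = len(next(iter(domains.values()))[0])
    total = [0.0] * n
    for indicator_list in domains.values():
        standardised = [z_scores(ind) for ind in indicator_list]
        for i in range(n):
            total[i] += mean([z[i] for z in standardised])
    return total


def fractional_ranks(rank_by: List[float]) -> List[float]:
    """Relative rank (position + 0.5) / n after sorting ascending by rank_by.

    Ties are broken by input order; units are equally weighted (assumed).
    """
    n = len(rank_by)
    order = sorted(range(n), key=lambda i: rank_by[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = (position + 0.5) / n
    return ranks


def slope_index(outcome: List[float], rank_by: List[float]) -> float:
    """Least-squares slope of the outcome on fractional rank."""
    r = fractional_ranks(rank_by)
    r_bar, y_bar = mean(r), mean(outcome)
    sxy = sum((ri - r_bar) * (yi - y_bar) for ri, yi in zip(r, outcome))
    sxx = sum((ri - r_bar) ** 2 for ri in r)
    return sxy / sxx


def relative_concentration_index(outcome: List[float], rank_by: List[float]) -> float:
    """Twice the covariance of outcome and fractional rank, over the mean outcome."""
    r = fractional_ranks(rank_by)
    r_bar, y_bar = mean(r), mean(outcome)
    cov = mean([(ri - r_bar) * (yi - y_bar) for ri, yi in zip(r, outcome)])
    return 2 * cov / y_bar
```

With districts ranked from least to most deprived, a negative value of either metric would indicate that geocoding success falls as deprivation rises.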