The validity of an area-based method to estimate the size of hard-to-reach populations using satellite images: The example of fishing populations of Lake Victoria

Study Justification:
The study addresses the challenge of estimating the size of hard-to-reach populations in resource-limited settings, where reliable census data are often unavailable. Using satellite images and local household survey data from fishing communities on Lake Victoria in Uganda, it set out to develop a simple, cost-effective method for estimating population size. Such estimates are crucial for planning services and allocating resources to communities in need of health interventions.
Highlights:
– The study compared three methods to estimate populations: two using average population density and one using a regression model.
– The estimates for total population from all three methods were similar, with errors less than 2.2%.
– However, there were often large errors in estimates for individual villages.
– The study demonstrated that a simple area-based model can provide reasonable estimates of total population in rural Ugandan fishing communities.
Recommendations for Lay Reader and Policy Maker:
1. Consider implementing the area-based method to estimate population sizes in hard-to-reach communities where reliable census data is unavailable.
2. Recognize that while the method provides reasonable estimates of total population, there may be larger errors in estimating population sizes for individual villages.
3. Explore additional methods or data sources to improve the accuracy of population estimates for individual villages.
Key Role Players:
1. Research teams from the MRC/UVRI Uganda Research Unit: They have expertise in conducting surveys and collecting accurate population data.
2. Local community leaders and representatives: They can provide valuable insights and local knowledge to assist in estimating population sizes.
3. Satellite imagery providers: They play a crucial role in providing access to satellite images for estimating community areas.
Cost Items for Planning Recommendations:
1. Satellite imagery access and processing: Budget for obtaining satellite images and using software tools like Google Earth Pro for estimating community areas.
2. Survey and data collection: Allocate resources for conducting household surveys to gather accurate population data.
3. Research team and staff: Consider the cost of employing researchers and staff members to analyze data and develop population estimation models.
4. Training and capacity building: Provide training and capacity building programs for local community leaders and representatives to enhance their involvement in population estimation efforts.

Background: Information on population size is crucial for planning services and allocating resources to communities in need of health interventions. In resource-limited settings, reliable census data are often not available. Using the publicly available Google Earth Pro software and existing local household survey data from fishing communities (FC) on Lake Victoria in Uganda, we compared two simple methods (using average population density) and one simple linear regression model for estimating the populations of small rural FC. We split the dataset into two parts: one to obtain parameters and one to test the validity of the models. Results: Out of 66 FC, we were able to estimate populations for 47. There were 16 FC in the test set. The estimates of total population from all three methods were similar, with errors of less than 2.2%. Estimates of individual FC populations were more widely discrepant. Conclusions: In our rural Ugandan setting, a simple area-based model gave reasonable estimates of total population. However, there were often large errors in estimates for individual villages.

We used data from the fishing communities of Lake Victoria in Uganda. Villages were selected because they had already been surveyed by research teams from the MRC/UVRI Uganda Research Unit, so accurate population data and a global positioning system (GPS) location were available for each village. All estimates obtained from the methods described below were compared with these ground survey data. A fishing community (FC) was defined as a residential area in which the majority of residents rely on Lake Victoria for income generation. Household surveys were conducted in 2012–13 and counted the number of households and the number of people in each household [12, 13]. All of the villages are fishing communities, with 39 on the mainland and 27 on the islands of Lake Victoria.

These communities are characterised by single-storey buildings, the majority of which are used for residential purposes. They are hard to reach, poorly served by skilled health care providers, and have poor access to clean water and sanitation. Health issues include HIV, helminth infection, malaria, and high maternal and newborn morbidity. The populations are typically very mobile, with transient residents who move between villages and within the wider region and country.

Each community was viewed in Google Earth Pro (GEP), and communities with no central cluster of residential structures were excluded. We also excluded fishing communities whose GPS coordinates did not correspond to a visible village on the available satellite imagery, or for which satellite images were unavailable. For each fishing community with satellite imagery available, we used GEP to assess the area as follows. A member of our team [CG] estimated the perimeter of each community based on where structures were observable, and assessed density as either low or high, based on the space visible between structures on the satellite image (see Fig. 1).
Although the perimeter was drawn to enclose the majority of structures that naturally formed the community, some structures were occasionally excluded. The area enclosed within the perimeter was calculated automatically by the GEP software. We estimate that this process took less than 1 min per FC.

Fig. 1: Examples of boundaries fitted to typical satellite images of FC

We compared three methods of estimating populations: two using the average density and one using a regression model. The two average density methods calculated the average in different ways: the first used the mean of the individual FC densities; the second used the overall population density, calculated by summing the populations of all FCs and dividing by their total area. We refer to these two methods as AD1 and AD2. The simple linear regression model consisted of a constant term and the FC area as the single predictor. The average density methods can be considered as regression models without a constant term, which allows the first two methods to be described as:

Yi = β × Ai (Eq. 1)

where Yi is the predicted population for village i, Ai is the area of village i, and β is the average population density (however calculated). The regression method can be described as:

Yi = α* + β* × Ai (Eq. 2)

where α* and β* are the regression coefficients representing the intercept and slope respectively. All population estimates are presented rounded to the nearest whole number; when calculating total populations by summing individual populations, the unrounded estimates were used.

To test and compare these approaches, we randomly split the data into two sets: an index set of 31 FCs, which we used to calculate the parameters (average density and regression coefficients), and a test set of 16 FCs, which we used to compare the predictions made with these parameters against the values from the earlier surveys. We also calculated the unstratified parameters in the entire dataset of 47 FCs, as these are the best available estimates from the data we have.
We report each of these parameters with a 95% confidence interval (CI), with the exception of the AD2 parameter, for which a CI cannot be calculated because it is the simple ratio of total population to total area.

To calculate the average density for AD1, we first calculated the density in each of the 31 index FCs and then used the mean of these figures as the parameter βAD1. We then applied this value to each of the 16 test FCs to predict their populations, and summed these estimates to give a total population for the test FCs. For AD2, we calculated the total population of the 31 index communities and divided by their total area, and again used this parameter, βAD2, to calculate the populations of the remaining (test) FCs.

For the regression method, we ran a simple linear regression, using area as a predictor of population, on the 31 index FCs. We took the parameters from this regression (α*, the intercept, and β*, the coefficient for area) and applied them to the 16 test FCs. We summed these individual estimates to get an estimate for the total population of the 16 test FCs. Note that because the constant is calculated at the village level, these parameters cannot be applied to an entire region; they must be applied at the village level.

We repeated the above twice: once stratifying on location (island/mainland) and once stratifying on assessed density category (low/high). In each case, we used the same original sets of index and test FCs, to enable comparison between the methods. We calculated parameters separately in each stratum, and applied them to the test FCs according to stratum. This is equivalent to allowing an interaction between area and the stratification factor in Eqs. 1 and 2; alternatively, it can simply be expressed as separate equations with equivalent parameters for each level of the stratification factor, that is, parameters βisland, βmainland, βlow-density, and βhigh-density, and similarly for β*, α and α*.
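The unstratified index/test procedure described above can be sketched roughly as follows. This is an illustration only: the areas and populations are invented numbers, not study data; the function names and the signed percent-error definition are assumptions made for the example; and Python is used here in place of the Stata analysis reported in the study.

```python
from statistics import mean


def fit_ad1(areas, pops):
    """AD1: mean of the individual community densities."""
    return mean(p / a for a, p in zip(areas, pops))


def fit_ad2(areas, pops):
    """AD2: total population divided by total area."""
    return sum(pops) / sum(areas)


def fit_regression(areas, pops):
    """Least-squares fit of pop = alpha + beta * area; returns (alpha, beta)."""
    a_bar, p_bar = mean(areas), mean(pops)
    sxy = sum((a - a_bar) * (p - p_bar) for a, p in zip(areas, pops))
    sxx = sum((a - a_bar) ** 2 for a in areas)
    beta = sxy / sxx
    return p_bar - beta * a_bar, beta


def percent_error(estimate, observed):
    """Signed error of an estimate relative to the surveyed value, in percent."""
    return 100.0 * (estimate - observed) / observed


# Invented index set (areas in arbitrary units), NOT study data.
index_areas = [1.0, 2.0, 4.0]
index_pops = [100, 300, 400]
# Invented test set, with "surveyed" populations for comparison.
test_areas = [2.0, 3.0]
test_pops = [250, 350]

# Parameters are estimated on the index set only.
beta_ad1 = fit_ad1(index_areas, index_pops)
beta_ad2 = fit_ad2(index_areas, index_pops)
alpha_star, beta_star = fit_regression(index_areas, index_pops)

# Village-level predictions on the test set: density times area for AD1/AD2,
# intercept plus slope times area for the regression.
pred_ad1 = [beta_ad1 * a for a in test_areas]
pred_ad2 = [beta_ad2 * a for a in test_areas]
pred_reg = [alpha_star + beta_star * a for a in test_areas]

# Totals are sums of the unrounded village-level estimates.
for name, preds in [("AD1", pred_ad1), ("AD2", pred_ad2), ("regression", pred_reg)]:
    total = sum(preds)
    print(name, round(total), f"{percent_error(total, sum(test_pops)):+.1f}%")
```

Because the regression intercept is added once per village, the regression predictions must be made village by village and then summed, whereas the two density parameters could equally be multiplied by a total area.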
Stata v15.0 was used for population estimation and GEP was used to obtain satellite images and estimate areas.
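The stratified analysis can be sketched in the same spirit: a separate parameter is estimated within each stratum of the index set, and each test community is then predicted with the parameter for its own stratum. The example below does this for the AD2 density with location as the stratification factor. The records are invented for illustration, the helper names are hypothetical, and Python stands in for the Stata code actually used.

```python
from collections import defaultdict


def fit_ad2_by_stratum(records):
    """Return {stratum: total population / total area} from index records.

    Each record is a (stratum, area, population) tuple.
    """
    pop_total = defaultdict(float)
    area_total = defaultdict(float)
    for stratum, area, pop in records:
        pop_total[stratum] += pop
        area_total[stratum] += area
    return {s: pop_total[s] / area_total[s] for s in pop_total}


def predict(betas, test_records):
    """Apply each stratum's density to test (stratum, area) pairs."""
    return [betas[stratum] * area for stratum, area in test_records]


# Invented index records (stratum, area, population), NOT study data.
index_records = [
    ("island", 1.0, 200),
    ("island", 3.0, 400),
    ("mainland", 2.0, 100),
    ("mainland", 2.0, 200),
]
# Invented test records (stratum, area).
test_records = [("island", 2.0), ("mainland", 4.0)]

betas = fit_ad2_by_stratum(index_records)
predictions = predict(betas, test_records)
total_estimate = sum(predictions)
print(betas, predictions, total_estimate)
```

Stratifying on assessed density category (low/high) would follow the same pattern with a different stratum label on each record.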


