Background: There is no standardized approach to comparing socioeconomic status (SES) across multiple sites in epidemiological studies. This is particularly problematic when cross-country comparisons are of interest. We sought to develop a simple measure of SES that would perform well across diverse, resource-limited settings.
Methods: A cross-sectional study was conducted with 800 children aged 24 to 60 months across eight resource-limited settings. Parents were asked to respond to a household SES questionnaire, and the height of each child was measured. A statistical analysis was done in two phases. First, the best approach for selecting and weighting household assets as a proxy for wealth was identified. We compared four approaches to measuring wealth: maternal education, principal components analysis, Multidimensional Poverty Index, and a novel variable selection approach based on the use of random forests. Second, the selected wealth measure was combined with other relevant variables to form a more complete measure of household SES. We used child height-for-age Z-score (HAZ) as the outcome of interest.
Results: Mean age of study children was 41 months, 52% were boys, and 42% were stunted. Using cross-validation, we found that random forests yielded the lowest prediction error when selecting assets as a measure of household wealth. The final SES index included access to improved water and sanitation, eight selected assets, maternal education, and household income (the WAMI index). A 25% difference in the WAMI index was positively associated with a difference of 0.38 standard deviations in HAZ (95% CI 0.22 to 0.55).
Conclusions: Statistical learning methods such as random forests provide an alternative to principal components analysis in the development of SES scores. Results from this multicountry study demonstrate the validity of a simplified SES index.
With further validation, this simplified index may provide a standard approach for SES adjustment across resource-limited settings. © 2014 Psaki et al.; licensee BioMed Central Ltd.
This study took place at the eight field sites of the MAL-ED study (see Table 1). Study sites are located in a mix of rural, urban, and peri-urban areas of: Dhaka, Bangladesh (BGD); Fortaleza, Brazil (BRF); Vellore, India (INV); Bhaktapur, Nepal (NEB); Naushahro Feroze, Pakistan (PKN); Loreto, Peru (PEL); Venda, South Africa (SAV); and Haydom, Tanzania (TZH). Sites used a standardized protocol for data collection.
Table 1: Description of MAL-ED study sites and mean WAMI scores. The WAMI score (range 0 to 1) measures household socioeconomic status, including access to improved Water/sanitation, Assets, Maternal education, and Income.
Prior to beginning the ongoing cohort study, we conducted a cross-sectional feasibility study to identify the optimal approach to measuring household SES. We administered a standardized survey including demographic, socioeconomic status, and food insecurity questions to 100 households in each of the eight field sites between September 2009 and August 2010. Households were randomly selected from census results collected within the previous year at each site. Households were eligible to participate if they were located within the MAL-ED catchment area and if a child aged 24 to 60 months lived in the household. In households with multiple children in this age range, we randomly selected only one eligible child. Data collection lasted two to four weeks in each site. We obtained ethical approval from the Institutional Review Boards at each of the participating research sites, the Johns Hopkins Bloomberg School of Public Health (Baltimore, USA) and the University of Virginia School of Medicine (Charlottesville, USA). We adapted demographic and SES questions from the most recent DHS questionnaires [13]. Improved water and sanitation were based on World Health Organization definitions [14]. Site investigators reviewed questionnaires and identified items that were problematic in their sites.
Each site approved a final list of questions and response categories and the associated data collection procedures. Final demographic questions focused on age and education of the head of household and child’s mother, as well as mother’s fertility history. The SES section assessed household assets, housing materials, water source and sanitation facilities, and ownership of land or livestock. The survey also included a question on monthly household income in local currency. The questionnaire was developed in English, translated into local languages as appropriate, and back-translated for quality assurance. Field workers measured the height and weight of the selected child aged 24 to 60 months in each participating household. Trained field staff used a locally produced platform with sliding headboard to measure standing height to the nearest 0.1 cm. They used digital scales to measure weight to the nearest 100 grams. We used the 2006 World Health Organization Multi-Country Growth Reference Study (WHO MGRS) to calculate height-for-age Z-scores (HAZ). Based on these standards, we defined stunting as a HAZ more than two standard deviations below the global median (HAZ < -2) [15]. Our statistical analyses comprised two phases. First, we identified the best approach to selecting and weighting household assets as a proxy for wealth. Second, we combined our wealth measure with other relevant variables to form a more complete measure of household SES. In both phases we assessed the associations between SES/wealth measures and child HAZ for two reasons: 1) we were interested in directly comparing the predictive power of wealth/SES measures, and 2) assessing associations between a construct of interest and other constructs that are believed to be related theoretically or empirically is one way of assessing construct validity [16].
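The stunting definition above can be illustrated with a minimal Python sketch. It assumes HAZ values have already been computed against the WHO standards; the function names and the example values are hypothetical and are not study data.

```python
# Minimal sketch: classify stunting from precomputed HAZ values.
# Assumption: HAZ has already been derived from the WHO growth standards;
# the example values below are invented for illustration only.

STUNTING_CUTOFF = -2.0  # stunted if HAZ is more than 2 SD below the median


def is_stunted(haz):
    """Return True when the HAZ falls strictly below the -2 SD cutoff."""
    return haz < STUNTING_CUTOFF


def stunting_prevalence(haz_values):
    """Proportion of children classified as stunted."""
    flags = [is_stunted(h) for h in haz_values]
    return sum(flags) / len(flags)


example_haz = [-2.5, -1.0, -3.1, 0.2, -2.0]  # hypothetical children
prevalence = stunting_prevalence(example_haz)
print(f"Stunting prevalence: {prevalence:.0%}")
```

In this sketch a child with a HAZ of exactly -2.0 is not counted as stunted, which follows the strict inequality in the definition.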
We chose HAZ rather than weight-for-height because the former is a better measure of chronic deprivation, while the latter commonly indicates a composite of acute and chronic deprivation [3]. In both phases of analyses we were guided by a desire to identify the simplest valid measure of wealth or SES in terms of variables and computation required. We compared four approaches to selecting and weighting indicators to measure household wealth: maternal education, PCA, MPI, and a novel variable selection method based on the use of conditional random forests [17]. We used maternal education as a baseline to assess the added value of assets beyond this commonly used proxy for household wealth [18,19]. Maternal education was constructed as a simple continuous measure of years of education completed by the child’s mother at the time of the survey. To construct the PCA-based SES index, we first selected a subset of dichotomous indicators, including assets, housing materials, and facilities, using Cronbach’s coefficient alpha. PCA was then conducted on the tetrachoric correlation matrix of selected indicators (Additional file 1: Table S1), and we used the first principal component as the SES score for each household [20]. The MPI, adapted from the UNDP approach based on available data, included the following indicators: maternal education (years of schooling); health (any child has died); and standard of living (electricity, water, sanitation, flooring, cooking fuel, and ownership of more than one of seven assets). Although the UNDP includes child nutritional status, we excluded this variable because it was our outcome of interest. We weighted these three areas equally to create a household wealth score [12]. Random forests (RF) are an extension of classification trees that uses bootstrapping methods to generate multiple trees [17].
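As a rough illustration of the equal weighting of the three MPI areas described above, the following Python sketch averages three area scores. It rests on several assumptions not stated in the text: each indicator is already coded as met (1) or not met (0), maternal education has been dichotomized by some threshold, and a higher score means greater wealth. The function names are hypothetical.

```python
# Minimal sketch of an equally weighted, MPI-style household score.
# Assumptions (not from the source): indicators are pre-coded 1 = met,
# 0 = not met; how education was dichotomized is unknown and left to the caller.


def area_score(indicators):
    """Fraction of indicators met within one area (0 to 1)."""
    return sum(indicators) / len(indicators)


def mpi_style_score(education_met, no_child_death, living_standard):
    """Average of the education, health, and standard-of-living areas."""
    areas = [
        area_score([education_met]),
        area_score([no_child_death]),
        area_score(living_standard),
    ]
    return sum(areas) / len(areas)


# Hypothetical household: electricity, water, sanitation, flooring,
# cooking fuel, and ownership of more than one of seven assets.
living = [1, 1, 0, 1, 0, 1]
score = mpi_style_score(education_met=1, no_child_death=1, living_standard=living)
print(round(score, 3))
```

Because the three areas are weighted equally, each of the six standard-of-living indicators contributes one eighteenth of the total, while education and health each contribute one third.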
The RF approach to measuring wealth used the same initial indicators as the PCA method (Additional file 1: Table S1) to ensure comparability of results, i.e., so that differences in predictive power could be attributed to the method rather than the selection of assets. We used unsupervised learning with random forests to calculate conditional variable importance using the cforest function in R, which ranks variables by their importance in predicting a specified outcome (i.e., HAZ). Ownership of a subset of indicators was summed to create household wealth scores. We then compared the three approaches (PCA, MPI, and RF) with maternal education as measures of household wealth, in terms of the strength of their association with HAZ. The following criteria were used to compare the three wealth measurement approaches vs. maternal education: 1) leave-one-out cross-validation; 2) coefficient of determination (R²) values based on linear regression models with each wealth measure as the predictor, along with indicator variables for each site; and 3) scaled effect sizes from the same regression models. Leave-one-out cross-validation uses all observations except one to identify important variables for classification, while the remaining observation is used as the test set to measure the predictive error. This process is repeated using each observation as the test set to calculate the mean squared error (MSE) [17]. The approach with the smallest MSE predicts HAZ the most accurately. We also performed 10-fold cross-validation (results not shown), which produced similar findings to leave-one-out cross-validation. The coefficient of determination R² represents the proportion of variability explained by a statistical model. The approach with the largest coefficient of determination captures the most variability in HAZ [21]. The effect size represents the estimated change in HAZ for each one-unit change in household wealth.
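The leave-one-out procedure and the coefficient of determination described above can be sketched in plain Python. This is a simplified illustration, not the authors' analysis code: it fits a one-predictor linear regression of HAZ on a wealth score and omits the site indicator variables, and the data values are invented.

```python
# Minimal sketch: leave-one-out cross-validated MSE and R-squared for a
# simple linear regression of HAZ on one wealth score.
# Assumptions: a single predictor, no site indicators, made-up data.


def fit_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_mse(x, y):
    """Hold out each observation in turn, refit, and average squared errors."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        prediction = intercept + slope * x[i]
        squared_errors.append((y[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


def r_squared(x, y):
    """Proportion of variability in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


wealth = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical asset counts
haz = [-3.0, -2.4, -2.1, -1.2, -1.3, -0.5]  # hypothetical HAZ values
print("LOOCV MSE:", loocv_mse(wealth, haz))
print("R-squared:", r_squared(wealth, haz))
```

Comparing wealth measures would then amount to computing both quantities for each candidate score and preferring the one with the smaller cross-validated MSE and the larger R².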
Since the scales of each approach vary, we compared the effect of a 25% increase in each measure of household wealth. We then examined associations between each wealth measurement approach and monthly household income. We converted household income to USD using January 1, 2010 exchange rates. Given the expected association between household wealth and income, these analyses provided evidence of the construct validity of each approach to measuring household wealth [22]. Based on the cumulative evidence from these analyses, we selected one approach to measuring household wealth. The second phase of our analyses sought to incorporate several aspects of SES: access to improved Water and sanitation, the selected approach to measuring household wealth (Assets), Maternal education, and Income (i.e. the WAMI index). We included improved water and sanitation in response to guidance that SES measures should be based on hypothesized causal pathways in a study [23]. We then examined the predictive power of this composite measure of household SES relative to HAZ using the criteria described above. We used R 2.10.1 (http://www.r-project.org) and STATA 12.1 (STATA Corp., College Station, USA) for statistical analysis.
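The rescaling of effect sizes to a 25% increase, described above, can be sketched as follows. The sketch assumes that "25%" refers to a quarter of each measure's possible range, which the text does not state explicitly. The per-unit slope of 1.52 is back-calculated from the reported WAMI result (0.38 SD per 25% difference on a 0 to 1 scale) purely for illustration, and the asset-count slope is hypothetical.

```python
# Minimal sketch: rescale a regression slope to the effect of moving
# 25% of the way across a measure's range, so that measures on
# different scales can be compared.
# Assumption: "25% increase" means 25% of the measure's possible range.


def scaled_effect(slope_per_unit, scale_min, scale_max, fraction=0.25):
    """Estimated change in HAZ for a `fraction` shift across the scale."""
    return slope_per_unit * fraction * (scale_max - scale_min)


# WAMI index ranges from 0 to 1; slope back-calculated as 0.38 / 0.25.
wami_effect = scaled_effect(1.52, 0.0, 1.0)

# Hypothetical asset count on a 0 to 8 scale with an invented slope.
asset_effect = scaled_effect(0.1, 0.0, 8.0)

print(round(wami_effect, 2), round(asset_effect, 2))
```

Expressing every slope on this common footing avoids comparing a per-year effect of education with a per-asset effect of a count score, which would otherwise not be directly comparable.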