Background: Payment for performance (P4P) strategies, which provide financial incentives to health workers and/or facilities for reaching pre-defined performance targets, can improve healthcare utilisation and quality. P4P may also reduce inequalities in healthcare use and access by enhancing universal access to care, for example by reducing the financial barriers to accessing care. However, P4P may also widen inequalities in healthcare if providers cherry-pick easier-to-reach patients to meet their performance targets. In this study, we examine the heterogeneity of P4P effects on service utilisation across population subgroups and its implications for inequalities in Tanzania. Methods: We used household data from an evaluation of a P4P programme in Tanzania. In January 2012, we surveyed about 3000 households with women who had delivered in the 12 months prior to the interview, in seven intervention and four comparison districts, and a similar number of households 13 months later. The household data were used to generate the population subgroups and to measure the incentivised service utilisation outcomes. We focused on two outcomes that improved significantly under P4P, namely the institutional delivery rate and the uptake of antimalarials among pregnant women. We used a difference-in-differences linear regression model to estimate the effect of P4P on utilisation outcomes across the different population subgroups. Results: P4P led to a significant increase in the rate of institutional deliveries among women in the poorest and middle wealth households, but not among women in the least poor households. However, the effect was only marginally greater among women in middle wealth households than among women in the least poor households (p = 0.094 for the differential effect). 
The effect of P4P on institutional deliveries was also significantly larger among women in rural districts than among women in urban districts (p = 0.028 for the differential effect), and among uninsured women than among insured women (p = 0.001 for the differential effect). The effect of P4P on the uptake of antimalarials was evenly distributed across population subgroups. Conclusion: P4P can enhance equitable healthcare access and use, especially when demand-side barriers to care, such as user fees for drugs that must be purchased because of stock-outs, have been reduced.
Our study used data from a controlled before-and-after evaluation of the P4P scheme in Pwani region, Tanzania, described elsewhere [11, 18]. All seven districts in Pwani region (intervention arm) and four districts from Morogoro and Lindi regions (comparison arm) were sampled. The comparison districts were selected to be comparable to the intervention districts in terms of poverty and literacy rates, the rate of institutional deliveries, infant mortality, population per health facility, and the number of children under one year of age per capita [18]. Baseline data were collected in January 2012, with a follow-up survey 13 months later. In the intervention arm, we included all 6 hospitals and 16 health centres that were eligible for the P4P scheme, and a random sample of 53 eligible dispensaries. A similar number of facilities was included in the comparison arm. Facilities were randomly sampled among those where P4P was implemented, and matching comparison facilities were selected based on facility level of care, ownership, staffing levels, and caseload [18]. To assess maternal and child health service utilisation in the population, we randomly sampled, from the catchment area of each health facility, 20 households with a woman who had delivered in the 12 months prior to the survey. In total, we surveyed 3000 households with eligible women in both arms at baseline, and a similar number in the follow-up survey. The household survey also collected information on maternal background characteristics (e.g. age, marital status, education, occupation, religion, and number of births) and household characteristics (e.g. household size, health insurance status, and ownership of assets and housing characteristics used to assess household socioeconomic status).
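As a quick consistency check on the reported sample size, the facility counts above imply about 3000 households per survey round. The Python sketch below makes that arithmetic explicit and shows a simple random sample of 20 households drawn from one catchment-area listing. It assumes the comparison arm has exactly the same facility counts as the intervention arm (the text only says "a similar number"); the function names and the household listing are hypothetical.

```python
import random
from typing import List, Optional

# Intervention-arm facility counts, as stated in the text.
# ASSUMPTION: the comparison arm has exactly the same counts.
HOSPITALS = 6
HEALTH_CENTRES = 16
SAMPLED_DISPENSARIES = 53
HOUSEHOLDS_PER_FACILITY = 20


def facilities_per_arm() -> int:
    """Number of facilities surveyed in one study arm."""
    return HOSPITALS + HEALTH_CENTRES + SAMPLED_DISPENSARIES


def expected_households(n_arms: int = 2) -> int:
    """Households targeted in one survey round across all arms."""
    return n_arms * facilities_per_arm() * HOUSEHOLDS_PER_FACILITY


def sample_households(eligible: List[str], k: int = HOUSEHOLDS_PER_FACILITY,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Simple random sample, without replacement, of k eligible households
    from one facility's (hypothetical) catchment-area listing."""
    if rng is None:
        rng = random.Random()
    return rng.sample(eligible, k)


if __name__ == "__main__":
    print(facilities_per_arm())   # 75
    print(expected_households())  # 3000
    # Hypothetical listing of eligible households in one catchment area.
    listing = [f"hh{i:03d}" for i in range(120)]
    print(sample_households(listing, rng=random.Random(1)))
```

With 75 facilities per arm and 20 households each, one round yields 1500 households per arm, matching the reported total of 3000.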
Our outcome variables are the two incentivised services that prior analysis showed to have improved significantly as a result of P4P: institutional deliveries and uptake of two doses of intermittent preventive treatment (IPT2) for malaria during antenatal care [11]. These were measured as binary indicators of whether a woman gave birth in a health facility and whether she received IPT2 during antenatal care, respectively. To examine the distribution of P4P effects on these two outcomes, we generated population subgroups based on individual and household-level characteristics, following Andersen's behavioural model of healthcare utilisation [3, 4]. We considered only predisposing and enabling factors because data on perceived illness, the need component of the model, were not available. Perceived illness is arguably also less relevant for maternal service utilisation outcomes, since study participants were largely healthy. Subgroups based on predisposing factors were: marital status (married vs. not married), maternal age (15–49 years; below vs. above the median age of 25 years), education (no education vs. primary level or above), occupation (farmer vs. non-farmer), religion (Muslim vs. non-Muslim), parity (one birth vs. two or more births), and household size (below vs. above the median of 5 members). Subgroups based on enabling factors were: health insurance status (any insurance vs. none), place of residence (rural vs. urban district), and household wealth status. The wealth subgroups were generated from wealth scores derived by principal component analysis of 42 items covering household characteristics and asset ownership (Appendix 1: Table 5) [29, 83]. The wealth scores were generated separately for the baseline and follow-up samples, because the participants differed between survey rounds. Households were ranked by wealth score from poorest (lowest score) to least poor and classified into three equal-sized groups (terciles): poorest, middle and least poor.
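The wealth classification can be sketched as follows. The sketch assumes the score is the first principal component of the standardised asset items (PCA on the item correlation matrix, as in the widely used asset-index approach); the text does not state which PCA variant was used, and the helper names and toy asset matrix are hypothetical. It standardises each item, finds the leading eigenvector by power iteration, scores each household, and cuts the ranking into terciles (`g=3`) or, for the sensitivity analysis, quintiles (`g=5`). As in the study, it would be run separately on the baseline and follow-up samples.

```python
import math


def standardise(column):
    """Return z-scores of one item (sample standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    # An item that every (or no) household owns cannot rank households.
    return [0.0 if sd == 0 else (x - mean) / sd for x in column]


def first_component(z_cols, iters=500):
    """Leading eigenvector of the item correlation matrix (power iteration)."""
    k, n = len(z_cols), len(z_cols[0])
    corr = [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # all items constant: nothing to extract
            break
        v = [x / norm for x in w]
    return v


def wealth_scores(households):
    """households: one row per household, one 0/1 (or numeric) entry per item."""
    z_cols = [standardise(list(col)) for col in zip(*households)]
    loadings = first_component(z_cols)
    if sum(loadings) < 0:  # orient so that owning more assets -> higher score
        loadings = [-w for w in loadings]
    return [sum(w * z[i] for w, z in zip(loadings, z_cols))
            for i in range(len(households))]


def quantile_groups(scores, g=3):
    """Rank households by score and split into g equal-sized groups
    (1 = poorest ... g = least poor). Ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * g // len(scores) + 1
    return groups


if __name__ == "__main__":
    # Hypothetical data: household i owns the first i of 8 assets.
    toy = [[1 if j < i else 0 for j in range(8)] for i in range(9)]
    scores = wealth_scores(toy)
    print(quantile_groups(scores, g=3))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
    print(quantile_groups(scores, g=5))  # quintiles
```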
Subgroups based on five equal-sized groups (quintiles) were also generated to examine the sensitivity of the findings to the choice of wealth grouping. We first compared the baseline sample means of individual and household-level characteristics between the intervention and comparison arms and used t-tests to assess whether the differences between arms were statistically significant. We then assessed the distribution of service utilisation outcomes at baseline across population subgroups by estimating the utilisation gap, i.e. the difference in average service use between two subgroups [87], and used t-tests to test whether each gap differed significantly from zero. To examine whether the effects of P4P on outcomes differed across population subgroups, we first performed subgroup analyses to estimate the P4P effect within each subgroup and then tested the significance of differential effects between subgroups using interaction terms. We estimated the average effect of P4P on service utilisation with a linear difference-in-differences regression model, which compares the changes in outcomes over time between participants in the intervention and comparison arms, as specified in Eq. (1):

$$Y_{ijt} = \beta_1 (\mathrm{P4P}_j \times \delta_t) + \beta_2 X_{ijt} + \gamma_j + \delta_t + \varepsilon_{ijt} \qquad (1)$$

where $Y_{ijt}$ is the utilisation outcome (institutional delivery or uptake of IPT2) of individual $i$ in facility $j$'s catchment area at time $t$. The intervention dummy $\mathrm{P4P}_j$ takes the value 1 if the facility is in the intervention arm and 0 if it is in the comparison arm. Unobserved time-invariant facility characteristics $\gamma_j$ were controlled for through facility fixed effects, and $\delta_t$ denotes year fixed effects. We also controlled for individual and household-level covariates $X_{ijt}$ (age, education, occupation, religion, marital status, parity, insurance status, household size, and household wealth status) as potential confounders. The error term is $\varepsilon_{ijt}$.
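To make the estimator in Eq. (1) concrete, here is a minimal pure-Python sketch of a linear difference-in-differences regression with facility and year fixed effects, fitted by ordinary least squares. Facility dummies stand in for γj (and absorb the P4Pj main effect), a follow-up indicator stands in for δt, and the coefficient on P4Pj × follow-up plays the role of β1; with binary outcomes this is a linear probability model. The function names, column layout and data are hypothetical, and the sketch returns point estimates only, without the facility-clustered standard errors used in the study. The demo data are noise-free, so the estimate should reproduce the built-in effect.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting (a and b are plain lists)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def did_fe(rows):
    """rows: dicts with keys facility, treated (P4P_j), post (follow-up round),
    x (list of covariates) and y. Returns the coefficient on P4P_j x post."""
    facilities = sorted({r["facility"] for r in rows})
    idx = {f: i for i, f in enumerate(facilities)}
    X, y = [], []
    for r in rows:
        fe = [0.0] * len(facilities)
        fe[idx[r["facility"]]] = 1.0  # facility fixed effect; absorbs P4P_j
        X.append(fe + [r["post"], r["treated"] * r["post"]] + list(r["x"]))
        y.append(r["y"])
    return ols(X, y)[len(facilities) + 1]


if __name__ == "__main__":
    # Hypothetical noise-free data: 2 P4P and 2 comparison facilities,
    # true P4P effect 0.20, year effect 0.10, covariate effect 0.05.
    base = {"A": 0.30, "B": 0.40, "C": 0.35, "D": 0.50}
    p4p = {"A": 1, "B": 1, "C": 0, "D": 0}
    rows = []
    for f in base:
        for post in (0, 1):
            for k in range(5):
                x = float((k + post) % 3)
                y = base[f] + 0.10 * post + 0.20 * p4p[f] * post + 0.05 * x
                rows.append({"facility": f, "treated": p4p[f], "post": post,
                             "x": [x], "y": y})
    print(round(did_fe(rows), 6))  # 0.2
```

Estimating `did_fe` separately on the rows of one subgroup gives that subgroup's P4P effect, which is how the subgroup analyses described above use β1.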
We clustered the standard errors at the facility (catchment area) level to account for correlation of $\varepsilon_{ijt}$ within facilities, including serial correlation. When Eq. (1) is estimated within each subgroup, $\beta_1$ gives the effect of P4P on utilisation for that subgroup. To test whether the effect differed significantly across subgroups, we included a three-way interaction between the treatment term $(\mathrm{P4P}_j \times \delta_t)$ and a subgrouping variable $G_i$ (based on predisposing and enabling factors), together with the associated two-way interaction terms. The coefficient of interest is $\beta_4$, which captures the differential effect of P4P across subgroups, as shown in Eq. (2):

$$Y_{ijt} = \beta_1 (\mathrm{P4P}_j \times \delta_t) + \beta_2 (\mathrm{P4P}_j \times G_i) + \beta_3 (\delta_t \times G_i) + \beta_4 (\mathrm{P4P}_j \times \delta_t \times G_i) + \beta_5 X_{ijt} + \gamma_j + \delta_t + \varepsilon_{ijt} \qquad (2)$$

Using a difference-in-differences approach to estimate the effect of P4P on outcomes relies on the key identifying assumption that, in the absence of the intervention, outcome trends would have been parallel across study arms [41]. Although this can never be formally tested, we supported the assumption by verifying that pre-intervention trends in household-level utilisation outcomes were parallel across study arms, as described elsewhere [11]. Surveying women at baseline who had delivered in the past 12 months allowed us to generate four longitudinal outcomes for this check: the shares of institutional deliveries, caesarean section deliveries, women who breastfed within one hour of birth, and women who paid for delivery care. We further performed several robustness checks. First, we re-estimated the differential effect of P4P using wealth quintiles instead of wealth terciles to examine whether the results were sensitive to the wealth classification. We also generated wealth subgroups separately for each study arm and re-estimated the differential effect using these arm-based subgroups, to remove the influence of the pre-existing baseline imbalance in wealth status between arms.
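The differential-effect test can be sketched in the same way: add the three-way term P4Pj × post × Gi and its two-way companions to the fixed-effects regression and read off the three-way coefficient, which corresponds to β4 in Eq. (2). The sketch also includes Gi itself as a regressor, since a three-way interaction model needs all lower-order terms. For a district-level subgroup such as rural vs. urban residence, Gi and P4Pj × Gi are constant within facilities and would be absorbed by the facility fixed effects, so those columns would have to be dropped. Function names and data are hypothetical, covariates are omitted for brevity, and no standard errors are computed.

```python
def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares via the normal equations."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def ddd_fe(rows):
    """rows: dicts with keys facility, treated, post, g (0/1 subgroup), y.
    Returns the coefficient on P4P_j x post x G_i (the beta_4 analogue)."""
    facilities = sorted({r["facility"] for r in rows})
    idx = {f: i for i, f in enumerate(facilities)}
    X, y = [], []
    for r in rows:
        t, p, g = r["treated"], r["post"], r["g"]
        fe = [0.0] * len(facilities)
        fe[idx[r["facility"]]] = 1.0  # facility fixed effect (absorbs P4P_j)
        X.append(fe + [p,          # year effect
                       t * p,      # P4P x post
                       t * g,      # P4P x G
                       p * g,      # post x G
                       t * p * g,  # P4P x post x G
                       g])         # main effect of G
        y.append(r["y"])
    return ols(X, y)[len(facilities) + 4]


if __name__ == "__main__":
    # Hypothetical noise-free data with a true differential effect of 0.15.
    base = {"A": 0.30, "B": 0.40, "C": 0.35, "D": 0.50}
    p4p = {"A": 1, "B": 1, "C": 0, "D": 0}
    rows = []
    for f in base:
        for post in (0, 1):
            for k in range(6):
                g, t = k % 2, p4p[f]
                y = (base[f] + 0.10 * post + 0.20 * t * post + 0.03 * g
                     + 0.02 * post * g - 0.01 * t * g + 0.15 * t * post * g)
                rows.append({"facility": f, "treated": t, "post": post,
                             "g": g, "y": y})
    print(round(ddd_fe(rows), 6))  # 0.15
```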
Second, we re-estimated the regression model with three-way interactions using categorical variables that define multiple subgroups (e.g. education levels, occupation categories, parity groups and age groups) instead of binary variables (e.g. married vs. not married). Third, because the outcome variables are binary, we estimated a non-linear logit model instead of the linear model. Fourth, we clustered the standard errors at the district level instead of the facility level and used a bootstrap method to adjust for the small number of clusters [20]. All analyses were performed using STATA version 13.
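The text does not say which bootstrap variant was used with district-level clustering. As one illustrative possibility only, the sketch below implements a pairs (cluster) bootstrap of an unadjusted 2x2 difference-in-differences estimate, resampling whole districts with replacement within each arm so that every replicate contains both arms. The district layout (7 intervention, 4 comparison) follows the text; the data, function names and number of replications are hypothetical, and the estimator omits the fixed effects and covariates of the full model. With very few clusters, other variants such as the wild cluster bootstrap are often preferred.

```python
import random
import statistics


def did_means(obs):
    """obs: list of (treated, post, y). Unadjusted 2x2 DiD of cell means.
    Every (treated, post) cell must contain at least one observation."""
    cells = {}
    for t, p, y in obs:
        cells.setdefault((t, p), []).append(y)
    m = {key: statistics.fmean(v) for key, v in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def cluster_bootstrap_se(clusters, reps=999, seed=0):
    """clusters: dict district -> (treated, list of (post, y)).
    Resamples whole districts with replacement within each arm and returns
    the standard deviation of the replicate DiD estimates."""
    rng = random.Random(seed)
    arms = {0: [], 1: []}
    for name, (t, _) in clusters.items():
        arms[t].append(name)  # each arm needs at least one district
    estimates = []
    for _ in range(reps):
        obs = []
        for t, names in arms.items():
            for name in rng.choices(names, k=len(names)):
                obs.extend((t, p, y) for p, y in clusters[name][1])
        estimates.append(did_means(obs))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    clusters = {}
    for d in range(11):
        arm = 1 if d < 7 else 0  # 7 intervention, 4 comparison districts
        p_pre = 0.40 + 0.02 * d
        p_post = p_pre + 0.05 + (0.15 if arm else 0.0)
        data = [(post, 1.0 if rng.random() < (p_post if post else p_pre) else 0.0)
                for post in (0, 1) for _ in range(40)]
        clusters[f"district{d}"] = (arm, data)
    all_obs = [(arm, p, y) for arm, data in clusters.values() for p, y in data]
    print("DiD estimate:", round(did_means(all_obs), 3))
    print("bootstrap SE:", round(cluster_bootstrap_se(clusters), 3))
```

Stratifying the resampling by arm keeps the number of intervention and comparison districts fixed in every replicate, so no replicate ever lacks a comparison group.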