Objective: To evaluate whether a maternal healthcare quality improvement intervention improved the quality of care.

Design: Cluster-randomized controlled study with implementation evaluation; we randomized 12 primary care facilities to receive a quality improvement intervention, while 12 facilities served as controls.

Setting: Four districts in rural Tanzania.

Participants: Health facilities (24), providers (70 at baseline; 119 at endline) and patients (784 at baseline; 886 at endline).

Interventions: In-service training, mentorship and supportive supervision, and infrastructure support.

Main outcome measures: We measured fidelity with indicators of quality and compared quality between intervention and control facilities using difference-in-differences analysis.

Results: Quality of care was low at baseline: the average provider knowledge test score was 46.1% (range: 0-75%), and only 47.9% of women were very satisfied with delivery care. The intervention was associated with an increase in newborn counseling (β: 0.74, 95% CI: 0.13, 1.35) but with no evidence of change across 17 additional indicators of quality. On average, facilities reached 39% implementation. Comparing the facilities with the highest implementation of the intervention to control facilities again showed improvement on only one of the 18 quality indicators.

Conclusions: A multi-faceted quality improvement intervention resulted in no meaningful improvement in quality. The evidence suggests this reflects both a failure to sustain a high level of implementation and a failure of theory: quality improvement interventions targeted at the clinic level in primary care clinics with weak starting quality, including poor infrastructure and low provider competence, may not be effective.
This study was implemented in 24 primary care clinics, or dispensaries, in four districts of Pwani Region, Tanzania. Selection criteria have been described in detail previously [15]. Dispensaries are outpatient facilities intended to provide primary care, including reproductive health services [16, 17]. In Pwani, 73% of deliveries occurred in health facilities in 2010, and around one third of facility deliveries took place in primary care facilities [12]. We stratified the 24 facilities by district and then randomized facilities in a 1:1 ratio to either the intervention or the control group, resulting in three intervention and three control facilities in each district. Randomization was performed by drawing facility names from a hat in the presence of research staff and regional health officials. Clusters were defined as the health facility and its surrounding catchment area. Facilities in the intervention group received a maternal and newborn health quality improvement intervention, while facilities in the control group continued with standard care.

Delivery of interventions known to avert maternal and newborn deaths (e.g. high-quality antenatal care (ANC) and rapid deployment of emergency care) [18] requires competent and motivated providers working within well-equipped facilities that can support basic emergency obstetric and newborn care (BEmONC), with appropriate access to referral facilities. The MNH+ intervention uses BEmONC training to review foundational knowledge, complemented by continuous mentoring and supportive supervision by an obstetrician and by provision of the necessary equipment, supplies and medication. Our theory of change is that these quality inputs will translate into better-quality processes of care and outcomes (see box: Theory of change and intervention components). Implementation of the intervention began in June 2012; by July 2013, the full intervention was underway, and it continued into the spring of 2016.

We developed an implementation index to assess the effect of variation in implementation of the intervention across the 12 intervention facilities [20, 21]. For each intervention component, we identified indicators for the dose delivered (e.g. proportion of expected supportive supervision visits delivered), reach to the intended audience (e.g. proportion of providers trained) and dose received (e.g. providers' training scores). Fidelity is defined as the correct application of the program [21]. Rather than assessing whether each individual intervention component was implemented as intended, we chose a more demanding definition of fidelity: whether the immediate intended effect, that is, improvement in quality, was achieved. We therefore specified a range of quality metrics using Donabedian's model of quality of care: structure, process and outcome.

Trained providers completed a 60-question multiple-choice test that emphasized obstetric and newborn emergency care and two clinical vignettes that tested their clinical judgment in obstetric emergencies (appendix 1), receiving a continuous score between 0 and 1 on each instrument. We used data from facility registers to create a composite indicator of routine obstetric services (appendix 2). For each facility, we created an indicator counting how many of the six BEmONC signal functions (life-saving health services) had been performed in the previous 3 months.
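To make this register-based composite concrete, the sketch below shows one way such a signal-function count could be computed. It is illustrative only (the study's analyses were conducted in Stata): the signal-function labels and the register data layout are hypothetical, not taken from the study's appendix.

```python
# Illustrative sketch: a facility-level BEmONC signal-function composite.
# Labels and data layout are hypothetical, not the study's actual variables.

# Placeholder labels for the six BEmONC signal functions counted in the study
SIGNAL_FUNCTIONS = [
    "parenteral_antibiotics",
    "parenteral_uterotonics",
    "parenteral_anticonvulsants",
    "manual_removal_of_placenta",
    "removal_of_retained_products",
    "assisted_vaginal_delivery",
]

def bemonc_score(register_counts: dict[str, int]) -> int:
    """Number of the six signal functions a facility performed at least
    once in the previous 3 months (range: 0-6)."""
    return sum(1 for fn in SIGNAL_FUNCTIONS if register_counts.get(fn, 0) > 0)

# Example: a facility whose registers show three of the six functions scores 3
print(bemonc_score({"parenteral_antibiotics": 4,
                    "parenteral_uterotonics": 12,
                    "manual_removal_of_placenta": 1}))  # -> 3
```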
We measured reported receipt of services as the proportion of women receiving a uterotonic, the proportion of women receiving IV antibiotics and a composite indicator of counseling on six items. We measured patients' perception of quality through composite indicators of nontechnical quality and technical quality. We asked patients and providers to rate the quality of care at the facility, and patients also reported their satisfaction with delivery care; indicators compared those giving the top rating (e.g. excellent or very satisfied) with all others. We measured four indicators of maternal health through biomarkers collected during the household survey: absence of anemia (hemoglobin level of 12.0 g/dl or above for nonpregnant women and 11.0 g/dl or above for pregnant women [22]), absence of hypertension (average systolic reading below 140 mm Hg and average diastolic reading below 90 mm Hg [23]), the distribution of EQ-5D scores (EuroQol Group, Rotterdam, Netherlands) and the distribution of mid-upper arm circumference (MUAC).

Patient-level data were collected as repeated cross-sections in 2012, 2014 and 2016 (see appendix 2 for a summary) [15, 24, 25]. All households in the catchment area were enumerated. The sample size was determined based on another primary outcome, utilization. At midline, we selected 60% of women from each catchment area using simple random sampling. Women were eligible for the household survey if they were at least 15 years of age and lived within the catchment area of a study facility; they were included in this analysis if they had delivered their most recent child in one of the study facilities between 6 weeks and 1 year before the interview. At midline and endline, women were invited to have their hemoglobin and blood pressure measured. The job satisfaction survey was offered to all healthcare providers [26], while the obstetric knowledge test and the clinical vignettes were offered to healthcare providers who had received formal pre-service training in obstetric care (i.e. clinical officers and nurses). The facility audit was adapted from the needs assessment developed by the Averting Maternal Death and Disability Program and the United Nations system [27] and asked about services routinely provided by that facility. In addition, we collected aggregate monthly indicators of use and quality from the facility registers and partographs. The provider surveys, facility audits and register abstraction were conducted annually. The implementation team at Tanzania Health Promotion Support (THPS) collected data on intervention delivery. Data collection methods are further described in appendix 2.

All women and healthcare providers participating in surveys provided written informed consent prior to participation. Ethics review boards in Tanzania (the National Institute for Medical Research and the Ifakara Health Institute) and in the U.S. (Columbia University and the Harvard T.H. Chan School of Public Health) approved this study.

Completed surveys were imported into Stata version 14.2 for cleaning and analysis. We first computed descriptive statistics and then assessed the implementation and fidelity of the intervention. For each of the three components (infrastructure, training and supportive supervision), the three indicators (dose delivered, dose received and reach) were multiplied together to obtain a composite component score [21, 28]. These three scores were then averaged to create a single composite measure of implementation strength.
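A minimal sketch of this calculation follows (the study used Stata; Python is used here for illustration). The component scores are hypothetical, and each indicator is assumed to be scaled 0-1 as described above.

```python
# Minimal sketch of the implementation-strength composite described above.
# Component scores are hypothetical; each indicator is assumed scaled 0-1.

def component_score(dose_delivered: float, reach: float, dose_received: float) -> float:
    """Multiply the three implementation indicators for one component."""
    return dose_delivered * reach * dose_received

def implementation_strength(components: dict[str, dict[str, float]]) -> float:
    """Average the component scores across infrastructure, training and
    supportive supervision (1 = complete implementation, 0 = complete failure)."""
    scores = [component_score(**c) for c in components.values()]
    return sum(scores) / len(scores)

# Hypothetical facility: strong training delivery but weaker supervision
example = {
    "infrastructure": dict(dose_delivered=0.8, reach=1.0, dose_received=0.7),
    "training":       dict(dose_delivered=0.9, reach=0.8, dose_received=0.75),
    "supervision":    dict(dose_delivered=0.5, reach=0.6, dose_received=0.5),
}
print(round(implementation_strength(example), 2))  # -> 0.42
```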
Complete implementation would thus be represented by a score of 1 and complete failure of implementation by a score of 0.

To measure the effect of the MNH+ intervention on obstetric quality, we conducted difference-in-differences analyses, assessing the difference between intervention and control facilities in the change in each quality indicator from baseline (2012) to endline (2016). These analyses control for both differences in quality between facilities at baseline and changes over time that are external to the intervention but consistent across the region. We included a fixed effect for district to account for stratification during the design phase. Except where noted, all models used generalized estimating equations with an exchangeable correlation structure. For binary quality measures, we used a log link to estimate risk ratios [29]. The robust sandwich estimator was used to account for clustering at the facility level. Because anemia and hypertension were not measured at baseline, we could not conduct a difference-in-differences analysis for these outcomes; instead, we compared intervention to control facilities at endline, adjusting for age, household wealth and district [30, 31]. Additionally, we assessed whether the intervention had an effect on the quality indicators at midline (2014).

To assess changes in provider knowledge and competence, our primary analysis evaluated within-provider changes. Because of unexpectedly low retention of providers across the five-year study period, we assessed changes from baseline (2012) to first follow-up (2013). In a secondary analysis, we measured changes in mean facility knowledge score from baseline (2012) to endline (2016) using linear regression with a fixed effect for district and the robust sandwich estimator to account for clustering at the facility level. Finally, we conducted a sub-group analysis assessing the impact of the intervention in the high-implementation facilities (top third) compared to control facilities (N = 12) through difference-in-differences analyses.
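For concreteness, a sketch of this difference-in-differences specification for one binary indicator follows. The study fit these models in Stata, so this Python/statsmodels version is an illustration only, and the variable names (received_uterotonic, intervention, post, district, facility_id) are assumptions about the data layout rather than the study's actual variables.

```python
# Sketch only: difference-in-differences for a binary quality indicator using
# GEE with a log link (risk ratios), an exchangeable working correlation and
# cluster-robust (sandwich) standard errors at the facility level.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("quality_indicators.csv")  # hypothetical woman-level file

# intervention:post is the difference-in-differences term; C(district) gives
# the district fixed effects that account for the stratified randomization.
model = smf.gee(
    "received_uterotonic ~ intervention * post + C(district)",
    groups="facility_id",
    data=df,
    family=sm.families.Binomial(link=sm.families.links.Log()),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()  # GEE standard errors are robust (sandwich) by default
print(result.summary())

# exp(coefficient on intervention:post) estimates the risk ratio comparing
# the baseline-to-endline change in intervention vs. control facilities.
```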