Understanding factors associated with attending secondary school in Tanzania using household survey data

PLoS ONE, Volume 17, No. 2, February 2022

Interpretation

Background: Sustainable Development Goal (SDG) 4 aims to ensure inclusive and equitable access to education for all by 2030, leaving no one behind. One indicator selected to measure progress towards this goal is the participation rate of youth in education (SDG indicator 4.3.1). Here we aim to understand the drivers of school attendance, using one country in East Africa, Tanzania, as an example.

Methods: Nationally representative household survey data (2015–16 Tanzania Demographic and Health Survey) were used to explore individual, household and contextual factors associated with secondary school attendance in Tanzania. These included age, the household head's level of education, gender, the household wealth index and the total number of children under five. Contextual factors, such as the average pupil to qualified teacher ratio and geographic access to school, were also tested at cluster level. A two-level random intercept logistic regression model was used to explore the association of these factors with attendance in a multilevel framework.

Results: The age of the household head, the educational attainment of either the household head or a parent, and child characteristics such as gender were important predictors of secondary school attendance. Living in a richer household and having fewer siblings under the age of 5 were associated with increased odds of attendance (number of children under five: OR = 0.91, 95% CI: 0.86–0.96). Contextual factors were less likely to be associated with secondary school attendance.

Conclusions: Individual and household level factors are likely to affect secondary school attendance rates more than contextual factors, suggesting that interventions should focus more on these levels. Future studies should explore the impact of interventions targeting these levels. Policies should ideally promote gender equality in access to secondary school and support families with a high dependency ratio. Strategies to reduce poverty are also likely to increase the likelihood of attending school.
We used the Demographic and Health Surveys (DHS) data for Tanzania (2015–16 DHS, n = 595 clusters) [24, 42], downloaded from the MEASURE DHS website, and extracted a list of candidate individual-level (age and sex of the child, educational attainment of the head of the household and of the parents) and household-level (household wealth index, number of children under the age of 5 in the household) characteristics, used as variables (or factors) that may affect secondary school attendance. The DHS program conducts nationally representative household surveys in over 90 countries, providing indicators on a wide range of topics including population, wealth, maternal and child health, fertility and family planning, nutrition and education. DHS surveys use a two-stage (or sometimes three-stage) stratified sampling design, with censuses as sampling frames. In the first stage, enumeration areas (EAs), also known as clusters, are selected with probability proportional to size (EA size). In the second stage, households are usually sampled from a complete household listing in the selected EAs using systematic sampling. Specific details on the sampling procedures can be found in the DHS final report [24] and the DHS Sampling Manual [42]. Clusters (used interchangeably with primary sampling units (PSUs) and enumeration areas (EAs)) are defined as a group of households in the same area, or a block in urban areas, selected for interview within the complex survey design used by the DHS. The adjusted net attendance rate, used as the outcome variable in this study, was defined as the total number of students of the official secondary school age group attending primary, secondary or higher education in a reference academic year, following guidance from the UNESCO Institute for Statistics (UIS) and the DHS methods [43, 44].
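The two selection stages described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the DHS implementation: it covers a single stratum (DHS repeats the selection within each stratum), uses hypothetical EA sizes and household listings, and draws the first-stage sample by systematic probability-proportional-to-size (PPS) selection along the cumulated EA sizes, which is one common way of implementing PPS.

```python
import random
from typing import List, Sequence


def pps_systematic(sizes: Sequence[int], n: int, rng: random.Random) -> List[int]:
    """Stage 1: choose n EAs with probability proportional to size.

    Walks along the cumulated EA sizes and takes every `interval`-th unit
    from a random start. An EA larger than the interval can be hit twice.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(n)]
    selected: List[int] = []
    cumulative = 0.0
    k = 0
    for index, size in enumerate(sizes):
        cumulative += size
        last = index == len(sizes) - 1
        # Each selection point inside this EA's cumulative range selects it;
        # the last EA also absorbs any point pushed past the end by rounding.
        while k < n and (points[k] < cumulative or last):
            selected.append(index)
            k += 1
    return selected


def systematic_sample(listing: Sequence[str], m: int, rng: random.Random) -> List[str]:
    """Stage 2: equal-probability systematic sample of m households (assumes m <= len(listing))."""
    interval = len(listing) / m
    start = rng.uniform(0, interval)
    # min() guards against floating-point rounding at the end of the listing.
    return [listing[min(int(start + k * interval), len(listing) - 1)] for k in range(m)]


if __name__ == "__main__":
    rng = random.Random(2015)
    ea_sizes = [120, 80, 300, 50, 150, 200, 90, 110]  # hypothetical household counts per EA
    for ea in pps_systematic(ea_sizes, n=3, rng=rng):
        listing = [f"EA{ea}-HH{h}" for h in range(ea_sizes[ea])]  # hypothetical listing
        print(ea, systematic_sample(listing, m=5, rng=rng))
```

Because larger EAs occupy a longer stretch of the cumulated list, they are more likely to contain a selection point, which is what makes the first stage proportional to size.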
The adjusted rate therefore included children of official school age who started school earlier or later than the normal enrolment age, and it was expressed as a percentage of the corresponding population [45], giving a more precise picture of school participation. The designated age range for secondary school in Tanzania is 14 to 19 years. The numerator was the de facto population of secondary school age attending secondary school (or primary, secondary or higher education in the case of the adjusted rate), while the denominator was the total de facto population of secondary school age. Age at the start of the academic year was used to determine the eligible secondary school age population for the numerator and denominator of the net attendance rate [44]. To establish these age ranges, the child's full date of birth was triangulated with the start of the academic year, to account for the time gap between the interview and the start of the academic year. The out-of-school rate for secondary school was calculated by subtracting the adjusted net attendance rate for secondary education from 100%. Alongside individual and household level factors, two contextual factors were constructed: travel time to the nearest secondary school (a proxy for access to school) and the pupil to qualified teacher ratio (PQTR, a proxy for the quality of the school service offered) [46]. Cluster-level travel time to the nearest secondary school was extracted from gridded estimates, taking the average travel time at each cluster, and used as a contextual variable. The methodology for computing travel time has been documented in previous studies [47–50].
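A minimal Python sketch of how age eligibility, the net and adjusted net attendance rates and the out-of-school rate fit together. The 14 to 19 age range comes from the text; the `Child` record, the level labels, the survey weights and the start-of-year date used in the example are hypothetical placeholders, and the DHS computation (including its weighting) may differ in detail.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple

SEC_AGE_MIN, SEC_AGE_MAX = 14, 19  # secondary school age range in Tanzania (from the text)
ATTENDING_ANY = {"primary", "secondary", "higher"}  # counted in the adjusted rate


@dataclass
class Child:
    dob: date              # full date of birth
    level: Optional[str]   # level currently attended, or None if not attending
    weight: float = 1.0    # hypothetical survey sampling weight


def age_on(dob: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def attendance_rates(children: Iterable[Child], year_start: date) -> Tuple[float, float, float]:
    """Return (net attendance rate, adjusted net attendance rate, out-of-school rate) in %."""
    net = adjusted = eligible = 0.0
    for child in children:
        # Eligibility uses age at the start of the academic year, not age at interview.
        if not SEC_AGE_MIN <= age_on(child.dob, year_start) <= SEC_AGE_MAX:
            continue
        eligible += child.weight
        if child.level == "secondary":
            net += child.weight
        if child.level in ATTENDING_ANY:
            adjusted += child.weight
    anar = 100 * adjusted / eligible
    return 100 * net / eligible, anar, 100 - anar


kids = [
    Child(date(2000, 6, 15), "secondary"),   # 14 at the start of the year
    Child(date(1996, 3, 1), "primary"),      # 18, over-age but attending
    Child(date(2001, 1, 1), None),           # 14, not attending
    Child(date(1994, 12, 31), "secondary"),  # 20, outside the age range
    Child(date(1995, 5, 5), "higher"),       # 19, attending higher education
]
# Hypothetical academic-year start date.
print(attendance_rates(kids, year_start=date(2015, 1, 1)))  # (25.0, 75.0, 25.0)
```

The gap between the two rates in the example (25% versus 75%) is exactly the over-age and early-advancing attendance that the adjusted rate is designed to capture.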
To compute travel time, school locations were first combined with ancillary spatial data on elevation (a digital elevation model, DEM, from the HydroSHEDS dataset [51]), land cover (from MERIS GlobCover [52]) and road networks (assembled from OpenStreetMap (OSM) and other online resources such as the National Geospatial-Intelligence Agency (NGA) [53] and MapCruzin [54]), using AccessMod version 5 [55]. Second, a raster surface of travel times to the school locations was generated, combining walking across land cover with motorised travel along major roads, and used in the analysis. The Focal Statistics tool in ArcGIS Spatial Analyst (ESRI ArcGIS 10.7) was used to calculate mean values within a 2 km or 5 km buffer around each cluster, depending on whether the cluster was urban or rural, to account for the urban/rural split in survey sampling and the respective displacement of DHS cluster coordinates by up to 2 km (urban) and 5 km (rural). The PQTR is defined as the average number of pupils per qualified teacher at a given level of education, based on headcounts of both pupils and teachers [44], and it was also tested as a contextual variable. The PQTR indicates how many pupils each qualified teacher is responsible for, and therefore how much care and attention can be given to each individual pupil. A qualified teacher is one who has at least the minimum academic qualifications required for teaching his or her subjects at the relevant level in a given country [44]. Information on the PQTR was extracted from an online education database for Tanzania (www.africaopendata.org [46]), which included the number of children enrolled at each secondary school and the number of teachers in every classroom. The higher the PQTR, the lower the relative access of pupils to qualified teachers: a high ratio suggests that each teacher is responsible for a large number of pupils.
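The travel-time surface can be illustrated with a simplified cost-distance sketch: a multi-source Dijkstra search over an 8-connected grid, in which each cell carries a crossing cost in minutes per km (slow for walking across land cover, fast along roads). This is a stand-in technique under stated assumptions, not AccessMod's actual algorithm: it ignores elevation and direction-dependent (anisotropic) speeds, and the grid, speeds and school positions are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

WALK_MIN_PER_KM = 12.0  # hypothetical: about 5 km/h on foot
ROAD_MIN_PER_KM = 1.0   # hypothetical: about 60 km/h on a major road


def travel_time_surface(
    min_per_km: Sequence[Sequence[float]],
    cell_km: float,
    schools: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Minutes from every cell to its nearest school (multi-source Dijkstra, 8 neighbours)."""
    rows, cols = len(min_per_km), len(min_per_km[0])
    best = [[math.inf] * cols for _ in range(rows)]
    heap: List[Tuple[float, int, int]] = []
    for r, c in schools:
        best[r][c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    moves = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        t, r, c = heapq.heappop(heap)
        if t > best[r][c]:
            continue  # stale heap entry, a faster route was already found
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step_km = cell_km * math.hypot(dr, dc)
            # Half of the step is spent in each of the two cells crossed.
            cost = step_km * (min_per_km[r][c] + min_per_km[nr][nc]) / 2.0
            if t + cost < best[nr][nc]:
                best[nr][nc] = t + cost
                heapq.heappush(heap, (t + cost, nr, nc))
    return best


# Hypothetical 3 x 6 grid of 1 km cells: a road along the top row, walking elsewhere.
grid = [[ROAD_MIN_PER_KM] * 6] + [[WALK_MIN_PER_KM] * 6 for _ in range(2)]
times = travel_time_surface(grid, cell_km=1.0, schools=[(0, 0)])
for row in times:
    print([round(t, 1) for t in row])
```

Seeding the search from every school at once gives, in a single pass, the time to the nearest school for each cell, which is the quantity averaged within the cluster buffers.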
It is generally assumed that a low PQTR signifies smaller classes, which enable the teacher to pay more attention to individual students [44]. An inverse distance weighting (IDW) interpolation technique in GIS was used to create a continuous surface of PQTR for Tanzania. S1 Fig and S1 Text show the PQTR in Tanzania for each school and the corresponding interpolated surface. PQTR values for each DHS cluster were extracted using focal statistics, with a 2 km buffer around urban clusters and a 5 km buffer around rural clusters. Information about children of secondary school age was linked to this PQTR quality indicator, and the cluster-level values were used in the modelling framework as a contextual variable to understand factors associated with access to secondary school. Based on data availability and on the theoretical relationships between determinants and school attendance discussed in previous studies [among others: 8, 21], a full list of individual, family and household, and contextual characteristics was derived, and their association with the outcome variable was tested using bivariate analysis and a forward stepwise covariate selection process. The bivariate analysis was performed to identify demographic and socio-economic characteristics associated with the adjusted secondary school attendance ratio. This descriptive analysis tested for differences between groups using F tests and t-tests for equality of means (for continuous variables), adjusting for the sample design, with a significance level of p<0.05. Data were analysed with Stata/SE 16.0 for Windows [56] and adjusted for the survey sampling design. Additionally, a forward stepwise covariate selection procedure with an alpha level of 0.05 was implemented to identify a parsimonious set of covariates, and collinearity between independent variables was explored using the variance inflation factor (VIF).
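The PQTR steps (a ratio per school, an IDW surface, then a buffer mean around each cluster) can be sketched with NumPy. The school records and coordinates are hypothetical, coordinates are assumed to be projected metres, and the IDW power of 2 is an assumption (a common default) because the text does not report the interpolation settings. The buffer function is a NumPy stand-in for a circular focal mean, not the ArcGIS tool itself, and applies equally to the travel-time surface.

```python
import numpy as np

URBAN_BUFFER_M, RURAL_BUFFER_M = 2_000.0, 5_000.0  # buffers used in the text


def pqtr(pupils, qualified_teachers) -> np.ndarray:
    """Pupil to qualified teacher ratio per school, from headcounts."""
    return np.asarray(pupils, float) / np.asarray(qualified_teachers, float)


def idw_surface(school_xy, values, grid_x, grid_y, power: float = 2.0) -> np.ndarray:
    """IDW estimate at every grid-cell centre; result has shape (len(grid_y), len(grid_x))."""
    school_xy = np.asarray(school_xy, float)
    values = np.asarray(values, float)
    gx, gy = np.meshgrid(grid_x, grid_y)
    # Distance from each cell centre to each school: shape (rows, cols, n_schools).
    d = np.hypot(gx[..., np.newaxis] - school_xy[:, 0], gy[..., np.newaxis] - school_xy[:, 1])
    d = np.maximum(d, 1e-6)  # a cell centred on a school effectively takes that school's value
    w = d ** -power
    return (w * values).sum(axis=-1) / w.sum(axis=-1)


def buffer_mean(surface, grid_x, grid_y, cluster_xy, urban: bool) -> float:
    """Mean of cells within 2 km (urban) or 5 km (rural) of a DHS cluster."""
    radius = URBAN_BUFFER_M if urban else RURAL_BUFFER_M
    gx, gy = np.meshgrid(grid_x, grid_y)
    inside = np.hypot(gx - cluster_xy[0], gy - cluster_xy[1]) <= radius
    return float(surface[inside].mean()) if inside.any() else float("nan")


# Hypothetical schools: location (m), enrolment and qualified-teacher headcounts.
schools_xy = [(0.0, 0.0), (2_000.0, 0.0)]
ratios = pqtr(pupils=[600, 1_800], qualified_teachers=[30, 30])  # 20 and 60 pupils per teacher
grid_x = np.arange(-9_000.0, 11_001.0, 1_000.0)  # 1 km cell centres
grid_y = np.arange(-10_000.0, 10_001.0, 1_000.0)
surface = idw_surface(schools_xy, ratios, grid_x, grid_y)
print(buffer_mean(surface, grid_x, grid_y, (1_000.0, 0.0), urban=True))
print(buffer_mean(surface, grid_x, grid_y, (6_000.0, 3_000.0), urban=False))
```

The larger rural buffer averages over more cells, smoothing the surface more, which mirrors the larger positional uncertainty of rural DHS clusters.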
Collinearity was considered high for covariates with a VIF greater than 4, which corresponds to a twofold increase in the standard error of a regression coefficient, because the standard error scales with the square root of the VIF. When two variables were collinear (high VIF), the one with the higher R² with respect to the outcome variable was retained. Interaction terms between age and the level of education of the household head were first tested outside the modelling stage and then assessed within the modelling stage, where they were not significant and were therefore not included in the final model. A two-level (multilevel) random intercept logistic regression of the probability of attending secondary school was fitted, with individuals nested within primary sampling units (clusters) [39, 40]. The two-level random intercept model for binary responses is written as:

$$\log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2j} + u_j, \qquad u_j \sim N(0, \sigma_u^2)$$

where $\pi_{ij}$ is the probability of the event ($y_{ij} = 1$) for level 1 unit $i$ in level 2 unit $j$; $\beta_0$ is the log-odds that $y = 1$ when $x = 0$ and $u = 0$; $\beta_1$ is the effect on the log-odds of a 1-unit increase in $x_{1ij}$ for individuals in the same group; $\beta_2$ is the effect of the level 2 variable $x_{2j}$; $u_j$ is the effect of being in group $j$ on the log-odds that $y = 1$, also known as the level 2 residual; $\sigma_u^2$ is the level 2 (residual) variance, or the between-group variance in the log-odds that $y = 1$ after accounting for $x$; $x_{1ij}$ is a generic level 1 independent variable for unit $i$ nested within level 2 unit $j$; and $x_{2j}$ is a level 2 independent variable. The response variable “School attendance” was binary, equal to 0 when an eligible child of secondary school age was not attending school and equal to 1 when the child was attending school. The analysis aimed to describe factors associated with children attending school. The first level comprised child, parent and household characteristics; the second level was the cluster (community/contextual) level.
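The VIF rule can be made concrete with a small NumPy sketch: each covariate is regressed on all the others (with an intercept), VIF_j = 1 / (1 - R_j^2), and the square root of the VIF is the factor by which collinearity inflates that coefficient's standard error, so the threshold of 4 corresponds to a doubling. The simulated covariates are hypothetical stand-ins for the survey variables, not the study data.

```python
import numpy as np

VIF_THRESHOLD = 4.0  # threshold used in the text; sqrt(4) = 2, i.e. a doubled standard error


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress covariate j on an intercept plus all other covariates.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
wealth = rng.normal(size=1_000)                           # simulated covariate
head_edu = 0.9 * wealth + 0.3 * rng.normal(size=1_000)    # simulated, strongly tied to wealth
under5 = rng.normal(size=1_000)                           # simulated, unrelated covariate
v = vif(np.column_stack([wealth, head_edu, under5]))
print(np.round(v, 2), v > VIF_THRESHOLD, np.round(np.sqrt(v), 2))
```

In this simulation the two linked covariates exceed the threshold while the independent one stays near 1; the text's rule would then keep whichever of the pair has the higher R² with the outcome.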
Given the sample size, there was no rationale for specifying either parents or households as a separate level in the model. Interaction terms were also explored outside the modelling framework and tested within the full multilevel models to assess their significance. Likelihood-ratio tests of goodness of fit were performed between a simple (single-level) logistic regression and a null model with a random intercept at level two. Adding a random intercept at cluster level was statistically significant, and random intercepts were therefore retained. Finally, to find the best full multilevel model, likelihood-ratio tests were also performed comparing the null model with random intercept against models adding one independent variable at a time. For ease of interpretation, results of the multilevel models are presented as odds ratios, obtained by exponentiating the log-odds coefficients, with 95% confidence intervals. Intraclass correlation coefficients (ICC) were calculated for the final model. The ICC measures the correlation between observations of children belonging to the same cluster (community); it is defined as the between-cluster variance divided by the total variance, where the total variance is the sum of the between-group and within-group variances [41]. Finally, adjusted mean predictions for the fixed portion of the model were calculated after fitting the multilevel logistic model, to compute the probability of accessing secondary school for selected characteristics in the model, holding all other independent variables at their mean values. University of Southampton number: 45660.
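The post-estimation quantities described above (odds ratios, the ICC and fixed-portion predictions) can be sketched in plain Python. Two points are assumptions rather than details reported in the text: the within-cluster variance of a logistic model is taken on the latent scale as π²/3 (about 3.29), the usual convention for binary multilevel models, and a Wald-type 95% interval exp(β ± 1.96·SE) is used for odds ratios. All coefficients, standard errors, covariate means and the variance below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.959963984540054        # standard normal 97.5th percentile
LEVEL1_VAR = math.pi ** 2 / 3   # latent-scale residual variance of the logistic distribution


def odds_ratio(beta: float, se: float) -> Tuple[float, float, float]:
    """Odds ratio with a Wald 95% confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


def icc(sigma_u2: float) -> float:
    """Share of the total (latent) variance lying between clusters."""
    return sigma_u2 / (sigma_u2 + LEVEL1_VAR)


def predicted_probability(beta0: float, betas: Sequence[float], values: Sequence[float]) -> float:
    """Fixed-portion prediction (u_j = 0) converted from log-odds to probability."""
    eta = beta0 + sum(b * x for b, x in zip(betas, values))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted model: intercept and coefficients for [wealth score, children under five].
beta0, betas = -0.4, [0.35, -0.10]
means = [0.0, 1.2]  # hypothetical sample means of the covariates
print(odds_ratio(0.35, 0.08))
print(icc(0.8))
# Vary the number of children under five while holding wealth at its mean.
for n_under5 in (0, 1, 2, 3):
    print(n_under5, round(predicted_probability(beta0, betas, [means[0], n_under5]), 3))
```

Setting u_j = 0 gives the prediction for a "typical" cluster, which is what "fixed portion of the model" refers to in the text.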

AI Digest

Study Justification:

– The study aims to understand the factors associated with attending secondary school in Tanzania.
– This is important because Sustainable Development Goal 4 aims to ensure inclusive and equitable access to education for all by 2030.
– The study contributes to measuring progress towards SDG 4 through indicator 4.3.1, which tracks the participation rate of youth in education.

Study Highlights:

– The study used nationally representative household survey data from the 2015-16 Tanzania Demographic and Health Survey.
– Individual, household, and contextual factors were explored to understand their association with secondary school attendance.
– Factors such as the age and educational attainment of the household head, parents' education, the child's gender, household wealth, and the number of children under five in the household were found to be important predictors of attendance.
– Contextual factors, such as pupil to qualified teacher ratio and geographic access to school, were less likely to be associated with attendance.

Study Recommendations:

– Interventions should focus on individual and household level factors to improve secondary school attendance rates.
– Gender equality in accessing secondary school should be promoted.
– Support should be provided to families with a high dependency ratio.
– Strategies to reduce poverty will increase the likelihood of attending school.

Key Role Players:

– Ministry of Education: Responsible for implementing policies and interventions to improve secondary school attendance.
– Non-governmental Organizations (NGOs): Can provide support and resources to implement interventions at the individual and household levels.
– Community Leaders: Can play a role in promoting gender equality and raising awareness about the importance of education.

Cost Items for Planning Recommendations:

– Teacher Training: Budget for training teachers to improve the quality of education and reduce the pupil to qualified teacher ratio.
– Infrastructure Development: Budget for constructing and maintaining schools to improve access to education.
– Scholarships and Financial Assistance: Budget for providing financial support to families with a high dependency ratio and those living in poverty.
– Awareness Campaigns: Budget for raising awareness about the importance of education and promoting gender equality in accessing secondary school.

Strength of Evidence

The strength of evidence for this abstract is 7 out of 10.
The evidence in the abstract is rated 7 because it provides a clear description of the methods used, including the use of nationally representative household survey data and a two-level random intercept logistic regression model. The results are presented in a concise manner, highlighting the important predictors of secondary school attendance. However, the abstract could be improved by providing more specific details about the sample size and the statistical significance of the findings. Additionally, it would be helpful to include information about the limitations of the study and suggestions for future research.

Materials

Innovations

The study focuses on understanding factors associated with attending secondary school in Tanzania and does not explicitly address innovations for improving access to maternal health. The following potential recommendations are drawn from innovations in the field of maternal health; they are not directly related to the study described, but can be considered as general suggestions for improving access to maternal health:

1. Telemedicine and Telehealth: Implementing telemedicine and telehealth services can improve access to maternal health by allowing pregnant women in remote or underserved areas to receive virtual consultations, prenatal care, and postnatal support from healthcare professionals.

2. Mobile Health (mHealth) Applications: Developing and promoting mobile health applications that provide information, reminders, and guidance on prenatal care, nutrition, and maternal health can empower women to take control of their health and access necessary resources.

3. Community Health Workers: Training and deploying community health workers who can provide essential maternal health services, education, and support to women in their communities can help bridge the gap between healthcare facilities and remote areas.

4. Maternal Health Vouchers: Implementing voucher programs that provide financial assistance to pregnant women for accessing maternal health services, including antenatal care, delivery, and postnatal care, can help overcome financial barriers and improve access.

5. Maternity Waiting Homes: Establishing maternity waiting homes near healthcare facilities can provide a safe and comfortable place for pregnant women to stay before delivery, especially for those who live far away and may face challenges in reaching the facility in time.

6. Transportation Support: Providing transportation support, such as ambulances or transportation vouchers, to pregnant women in remote areas can ensure timely access to healthcare facilities during emergencies or for routine check-ups.

7. Task-Shifting and Training: Training and empowering non-specialist healthcare providers, such as midwives and nurses, to deliver quality maternal health services can help address the shortage of skilled healthcare professionals in underserved areas.

8. Public-Private Partnerships: Collaborating with private healthcare providers and organizations to expand access to maternal health services, especially in areas with limited public healthcare infrastructure, can help fill the gaps and improve access.

It is important to note that these recommendations are general and may need to be tailored to the specific context and needs of Tanzania.

AI Innovations Description

The recommendation that can be developed into an innovation to improve access to maternal health is to focus on individual and household level factors. This means implementing interventions that target these specific levels to increase secondary school attendance rates. Additionally, policies should promote gender equality in accessing secondary school and support families with a high dependency ratio. Strategies to reduce poverty will also increase the likelihood of attending school. It is important to understand the drivers of school attendance, such as age, education levels of the household head and parents, gender, household wealth index, and the number of children under the age of five in the household. By addressing these factors, access to maternal health can be improved.

AI Innovations Methodology

The provided text describes a study conducted in Tanzania to understand the factors associated with attending secondary school. The study used nationally representative household survey data from the Tanzania Demographic and Health Survey (DHS) to explore individual, household, and contextual factors that impact secondary school attendance.

To simulate the impact of recommendations on improving access to maternal health, a similar methodology can be applied. Here’s a brief description of the methodology:

1. Data Collection: Collect nationally representative household survey data that includes information on maternal health indicators such as antenatal care, skilled birth attendance, postnatal care, and access to healthcare facilities.

2. Variable Selection: Identify key individual, household, and contextual factors that may impact access to maternal health. These factors could include age, education level, socioeconomic status, distance to healthcare facilities, availability of skilled healthcare providers, and cultural beliefs.

3. Sampling Design: Implement a two-stage stratified sampling design using censuses as sampling frames. Select clusters (enumeration areas) in the first stage and sample households from the selected clusters in the second stage.

4. Data Analysis: Perform descriptive analysis to identify demographic and socioeconomic characteristics associated with access to maternal health services. Use statistical tests to determine significant differences within groups.

5. Multilevel Modeling: Conduct a multilevel logistic regression analysis to explore the association between the selected factors and access to maternal health services. Use a two-level random intercept model with individuals nested within clusters.

6. Model Selection: Use a forward-stepwise covariate selection process to identify a parsimonious set of independent variables. Consider collinearity between variables using the variance inflation factor (VIF) statistic.

7. Model Evaluation: Assess the goodness of fit of the final model using likelihood-ratio tests. Calculate the intraclass correlation coefficient (ICC) to measure the correlation of observations within clusters.

8. Odds Ratio Calculation: Calculate odds ratios and confidence intervals to quantify the association between the independent variables and access to maternal health services.

9. Predictive Analysis: Generate adjusted mean predictions for the fixed portion of the model to estimate the probability of accessing maternal health services for selected characteristics, while holding other variables at their mean values.
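The likelihood-ratio comparison in step 7, as used in the original study to compare a single-level logistic model with a random-intercept model, can be sketched with the Python standard library. The log-likelihood values are hypothetical, the sketch handles only comparisons that differ by one parameter, and halving the p-value when testing a variance at its boundary of zero (a 50:50 mixture of chi-square with 0 and 1 degrees of freedom) is a standard adjustment that the text does not mention explicitly.

```python
import math
from typing import Tuple


def lr_test_1df(
    loglik_restricted: float, loglik_full: float, variance_at_boundary: bool = False
) -> Tuple[float, float]:
    """Likelihood-ratio statistic and p-value for nested models differing by one parameter."""
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    # Upper tail of chi-square with 1 df: P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    if variance_at_boundary and stat > 0.0:
        p /= 2.0  # 50:50 mixture of chi2(0) and chi2(1) when testing sigma_u^2 = 0
    return stat, p


# Hypothetical log-likelihoods: single-level logistic vs. random-intercept null model.
stat, p = lr_test_1df(-2_410.6, -2_395.2, variance_at_boundary=True)
print(f"LR = {stat:.1f}, p = {p:.2g}")
```

For comparisons that add a categorical variable with several levels, the degrees of freedom exceed one and a general chi-square tail function would be needed instead.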

By following this methodology, researchers can estimate how access to maternal health services varies with each factor and use adjusted predictions to compare scenarios. This helps identify the most influential factors and develop targeted interventions to address barriers and improve access to maternal health services.

Authors & Co-Authors

Statistics:

Authors: 5

Identifiers:

DOI: 10.1371/journal.pone.0263734

Research Areas:

Community Interventions, Food Security, Health System and Policy, Maternal Access, Maternal and Child Health, Quality of Care, Sexual and Reproductive Health, Social Determinants

Study Countries:

Multi-Countries, Tanzania

Study Design:

Cross Sectional Study

Study Approach:

Quantitative