Sociocultural influences on the development of child language skills have been widely studied, but most of the research findings were generated in Global North contexts. This crosslinguistic, multisite study, the first of its kind in South Africa, considers the influence of a range of individual and sociocultural factors on the expressive vocabulary size of young children. Caregivers of toddlers aged 16 to 32 months acquiring Afrikaans (n = 110), isiXhosa (n = 115), South African English (n = 105), or Xitsonga (n = 98) as home language completed a family background questionnaire and the MacArthur-Bates Communicative Development Inventory (CDI) about their children. Drawing on a revised version of Bronfenbrenner's (1977) ecological systems theory, we used the family background questionnaire to obtain information on three groups of factors. These were individual factors (the child's age and sex), microsystem-related factors (the number of other children and the number of adults in the child's household, maternal level of education, and socioeconomic status (SES)), and exosystem-related factors (home language and geographic area, namely rural or urban). All sociocultural and individual factors combined explained 25% of the variance in expressive vocabulary size. Partial correlations between these sociocultural factors and the toddlers' expressive vocabulary scores on 10 semantic domains yielded important insights into the impact of geographic area on the nature and size of children's expressive vocabulary. Unlike in previous studies, maternal level of education and SES did not significantly predict children's expressive vocabulary scores. These results indicate an interplay of sociocultural and individual influences on vocabulary development. Understanding how these sociocultural factors interact in diverse contexts requires a more complex ecological model of language development.
This study has a quantitative design and is cross-sectional, crosslinguistic and descriptive in nature. Data for this paper were collected as part of a multilingual, multidisciplinary, inter-institutional research project on the gesture and language development of young South African children in all of South Africa's official languages (see Brookes et al., forthcoming; Dowling and Whitelaw, 2018). To obtain information on children's language development for this paper, caregivers of toddlers aged 16 to 32 months who were acquiring Afrikaans, isiXhosa, South African English (SAE) or Xitsonga completed adapted MacArthur-Bates CDIs and a family background questionnaire for these four of South Africa's official languages. Caregivers of 428 children were recruited via three routes: (i) local childcare institutions and local and national not-for-profit organizations offering services to families with young children, (ii) the researchers' existing personal and professional networks, and (iii) social media. Caregivers were the child's birth or adoptive parent, a grandparent, another family member, or another guardian who parented the child alongside or instead of the biological parent. Inclusion criteria were that the caregiver (i) was a South African national, (ii) was raising a child of 16 to 32 months, (iii) was raising the child in their mother tongue, and (iv) was raising the child in South Africa. The exclusion criteria were more than 4 h per day of exposure to one or more other languages in the child's home, and caregiver concern about the child's hearing or communication development. We excluded children who received more than 4 h a day of exposure to other languages to control for the often reported, though contested, difference in expressive vocabulary size between monolingual and multilingual children when each of the multilingual child's languages is considered separately (see, e.g., Pearson et al., 1993; Hoff et al., 2012; De Houwer et al., 2014).
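The screening rules above can be expressed as a single predicate. The sketch below is illustrative only. The `Screening` record and its field names are hypothetical, because the study's actual questionnaire wording is not given here. The age window is also assumed to be inclusive at both ends.

```python
from dataclasses import dataclass

# Hypothetical screening record: field names are illustrative only and do not
# reproduce the study's actual questionnaire items.
@dataclass
class Screening:
    caregiver_is_sa_national: bool
    child_age_months: float
    raised_in_caregiver_mother_tongue: bool
    raised_in_south_africa: bool
    other_language_hours_per_day: float
    hearing_or_communication_concern: bool

MIN_AGE_MONTHS, MAX_AGE_MONTHS = 16, 32  # assumed inclusive at both ends
MAX_OTHER_LANGUAGE_HOURS = 4             # more than 4 h per day excludes the child

def is_eligible(s: Screening) -> bool:
    """True if inclusion criteria (i)-(iv) all hold and no exclusion criterion applies."""
    meets_inclusion = (
        s.caregiver_is_sa_national
        and MIN_AGE_MONTHS <= s.child_age_months <= MAX_AGE_MONTHS
        and s.raised_in_caregiver_mother_tongue
        and s.raised_in_south_africa
    )
    meets_exclusion = (
        s.other_language_hours_per_day > MAX_OTHER_LANGUAGE_HOURS
        or s.hearing_or_communication_concern
    )
    return meets_inclusion and not meets_exclusion
```

The threshold is stated as "more than 4 h". In this sketch, a child with exactly 4 h of daily exposure to other languages therefore remains eligible.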
We also wanted to avoid introducing amount of exposure to each language as an additional variable, given that bilingual children have been shown to obtain higher vocabulary scores in what is reported to be their first language than in their second (O'Toole et al., 2017). Children for whom concerns about hearing and/or communication development were reported were excluded to limit the number of factors that could cause variation in vocabulary size in our sample, given that our focus was on sociocultural (not health-related) influences. Our sampling plan stated that half of the 100 participants targeted per language had to be male, to control for the often-reported influence of sex on child language skills. For Afrikaans, isiXhosa and Xitsonga, half of the participants had to live in rural areas, to control for the reported effect of geographic location on vocabulary size and composition. For these three languages, there were no specific targets regarding SES. For SAE, half of the participants had to be from low-SES homes, regardless of geographic location. SAE is infrequently spoken as a home language in rural areas but does vary according to SES (see Mesthrie, 2002; Bekker, 2012). Table 1 shows the number of participants and their demographic information. As can be seen from this table, the target number of participants was exceeded for all languages except Xitsonga. The Afrikaans participants had the highest mean age (1.32 months higher than that of the youngest language group, isiXhosa). SAE and isiXhosa each had almost equal numbers of male and female participants, whereas Afrikaans had more females than males and Xitsonga more males than females. However, ANOVAs yielded no statistically significant group differences for either Sex [F(3,424) = 1.104, p = 0.347] or Age [F(3,424) = 1.410, p = 0.239]. Afrikaans, Xitsonga, and isiXhosa collectively had 163 rural and 160 urban participants, and all but three SAE participants lived in urban areas.
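The reported degrees of freedom can be checked directly. With k = 4 language groups and N = 428 children, a one-way ANOVA has k - 1 = 3 between-group and N - k = 424 within-group degrees of freedom, which matches F(3,424). The minimal sketch below computes the F statistic from raw group scores in plain Python. It illustrates the technique only; it is not the authors' analysis script, and the function name is hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For a p-value, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p-value.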
Table 1. Participant demographic information by language.
The MacArthur-Bates CDI has been adapted into nearly 100 languages from a range of language families⁷. It has an infant version (covering the gestures, play routines, common actions, and words that children of 8 to 18 months can understand and use) and a toddler version (covering the words, early morphology, word combinations and sentence complexity of children aged 16 months to, typically, 30 or 36 months). For the purposes of this paper, only the word section of the toddler version was considered. For each word, caregivers were asked to indicate on a checklist whether the child understood and produced it. The South African versions of the CDI have not yet been validated. This raises the question of whether caregivers in South Africa can report accurately on their toddlers' language skills. If caregivers engage in less child-directed speech, do they know their child well enough linguistically to indicate reliably which words the child understands and produces? Although South African data are not yet available, Alcock et al. (2015) found that in rural Kenya, caregivers were able to accurately report two things: their younger children's receptive vocabulary (at an age when there are few productive words to report) and their older children's grammatical errors. Based on this study from Kenya, we worked on the premise that South African caregivers are capable of providing reliable information. The American English toddler version of the CDI (Fenson et al., 1993) was translated by three adult mother-tongue speakers per language.
Thereafter, adaptations (entailing the addition or removal of words) were made based on three sources. These were (i) a minimum of two focus group discussions and/or sets of interviews⁸ with parents of young children and professional child service providers, (ii) consultation with linguists and speech-language therapists who are mother-tongue speakers of the language (five for Xitsonga, three for isiXhosa, three for Afrikaans, and two for SAE), and (iii) 30-min samples of naturally occurring speech from six children per language (see Brookes et al., forthcoming). The preliminary versions of the CDIs and family background questionnaires were piloted with 40 caregivers of 16- to 32-month-olds per language (for Afrikaans, Xitsonga, and isiXhosa, 20 rural and 20 urban; for SAE, 20 low-SES and 20 mid-SES). Statistical analyses of the pilot data then guided decisions on further exclusion or replacement of lexical items. Of the approximately 1200 lexical items piloted, 733 to 773 per language were retained for the CDIs used in the current study. The CDIs of the West Germanic languages (Afrikaans and SAE) had one more semantic domain than those of the Bantu languages (isiXhosa and Xitsonga), as pronouns were not included in the Bantu language CDIs⁹. For the current study, the total CDI vocabulary score and a subset of 10 semantic domains (amounting to approximately half of the lexical items on the CDI) were used for analysis. This selection kept the number of semantic domains manageable, as the scope of this article did not allow consideration of all of them.
These 10 domains were selected because they had a similar number of items across languages and because of their tangibility. They consist either entirely of nouns or of games and routines, which we expected would make them more susceptible to sociocultural differences (see, e.g., Potgieter and Southwood, 2016, for a South African study that found that 4-year-old low-SES and mid-SES monolingual children differed significantly in their noun-related but not their verb-related vocabulary scores). These 10 domains were ANIMALS, CLOTHING, FOOD AND DRINK, FURNITURE, GAMES AND ROUTINES, PEOPLE, PLACES TO GO, SMALL HOUSEHOLD ITEMS, TOYS, and VEHICLES. Table 2 contains selected information on the number of lexical items per language version of the CDI used for data collection for this paper.
Table 2. Number of lexical items of the CDI, by language and semantic domain.
The family background questionnaire was developed after consulting (i) the literature on demographic and other factors influencing language development in young children, (ii) the results of the 2011 South African census (Statistics South Africa, 2012), and (iii) members of the communities speaking the language concerned. The questionnaire included questions on child health and development; childcare arrangements; household composition, income and food expenditure; parental level of education and occupation; and language exposure in and outside the home. These factors have been shown to affect child language development in other research contexts. Each language version of the questionnaire was piloted along with the CDI for that language. Questions were then omitted, refined or rephrased based on feedback from parents, caregivers and fieldworkers about their clarity, ease of reading, and cultural appropriateness.
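The total vocabulary score and the 10 semantic-domain scores can be thought of as counts of checklist items marked as produced. The minimal sketch below rests on several assumptions. Each caregiver response is taken to be stored as the set of boxes ticked per item, and every item is taken to carry a domain label. The function name, the data layout and the "understands"/"produces" labels are hypothetical, and the real item lists differ per language version.

```python
from collections import Counter

# The 10 semantic domains analysed in this paper.
SELECTED_DOMAINS = [
    "ANIMALS", "CLOTHING", "FOOD AND DRINK", "FURNITURE", "GAMES AND ROUTINES",
    "PEOPLE", "PLACES TO GO", "SMALL HOUSEHOLD ITEMS", "TOYS", "VEHICLES",
]

def expressive_scores(responses, item_domain):
    """Count the CDI items a child is reported to produce.

    responses   : dict mapping each CDI item to the set of boxes ticked,
                  e.g. {"understands"} or {"understands", "produces"} (hypothetical labels).
    item_domain : dict mapping each CDI item to its semantic domain.
    Returns (total expressive vocabulary, {domain: score} for the 10 selected domains).
    """
    produced = [item for item, ticks in responses.items() if "produces" in ticks]
    counts = Counter(item_domain[item] for item in produced)
    # Domains outside the selected 10 still count towards the total, but get no domain score.
    domain_scores = {domain: counts.get(domain, 0) for domain in SELECTED_DOMAINS}
    return len(produced), domain_scores
```

Keeping zero counts for every selected domain gives each child the same 10 domain variables. This simplifies the later correlation analyses.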
For each language, the consent form, family background questionnaire and CDI were combined into a single online form on Qualtrics (Qualtrics, Provo, UT, United States). Most of the data were collected by fieldworkers who were either students or employees of child development organizations. South Africa was in full to moderate COVID-19 lockdown at the time of data collection, so contact research was not allowed and fieldworkers were trained online using Zoom or WhatsApp. Fieldworker-collected data were captured on one of two kinds of device. Fieldworkers either used their own smartphones or tablets (with a link sent to them via WhatsApp) or, if they lacked a suitable device, used a tablet couriered to them with the correct language version of the form preloaded in Qualtrics. Where assisted by a fieldworker, caregivers completed the questionnaire and CDI on their smartphones, with the fieldworker available for consultation throughout. Caregivers without smartphones and/or sufficient literacy skills were interviewed telephonically by a fieldworker, who entered their responses into Qualtrics. Cellphone credit and mobile data for this purpose were supplied electronically to fieldworkers and caregivers. Some Afrikaans and SAE caregivers, who had sufficiently high levels of literacy and access to a suitable electronic device and internet connection, completed the electronic form independently. Completing the consent form, questionnaire and CDI took 40 to 60 min in total, depending on the number of lexical items the child knew and the caregiver's reading ability and computer literacy. Qualtrics allows a form to be completed across multiple sessions, so caregivers were able to stop and resume as needed. If the form is reopened on the same device, it automatically returns to the first uncompleted page.
Forms had to be submitted within a week of first being opened; Qualtrics automatically submitted opened but unsubmitted forms after a week. Ethical clearance for the study was obtained from the relevant research ethics committees at the University of Cape Town and Stellenbosch University¹⁰. Information on the study and the informed consent forms were available on Qualtrics in the participants' mother tongue. If consent for participation was not granted, Qualtrics did not allow the potential participant to proceed to the family background questionnaire and CDI. The informed consent form, family background questionnaire and CDI were completed voluntarily and anonymously. Participants could withdraw from the study at any stage by exiting Qualtrics before completing the form. Qualtrics records all responses and indicates the percentage completion of each form. Submissions with less than 100% completion were removed during data cleaning, which effectively allowed participants to withdraw their data from the study. Participants who completed the form independently donated their time to the research project. Those who completed the form with the assistance of a fieldworker could give the fieldworker a mobile phone number (outside Qualtrics). They were then sent, via WhatsApp or text message, an electronic supermarket voucher worth approximately 10 loaves of bread as a thank-you gift. The research team ensured that all COVID-19-related social distancing protocols of their respective institutions were followed to protect both fieldworkers and participants from undue risk.
To address RQ1 and RQ2, a hierarchical linear regression was conducted in R version 4.0.2 (R Core Team, 2020) using the lm function. The aim was to determine whether the selected sociocultural variables predict the participants' Total vocabulary score. Predictors were entered in four successive blocks, and each block was evaluated while controlling for the variables entered in the preceding blocks.
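The blockwise logic of a hierarchical regression can be sketched as follows. Each block of predictors is added to those already in the model, and the step is evaluated by the increase in R² and the corresponding F-change test. This is a NumPy sketch of the general technique, not the authors' R code, and the function names are hypothetical. Categorical predictors such as Sex, Language and Geographic area are assumed to be dummy coded before they are passed in.

```python
import numpy as np

def ols_r2(y, X):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def hierarchical_regression(y, blocks):
    """Enter predictor blocks one at a time, each controlling for the earlier blocks.

    y      : 1-D array of outcome scores (e.g. Total vocabulary).
    blocks : list of arrays (n x k_i, or 1-D for a single predictor), numerically coded.
    Returns one (R2, delta_R2, F_change, df1, df2) tuple per block.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.empty((n, 0))
    prev_r2 = 0.0
    results = []
    for block in blocks:
        block = np.asarray(block, dtype=float).reshape(n, -1)
        X = np.column_stack([X, block])
        r2 = ols_r2(y, X)
        df1 = block.shape[1]        # number of predictors added in this block
        df2 = n - X.shape[1] - 1    # residual df of the larger model
        f_change = ((r2 - prev_r2) / df1) / ((1.0 - r2) / df2)
        results.append((r2, r2 - prev_r2, f_change, df1, df2))
        prev_r2 = r2
    return results
```

In R, `anova()` reports this same F-change test when two successive nested `lm` fits are compared.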
Age was entered in the first block as a control variable, and the second block contained the other individual factor, Sex. The third block contained the microsystem factors (SES, Maternal education, Number of adults in the household, and Number of other children in the household). These relate to systems with which the child is considered to interact directly. The fourth block contained the exosystem factors (Geographic area, i.e., rural vs. urban, and Language). RQ3 was addressed by calculating correlations, first for all languages combined and then for each language separately. This was done to determine whether any relationships exist between the above-mentioned sociocultural factors and the children's scores on the 10 semantic domains.
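The abstract reports partial correlations between the sociocultural factors and the semantic-domain scores. One way to compute a partial correlation is to regress both variables on the control variables and then correlate the residuals, as in the sketch below. This section does not specify which control variables the paper used, so the choice of controls (for example, child age) is an assumption. The function names are hypothetical.

```python
import numpy as np

def residualize(v, controls):
    """Residuals of v after OLS regression on the control variables (plus intercept)."""
    design = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta

def partial_correlation(x, y, controls):
    """Pearson correlation of x and y with the control variables partialled out."""
    rx = residualize(np.asarray(x, dtype=float), controls)
    ry = residualize(np.asarray(y, dtype=float), controls)
    return float(np.corrcoef(rx, ry)[0, 1])
```

Running the same call first on the pooled sample and then on each language subset mirrors the two levels of analysis described above.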