Background: Voice messages have been employed as an effective and efficient approach to increasing health service utilization and promoting health in low- and middle-income countries. However, unlike SMS, voice message services require users to pick up a phone call at the time of delivery. Furthermore, it is difficult for users to review the content of a voice message afterward. Although voice messages are friendlier to specific groups (eg, illiterate or less literate populations), operationalizing a voice message intervention program successfully is likely to involve several challenges. Objective: This study aimed to estimate the extent to which voice message service users pick up voice message calls and listen up to or beyond the core part of the messages. Methods: A voice message service program composed of 14 episodes on maternal, newborn, and child health was piloted in Lagos, Nigeria, from 2018 to 2019. One episode per day was delivered by voice call to the mobile phones of the program participants for 14 consecutive days. A total of 513 participants in the voice message service chose one of five locally spoken languages as the language of their voice messages. Two multilevel logistic regression models were created to understand participants’ adherence to the voice messages: (a) Model 1 tested whether a voice message call was picked up; and (b) Model 2 tested whether a voice message call that had been picked up was listened to up to the core message part. Results: The higher the episode number, the smaller the proportion of participants who picked up the voice message call (aOR: 0.98; 95% CI: 0.97–0.99; P = .01). Only 854 of the 3765 voice message calls picked up by participants (22.7%) were listened to up to the core message part. Picking up a phone call therefore did not necessarily ensure listening up to the core message part, which indicates a discontinuity between these two actions. Conclusions: Participants became less likely to pick up the phone as the episode number progressed. Given the discontinuity between picking up a phone call and listening up to the core message part, it should not be assumed that those who pick up the phone will automatically listen to the entire voice message or to its core part.
This study is a cross-sectional study using two multilevel logistic regression models. It was conducted in Lagos Mainland, Lagos State; the population of the state was estimated at 13.5 million as of 2018 [29]. The study site was Lagos Mainland Local Government Area (LGA), one of the most populous LGAs in Lagos State, Nigeria [30]. While Yoruba is the largest ethnic group in Lagos State, there are other ethnic groups such as Egun, Hausa and Igbo [31]. According to the Nigeria Demographic Health Survey 2018, most women (85.9%) in Lagos State owned mobile phones [28]. The study site was one of the target LGAs of the Project for Strengthening Pro-Poor Community Health Services in Lagos State (the Project), implemented jointly by Lagos State Primary Health Care Board (LSPHCB) and Japan International Cooperation Agency (JICA) from January 2017 to March 2019. The Project aimed to strengthen the primary healthcare service delivery system for urban poor populations. To evaluate its interventions, the Project conducted a baseline survey in February 2017 and a follow-up survey in July 2018. The study target group comprised pregnant women who had participated in both the Project’s baseline and follow-up surveys and who had further expressed their willingness to receive the voice messaging intervention (Fig 1). For the follow-up survey, we randomly selected 1000 women from the cohort of 2112 pregnant women who had participated in the baseline survey conducted in 2017. The number of pregnant women in the baseline survey (n = 2112) was large enough to represent all pregnant women in the study site. Of the 1000 pregnant women selected from the baseline cohort, 698 mothers who had given live births were interviewed in the follow-up survey. The reasons for not participating in the follow-up survey included: relocation (n = 56), refusal (n = 78), absence (n = 48), child and/or maternal death (n = 2) and miscarriage (n = 82) ((c) in Fig 1). 
Of the 698 interviewed mothers, 513 indicated the ownership and availability of mobile phones in their households and agreed to receive voice messages as the intervention of this study. The voice messaging intervention consisted of 14 episodes on different topics: (a) introduction; (b) antenatal care; (c) danger signs during pregnancy; (d) postnatal care; (e) newborn care; (f) child illnesses #1; (g) child illnesses #2; (h) exclusive breastfeeding; (i) immunization; (j) complementary feeding; (k) vitamin A supplementation; (l) growth monitoring; (m) prevention of child accidents; and (n) conclusion (Table 1). The messages were composed and then reviewed several times jointly by the health education experts of Lagos State Ministry of Health (LSMOH), LSPHCB, and JICA. The messages were initially composed in English and then translated into five local languages: Egun, Hausa, Igbo, Pidgin English, and Yoruba. Participants received voice messages in the language they had chosen in advance from among these five. Each episode lasted approximately 170 seconds, starting with opening music, followed by a topic-specific dialogue between an announcer and a nurse, and ending with closing music. Participants received one episode per day for 14 consecutive days on the mobile phones they had registered during the baseline survey. We randomly assigned the participants to one of two groups. One group received voice messages for 14 days from 8 to 21 December 2018 (Group 1), and the other group received them for 14 days from 7 to 20 January 2019 (Group 2). During the first period of message delivery (ie, for Group 1), the same message was broadcast at least once a day through three local radio stations: (a) Radio Lagos for Yoruba and Egun; (b) Eko FM for Hausa, Igbo and Pidgin English; and (c) Traffick FM for Pidgin English. During the second period of message delivery (ie, for Group 2), no radio broadcast was made. 
We planned to randomly allocate to each participant one of the following three voice message delivery schedules: (a) initial call at 10 AM and reminder call at noon; (b) initial call at noon and reminder call at 2 PM; and (c) initial call at 10 AM and reminder call at 2 PM. A reminder call was delivered, at either noon or 2 PM, only when a participant did not pick up the initial call. However, a certain proportion of voice messages (12.8%) were delivered after 4 PM, most likely due to technical errors in the voice message system. This cross-sectional study used the following two types of datasets. First, to identify the characteristics of the participants, we used the baseline and follow-up survey data collected by the Project. Those surveys collected socio-demographic and socio-economic data through an interviewer-administered structured questionnaire run on the computer-assisted personal interview (CAPI) software SurveyCTO (ver 2.20, Dobility Inc., Massachusetts). Second, we used the output data from the voice message system, which was operated and managed by a local system development company (eg, participants’ phone numbers, the languages participants chose, the episode numbers of voice messages, the dates and times of voice message delivery, and the number of minutes during which participants listened to each voice message). When a participant picked up a phone call, the voice message system automatically recorded the date and time of the call and the length of listening time. When a participant failed to pick up an initial phone call, she had one more chance to receive the voice message through the reminder call described above. When a participant did not pick up the reminder call, the voice message system recorded the message delivery time and the response status (ie, busy or ringing but unanswered). In this study, two multilevel logistic regression models were developed. 
Model 1 tested whether a voice message call was picked up (ie, whether participants started listening to a voice message), while Model 2 tested whether a voice message call that had been picked up was listened to up to the core message part. Thus, Model 2 was applied exclusively to cases in which the voice message call had been picked up. A dichotomous variable, whether a participant picked up a voice message call (ie, “Picked” and “Did not pick”), was employed as the dependent variable for Model 1. “Picked” was coded when the participant picked up the phone. “Did not pick” was coded when the call record in the voice message system was either busy or ringing but unanswered. A dichotomous variable, whether a participant completed listening up to the core message part (ie, “Completed” and “Did not complete”), was employed as the dependent variable for Model 2. The minimum number of seconds for which a voice message needed to be listened to for participants’ adequate and meaningful understanding was set as the time threshold for each episode (Table 1). When a participant hung up the phone before reaching the time threshold, we assumed that her understanding of the message was partial and inadequate. Thus, “Completed” indicates that a participant listened to the message up to or beyond the time threshold. Conversely, “Did not complete” indicates that a participant hung up before the time threshold. A total of 14 variables were employed as independent variables for both Model 1 and Model 2. 
They were composed of: (a) six variables related to mothers (age, religion, marital status, education, employment status, and language chosen for receiving voice messages); (b) two variables related to the children born to them during the intervention period (age and sex); (c) three variables related to households (the number of children under five years of age, decision-maker on health, and wealth quintile); and (d) three variables related to the voice messaging intervention (implementation year and month, ownership of mobile phone, episode number of a voice message, and voice message delivery time). The age of children born to participant mothers during the intervention period was classified into three categories because the date of birth was unknown for some children: (a) under 18 months of age; (b) 18–24 months of age; and (c) NA. Wealth quintiles were created by ranking all the mothers’ households according to their wealth index values. The wealth index was calculated by applying principal components analysis [32] to variables on households’ ownership of key assets and access to key services (water source, sanitation facility, cooking fuel, materials for floor, roof and external walls, radio, television, refrigerator, generator, fan, air conditioner, computer, bicycle, motorbike, car). All the mothers’ households were divided into five equal-sized groups by wealth index score (ie, poorest, poor, middle, rich, and richest). The voice message episodes were numbered from 1 to 14 according to the order of maternal and child health milestones and of message delivery. Most of the independent variables are shown in Table 2, whereas voice message delivery time is shown in Table 3. The independent variables are presented separately in these two tables because the total number of cases differs between participant-related variables (n = 513 in Table 2) and intervention-result-related variables (n = 7182 in Table 3). 
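The wealth index and quintile construction described above can be illustrated with a minimal sketch. It assumes that the index is the score on the first principal component of standardized household indicators, which is a common convention but is not spelled out in the text. The function names (`standardize`, `first_component`, `wealth_quintiles`) are hypothetical, and the first component is obtained here by simple power iteration rather than by the statistical package used in the study.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column; constant columns are left at zero."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def first_component(z, iters=500):
    """Loadings of the first principal component via power iteration.

    Assumption: starting from a vector of ones orients the component so
    that owning more assets tends to raise the score.
    """
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / n for b in range(p)]
            for a in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            raise ValueError("all indicators are constant")
        w = [x / norm for x in w]
    return w


def wealth_quintiles(rows):
    """Return (index scores, quintile labels) for household indicator rows."""
    z = standardize(rows)
    w = first_component(z)
    scores = [sum(a * b for a, b in zip(r, w)) for r in z]
    labels = ["poorest", "poor", "middle", "rich", "richest"]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    quintile = [None] * len(scores)
    for rank, idx in enumerate(order):
        # Rank-based split into five (near) equal-sized groups.
        quintile[idx] = labels[rank * 5 // len(scores)]
    return scores, quintile
```

Each row passed to `wealth_quintiles` would hold one household's indicators (for example, 0/1 flags for radio, television, or refrigerator ownership). Splitting by rank rather than by fixed score cut-offs is what keeps the five groups equal in size.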
Message delivery time was classified into four categories: (a) between 10 AM and noon; (b) between noon and 2 PM; (c) between 2 PM and 4 PM; and (d) after 4 PM. The dependent variables for both Model 1 and Model 2 are dichotomous. Since 14 voice messages were delivered to every participant, data on adherence to each of the 14 voice messages were recorded for each participant in the voice message system, so repeated observations were nested within each participant. This study employed multilevel logistic regression analysis with a random effect and reported adjusted odds ratios with 95% confidence intervals (CIs) and P values, using robust standard errors, for each model. All data processing and analyses were performed using Stata (ver 15.1, StataCorp LLC, College Station, TX). Ethical approval was obtained from the Health Research and Ethics Committee at Lagos State University Teaching Hospital (Ref: LREC /06/10/764). Written informed consent was obtained from all participants.
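As a sketch of the multilevel logistic regression described above, and assuming that the random effect is a participant-level random intercept (the text does not specify its form), the model for voice message call i delivered to participant j could be written as follows. The symbols y, x, β, u and σ are notation introduced here for illustration only.

```latex
\begin{align}
\operatorname{logit}\,\Pr\left(y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j\right)
  &= \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
  \qquad u_j \sim \mathcal{N}\left(0, \sigma_u^2\right), \\
\mathrm{aOR}_k &= \exp\left(\beta_k\right).
\end{align}
```

Under this reading, y = 1 denotes “Picked” in Model 1 and “Completed” in Model 2, x contains the independent variables, u captures the tendency of the same participant to respond similarly across her 14 calls, and the adjusted odds ratio for the k-th independent variable is the exponential of its coefficient.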