Background: Digital health programs, which encompass the subsectors of health information technology, mobile health, electronic health, telehealth, and telemedicine, have the potential to generate “big data.”

Objective: Our aim is to evaluate two digital health programs in India: the maternal mobile messaging service (Kilkari) and the mobile training resource for frontline health workers (Mobile Academy). We illustrate possible applications of machine learning for public health practitioners that can be applied to generate evidence on program effectiveness and improve implementation. Kilkari is an outbound service that delivers weekly, gestational age-appropriate audio messages about pregnancy, childbirth, and childcare directly to families on their mobile phones, starting from the second trimester of pregnancy until the child is one year old. Mobile Academy is an interactive voice response (IVR) audio training course for accredited social health activists (ASHAs) in India.

Methods: Study participants include pregnant and postpartum women (Kilkari) as well as frontline health workers (Mobile Academy) across 13 states in India. Data elements are drawn from system-generated databases used in the routine implementation of the programs to provide users with health information. We explain the structure and elements of the extracted data and the proposed process for their linkage. We then outline the steps to be undertaken to evaluate and select final algorithms for identifying gaps in data quality, poor user performance, predictors of call receipt, user listening levels, and linkages between early listening and continued engagement.

Results: The project has obtained the necessary approvals for the use of data in accordance with global standards for handling personal data. The results are expected to be published in August/September 2019.

Conclusions: Rigorous evaluations of digital health programs are limited, and few have included applications of machine learning. By describing the steps to be undertaken in applying machine learning approaches to the analysis of routine system-generated data, we aim to demystify the use of machine learning not only in evaluating digital health education programs but also in improving their performance. Whereas published analyses typically explain only the final model selected, here we emphasize the process, thereby illustrating to program implementers and evaluators with limited exposure to machine learning its relevance and potential use within the context of broader program implementation and evaluation.
We present the Methods section in parts. We first present a detailed description of the data we plan to use as our source, including the architecture of the databases and the data elements. Program data are currently held in different databases located in Gurugram, and call data records are held in the Mobile Network Operator’s datacenter in Delhi. Next, we describe the data munging (ie, data wrangling) and analysis methods, including a brief description of the various machine learning algorithms under consideration.

Auxiliary nurse midwives collect and register details of pregnant women and, after delivery, of postpartum women and children born in their catchment areas. These data are captured in print registers and uploaded at the block level by data entry operators, forming the data in the pregnancy tracking databases. The data collected include personal identifiers such as geographic location, the names of women, and a mandatory mobile phone number, and, where available, details of the pregnancy and childbirth. Data capture happens at two key time points: (1) at the registration of the woman upon identification of the pregnancy, and (2) following childbirth, when the details of delivery care are available. In practice, data capture may occur many days or months after the event itself (pregnancy identification or childbirth). Figure 1 summarizes the existing databases and the flow of data for both Mobile Academy and Kilkari.

Figure 1. Summary of data flow for Kilkari and Mobile Academy.

The registration data on pregnant women and ASHAs are collected by the Ministry of Health and Family Welfare of the Government of India and the ministries of health of the states participating in the program. The data will be analyzed under a data sharing agreement with the Bill & Melinda Gates Foundation and Johns Hopkins University, the University of Cape Town, and BBC Media Action. The Institutional Review Boards of the Johns Hopkins School of Public Health, Sigma in New Delhi, India, and the University of Cape Town have provided the ethical certification for the study.

For the Kilkari program, pregnant or postpartum women’s data are captured in the Reproductive and Child Health (RCH) and Mother and Child Tracking System (MCTS) databases, or in state-based systems that then pass data to RCH or MCTS, and from there to the MOTECH system. Before the data are accepted by MOTECH, the system automatically runs validations to check that the mobile numbers are in the correct format, locations match the location masters in the MOTECH database, and the last menstrual period and date of birth are within the Kilkari timeframe. The MOTECH system uses the last menstrual period or the delivery date to determine the schedule of messages to be delivered. The MOTECH engine provides the list of phone numbers (clients) to be called each day to the interactive voice response (IVR) system, which then calls the numbers and plays the appropriate prerecorded message, stored in the IVR system’s content management system. If a call is not answered, the IVR system retries it at least 3 times a day, for up to 4 days, until the call is answered.
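To make these validation rules concrete, the following Python sketch illustrates checks of the kind described above. The field names, the 10-digit mobile format, and the 92-week window are our assumptions for illustration and are not the production MOTECH rules; for brevity, only the last-menstrual-period variant of the timeframe check is shown.

```python
import re
from datetime import date, timedelta

# Field names, the 10-digit format, and the 92-week window are assumptions
# made for illustration; they are not the production MOTECH validation rules.
MOBILE_RE = re.compile(r"^\d{10}$")   # assumed 10-digit mobile number format
MAX_LMP_AGE = timedelta(weeks=92)     # assumed: full pregnancy plus the child's first year

def validate_record(record: dict, location_master: set) -> list:
    """Return a list of validation failures for one incoming RCH/MCTS record."""
    errors = []
    if not MOBILE_RE.match(record.get("mobile_number", "")):
        errors.append("mobile number not in the expected format")
    if record.get("location_code") not in location_master:
        errors.append("location not found in the MOTECH location master")
    lmp = record.get("last_menstrual_period")  # a datetime.date, or None
    if lmp is None or not (date.today() - MAX_LMP_AGE <= lmp <= date.today()):
        errors.append("last menstrual period outside the Kilkari timeframe")
    return errors

# Example: validate one toy record against a toy location master
print(validate_record(
    {"mobile_number": "9800000000",
     "location_code": "MP-999",
     "last_menstrual_period": date(2018, 1, 15)},
    location_master={"MP-001", "MP-002"}))
```

Records failing any check would be rejected before subscription, which is one source of the data-quality gaps examined under Study Aim 1.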
For the Mobile Academy program, details on ASHAs, including their names, phone numbers, geographic locations, and ages, are contained in either the RCH or MCTS databases, or in state-based databases integrated with MCTS, and are used to register them to Mobile Academy. The MOTECH engine captures these data on ASHAs from the RCH or MCTS databases, and following registration to Mobile Academy, ASHAs are eligible to call in to the IVR system using the same phone number provided in the RCH database. The IVR system validates the phone number against the MOTECH system and then retrieves the “bookmark” information detailing the status of the ASHA and her progress through the content expected to be covered. Based on this information, the appropriate content is delivered to the ASHA via the IVR system, and the updated data are returned to the MOTECH database.

The data from the databases (Figure 1) will be extracted from each server onto secure, password-protected hard drives. Merging the data files will be complex given the nature of the identifiers across databases. An MCTS record does not have a beneficiary ID; instead, it has a “Mother” (pregnancy) ID or a “Child” ID. In other words, MCTS tracks pregnancies and births rather than women. When Kilkari first went live in October 2015, it mirrored the MCTS approach and generated subscription IDs for each pregnancy and then each birth. However, the newer RCH database does have a unique beneficiary ID, which enables the system to track an individual woman through her multiple pregnancies and the births of her children. The architecture of the MOTECH database and Kilkari was changed in December 2016 to introduce a unique beneficiary ID, and MOTECH was then integrated with RCH in mid-2017. There is an additional complexity: MOTECH used to allow multiple Kilkari subscriptions on one mobile number, on the assumption that a single phone could be shared by several women in a joint family. However, this feature was removed in 2017 (July 28 for RCH and October 6 for MCTS) because of the complexity it created in analyzing system-generated data. Hence, the analytic time horizon may span 2017-2018, after the MCTS-RCH integration occurred and the aforementioned changes were made. The merging of datasets will occur in India, and only de-identified data will be stored on the hard drives and used in this analysis. As part of Study Aim 1, we will examine the quality of the data for completeness, including patterns and any geographic clustering in missingness.
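The sketch below illustrates one way the planned linkage could look in pandas, assuming de-identified extracts with hypothetical file and column names and an MCTS-to-RCH identifier crosswalk derivable from the post-integration data; the actual schemas and linkage keys may differ.

```python
import pandas as pd

# Hypothetical file and column names standing in for the de-identified
# extracts; a mother_id -> beneficiary_id crosswalk is assumed to be
# derivable from the post-integration RCH data.
mcts = pd.read_csv("mcts_pregnancies.csv")     # one row per pregnancy (mother_id)
rch = pd.read_csv("rch_beneficiaries.csv")     # one row per woman (beneficiary_id)
xwalk = pd.read_csv("mcts_rch_crosswalk.csv")  # mother_id -> beneficiary_id

linked = (mcts
          .merge(xwalk, on="mother_id", how="left")
          .merge(rch, on="beneficiary_id", how="left",
                 suffixes=("_mcts", "_rch")))

# Restrict to the analytic horizon after the single-subscription change
# (July 28, 2017 for RCH; October 6, 2017 for MCTS) and MCTS-RCH integration.
linked["subscription_start"] = pd.to_datetime(linked["subscription_start"])
linked = linked[linked["subscription_start"].between("2017-10-06", "2018-12-31")]
```

Left joins preserve MCTS pregnancy records that fail to match a beneficiary ID, allowing the extent of unlinkable records to be quantified as part of the data-quality assessment.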
Analyses described in this section are being carried out as part of a larger external evaluation of Kilkari. We describe concurrent efforts to undertake a randomized controlled trial (RCT) of Kilkari in the state of Madhya Pradesh (MP), inclusive of baseline surveys with pregnant and postpartum women and with ASHAs. Once women are identified as part of baseline survey activities and randomized to receive Kilkari content (or no content at all), their phone numbers will be fed directly into the MOTECH database for the provision of program services. For pregnant women, additional data collected as part of the baseline household surveys include demographic factors (age, education, parity, literacy), socioeconomic characteristics (household assets, conditions), health care seeking and practices, as well as data on digital literacy and phone access. These data can be linked to MOTECH, IVR, and call center records to provide additional data elements. Overall, these data, together with data on technology performance (receipt of messages) and user engagement (behavioral performance) with content, will help estimate exposure to Kilkari used in the assessment of causality as part of the RCT.

For ASHAs, baseline survey data will include similar data elements on demographics, socioeconomic status, and mobile literacy and phone access, as well as knowledge and work-related variables linked to reported motivation and satisfaction. Overall, these added data elements can be linked to the IVR and call record data for this subpopulation of Mobile Academy and Kilkari users in the four districts of MP where the RCT is underway.

Descriptive statistics, including univariate plots such as histograms, will be used to understand the distribution of each variable, including skewness and outliers. Multivariate plots such as scatterplots and locally weighted scatterplot smoothing (LOWESS) lines will be used to understand the relationships between variables.

Efforts to prepare the data are divided into two parts: splitting the data into training, test, and validation sets, and data processing. To avoid overfitting models that work well for the data in hand but fail to predict well with other datasets, the data will be split into three components, which is possible given the large size of the dataset. The training set will comprise 60% of the data, the test set 20%, and the validation set 20%. The test set will be used to test and fine-tune the accuracy of the predictive models, and the final selected model will be applied to the validation set. We anticipate having data from 2017, 2018, and 2019; to control for time as a confounder, the subsets will be drawn by random sampling so that the different time periods are equally represented in each, as sketched below.
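As a minimal sketch of this split, assuming a pandas data frame `df` with a derived calendar-year column `year` (both placeholders), two chained calls to scikit-learn's train_test_split yield the 60/20/20 partition while stratifying on year:

```python
from sklearn.model_selection import train_test_split

# `df` is a placeholder for the merged analytic dataset and `year` for a
# derived calendar-year column; stratifying on year keeps 2017, 2018, and
# 2019 records proportionally represented in every subset.
train, holdout = train_test_split(
    df, test_size=0.40, stratify=df["year"], random_state=42)
test, validation = train_test_split(
    holdout, test_size=0.50, stratify=holdout["year"], random_state=42)
print(len(train), len(test), len(validation))  # approximately 60/20/20
```

Fixing the random seed makes the partition reproducible so that all candidate algorithms are trained and compared on identical subsets.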
Data processing is the act of transforming the data from their raw format into a format usable by the machine learning models. Indications for data processing include (1) making the data easier to use, including creating new indicators to facilitate their use as predictors; (2) reducing the computational cost of many algorithms by decreasing the number of variables, especially correlated and collinear variables; (3) removing noise due to outliers; and (4) making the results easier to understand. The transformation of variables may be achieved by a variety of techniques, including the creation of composite indicators and Box-Cox transformations.

The most common methods by which algorithms learn from data to make predictions are supervised, unsupervised, and semisupervised learning [1]. Supervised learning trains algorithms using example input and output data previously labeled by humans. Data may be labeled (a term used to denote that the outcome, or class, is known; eg, an ASHA has or has not completed the training module) or unlabeled. In contrast, unsupervised learning is concerned with uncovering structure and patterns within complex datasets based on information that is neither classified nor labeled; in unsupervised machine learning, the algorithms learn to infer structure from unlabeled input data using clustering techniques. Semisupervised learning is a hybrid analytic technique applied in contexts where the majority of data points are missing outcome information and yet prediction remains the goal [1]. In this program context, supervised machine learning algorithms are expected to be the primary analytic method because the analyses are focused on classification using predictors and the available data are expected to be labeled.

Unsupervised machine learning techniques, including dimensionality reduction techniques such as principal component analysis (PCA) and clustering techniques such as K-means, will be carried out as appropriate. PCA uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The first principal component has the largest possible variance and accounts for the highest proportion of the variance in the data, with each succeeding component accounting for the highest variance possible after the previous components are accounted for. K-means clustering is a way to use data to uncover natural groupings within a heterogeneous population (Table 1). To uncover patterns, the algorithm starts by assigning data points to random groups. The group centers are then calculated, and the group memberships are reassigned based on the distances between each data point and the group centers. This process is repeated until there are no changes in group membership from the previous iteration [16]. In its application to Mobile Academy, K-means clustering will be used to detect patterns in ASHA engagement with the training content, including training initiation and completion; a sketch follows below. Among Kilkari users, K-means clustering will be used to assess patterns in exposure to content by user characteristics based on data elements available in the RCH database, including parity, age, and geographic area.

Table 1. Sample of data elements by source for Kilkari and Mobile Academy.
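A minimal sketch of the Mobile Academy application just described, assuming a hypothetical data frame `asha_df` of engagement features derived from the call records (the feature names are stand-ins, not the actual schema), might look as follows in scikit-learn:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# `asha_df` and the feature names are hypothetical stand-ins for variables
# derived from the Mobile Academy call records.
features = asha_df[["modules_started", "modules_completed",
                    "days_to_completion", "total_minutes_listened"]]
scaled = StandardScaler().fit_transform(features)  # K-means is scale sensitive

# fit_predict runs the assign-recompute loop described above to convergence
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
asha_df["engagement_cluster"] = kmeans.fit_predict(scaled)
```

In practice, the number of clusters would be chosen by inspecting fit criteria (eg, an elbow plot of within-cluster variance) rather than fixed at 4 as in this sketch.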
Once the data have been processed, testing of the algorithms will be carried out. Table 2 summarizes the algorithms proposed for training along with their intended applications to Mobile Academy and Kilkari. To determine the model with the best fit, we will explore several machine learning approaches in turn. Models will be fit on the training set, and each fitted model will be used to predict the responses for the observations in the validation set. The preferred analytic approaches will be selected based on their ability to minimize the total error of the classification, defined as the probability that a solution will classify an object under the wrong category. We describe each approach considered below in lay terminology, along with its indications for use and its proposed application in the evaluations of Mobile Academy and Kilkari.

Table 2. Summary of algorithms proposed for testing and their intended applications to Mobile Academy and Kilkari.

Our choice of methods will include a mix of algorithms based on their strengths and weaknesses and the objective of the process. A comprehensive comparison of supervised learning methods is provided in the literature [17,18]. Support vector machines (SVMs) and neural networks (NNs) are expected to perform better with continuous data, while the Naïve Bayes method and decision trees perform better with discrete or categorical variables. Naïve Bayes and decision trees have good tolerance of missing values, while NNs and SVMs do not. NNs and Naïve Bayes have difficulty handling irrelevant and redundant attributes (ie, extra variables with no useful information, or variables with too many categories and too few observations per category), while SVMs and decision trees are insensitive to them. Variables with high correlation negatively affect the performance of both Naïve Bayes and NNs, whereas SVMs are relatively robust to correlated variables. While Naïve Bayes is robust to noise, NNs are sensitive to poorly measured variables and susceptible to overfitting. NNs and SVMs perform well with multidimensional data and when there is a nonlinear relationship between predictor and outcome. Naïve Bayes requires less memory in both the training and validation phases, whereas NNs require large memory allocations across all phases. SVMs and NNs usually outperform other methods, while Naïve Bayes may yield less accurate results. Table 3 compares the strengths and weaknesses of the different supervised machine learning methods.

Table 3. Performance comparisons of learning algorithms, modified from Kotsiantis et al [17,18] (++++ represents the best and + the worst performance).

To facilitate decision making on the optimal analytic approach, three steps will be undertaken: (1) develop the best-fitting model for each algorithm using the training dataset, (2) apply the final model for each algorithm to the test dataset, and (3) apply the best-performing algorithm to the validation dataset. In Step 1, algorithms will be run on the training dataset, comprising 60% of the total sample from across all states for which data are available. For each algorithm, iterative testing will be run to select the model that best fits the data, and the emerging results will be assessed for model fit and accuracy. Table 4 summarizes the four proposed metrics for assessing the performance of each model.

Table 4. Metrics for assessing the performance of each model (TP: true positive; TN: true negative; FP: false positive; FN: false negative).

To illustrate the definition of the performance metrics for Mobile Academy, we define true positives (TP) as the number of correctly classified ASHAs who have completed the training, and true negatives (TN) as the number of correctly classified ASHAs who have not completed the training. False positives (FP) are the number of ASHAs incorrectly classified as having completed the training, while false negatives (FN) are the number of ASHAs incorrectly classified as not having completed the training; a worked sketch of these metrics appears at the end of this section. Results from the performance metrics will help define the final model for each algorithm. In Step 2, these final models will be applied to the test dataset, which comprises approximately 20% of the total data. Using the same performance metrics, the models with the best fit and accuracy will be applied to the validation dataset as part of Step 3.

Ultimately, predictions for Mobile Academy will aim to determine the probability of an ASHA finishing the course in a predetermined time frame and the likely score/performance of individual ASHAs. For Kilkari, we will determine predictors of exposure to Kilkari content based on user characteristics, and explore the effect of early listening patterns on postpartum engagement and overall exposure.
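To make the performance metrics concrete, the sketch below computes standard classification metrics from the TP, TN, FP, and FN counts defined above for the Mobile Academy example; whether these correspond exactly to the four metrics of Table 4 is an assumption, and `y_true`/`y_pred` are placeholders.

```python
from sklearn.metrics import confusion_matrix

# `y_true` and `y_pred` are placeholders for observed and model-predicted
# training completion (1 = completed, 0 = not completed) on the test dataset.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted completers who truly completed
recall = tp / (tp + fn)      # share of true completers the model identified
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
```

Computing the same metrics for each candidate model on the test dataset supports the Step 2 comparison, with the winner carried forward to the validation dataset in Step 3.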