Population Health Metrics Research Consortium gold standard verbal autopsy validation study: Design, implementation, and development of analysis datasets

Population Health Metrics, Volume 9, Year 2011

Abstract

Background: Verbal autopsy methods are critically important for evaluating the leading causes of death in populations without adequate vital registration systems. With a myriad of analytical and data collection approaches, it is essential to create a high quality validation dataset from different populations to evaluate comparative method performance and make recommendations for future verbal autopsy implementation. This study was undertaken to compile a set of strictly defined gold standard deaths for which verbal autopsies were collected to validate the accuracy of different methods of verbal autopsy cause of death assignment.

Methods: Data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. The Population Health Metrics Research Consortium (PHMRC) developed stringent diagnostic criteria including laboratory, pathology, and medical imaging findings to identify gold standard deaths in health facilities as well as an enhanced verbal autopsy instrument based on World Health Organization (WHO) standards. A cause list was constructed based on the WHO Global Burden of Disease estimates of the leading causes of death, potential to identify unique signs and symptoms, and the likely existence of sufficient medical technology to ascertain gold standard cases. Blinded verbal autopsies were collected on all gold standard deaths.

Results: Over 12,000 verbal autopsies on deaths with gold standard diagnoses were collected (7,836 adults, 2,075 children, 1,629 neonates, and 1,002 stillbirths). Difficulties in finding sufficient cases to meet gold standard criteria as well as problems with misclassification for certain causes meant that the target list of causes for analysis was reduced to 34 for adults, 21 for children, and 10 for neonates, excluding stillbirths.
To ensure strict independence for the validation of methods and assessment of comparative performance, 500 test-train datasets were created from the universe of cases, covering a range of cause-specific compositions.

Conclusions: This unique, robust validation dataset will allow scholars to evaluate the performance of different verbal autopsy analytic methods as well as instrument design. This dataset can be used to inform the implementation of verbal autopsies to more reliably ascertain cause of death in national health information systems.

© 2011 Murray et al; licensee BioMed Central Ltd.

Gold standard VA data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. Table 1 shows the age and sex distribution for the decedents represented in this study, as well as the national life expectancy.

Table 1. The age and sex distribution of the decedents represented in the verbal autopsy sample and the national life expectancy for the country according to the 2010 United Nations numbers.

Research at the Andhra Pradesh, India, site was implemented and coordinated through the George Institute for Global Health, India, and was centered in the main capital city, Hyderabad, as well as the neighboring areas of Ranga Reddy, Medak, and Nalgonda. Hyderabad is 100% urban with a population of roughly 3,830,000 inhabitants. The neighboring area Ranga Reddy has a similar population size (3,575,000) and is roughly half urban and half rural. The Medak and Nalgonda areas are similar to each other, both roughly 14% urban, comprised of 3,248,000 people in Nalgonda and 2,670,000 in Medak.

The Bohol Island site was led by the Research Institute for Tropical Medicine in Manila. Bohol is a tropical island province located in the Central Visayas of the Philippines, with 46 municipalities and Tagbilaran City.
Verbal autopsies were collected over the entire island, as well as a small proportion from Manila. According to the 2007 census, 1,230,000 people live in Bohol. Manila is urban, while Bohol is divided into roughly 46% urban and 54% rural.

The research site in Dar es Salaam, Tanzania, was managed by collaborators at the Muhimbili University of Health and Allied Sciences. Verbal autopsies were collected from all over the city of Dar es Salaam, which has a population of roughly 2,487,000 people according to the 2002 census, with 94% of people living in urban areas and 6% living in rural areas.

The Mexican study was coordinated by the National Institute of Public Health in the Federal District and the state of Morelos. According to the 2010 Census, 8.85 million inhabitants live in the Federal District and 1.8 million live in Morelos. Sixteen percent of the population of the state lives in rural areas [36].

Pemba Island, Tanzania, is the smaller of the two islands of the Zanzibar archipelago. The research there was coordinated through the Public Health Laboratory Ivo de Carneri as part of a collaboration between the Ministry of Health and Social Welfare and Johns Hopkins University. Verbal autopsies were collected from all areas of the island. This island has a population of roughly 400,000 inhabitants. The island is 99% rural and 1% semi-urban.

Finally, the Uttar Pradesh site in India was led by collaborators at the CSM Medical University (CSMMU, formerly, King George Medical College) in Lucknow. Verbal autopsies were collected from a wide range of districts in the state of Uttar Pradesh: Ambedkar Nagar, Bahraich, Barabanki, Basti, Faizabad, Gonda, Hardoi, Lakhimpur, Lucknow, Rae Bareli, Sitapur, Sultanpur, and Unnao. Table 2 shows the population and urban percentage for each of these districts.
Table 2. The population size in thousands and percent of population that is urban for the Uttar Pradesh, India field sites, according to the 2001 Census of India.

The instrument development was based on the WHO standardized verbal autopsy instrument [37], which in turn was based in part on the work of Chandramohan et al. (1994) for adult deaths and of Anker et al. (1999) for neonatal and child deaths [38,39]. Separate questions were developed for neonatal deaths and stillbirths, children 1 month to 11 years, and adults 12 years and older. Experience gained from VA studies in Andhra Pradesh and China where the WHO instrument, or slight variants of it, had been applied was also considered [40,41]. A committee drawn from the principal and associate investigators considered modifications based on published and unpublished experiences with the WHO instrument, including fieldwork conducted as part of a large VA study in Thailand. The final instrument was translated into the respective local languages, and then back-translated to English by a different translator to ensure accuracy.

The PHMRC instrument is comprised of a general information module, an adult module, and a child and neonatal module. Skip patterns were integrated into the general information module to collect the age of the deceased and then direct interviewers to the correct module to administer. In administering the WHO instrument, the interviewer must first determine the age of the deceased and select the correct instrument to deliver, which results in the potential for more interviewer error and a less fluid interview. The general information module, which is administered in all verbal autopsies, collects items such as education of the decedent, household characteristics, and a household roster.
The adult module collects a history of chronic conditions, symptoms of the deceased, women’s health questions if the decedent is female, alcohol and tobacco use, and injury information; it also transcribes any available medical record and death certificate information. The child and neonatal module first asks background questions on information such as whether the mother is still alive, where the deceased was born, the size of the decedent at birth, and the delivery date. The questionnaire then ascertains whether the decedent was a stillbirth and, if so, collects symptom questions, such as signs of injury. If not, the questionnaire collects more general information such as the age of the baby or child when they became ill and the age at death. If the decedent is under 28 days (inclusive of stillbirths), a maternal history is collected. In addition, if the decedent is under 28 days and was born live, a full set of neonatal symptom questions is collected. If the decedent is between 28 days and 11 years, infant and child symptom questions are asked. All available health records and death certificates are transcribed for both neonatal and child deaths. Finally, for all ages, the open narrative section was moved to the end of the interview, after the structured questions. This was done to ensure that in future work, we could remove the open-ended items without concern that the results collected in this study were a function of the open-ended items coming prior to structured content.

In addition to the structural changes, there are important differences between the PHMRC instrument and the WHO instrument. First, the WHO adult module is administered on ages 15 and above, while the PHMRC adult module begins at age 12. This expansion of the ages included in the adult module ensures that conditions clinically present, such as maternal mortality in 12 to 14 year olds, are captured through this instrument.
Second, a substantial portion of the questions were reworded to ensure clarity. Medical terminology was converted to easily understandable descriptions to target a lay population. For example, “Did s/he have abdominal distension?” was reworded to “Did [NAME] have a more than usual protruding belly?” Information was also added for precision, or removed to ensure only the most diagnostically relevant information was collected. Similarly, we added or dropped entire questions to capture the most essential information, while reducing the duration of the interview as much as possible. One common question type dropped from the instrument was the duration of certain symptoms. For example, the PHMRC instrument asks whether adults had developed a lump in the neck, armpit, breast, or groin but dropped the follow up question “For how long did s/he have the lumps?” as the presence of the symptom alone was the most important information. Another common question type dropped from the WHO instrument was about treatment that had been received by the decedent, as they were less important in informing the cause of death. Finally, the PHMRC instrument did not include questions about chronic conditions in children, such as cancer, tuberculosis, and diabetes. Additional file 1 illustrates the content questions, such as symptoms experienced by the decedent that were added or dropped when converted from the WHO instrument to the PHMRC instrument. The small wording changes are not included in this additional file, though the full PHMRC instrument is included in Additional file 2 (general module), Additional file 3 (adults), and Additional file 4 (children and neonates) for reference. A key challenge for the study was to identify the cause list for each of the three age groups for which we would seek to collect a sample of gold standard deaths. 
Our selection of the target cause list was based on consideration of the WHO estimates of the leading causes of death in the developing world in each age group, those causes for which verbal autopsy might be able to function adequately because unique signs and symptoms could potentially be collected in an interview, and the potential to find, in the six sites, deaths with sufficient laboratory, medical imaging, and pathological detail in order that a gold standard cause of death assignment could be made. The cause lists were also designed so that they were mutually exclusive and collectively exhaustive. The target cause list for adults, children, and neonates included 53, 27, and 13 GS causes, respectively, plus stillbirths (for a complete list of causes, see Additional file 5). These cause lists are much longer than for any previously undertaken VA validation study. In fact, nearly all previous VA validation studies have started with a community or convenience sample of deaths and then ascertained cause in hospital records rather than seeking to collect data on a list of causes by design. A critical component of the study was the development, for each cause, of clear criteria that had to be fulfilled for a death to be assigned as a GS cause of death. Depending on the cause of death, these criteria included clinical endpoints, laboratory findings, medical imaging, and pathology. Additional file 6 (adults) and Additional file 7 (children and neonates) provide the gold standard criteria for each cause. These gold standard criteria were developed by a committee of physicians involved in the study and underwent multiple cycles of group review. Preliminary review of hospital records in the sites indicated it would be very difficult to identify any deaths for some causes that would meet the strict gold standard criteria. 
In order to ensure that as many potentially eligible deaths in each site as possible were collected for the study, a less strict but nevertheless detailed level 2 set of criteria was also developed (see Additional files 6 and 7). In some cases, these level 2 criteria were further disaggregated into level 2A and level 2B. By way of example, the criteria for determining a death as being due to adult breast cancer, adult acute myocardial infarction, child pneumonia, and neonatal birth asphyxia are shown in Table 3.

Table 3. Examples of gold standard criteria for adult breast cancer, adult acute myocardial infarction, child pneumonia, and neonatal birth asphyxia. Level 1 is the most stringent set of criteria, while level 2A or 2B were also collected for some causes.

By recording the level of diagnosis for each death, we are able to test whether the assessment of performance for any method is affected by the level of cause of death assignment according to our criteria.

As described above, a stringent set of diagnostic criteria for each cause of death was developed by a team of study physicians before fieldwork began. Each site then enrolled local health facilities at which medical records would be reviewed. Consortium members led a two-day training at each of the sites to train the reviewers in the gold standard definitions, the protocols for identifying cases meeting these criteria, and the procedure for extracting the pertinent medical information. Each reviewer was provided a pocket guide detailing the necessary criteria for each gold standard cause of death. The medical information from qualifying records was extracted using a standard medical data extraction form (MDEF, see Additional file 8), which the study team developed. Once eligible records were extracted, a local physician reviewed the medical information and determined the gold standard level of the particular case according to the diagnostic criteria outlined for each level for each cause.
The following information details the specific protocol followed by each research site. In Andhra Pradesh, four hospitals were recruited for the study. Three are government hospitals – Gandhi Hospital, Osmania General Hospital, and Chest Hospital – and one is a private hospital, CARE Foundation. There was 24-hour surveillance at the hospitals and all patients were enrolled with their addresses. Study supervisors collected information on all deceased patients from all wards, and clinicians involved in the study then reviewed the case sheets to select those that conformed to the gold standard criteria (levels 1, 2A, and 2B). The medical information from all qualifying cases selected by the clinicians was extracted and sent to the George Institute Hyderabad office for enrollment in the verbal autopsy study. In Bohol, the majority of deaths were reviewed at the Bohol Regional Hospital. This facility is the referral hospital for Bohol Province with the highest available standards of clinical investigation and hence diagnosis. Three nurses monitored all deaths in the hospital. They ensured that all reports of investigations (imaging and laboratory) were located and attached to the charts. In addition, to augment the number of deaths collected, 467 deaths were recruited from two hospitals in Manila: the Veterans Memorial Medical Center and the Rizal Medical Center. In all locations, the nurses summarized the case notes, including reports of investigations, onto the medical data extraction forms. MDEFs were first reviewed by two study physicians who assigned cause of death and decided by diagnosis and GS level which VAs should not be collected. Deaths were reviewed as soon as possible after the death. At the Dar es Salaam site, five health facilities were used as recruitment sites. These were Mwananyamala Hospital, Temeke Hospital, Muhimbili National Hospital, Ocean Road Cancer Institute, and Hindu Mandal Hospital. 
Mwananyamala and Temeke are both district hospitals, each of which records roughly 1,500 deaths per year. Ocean Road Cancer Institute is the only cancer treatment facility in Tanzania and was an important source for causes such as cervical cancer, esophageal cancer, breast cancer, leukemia, prostate cancer, and lymphomas. Muhimbili National Hospital is a referral and teaching hospital with a higher mortality rate than the other enrolled facilities. Hindu Mandal Hospital is a private hospital in the heart of Dar es Salaam. It has a well-established HIV/AIDS clinic and commonly receives noncommunicable disease cases. At each location, a nurse affiliated with the study reviewed medical records to identify qualifying cases. The cases identified by the nurses were reviewed by physicians, who filled out the MDEFs with the gold standard levels for the cases that were eligible for enrollment. The nurses spoke with family members of the deceased if present at the hospital to enroll them in the study, collect their consent, and obtain mapping information and directions for a verbal autopsy interview. In Mexico, after obtaining authorization to work in each medical unit, a group of six trained physicians reviewed the medical records of cases (and when available the reports from autopsies) that could be included in the study, filled an extraction form for each case, and classified them as levels 1, 2, or 3 according to the gold standard criteria proposed by the PHMRC. Only cases classified as levels 1 and 2 were considered eligible for the study. The original design considered the inclusion of only one to three large hospitals in Mexico City, but due to the difficulty of completing the quota of gold standard cases, hospitals from the health service network of the Federal District government and from the Ministry of Health of the state of Morelos were included. The data were collected from 36 public hospitals: 33 from the Federal District and three from Morelos. 
In Pemba, there are four major government hospitals on the island, though most facilities do not have a certified medical doctor present and are managed by medical assistants and nurses. Surveillance systems were put in place in all four hospitals to identify deaths and to classify them into GS categories. The hospital supervisor recorded complete identification information upon admission of each patient, and the attending physician or medical assistant confirmed the admission diagnosis. Hospital supervisors ensured that the signs and symptoms experienced by the patient were recorded and that a mortality form with the cause(s) of death was filled out by the attending physician in the event of a death. All forms were sent back to the field headquarters for data entry. A computer algorithm was run to identify cases meeting GS criteria, and all GS cases were recorded in a database. A computer listing was prepared with identifier information to schedule the VA interviews.

In Uttar Pradesh, the gold standard deaths were enrolled at CSMMU, Lucknow, which is a tertiary care government facility with patient inflow from all over Uttar Pradesh and bordering states, including districts in the neighboring country of Nepal. The catchment area spreads over a radius of more than 500 km, and about 85% of cases come from 13 districts surrounding Lucknow. There was 24-hour surveillance at facilities and all patients were enrolled with an address. When a death occurred, the project medical officer reviewed the patient case sheet in consultation with the resident doctor in order to assess the GS levels against standard criteria.

Once enrolled, the VA interviewers at each site attended a training session led by consortium members using standardized materials and an interviewer’s manual.
The training manuals provided information on the study background, the roles and responsibilities of the VA interviewer, background on how VA cases were selected, instructions for administering the questionnaire, and information on every question in the instrument. The manual provided guidance on how to handle an array of questions or concerns, tips for building rapport with the respondents, and probing as needed to collect reliable information. Following the training, VA assignments were given to interviewers blinded to the medical information or cause of death of the decedent along with directions or map cues to the households. In some sites the families were contacted in advance to schedule an appointment, though this decision was left to the sites’ discretion. All interviews were collected after a culturally appropriate grieving period had passed. The minimum grieving period was six days in Bohol and the maximum was six months in Mexico (as required by the ethics boards at the hospitals). The maximum amount of time post-death that an interview was collected was eight months in the Mexico site. The rate of interview refusals varied by site from 1.8% to 9.5%. For those that consented to a verbal autopsy, the instrument was administered on paper in the field, and returned to the field headquarters for double data entry. Interviews lasted an average of 45 minutes across all of the sites.

To ensure the highest quality data was collected, quality control checks were performed both at the individual site level, as well as at the Institute for Health Metrics and Evaluation (IHME), where all data were transmitted through a secured password-protected site for analysis. In all sites, supervisors were trained in the protocols for monitoring quality control at the site level. Supervisors were instructed to observe VA interviewers in the field during the early stage of data collection to ensure they were conducted properly and to provide guidance.
Supervisors additionally checked every VA form collected throughout the study to ensure that it was filled out consistently and correctly. If issues were identified by the supervisor, a reinterview was conducted as needed. The field interviewers had periodic meetings with their supervisors to discuss performance, progress, and challenges. Supervisors at most sites additionally reinterviewed a portion of the verbal autopsies to spot check the quality of the information collected. At IHME, we systematically evaluated all datasets electronically for numerous types of quality issues by a comprehensive set of codes. First, we reviewed the dataset for missing values and for incorrect skip patterns that result in specific questions having been filled in or left blank erroneously. The dataset was also evaluated to determine if any of the observed values fell outside of expected ranges. For example, if the response for a neonatal symptom duration was greater than 28 days (the cutoff for classification as a neonatal death), this value was flagged. Next, if the dataset was submitted in multiple sections, we examined the final comprehensive database for any technical issues that may have occurred in merging the individual files. Finally, we merged the dataset with the gold standard medical record information, which was separately transmitted to IHME by the site coordinator. We examined the observations for consistency between the two sources of information, such as the sex of the decedent as reported in the medical record and as reported by the verbal autopsy respondent. Any issues determined through this stringent checking process were compiled into a report and sent to the site to review. Site coordinators were asked to speak with the interview staff and rectify any correctable issues such as data entry mistakes. 
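The electronic checks described above (range checks, skip-pattern checks, and consistency with the medical record) can be illustrated with a minimal Python sketch. The field names (`neonatal_symptom_days`, `pregnant_at_death`, `sex`) are hypothetical and are not the PHMRC variable names; the sketch only shows the shape of such checks, not the code actually run at IHME.

```python
# Hypothetical field names; the real PHMRC variable names are not given in the text.
NEONATAL_MAX_DAYS = 28  # cutoff for classification as a neonatal death


def check_record(va, medical_record):
    """Return a list of human-readable issues found in one VA record."""
    issues = []

    # Range check: a neonatal symptom duration should not exceed 28 days.
    duration = va.get("neonatal_symptom_days")
    if duration is not None and duration > NEONATAL_MAX_DAYS:
        issues.append("neonatal symptom duration exceeds 28 days")

    # Skip-pattern check: women's health items should be blank for male decedents.
    if va.get("sex") == "male" and va.get("pregnant_at_death") is not None:
        issues.append("women's health item filled in for a male decedent")

    # Consistency check between the VA respondent and the medical record.
    if va.get("sex") != medical_record.get("sex"):
        issues.append("sex differs between VA and medical record")

    return issues


flagged = check_record(
    {"neonatal_symptom_days": 30, "sex": "male", "pregnant_at_death": "no"},
    {"sex": "female"},
)
print(flagged)
```

Returning a list of issues per record, rather than raising on the first problem, mirrors the reporting workflow in the text, where all issues were compiled into a report and sent back to the site.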
In addition to the full dataset as it was collected, we have also created a series of dichotomous variables from each of the polytomous (categorical) and continuous (duration) variables. Some analytical methods can only use dichotomized variables, so this effort to create the dichotomous variables increases the information available to these types of empirical methods. For each continuous duration item, depending on the item, we identified a short or long cutoff. For example, a duration of 8.8 days marks long duration of a fever. If a VA reports a fever of 10 days, it is considered to have the symptom of “having a long fever.” We determine the cutoff as being two median absolute deviations above the median of the mean durations across causes (MAD estimator). The MAD estimator can be used as a robust measure of the standard deviation and is especially useful in cases where extremely long durations may be reported, which would bias measures such as the standard deviation. Additional file 9 shows the cutoffs for each item developed in this way. For polytomous variables, we examined the pattern of the endorsement rates across causes and mapped the categories into two, thus creating a dichotomous version of the variable. For example, we judged that there was a stronger signal produced by combining moderate and severe fevers. Additional file 10 shows the mapping of each response category into dichotomous variables. Based on the data collected, some polytomous variables appeared to have little or no information content and were not mapped into a dichotomous form. These low information content items are shown in Additional file 11. This exercise was undertaken for neonatal, child, and adult modules separately. 
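The duration cutoff rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the MAD is taken unscaled (the text does not say whether a consistency constant was applied), the function names are hypothetical, and the toy durations are invented purely to make the arithmetic visible.

```python
from statistics import mean, median


def long_duration_cutoff(durations_by_cause):
    """Median of cause-specific mean durations plus two median absolute deviations.

    Assumption: the MAD is unscaled; the text does not specify a scaling constant.
    """
    cause_means = [mean(d) for d in durations_by_cause.values() if d]
    centre = median(cause_means)
    mad = median([abs(m - centre) for m in cause_means])
    return centre + 2 * mad


def dichotomize(duration, cutoff):
    """1 if the reported duration counts as 'long', else 0 (missing counts as 0)."""
    return 1 if duration is not None and duration > cutoff else 0


# Invented toy data: symptom durations in days, grouped by cause.
toy = {"cause_a": [2, 4], "cause_b": [5, 7], "cause_c": [9, 11]}
cutoff = long_duration_cutoff(toy)  # means 3, 6, 10; median 6; MAD 3; cutoff 12
print(cutoff, dichotomize(10, cutoff), dichotomize(14, cutoff))
```

Because both the centre and the spread are medians, a single cause with extremely long reported durations shifts the cutoff far less than it would shift a mean-plus-standard-deviation rule.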
There has long been concern that the performance of a VA instrument and the associated analytical method for assigning cause could be different for deaths where the decedent died in a hospital or had made extensive use of health services prior to death, compared to deaths with no health care experience (HCE). As an attempt to examine how VA may work in communities with limited or no access to health care services, Murray et al. [12] studied how PCVA and the Symptom Pattern Method performed when all items referring to use of health services such as “Have you ever been diagnosed with…” or hospital records or death certificates were excluded from the analysis. They showed that, in China, recall of the household or possession of medical records recorded in the VA interview had a profound effect on both the concordance for PCVA and the performance of the Symptom Pattern Method. Given this empirical finding, we believe it is useful to test VA methods with household recall of health care experience excluded, as this likely provides a more realistic assessment of how VA performs in communities without access to health services. As such, we have created two versions of the datasets developed above, one version with all variables and one version excluding recall of health care and medical records.

Specifically, the without HCE dataset excludes the following information. First, a series of questions asked if the deceased had any specified conditions, which would likely indicate a health care provider had diagnosed the individual. Each of the following conditions was asked: “Did decedent have [asthma, hypertension, obesity, stroke, tuberculosis, AIDS, arthritis, cancer, COPD, dementia, depression, diabetes, epilepsy, heart disease]?” Second, if any medical records were available, the interviewer was asked to provide a transcription of the last note on the medical record.
Third, if a death certificate was available, the interviewer was asked to record the immediate cause of death, first underlying cause, second underlying cause, third underlying cause, and contributing causes from the death certificate. Finally, at the end of the questionnaire, an open-ended section was provided to collect any comments from the interviewer, as well as to ask the respondent “to summarize, or tell us in your own words, any additional information about the illness and/or death of your loved one?” Excluding this entire section excludes both open narrative recall of HCE but also, in the case of PCVA, excludes any other information on timing and sequencing of signs and symptoms that might be conveyed in this section. The structured instrument includes various open text items. First, some questions in the instrument ask the respondent to choose from a list of specified response options. For example, “Where was the rash located?” has the following response options: face, trunk, extremities, everywhere, or “other (specify: ____).” If the response is not one of the listed options, the respondent is asked to fill in the location of the rash as the “other” response. The questions that include an “other” free text response option are as follows: “Where was the rash located?”; “Where was the pain located?”; “Which were the limbs or body parts paralyzed?”; “What kind of tobacco did [NAME] use?”; “Did [NAME] suffer from an injury or accident such as a ____?”; “Where was the deceased born?”; “What were the abnormalities?” in reference to any abnormalities at time of delivery; “Where did the deceased die?”; “What was the color of the liquor when the water broke?” in reference to labor; “Where did the delivery occur?”; and “Who delivered the baby?” In the questions that collect information about a health facility or midwife, free text responses collected the name and address of the place or person. 
In addition to these free text items, if any medical record or death certificates were available, the interviewer was asked to transcribe the information from the records as free text. Finally, at the end of each interview, the open narrative question “Summarize, or tell us in your own words, any additional information about the illness and/or death of your loved one?”(as described above) was collected in addition to any notes from the interviewer. Open text could in theory be highly informative, especially household recall of HCE and an interviewer’s direct recording of death records or hospital records kept by the household. These observations are likely to be available in populations with some access to health care services. To make this information available to automated methods, we processed open text in the following steps. First, all free text was compiled into a database and a dictionary was created to map all similar words to the same stem word. For example, the terms AMI, myocardial infarction syndrome, acute myocardial infarction, ISHD, MI, coronary heart disease, CHD, IHD, MCI, and MYIN would all be mapped by the dictionary into the same variable (“IHD: Acute Myocardial Infarction”). Next, a program called README [42] extracts each individual variable and assigns a frequency count for the number of times it appears in the entire free text database. Variables that are not deemed to be diagnostically relevant or that are very low in frequency are then dropped from the dataset. The final product is a condensed dictionary of medically important terms consisting of 106 variables for adults, 90 for children, and 39 for neonates. These terms are added as additional binary symptoms (present or not present) in the VA database. If any of the terms appear in the free text for a particular death, it is counted as a positive endorsement for that symptom. These symptoms are not used in the “without” HCE dataset. 
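The dictionary step described above can be sketched as follows. This is not the README program cited in the text; it is a simple whole-word matcher used as a stand-in, and the function name is hypothetical. The synonym list is the one given in the example above.

```python
import re

# Synonyms taken from the example in the text, all mapped to one variable.
IHD = "IHD: Acute Myocardial Infarction"
STEM_DICTIONARY = {
    term: IHD
    for term in [
        "ami", "myocardial infarction syndrome", "acute myocardial infarction",
        "ishd", "mi", "coronary heart disease", "chd", "ihd", "mci", "myin",
    ]
}


def text_symptoms(free_text, dictionary=STEM_DICTIONARY):
    """Return the set of dictionary variables endorsed by one free-text field."""
    text = free_text.lower()
    found = set()
    for term, variable in dictionary.items():
        # Whole-word match, so that "mi" does not fire inside "vomiting".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(variable)
    return found


print(text_symptoms("Doctor said he had an MI last year"))
print(text_symptoms("vomiting and fever"))
```

Each variable returned for a death would be recorded as a positive binary symptom; in the without HCE dataset these variables would simply be left out.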
Additional file 12 provides the comprehensive dictionary that was developed.

For empirical VA methods that must be developed using the pattern of responses observed in a dataset, validation needs to be undertaken on a set of deaths that were not included in the development of the method. This is the concept of a training dataset distinct from a test dataset. Further, as recommended in Murray et al. [15], it is important to have test datasets with widely varying cause-specific mortality fractions (CSMFs) so that a VA method does not by chance appear to be better than another because of the specific CSMF composition in the training set. To facilitate strict comparability, we have created 500 train-test dataset pairs. Each pair was created by first splitting the data randomly (without replacement) into 75%/25% training and test datasets, cause by cause, and then resampling the data in the test dataset (with replacement) to have 7,836 adult, 2,075 child, 1,629 neonatal, and 1,002 stillbirth deaths, matching a cause composition drawn from an uninformative Dirichlet distribution (Figure 1). In other words, each test dataset has been resampled to have a different CSMF composition. Because the CSMF compositions have been drawn from an uninformative Dirichlet, across the 500 test datasets there are datasets in which any given cause has a cause fraction near zero and others in which its fraction is as high as 20% or more. By the nature of this sampling strategy, there is no correlation between the CSMF composition of the training and test dataset pairs.

Figure 1: The process of generating 500 test and training datasets (done separately for each cause of death).

In order to have an efficient cause list for the analysis, we have reduced it in two steps as illustrated in Table 4. From the original gold standard target cause list we received deaths from the sites for 53 diseases in adults, 27 in children, and 13 in neonates, excluding stillbirths.
The first step was to select only those causes with 15 or more deaths (see Additional file 5 for a detailed mapping), which reduced the list to 46 adult causes, 22 child causes, and 12 neonate causes, excluding stillbirths. For instance, pelvic inflammatory diseases, uterine cancer, and dementia in adults; AIDS with tuberculosis in children; and meningitis in neonates had fewer than 15 deaths each. We also eliminated pertussis in children and neonatal tetanus because no pertussis and only four neonatal tetanus deaths were gathered. These deaths were reassigned to one of the remaining categories, such as the residual categories “other defined cancers” or “other childhood infectious diseases.”

In the next step we explored the frequency with which one cause was erroneously classified as another cause in the analysis. For example, deaths due to maternal hemorrhage were often assigned to anemia in the analysis and vice versa. Similarly, all types of diabetes in adults (diabetes with coma, with renal failure, or with skin infection), sepsis with and without local bacterial infection in children, and respiratory distress syndrome in neonates regardless of the gestational age were all frequently hard to differentiate in the analysis. The causes that were frequently confused with each other were aggregated into a new cause in the final analysis cause list. For example, all six maternal causes were combined into one maternal category. After this step, the final cause list for analysis had 34 causes for adults, 21 for children, and 10 for neonates, excluding stillbirths.

Table 4: Reduction in number of causes to the final analysis cause list, excluding stillbirths
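The train-test construction described earlier in this section (a 75%/25% split cause by cause, followed by resampling the test set to a Dirichlet-drawn cause composition) can be sketched as follows. This is an illustrative sketch, not the study's code. The function name `make_train_test_pair`, the `(death_id, cause)` record layout, the reading of "uninformative" as Dirichlet(1, ..., 1), and the use of a multinomial draw to realize the composition are all assumptions.

```python
import random
from collections import defaultdict


def make_train_test_pair(deaths, n_test, rng, train_frac=0.75):
    """Build one train-test pair from a list of (death_id, cause) tuples."""
    by_cause = defaultdict(list)
    for death in deaths:
        by_cause[death[1]].append(death)

    # Step 1: split each cause 75%/25% without replacement.
    train, test_pool = [], {}
    for cause, rows in by_cause.items():
        rows = rows[:]
        rng.shuffle(rows)
        cut = int(round(train_frac * len(rows)))
        # Keep at least one death on the test side (assumed safeguard).
        cut = min(max(cut, 1), len(rows) - 1)
        train.extend(rows[:cut])
        test_pool[cause] = rows[cut:]

    # Step 2: draw a cause composition from Dirichlet(1, ..., 1), obtained by
    # normalizing independent Gamma(1, 1) draws (assumed "uninformative" prior).
    causes = sorted(test_pool)
    gammas = [rng.gammavariate(1.0, 1.0) for _ in causes]
    total = sum(gammas)
    csmf = [g / total for g in gammas]

    # Step 3: resample test deaths with replacement to match that composition.
    drawn_causes = rng.choices(causes, weights=csmf, k=n_test)
    test = [rng.choice(test_pool[cause]) for cause in drawn_causes]
    return train, test
```

Calling this function 500 times with different random states would give 500 pairs whose test-set cause fractions vary independently of the training-set composition, which is the property the text emphasizes.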

AI Digest

Study Justification:

– Verbal autopsy methods are important for evaluating causes of death in populations without adequate vital registration systems.
– This study aimed to create a high-quality validation dataset to evaluate the accuracy of different verbal autopsy methods in assigning cause of death.

Highlights:

– Data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India.
– Over 12,000 verbal autopsies on deaths with gold standard diagnoses were collected.
– Difficulties in finding sufficient cases and misclassification for certain causes led to a reduced target list of causes for analysis.
– 500 test-train datasets were created to evaluate method performance and cause-specific compositions.

Recommendations:

– The unique validation dataset can be used to evaluate the performance of different verbal autopsy analytic methods and inform the implementation of verbal autopsies in national health information systems.
– Further research is needed to improve the accuracy and efficiency of verbal autopsy methods.

Key Role Players:

– Researchers and scholars in the field of population health metrics
– Health facilities and medical professionals involved in data collection and validation
– Policy makers and government officials responsible for implementing verbal autopsy programs

Cost Items for Planning Recommendations:

– Training and capacity building for researchers and interviewers
– Data collection and management tools
– Travel and logistics for site visits and coordination
– Communication and dissemination of study findings
– Monitoring and quality control measures

Strength of Evidence

The strength of evidence for this abstract is 7 out of 10.
The evidence in the abstract is strong because it describes a well-designed study that collected over 12,000 verbal autopsies on deaths with gold standard diagnoses. However, there were difficulties in finding sufficient cases to meet gold standard criteria and problems with misclassification for certain causes. To improve the evidence, future studies could focus on increasing the number of cases that meet gold standard criteria and addressing the issues with misclassification.

Abstract

Gold standard VA data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. Table 1 shows the age and sex distribution for the decedents represented in this study, as well as the national life expectancy.

Table 1: The age and sex distribution of the decedents represented in the verbal autopsy sample and the national life expectancy for the country according to the 2010 United Nations numbers

Research at the Andhra Pradesh, India, site was implemented and coordinated through the George Institute for Global Health, India, and was centered in the main capital city, Hyderabad, as well as the neighboring areas of Ranga Reddy, Medak, and Nalgonda. Hyderabad is 100% urban with a population of roughly 3,830,000 inhabitants. The neighboring area Ranga Reddy has a similar population size (3,575,000) and is roughly half urban and half rural. The Medak and Nalgonda areas are similar to each other, both roughly 14% urban, with 3,248,000 people in Nalgonda and 2,670,000 in Medak.

The Bohol Island site was led by the Research Institute for Tropical Medicine in Manila. Bohol is a tropical island province located in the Central Visayas of the Philippines, with 46 municipalities and Tagbilaran City. Verbal autopsies were collected over the entire island, as well as a small proportion from Manila. According to the 2007 census, 1,230,000 people live in Bohol. Manila is urban, while Bohol is divided into roughly 46% urban and 54% rural.

The research site in Dar es Salaam, Tanzania, was managed by collaborators at the Muhimbili University of Health and Allied Sciences. Verbal autopsies were collected from all over the city of Dar es Salaam, which has a population of roughly 2,487,000 people according to the 2002 census, with 94% of people living in urban areas and 6% living in rural areas.
The Mexican study was coordinated by the National Institute of Public Health in the Federal District and the state of Morelos. According to the 2010 Census, 8.85 million inhabitants live in the Federal District and 1.8 million live in Morelos. Sixteen percent of the population of the state lives in rural areas [36].

Pemba Island, Tanzania, is the smaller of the two islands of the Zanzibar archipelago. The research there was coordinated through the Public Health Laboratory Ivo de Carneri as part of a collaboration between the Ministry of Health and Social Welfare and Johns Hopkins University. Verbal autopsies were collected from all areas of the island. This island has a population of roughly 400,000 inhabitants. The island is 99% rural and 1% semi-urban.

Finally, the Uttar Pradesh site in India was led by collaborators at the CSM Medical University (CSMMU, formerly King George Medical College) in Lucknow. Verbal autopsies were collected from a wide range of districts in the state of Uttar Pradesh: Ambedkar Nagar, Bahraich, Barabanki, Basti, Faizabad, Gonda, Hardoi, Lakhimpur, Lucknow, Rae Bareli, Sitapur, Sultanpur, and Unnao. Table 2 shows the population and urban percentage for each of these districts.

Table 2: The population size in thousands and percent of population that is urban for the Uttar Pradesh, India field sites, according to the 2001 Census of India

The instrument development was based on the WHO standardized verbal autopsy instrument [37], which in turn was based in part on the work of Chandramohan et al. (1994) for adult deaths and of Anker et al. (1999) for neonatal and child deaths [38,39]. Separate questions were developed for neonatal deaths and stillbirths, children 1 month to 11 years, and adults 12 years and older. Experience gained from VA studies in Andhra Pradesh and China where the WHO instrument, or slight variants of it, had been applied was also considered [40,41].
A committee drawn from the principal and associate investigators considered modifications based on published and unpublished experiences with the WHO instrument, including fieldwork conducted as part of a large VA study in Thailand. The final instrument was translated into the respective local languages, and then back-translated to English by a different translator to ensure accuracy. The PHMRC instrument is comprised of a general information module, an adult module, and a child and neonatal module. Skip patterns were integrated into the general information module to collect the age of the deceased and then direct interviewers to the correct module to administer. In administering the WHO instrument, the interviewer must first determine the age of the deceased and select the correct instrument to deliver, which results in the potential for more interviewer error and a less fluid interview. The general information module, which is administered in all verbal autopsies, collects items such as education of the decedent, household characteristics, and a household roster. The adult module collects a history of chronic conditions, symptoms of the deceased, women’s health questions if the decedent is female, alcohol and tobacco use, and injury information; it also transcribes any available medical record and death certificate information. The child and neonatal module first asks background questions on information such as whether the mother is still alive, where the deceased was born, the size of the decedent at birth, and the delivery date. The questionnaire then ascertains whether the decedent was a stillbirth and, if so, collects symptom questions, such as signs of injury. If not, the questionnaire collects more general information such as the age of the baby or child when they became ill and the age at death. If the decedent is under 28 days (inclusive of stillbirths), a maternal history is collected. 
In addition, if the decedent is under 28 days and was born live, a full set of neonatal symptom questions is collected. If the decedent is between 28 days and 11 years, infant and child symptom questions are asked. All available health records and death certificates are transcribed for both neonatal and child deaths. Finally, for all ages, the open narrative section was moved to the end of the interview, after the structured questions. This was done to ensure that in future work, we could remove the open-ended items without concern that the results collected in this study were a function of the open-ended items coming prior to structured content.

In addition to the structural changes, there are important differences between the PHMRC instrument and the WHO instrument. First, the WHO adult module is administered for ages 15 and above, while the PHMRC adult module begins at age 12. This expansion of the ages included in the adult module ensures that conditions that present clinically in this age range, such as maternal mortality in 12- to 14-year-olds, are captured through this instrument. Second, a substantial portion of the questions were reworded to ensure clarity. Medical terminology was converted to easily understandable descriptions to target a lay population. For example, “Did s/he have abdominal distension?” was reworded to “Did [NAME] have a more than usual protruding belly?” Information was also added for precision, or removed to ensure only the most diagnostically relevant information was collected. Similarly, we added or dropped entire questions to capture the most essential information, while reducing the duration of the interview as much as possible. One common question type dropped from the instrument was the duration of certain symptoms.
For example, the PHMRC instrument asks whether adults had developed a lump in the neck, armpit, breast, or groin but dropped the follow up question “For how long did s/he have the lumps?” as the presence of the symptom alone was the most important information. Another common question type dropped from the WHO instrument was about treatment that had been received by the decedent, as they were less important in informing the cause of death. Finally, the PHMRC instrument did not include questions about chronic conditions in children, such as cancer, tuberculosis, and diabetes. Additional file 1 illustrates the content questions, such as symptoms experienced by the decedent that were added or dropped when converted from the WHO instrument to the PHMRC instrument. The small wording changes are not included in this additional file, though the full PHMRC instrument is included in Additional file 2 (general module), Additional file 3 (adults), and Additional file 4 (children and neonates) for reference. A key challenge for the study was to identify the cause list for each of the three age groups for which we would seek to collect a sample of gold standard deaths. Our selection of the target cause list was based on consideration of the WHO estimates of the leading causes of death in the developing world in each age group, those causes for which verbal autopsy might be able to function adequately because unique signs and symptoms could potentially be collected in an interview, and the potential to find, in the six sites, deaths with sufficient laboratory, medical imaging, and pathological detail in order that a gold standard cause of death assignment could be made. The cause lists were also designed so that they were mutually exclusive and collectively exhaustive. The target cause list for adults, children, and neonates included 53, 27, and 13 GS causes, respectively, plus stillbirths (for a complete list of causes, see Additional file 5). 
These cause lists are much longer than for any previously undertaken VA validation study. In fact, nearly all previous VA validation studies have started with a community or convenience sample of deaths and then ascertained cause in hospital records rather than seeking to collect data on a list of causes by design.

A critical component of the study was the development, for each cause, of clear criteria that had to be fulfilled for a death to be assigned as a GS cause of death. Depending on the cause of death, these criteria included clinical endpoints, laboratory findings, medical imaging, and pathology. Additional file 6 (adults) and Additional file 7 (children and neonates) provide the gold standard criteria for each cause. These gold standard criteria were developed by a committee of physicians involved in the study and underwent multiple cycles of group review. Preliminary review of hospital records in the sites indicated it would be very difficult to identify any deaths for some causes that would meet the strict gold standard criteria. In order to ensure that as many potentially eligible deaths in each site as possible were collected for the study, a less strict but nevertheless detailed level 2 set of criteria was also developed (see Additional files 6 and 7). In some cases, these level 2 criteria were further disaggregated into level 2A and level 2B. By way of example, the criteria for determining a death as being due to adult breast cancer, adult acute myocardial infarction, child pneumonia, and neonatal birth asphyxia are shown in Table 3.

Table 3: Examples of gold standard criteria for adult breast cancer, adult acute myocardial infarction, child pneumonia, and neonatal birth asphyxia. Level 1 criteria are the most stringent, while level 2A or 2B deaths were also collected for some causes.
By recording the level of diagnosis for each death, we are able to test whether the assessment of performance for any method is affected by the level of cause of death assignment according to our criteria. As described above, a stringent set of diagnostic criteria for each cause of death was developed by a team of study physicians before fieldwork began. Each site then enrolled local health facilities at which medical records would be reviewed. Consortium members led a two-day training at each of the sites to train the reviewers in the gold standard definitions, the protocols for identifying cases meeting these criteria, and the procedure for extracting the pertinent medical information. Each reviewer was provided a pocket guide detailing the necessary criteria for each gold standard cause of death. The medical information from qualifying records was extracted using a standard medical data extraction form (MDEF, see Additional file 8), which the study team developed. Once eligible records were extracted, a local physician reviewed the medical information and determined the gold standard level of the particular case according to the diagnostic criteria outlined for each level for each cause. The following information details the specific protocol followed by each research site. In Andhra Pradesh, four hospitals were recruited for the study. Three are government hospitals – Gandhi Hospital, Osmania General Hospital, and Chest Hospital – and one is a private hospital, CARE Foundation. There was 24-hour surveillance at the hospitals and all patients were enrolled with their addresses. Study supervisors collected information on all deceased patients from all wards, and clinicians involved in the study then reviewed the case sheets to select those that conformed to the gold standard criteria (levels 1, 2A, and 2B). 
The medical information from all qualifying cases selected by the clinicians was extracted and sent to the George Institute Hyderabad office for enrollment in the verbal autopsy study. In Bohol, the majority of deaths were reviewed at the Bohol Regional Hospital. This facility is the referral hospital for Bohol Province with the highest available standards of clinical investigation and hence diagnosis. Three nurses monitored all deaths in the hospital. They ensured that all reports of investigations (imaging and laboratory) were located and attached to the charts. In addition, to augment the number of deaths collected, 467 deaths were recruited from two hospitals in Manila: the Veterans Memorial Medical Center and the Rizal Medical Center. In all locations, the nurses summarized the case notes, including reports of investigations, onto the medical data extraction forms. MDEFs were first reviewed by two study physicians who assigned cause of death and decided by diagnosis and GS level which VAs should not be collected. Deaths were reviewed as soon as possible after the death. At the Dar es Salaam site, five health facilities were used as recruitment sites. These were Mwananyamala Hospital, Temeke Hospital, Muhimbili National Hospital, Ocean Road Cancer Institute, and Hindu Mandal Hospital. Mwananyamala and Temeke are both district hospitals, each of which records roughly 1,500 deaths per year. Ocean Road Cancer Institute is the only cancer treatment facility in Tanzania and was an important source for causes such as cervical cancer, esophageal cancer, breast cancer, leukemia, prostate cancer, and lymphomas. Muhimbili National Hospital is a referral and teaching hospital with a higher mortality rate than the other enrolled facilities. Hindu Mandal Hospital is a private hospital in the heart of Dar es Salaam. It has a well-established HIV/AIDS clinic and commonly receives noncommunicable disease cases. 
At each location, a nurse affiliated with the study reviewed medical records to identify qualifying cases. The cases identified by the nurses were reviewed by physicians, who filled out the MDEFs with the gold standard levels for the cases that were eligible for enrollment. The nurses spoke with family members of the deceased if present at the hospital to enroll them in the study, collect their consent, and obtain mapping information and directions for a verbal autopsy interview.

In Mexico, after obtaining authorization to work in each medical unit, a group of six trained physicians reviewed the medical records of cases (and when available the reports from autopsies) that could be included in the study, filled out an extraction form for each case, and classified them as levels 1, 2, or 3 according to the gold standard criteria proposed by the PHMRC. Only cases classified as levels 1 and 2 were considered eligible for the study. The original design considered the inclusion of only one to three large hospitals in Mexico City, but due to the difficulty of completing the quota of gold standard cases, hospitals from the health service network of the Federal District government and from the Ministry of Health of the state of Morelos were included. The data were collected from 36 public hospitals: 33 from the Federal District and three from Morelos.

In Pemba, there are four major government hospitals on the island, though most facilities do not have a certified medical doctor present and are managed by medical assistants and nurses. Surveillance systems were put in place in all four hospitals to identify deaths and to classify them into GS categories. The hospital supervisor recorded complete identification information upon admission of each patient, and the attending physician or medical assistant confirmed the admission diagnosis.
Hospital supervisors ensured that the signs and symptoms experienced by the patient were recorded and that a mortality form with the cause(s) of death was filled out by the attending physician in the event of a death. All forms were sent back to the field headquarters for data entry. A computer algorithm was run to identify cases meeting GS criteria, and all GS cases were recorded in a database. A computer listing was prepared with identifier information to schedule the VA interviews.

In Uttar Pradesh, the gold standard deaths were enrolled at CSMMU, Lucknow, which is a tertiary care government facility with patient inflow from all over Uttar Pradesh and bordering states, including districts in the neighboring country of Nepal. The catchment area spreads over a radius of more than 500 km, of which about 85% of cases come from 13 districts surrounding Lucknow. There was 24-hour surveillance at facilities and all patients were enrolled with an address. When a death occurred, the project medical officer reviewed the patient case sheet in consultation with the resident doctor in order to assess the GS levels against standard criteria.

Once enrolled, the VA interviewers at each site attended a training session led by consortium members using standardized materials and an interviewer’s manual. The training manuals provided information on the study background, the roles and responsibilities of the VA interviewer, background on how VA cases were selected, instructions for administering the questionnaire, and information on every question in the instrument. The manual provided guidance on how to handle an array of questions or concerns, tips for building rapport with the respondents, and probing as needed to collect reliable information. Following the training, VA assignments were given to interviewers blinded to the medical information or cause of death of the decedent along with directions or map cues to the households.
In some sites the families were contacted in advance to schedule an appointment, though this decision was left to the sites’ discretion. All interviews were collected after a culturally appropriate grieving period had passed. The minimum grieving period was six days in Bohol and the maximum was six months in Mexico (as required by the ethics boards at the hospitals). The maximum amount of time post-death that an interview was collected was eight months in the Mexico site. The rate of interview refusals varied by site from 1.8% to 9.5%. For those who consented to a verbal autopsy, the instrument was administered on paper in the field, and returned to the field headquarters for double data entry. Interviews lasted an average of 45 minutes across all of the sites.

To ensure the highest quality data was collected, quality control checks were performed both at the individual site level and at the Institute for Health Metrics and Evaluation (IHME), where all data were transmitted through a secured password-protected site for analysis. In all sites, supervisors were trained in the protocols for monitoring quality control at the site level. Supervisors were instructed to observe VA interviewers in the field during the early stage of data collection to ensure interviews were conducted properly and to provide guidance. Supervisors additionally checked every VA form collected throughout the study to ensure that it was filled out consistently and correctly. If issues were identified by the supervisor, a reinterview was conducted as needed. The field interviewers had periodic meetings with their supervisors to discuss performance, progress, and challenges. Supervisors at most sites additionally reinterviewed a portion of the verbal autopsies to spot check the quality of the information collected. At IHME, we systematically evaluated all datasets electronically for numerous types of quality issues by a comprehensive set of codes.
First, we reviewed the dataset for missing values and for incorrect skip patterns that result in specific questions having been filled in or left blank erroneously. The dataset was also evaluated to determine if any of the observed values fell outside of expected ranges. For example, if the response for a neonatal symptom duration was greater than 28 days (the cutoff for classification as a neonatal death), this value was flagged. Next, if the dataset was submitted in multiple sections, we examined the final comprehensive database for any technical issues that may have occurred in merging the individual files. Finally, we merged the dataset with the gold standard medical record information, which was separately transmitted to IHME by the site coordinator. We examined the observations for consistency between the two sources of information, such as the sex of the decedent as reported in the medical record and as reported by the verbal autopsy respondent. Any issues determined through this stringent checking process were compiled into a report and sent to the site to review. Site coordinators were asked to speak with the interview staff and rectify any correctable issues such as data entry mistakes. In addition to the full dataset as it was collected, we have also created a series of dichotomous variables from each of the polytomous (categorical) and continuous (duration) variables. Some analytical methods can only use dichotomized variables, so this effort to create the dichotomous variables increases the information available to these types of empirical methods. For each continuous duration item, depending on the item, we identified a short or long cutoff. For example, a duration of 8.8 days marks long duration of a fever. If a VA reports a fever of 10 days, it is considered to have the symptom of “having a long fever.” We determine the cutoff as being two median absolute deviations above the median of the mean durations across causes (MAD estimator). 
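As a rough illustration of the cutoff rule just described (two median absolute deviations above the median of the cause-specific mean durations), a minimal sketch might look like this. The function names, the input layout (a mapping from cause to a list of reported durations), and the use of an unscaled MAD are assumptions; the text does not say whether a consistency constant was applied.

```python
from statistics import mean, median


def long_duration_cutoff(durations_by_cause, k=2.0):
    """Cutoff = median of cause-specific mean durations + k * MAD of those means."""
    cause_means = [mean(values) for values in durations_by_cause.values() if values]
    center = median(cause_means)
    # Unscaled median absolute deviation (an assumption; no 1.4826 factor).
    mad = median([abs(m - center) for m in cause_means])
    return center + k * mad


def has_long_duration(duration, cutoff):
    """Dichotomize one reported duration: 1 if above the cutoff, else 0."""
    return int(duration > cutoff)
```

With the fever cutoff of 8.8 days quoted in the text, a reported fever of 10 days would be coded as 1 by `has_long_duration`.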
The MAD estimator can be used as a robust measure of the standard deviation and is especially useful in cases where extremely long durations may be reported, which would bias measures such as the standard deviation. Additional file 9 shows the cutoffs for each item developed in this way. For polytomous variables, we examined the pattern of the endorsement rates across causes and mapped the categories into two, thus creating a dichotomous version of the variable. For example, we judged that there was a stronger signal produced by combining moderate and severe fevers. Additional file 10 shows the mapping of each response category into dichotomous variables. Based on the data collected, some polytomous variables appeared to have little or no information content and were not mapped into a dichotomous form. These low information content items are shown in Additional file 11. This exercise was undertaken for neonatal, child, and adult modules separately. There has long been concern that the performance of a VA instrument and the associated analytical method for assigning cause could be different for deaths where the decedent died in a hospital or had made extensive use of health services prior to death, compared to deaths with no health care experience (HCE). As an attempt to examine how VA may work in communities with limited or no access to health care services, Murray et al. [12] studied how PCVA and the Symptom Pattern Method performed when all items referring to use of health services such as “Have you ever been diagnosed with…” or hospital records or death certificates were excluded from the analysis. They showed that, in China, recall of the household or possession of medical records recorded in the VA interview had a profound effect on both the concordance for PCVA as well as the performance of the Symptom Pattern Method. 
Given this empirical finding, we believe it is useful to test VA performance when household recall of health care experience is excluded, as this likely provides a more realistic assessment of how VA performs in communities without access to health services. As such, we have created two versions of the datasets developed above, one version with all variables and one version excluding recall of health care and medical records. Specifically, the without HCE dataset excludes the following information. First, a series of questions asked if the deceased had any specified conditions, which would likely indicate a health care provider had diagnosed the individual. Each of the following conditions was asked: “Did decedent have [asthma, hypertension, obesity, stroke, tuberculosis, AIDS, arthritis, cancer, COPD, dementia, depression, diabetes, epilepsy, heart disease]?” Second, if any medical records were available, the interviewer was asked to provide a transcription of the last note on the medical record. Third, if a death certificate was available, the interviewer was asked to record the immediate cause of death, first underlying cause, second underlying cause, third underlying cause, and contributing causes from the death certificate. Finally, at the end of the questionnaire, an open-ended section was provided to collect any comments from the interviewer, as well as to ask the respondent “to summarize, or tell us in your own words, any additional information about the illness and/or death of your loved one?” Excluding this entire section excludes not only open narrative recall of HCE but also, in the case of PCVA, any other information on timing and sequencing of signs and symptoms that might be conveyed in this section.

The structured instrument includes various open text items. First, some questions in the instrument ask the respondent to choose from a list of specified response options.
For example, “Where was the rash located?” has the following response options: face, trunk, extremities, everywhere, or “other (specify: ____).” If the response is not one of the listed options, the respondent is asked to fill in the location of the rash as the “other” response. The questions that include an “other” free text response option are as follows: “Where was the rash located?”; “Where was the pain located?”; “Which were the limbs or body parts paralyzed?”; “What kind of tobacco did [NAME] use?”; “Did [NAME] suffer from an injury or accident such as a ____?”; “Where was the deceased born?”; “What were the abnormalities?” in reference to any abnormalities at time of delivery; “Where did the deceased die?”; “What was the color of the liquor when the water broke?” in reference to labor; “Where did the delivery occur?”; and “Who delivered the baby?” In the questions that collect information about a health facility or midwife, free text responses collected the name and address of the place or person. In addition to these free text items, if any medical records or death certificates were available, the interviewer was asked to transcribe the information from the records as free text. Finally, at the end of each interview, the response to the open narrative question “Summarize, or tell us in your own words, any additional information about the illness and/or death of your loved one?” (as described above) was collected in addition to any notes from the interviewer.

Open text could in theory be highly informative, especially household recall of HCE and an interviewer’s direct recording of death records or hospital records kept by the household. These observations are likely to be available in populations with some access to health care services. To make this information available to automated methods, we processed open text in the following steps. First, all free text was compiled into a database and a dictionary was created to map all similar words to the same stem word.
For example, the terms AMI, myocardial infarction syndrome, acute myocardial infarction, ISHD, MI, coronary heart disease, CHD, IHD, MCI, and MYIN would all be mapped by the dictionary into the same variable (“IHD: Acute Myocardial Infarction”). Next, a program called README [42] extracts each individual variable and assigns a frequency count for the number of times it appears in the entire free text database. Variables that are not deemed to be diagnostically relevant or that are very low in frequency are then dropped from the dataset. The final product is a condensed dictionary of medically important terms consisting of 106 variables for adults, 90 for children, and 39 for neonates. These terms are added as additional binary symptoms (present or not present) in the VA database. If any of the terms appear in the free text for a particular death, it is counted as a positive endorsement for that symptom. These symptoms are not used in the “without” HCE dataset. Additional file 12 provides the comprehensive dictionary that was developed.

For empirical VA methods that must be developed using the pattern of responses observed in a dataset, validation needs to be undertaken on a set of deaths that were not included in the development of the method. This is the concept of a training dataset distinct from a test dataset. Further, as recommended in Murray et al. [15], it is important to have test datasets with widely varying cause-specific mortality fractions (CSMFs) so that a VA method does not by chance appear to be better than another because of the specific CSMF composition in the training set. To facilitate strict comparability, we have created 500 train-test dataset pairs.
Each pair was created by first splitting the data randomly (without replacement) into 75%/25% training and test datasets, cause by cause, and then resampling the data in the test dataset (with replacement) to have 7,836 adult, 2,075 child, 1,629 neonatal, and 1,002 stillbirth deaths, matching a cause composition drawn from an uninformative Dirichlet distribution (Figure 1). In other words, each test dataset has been resampled to have a different CSMF composition. Because the CSMF compositions have been drawn from an uninformative Dirichlet, across the 500 test datasets, there are cases where any given cause has a cause fraction near zero and cause fractions as high as 20% or more. By the nature of this sampling strategy, there is no correlation between the CSMF composition of the training and test dataset pairs.

Figure 1. The process of generating 500 test and training datasets (done separately for each cause of death).

In order to have an efficient cause list for the analysis, we have reduced it in two steps as illustrated in Table 4. From the original gold standard target cause list we received deaths from the sites for 53 diseases in adults, 27 in children, and 13 in neonates, excluding stillbirths. The first step was to select only those causes with 15 or more deaths (see Additional file 5 for a detailed mapping), which reduced the list to 46 adult causes, 22 child causes, and 12 neonate causes, excluding stillbirths. For instance, pelvic inflammatory diseases, uterine cancer, and dementia in adults; AIDS with tuberculosis in children; and meningitis in neonates had fewer than 15 deaths each. We also eliminated pertussis in children and neonatal tetanus because no pertussis and only four neonatal tetanus deaths were gathered.
These deaths were reassigned to one of the remaining categories, including residual categories such as “other defined cancers” or “other childhood infectious diseases.”

In the next step we explored the frequency with which one cause was erroneously classified as another cause in the analysis. For example, deaths due to maternal hemorrhage were often assigned to anemia in the analysis and vice versa. Similarly, all types of diabetes in adults (diabetes with coma, with renal failure, or with skin infection), sepsis with and without local bacterial infection in children, and respiratory distress syndrome in neonates regardless of the gestational age were all frequently hard to differentiate in the analysis. The causes that were frequently confused with each other were aggregated into a new cause in the final analysis cause list. For example, all six maternal causes were combined into one maternal category. After this step, the final cause list for analysis had 34 causes for adults, 21 for children, and 10 for neonates, excluding stillbirths.

Table 4. Reduction in number of causes to the final analysis cause list, excluding stillbirths.
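The construction of a single train-test pair described earlier in this section (a 75%/25% split by cause without replacement, followed by resampling the test portion with replacement to match a Dirichlet-drawn CSMF composition) can be illustrated with a minimal sketch. This is not the authors' code. The function name `make_train_test_pair` and its arguments are hypothetical, "uninformative" is assumed here to mean a Dirichlet with all concentration parameters equal to 1, and converting the drawn cause fractions to integer counts with a multinomial draw is likewise an assumption.

```python
import numpy as np

def make_train_test_pair(causes, n_test, rng):
    """Build one train-test pair of row indices (illustrative sketch only).

    causes : 1-D array of gold standard cause labels, one per death
    n_test : number of deaths wanted in the resampled test set
    rng    : numpy.random.Generator
    """
    causes = np.asarray(causes)
    labels = np.unique(causes)

    train_parts, test_pool = [], {}
    # Step 1: split 75%/25% without replacement, cause by cause.
    for c in labels:
        idx = rng.permutation(np.flatnonzero(causes == c))
        # Assumes at least 2 deaths per cause so the test pool is never empty.
        n_train = min(len(idx) - 1, int(round(0.75 * len(idx))))
        train_parts.append(idx[:n_train])
        test_pool[c] = idx[n_train:]
    train_idx = np.concatenate(train_parts)

    # Step 2: draw a CSMF composition from a flat Dirichlet
    # (all concentration parameters set to 1; an assumption).
    csmf = rng.dirichlet(np.ones(len(labels)))

    # Step 3: turn cause fractions into integer counts (assumed multinomial).
    counts = rng.multinomial(n_test, csmf)

    # Step 4: resample each cause's test pool with replacement.
    test_idx = np.concatenate([
        rng.choice(test_pool[c], size=k, replace=True)
        for c, k in zip(labels, counts)
    ])
    return train_idx, test_idx, csmf
```

Calling this function 500 times with the same `causes` array and a shared or reseeded generator would give 500 pairs whose test-set cause compositions vary widely, while each training set keeps roughly the cause composition of the original data.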

Building Blocks

Innovations

Based on the provided information, here are some potential innovations that could be used to improve access to maternal health:

1. Mobile Health (mHealth) Applications: Develop mobile applications that provide pregnant women with access to information, resources, and support for prenatal care, childbirth, and postpartum care. These apps can provide personalized health advice, appointment reminders, and educational materials.

2. Telemedicine: Implement telemedicine services that allow pregnant women in remote areas to consult with healthcare providers through video calls or phone calls. This can help overcome geographical barriers and provide access to prenatal care and medical advice.

3. Community Health Workers: Train and deploy community health workers who can provide basic prenatal care, education, and support to pregnant women in underserved areas. These workers can conduct home visits, monitor pregnancies, and refer women to healthcare facilities when necessary.

4. Maternal Health Vouchers: Introduce voucher programs that provide pregnant women with financial assistance to access maternal health services. These vouchers can cover the cost of prenatal care, childbirth, and postpartum care, making healthcare more affordable and accessible.

5. Transportation Support: Develop transportation programs that provide pregnant women with reliable and affordable transportation to healthcare facilities. This can help overcome transportation barriers and ensure timely access to prenatal care and emergency obstetric services.

6. Maternal Health Hotlines: Establish hotlines staffed by trained healthcare professionals who can provide information, advice, and support to pregnant women. These hotlines can be accessible 24/7 and offer guidance on prenatal care, childbirth, and postpartum issues.

7. Maternal Health Education Campaigns: Launch public awareness campaigns that focus on educating communities about the importance of maternal health and the available services. These campaigns can address cultural beliefs, myths, and misconceptions surrounding pregnancy and childbirth.

8. Maternal Health Clinics: Set up specialized maternal health clinics in underserved areas, staffed by skilled healthcare providers who can offer comprehensive prenatal care, childbirth services, and postpartum support.

9. Public-Private Partnerships: Foster collaborations between government agencies, non-profit organizations, and private healthcare providers to improve access to maternal health services. These partnerships can leverage resources, expertise, and funding to expand healthcare infrastructure and services.

10. Data Collection and Analysis: Establish robust data collection systems to monitor maternal health indicators and identify gaps in access and quality of care. This data can inform evidence-based decision-making and targeted interventions to improve maternal health outcomes.

AI Innovations Description

The recommendation to improve access to maternal health based on the provided information is to use the Population Health Metrics Research Consortium (PHMRC) gold standard verbal autopsy validation study as a basis for developing innovative solutions. This study collected over 12,000 verbal autopsies on deaths with gold standard diagnoses in six sites across four countries. The dataset can be used to evaluate the performance of different verbal autopsy analytic methods and instrument designs. By analyzing the data and identifying the strengths and weaknesses of current verbal autopsy methods, improvements can be made to accurately ascertain the cause of death in national health information systems. This will ultimately lead to better access to maternal health by providing more reliable and accurate data for decision-making and resource allocation.

AI Innovations Methodology

Based on the provided information, it seems that the focus of the study is on creating a gold standard dataset for validating different methods of verbal autopsy cause of death assignment. While the study does not directly address innovations to improve access to maternal health, I can provide some potential recommendations for innovations in this area:

1. Telemedicine: Implementing telemedicine platforms can improve access to maternal health services, especially in remote or underserved areas. This technology allows pregnant women to consult with healthcare professionals remotely, reducing the need for travel and increasing access to prenatal care.

2. Mobile health (mHealth) applications: Developing mobile applications that provide information and resources for maternal health can empower women to take control of their own health. These apps can provide guidance on prenatal care, nutrition, and exercise, as well as reminders for appointments and medication.

3. Community health workers: Training and deploying community health workers who can provide basic maternal health services and education in rural or underserved areas can improve access to care. These workers can conduct prenatal check-ups, provide health education, and refer women to higher-level healthcare facilities when necessary.

4. Maternal health clinics on wheels: Mobile clinics equipped with necessary medical equipment and staffed by healthcare professionals can bring maternal health services directly to communities that lack access to healthcare facilities. These clinics can provide prenatal care, vaccinations, and other essential services.

To simulate the impact of these recommendations on improving access to maternal health, a methodology could be developed as follows:

1. Define the target population: Identify the specific population that would benefit from the innovation, such as pregnant women in rural areas or low-income communities.

2. Collect baseline data: Gather information on the current state of access to maternal health services in the target population, including factors such as distance to healthcare facilities, availability of healthcare providers, and utilization of prenatal care.

3. Develop a simulation model: Create a model that simulates the implementation of the recommended innovation. This model should consider factors such as the number of telemedicine consultations, the coverage of mobile health applications, the number of community health workers deployed, or the frequency of mobile clinics.

4. Input relevant data: Input data into the simulation model, including population size, geographical distribution, and healthcare infrastructure. This data can be obtained from existing sources such as census data, health facility records, or surveys.

5. Run simulations: Run the simulation model multiple times, varying the parameters to assess different scenarios and their potential impact on improving access to maternal health. This could include scenarios with different levels of implementation, coverage, or resource allocation.

6. Analyze results: Analyze the simulation results to determine the potential impact of the recommended innovations on improving access to maternal health. This could include metrics such as the number of additional women receiving prenatal care, reductions in travel distance to healthcare facilities, or improvements in health outcomes.

7. Validate the simulation: Compare the simulation results with real-world data, if available, to validate the accuracy and reliability of the simulation model. This can help ensure that the simulation accurately reflects the potential impact of the recommended innovations.

By following this methodology, policymakers and healthcare providers can gain insights into the potential benefits and challenges of implementing innovations to improve access to maternal health. This information can guide decision-making and resource allocation to effectively address the needs of pregnant women in underserved communities.

Statistics:

Citations: 123

Authors: 35

Identifiers:

DOI: 10.1186/1478-7954-9-27

Research Areas:

Cancer, Community Interventions, Environmental, Health System and Policy, Infectious Diseases, Maternal Access, Maternal and Child Health, Mental Health, Noncommunicable Diseases, Quality of Care, Social Determinants, Substance Abuse, Technology and Innovations, Violence and Injury

Study Countries:

Tanzania

Study Design:

Case-Control Study, Cross Sectional Study, Grounded Theory, Narrative Study

Study Approach:

Mixed-methods, Qualitative, Systematic Review

Participants Gender:

Female
