Researchers at Cincinnati Children’s Hospital Medical Center have launched a new regional, geocoded perinatal data repository called the Maternal and Infant Data Hub that integrates maternal, neonatal and pediatric patient records across the greater Cincinnati region. The repository ties together information on more than 110,000 infants born at 14 regional delivery hospitals from 2013 to 2017.
This new resource securely links patient data from institutional silos in order to make it easier for researchers to study care that one patient receives at different institutions, and to link a baby’s care to the prenatal care its mother received. Investigators can also look across regional data to better answer research questions like what medications support the best long-term outcomes in treating particular conditions, what impact opioids are having on our region’s newborns, and how pollution is affecting preterm birth rates in various neighborhoods.
Eric Hall, PhD, an informaticist and associate professor of pediatrics at Cincinnati Children’s, led this massive data integration challenge. Here, he provides an inside look at the repository, its development, and research already underway. More details are also available in this paper.
To request access to a data set or to discuss potential collaborations, contact Eric Hall at firstname.lastname@example.org.
What is the new Maternal and Infant Data Hub and why is it important?
The Maternal and Infant Data Hub integrates regional maternal, neonatal, and pediatric records into a population-based, regional research data repository while protecting patient privacy and confidentiality. It links data sets that previously resided in institutional silos and is available to investigators at Cincinnati Children’s and the University of Cincinnati Medical Center, who are using it to answer questions about outcomes that occur across care transitions or over the life course. The repository also includes geospatial information for each individual, enabling the integration of important neighborhood and community measures.
What problem/challenge does it solve?
Measures of community maternal and infant health are instrumental in planning and allocating resources, testing relevant hypotheses, and effectively operating healthcare and community-based programs. However, integration of relevant perinatal data has proven difficult due to barriers created by regulatory concerns, privacy issues, questions related to data ownership, technical limitations, and lack of sustainable funding.
Researchers looking to conduct studies using data from numerous sources often find aggregating that data to be a daunting and expensive task. In many cases, these investigators do not need or even want access to protected health information to conduct studies; however, those sensitive data elements are needed to establish links among data sets. The Maternal and Infant Data Hub simplifies perinatal data integration by undertaking the data sharing and linking tasks, then hosting and distributing de-identified data sets to approved researchers using an “honest broker” of the data. It provides investigators pre-linked and validated data, freeing them from the many burdens of regulation and expense involved with linking the data themselves.
What makes this tool unique?
Previously implemented systems have been housed at health departments and use vital records to establish a core data set. Our data system leverages electronic clinical and billing records, providing a unique clinical component to the repository. This resource provides investigators in the greater Cincinnati area with a novel opportunity for studying perinatal health at the population level by enabling precise phenotyping (characterization of disease states) and the comparison of clinical treatments and outcomes. In addition, it supports public health surveillance and more rigorous evaluations of public health interventions.
The registry also supports the analysis of hospital utilization patterns after birth. For example, how might hospital utilization for an infant with intrauterine opioid exposure differ from that of an infant with no prenatal exposure to substances of abuse? What neighborhood factors might affect emergency department utilization?
What have researchers learned by using this tool?
A series of studies is underway to showcase the capabilities of the resource. One involves antibiotic use in infants with gastroschisis, a birth defect of the abdominal wall in which a baby’s intestines extend outside of the body through a hole beside the belly button. The study evaluated the risk of early onset sepsis, the reliability of certain laboratory tests, and antibiotic use trends in this population. Findings were published in the American Journal of Perinatology.
A second use case demonstrated the capacity for neonatal surveillance of exposures to substances of abuse. The study aimed to leverage geocoded records obtained through regional newborn coverage and regional universal maternal drug testing to report population-level rates of intrauterine opioid exposure, neonatal abstinence syndrome, and hepatitis C exposure. The study also mapped exposure rates at the census-tract level. Investigators concluded that secondary use of regional electronic health record data has potential for aiding in surveillance efforts without disrupting clinical workflows or placing an additional burden on limited resources. These capabilities could be extended to other public health concerns including preterm birth, neonatal mortality, or congenital anomalies. A manuscript detailing these findings is currently being reviewed for publication.
What would informaticists want to know about this repository?
Records are linked at the individual level using identifier fields, or at the area level using geospatial measures. For example, a newborn’s electronic health record from a community hospital delivery could be linked to that same individual’s pediatric record at the children’s hospital using unique identifiers.
We use an iterative approach with deterministic and probabilistic components to link records. Records of all patients, both mothers and infants, from all sources are stored in a common staging table which includes each individual’s first and last names, sex, date of birth, residence address, and in the case of infants, birth weight and available parental names. The linkage process is described in detail in a separate publication.
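To make the idea of combining deterministic and probabilistic components concrete, here is a minimal sketch in Python. It is not the Data Hub's linkage algorithm, which is described in the separate publication. The record fields, the similarity measure (the standard library's `difflib.SequenceMatcher`), the weights, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class StagedRecord:
    """One row of a hypothetical staging table (field names are assumptions)."""
    source_id: str
    first_name: str
    last_name: str
    sex: str
    date_of_birth: date
    address: str


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not block a match.
    return " ".join(text.lower().split())


def _similarity(a: str, b: str) -> float:
    # String similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def deterministic_match(a: StagedRecord, b: StagedRecord) -> bool:
    """Exact agreement on normalized names, sex, and date of birth."""
    return (
        _norm(a.first_name) == _norm(b.first_name)
        and _norm(a.last_name) == _norm(b.last_name)
        and a.sex == b.sex
        and a.date_of_birth == b.date_of_birth
    )


def probabilistic_score(a: StagedRecord, b: StagedRecord) -> float:
    """Weighted agreement score in [0, 1]; the weights are illustrative only."""
    return (
        0.25 * _similarity(a.first_name, b.first_name)
        + 0.25 * _similarity(a.last_name, b.last_name)
        + 0.30 * (a.date_of_birth == b.date_of_birth)
        + 0.10 * (a.sex == b.sex)
        + 0.10 * _similarity(a.address, b.address)
    )


def link(a: StagedRecord, b: StagedRecord, threshold: float = 0.85) -> str:
    """Try the exact rule first, then fall back to the fuzzy score."""
    if deterministic_match(a, b):
        return "deterministic"
    if probabilistic_score(a, b) >= threshold:
        return "probabilistic"
    return "no_link"


birth = StagedRecord("hospA-1", "Katherine", "Smith", "F", date(2015, 3, 2), "12 Main St")
peds = StagedRecord("peds-10", "Kathrine", "Smith", "F", date(2015, 3, 2), "12 Main Street")
print(link(birth, peds))
```

In this sketch, a misspelled first name defeats the exact rule, but the pair can still be linked by the weighted score. A production process would also need blocking to limit the number of pairs compared, plus manual review of borderline scores.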
The registry utilizes the Observational Medical Outcomes Partnership (OMOP) data model as its core table structure, which provides standard representations for many common healthcare data domains, including diagnoses, demographics, laboratory results, and other observational data. Data from each source are delivered via secure file transfer and undergo an ETL (extract-transform-load) into the OMOP format.
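As a rough illustration of the transform step, the sketch below maps one source record to a few columns of the OMOP PERSON table. The source field names (`birth_date`, `sex`, `source_key`) are hypothetical, and the repository's actual ETL is not described here. The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP concepts; unmapped values fall back to 0.

```python
from datetime import date

# Standard OMOP gender concepts; 0 is the conventional "no matching concept" value.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_omop_person(source_row: dict, person_id: int) -> dict:
    """Map one hypothetical source record to a subset of OMOP PERSON columns."""
    dob = date.fromisoformat(source_row["birth_date"])
    # Reduce values such as "female", "F", or " f " to a single uppercase letter.
    sex_code = source_row.get("sex", "").strip().upper()[:1]
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(sex_code, 0),
        "year_of_birth": dob.year,
        "month_of_birth": dob.month,
        "day_of_birth": dob.day,
        # Source values are kept verbatim for traceability back to the feed.
        "gender_source_value": source_row.get("sex", ""),
        "person_source_value": source_row["source_key"],
    }


row = {"birth_date": "2015-03-02", "sex": "female", "source_key": "abc123"}
print(to_omop_person(row, person_id=1))
```

Keeping the original source value next to the standard concept ID is a common OMOP convention, since it lets a mapping be audited or corrected later without going back to the source file.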
A newborn’s geocoded address is used to spatially link the individual to area-level census measures of poverty and environmental measures representing pollution levels, both during pregnancy and after birth.
Using available home address information from EHR and billing records, a latitude and longitude coordinate pair was generated for each encounter by an internally developed geocoder that assigns coordinates based on the 2015 TIGER/Line Shapefiles. The program fills in missing information using common postal abbreviations and produces geographic coordinates by address interpolation, along with precision and score values for assessing the accuracy of the estimated geocodes.
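Address interpolation can be sketched as follows. This is not the internal geocoder. It only shows the general technique of normalizing a street name with postal abbreviations and then placing a house number proportionally along a street segment that carries an address range. The segment dictionary layout and the short abbreviation table are assumptions, and the precision and score values mentioned above are left out.

```python
import re

# A handful of common postal abbreviations; a real table would be much longer.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr", "boulevard": "blvd"}


def normalize_street(name: str) -> str:
    """Lowercase, drop punctuation, and abbreviate known street suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def interpolate(house_number: int, segment: dict) -> tuple:
    """Place a house number along a segment by linear interpolation.

    `segment` is a hypothetical structure: an ascending address range
    (from_number..to_number) and the (lat, lon) of each end of the segment.
    """
    low, high = segment["from_number"], segment["to_number"]
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    # A single-address range has no extent, so use the midpoint of the segment.
    fraction = 0.5 if high == low else (house_number - low) / (high - low)
    (lat1, lon1), (lat2, lon2) = segment["start"], segment["end"]
    return (lat1 + fraction * (lat2 - lat1), lon1 + fraction * (lon2 - lon1))


segment = {
    "name": "main st",
    "from_number": 100,
    "to_number": 200,
    "start": (39.10, -84.52),
    "end": (39.11, -84.51),
}
if normalize_street("Main Street.") == segment["name"]:
    print(interpolate(150, segment))
```

Straight-line interpolation over latitude and longitude is a reasonable approximation for a short city block. Curved streets, descending address ranges, and separate odd and even sides would all need extra handling.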
The entire geocoding process is HIPAA compliant, running on a server within the Cincinnati Children’s network, removing the need to transmit patient addresses to a third party.
Using census tract definitions, which include an interpolated latitude and longitude for each census tract, we assigned each geocode to its nearest census tract by finding the shortest distance between the geocode and the tracts’ coordinates.
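A nearest-tract assignment of this kind might look like the sketch below. The distance formula used by the repository is not stated, so great-circle (haversine) distance is assumed here. The tract identifiers, coordinates, and poverty values are made-up placeholders, not real census data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_tract(lat: float, lon: float, tracts: dict) -> str:
    """Return the ID of the tract whose reference point is closest to (lat, lon)."""
    return min(tracts, key=lambda tract_id: haversine_km(lat, lon, *tracts[tract_id]))


# Placeholder tract reference points and an area-level measure keyed by tract.
tracts = {"tract_A": (39.10, -84.51), "tract_B": (39.20, -84.40)}
poverty_rate = {"tract_A": 0.31, "tract_B": 0.08}

tract_id = nearest_tract(39.11, -84.50, tracts)
print(tract_id, poverty_rate[tract_id])
```

Once a tract ID is attached to a geocode, joining area-level measures such as poverty or pollution is a simple lookup by that ID. Note that the nearest reference point is not always the tract that actually contains the address, so a point-in-polygon test against tract boundaries is a stricter alternative.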
What’s ahead in development and use of the repository?
The research team will undertake the integration of more data sets, including additional regional electronic health records and other ancillary data sets representing environmental measures.
In addition, several more studies are planned, including analyses of:
- Individual and community factors associated with hospital utilization
- Inpatient stays, emergency department visits, and urgent care visits during the first year of life
- Healthcare utilization measures among preterm infants to identify opportunities to avoid preventable readmissions or emergency department visits
- Growth and development outcomes among infants testing positive for intrauterine opioid exposure at the time of birth, compared with infants for whom no exposure to substances of abuse was detected
- Antibiotic use, endomyometritis, and other perinatal outcomes among women who develop chorioamnionitis
- The impact of a targeted health literacy intervention on newborn caregivers’ knowledge and care utilization