With an estimated $1 million funding award from the Patient-Centered Outcomes Research Institute (PCORI), a team of researchers from the department of health outcomes and biomedical informatics (HOBI) is developing natural language processing (NLP) methods to access clinical notes in electronic health records and extract social and behavioral factors associated with cancer outcomes.
Yonghui Wu, Ph.D., an assistant professor of biomedical informatics in HOBI, said the proposed NLP system will unlock mentions of social determinants, behavioral determinants, and adverse events from narrative clinical notes, connect them with clinical factors (such as diagnoses) in the patient records, and then save the linked data in structured PCORnet databases for use by other researchers. The NLP methods proposed in this project also will advance the extraction of general medical concepts from clinical narratives.
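For readers who want a concrete picture of the three steps Wu describes (find mentions in a note, link them to a clinical factor, save the result in structured form), the following is a minimal sketch in Python. It is an illustration only and not the team's system: the keyword patterns, the `extract_factors` and `store_linked` functions, the `factor` table and the sample values are all hypothetical, and a real clinical NLP system would rely on trained models and would have to handle negation, misspellings and context, which this sketch does not.

```python
import re
import sqlite3

# Hypothetical keyword patterns for illustration only. Simple matching like
# this ignores negation ("denies smoking") and context.
PATTERNS = {
    "smoking": re.compile(r"\b(smok(?:es|er|ing|ed)|cigarettes?)\b", re.IGNORECASE),
    "employment": re.compile(r"\b(unemployed|employed|retired)\b", re.IGNORECASE),
    "education": re.compile(r"\b(high school|college)\b", re.IGNORECASE),
}


def extract_factors(note):
    """Return (category, mention) pairs found in a free-text note."""
    found = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            found.append((category, match.group(0).lower()))
    return found


def store_linked(conn, patient_id, diagnosis_code, note):
    """Link each extracted mention to a diagnosis code and save it as a row.

    The table layout is an assumption made for this sketch; it is not the
    PCORnet data model.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS factor "
        "(patient_id TEXT, diagnosis_code TEXT, category TEXT, mention TEXT)"
    )
    rows = [(patient_id, diagnosis_code, c, m) for c, m in extract_factors(note)]
    conn.executemany("INSERT INTO factor VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    note = "Patient is a retired teacher who smokes one pack of cigarettes daily."
    # "P001" and "C34.90" are made-up example values.
    store_linked(conn, "P001", "C34.90", note)
    for row in conn.execute("SELECT * FROM factor"):
        print(row)
```

Once mentions are stored as rows, a researcher could query them alongside coded diagnoses with ordinary SQL, which is the practical benefit of moving information out of narrative text and into a structured table.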
Although PCORI has successfully established an extensive national network of secure patient data that scientists can use to help speed clinical research, only a small fraction of that data is currently available to researchers.
“Much of the detailed patient information is buried in clinical narratives that are not accessible to clinical and translational studies,” Wu said. To fill this gap, PCORI is initiating projects that develop and evaluate NLP and other related methods for leveraging free-text clinical information contained in EHRs.
Wu said clinical researchers today often are limited to using only medical codes in electronic health records to extract data for diagnoses, laboratory tests or medical expenses from PCORnet, the national Patient-Centered Clinical Research Network. However, an individual’s health outcomes are determined by a complex interplay of factors, many of which aren’t captured in medical codes.
“There are scarcely documented medical codes for a patient’s education level or employment status, or for behaviors that can either promote or undermine an individual’s health, such as cigarette smoking,” said Wu. “But providers often write down this detailed information in their patient records.”
Wu said clinical NLP holds the key to safely and securely extracting this kind of information from medical text such as clinical notes. NLP involves the use of linguistics, computer science, information engineering and artificial intelligence to program computers to process and analyze natural language. NLP has been widely used with general English text, but when applied to the clinical domain, it faces many new challenges due to the complexity of health care.
Wu and his team will leverage big data in the OneFlorida Clinical Research Consortium (OneFlorida CRC) housed at the University of Florida, along with data housed at Weill Cornell Medicine, affiliated with the New York City-based Clinical Data Research Network (NYC-CDRN). OneFlorida and NYC-CDRN are part of PCORnet.
“This project was selected for PCORI funding not only for its scientific merit and commitment to engaging patients and other stakeholders, but also for its potential to fill an important gap in our health knowledge and give people information to help them weigh the effectiveness of their care options,” said PCORI Executive Director Joe Selby, M.D., MPH. “We look forward to following the study’s progress and working with OneFlorida and Weill Cornell Medicine to share the results.”
Wu and his team plan to engage clinicians, patients, researchers and data managers to learn how this information is mentioned and documented by physicians during patient visits.
“The input from providers and patients will help us determine where and how some of this information is documented so we can develop methods and systems to retrieve it,” Wu said.
Researchers will be asked to provide suggestions on how to identify and categorize social and behavioral factors that may play a role in disease progression and outcomes. Input received from data managers and analysts will help determine how and where the data should be stored in structured databases.
“If successful, this project will provide an easy-to-use package for researchers to bridge the gap of using clinical narratives for PCORI and other communities,” Wu said.
Key researchers on the team include Yonghui Wu, Ph.D. (principal investigator), Jiang Bian, Ph.D. (co-principal investigator), William R. Hogan, M.D., M.S., and Yi Guo, Ph.D., in the department of health outcomes and biomedical informatics; and Thomas J. George, M.D., in the department of medicine’s division of hematology and oncology. Wu will lead the OneFlorida team, working closely with the NYC-CDRN team led by Jyotishman Pathak, Ph.D., at Weill Cornell Medicine. Hua Xu, Ph.D., at the University of Texas Health Science Center at Houston, will serve as a consultant to help with engagement and with the dissemination of NLP software to the Observational Health Data Sciences and Informatics (OHDSI) community and the AMIA community.
Wu’s award has been approved pending completion of a business and programmatic review by PCORI staff and issuance of a formal award contract.
PCORI is an independent, nonprofit organization authorized by Congress in 2010. Its mission is to fund research that will provide patients, their caregivers and clinicians with the evidence-based information needed to make better-informed healthcare decisions. For more information about PCORI’s funding, visit www.pcori.org.