Hogan Study in Journal of Biomedical Semantics Seeks to Improve Quality of Data in Electronic Health Records

As clinicians and researchers increasingly seek to leverage electronic health records (EHR) to improve patient care and facilitate research, they’re discovering that mining the data for relevant information isn’t as simple as typing in keywords and clicking a computer mouse. Although computer search engines are powerful and sophisticated, they cannot distinguish good information from bad. For instance, computers currently lack the ability to distinguish between a diabetes “diagnosis” based on hearsay or a lucky guess and a real diagnosis made by a doctor based on a physical exam and medical tests.

William Hogan, M.D., a professor in the Department of Health Outcomes and Policy (HOP) and director of biomedical informatics for UF’s Clinical and Translational Science Institute (CTSI), says potential errors in the data pose a major concern for clinicians and researchers using EHR data because of the risk of making decisions and conclusions based on erroneous data.

“As the EHR is increasingly relied on as a tool for researchers, one important consideration is that researchers use formal, validated approaches to assessing the quality of the data used,” he said.

Hogan recently co-authored a report titled “Diagnosis, Misdiagnosis, Lucky Guess, Hearsay, and More: An Ontological Analysis,” together with Werner Ceusters, M.D., of the Department of Biomedical Informatics at the University at Buffalo’s Jacobs School of Medicine and Biomedical Sciences in New York. The study, published in the Journal of Biomedical Semantics, aims to address some of the errors and ambiguities in EHR diagnosis data and increase the quality of the data extracted by (1) analyzing how computers make inferences about data and (2) teaching computers how to become more discriminating in their data mining.

Unfortunately, errors in electronic health records are a common occurrence. For instance, although a recent report for the Office of the National Coordinator for Health Information Technology recommends including patient-generated health data in the electronic health record, the report does not mention error as a concern for including this kind of data. Yet Hogan and Ceusters point out that known errors exist with patient self-reporting, especially in research.

The co-authors also raised concerns about the provenance, or origins, of data in electronic health records, such as who created the data, in what setting, how, when, and for what purpose.

“Knowing the provenance of symptom data is essential to determine, for instance, whether a colonoscopy is for screening or diagnosis,” Hogan said. He added that data used for insurance billing purposes are less accurate than data gathered in clinical practice, which is why knowing the source of the data is crucial.

“Researchers who do not consider data provenance risk compiling data that are systematically incomplete or incorrect,” he said.

To guide research on identifying and rectifying errors and ambiguities in the EHR, Hogan and Ceusters studied several scenarios in which a claim is made that a particular patient has a particular disease. They then determined for each scenario whether the claim was a diagnosis, misdiagnosis, hearsay, or lucky guess. The results identified several ways in which a misdiagnosis gets the facts wrong, pointing out opportunities for further research into diagnostic error.