August 21, 2023

Natural language processing to extract social risk factors influencing health

Chris Harle, PhD

New system from Regenstrief Institute and Indiana University overcomes challenges of generalizability and portability

Social risk factors such as financial instability and housing insecurity are increasingly recognized as influencing health. But unlike diagnosis codes, prescription information, lab or other test reports, social risk factors do not adhere to standardized, controlled terminology in a patient’s electronic medical record. This makes the information difficult to extract from the free-text clinical notes where it is typically recorded.

A new study has found that a natural language processing (NLP) system developed with leadership from Regenstrief Institute and Indiana University Richard M. Fairbanks School of Public Health informaticians showed excellent performance when ported to a new health system and tested on more than six million clinical notes of patients seen in Florida. Performance was evaluated for generalizability and portability, defined as the ease and accuracy of deploying the software in a new environment and of updating it to meet the needs of new data.
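This release does not say which accuracy measures the study reported. As a hedged illustration only, agreement between an NLP system and human reviewers is often summarized with precision, recall and F1, computed over notes that both the system and the reviewers labeled. The function name `score_against_gold`, the `Scores` type and the note-level true/false framing are assumptions made for this sketch, not details taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_against_gold(predicted: list[bool], gold: list[bool]) -> Scores:
    """Compare note-level system flags with human-reviewed (gold standard) labels.

    predicted[i] and gold[i] refer to the same clinical note;
    True means the note indicates the social risk factor.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(p and g for p, g in zip(predicted, gold))      # flagged and present per reviewers
    fp = sum(p and not g for p, g in zip(predicted, gold))  # flagged but absent per reviewers
    fn = sum(g and not p for p, g in zip(predicted, gold))  # present per reviewers but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


# Toy labels for illustration only; these are not study data.
print(score_against_gold([True, True, False, True, False], [True, False, False, True, True]))
```

Precision asks how many flagged notes the reviewers agreed with. Recall asks how many reviewer-confirmed notes the system found. F1 is their harmonic mean.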

“Social factors have a great impact on our health. It’s not just the medical care that we receive, but it’s also the places where we live, the places where we work and our access to food and transportation and other resources that have a major influence on our health,” said Chris Harle, PhD, MS, the Regenstrief and IU Fairbanks School faculty member who is senior author on the study. “It’s important for the clinicians and health systems providing medical care to know about people’s social risk factors so when prescribing medications, ordering tests or planning to perform a procedure, they can better treat the whole person — perhaps with lower cost drugs or alternative sources for tests — and can also link them to services that help address their needs for a safe place to live and healthy food to eat.”

In this study, the researchers’ rule-based NLP model searched the text that physicians or other clinicians had written in the clinical notes of patients’ electronic health records. It looked for key words or phrases likely to indicate difficulty with housing (for example, lack of a permanent address) or financial need (for example, inability to afford follow-up care) among patients of a healthcare system in a new and quite different geographic area. The team faced several challenges, such as the name of a homeless shelter appearing without any indication of the facility’s function, as well as regional variation and local nuances in language. Despite these challenges, the researchers verified that the NLP models, with relatively simple modifications, could deliver highly accurate performance compared with the gold standard of human review.
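To make the rule-based approach more concrete, here is a minimal Python sketch under stated assumptions. The regular-expression patterns, the factor names `housing` and `financial`, the helper functions and the shelter name "Example Street Mission" are all hypothetical. They are not the study’s actual rules, which this article does not publish. The sketch shows keyword and phrase matching over note text. It also shows how a new site might append local terms without rewriting the base rules, such as a shelter name that never states what the facility is.

```python
import re

# Hypothetical starter rules; the study's actual patterns are not given in this article.
BASE_RULES: dict[str, list[str]] = {
    "housing": [
        r"\bhomeless(ness)?\b",
        r"\b(no|lack of) (permanent|fixed|stable) address\b",
    ],
    "financial": [
        r"\b(can(no|')t|unable to) afford\b",
        r"\bfinancial (hardship|strain)\b",
    ],
}


def extend_rules(
    base: dict[str, list[str]], local: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return a copy of the base rules with site-specific patterns appended."""
    merged = {factor: list(patterns) for factor, patterns in base.items()}
    for factor, patterns in local.items():
        merged.setdefault(factor, []).extend(patterns)
    return merged


def compile_rules(rules: dict[str, list[str]]) -> dict[str, list[re.Pattern]]:
    """Pre-compile every pattern once, case-insensitively."""
    return {f: [re.compile(p, re.IGNORECASE) for p in ps] for f, ps in rules.items()}


def flag_note(text: str, compiled: dict[str, list[re.Pattern]]) -> set[str]:
    """Return the risk factors with at least one pattern matching the note text."""
    return {f for f, ps in compiled.items() if any(p.search(text) for p in ps)}


base = compile_rules(BASE_RULES)
note = "Pt staying at Example Street Mission since discharge; unable to afford follow-up care."
print(flag_note(note, base))  # financial only: the shelter name has no housing keyword

# Hypothetical local adaptation: a shelter name known to the new site's reviewers.
local = compile_rules(extend_rules(BASE_RULES, {"housing": [r"\bexample street mission\b"]}))
print(flag_note(note, local))
```

The sketch deliberately omits negation and context handling. A phrase like "denies homelessness" would still be flagged, for example. Production clinical NLP rule sets typically need to address that.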

“Is a patient diagnosed with diabetes? It’s relatively easy to find that information in an electronic health record because the same words and codes are more likely to be used in health systems in central Indiana as are used in Florida or elsewhere in the U.S. But social risk factors don’t have nearly as established and widely used words, phrases or codes to identify them. Therefore, it’s harder to search through and determine a patient has a financial need than it is to say a patient has diabetes,” said Dr. Harle. “Our work is important for patients because ultimately their health is related to a variety of factors in their life, including social factors. For example, are clinicians incorporating in their decision making a patient’s ability to recover from a surgery as it’s going to be different if they have stable housing versus unstable housing?

“The more that we can disseminate and adapt natural language processing and other artificial intelligence methods that fully describe a patient to give clinicians a full 360 understanding of patients’ needs, the better. If we can extract social information more efficiently, it’s less costly. Then we can start to take what we’d call a population health perspective. So, if a health system can efficiently identify the patients who have housing instability — the population of patients who have this need — then the healthcare system may be able to employ a more proactive population-based intervention to serve that whole group of people, connecting them, for example, to the housing services in the community or financial resources that might be available.”

Dr. Harle is an information scientist and health services researcher who focuses on the design, adoption, use and value of health information systems. He notes that this study was a team effort by professionals at multiple institutions. They included people who work in the clinical arena (including individuals who study how patients access and use care), public health, population health and healthcare administration, as well as technically knowledgeable and skilled systems specialists. “Bringing people together who have that diversity of understanding leads to pragmatically useful studies like this one,” he said.

“Generalizability and portability of natural language processing system to extract individual social risk factors” is published in the International Journal of Medical Informatics.

The study was supported by the Agency for Healthcare Research and Quality (R01HS028636) and the Indiana University Addictions Grand Challenge.

Authors and affiliations:
Tanja Magoc (a), Katie S. Allen (b, d), Cara McDonnell (a), Jean-Paul Russo (a, c), Jonathan Cummins (b), Joshua R. Vest (b, d), Christopher A. Harle (b, d)

(a) College of Medicine, University of Florida, Gainesville, FL, USA
(b) Regenstrief Institute, Inc., Indianapolis, IN, USA
(c) Miller School of Medicine, University of Miami, Miami, FL, USA
(d) Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, IN, USA

About Christopher A. Harle, PhD, M.S.
In addition to his role as a research scientist with the Clem McDonald Center for Biomedical Informatics at Regenstrief Institute, Christopher A. Harle, PhD, M.S., is a professor and chair of the Health Policy and Management Department at Indiana University Richard M. Fairbanks School of Public Health and associate faculty at IU Kelley School of Business.

About Regenstrief Institute
Founded in 1969 in Indianapolis, the Regenstrief Institute is a local, national and global leader dedicated to a world where better information empowers people to end disease and realize true health. A key research partner to Indiana University, Regenstrief and its research scientists are responsible for a growing number of major healthcare innovations and studies. Examples range from the development of global health information technology standards that enable the use and interoperability of electronic health records to improving patient-physician communications, to creating models of care that inform clinical practice and improve the lives of patients around the globe.

Sam Regenstrief, a nationally successful entrepreneur from Connersville, Indiana, founded the institute with the goal of making healthcare more efficient and accessible for everyone. His vision continues to guide the institute’s research mission.

About the IU Richard M. Fairbanks School of Public Health
Located on the IUPUI and Fort Wayne campuses, the IU Richard M. Fairbanks School of Public Health is committed to advancing the public’s health and well-being through education, innovation, and leadership. The Fairbanks School of Public Health is known for its expertise in biostatistics, epidemiology, cancer research, community health, environmental public health, global health, health policy, and health services administration.
