Regenstrief Institute x Merck
The Question
Clinical trials establish whether a treatment works under controlled conditions. They rarely answer the difficult questions that arise once a therapy reaches routine care: Who receives it, and who does not? What do treatment patterns look like in the real world? Where do patients fall through the gaps between diagnosis, prescription, and sustained use?
These questions sit at the intersection of clinical effectiveness, health system performance, and commercial strategy. Answering them requires data that spans years of care across diverse populations—and methods capable of finding a signal in that complexity.
The Analysis
Regenstrief’s collaboration with Merck has generated numerous peer-reviewed publications across multiple therapeutic areas, drawing on the Indiana Network for Patient Care (INPC)—one of the nation’s largest and longest-running longitudinal health data ecosystems, with records spanning 20 million unique patients, 950 million clinical encounters, and more than 150 million unstructured clinical text reports.
This body of work spans several interconnected questions:
Detection and Early Identification
Early identification of Alzheimer’s disease and related dementias (ADRD) at population scale is a persistent challenge. Regenstrief developed and validated a passive digital signature for early ADRD risk, applying natural language processing (NLP) to clinical notes alongside diagnostic and medication data within EHRs to identify risk years before formal diagnosis. Models incorporating the text of clinical notes achieved an AUROC of 0.798 (1–10 years before diagnosis), compared to 0.689 using structured data alone, a substantial improvement that illustrates the value of Regenstrief’s NLP infrastructure and longitudinal data depth.
A parallel effort applied NLP to characterize chronic cough burden in EHRs. Chronic cough affects roughly 10% of adults but is systematically under-captured by diagnostic codes. The NLP-based approach identified patients nearly sevenfold more frequently than ICD codes alone (PPV: 97%), generating a more complete and actionable picture of disease prevalence and unmet need.
Link 1: https://pubmed.ncbi.nlm.nih.gov/31784987/
Link 2: https://pubmed.ncbi.nlm.nih.gov/33345951
Treatment Patterns and Real-World Effectiveness
Understanding how patients actually receive treatment—not how clinical protocols specify they should—is central to real-world evidence generation. Regenstrief’s work in advanced small cell lung cancer (SCLC) used linked registry and EHR data to characterize treatment sequences, survival, and healthcare utilization across 498 patients over a decade. The findings documented real-world chemotherapy patterns consistent with national guidelines while confirming persistently poor outcomes, providing an empirical foundation for assessing the need for novel therapeutic approaches.
A complementary study on direct-acting antiviral (DAA) therapy for hepatitis C evaluated real-world treatment initiation and effectiveness during the first two years of all-oral DAA availability. Despite strong clinical effectiveness, fewer than 10% of eligible patients initiated treatment within the first year—with significant disparities by disease severity, payer status, and substance use history. These findings illustrate how real-world data can identify structural and behavioral barriers that trial data alone cannot surface.
Link 1: https://pubmed.ncbi.nlm.nih.gov/31828610/
Link 2: https://pubmed.ncbi.nlm.nih.gov/31437170/
Adherence and Long-Term Risk
Treatment adherence studies address a question that matters both clinically and commercially: among patients who do take a therapy as prescribed, how many remain at meaningful risk for adverse outcomes? Regenstrief’s retrospective cohort analysis of bisphosphonate adherence in women with osteoporosis found that even among those with sustained adherence (medication possession ratio ≥0.8), 35% met a composite adverse outcome—including incident fracture, persistent low bone density, or meaningful BMD decline. This evidence supports the case for clinical reassessment and alternative therapeutic strategies in high-adherence, high-risk populations.
Link: https://pubmed.ncbi.nlm.nih.gov/26657827/
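For readers unfamiliar with the adherence threshold cited above, the medication possession ratio (MPR) is conventionally the total days of medication supplied divided by the number of days in the observation period. The sketch below is illustrative only and is not the method used in the published study: the function names, the fill records, the observation window, and the choice to cap the ratio at 1.0 are all assumptions made for this example.

```python
from datetime import date

ADHERENCE_THRESHOLD = 0.8  # threshold cited in the text (MPR >= 0.8)


def medication_possession_ratio(fills, period_start, period_end):
    """Return days supplied / days in period, capped at 1.0.

    fills: list of (fill_date, days_supply) tuples (hypothetical format).
    Only fills dated inside the observation period are counted.
    The cap at 1.0 is a common convention, assumed here.
    """
    period_days = (period_end - period_start).days + 1  # inclusive of both ends
    supplied = sum(
        days for fill_date, days in fills
        if period_start <= fill_date <= period_end
    )
    return min(supplied / period_days, 1.0)


def is_adherent(mpr, threshold=ADHERENCE_THRESHOLD):
    """Classify a patient as adherent when MPR meets the threshold."""
    return mpr >= threshold


# Hypothetical patient: ten 30-day fills during calendar year 2021 (365 days).
example_fills = [(date(2021, month, 1), 30) for month in range(1, 11)]
example_mpr = medication_possession_ratio(
    example_fills, date(2021, 1, 1), date(2021, 12, 31)
)
print(f"MPR = {example_mpr:.3f}, adherent = {is_adherent(example_mpr)}")
```

In this made-up case the patient has 300 days of supply over a 365-day window, which clears the 0.8 threshold; dropping a single fill would put the same patient below it.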
Prevention and Health System Behavior
A cluster randomized trial of a clinic-based HPV vaccination intervention demonstrated that a brief, parent-targeted educational video delivered in the exam room increased vaccination uptake from 50% to 65% in intervention clinics, with parents who watched the video showing three times greater odds of initiating vaccination. The study’s pragmatic, health-system-embedded trial design reflects Regenstrief’s capacity to conduct intervention research within routine care environments, not just observe them.
Link: https://pubmed.ncbi.nlm.nih.gov/30530637/
The Answer
These examples illustrate what a sustained research collaboration with Regenstrief can produce: not a single study, but a cumulative body of peer-reviewed evidence on detection, treatment patterns, adherence, effectiveness, and intervention design across multiple therapeutic areas and patient populations.
The underlying infrastructure (INPC data depth, NLP capabilities, longitudinal follow-up, and embedded health system access) does not need to be built for each project. It is already in place.
Relevant domains: RWE study design, treatment pattern characterization, adherence and persistence analysis, early disease detection, population identification, intervention evaluation.



