Public health monitoring of diabetes in the era of electronic health records: Insights from the Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network
Publication: Annals of Epidemiology
Metabolic disease, and specifically diabetes, is an important area of research for Regenstrief Institute and Indiana University. Regenstrief receives data from multiple sources, including health systems contributing data to the Indiana Network for Patient Care (INPC) as well as the Indiana Department of Health. Indiana University has served as a Coordinating Center for the Assessing the Burden of Diabetes by Type in Children, Adolescents, and Young Adults (DiCAYA) Network, funded by the Centers for Disease Control and Prevention (CDC). Regenstrief also maintains a partnership with the Center for Diabetes and Metabolic Disease Data and Analytics Core at Indiana University.
While the American Diabetes Association uses clinical criteria for diagnosing diabetes such as A1c or glucose values, it is somewhat more complicated to determine which patients from electronic health records (EHR) data should be considered to have diabetes. One reason for this is that laboratory values are frequently non-fasting, so it can be difficult to use a fasting glucose level. Another challenge is that it is not always clear from documentation and retrospective data whether a patient had symptoms of hyperglycemia at the time that a laboratory test was drawn, so using the clinical criterion of a random glucose level of at least 200 mg/dL is also challenging. Thus, substantial efforts at Regenstrief and elsewhere have focused on developing what is often referred to as a computable phenotype of diabetes, or an approach to determining which patients in a database should be considered to have diabetes. Some of this work has been done in collaboration with the DiCAYA network. For researchers considering work with people diagnosed with diabetes, the Relevant Publications section below reviews recent and ongoing work on phenotypes of diabetes within EHR data.
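To make the idea of a computable phenotype concrete, the following is a minimal sketch of a rule-based classifier, not the DiCAYA or Regenstrief definition. It assumes a hypothetical `PatientRecord` structure and an illustrative rule that requires at least two independent types of evidence (an A1c at or above the ADA cut point of 6.5%, an ICD-10-CM diabetes code beginning with E10 or E11, or a medication from a small illustrative list). The field names, the medication list, and the two-evidence rule are all assumptions made for this example.

```python
from __future__ import annotations
from dataclasses import dataclass, field

A1C_THRESHOLD = 6.5  # percent; ADA diagnostic cut point for A1c

# Illustrative list only (an assumption); a real phenotype would use a
# curated medication value set and decide how to treat drugs that are
# also prescribed for conditions other than diabetes.
DIABETES_MEDS = {"insulin", "glipizide", "glimepiride"}


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary of EHR data."""
    patient_id: str
    a1c_values: list[float] = field(default_factory=list)
    diagnosis_codes: list[str] = field(default_factory=list)  # ICD-10-CM
    medications: list[str] = field(default_factory=list)


def evidence_types(record: PatientRecord) -> set[str]:
    """Return which kinds of diabetes evidence appear in the record."""
    evidence = set()
    if any(v >= A1C_THRESHOLD for v in record.a1c_values):
        evidence.add("lab")
    # E10 = type 1 diabetes, E11 = type 2 diabetes in ICD-10-CM
    if any(c.startswith(("E10", "E11")) for c in record.diagnosis_codes):
        evidence.add("diagnosis")
    if any(m.lower() in DIABETES_MEDS for m in record.medications):
        evidence.add("medication")
    return evidence


def meets_phenotype(record: PatientRecord, min_evidence_types: int = 2) -> bool:
    """Illustrative rule: require agreement from several evidence types."""
    return len(evidence_types(record)) >= min_evidence_types
```

Requiring more than one type of evidence is one common way to reduce false positives from a single stray code or laboratory value, at the cost of missing patients whose records are sparse. Published phenotypes should be consulted for validated definitions.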
To compare INPC data on diabetes with other sources, researchers may wish to explore the Diabetes Atlas, which is published by the International Diabetes Federation and contains aggregate global data on diabetes. Another resource is the Behavioral Risk Factor Surveillance System (BRFSS), which contains annual survey data on health-related behaviors in the United States dating back to the 1980s. These data are representative at both the state and national level, and state identifiers make it possible to select data for Indiana or other states. They also contain participant-reported information about prior diabetes diagnoses. In addition, some states include a diabetes-specific module with questions regarding mobility and access to care. The Agency for Healthcare Research and Quality (AHRQ) publishes data from the Healthcare Cost and Utilization Project (HCUP), with HCUPnet data publicly available. These data include hospitalizations and emergency department visits with diabetes diagnostic codes, and data can be analyzed at the state level. Other resources include the Medical Expenditure Panel Survey from AHRQ and mortality data from the CDC, which may list diabetes as a cause of death.
The INPC laboratory data relevant to diabetes tend to be fairly complete, making them a valuable source of research data. In addition to glucose and A1c values, researchers can obtain lipid panels, urine albumin to creatinine ratios, and labs related to diabetic emergencies including ketones and anion gaps. Other labs that may help identify related conditions include thyroid function and liver function tests. Beyond laboratory data, another strength of the INPC data is coverage of urban areas, including metropolitan Indianapolis. This is driven by the inclusion of data from Indiana University Health as well as Eskenazi. Indicators for rural/urban areas are quite reliable as well. This can be particularly valuable for children with diabetes, who are much more likely than adult patients with diabetes to have seen a subspecialist, typically in the metropolitan Indianapolis area.
While some data for conditions related to diabetes are easily obtained through laboratory values, obesity can be somewhat more complex. One issue is that height is frequently not obtained or recorded for every visit in adults, so for work that requires calculating body mass index measures, it may be necessary to request data from before the study period to obtain the most recently recorded height. Both height and weight can be measured directly or self-reported, so these values may be less objective than laboratory data. Note that for longitudinal data, some height and weight measures may use the metric system rather than pounds or feet/inches, so it is important to review the units of measure across time and adjust if needed.
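The unit review and height look-back described above can be sketched as follows. This is a minimal illustration under stated assumptions: the unit labels (`kg`, `lb`, `lbs`, `m`, `cm`, `in`), the tuple layout for height records, and the function names are hypothetical and would need to match the actual extract. Carrying a prior height forward is reasonable only for adults, whose height is stable.

```python
from datetime import date

LB_PER_KG = 2.20462
CM_PER_IN = 2.54


def to_kg(value, unit):
    """Convert a weight to kilograms; unit labels are assumed, not INPC's."""
    unit = unit.lower()
    if unit == "kg":
        return value
    if unit in ("lb", "lbs"):
        return value / LB_PER_KG
    raise ValueError(f"Unrecognized weight unit: {unit}")


def to_m(value, unit):
    """Convert a height to meters; unit labels are assumed, not INPC's."""
    unit = unit.lower()
    if unit == "m":
        return value
    if unit == "cm":
        return value / 100
    if unit == "in":
        return value * CM_PER_IN / 100
    raise ValueError(f"Unrecognized height unit: {unit}")


def most_recent_height_m(heights, as_of):
    """Latest height recorded on or before `as_of`, in meters, or None.

    `heights` is a list of (date, value, unit) tuples and may include
    measurements from before the study period.
    """
    prior = [h for h in heights if h[0] <= as_of]
    if not prior:
        return None
    _, value, unit = max(prior, key=lambda h: h[0])
    return to_m(value, unit)


def bmi(weight_value, weight_unit, weight_date, heights):
    """BMI in kg/m^2 using the most recent prior height, or None if absent."""
    height_m = most_recent_height_m(heights, weight_date)
    if height_m is None:
        return None
    return to_kg(weight_value, weight_unit) / height_m ** 2
```

For example, with heights recorded as 70 in on 2018-05-01 and 178 cm on 2021-03-01, a weight of 80 kg measured on 2020-01-01 would be paired with the 2018 height. Raising an error on unrecognized units, rather than guessing, makes unexpected unit labels visible during data review.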
As with all EHR data, there will be some inconsistencies in data related to diabetes. For example, some individuals will have diagnostic codes for both type 1 and type 2 diabetes, and others may have inconsistencies between laboratory values and diagnostic codes, such as an A1c suggesting diabetes but no diagnostic code for it, or vice versa. The data are also incomplete: not all people in Indiana or a given community are represented, and for those who are, some data elements may be missing. Understanding current approaches to phenotyping and working with a Regenstrief data analyst can help in handling imperfect data.
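As a rough illustration of how such inconsistencies might be surfaced before analysis, the sketch below flags the patterns mentioned above for a single patient. The function name and flag labels are hypothetical. It assumes ICD-10-CM codes (E10 for type 1, E11 for type 2) and the ADA A1c cut point of 6.5%. It is a screening aid, not a validated rule.

```python
def flag_inconsistencies(diagnosis_codes, a1c_values, a1c_threshold=6.5):
    """Return a list of data-quality flags for one patient's record.

    diagnosis_codes: iterable of ICD-10-CM code strings
    a1c_values: iterable of A1c results in percent
    """
    has_type1 = any(code.startswith("E10") for code in diagnosis_codes)
    has_type2 = any(code.startswith("E11") for code in diagnosis_codes)
    has_diagnosis = has_type1 or has_type2
    has_elevated_a1c = any(value >= a1c_threshold for value in a1c_values)

    flags = []
    if has_type1 and has_type2:
        flags.append("type1_and_type2_codes")
    if has_elevated_a1c and not has_diagnosis:
        flags.append("lab_without_diagnosis")
    if has_diagnosis and not has_elevated_a1c:
        # May simply reflect well-controlled diabetes or missing labs.
        flags.append("diagnosis_without_lab")
    return flags
```

A flag is a prompt for review rather than evidence of an error. For instance, a patient with a diagnostic code but no elevated A1c may be well controlled on treatment, or may have had laboratory tests performed outside the contributing health systems.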
Another important note is that INPC generally does not receive much procedure data. Specifically, CPT and ICD-9/10 procedure codes are not available through Regenstrief Data Services. This limits the study of procedures related to complications of diabetes, such as the amputation of an extremity or eye procedures. Alternative approaches to studying these topics could include using diagnosis codes that indicate the procedures, though these depend on the quality of coding and documentation by clinicians.
Medication and prescription data may include not only typical medications for diabetes (metformin, sulfonylureas, GLP-1 receptor agonists, insulin) but also medications for related conditions, such as statins. INPC data typically include medication lists and prescriptions but are unlikely to indicate whether prescriptions were filled.
Brian Dixon, PhD, MPA
Muchiri Wandai, PhD, MSc
Katie Allen
Thomas Duszynski, PhD, MPH
Publication: Annals of Epidemiology
Publication: JMIR Public Health Surveillance
Before beginning a research project in this area, you may wish to review the Regenstrief Data Guide, which describes available data in more detail. You can also join the INPC Users Group by checking ‘Regenstrief Data Services’. Alternatively, you can email askRDS@regenstrief.org.

If you have published or presented work using these data, RDS would like to hear about it, promote your work, and help make collaborative connections. Please share your scholarly products or findings by emailing askRDS@regenstrief.org.