N3C Data Enclave: the Data Enclave is a secure platform where clinical data for participating sites is stored. The Data Enclave’s technology partner is Palantir.
Public N3C PPRL Dashboard
About Privacy Preserving Record Linkage
Thank you for joining the Privacy Preserving Record Linkage (PPRL) hashing community. PPRL is a secure, HIPAA-deidentified method of connecting records across different data sources that refer to the same individual, using pseudonymization processes to maintain privacy. Linkage, in this context, involves matching records from two or more datasets using de-identified cryptographic hashes (tokens) to associate data with the same individuals anonymously, without revealing their true identifiers.
Regenstrief Institute is the partnered Linkage Honest Broker (LHB). All people deserve the best quality care. That is why Regenstrief Institute conducts research and development at the intersection of clinical medicine, technology, academia and industry.
Regenstrief is contracted by the National Center for Advancing Translational Sciences (NCATS) for the N3C COVID and N3C Clinical Linkage projects. Regenstrief is the Linkage Honest Broker for both projects; this means it is a neutral entity located outside of the enclave that serves as an escrow for the de-identified tokens (“hashes”) and operates the technology platform that facilitates PPRL using these tokens. The LHB does NOT receive, store, or process Protected Health Information/Personally Identifiable Information (PHI/PII); this is held ONLY by the data contributing sites. The LHB does hold certain metadata, such as the originating contributor/data source and the nature of the data associated with the received tokens, e.g., EHR data, chest x-ray, viral variant data.
Datavant is a partner of Regenstrief Institute that provides the software used to generate the de-identified tokens (hashes).
The Data Enclave is a secure platform in which the harmonized clinical data provided by our contributing members is stored. The data itself can only be accessed through a secure cloud portal hosted by NCATS and cannot be downloaded or removed.
In addition to sending data to the N3C Data Enclave, sites participating in the hashing community will prepare an additional set of files that will be submitted directly to the LHB service at Regenstrief Institute. These additional files include hashed identifiers (tokens), which correspond to a unique patient ID, as well as a Manifest file that includes metadata describing site-specific information.
There are three main reasons why privacy preserving record linkage is key to this effort:
- De-identified Deduplication: PPRL enables the deduplication of patients across institutions, accounting for care fragmentation without compromising privacy.
- Multi-Modal Data Linkage: PPRL facilitates de-identified linking to multi-modal data, such as image data from the PACS of various health systems.
- Cohort Discovery: PPRL allows for the discovery of de-identified cohort overlaps with other research studies, such as understanding the extent of overlap between the NIH All of Us cohort and the N3C COVID cohort.
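The deduplication and cohort-overlap uses above both reduce to comparing tokens rather than identifiers. The following is a minimal sketch of that idea, assuming each source supplies exactly one token per local record ID; the function name and the sample tokens are hypothetical, and the LHB's actual matching rules are not described here.

```python
def link_records(source_a: dict[str, str], source_b: dict[str, str]) -> list[tuple[str, str]]:
    """Pair local record IDs from two sources whose tokens are identical.

    Each argument maps a local, de-identified record ID to its token.
    No PHI/PII is involved: only tokens are compared.
    """
    # Index the second source by token so each lookup is a dictionary hit.
    id_by_token = {token: record_id for record_id, token in source_b.items()}
    return sorted(
        (record_id, id_by_token[token])
        for record_id, token in source_a.items()
        if token in id_by_token
    )


# Hypothetical tokens: one individual appears in both sources.
cohort_a = {"a1": "TOK1", "a2": "TOK2"}
cohort_b = {"b7": "TOK3", "b9": "TOK2"}
overlap = link_records(cohort_a, cohort_b)
print(len(overlap), overlap)
```

The length of the returned list is the size of the cohort overlap; the pairs themselves are what a deduplication step would act on.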
Participating Entities
This website describes 2 main NCATS PPRL projects: National COVID Cohort Collaborative (‘N3C COVID’) and National Clinical Cohort Collaborative (‘N3C Clinical’). Participating sites must sign the data governance documents specific to each project.
N3C COVID
The N3C COVID Enclave is a partnership among many organizations to provide clinical data to improve our knowledge of COVID-19 and potential treatment strategies.
The enclave’s data represent millions of COVID-19-positive individuals from each state and nearly every county. The enclave’s size, scope and diversity help to ensure public health answers benefit all Americans and their communities.
The N3C COVID Enclave receives patient information from more than 60 health care institutions across the country. Data from these institutions is harmonized into a single format and made available to researchers and clinicians inside the N3C COVID Enclave so that they can study COVID-19 and potential treatments as the pandemic evolves. The N3C COVID Enclave is a secure, cloud-based research environment with a powerful analytics platform. Data cannot be removed from the enclave.
Since the N3C COVID Enclave opened to researchers in September 2020, scientists have used the data to improve our understanding of COVID-19 and health equity, diabetes, cancer, COVID-19 medications and chronic obstructive pulmonary disease. Researchers currently are studying HIV and COVID-19 risk, mortality rates in rural populations, long COVID and much more.
N3C Clinical Overview
The National Clinical Cohort Collaborative (N3C Clinical) pilot projects leverage operational and governance aspects already established for the original N3C platform. A key difference from the N3C COVID Enclave effort is that each pilot project has its own data enclave where the proposed research is conducted. The pilot projects do not use any data or patient linkages from the N3C COVID Enclave, and they require separate data transfer and use agreements.
In line with the governance controls described above, sites that were previously participating in N3C COVID must create a separate payload and separate set of tokens with a distinct encryption to submit for N3C Clinical.
High Level Overview of Data Flow
LHB Onboarding Process
The onboarding process is the same for all projects, but each project has its own submission requirements. Sites that previously participated in N3C COVID must create a separate payload and a separate set of tokens with a distinct token encryption in order to submit for N3C Clinical. NOTE: The Linkage Honest Broker Agreement (LHBA) must be signed prior to onboarding with the LHB.
Onboarding with the Linkage Honest Broker (LHB) can be completed in 3 easy steps:
1) Site Registration – Provision Firewall
2) Individual Registration – Create SFTP Account
3) Setup and Connect to LHB SFTP
- Complete the Site Registration Form. The form asks for the following, including the site personnel who require access to the LHB SFTP:
– Formal site name (full name of your institution)
– Site Abbreviation
– Principal Investigator’s Name
– Public Static IP or CIDR block
– List of names (first and last) and email addresses for users who should have access to the LHB SFTP
– Primary Technical Contact Name and email address for your site
- After the firewall has been set up and your IP address whitelisted, an e-mail noting completion will be sent.
- Concurrently, once the Site Registration Form is submitted, the users who require an LHB SFTP account will receive an e-mail from RILHB@regenstrief.org with a link to the Individual User Access Form.
- To complete the Individual User Access Form, you will need your public SSH key. Instructions for generating one are in the e-mail with the form link. You may also download the instructions in the SSH section below.
- Once your account has been set up, an e-mail will be sent with your username and instructions on connecting to the LHB SFTP.
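Because the firewall is provisioned from the "Public Static IP or CIDR block" entered on the Site Registration Form, it can help to sanity-check that value before submitting. The sketch below uses Python's standard `ipaddress` module; the function name is hypothetical, and the check is only an assumption about what makes a value usable (syntactically valid and not in a private or reserved range), not an LHB requirement.

```python
import ipaddress


def is_public_ip_or_cidr(value: str) -> bool:
    """Return True if value parses as an IP address or CIDR block in public address space."""
    try:
        # strict=False accepts a host address with a prefix, e.g. "8.8.8.8/24".
        network = ipaddress.ip_network(value.strip(), strict=False)
    except ValueError:
        return False
    # is_global is False for private, loopback, link-local and similar ranges.
    return network.is_global


for candidate in ("8.8.8.0/24", "192.168.1.0/24", "not-an-ip"):
    print(candidate, is_public_ip_or_cidr(candidate))
```

A private address such as `192.168.1.0/24` is rejected because the LHB firewall would never see traffic arriving from it.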
Tokenization
Tokenization is a process that replaces PII with encrypted hashes called tokens. Datavant tokens are unique, irreversible, and site-specific, so that N3C project tokens for a given patient can only ever be used to link within the N3C project. These tokens are created in two steps:
1) Irreversible hashing: An irreversible cryptographic hash function is applied, ensuring that the patient’s PII used as input cannot be recovered from the output value.
2) Site-specific encryption: The hash value from step 1 is encrypted with another layer of site- and project-specific encryption, protecting every Datavant partner from potential security breaches and making tokens linkable only to other tokens permitted within their project.
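The two steps can be illustrated with a toy example. This is NOT Datavant's algorithm, which is not described on this page: the input fields, the normalization, the choice of SHA-256, and all function names are assumptions. The second step uses a keyed HMAC as a stand-in for site- and project-specific encryption. HMAC is a one-way keyed function rather than true encryption, but it shows the property that matters here: the same patient yields different tokens under different site/project keys.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Illustrative normalization so trivial formatting differences do not change the hash."""
    return value.strip().upper()


def irreversible_hash(first_name: str, last_name: str, birth_date: str) -> bytes:
    """Step 1 (sketch): hash the PII so the inputs cannot be read back from the output."""
    material = "|".join(normalize(part) for part in (first_name, last_name, birth_date))
    return hashlib.sha256(material.encode("utf-8")).digest()


def site_project_token(hashed: bytes, site_project_key: bytes) -> str:
    """Step 2 (stand-in): bind the hash to one site and project with a secret key."""
    return hmac.new(site_project_key, hashed, hashlib.sha256).hexdigest()


hashed = irreversible_hash(" Jane", "Doe", "1980-01-31")
covid_token = site_project_token(hashed, b"hypothetical-covid-key")
clinical_token = site_project_token(hashed, b"hypothetical-clinical-key")
print(covid_token != clinical_token)
```

In this sketch, as in the description above, a token produced under one project's key cannot be compared directly with a token produced under another's.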
You will be using the Datavant software to create N3C tokens in your own environment from patient PII, encrypt them for transit, and then submit them to the Linkage Honest Broker. You will receive information and resources on Datavant tokenization upon onboarding, as well as a technical support contact for the duration of your participation.
NOTE: You will be assigned a different site name, configuration, and transit site name for N3C COVID vs. N3C Clinical! If you have previously tokenized for N3C COVID, do not start tokenizing for N3C Clinical until you receive documentation for this specific project.
For more general Datavant software or portal questions, you may contact support@datavant.com.
Manifest
- The Manifest file contains metadata about the submission and should be created for each submission to the LHB
- You can script the creation of the files for one project; most fields will be the same across submissions for one project
- However, key fields will differ between N3C COVID and N3C Clinical submissions. Refer to your Onboarding documentation for the configuration, site name and transit site name you should use!
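Scripting the manifest, as suggested above, might look like the following sketch. The field names and the one-header-row, one-value-row layout are placeholders only: the real manifest fields and format come from your Onboarding documentation, and `write_manifest` is a hypothetical helper.

```python
import csv


def write_manifest(path: str, project_fields: dict[str, str], submission_fields: dict[str, str]) -> None:
    """Write a one-record CSV: fields shared across a project, plus per-submission values.

    ASSUMPTION: header row followed by a single value row. Check the actual
    manifest layout in your Onboarding documentation before relying on this.
    """
    fields = {**project_fields, **submission_fields}
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(fields.keys())
        writer.writerow(fields.values())


# Placeholder field names and values, for illustration only.
write_manifest(
    "EXAMPLE_MANIFEST.csv",
    project_fields={"SITE_NAME": "EXAMPLE_SITE"},
    submission_fields={"SUBMISSION_DATE": "20220923"},
)
```

Keeping the per-project values in one dictionary for N3C COVID and another for N3C Clinical makes it harder to mix up the two configurations.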
LHB Data Package
The data package submitted to the Linkage Honest Broker (LHB) consists of 2 files zipped together:
- File 1 – Transit tokens
- File 2 – Manifest
The file naming convention is dependent on which project your site is participating in.
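Zipping the two files can be scripted with Python's standard `zipfile` module. This is a minimal sketch: `build_package` is a hypothetical helper, and placing both files at the top level of the archive is an assumption to verify against your Site Engagement Packet.

```python
import zipfile
from pathlib import Path


def build_package(tokens_path: str, manifest_path: str, zip_path: str) -> Path:
    """Zip the transit tokens file and the manifest file into one data package."""
    package = Path(zip_path)
    with zipfile.ZipFile(package, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for member in (Path(tokens_path), Path(manifest_path)):
            # arcname drops local directories so only the file name is stored.
            archive.write(member, arcname=member.name)
    return package
```

For example, `build_package("out/ABC_N3C_20220923_TOKENS.csv", "out/ABC_N3C_20220923_MANIFEST.csv", "out/ABC_N3C_20220923.zip")` would store the two CSV files under their bare file names.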
Data Package:
The 2 files (Tokens and Manifest) are zipped together and submitted to the Linkage Honest Broker. The naming convention is important.
The naming convention for N3C COVID differs from the one for N3C CLINICAL. Please review the conventions below to confirm the one that applies to your submission. The letters must be in all capital letters when submitted; otherwise, the package will fail the ingestion process.
N3C COVID
Zip File (contains tokens and manifest)
SiteAbbreviation_N3C_Date.zip
– Example: ABC_N3C_20220923.zip
Token file: SiteAbbreviation_N3C_Date_TOKENS.csv
– Example: ABC_N3C_20220923_TOKENS.csv
Manifest file: SiteAbbreviation_N3C_Date_MANIFEST.csv
– Example: ABC_N3C_20220923_MANIFEST.csv
N3C CLINICAL
Zip File (contains tokens and manifest)
SiteAbbreviationTENANT_N3CCLINICAL_Date.zip
– Example: ABCTENANT_N3CCLINICAL_20220923.zip
Token file: SiteAbbreviationTENANT_N3CCLINICAL_Date_TOKENS.csv
– Example: ABCTENANT_N3CCLINICAL_20220923_TOKENS.csv
Manifest file: SiteAbbreviationTENANT_N3CCLINICAL_Date_MANIFEST.csv
– Example: ABCTENANT_N3CCLINICAL_20220923_MANIFEST.csv
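Since a misnamed package fails ingestion, the names can be generated rather than typed. The sketch below follows the two patterns listed above; the function name and the `"COVID"`/`"CLINICAL"` project labels are hypothetical, and the `YYYYMMDD` date format is inferred from the `20220923` examples.

```python
from datetime import date


def package_names(site_abbreviation: str, project: str, submitted: date) -> dict[str, str]:
    """Build the zip, token, and manifest file names for one submission."""
    abbreviation = site_abbreviation.upper()
    stamp = submitted.strftime("%Y%m%d")  # assumed format, inferred from the examples
    if project == "COVID":
        stem = f"{abbreviation}_N3C_{stamp}"
    elif project == "CLINICAL":
        stem = f"{abbreviation}TENANT_N3CCLINICAL_{stamp}"
    else:
        raise ValueError(f"unknown project: {project!r}")
    return {
        "zip": f"{stem}.zip",
        "tokens": f"{stem}_TOKENS.csv",
        "manifest": f"{stem}_MANIFEST.csv",
    }


print(package_names("abc", "CLINICAL", date(2022, 9, 23))["zip"])
# prints ABCTENANT_N3CCLINICAL_20220923.zip
```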
Transfer Data Package .zip File to the LHB SFTP
- Transfer the .zip file to the remote site from your local site
- Ensure all files are named according to the file naming conventions.
- Refer to the Resources and Documents section or Site Engagement Packet, Part 2 for more detailed instructions
Resources and Documents
Private and Public SSH Key Generation
SFTP Setup Instructions
FAQs & Troubleshooting
Do I have to use FileZilla as SFTP client?
No. The Linkage Honest Broker supports any SFTP setup.
What port do I use when connecting to LHB SFTP?
Port 2222
What is the host?
lhbsftp.regenstrief.org
Do I need a password?
No. You will use the private SSH key created when submitting for your SFTP account creation.
What is the logon type?
Key File
What is my username?
E-mail address used when submitting the Individual User Access Form (usually your institution email address)
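For sites that use the OpenSSH command-line client instead of a graphical one, the connection settings above (host, port, key file, e-mail username) can be assembled as in this sketch. The function name, key path, and example username are hypothetical. The username is passed with `-o User=...` so that the `@` inside the e-mail address cannot be confused with the separator in `user@host`.

```python
LHB_HOST = "lhbsftp.regenstrief.org"
LHB_PORT = 2222


def sftp_command(username: str, private_key_path: str) -> list[str]:
    """Build an argument list for the OpenSSH `sftp` client using key-file login."""
    return [
        "sftp",
        "-P", str(LHB_PORT),         # port (capital -P for sftp)
        "-i", private_key_path,      # private SSH key; no password is used
        "-o", f"User={username}",    # username is the registered e-mail address
        LHB_HOST,
    ]


command = sftp_command("jdoe@example.edu", "/path/to/private_key")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command)` to open an interactive session, in which `put` uploads the data package.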
I am having trouble connecting to the LHB SFTP. What do I do?
- Confirm IP address / range in firewall
– https://whatismyipaddress.com/ (to get IPv4 and IPv6)
- Confirm using OpenSSH format
- Verify private key format is .pem vs. .ppk
– .pem is typically used by Linux/Mac
– .ppk is used by FileZilla client – Windows
- Confirm username
– Username is the e-mail address registered with the Linkage Honest Broker
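When it is unclear which key format a file is in, its first line usually settles the question. The sketch below relies on the standard headers of each format (PuTTY `.ppk` files begin with `PuTTY-User-Key-File-`, PEM and OpenSSH private keys with a `-----BEGIN ... PRIVATE KEY-----` line, OpenSSH public keys with the key type); the function name and the returned labels are hypothetical.

```python
def detect_key_format(first_line: str) -> str:
    """Classify an SSH key file by its first line."""
    line = first_line.strip()
    if line.startswith("PuTTY-User-Key-File-"):
        return "ppk"
    if line.startswith("-----BEGIN") and line.endswith("PRIVATE KEY-----"):
        return "pem"
    if line.startswith(("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-")):
        return "openssh-public"
    return "unknown"


print(detect_key_format("-----BEGIN OPENSSH PRIVATE KEY-----"))
# prints pem
```

To check a real file, pass only its first line (for example, `open(path).readline()`), and never paste private key contents into e-mail or service desk tickets.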
My file contains all error tokens.
- Confirm the column headers are in the correct order.
- Verify that you are submitting the correct file to the LHB.
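A column-order check like the one above is easy to automate before tokenizing. In this sketch the expected header list is supplied by the caller, because the required columns come from your Datavant onboarding materials; the column names in the usage example and the function name are placeholders.

```python
import csv
import io


def header_problems(csv_text: str, expected: list[str]) -> list[str]:
    """Compare the first CSV row against the expected headers and describe any mismatch."""
    actual = next(csv.reader(io.StringIO(csv_text)), [])
    if actual == expected:
        return []
    problems = []
    missing = [name for name in expected if name not in actual]
    extra = [name for name in actual if name not in expected]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    if not missing and not extra:
        problems.append("columns are present but not in the expected order")
    return problems


# Placeholder column names, for illustration only.
expected_headers = ["record_id", "first_name", "last_name"]
print(header_problems("first_name,record_id,last_name\n", expected_headers))
```

An empty list means the header row matches exactly; anything else points at what to fix before re-running tokenization.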
If unable to resolve:
Please contact the Linkage Honest Broker by submitting a service desk ticket.