N3C Data Enclave: the Data Enclave is a secure platform where clinical data for participating sites is stored. The Data Enclave’s technology partner is Palantir.
About
Thank you for joining the Privacy Preserving Record Linkage (PPRL) hashing community for COVID data linkage. PPRL is a means of connecting records that refer to the same individual across different data sources, using secure pseudonymization processes that maintain the individual's privacy. Linkage is defined here as any operation involving two or more datasets that uses de-identified cryptographic hashes (tokens) to match records associated with the same individuals anonymously, without ever using the individuals' true identifiers.
There are three main reasons why privacy preserving record linkage is key to this effort:
- PPRL enables de-identified deduplication of patients across institutions to account for care fragmentation.
- PPRL enables de-identified linking to multi-modal data, such as image data from various health systems' PACS.
- PPRL enables de-identified cohort overlap discovery from other research studies. For example, we can understand the extent of overlap between the NIH All of Us cohort and the N3C cohort.
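To make the idea of token-based linkage concrete, here is a minimal sketch in Python. It is not Datavant's algorithm and not the LHB's implementation: the keyed hash (HMAC-SHA256), the normalization rule, the single shared secret, and the names `make_token` and `cohort_overlap` are all assumptions chosen for illustration. It only shows how two sources could compare tokens, rather than identifiers, to find records that refer to the same individual.

```python
import hashlib
import hmac


def make_token(first_name: str, last_name: str, birth_date: str, secret: bytes) -> str:
    """Return a keyed hash of normalized identifiers (illustrative only)."""
    # Normalize so that trivial formatting differences do not break a match.
    normalized = "|".join(
        part.strip().upper() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def cohort_overlap(tokens_a, tokens_b) -> set:
    """Tokens present in both datasets, i.e. records presumed to be the same person."""
    return set(tokens_a) & set(tokens_b)


# Hypothetical secret and made-up records held by two different sources.
SECRET = b"hypothetical-shared-secret"
site_a = [("Jane", "Doe", "1980-01-31"), ("John", "Roe", "1975-07-04")]
site_b = [("jane ", "DOE", "1980-01-31"), ("Mary", "Major", "1990-11-15")]

tokens_a = [make_token(*record, SECRET) for record in site_a]
tokens_b = [make_token(*record, SECRET) for record in site_b]

shared = cohort_overlap(tokens_a, tokens_b)
print(len(shared))  # 1: only the Jane Doe record appears in both sources
```

Only the tokens would need to leave each source in a scheme like this; the names and dates of birth stay where they are.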
Regenstrief Institute is the partnered Linkage Honest Broker (LHB). Regenstrief Institute is a dynamic, people-centered research organization driven by a mission to connect and innovate for better health. All people deserve the best quality care. That is why Regenstrief Institute conducts research and development at the intersection of clinical medicine, technology, academia, and industry. Regenstrief is contracted by NCATS and is a neutral entity, located outside of the N3C enclave, that serves as an escrow for the de-identified tokens (“hashes”) and operates the technology platform that facilitates PPRL using these tokens. The LHB does NOT receive, store, or process PHI/PII; this information is held ONLY by the data-contributing sites. The LHB will hold certain metadata, such as the originating contributor/data source and the nature of the data associated with the received tokens (e.g., EHR data, chest x-ray, viral variant data). Datavant, a partner of Regenstrief Institute, provides the software that generates the de-identified tokens (hashes).
The N3C Data Enclave is a secure platform in which the harmonized clinical data provided by our contributing members are stored. The data can only be accessed through a secure cloud portal hosted by NCATS and cannot be downloaded or removed.
In addition to sending data to the N3C Data Enclave, sites participating in the hashing community will prepare an additional set of files that will be submitted directly to the LHB service at Regenstrief Institute. These additional files include hashed identifiers (tokens), which correspond to a unique patient ID, as well as a Manifest file that includes metadata describing site-specific information.
Data Governance Resources
We recognize that many sites will have questions about data governance for tokenized data, such as:
- planned and potential use cases
- what use cases sites can opt into or out of
- operational firewalls between entities (e.g., where tokens will and will not be stored)
For more information on data governance, please refer to the N3C data governance documents.
The Linkage Honest Broker requires the Linkage Honest Broker Agreement to be signed prior to onboarding and sending tokenized data to the LHB. If you have any questions regarding this process, please email rilhb@regenstrief.org.
LHB Onboarding Process
After the Linkage Honest Broker Agreement (LHBA) is signed, a member of the LHB Team will reach out to begin the onboarding process. There are three main steps in onboarding: 1) provision the firewall, 2) create LHB SFTP account access, and 3) set up and connect to the LHB SFTP.
- Complete the Site Registration Form. The form asks for the following information, including the site personnel who require access to the LHB SFTP:
- Formal site name (full name of your institution)
- Formal Site Abbreviation
- Principal investigator’s name
- Public Static IP or CIDR block
- List of names (first and last) and email addresses for users who should have access to the LHB SFTP
- Primary Technical Contact Name and email address for your site
- After the firewall has been set up and your IP address whitelisted, an e-mail noting completion will be sent.
- Concurrently, once the Site Registration Form is submitted, the users who require an LHB SFTP account will receive an e-mail from RILHB@regenstrief.org with a link to the Individual User Access Form.
- To complete the Individual User Access Form, you will need your public SSH key. Instructions for generating the key are in the e-mail that contains the form link. You may also download the instructions in the SSH section below.
- Once your account has been set up, an e-mail will be sent with your username and instructions on connecting to the LHB SFTP.
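The Site Registration Form asks for a public static IP or CIDR block, which is what the firewall rule is built from. Before filling in the form, a site could sanity-check the value with a short script such as the sketch below. It uses Python's standard `ipaddress` module; the function name `parse_public_range` is hypothetical, and treating "public" as "globally routable" is an assumption, not an LHB requirement.

```python
import ipaddress


def parse_public_range(value: str):
    """Parse a single IP or a CIDR block and reject non-public ranges.

    Raises ValueError if the text is not a valid address or network, or if
    the range is not globally routable (private, loopback, and so on).
    """
    # strict=False accepts a bare address ("8.8.8.8") as a /32 network and
    # tolerates host bits being set in a CIDR block.
    network = ipaddress.ip_network(value.strip(), strict=False)
    if not network.is_global:
        raise ValueError(f"{network} is not a public (globally routable) range")
    return network


print(parse_public_range("8.8.8.8"))     # 8.8.8.8/32
print(parse_public_range("8.8.8.0/24"))  # 8.8.8.0/24

try:
    parse_public_range("192.168.1.0/24")
except ValueError as error:
    print(error)  # private ranges are rejected
```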
Private and Public SSH Key Generation
Setup
The Linkage Honest Broker hosted by Regenstrief uses a data inbox to which your organization will upload files via the Secure File Transfer Protocol (SFTP). To complete this setup, you need to create your public and private SSH keys. Please download and follow the instructions.
Example DeID Input and Output
Input:
Identified data with record_ID (pseudo ID)
Output:
Set of Datavant tokens with record_ID (pseudo ID); demographic data is removed
After running the DeID tool, your site will have Datavant tokens encrypted with your site’s encryption key. To send your data to the Linkage Honest Broker (LHB), you must first encrypt the tokens further using the Datavant Link tool.
Example Link Input and Output
Input:
Input to Link is output from DeID
Output:
Tokens transformed into transit tokens, record_ID is unchanged
LHB Data Package
Create a data package to send to the LHB containing the following items:
- Transit tokens created in previous steps, saved as a .CSV file
  - Naming convention for file: SiteAbbreviation_ProjectID_Date_Description.csv
  - Description: TOKENS
  - Date Format: YYYYMMDD
  - Example file name: UNC_N3C_20210401_TOKENS.csv
- Manifest file containing metadata about submission, saved as a .CSV file
  - Naming convention for file: SiteAbbreviation_ProjectID_Date_Description.csv
  - Description: MANIFEST
  - Date Format: YYYYMMDD
  - Example file name: UNC_N3C_20210401_MANIFEST.csv
Save the data package, containing the transit token file and the manifest file, as a .zip file:
- Naming convention for file: SiteAbbreviation_ProjectID_Date.zip
- The Site Abbreviation is the same as your Enclave Site Abbreviation.
- Date Format: YYYYMMDD
- Example file name: UNC_N3C_20210401.zip
NOTE: All file names should be in ALL CAPITAL LETTERS.
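The naming conventions above can be applied programmatically. The following is a minimal sketch, using only the Python standard library, that builds the three file names and writes the .zip package. The helper names `lhb_file_name` and `build_package` are hypothetical, and keeping the file extension in lowercase is an assumption based on the example file names above.

```python
import zipfile
from datetime import date
from pathlib import Path


def lhb_file_name(site_abbr: str, project_id: str, day: date,
                  description: str = "", ext: str = "csv") -> str:
    """Build SiteAbbreviation_ProjectID_Date[_Description].ext in capitals."""
    parts = [site_abbr.upper(), project_id.upper(), day.strftime("%Y%m%d")]
    if description:
        parts.append(description.upper())
    # Assumption: the extension stays lowercase, as in the examples above.
    return "_".join(parts) + "." + ext.lower()


def build_package(out_dir: Path, site_abbr: str, project_id: str, day: date,
                  tokens_csv: Path, manifest_csv: Path) -> Path:
    """Zip the transit token file and the manifest under their required names."""
    zip_path = out_dir / lhb_file_name(site_abbr, project_id, day, ext="zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(tokens_csv, arcname=lhb_file_name(site_abbr, project_id, day, "TOKENS"))
        zf.write(manifest_csv, arcname=lhb_file_name(site_abbr, project_id, day, "MANIFEST"))
    return zip_path


print(lhb_file_name("unc", "n3c", date(2021, 4, 1), "tokens"))  # UNC_N3C_20210401_TOKENS.csv
print(lhb_file_name("unc", "n3c", date(2021, 4, 1), ext="zip"))  # UNC_N3C_20210401.zip
```

Calling `build_package` with the paths of the files produced by the Link tool and of your manifest would write a package such as UNC_N3C_20210401.zip into `out_dir`, with both .csv files renamed inside the archive.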
LHB SFTP Setup
- Prior to submission, you will need to connect to the LHB SFTP using the attached instructions.
- Download the instructions to connect to the LHB SFTP. You will need your private SSH key to complete this process.
IMPORTANT NOTE: The LHB SFTP uses port 2222.
Submit Data Package to LHB
- After you have created the transit tokens and manifest, you are ready to submit the files to the Linkage Honest Broker (LHB) using the SFTP.
- Connect to the LHB SFTP (make sure the port is set to 2222)
- Locate the .zip file you want to submit to the LHB and transfer it from the local site to the remote site.
- Please make sure the file(s) are correctly named prior to submission.
- If there are any issues with file submission, a member of the Linkage Honest Broker team will contact you.
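As a pre-submission check, the steps above can be sketched in Python: verify that the package name follows the convention, then assemble an OpenSSH `sftp` command that uses port 2222 and your private key. This is only a sketch under stated assumptions. The host name, user name, and key path shown are placeholders (your real values come from the onboarding e-mail), the pattern for site abbreviation and project ID (capital letters and digits) is an assumption, and the helper names are hypothetical. The official connection instructions take precedence.

```python
import re
import shlex
from datetime import datetime

LHB_PORT = 2222  # the LHB SFTP port stated in this guide

# Assumed pattern: SITEABBR_PROJECTID_YYYYMMDD.zip, capitals and digits only.
ZIP_NAME = re.compile(r"[A-Z0-9]+_[A-Z0-9]+_(?P<date>\d{8})\.zip")


def is_valid_package_name(name: str) -> bool:
    """Check the .zip name against the naming convention, including a real date."""
    match = ZIP_NAME.fullmatch(name)
    if match is None:
        return False
    try:
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return False
    return True


def sftp_command(user: str, host: str, key_path: str, port: int = LHB_PORT) -> list:
    """Build the argument list for an interactive OpenSSH sftp session."""
    return ["sftp", "-P", str(port), "-i", key_path, f"{user}@{host}"]


print(is_valid_package_name("UNC_N3C_20210401.zip"))  # True
print(is_valid_package_name("unc_n3c_20210401.zip"))  # False

# Placeholder values, not the real LHB host or account.
command = sftp_command("your_username", "lhb-sftp.example.org", "/path/to/private_key")
print(shlex.join(command))
```

Once the session is open, the package would be transferred with the standard `put` command, for example `put UNC_N3C_20210401.zip`.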