Secure cloud-based infrastructure for CTSA hub data sharing

CD2H Phase 2 Proposal

Project Title: Secure cloud-based infrastructure for CTSA hub data sharing

Point Person:

Kari Stephens, UW


Elevator pitch:

CTSA program data assets hold great value and computational potential, but using them effectively requires secure data sharing. This project leverages new cloud-based infrastructure that offers elasticity, scalability, state-of-the-art cybersecurity capabilities, and economies of scale.


Project history

This project extends the phase 1 cloud pilot. In phase 1, we have: 1) deployed a set of AWS cloud instances (brokered by UW) containing synthetic data to support development and testing of the UW Leaf tool, 2) created a use case defining a broad set of OMOP EHR data to structure governance approvals, data sharing, and Leaf tool testing, and 3) begun architectural designs for harmonizing OMOP vX instances across institutions in a cloud environment.

This project will build on the existing relationship with AWS and on the multi-site OMOP instances created through it. However, the solution will not be AWS-specific: it will be applicable to any cloud vendor platform. Phase 2 will also benefit from and build upon the harmonization of data to OMOP v5.


GitHub repo:


Project description:

CTSAs use federated networks within and between institutions, with little to no ability to interoperate, and access to aggregated and raw datasets is bottlenecked by human analysts. CTSAs are struggling to adopt cloud-based data sharing architectures that support federated ownership of datasets, both technologically and socially, through scalable governance solutions designed for research use. As a result, CTSAs are missing out on the many innovations and efficiencies the cloud offers for existing and future big data software tools, including elasticity, scalability, state-of-the-art cybersecurity capabilities, and economies of scale (e.g., paying only for actual utilization).


Proposed Solution:

  • Finish the phase 1 demonstration data sharing project by:
    • Architecting and executing harmonization of OMOP vX repositories from UW, Wash U, and Data QUEST
    • Collaborating with NCATS to establish these live datasets in its cloud environment
  • Engineer a cloud-first architecture that leverages innovative technologies uniquely suited to cloud development
  • Configure the Leaf and DQe-c tools to point to the NCATS cloud data instances, demonstrating the cloud sharing architecture with cloud-based tools



This project will provide a clear pathway for CTSAs to use the NCATS cloud sharing environment to store data.


Expected outputs (6 months):

  • OMOP datasets from Wash U, UW, and Data QUEST shared in the cloud, harmonized to a common OMOP vX data model, and meeting a minimum level of data quality
  • NCATS cloud environment configured to store the OMOP datasets, with the Leaf and DQe-c tools up and running, laying a foundation for future cloud data and tool usage