5U24CA209996-05
University of Chicago
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment and Treatment
Advances in genomics and data analytics create new opportunities for accurate risk prediction and
personalized medical treatment for even rare cancers via large-scale data federation across institutions. Yet
cancer research is often stymied by a lack of appropriate tools to streamline the transfer and sharing of clinical
patient data. Globus services permit secure data transfer, synchronization, and sharing in
distributed environments at large scale. We propose here to extend these services so that they can work
securely with protected human data. The extended services will allow federation of clinical patient data
for accurate cancer risk prediction, personalized treatment, and any other area of cancer research.
Globus is widely used: it has over 15,000 users; more than 8,000 storage systems are accessible through it,
including at most leading US universities and many sites overseas; and it has transferred more than 165 petabytes
and 25 billion files. Adoption of Globus by biomedical researchers has been rapid and is accelerating. Biomedical
researchers at ~30 universities, government agencies, and sequencing centers have relied on Globus for
streamlined data transfer and sharing. Our “Globus Genomics” (GG) integrated Galaxy-Globus-cloud genomics
analysis system has been used by more than 300 researchers across multiple biomedical research domains,
including cancer, at over 25 institutions to analyze over 10,000 samples.
We will develop a HIPAA Enablement Toolkit that will enable Globus and other software-as-a-service providers
(including GG) to manage protected data securely (Aim 1.1). We will extend Globus security features by
implementing file name encryption and by encrypting data with user-supplied keys, and demonstrate that these
new features can be used by GG and other services to enable elastic, secure, high-performance cancer
genomics data analysis (Aim 1.2). We will integrate Globus with major cloud platforms by developing uniform
storage system interfaces (Aim 2.1), engineering high-speed transfers (Aim 2.2), and implementing search,
replication, and synchronization (Aim 2.3) on AWS, Google, Microsoft, and OpenStack-based clouds, so that
cancer researchers can transfer and share data securely and easily among these and other (e.g., local)
computing and storage platforms. The resulting tools will be applicable to any cancer type across the cancer
research spectrum. We will validate and disseminate these new technologies first within existing and emerging
breast (Aim 3.1), blood (Aim 3.2), and pancreatic (Aim 3.3) cancer research networks and then more broadly
with collaborators across the cancer research continuum (Aim 3.4). We will work closely with collaborators and
users to ensure that we meet the needs of a broad cross-section of the cancer research community that
requires transfer, sharing, and analysis of large human data sets. We will use extensive community outreach
through multiple channels to widely disseminate our technologies.
CUF-Links: Continuous and Ubiquitous FAIRness Linkages for reproducible research.
[…], Kesselman C.
Computer, 2022 Aug; 55(8), p. 20-30.
Sharing Begins at Home: How Continuous and Ubiquitous FAIRness Can Enhance Research Productivity and Data Reuse.
[…], Foster I., Fraser S., Kesselman C.
Harvard data science review, 2022 Summer; 4(3).
Prevalence of Inherited Mutations in Breast Cancer Predisposition Genes among Women in Uganda and Cameroon.
[…], Zheng Y., Ndom P., Gakwaya A., Makumbi T., Zhou A.Y., Yoshimatsu T.F., Rodriguez A., Madduri R.K., Foster I.T., et al.
Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology, 2020 Feb; 29(2), p. 359-367.