Data Sharing and Management
Photo Credit: UCLA Broad Stem Cell Research Center
CIRM Data Explorer
Data Explorer was established to maximize the value of CIRM-funded stem cell and genetic research by making datasets findable and accessible, thereby enabling reuse of data by other researchers.
Data Explorer is:
- A public dashboard of CIRM-funded datasets and where they can be found
- A platform for CIRM awardees to submit a Data Sharing and Management Plan (DSMP)
Discovery, Preclinical, and Clinical Program Requirements
Application Stage: Data Sharing Overview
Awarded Projects: Data Sharing and Management Plan (DSMP)
Active Award Stage
Grantees will report on their data sharing and management activities during regularly scheduled progress reporting and will work with CIRM staff to adjust the DSMP and other data-related milestones as necessary and align data sharing processes with other initiatives at CIRM.
Additional Preclinical Development Program Requirements
In addition to data sharing, CIRM expects that knowledge resulting from PDEV awards will be shared within the CIRM network. Leveraging proven processes, study designs, and regulatory pathways is intended to drive efficiency, reduce potential roadblocks, optimize development, and eliminate redundant efforts. Sharing learnings with other CIRM awardees will improve product development progression and support a risk-based approach to both planned and unexpected changes throughout the preclinical drug development process, while retaining IP and protecting patient/donor privacy. In the Data Sharing Overview section of the application, PDEV awardees are asked to certify that they will work with CIRM to align with knowledge sharing processes as they are implemented.
CIRM recognizes the need to balance protection of intellectual property prior to commercialization with CIRM’s commitment to open science and innovation. Accordingly, some Applicable Data generated as part of PDEV awards may be treated as confidential until filing for patent protection, as trade secrets with requisite enhanced company protection, or in advance of regulatory approval. Data and knowledge sharing will be maximized to the extent possible.
Additional Clinical Development Program Requirements
- The trial must be registered at ClinicalTrials.gov no later than 21 calendar days after the enrollment of the first participant.
- The informed consent documents for the trial must include a specific statement relating to posting of clinical trial information at ClinicalTrials.gov.
- The responsible entity must update the clinical trial record at least once every 12 months.
- The trial results must be submitted to ClinicalTrials.gov no later than 12 months after the Primary Completion Date.
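The ClinicalTrials.gov timelines above can be sketched as simple date arithmetic. This is a minimal illustration, not an official CIRM or ClinicalTrials.gov tool: the function names are hypothetical, and "12 months" is assumed to mean twelve calendar months, with the day clamped to the end of the month where needed.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter, the day is clamped
    to the last day of that month (e.g. Feb 29 -> Feb 28).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def registration_deadline(first_enrollment: date) -> date:
    """Latest registration date: 21 calendar days after first enrollment."""
    return first_enrollment + timedelta(days=21)


def next_record_update_due(last_update: date) -> date:
    """The trial record must be updated at least once every 12 months."""
    return add_months(last_update, 12)


def results_submission_deadline(primary_completion: date) -> date:
    """Results are due 12 months after the Primary Completion Date."""
    return add_months(primary_completion, 12)


# Hypothetical dates, for illustration only.
print(registration_deadline(date(2024, 3, 1)))          # 2024-03-22
print(results_submission_deadline(date(2025, 6, 30)))   # 2026-06-30
```

Awardees should confirm the exact interpretation of each deadline with CIRM staff; the sketch only shows how the stated intervals relate to the triggering dates.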
Data Sharing Expectations Per Program and Award Stage
This table outlines the different data sharing steps at Application, Just-in-Time, Active Award, Award End, and Post Award stages for Discovery, Preclinical Development, and Clinical Development Programs. Resources are provided for each step:
Stage | Requirement | Resource | Discovery | Preclinical Development | Clinical Development |
Application | Data Sharing Overview | CIRM DSMP Guidelines | X | X | X |
Application | Budget Justification | Budget Justification Guidelines | X | X | X |
JIT/PFAR | Data Sharing and Management Plan (DSMP) | Instructions in CIRM Data Explorer | X | X | X |
JIT/PFAR | Register data sharing statement on ClinicalTrials.gov | ClinicalTrials.gov | | | X |
Award | Update DSMP | Instructions in CIRM Data Explorer | X | X | X |
Award | Informed consent documents include data sharing language, if necessary | CIRM Guidance for Data Repositories and other Resources | X | X | X |
Award | Clinical trial record is updated at least every 12 months | ClinicalTrials.gov | | | X |
Award End | Deposit data in appropriate repository | CIRM Guidance for Data Repositories and other Resources | X | X | X |
Award End | Submit final DSMP | Instructions in CIRM Data Explorer | X | X | X |
Post Award | Results submitted to ClinicalTrials.gov 12 months from Primary Completion Date | ClinicalTrials.gov | | | X |
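For awardees who track these steps in their own project tooling, the table can be expressed as a small lookup structure. The sketch below is illustrative only: the `Step` type and `steps_for` function are hypothetical names, and it assumes that the three ClinicalTrials.gov rows apply only to Clinical Development, as the Additional Clinical Development Program Requirements suggest.

```python
from typing import NamedTuple, Optional

ALL_PROGRAMS = frozenset(
    {"Discovery", "Preclinical Development", "Clinical Development"}
)
# Assumption: ClinicalTrials.gov steps apply to Clinical Development only.
CLINICAL_ONLY = frozenset({"Clinical Development"})


class Step(NamedTuple):
    stage: str
    requirement: str
    programs: frozenset


STEPS = [
    Step("Application", "Data Sharing Overview", ALL_PROGRAMS),
    Step("Application", "Budget Justification", ALL_PROGRAMS),
    Step("JIT/PFAR", "Data Sharing and Management Plan (DSMP)", ALL_PROGRAMS),
    Step("JIT/PFAR", "Register data sharing statement on ClinicalTrials.gov",
         CLINICAL_ONLY),
    Step("Award", "Update DSMP", ALL_PROGRAMS),
    Step("Award", "Informed consent documents include data sharing language, "
         "if necessary", ALL_PROGRAMS),
    Step("Award", "Clinical trial record is updated at least every 12 months",
         CLINICAL_ONLY),
    Step("Award End", "Deposit data in appropriate repository", ALL_PROGRAMS),
    Step("Award End", "Submit final DSMP", ALL_PROGRAMS),
    Step("Post Award", "Results submitted to ClinicalTrials.gov 12 months "
         "from Primary Completion Date", CLINICAL_ONLY),
]


def steps_for(program: str, stage: Optional[str] = None) -> list:
    """Return the requirements for a program, optionally limited to one stage."""
    if program not in ALL_PROGRAMS:
        raise ValueError(f"Unknown program: {program!r}")
    return [
        step.requirement
        for step in STEPS
        if program in step.programs and (stage is None or step.stage == stage)
    ]


for requirement in steps_for("Clinical Development", stage="Award"):
    print(requirement)
```

The table on this page remains the authoritative reference; a structure like this is only a convenience for checklists or reminders.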
CIRM Data Sharing and Management Plan (DSMP) Guidelines
Funded awards only
Purpose of the DSMP
To leverage CIRM-funded data and enable reuse of data by other researchers, CIRM awardees are expected to share their data consistent with FAIR (Findable, Accessible, Interoperable, and Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) data principles and reflective of practices within specific research communities. Development and execution of the CIRM Data Sharing and Management Plan (DSMP) is intended to facilitate:
- Findability of data through a public dashboard, the CIRM Data Explorer
- Accessibility of data by deposition in data repositories accessible to other researchers
- Interoperability, and
- Reusability of data by associating deposited data with necessary and sufficient metadata.
Scope of the DSMP
CIRM requires DISC, PDEV, and CLIN2 awardees to manage and preserve raw data, processed data, and metadata, and to share Applicable Data and metadata, i.e., to make them available to the broader scientific community through data repositories accessible to other researchers. CIRM expects all Applicable Data generated under CIRM DISC and PDEV awards to be shared no later than the time of publication or by the end of the award period, whichever comes first. CIRM expects clinical trial data and results generated under a CLIN2 award to be shared no later than 12 months after the study’s Primary Completion Date. Data not used to support a publication, including null or negative findings, are also considered Applicable Data.
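The sharing timelines in the paragraph above reduce to a small rule per program. The following sketch is one possible reading, not an official calculator: `sharing_deadline` and its parameters are hypothetical names, and "12 months" is assumed to mean twelve calendar months.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def sharing_deadline(
    program: str,
    award_end: Optional[date] = None,
    publication: Optional[date] = None,
    primary_completion: Optional[date] = None,
) -> date:
    """Latest date by which Applicable Data are expected to be shared."""
    if program in ("DISC", "PDEV"):
        # Time of publication or end of award period, whichever comes first.
        if award_end is None:
            raise ValueError("award_end is required for DISC and PDEV awards")
        if publication is None:
            return award_end
        return min(publication, award_end)
    if program == "CLIN2":
        # No later than 12 months after the Primary Completion Date.
        if primary_completion is None:
            raise ValueError("primary_completion is required for CLIN2 awards")
        return add_months(primary_completion, 12)
    raise ValueError(f"Unknown program: {program!r}")


# Hypothetical dates, for illustration only.
print(sharing_deadline("DISC", award_end=date(2026, 12, 31),
                       publication=date(2026, 5, 15)))    # 2026-05-15
print(sharing_deadline("CLIN2", primary_completion=date(2025, 9, 1)))  # 2026-09-01
```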
For some programs and data types, CIRM has developed specific data sharing expectations (e.g., data types to share, relevant standards, repository selection, timelines) that should be reflected in a DSMP. When no specific CIRM data sharing expectations apply, researchers should propose their own approaches to data sharing and management.
CIRM requires that anyone deriving data from living humans must be prepared to ensure privacy and confidentiality protections (e.g., de-identification, Certificates of Confidentiality, and other protective measures), in accordance with applicable federal, Tribal, state, and local laws and regulations.
Instructions for Submitting and Updating DSMP
The CIRM DSMP has three components:
- Metadata Catalog
- Data Use Limitations (DUL) Institutional Certification
- Questionnaire
Together, these components outline how the data for the funded project will be shared with the scientific community.
For all data you propose to generate, please prepare a Data Sharing and Management Plan (DSMP):
- Join/log into CIRM Data Explorer
- Complete the Metadata Catalog for expected data
- Complete, sign, and submit the Data Use Limitations (DUL) Institutional Certification form
- Complete the DSMP Questionnaire
The Metadata Catalog will be a living record:
- Initial Metadata Catalog: Prior to CIRM issuing the Notice of Award (NoA), the initial Metadata Catalog is submitted to CIRM. It contains minimal information about the anticipated data types and experimental design of the project.
- In progress Metadata Catalog: Throughout the project, the Metadata Catalog is continually updated as data are produced and metadata are collected. This ensures timely and progressive assembly of all information necessary for data deposition at the end of the project. The Metadata Catalog must be updated as part of each scientific progress report and is subject to CIRM review and approval.
- Final Metadata Catalog: At the end of the award, the Metadata Catalog and the DUL form are finalized. Together, these documents serve as a record of the metadata that are shared with the corresponding raw and processed data.
Once data have been deposited by the awardee, the metadata provided in the Metadata Catalog and the DUL information will be made public and displayed in the CIRM Data Explorer, a dashboard that scientists can use to discover CIRM-funded data and determine where they are deposited.
Data Terminology
Applicable data
- All data that are needed for another researcher to replicate results and to reuse data. At a minimum, this includes raw data, final processed data, and metadata.
- CIRM does not anticipate that researchers will preserve and share all data produced in a study. Researchers should decide which data to preserve and share based on ethical, legal, and technical factors that may affect the extent to which data are preserved and shared. The rationale for these decisions must be provided in the DSMP Questionnaire.
- Data not used to support a publication, including null or negative findings, are also considered Applicable Data.
Data
The Intellectual Property Policy for CIRM Awards defines “Data” as: Scientific, clinical, or technical recorded information derived during the Project Period of an Award, regardless of form or the media on which it may be recorded, but not any of the following: financial, administrative, management data, other information incidental to contract administration, preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. “Data” excludes physical objects (e.g., laboratory samples).
Data generation
Generation of raw data
Data processing
All data processing steps (dry lab) following generation of raw data
Data production
Overarching term, referring to both data generation and data processing
Data products
The result of each data generation step and each data processing step (Each data product should be listed in the DSMP Metadata Catalog)
Raw data
Data produced by an instrument (e.g., raw sequence data) or by other methods, such as measurements and surveys, or obtained from a data repository
Processed data
Data produced from raw data and from subsequent processing steps (e.g., quantification files, alignment files, etc.)
Final processed data
Data produced from the last processing step (e.g., aggregated quantification), on which conclusions are based
Metadata
Data that provide additional information needed to make shared raw and processed data findable, interpretable and reusable. Metadata information is requested in the DSMP.
Metadata categories in CIRM Data Explorer
- Data Product Details: methods used for data generation (machine, instrument), data processing (software toolkits, pipelines), and data sharing (data repositories)
- Biological Material Details: information about the source and modifications of the biospecimens and the final cell product used for data generation
- Goal of Experiment: information about diseases studied and/or biological questions addressed
- Sample Preparation: information about experimental approaches used to prepare the sample for data generation
- Protocols and Publications
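To keep metadata assembled progressively, as the Metadata Catalog is intended to be, an awardee might mirror these categories in a local record for each data product. The sketch below is purely illustrative: the class and field names are hypothetical, the example values are placeholders, and none of this reflects the actual CIRM Data Explorer schema or submission format.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductEntry:
    """Local, hypothetical record for one data product.

    One attribute per metadata category listed above; this is not
    the CIRM Data Explorer data model.
    """
    name: str
    data_product_details: dict = field(default_factory=dict)
    biological_material_details: dict = field(default_factory=dict)
    goal_of_experiment: str = ""
    sample_preparation: str = ""
    protocols_and_publications: list = field(default_factory=list)

    def missing_categories(self) -> list:
        """Return the labels of categories that have not been filled in yet."""
        categories = {
            "Data Product Details": self.data_product_details,
            "Biological Material Details": self.biological_material_details,
            "Goal of Experiment": self.goal_of_experiment,
            "Sample Preparation": self.sample_preparation,
            "Protocols and Publications": self.protocols_and_publications,
        }
        return [label for label, value in categories.items() if not value]


# Placeholder values, for illustration only.
entry = DataProductEntry(
    name="raw sequence data",
    data_product_details={"instrument": "placeholder", "repository": "undecided"},
    goal_of_experiment="placeholder description of the biological question",
)
print(entry.missing_categories())
# ['Biological Material Details', 'Sample Preparation', 'Protocols and Publications']
```

Checking for empty categories before each progress report is one way to avoid reconstructing metadata at the end of the award.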
Data standards
Guidelines or formal rules for producing, structuring, naming, and describing data. CIRM expects that an awardee will apply data standards that are common to their field of study in the production of data and to metadata that are deposited in a Data Repository. Examples of data standards can be found at CDISC or LOINC.
Data sharing
Making data available to the broader scientific community by depositing in a data repository accessible to other researchers
Primary Completion Date
For clinical trials, the date that the final subject was examined or received an intervention for the purposes of final collection of data for the primary outcome, whether the clinical trial concluded according to the pre-specified protocol or was terminated
Replicate results
Another researcher uses the shared data and the same code/software as the original researcher to obtain the same results
Reuse data
Another researcher uses the shared data and different tools/software to obtain new results, or uses the shared data in combination with their own data
Applicant and Awardee Resources
Data Sharing FAQ [Coming Soon]
CIRM Data Explorer Walkthrough Videos
CIRM Guidance for Data Repositories and other Resources
This document provides a non-comprehensive set of resources for identifying and selecting domain-specific or generalist data sharing repositories for discovery, preclinical, and clinical data. Researchers are advised to reference the repository policies.
Springer Nature Data Repository Guidance
This resource provides listing and guidance on specialized and generalist repositories. The specialized repositories are categorized by scientific discipline or data types and the notes include suggestions or recommendations for repository selection. The listing also includes links to the repository entry on FAIRsharing.org where researchers can obtain more information on the repositories.
- Scientific Data mandates that authors submit datasets to an appropriate public data repository.
NIH Data Resources
The NIH maintains a non-comprehensive listing of NIH-supported domain-specific data repositories as well as a list of external generalist repositories. The lists are organized as tables and include descriptions as well as links to data submission and data access policies.
- NIH-supported open domain-specific data repositories—The 148 repositories listed in this table are generally open to domain-specific data submission and user access.
- Generalist Repositories—The 9 generalist repositories in this listing accept data regardless of type, format, content, or discipline.
Repository Search Tools
- FAIRsharing.org: A community-driven resource that promotes FAIR principles by providing a searchable database of repository profiles, data standards, and journal and funding sharing policies.
External Resources Related to Sharing Protected Health Information (PHI)
- Informed Consent for Secondary Research with Data and Biospecimens: Points to Consider and Sample Language for Future Use and/or Sharing – NIH Office of Science Policy resource for drafting informed consent language for data sharing
- Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule – US Department of Health and Human Services guidance on approaches to achieve de-identification
CIRM Data Sharing and Management Budget Justification Guidelines
What to consider when justifying resources for data sharing and management
Examples of data sharing and management costs
- Curating data and developing supporting documentation, including formatting data according to accepted community standards; de-identifying data; preparing metadata to foster discoverability, interpretation, and reuse; and formatting data for transmission to and storage at a selected repository for long-term preservation and access
- Local data management considerations, such as unique and specialized information infrastructure (only those not covered by awardee’s facilities and indirect costs), necessary to provide local management and preservation (e.g., before deposit into an established repository)
- Preserving and sharing data through established repositories, such as data deposit fees necessary for making data available and accessible. For example, if a data sharing plan proposes preserving and sharing data for 10 years in an established repository with a deposition fee, the cost for the entire 10-year period must be paid prior to the end of the project period. If the data sharing plan proposes deposition to multiple repositories, costs associated with each proposed repository may be included.
- Personnel costs required to perform data management and sharing activities. Provide effort, annual salary and personnel cost for this project.
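The cost categories above can be combined into a simple estimate. The sketch below is a hedged illustration only: the function names are hypothetical, and every fee, salary, and effort figure is an invented placeholder rather than a CIRM rate or a real repository price. It reflects the point that repository costs for the full preservation period must be paid before the project period ends.

```python
def repository_cost(annual_fee: int, years: int, deposit_fee: int = 0) -> int:
    """Total cost of one repository for the whole preservation period.

    The full amount is budgeted up front because it must be paid
    prior to the end of the project period.
    """
    return deposit_fee + annual_fee * years


def personnel_cost(annual_salary: int, effort_percent: int, years: int) -> float:
    """Salary cost for the share of effort spent on data management and sharing."""
    return annual_salary * effort_percent * years / 100


# Placeholder figures in whole dollars, for illustration only.
repositories = [
    repository_cost(annual_fee=500, years=10, deposit_fee=250),  # repository A
    repository_cost(annual_fee=0, years=10, deposit_fee=1200),   # repository B
]
staff = personnel_cost(annual_salary=80000, effort_percent=10, years=2)

total = sum(repositories) + staff
print(total)  # 22450.0
```

Costs for curation, de-identification, and local infrastructure would be added as further line items, each with its own justification.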