Open Science

Open Science is a movement to make scientific research transparent and accessible to a wide range of groups through collaborative networks.

Repositories

https://www.nature.com/sdata/policies/repositories#social

Biological sciences 

Nucleic acid sequence 

Novel DNA sequence, novel RNA sequence, and novel genome assembly data must be deposited to repositories that are part of the International Nucleotide Sequence Database Collaboration (INSDC) or to those which are working towards INSDC inclusion (as listed below), unless there are privacy or ethics restrictions that prevent open sharing of such data. These data may in addition be deposited to regional and national repositories as required. For human data that requires special controls, please see our recommended health sciences repositories.

Data types | Repository options | Data and metadata standards
Raw sequencing data (reads or traces); genome assemblies; annotated sequences; sample metadata | INSDC repositories; Genome Sequence Archive (GSA) | Browse data and metadata standards endorsed by the Genomic Standards Consortium
Genetic variation data | dbSNP (human variations less than 50 bp); dbVar (human variations greater than 50 bp); ClinVar (human genotype & phenotype); European Variation Archive (EVA) (all species); Genome Sequence Archive for Human (GSA-Human) |

Protein sequence 

UniProtKB view FAIRsharing entry

Molecular & supramolecular structure 

These repositories accept structural data for small molecules; peptides and proteins (all); and larger assemblies (EMDB).

Small molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission, and should include a .cif file and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr's checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.

Protein Circular Dichroism Data Bank (PCDDB) view FAIRsharing entry
Crystallography Open Database (COD) view FAIRsharing entry
Coherent X-ray Imaging Data Bank (CXIDB) view FAIRsharing entry
Biological Magnetic Resonance Data Bank (BMRB) view FAIRsharing entry
Electron Microscopy Data Bank (EMDB) view FAIRsharing entry
Worldwide Protein Data Bank (wwPDB) view FAIRsharing entry
Structural Biology Data Grid view FAIRsharing entry
Cambridge Structural Database (CSD) – managed by the Cambridge Crystallographic Data Centre (CCDC)  
Inorganic Crystal Structure Database (ICSD), deposition via CCDC  

Neuroscience 

These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenNeuro (formerly OpenfMRI) must be de-identified.

NeuroMorpho.org view FAIRsharing entry
OpenNeuro (formerly OpenfMRI) view FAIRsharing entry
G-Node view FAIRsharing entry
Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) view FAIRsharing entry
EBRAINS view FAIRsharing entry

Omics 

Functional genomics

Functional genomics is a broad experimental category, and Scientific Data's recommendations in this discipline likewise bridge disparate research disciplines. Data should be deposited following the relevant community requirements where possible.

Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations.

For data linking genotyping and phenotyping information in human subjects, we strongly recommend submission to dbGaP, EGA or JGA, which have mechanisms in place to handle sensitive data.

ArrayExpress view FAIRsharing entry
Gene Expression Omnibus (GEO) view FAIRsharing entry
GenomeRNAi view FAIRsharing entry
dbGaP view FAIRsharing entry
The European Genome-phenome Archive (EGA) view FAIRsharing entry
Database of Interacting Proteins (DIP) view FAIRsharing entry
IntAct view FAIRsharing entry
Japanese Genotype-phenotype Archive (JGA) view FAIRsharing entry
NCBI PubChem BioAssay view FAIRsharing entry
Genomic Expression Archive (GEA) view FAIRsharing entry
GWAS Catalog view FAIRsharing entry

Metabolomics & Proteomics

Metabolomics data should be submitted following the MSI guidelines.

We ask authors to submit proteomics data to members of the ProteomeXchange consortium (listed below), following the MIAPE recommendations.

MassIVE view FAIRsharing entry
MetaboLights view FAIRsharing entry
PeptideAtlas view FAIRsharing entry
PRIDE view FAIRsharing entry
Panorama Public view FAIRsharing entry

Taxonomy & species diversity 

Environmental Data Initiative (formerly LTER Network Information System Data Portal) view re3data entry
Global Biodiversity Information Facility (GBIF) view FAIRsharing entry
Integrated Taxonomic Information System (ITIS) view FAIRsharing entry
KNB: The Knowledge Network for Biocomplexity view FAIRsharing entry
Morphobank.org view FAIRsharing entry
Movebank Data Repository view FAIRsharing entry

Mathematical & modelling resources 

BioModels Database view FAIRsharing entry
Kinetic Models of Biological Systems (KiMoSys) view FAIRsharing entry
The Network Data Exchange (NDEx) view FAIRsharing entry

Cytometry and Immunology 

FlowRepository view FAIRsharing entry
ImmPort view FAIRsharing entry

Imaging 

Image Data Resource view FAIRsharing entry
The Cancer Imaging Archive view FAIRsharing entry
SICAS Medical Image Repository view FAIRsharing entry
Coherent X-ray Imaging Data Bank (CXIDB) view FAIRsharing entry
Cell Image Library view FAIRsharing entry

Organism-focused resources 

These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, we ask that data records be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).

Eukaryotic Pathogen Database Resources (EuPathDB) view FAIRsharing entry
FlyBase view FAIRsharing entry
Influenza Research Database view FAIRsharing entry
Mouse Genome Informatics (MGI) view FAIRsharing entry
Rat Genome Database (RGD) view FAIRsharing entry
VectorBase view FAIRsharing entry
WormBase view FAIRsharing entry
Xenbase view FAIRsharing entry
Zebrafish Model Organism Database (ZFIN) view FAIRsharing entry

Health sciences 

Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be required for the preservation of study participant anonymity in clinical datasets. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.

National Addiction & HIV Data Archive Program (NAHDAP) restricted data access possible view FAIRsharing entry
National Database for Autism Research (NDAR) restricted data access possible view FAIRsharing entry
The Cancer Imaging Archive restricted data access possible view FAIRsharing entry
ClinicalTrials.gov view FAIRsharing entry
SICAS Medical Image Repository (formerly Virtual Skeleton Database) view FAIRsharing entry
PhysioNet view FAIRsharing entry
National Database for Clinical Trials related to Mental Illness (NDCT) restricted data access possible view FAIRsharing entry
Research Domain Criteria Database (RDoCdb) restricted data access possible view FAIRsharing entry
Synapse restricted data access possible view FAIRsharing entry
UK Data Service restricted data access possible view FAIRsharing entry

Chemistry and Chemical biology 

ioChem-BD Computational Chemistry Datasets view re3data entry
NCBI PubChem BioAssay view FAIRsharing entry
NCBI PubChem Substance view FAIRsharing entry
Beilstein-Institut, STRENDA view FAIRsharing entry

Earth, Environmental and Space sciences 

Broad scope Earth & environmental sciences 

NASA Goddard Earth Sciences Data and Information Services Center view re3data entry
NERC Data Centres view re3data entry
PANGAEA view re3data entry
National Tibetan Plateau/Third Pole Environment Data Center view FAIRsharing entry
NOAA National Centers for Environmental Information (DOIs only assigned to deposited data on request) view re3data entry
HydroShare (CUAHSI) view FAIRsharing entry

Astronomy & planetary sciences 

SIMBAD Astronomical Database view re3data entry
UK Solar System Data Centre view re3data entry

Biogeochemistry and Geochemistry 

EarthChem view re3data entry
Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) view re3data entry

Climate sciences 

World Data Center for Climate at DKRZ (WDCC) view re3data entry

Ecology 

TERN Data Discovery Portal view FAIRsharing entry
Environmental Data Initiative (formerly LTER Network Information System Data Portal) view re3data entry
Global Biodiversity Information Facility (GBIF) view FAIRsharing entry
KNB: The Knowledge Network for Biocomplexity view FAIRsharing entry

Geomagnetism & Palaeomagnetism 

Magnetics Information Consortium (MagIC) view re3data entry

Ocean sciences 

Australian Antarctic Data Centre (AADC) view re3data entry
Australian Ocean Data Network (DOIs only assigned to deposited data on request) view re3data entry
Marine Data Archive  
Marine Geosciences Data System view re3data entry
SEANOE view FAIRsharing entry

Solid Earth sciences 

British Geological Survey view re3data entry
EarthChem view re3data entry
Magnetics Information Consortium (MagIC) view re3data entry
Marine Geosciences Data System view re3data entry
UNAVCO, Inc. view re3data entry
Incorporated Research Institutions for Seismology (IRIS) view re3data entry
OpenTopography view FAIRsharing entry

Physics 

HEPData view re3data entry

Materials science 

NoMaD Repository view FAIRsharing entry
Materials Cloud view FAIRsharing entry
MPContribs view re3data entry

Social sciences 

Archaeology Data Service view re3data entry
Harvard Dataverse view re3data entry
ICPSR view re3data entry
Open Science Framework view FAIRsharing entry
Qualitative Data Repository view FAIRsharing entry
UK Data Service view re3data entry

Generalist repositories 

Scientific Data encourages authors to archive data to one of the above data-type specific repositories where possible. Where a data-type specific repository is not available, the following generalist repositories might be suitable. Generalist repositories may also be appropriate for archiving associated analyses, or experimental-control data, supplementing the primary data in a discipline-specific repository.

The generalist repositories listed below are able to accept data from all researchers, regardless of location or funding source. If your institution has its own generalist data repository this can be used to host your data as long as the repository is able to mint DataCite DOIs, and allows data to be shared under open terms of use (for example the CC0 waiver). Please note that if your chosen repository is unable to support confidential peer-review, you will be asked to temporarily deposit a copy of the dataset to one of our integrated generalist repositories to facilitate review of your article. Upon completion of peer review, the temporary copy will be erased. To use a repository which does not appear in the manuscript submission system, select 'DataCite DOI' as the repository name during the submission process.

Repository name | Information on fees/costs | Size limits | Integrated with Scientific Data's manuscript submission system | re3data / FAIRsharing entry
Dryad Digital Repository | $120 USD for first 20 GB, and $50 USD for each additional 10 GB | None stated | Yes ✔ | view FAIRsharing entry
figshare | 100 GB free per Scientific Data manuscript | 1 TB per dataset | Yes ✔ - To qualify for the 100 GB of free storage, data must be uploaded to figshare via our submission system. Download instructions. | view FAIRsharing entry
Harvard Dataverse | Contact repository for datasets over 1 TB | 2.5 GB per file, 10 GB per dataset | No | view re3data entry
Open Science Framework | Free of charge | 5 GB per file, multiple files can be uploaded | No | view FAIRsharing entry
Zenodo | Donations towards sustainability encouraged | 50 GB per dataset | No | view re3data entry
Science Data Bank | Free of charge | 8 GB per file, no limit to dataset size | No | view FAIRsharing entry
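
The fee and size figures listed for the generalist repositories can be turned into a quick pre-submission check. The following is a minimal sketch in Python, not an official tool: the function and variable names are hypothetical, the limits are copied from the list above and may be out of date, and two things are assumptions rather than stated policy: that 1 TB is counted as 1000 GB, and that a partial additional 10 GB block at Dryad is billed as a full block. Confirm current terms with the repository before relying on the numbers.

```python
import math
from typing import List, NamedTuple, Optional


class Limits(NamedTuple):
    per_file_gb: Optional[float]     # None means no per-file limit is listed
    per_dataset_gb: Optional[float]  # None means no per-dataset limit is listed


# Size limits as listed above (assumption: 1 TB = 1000 GB).
LIMITS = {
    "Dryad Digital Repository": Limits(None, None),
    "figshare": Limits(None, 1000.0),
    "Harvard Dataverse": Limits(2.5, 10.0),
    "Open Science Framework": Limits(5.0, None),
    "Zenodo": Limits(None, 50.0),
    "Science Data Bank": Limits(8.0, None),
}


def dryad_fee_usd(size_gb: float) -> int:
    """Estimate the Dryad fee: $120 for the first 20 GB, $50 per additional 10 GB.

    Assumption: a partial additional 10 GB block is charged as a whole block.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    if size_gb <= 20:
        return 120
    extra_blocks = math.ceil((size_gb - 20) / 10)
    return 120 + 50 * extra_blocks


def repositories_that_fit(file_sizes_gb: List[float]) -> List[str]:
    """Return the repositories whose listed limits accommodate the given files."""
    if not file_sizes_gb:
        raise ValueError("file_sizes_gb must not be empty")
    largest = max(file_sizes_gb)
    total = sum(file_sizes_gb)
    fits = []
    for name, limits in LIMITS.items():
        if limits.per_file_gb is not None and largest > limits.per_file_gb:
            continue
        if limits.per_dataset_gb is not None and total > limits.per_dataset_gb:
            continue
        fits.append(name)
    return fits


if __name__ == "__main__":
    print(dryad_fee_usd(35))
    print(repositories_that_fit([3.0, 3.0]))
```

The check covers size only. It does not capture the other conditions in this section, such as whether a repository supports confidential peer review or is integrated with the manuscript submission system.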