2020-05-25-datasharing

Data and code sharing via the Donders Repository and/or via GitHub

by Maria Carla Piastra and Johannes Algermissen (with some input from Robert Oostenveld)

What are the differences between a collection on the Donders Repository and a GitHub repository?

Persistence/ long-term accessibility

Donders Repository

  • Once you have created (and closed) a collection on the Donders Repository, you can no longer delete it. All changes to all files are tracked; no "secret" changes are possible
  • Persistence of the collection is guaranteed not by a single individual researcher, but by an institution (the data stewards and technical group of the university, who ensure that the collection does not disappear)
  • Donders/ Radboud University pays for the repository, ensures long-term data access
  • Collections get a persistent identifier (a handle or DOI). These persistent identifiers are not available to individuals, but only to institutions that can guarantee sustainability (e.g. journals, publishers, universities)

GitHub

  • Repository can be deleted at any time (by the individual researcher). Persistence is not guaranteed by the current (commercial) owner (Microsoft), thus no persistent identifier is awarded
  • However, you can export a static version of your GitHub repository to Zenodo which assigns a DOI to it
    • Zenodo was initiated and is hosted by CERN (Switzerland) and is EU funded, so it is likely to persist in the future
    • Uploads to Zenodo can no longer be deleted

Possible file size

Donders Repository

  • Donders Repository designed to handle many and large files (e.g. DICOMs, MEG output files)
  • Donders/ Radboud University is paying for repository, ensures long-term data storage

GitHub

  • Technology not suited for large files (no large data files, no large image files, only code)
  • GitHub/ Microsoft is not interested in hosting TBs of data for free
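
Before pushing a project to GitHub, it can help to see which files are too large for it and belong in a Donders Repository collection instead. The following is a minimal sketch, not an official tool: the function name `find_large_files` is hypothetical, and the 100 MB default is an assumption based on GitHub's per-file limit, so adjust it to your own needs.

```python
from pathlib import Path

# Assumed threshold: GitHub rejects individual files above roughly 100 MB.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit=SIZE_LIMIT_BYTES):
    """Return (relative path, size in bytes) for files above `limit`, largest first."""
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        # Skip Git's own object store; only the working files matter here.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                large.append((path.relative_to(root).as_posix(), size))
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, size in find_large_files("."):
        print(f"{size / 1e6:8.1f} MB  {name}")
```

Files reported by such a check (e.g. DICOMs or MEG output files) are candidates for the Donders Repository, while the remaining code can go to GitHub.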

Goal of distribution

Donders Repository

  • Detailed control over who can access the files: users need to sign data use agreement (DUA)
  • Suited as a static copy of data and code
  • Allows for metadata:
    • Metadata: “Type” gives type of repository (DAC/ RDC/ DSC)
    • Link to publication: also possible to enter a URL (e.g. a link to GitHub). Alternatively, you can put a URL in the "abstract" field of the metadata
    • Availability of metadata allows the collection to be transferred automatically into other systems, e.g. Narcis (Dutch inventory of research output)
    • Metadata of sharing collections is harvested by other search engines (e.g. Google Dataset Search)

GitHub

  • Suited for continuous development, reuse by others, collaboration on a dynamic repository
  • GitHub provides specific tools for interaction with other users/ collaborators such as issues and pull requests
  • Possible to export static copy to Zenodo
  • Higher visibility, better suited to disseminate your code
    • Possible to have version of code both on the Donders Repository (static, "version 1.0") and GitHub (dynamic)
    • It is possible on GitHub to add a URL (DOI) link to the respective collection on the Donders Repository

A good way to go is to store data and code as a collection on the Donders Repository and to make the code available as a GitHub repository as well. The published version is persistent on the Donders Repository, while the GitHub version can be maintained, improved and used to create a longer-lasting scientific impact.

Stages of a collection in the Donders Repository


  • Editable (read and write)
    • Status while making changes/ working on a project
  • Internal review (read only)
    • No changes possible any more (project declared "finished")
    • Makes sure that all collaborators see the same document/ file version while reviewing/ approving a submission
    • Possible to switch back to editable
  • External review (read only)
    • Possible to share URL that allows anonymous access (e.g. for reviewers/ editors)
    • Possible to switch back to editable
  • Published
    • Not anonymously accessible any more
    • Access only under data use agreement
    • Possible to switch back to editable; but published version will not disappear; any back-and-forth results in a new version
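
The stages above can be read as a small state machine. The sketch below is a toy model for illustration only; it is not the Donders Repository API, and the class and method names are hypothetical. It assumes that every read-only stage is entered from "editable" and left by switching back to "editable"; the list above does not say whether direct moves between read-only stages (e.g. internal review to external review) are also possible.

```python
# Stage names follow the list above.
READ_ONLY_STAGES = {"internal review", "external review", "published"}


class Collection:
    """Toy model of a collection's stage; not the Donders Repository API."""

    def __init__(self):
        self.stage = "editable"
        self.published_versions = 0

    @property
    def writable(self):
        # Only the editable stage allows changes to the files.
        return self.stage == "editable"

    @property
    def anonymous_access(self):
        # Only external review offers a URL for anonymous access.
        return self.stage == "external review"

    def move_to(self, stage):
        if self.stage == "editable" and stage in READ_ONLY_STAGES:
            if stage == "published":
                # Every publication after a round of edits is a new version.
                self.published_versions += 1
        elif self.stage in READ_ONLY_STAGES and stage == "editable":
            pass  # switching back to editable is always possible
        else:
            # Assumption: no direct moves between read-only stages.
            raise ValueError(f"cannot go from {self.stage!r} to {stage!r}")
        self.stage = stage


collection = Collection()
collection.move_to("external review")  # reviewers get anonymous access
collection.move_to("editable")         # revise after review
collection.move_to("published")        # version 1
collection.move_to("editable")
collection.move_to("published")        # version 2; version 1 stays available
```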

DAC, RDC and DSC

Data Acquisition Collection (DAC)

  • Intended for raw or primary data
  • Different from the project drive
  • Automatically created when filing PPM (DCCN) or PPF (DCC)
    • Large files (DICOMs, MEG files) automatically stored there
    • Smaller files (behavior, EEG, Castor, task code, experiment log files) need to be added manually
    • Best organized in BIDS format
    • Must not contain any personal information that directly identifies subjects (e.g., their name, address, telephone number, bank account, etc.).
      • Do not upload the signed informed consent forms
      • Can contain indirectly identifying information, e.g. detailed questionnaire results (with the personal information removed), photos, audio or video recordings, facial features in an anatomical MRI.
  • Not visible to researchers outside the Donders, only to internal researchers (with the respective access rights)
    • Only for internal re-use (e.g., future colleagues in the same working group)
  • Collection does not get "published", but “archived”, so all the raw data is safely stored for the future
  • All changes to data are tracked
    • Not possible for researchers to edit data without anyone noticing
      • Prevents fraud; ensures data fidelity
      • Helps researchers in case of fraud accusations
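
Before adding smaller files to a DAC manually, a quick check of the folder can catch obvious problems. The sketch below is a minimal illustration and not a replacement for the official BIDS validator: the function name `check_dac_folder` is hypothetical, it only tests two basic BIDS conventions (a top-level `dataset_description.json` and `sub-<label>` directories), and flagging files with "consent" in their name is an assumed heuristic for spotting signed informed consent forms.

```python
from pathlib import Path


def check_dac_folder(root):
    """Return a list of problems; an empty list means these basic checks passed."""
    root = Path(root)
    problems = []

    # BIDS requires a dataset_description.json at the top level.
    if not (root / "dataset_description.json").is_file():
        problems.append("missing dataset_description.json")

    # BIDS stores each participant in a directory named sub-<label>.
    if not any(p.is_dir() for p in root.glob("sub-*")):
        problems.append("no sub-<label> directories found")

    # Assumed heuristic: signed consent forms must not be uploaded,
    # so flag any file with "consent" in its name for a manual look.
    for path in sorted(root.rglob("*")):
        if path.is_file() and "consent" in path.name.lower():
            problems.append(f"possible consent form: {path.relative_to(root).as_posix()}")

    return problems


if __name__ == "__main__":
    for problem in check_dac_folder("."):
        print(problem)
```

A check like this cannot detect indirectly identifying information inside the data files themselves.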

Research Documentation Collection (RDC)

  • Store any intermediate/ processed data/ code (that might not go into the data sharing collection)
    • Documenting the scientific process
    • Share preliminary results within the project team
      • Can contain figures, tables, PowerPoint presentations, etc.
      • Should contain documents of the editorial and peer-review process
      • Must be linked to a publication
    • Document the editorial and peer-review process
  • Not visible to researchers outside the Donders, only to internal researchers (with the respective access rights)

Data Sharing Collection (DSC)

  • Share any (processed or raw) data with other users outside the Donders/ outside your team
    • Either during peer review (anonymously)
    • Or with any other researcher who signs the data use agreement (DUA)
    • Must not contain any data that make participants identifiable; see this overview
      • Anatomical MRI scans must be defaced before being shared.

How to create a new collection in the Donders Repository?

  • DCCN: ask Sandra Hermskerk or see here on the DCCN intranet
  • DCC: ask Miriam Kos (usually requested during PPF)
  • DCN: ask Bernhard Engliz
  • DCMN: ask Arthur (?), probably via Hong

References and resources