the entirety of datasets through identifiers.org  #15

@yarikoptic

Cons

Analysis/possible difficulties

  • I do not yet see how to discover individual IDs/datasets for a particular prefix. I sent a question via their web interface; the answer was "not at the moment", but it sounded like an interesting feature to them, so it might come at some point.
  • Not all prefixes relate to "datasets"; some are known as "(data) collections": https://www.ebi.ac.uk/miriam/main/collections/
  • I do not think there is any versioning; most probably it is assumed that an identifier points to an immutable dataset
  • There will be a lot of datasets, so we would need some sensible structure/hierarchy. The first level would be the identifier. Then we could partition even further by splitting IDs on / and -.
  • There seems to be no "filename" information provided, so we would have to choose between:
    • the default git-annex behavior: just use the entire URL to compose a unique filename
    • taking one from the URL (or, often, from the Content-Disposition header field), but that might lead to conflicts, since we would allow only a flat structure:
      • we could preanalyze the entire list of those first and see if conflicts arise. If there are conflicts, try to somehow deduce a disambiguating structure. But that is unreliable in case a dataset record changes and gains more files etc.
      • just add a numeric index (arbitrary, or based on some metadata?) in addition
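
To make the last two points more concrete, here is a rough sketch of splitting an ID on / and - and of the "add a numeric index" fallback for filename conflicts in a flat directory. Everything in it is an assumption for illustration only: the function names, the example IDs and URLs, the `_N` suffix format, and the `"file"` fallback name are all hypothetical, and nothing here uses identifiers.org itself.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit


def partition_identifier(identifier):
    """Split an ID on '/' and '-' into hierarchy levels (empty parts dropped)."""
    return [part for part in re.split(r"[/-]", identifier) if part]


def filename_from_url(url):
    """Return the last path component of a URL, or '' if there is none."""
    return posixpath.basename(unquote(urlsplit(url).path))


def assign_unique_names(urls):
    """Map each URL to a filename that is unique within one flat directory.

    On a conflict a numeric index is appended before the extension.
    The index depends on the order of `urls`, so names can shift if the
    dataset record later changes (the unreliability mentioned above).
    """
    used = set()
    names = {}
    for url in urls:
        # "file" is an arbitrary placeholder for URLs without a basename
        base = filename_from_url(url) or "file"
        stem, ext = posixpath.splitext(base)
        name, index = base, 0
        while name in used:
            index += 1
            name = f"{stem}_{index}{ext}"
        used.add(name)
        names[url] = name
    return names


if __name__ == "__main__":
    # Made-up ID and URLs, not real identifiers.org records
    print(partition_identifier("ABC-123/v-2"))
    print(assign_unique_names([
        "http://a.example/x/data.csv",
        "http://b.example/y/data.csv",
    ]))
```

The index here is purely positional; basing it on some metadata instead would need a stable per-file attribute, which I do not know to be available.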
