Determine whether DATS Identifiers must always be URIs. #8
It seems that the validator is not picking up the …
OK, that sounds like a reasonable compromise. I wonder whether it would be even better to support both a string id and a URI id, although the problem with that is that it wouldn't be backwards-compatible unless the string-only option is retained. I had initially created the DATS JSON for the mouse reference genome using only string ids, but then realized the URI restriction and had to go back and update them all. As I was doing so, I couldn't help but think it might be better to use the "raw" ids and leave it up to the DATS consumer to find the corresponding URIs if needed.

My main concern with using the URI as the only id in the DATS is that most data sources do not appear to consider the URI the primary id, meaning that if you want to extract the "real" id you have to parse the URL, however trivial that operation might be. A corollary of this observation is that there's some danger that the URIs in my DATS JSON will change, particularly in cases where I simply went to the data source web site and looked to see what URL popped up when I did a search with the primary identifier.
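To make the parsing point concrete, here is a minimal Python sketch of what a consumer might do to recover the raw accession from a stored URL. It assumes only two URL shapes: the accession as the last path segment, or the accession in a `g` query parameter, as in an Ensembl gene-page URL. `extract_accession` is a hypothetical helper, and each additional data source would likely need its own rule.

```python
from urllib.parse import urlsplit


def extract_accession(uri: str) -> str:
    """Recover the "raw" accession from a resolvable URL (illustrative only).

    Assumes one of two URL shapes:
      * the accession is the last path segment, e.g.
        http://identifiers.org/ensembl/ENSMUSG00000110943
      * the accession is in a query parameter named "g", as in Ensembl
        gene-page URLs, whose parameters are separated by ';' not '&'.
    """
    parts = urlsplit(uri)
    if parts.query:
        # Treat ';' like '&' so Ensembl-style "db=core;g=..." is split too.
        for pair in parts.query.replace(";", "&").split("&"):
            key, _, value = pair.partition("=")
            if key == "g" and value:
                return value
    # Otherwise fall back to the last non-empty path segment.
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError(f"no accession found in {uri!r}")
    return segments[-1]


print(extract_accession("http://identifiers.org/ensembl/ENSMUSG00000110943"))
# → ENSMUSG00000110943
print(extract_accession(
    "http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000110943"))
# → ENSMUSG00000110943
```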
The original intention was to support both string IDs and URI IDs, given that sometimes there are accession numbers with no URI, and in other cases there are URI IDs. The problem with the validator is that it requires the …
I will make the changes in the schemas and push the fix to the validator.
Not sure if I should start a new ticket. I notice in the MGI JSON …
If URIs are to play the role of identifiers, then it's important that URIs produced by different systems match when they denote the same entity, and this requires a consistent way of rendering the ID as a URL; HTTP parameters like the one you have in http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000110943 are a bad smell here. Some alternate ways to write this URL:
From the perspective of resolving web pages, it's not very important which one is used, but for URIs as identifiers we need to pick one. An alternative is to use the CURIE in the JSON and have it expanded via the JSON-LD context; there are JSON-LD contexts for ID expansion here: https://github.com/prefixcommons/biocontext

For GO we are standardizing on http://identifiers.org/ensembl/ENSMUSG00000110943 type URIs in our RDF. @jmcmurry has advocated for http://identifiers.org/ENSEMBL:ENSMUSG00000110943, but this is not well documented on the identifiers.org site, and there may be problems with colons in URIs. As a community we need to coalesce around a standard so that our RDF/JSON-LD links up.
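A minimal Python sketch of the CURIE approach, assuming the JSON stores a compact identifier such as `ensembl:ENSMUSG00000110943` and a JSON-LD-style context supplies the prefix mapping. It only models simple prefix concatenation, not full JSON-LD expansion, and the two contexts below are illustrative, not the actual contents of biocontext. `expand_curie` is a hypothetical helper.

```python
import json

# Two alternative JSON-LD-style contexts, one per identifiers.org URI pattern
# mentioned above. Only the "@context" prefix map is modeled here.
PATH_STYLE = json.loads('{"@context": {"ensembl": "http://identifiers.org/ensembl/"}}')
COLON_STYLE = json.loads('{"@context": {"ensembl": "http://identifiers.org/ENSEMBL:"}}')


def expand_curie(curie: str, document: dict) -> str:
    """Expand a compact identifier "prefix:local" using a context's prefix map."""
    prefix, sep, local = curie.partition(":")
    prefixes = document["@context"]
    if not sep or prefix not in prefixes:
        raise KeyError(f"no expansion for {curie!r}")
    return prefixes[prefix] + local


curie = "ensembl:ENSMUSG00000110943"  # what the DATS JSON itself would store
print(expand_curie(curie, PATH_STYLE))   # → http://identifiers.org/ensembl/ENSMUSG00000110943
print(expand_curie(curie, COLON_STYLE))  # → http://identifiers.org/ENSEMBL:ENSMUSG00000110943
```

Because the URI pattern lives in the context rather than in each record, switching between the two identifiers.org styles would not require rewriting the DATS instances.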
@cmungall this is correct. In the first examples, we didn't focus on this aspect. Identifiers.org or n2t would be the way to go, as these have been adopted by the DCPPC. This is indeed an implementation decision that we (DCPPC) would need to agree on.
If you really don't want colons, stick with http://identifiers.org/ensembl/ENSMUSG00000110943
Just a note to indicate that the identifier-related schemas have been relaxed to support any string and that the validator code now checks against URI constraints correctly.
The following three DATS schemas all define "identifier" as having format "uri":
However, in https://github.com/biocaddie/WG3-MetadataSpecifications/blob/master/json-instances/Uniprot-P77967.json there are identifiers like these:
I was under the impression that a valid URI must start with a scheme followed by ":", and that isn't the case for any of these ids except the very first one. What's the story here? Am I misunderstanding the meaning of "uri", or is this not DATS-compliant? I believe it does validate, although the validator probably isn't checking the string format.
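The scheme rule can be checked directly. Per RFC 3986, a scheme is a letter followed by letters, digits, `+`, `-` or `.`, and an absolute URI is the scheme, a `:`, and the rest. The Python sketch below (`has_uri_scheme` is a hypothetical helper) only tests that prefix; it is not a full URI validator. Many JSON Schema validators also do not enforce `format` unless explicitly configured to, which would be consistent with the instance validating anyway.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# followed by ":". This checks only for that prefix, not the full URI grammar.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def has_uri_scheme(value: str) -> bool:
    """Return True if value starts with a syntactically valid URI scheme and ':'."""
    return SCHEME_RE.match(value) is not None


print(has_uri_scheme("http://identifiers.org/ensembl/ENSMUSG00000110943"))  # → True
print(has_uri_scheme("P77967"))                                             # → False
print(has_uri_scheme("ensembl:ENSMUSG00000110943"))                         # → True
```

One caveat: a CURIE such as `ensembl:ENSMUSG00000110943` also passes this test, since `ensembl` is a syntactically valid scheme, so a syntax check alone cannot tell a CURIE from a resolvable URI.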