RDF dump/export/conversion has missing prefixes #265

Open
jsheunis opened this issue Feb 7, 2025 · 1 comment

@jsheunis
Contributor

jsheunis commented Feb 7, 2025

While working on the integration of shacl-vue and dump-things-service, which requires the bidirectional conversion of YAML and RDF data using LinkML, we discovered some issues with conversion to RDF.

Demo

For a demonstration, let's take an example Instrument record. This is what the data looks like:

id: trr379:instruments/fra-cobic-mri-prisma-3t
title: "Siemens Prisma 3 Tesla MR-Scanner"
short_name: Prisma
description: >-
  MRI scanner at the Cooperative Brain Imaging Center in Frankfurt.
about:
  # MRI of the brain
  - sct:816077007
characterized_by:
  - predicate: dlspatial:at_location
    object: "geo:50.092724,8.651237"
qualified_relations:
  ror:04cvxnb49:
    roles:
      # owner
      - marcrel:own

This is the conversion code:

linkml-convert -s concepts.trr379.de/src/base/unreleased.yaml --target-class Instrument -t ttl trr379-knowledge/metadata/base-unreleased/Instrument/5ad/debee98cfea2060883bcc238d5097106e548c.yaml

This is the output:

@prefix dlprops: <https://concepts.datalad.org/s/properties/unreleased/> .
@prefix dlres: <https://concepts.datalad.org/s/resources/unreleased/> .
@prefix dlroles: <https://concepts.datalad.org/s/roles/unreleased/> .
@prefix dlspatial: <https://concepts.datalad.org/s/spatial/unreleased/> .
@prefix dlthings: <https://concepts.datalad.org/s/things/v1/> .
@prefix marcrel: <http://id.loc.gov/vocabulary/relators/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://trr379.de/instruments/fra-cobic-mri-prisma-3t> a dlres:Instrument ;
    dlprops:about "sct:816077007"^^xsd:anyURI ;
    dlprops:short_name "Prisma" ;
    dlprops:title "Siemens Prisma 3 Tesla MR-Scanner" ;
    dlroles:qualified_relations [ a dlroles:Relationship ;
            rdf:object "ror:04cvxnb49"^^xsd:anyURI ;
            dlroles:roles marcrel:own ] ;
    dlthings:characterized_by [ a dlthings:Statement ;
            rdf:object "geo:50.092724,8.651237"^^xsd:anyURI ;
            rdf:predicate dlspatial:at_location ] ;
    dlthings:description "MRI scanner at the Cooperative Brain Imaging Center in Frankfurt." .

And here's another example for a `Project`:

Data:

id: trr379:projects/q02
title: Data management for computational modeling
description: >-
  This is the information management project (INF) of the TRR379.
short_name: Q02
started_at: "2024-10-01"
characterized_by:
  - predicate: CiTO:citesAsAuthority
    object: https://www.trr379.de/projects/q02
identifiers:
  # TRR-created project code
  - creator: trr379:.
    notation: Q02
    schema_agency: TRR379
    schema_type: dlidentifiers:IssuedIdentifier
informed_by:
  # Q02 facilitates what Q01 needs
  - trr379:projects/q01
associated_with:
  # TRR379
  - trr379:.
  # DFG
  - ror:018mejw64
qualified_relations:
  trr379:contributors/michael-hanke:
    roles:
      - trr379:roles/pi
      # lead
      - marcrel:led
      # project director
      - marcrel:pdr
      # contributor
      - marcrel:ctb
      # metadata contact
      - marcrel:mdc
      # process contact
      - marcrel:prc
      # programmer
      - marcrel:prg
  trr379:contributors/christine-ecker:
    roles:
      - trr379:roles/pi
      - marcrel:led
      - marcrel:ctb
  trr379:contributors/gabriele-ende:
    roles:
      - trr379:roles/pi
      - marcrel:led
      - marcrel:ctb
  trr379:contributors/klaus-mathiak:
    roles:
      - trr379:roles/pi
      - marcrel:led
      - marcrel:ctb
  # DFG is funder/sponsor
  ror:018mejw64:
    roles:
      - marcrel:fnd
      - marcrel:spn
  # Q02 sub-grant
  gepris:projekt/546006540:
    roles:
      - schema:funding
  # TRR379 umbrella grant
  gepris:projekt/512007073:
    roles:
      - schema:funding
  # host (supporting) organizations and clients
  # FZJ
  ror:02nv7yv05:
    roles:
      - marcrel:sht
      - marcrel:his
      - marcrel:cli
  # RWTH
  ror:04xfq0f34:
    roles:
      - marcrel:sht
      - marcrel:his
      - marcrel:cli
  # ZI
  ror:01hynnt93:
    roles:
      - marcrel:sht
      - marcrel:his
      - marcrel:cli
  # Uni FRA
  ror:04cvxnb49:
    roles:
      - marcrel:sht
      - marcrel:his
      - marcrel:cli

Code:

linkml-convert -s concepts.trr379.de/src/base/unreleased.yaml --target-class Project -t ttl trr379-knowledge/metadata/base-unreleased/Project/ddf/c1e610fe6ba35fd89b282f4b0961892ed1ec9.yaml

Output:

@prefix CiTO: <http://purl.org/spar/cito/> .
@prefix dlidentifiers: <https://concepts.datalad.org/s/identifiers/unreleased/> .
@prefix dlprops: <https://concepts.datalad.org/s/properties/unreleased/> .
@prefix dlprov: <https://concepts.datalad.org/s/prov/unreleased/> .
@prefix dlroles: <https://concepts.datalad.org/s/roles/unreleased/> .
@prefix dlsocial: <https://concepts.datalad.org/s/social/unreleased/> .
@prefix dltemporal: <https://concepts.datalad.org/s/temporal/unreleased/> .
@prefix dlthings: <https://concepts.datalad.org/s/things/v1/> .
@prefix marcrel: <http://id.loc.gov/vocabulary/relators/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ror: <https://ror.org/> .
@prefix schema1: <http://schema.org/> .
@prefix w3ctr: <https://www.w3.org/TR/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://trr379.de/projects/q02> a dlsocial:Project ;
    dlidentifiers:identifier [ a dlidentifiers:IssuedIdentifier ;
            dlidentifiers:creator "trr379:."^^xsd:anyURI ;
            dlidentifiers:notation "Q02" ;
            dlidentifiers:schema_agency "TRR379" ] ;
    dlprops:short_name "Q02" ;
    dlprops:title "Data management for computational modeling" ;
    dlprov:associated_with ror:018mejw64,
        <https://trr379.de/.> ;
    dlprov:informed_by <https://trr379.de/projects/q01> ;
    dlroles:qualified_relations [ a dlroles:Relationship ;
            rdf:object "ror:02nv7yv05"^^xsd:anyURI ;
            dlroles:roles marcrel:cli,
                marcrel:his,
                marcrel:sht ],
        [ a dlroles:Relationship ;
            rdf:object "ror:018mejw64"^^xsd:anyURI ;
            dlroles:roles marcrel:fnd,
                marcrel:spn ],
        [ a dlroles:Relationship ;
            rdf:object "trr379:contributors/klaus-mathiak"^^xsd:anyURI ;
            dlroles:roles marcrel:ctb,
                marcrel:led,
                <https://trr379.de/roles/pi> ],
        [ a dlroles:Relationship ;
            rdf:object "trr379:contributors/gabriele-ende"^^xsd:anyURI ;
            dlroles:roles marcrel:ctb,
                marcrel:led,
                <https://trr379.de/roles/pi> ],
        [ a dlroles:Relationship ;
            rdf:object "gepris:projekt/512007073"^^xsd:anyURI ;
            dlroles:roles schema1:funding ],
        [ a dlroles:Relationship ;
            rdf:object "trr379:contributors/michael-hanke"^^xsd:anyURI ;
            dlroles:roles marcrel:ctb,
                marcrel:led,
                marcrel:mdc,
                marcrel:pdr,
                marcrel:prc,
                marcrel:prg,
                <https://trr379.de/roles/pi> ],
        [ a dlroles:Relationship ;
            rdf:object "ror:04xfq0f34"^^xsd:anyURI ;
            dlroles:roles marcrel:cli,
                marcrel:his,
                marcrel:sht ],
        [ a dlroles:Relationship ;
            rdf:object "trr379:contributors/christine-ecker"^^xsd:anyURI ;
            dlroles:roles marcrel:ctb,
                marcrel:led,
                <https://trr379.de/roles/pi> ],
        [ a dlroles:Relationship ;
            rdf:object "gepris:projekt/546006540"^^xsd:anyURI ;
            dlroles:roles schema1:funding ],
        [ a dlroles:Relationship ;
            rdf:object "ror:01hynnt93"^^xsd:anyURI ;
            dlroles:roles marcrel:cli,
                marcrel:his,
                marcrel:sht ],
        [ a dlroles:Relationship ;
            rdf:object "ror:04cvxnb49"^^xsd:anyURI ;
            dlroles:roles marcrel:cli,
                marcrel:his,
                marcrel:sht ] ;
    dltemporal:started_at "2024-10-01"^^w3ctr:NOTE-datetime ;
    dlthings:characterized_by [ a dlthings:Statement ;
            rdf:object "https://www.trr379.de/projects/q02"^^xsd:anyURI ;
            rdf:predicate CiTO:citesAsAuthority ] ;
    dlthings:description "This is the information management project (INF) of the TRR379." .

Analysis

For the Instrument example:

  • The output TTL does not contain the trr379 prefix
  • The TRR379 base schema contains the trr379 prefix declaration, and it is included in emit_prefixes
  • The data has id: trr379:instruments/fra-cobic-mri-prisma-3t, and the output has this resolved to a named node: <https://trr379.de/instruments/fra-cobic-mri-prisma-3t> a dlres:Instrument ;
  • The schema contains the ror and sct prefix declarations, and they are not included in the output TTL (same as trr379). I added these to the emit_prefixes as well, and this made no difference to the output
  • While the conversion resolves the trr379 for the id: field in the data, the same does not happen for other fields that include the ror or sct prefixes, e.g. we get dlprops:about "sct:816077007"^^xsd:anyURI (i.e. literal with datatype) from:
    about:
    # MRI of the brain
    - sct:816077007
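
The observation above can be checked mechanically. The following is a minimal sketch, not part of LinkML: it scans serialized Turtle text with regular expressions (it does not parse RDF) and lists every CURIE-like `xsd:anyURI` literal together with whether its prefix is declared in the header. The function name `curie_literals` and the abridged `TTL` excerpt are assumptions made for illustration; the excerpt lines are copied from the Instrument output.

```python
import re

# Abridged excerpt of the Instrument output shown above.
TTL = '''\
@prefix dlprops: <https://concepts.datalad.org/s/properties/unreleased/> .
@prefix dlres: <https://concepts.datalad.org/s/resources/unreleased/> .
@prefix dlroles: <https://concepts.datalad.org/s/roles/unreleased/> .
@prefix marcrel: <http://id.loc.gov/vocabulary/relators/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://trr379.de/instruments/fra-cobic-mri-prisma-3t> a dlres:Instrument ;
    dlprops:about "sct:816077007"^^xsd:anyURI ;
    dlroles:qualified_relations [ a dlroles:Relationship ;
            rdf:object "ror:04cvxnb49"^^xsd:anyURI ;
            dlroles:roles marcrel:own ] .
'''

# "@prefix name: <iri> ." declarations in the Turtle header.
PREFIX_DECL = re.compile(r'^@prefix\s+([A-Za-z][\w.-]*):\s*<[^>]*>\s*\.', re.MULTILINE)

# Typed literals of the form "prefix:local"^^xsd:anyURI.
# The (?!//) lookahead skips full URLs such as "https://...".
CURIE_LITERAL = re.compile(r'"([A-Za-z][\w.-]*):(?!//)([^"\s]*)"\^\^xsd:anyURI')


def curie_literals(ttl):
    """Return (prefix, local, is_declared) for each CURIE-like anyURI literal."""
    declared = set(PREFIX_DECL.findall(ttl))
    return [(p, local, p in declared) for p, local in CURIE_LITERAL.findall(ttl)]


for prefix, local, is_declared in curie_literals(TTL):
    print(f"{prefix}:{local} declared={is_declared}")
```

This heuristic cannot tell a CURIE prefix from a genuine URI scheme (anything of the form `name:rest` matches), so its hits need a manual look.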
    

For the Project example:

  • The situation for the trr379 prefix is the same as in the Instrument example, with some additional notes:
    • The TRR379 base schema contains the trr379 prefix declaration, and it is included in emit_prefixes
    • The output TTL does not contain the trr379 prefix
    • The prefix is resolved in the output for the id: field
    • This data sample also uses the prefix in additional places, and its rendering in the output varies substantially: sometimes a literal with the xsd:anyURI datatype, sometimes a resolved URI, sometimes a CURIE.
  • The ror prefix shows the same kind of variability
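
To make this variability concrete, here is a rough sketch that counts the forms in which one namespace shows up in the serialized output. It is plain text matching, not an RDF parse. The function name `renderings` and the abridged `TTL` excerpt (lines copied from the Project output) are assumptions for illustration.

```python
import re

# Abridged excerpt of the Project output shown above.
TTL = '''\
@prefix marcrel: <http://id.loc.gov/vocabulary/relators/> .
@prefix ror: <https://ror.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://trr379.de/projects/q02> a dlsocial:Project ;
    dlidentifiers:identifier [ a dlidentifiers:IssuedIdentifier ;
            dlidentifiers:creator "trr379:."^^xsd:anyURI ] ;
    dlprov:associated_with ror:018mejw64,
        <https://trr379.de/.> ;
    dlprov:informed_by <https://trr379.de/projects/q01> ;
    dlroles:qualified_relations [ a dlroles:Relationship ;
            rdf:object "ror:018mejw64"^^xsd:anyURI ;
            dlroles:roles marcrel:fnd,
                marcrel:spn ],
        [ a dlroles:Relationship ;
            rdf:object "trr379:contributors/klaus-mathiak"^^xsd:anyURI ;
            dlroles:roles marcrel:ctb,
                marcrel:led,
                <https://trr379.de/roles/pi> ] .
'''


def renderings(ttl, prefix, namespace):
    """Count how often a namespace appears in each of three textual forms."""
    # Ignore the header so the @prefix declaration itself is not counted.
    body = "\n".join(
        line for line in ttl.splitlines() if not line.startswith("@prefix")
    )
    p = re.escape(prefix)
    typed_literal = re.findall(r'"' + p + r':[^"]*"\^\^xsd:anyURI', body)
    resolved_iri = re.findall(r'<' + re.escape(namespace) + r'[^>]*>', body)
    # A bare prefixed name: not inside quotes and not the tail of a longer name.
    prefixed_name = re.findall(r'(?<!["\w])' + p + r':[^\s,;\]"]+', body)
    return {
        "typed_literal": len(typed_literal),
        "resolved_iri": len(resolved_iri),
        "prefixed_name": len(prefixed_name),
    }


print("trr379", renderings(TTL, "trr379", "https://trr379.de/"))
print("ror", renderings(TTL, "ror", "https://ror.org/"))
```

In this excerpt, trr379 appears both as a typed literal and as a resolved IRI, while ror appears both as a typed literal and as a prefixed name.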

Thoughts:

  • Could the difference between an available and a missing prefix in the RDF output come down to whether or not the prefix is defined in the upstream schema?
  • It seems that mostly (although not consistently) output triples that have a named node or literal as object carry either the resolved URI (for a missing prefix, e.g. trr379) or a CURIE (for an available prefix, e.g. ror), e.g. dlprov:associated_with <https://trr379.de/.> or dlprov:associated_with ror:018mejw64. Triples with blank nodes as objects (i.e. nested objects in the data), in contrast, contain literals in CURIE format with the xsd:anyURI datatype. An example:
   dlroles:qualified_relations [ a dlroles:Relationship ;
            rdf:object "ror:02nv7yv05"^^xsd:anyURI ;
            dlroles:roles marcrel:cli,
                marcrel:his,
                marcrel:sht ],
        [ a dlroles:Relationship ;
            rdf:object "ror:018mejw64"^^xsd:anyURI ;
            dlroles:roles marcrel:fnd,
                marcrel:spn ],
        [ a dlroles:Relationship ;
            rdf:object "trr379:contributors/klaus-mathiak"^^xsd:anyURI ;
            dlroles:roles marcrel:ctb,
                marcrel:led,
                <https://trr379.de/roles/pi> ],

Although this example also shows a nested resolved <https://trr379.de/roles/pi>, which shows the inconsistency.
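
Until the cause is found, one possible stopgap is to post-process the serialized Turtle and rewrite CURIE-valued `xsd:anyURI` literals into IRIs. The sketch below is a text-level workaround under stated assumptions, not a fix in LinkML and not something taken from its API. The helper `expand_curie_literals` is hypothetical, and the prefix map contains only the two namespaces that are visible in this issue (`ror` from the `@prefix` header, `trr379` from the resolved IRIs). Literals with any other prefix are left untouched.

```python
import re

# Namespaces visible in the output above; any other prefix is left as-is.
PREFIXES = {
    "ror": "https://ror.org/",
    "trr379": "https://trr379.de/",
}

# "prefix:local"^^xsd:anyURI, skipping full URLs such as "https://...".
CURIE_LITERAL = re.compile(r'"([A-Za-z][\w.-]*):(?!//)([^"\s]*)"\^\^xsd:anyURI')


def expand_curie_literals(ttl, prefixes):
    """Rewrite CURIE-like anyURI literals to <IRI> terms for known prefixes."""

    def replace(match):
        prefix, local = match.group(1), match.group(2)
        namespace = prefixes.get(prefix)
        if namespace is None:
            # Unknown prefix: keep the literal exactly as it was.
            return match.group(0)
        return f"<{namespace}{local}>"

    return CURIE_LITERAL.sub(replace, ttl)


snippet = (
    'rdf:object "ror:02nv7yv05"^^xsd:anyURI ;\n'
    'rdf:object "trr379:contributors/klaus-mathiak"^^xsd:anyURI ;\n'
    'rdf:object "gepris:projekt/512007073"^^xsd:anyURI ;'
)
print(expand_curie_literals(snippet, PREFIXES))
```

This changes the object from a typed literal to a named node, which is presumably the intended result here, but it assumes the local part contains no characters that are illegal inside a Turtle IRI.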

@jsheunis
Contributor Author

jsheunis commented Feb 7, 2025

Further things that I tried that made no difference:

  • temporarily switching to relative imports in the trr379 base schema
  • adding the --prefix-file flag to the linkml-convert command, pointing to a yaml file that contains all prefixes also defined in the schema, including the ones that weren't in the ttl output
  • adding --prefix 'trr379=https://trr379.de/' to the linkml-convert command
