
IRI introspection #228


Open
ajnelson-nist opened this issue Feb 5, 2025 · 19 comments
Labels
UCR Use Cases and Requirements

Comments

@ajnelson-nist
Contributor

There is a function available in SPARQL that I do not believe is available in SHACL 1.0. If this actually is available somewhere or already under draft for 1.2, I very much welcome a link.

I have some SHACL shapes I would like to write that, for various purposes, require some review of the spelling of the subject IRI.

Use case 1

Use case 1 is a prescribed-name-form checker. With an ontology I work with, there is an expectation that owl:NamedIndividuals using the ontology should end with a UUID if they have no other practice in place to prevent IRI collisions. E.g., http://example.org/kb/Thing-36b67df4-1a42-4588-808a-19dfb79efbeb. My understanding is that the only way to enforce this in SHACL 1.0 is a SPARQL constraint, because in SPARQL we can review the IRI as a string with STR($this). For reference, the shape we use [1] is here [2][3], but to save a click, the SPARQL constraint reads like this:

[]
            a sh:SPARQLConstraint ;
            rdfs:seeAlso <https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.3> ;
            sh:message "UcoThings are suggested to end with a UUID."@en ;
            sh:select """
			PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
			PREFIX core: <https://ontology.unifiedcyberontology.org/uco/core/>
			SELECT $this
			WHERE {
			        FILTER (
			                ! REGEX (
			                        STR($this),
			                        "[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$",
			                        "i"
			                )
			        )
			}
		""" ] ;

Use case 2

Use case 2 is inferable class assignment based on the URL form. For instance, suppose an RDF data model wanted to model GitHub's Issues and Pull Requests as ex:Issue and ex:PullRequest. If an owl:NamedIndividual follows the pattern https://github.com/[^/]+/[^/]+/issues/\d+ (with whatever regex escaping is needed to make that work), I should be able to write a sh:TripleRule that assigns the type ex:Issue. Again, using SPARQL's STR($this), I can write a CONSTRUCT query to handle this, but I don't see an easy way to target based only on IRI spelling.

I should be able to take a graph like this:

<https://github.com/w3c/data-shapes/issues/228>
    a ex:Issue ;
    ex:mentions <https://github.com/w3c/data-shapes/issues/227> .

and from exactly those triples in the data graph, entail this for the yet-untyped node that's the object of ex:mentions:

<https://github.com/w3c/data-shapes/issues/227>
    a ex:Issue .
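
The entailment above can be sketched outside SHACL to make the intended behavior concrete. The following Python is only an illustration under stated assumptions: triples are plain tuples of strings, `infer_issue_types` is a hypothetical helper, `http://example.org/ns#` is an assumed expansion of the ex: prefix, and the regex is the pattern from this use case with both path segments written as `[^/]+`. It is not a proposal for how a SHACL rule would be spelled.

```python
import re

# Pattern from use case 2, anchored so that the whole IRI must match.
ISSUE_IRI = re.compile(r"^https://github\.com/[^/]+/[^/]+/issues/\d+$")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX_ISSUE = "http://example.org/ns#Issue"  # assumed expansion of ex:Issue
EX_MENTIONS = "http://example.org/ns#mentions"  # assumed expansion of ex:mentions


def infer_issue_types(triples):
    """Return new rdf:type triples for nodes whose IRI spelling matches ISSUE_IRI.

    Subjects and objects are both inspected, so a node that is only
    mentioned (never described) still gets typed.
    """
    nodes = set()
    for s, _p, o in triples:
        nodes.add(s)
        nodes.add(o)
    existing = set(triples)
    inferred = set()
    for node in nodes:
        if ISSUE_IRI.match(node):
            triple = (node, RDF_TYPE, EX_ISSUE)
            if triple not in existing:
                inferred.add(triple)
    return inferred


data = [
    ("https://github.com/w3c/data-shapes/issues/228", RDF_TYPE, EX_ISSUE),
    ("https://github.com/w3c/data-shapes/issues/228", EX_MENTIONS,
     "https://github.com/w3c/data-shapes/issues/227"),
]
new_triples = infer_issue_types(data)
```

Here `new_triples` holds the single type assertion for issue 227; the already-typed issue 228 is not repeated.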

Use case 3

Not my use case - @philharveyonline posted #227 , which proposes generation of new nodes. Some kind of functionality would be needed to name the created node. The SPARQL IRI(...) function would let him pull together a string and cast the string into a new node. But, I don't think there's functionality in SHACL 1.0 or SHACL-AF to do this.
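
As a rough illustration of what SPARQL's IRI(...) provides here, node naming amounts to assembling a string and treating the result as an identifier. A minimal Python sketch, assuming a hypothetical helper `mint_iri` and a hyphen-joined local name (neither comes from #227):

```python
from urllib.parse import quote


def mint_iri(base: str, *parts: str) -> str:
    """Build an IRI string by appending a hyphen-joined local name to base.

    Each part is percent-encoded so that characters such as spaces or
    slashes cannot change the structure of the resulting IRI.
    """
    local = "-".join(quote(part, safe="") for part in parts)
    return base + local


minted = mint_iri(
    "http://example.org/kb/",
    "Thing",
    "36b67df4-1a42-4588-808a-19dfb79efbeb",
)
```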

Proposal

For SHACL 1.2 Core, my use cases 1 and 2 would benefit from letting a target node's lexical value be reviewed by sh:pattern. This suggests to me some kind of use of a sh:PropertyShape, with either sh:path accepting a special BNode-housed predicate like sh:inversePath ([ sh:into sh:this ]?) or a new sibling property for sh:path (sh:nodeIRI?).

I don't quite grok node expressions well enough yet to know whether the sh:PropertyShape mentality would work for #227 .

Footnotes

  1. Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

  2. Apologies for the distraction, but in case you notice it, I do suspect the query's a/rdfs:subClassOf* part is superfluous. There's no need to discuss that here, as I'll be discussing it with that community soon.

  3. A second aside, I am aware of the obsoleting RFC 9562. Another thing I'll be discussing with that community soon.

@ajnelson-nist
Contributor Author

Editors, I was light on Labels for this issue. Please tag as appropriate.

@tfrancart

You can already assign an sh:pattern to NodeShapes to check their IRI structure. Isn't that what you're looking for?

shacl-ep:OrganizationNode a sh:NodeShape;
  sh:nodeKind sh:IRI;
  sh:pattern "^https://data.europarl.europa.eu/org/.*$" .

@tpluscode
Contributor

Isn't sh:pattern only intended to work with literals?

@tfrancart

No. It works on IRIs too. We use it a lot.

@afs
Contributor

afs commented Feb 5, 2025

sh:pattern is defined by:

The values of sh:pattern in a shape are valid pattern arguments for the SPARQL REGEX function.

and SPARQL REGEX takes string arguments so an IRI is not legal by the spec - although it is a natural extension.

See also #221

@ajnelson-nist
Contributor Author

ajnelson-nist commented Feb 5, 2025

@tfrancart , I had no idea that would work. But I just ran a test, and it did in a certain implementation (which I decline to, and will not, identify). My use case 1 is already satisfied by moving sh:pattern to a node shape. I wasn't understanding it because my read of the documentation missed that $value can also be the subject node.
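
The regex from use case 1 can also be exercised on plain IRI strings, independent of any SHACL implementation. A minimal Python sketch, assuming the SPARQL "i" flag corresponds to `re.IGNORECASE` and that REGEX matches anywhere in the string unless anchored (modeled here with `re.search`); `ends_with_uuid` is a hypothetical helper name:

```python
import re

# Regex copied from the SPARQL constraint in use case 1.
UUID_SUFFIX = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def ends_with_uuid(iri: str) -> bool:
    """True if the IRI's spelling ends with a UUID-shaped string."""
    # The search itself is unanchored; the pattern's own "$" pins the
    # match to the end of the IRI.
    return UUID_SUFFIX.search(iri) is not None


conforming = ends_with_uuid(
    "http://example.org/kb/Thing-36b67df4-1a42-4588-808a-19dfb79efbeb"
)
flagged = not ends_with_uuid("http://example.org/kb/Thing-1")
```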

I'll tinker with use case 2 a little later today.

Edited to note my test is, and will remain, undocumented.

@tpluscode
Contributor

tpluscode commented Feb 5, 2025

sh:pattern is defined by:

The values of sh:pattern in a shape are valid pattern arguments for the SPARQL REGEX function.

Right. And the first argument is defined as a literal.

# this is valid
BIND(regex(str(<>), 'foo') as ?foo)

# but this is not
BIND(regex(<>, 'foo') as ?foo)
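
The distinction between the two BIND lines can be mimicked with a toy term model. This is only a sketch of the strict reading discussed here, not any engine's behavior: `IRI`, `Literal`, `sparql_str`, and `sparql_regex` are hypothetical names, and a Python `TypeError` stands in for a SPARQL type error.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def sparql_str(term) -> Literal:
    """Toy STR(): the IRI's spelling, or the literal's lexical form."""
    if isinstance(term, IRI):
        return Literal(term.value)
    return Literal(term.lexical)


def sparql_regex(text, pattern: str) -> bool:
    """Toy REGEX(): accepts only literals as its first argument."""
    if not isinstance(text, Literal):
        raise TypeError("REGEX expects a string literal as its first argument")
    return re.search(pattern, text.lexical) is not None


node = IRI("http://example.org/foo")

# Wrapped in STR(): evaluates normally.
wrapped_ok = sparql_regex(sparql_str(node), "foo")

# Passed directly: rejected.
try:
    sparql_regex(node, "foo")
    direct_rejected = False
except TypeError:
    direct_rejected = True
```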

@tfrancart

sh:pattern is defined by:

The values of sh:pattern in a shape are valid pattern arguments for the SPARQL REGEX function.

and SPARQL REGEX takes string arguments so an IRI is not legal by the spec - although it is a natural extension.

This should then definitely be fixed in the spec, probably by saying that the argument is wrapped in the SPARQL STR() function (?)

@afs
Contributor

afs commented Feb 5, 2025

@tfrancart - which implementation do you use ?

The sh:pattern tests "pattern*" don't cover IRIs:

https://github.com/w3c/data-shapes/tree/gh-pages/data-shapes-test-suite/tests/core/node
https://github.com/w3c/data-shapes/tree/gh-pages/data-shapes-test-suite/tests/core/property

Correction from below: node/pattern-001.ttl does cover the case of validating an IRI.

@tfrancart

@tfrancart - which implementation do you use ?

TopQuadrant SHACL lib

@afs
Contributor

afs commented Feb 5, 2025

This should then definitely be fixed in the spec, probably by saying that the argument is wrapped in the SPARQL STR() function (?)

which would also cover numbers, date(times), lang strings, ... because STR accesses the lexical form.

With that change, sh:pattern would no longer be an implicit sh:datatype xsd:string.

The charter does not restrict this WG to maintain compatibility with published SHACL 1.0. So if that is common implementation practice, it is doable.

@ajnelson-nist
Contributor Author

@tfrancart - which implementation do you use ?

The sh:pattern tests "pattern*" don't cover IRIs:

https://github.com/w3c/data-shapes/tree/gh-pages/data-shapes-test-suite/tests/core/node https://github.com/w3c/data-shapes/tree/gh-pages/data-shapes-test-suite/tests/core/property

Isn't this highlighted line a test of sh:pattern against an IRI ...

and this line the corresponding XFAIL result?

@afs
Contributor

afs commented Feb 5, 2025

Isn't this highlighted line a test of sh:pattern against an IRI ...

Yes, you're right. Spec != tests.

:data2 :q <uri:foo> .
:data2 :q <uri:bar> .

:shape1
    a sh:NodeShape ;
    sh:targetNode :data2 ;
    sh:property [
        sh:path :q ;
        sh:not [
            sh:pattern "bar$" ;
        ] ;
    ] ;
    .

gives (Apache Jena) one violation.
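
One way to read this result: sh:not reports a value node when that node conforms to the inner shape, so a single violation is consistent with the pattern having matched the IRI <uri:bar>. The Python below is a minimal sketch of that reasoning, not of Jena's implementation; `not_pattern_violations` and its `match_iris` switch are hypothetical, and the value nodes are represented as plain IRI strings.

```python
import re


def not_pattern_violations(values, pattern, match_iris=True):
    """Value nodes that violate sh:not [ sh:pattern pattern ].

    A value violates sh:not when it conforms to the inner shape, i.e. when
    the pattern matches. With match_iris=False the pattern is treated as
    never matching an IRI, so nothing conforms and nothing is reported.
    """
    if not match_iris:
        return []
    return [v for v in values if re.search(pattern, v)]


values_of_q = ["uri:foo", "uri:bar"]  # the two objects of :q, as IRI strings

lenient = not_pattern_violations(values_of_q, "bar$")
strict = not_pattern_violations(values_of_q, "bar$", match_iris=False)
```

Under the lenient reading `lenient` contains only `uri:bar`, matching the one reported violation; under the strict reading `strict` is empty.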

@ajnelson-nist
Contributor Author

ajnelson-nist commented Feb 5, 2025

I agree that this is worth clarifying or revising in the spec. Maybe we can prescribe backward from the tests?

The sole XFAIL line I highlighted should have had a few following lines highlighted too:

sh:focusNode ex:Test ;
sh:resultSeverity sh:Violation ;
sh:sourceConstraintComponent sh:PatternConstraintComponent ;

The current way it XFAILs in the test is by sh:sourceConstraintComponent sh:PatternConstraintComponent. Should the XFAIL here automatically be sh:sourceConstraintComponent sh:NodeKindConstraintComponent ? OR, should the spec relax to let IRIs be auto-cast to strings when a sh:pattern's at play?

I see the pragmatism argument for the latter, but then there's tension vs. SPARQL's definition and how much SHACL-Core wants to stay aligned with SPARQL. Possibly tension vs. something upstream in the RDF spec too, though I don't have a spec. section in mind there.

Edited to fix the spelling of the suggested constraint component.

@tpluscode
Contributor

FWIW, I would totally like that sh:pattern officially worked for IRIs. I always found that an unnecessary limitation.

@tfrancart

FWIW, I would totally like that sh:pattern officially worked for IRIs.

I actually thought it was always the case. European Parliament does use it for all its IRI pattern specifications, e.g. see https://europarl.github.io/eli-ep/#eli:Work

Image

(and they add a skos:example IRI in their spec, too)

@afs
Contributor

afs commented Feb 5, 2025

So this does not get overlooked later - #230 labelled "errata".

ajnelson-nist added a commit that referenced this issue Feb 5, 2025
Per discussion on Issues 228 and 230.

References:
* #228
* #230

Signed-off-by: Alex Nelson <[email protected]>
@HolgerKnublauch
Contributor

I think this one can be closed with the discussion continuing elsewhere, @ajnelson-nist ?

@afs
Contributor

afs commented Feb 6, 2025

continuing elsewhere

If @ajnelson-nist agrees, turn use case 2 into a new issue, labelled "inference" and "UCR".

There's more than sh:pattern in it.

(This is possibly another reason for multiple repos)
