Refining RDFTerm-equals #187

Closed
afs opened this issue Jan 24, 2025 · 8 comments · Fixed by #194
Labels
spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial

Comments

@afs
Contributor

afs commented Jan 24, 2025

#185 shows that RDFTerm-equals needs some attention.
(#25 is the issue for renaming this. Trying it out here.)

The definition of RDFTerm-equals says:

The function is defined as follows:

  • Returns TRUE if term1 and term2 are equal RDF terms, as defined below.
  • Produces a type error if term1 and term2 are both literals having the same datatype IRI; this datatype IRI is not in the set of recognized datatype IRIs; and the lexical forms of the two literals are different from one another.
  • Returns FALSE otherwise.
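
As a rough illustration of the three bullets quoted above, here is a minimal Python sketch. The names `Literal`, `SparqlTypeError`, `RECOGNIZED` and `rdfterm_equals_current` are hypothetical, language tags are ignored, and "equal RDF terms" is approximated as identical lexical form plus identical datatype IRI, which is an assumption rather than the spec's wording.

```python
from dataclasses import dataclass

# Hypothetical set of datatype IRIs this processor recognizes (assumption).
RECOGNIZED = {"xsd:decimal", "xsd:integer", "xsd:string"}


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str  # datatype IRI, written here as a prefixed name


def rdfterm_equals_current(term1, term2, recognized=RECOGNIZED):
    # Bullet 1: equal RDF terms (approximated as identical lexical form
    # and identical datatype IRI).
    if term1 == term2:
        return True
    # Bullet 2: both literals, same datatype IRI, that IRI is not
    # recognized, and the lexical forms differ -> type error.
    if (isinstance(term1, Literal) and isinstance(term2, Literal)
            and term1.datatype == term2.datatype
            and term1.datatype not in recognized
            and term1.lexical != term2.lexical):
        raise SparqlTypeError("unrecognized datatype, different lexical forms")
    # Bullet 3: everything else, including two different datatype IRIs.
    return False
```

Under this sketch, two literals with different datatype IRIs always fall through to the last line, so no extension point is available for them.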

Bullet 2 applies if the terms have the same datatype IRI. What if there are two different but related datatype IRIs?

"Recognized" is a property of the data in general, so it could apply to any such comparison elsewhere (if it matters), not just to this function.
Currently, there is a proposal to remove it from RDF Concepts because it is also defined by RDF Semantics.

We would be better off with a SPARQL-focused definition that we can limit to this function (e.g. a system may be able to determine equality in some, but not all, cases).

Example 1

sameValue("0.13"^^xsd:precisionDecimal, "0.13"^^xsd:decimal) 
  • Not dispatched by the operator mapping table.
  • Different terms.
  • Not the same datatype IRI.

Returns false.
But they are the same value, and only an error result leaves room for an extension.

(xsd:precisionDecimal is not derived from xsd:decimal nor the other way round)

Example 2

sameValue("IV"^^my:romanNumeral, "4"^^xsd:decimal)
  • Not dispatched ("IV"^^my:romanNumeral is not mentioned in Operand Data Types).
  • Different terms.
  • Different datatypes.

Returns false.

my:romanNumeral can't be made derived because the lexical space is not digits.

Example 3

sameValue("2025-01-01T00:00:00Z"^^xsd:dateTime, "2025-01-01T00:00:00Z"^^xsd:dateTimeStamp) 

Returns False.

xsd:dateTimeStamp is a derived datatype of xsd:dateTime but the spec only covers numeric derived types.

Example 4

Systems involving units (different value space to numbers)

sameValue("32"^^x:meters, "3200"^^y:centimeters)

Not an error, so it can't be an extension. This is an erratum.

Example 5

sameValue("abc"^^xsd:string, "IV"^^my:romanNumeral)

All it takes to know that this should return FALSE is that my:romanNumeral is either a number or not a legal value, without

Not an error, so it can't be an extension. This is an erratum IMO.

Example 6

sameValue("abc"^^xsd:integer, "123"^^xsd:integer)

Probably, this should be FALSE because the operator mapping does apply to = and a SPARQL processor must understand xsd:integer.
It could be an error because it can't be 123. Cf.

sameValue(1 + "abc"^^xsd:integer, "123"^^xsd:integer)

which is an error before sameValue is invoked.

Proposal

Instead of adding one or more cases, I propose defining RDFTerm-equals in more of a "by contract" style
(not exact wording for now). In order:

  1. If one or both arguments are known to be ill-typed, then error.
  2. If the arguments are the sameTerm then return TRUE.
  3. If the SPARQL processor can determine the values of both terms and it can determine the values are equal, then return TRUE.
  4. If the SPARQL processor can determine that the values of the terms cannot be equal, then return FALSE.
  5. Otherwise error.

The 4th case does not necessarily require the values themselves, because a processor may know that the terms belong to different value spaces and so can't be value-equal. Case 1 removes literals known to be ill-typed.
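
The five cases can be sketched in Python as follows. This is only an illustration under stated assumptions: `Literal`, `SparqlTypeError`, `VALUE_MAPS`, `value_of` and `same_value` are hypothetical names, the unit datatypes are the made-up ones from Example 4, and the only "partial knowledge" modelled is that different value spaces cannot hold equal values.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str  # datatype IRI, written here as a prefixed name


# Hypothetical knowledge of this processor:
# datatype IRI -> (value space name, lexical-to-value parser).
VALUE_MAPS = {
    "xsd:decimal": ("number", Decimal),
    "xsd:integer": ("number", lambda s: Decimal(int(s))),
    "xsd:string": ("string", str),
    "x:meters": ("length", lambda s: Decimal(s) * 100),  # normalize to cm
    "y:centimeters": ("length", Decimal),
}


def value_of(term):
    """Return (value space, value), or None if the datatype is unknown."""
    entry = VALUE_MAPS.get(term.datatype)
    if entry is None:
        return None
    space, parse = entry
    try:
        return space, parse(term.lexical)
    except (ValueError, InvalidOperation):
        # Case 1: the literal is known to be ill-typed.
        raise SparqlTypeError(f"ill-typed literal: {term}") from None


def same_value(term1, term2):
    v1, v2 = value_of(term1), value_of(term2)  # Case 1 raises here
    if term1 == term2:                         # Case 2: sameTerm
        return True
    if v1 is not None and v2 is not None:
        # Case 3: values known and equal -> True.
        # Case 4: different value, or different value space -> False.
        return v1 == v2
    raise SparqlTypeError("cannot determine equality")  # Case 5
```

With these assumed mappings, `same_value(Literal("32", "x:meters"), Literal("3200", "y:centimeters"))` returns `True`, while a comparison involving the unknown `my:romanNumeral` raises an error, which is the outcome that leaves room for an extension.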

Because of the order of cases 1 and 2, "sameTerm=true" does not imply "sameValue=true". They could be swapped so that is the case.

I don't think there can be a perfect solution in all cases - we are going from terms to values with incomplete knowledge in some situations.

@afs afs added the spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial label Jan 24, 2025
@afs afs self-assigned this Jan 24, 2025
@rubensworks
Member

The note in https://www.w3.org/TR/sparql12-query/#func-RDFterm-equal starts with:

An extended implementation may support additional datatypes for literals.

Based on my interpretation, this note indicates that implementations can choose to support the examples you listed above.

Unless the point of this issue is that this note should be removed in favor of a more precise definition?

@afs
Contributor Author

afs commented Jan 24, 2025

Based on my interpretation, this note indicates that implementations can choose to support the examples you listed above.

Unless the point of this issue is that this note should be removed in favor of a more precise definition?

Yes, a more precise definition (better coverage of possible cases): extensions can be added where there is an error. That's why some of the examples can be seen as errata, e.g. two different extension datatypes.

Leave the note, even expand it. It is explaining what is going on in more accessible language.

@hartig
Contributor

hartig commented Feb 18, 2025

I am starting to look into this one now (apologies for the delay).

@afs two questions to start with:

  1. Regarding Examples 4 and 5, I understand that these are not errors (because they are covered by the "Returns FALSE otherwise." case of the current definition of RDFterm-equal) and, thus, they cannot be an extension. But why do you say that they are an erratum? Or did you want to say that you consider it a mistake in the spec that these cases are not treated as errors (which would make it possible to define extensions if they were treated as errors)?
  2. I think there are two aspects to the issue as a whole (ignoring the related issue about renaming RDFterm-equal): One is that the current definition of RDFterm-equal relies on the notion of "recognized datatype IRIs", which has been removed completely from RDF-Concepts now. The second aspect is your question about "what if there are two different but related datatype IRIs?" I think these two aspects are somewhat orthogonal and I would prefer we address the first aspect first; that is, we first fix the definition to account for the removal of the "recognized datatype IRIs" concept, without also addressing the second aspect at the same time. The question of different but related datatype IRIs should then be tackled afterwards. What do you think of such a separation of the issue into two sub-issues?

@afs
Contributor Author

afs commented Feb 18, 2025

  1. Regarding Examples 4 and 5, I understand that these are not errors (because they are covered by the "Returns FALSE otherwise." case of the current definition of RDFterm-equal) and, thus, they cannot be an extension. But why do you say that they are an erratum? Or did you want to say that you consider it a mistake in the spec that these cases are not treated as errors (which would make it possible to define extensions if they were treated as errors)?

Maybe erratum is too definite. The vague text in query 1.1 does not give reasonable conditions for an extension (e.g. ill-typed literals can be an extension).

The first revision for SPARQL 1.2 addressed the general issues of rdfTerm-equals but required the two literals to be of the same datatype for an extension; otherwise FALSE is required.

In ex4, a system can know that two different datatypes give the same value, or conversely are definitely not the same value.

ex5 is similar but an illustration of partial knowledge.

@afs
Contributor Author

afs commented Feb 18, 2025

What do you think of such a separation of the issue into two sub-issues?

#194 deals with these together by giving the contract for extensions.

I don't see the advantage of trying to split apart the proposed 4 conditions for separate discussions then putting them back together again.

I sketched the changes in the current #194. I have started to remove the old text and editors' notes for rdfTerm-equals (with the section reordering) and I'll push that ASAP. (PS: mostly done and pushed.)

@Tpt
Contributor

Tpt commented Feb 18, 2025

+1 to this proposal.

A problem with having the clause "2. If the arguments are the sameTerm then return TRUE." at position 2 is that "NaN"^^xsd:double = "NaN"^^xsd:double is true, whereas IEEE 754-2008 and XML Schema Datatypes specify that NaN ≠ NaN.
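
The clash can be seen in a few lines of Python, whose floats follow IEEE 754 comparison semantics. The helper names `numeric_equal` and `same_term` are hypothetical, and modelling a literal as a (lexical form, datatype IRI) pair is an assumption made for brevity.

```python
def numeric_equal(a: float, b: float) -> bool:
    # Value comparison with IEEE 754 semantics: NaN is unequal to everything.
    return a == b


def same_term(term1: tuple, term2: tuple) -> bool:
    # Term comparison: same lexical form and same datatype IRI.
    return term1 == term2


nan_term = ("NaN", "xsd:double")
nan_value = float(nan_term[0])

print(numeric_equal(nan_value, nan_value))  # False: NaN is not equal to NaN
print(same_term(nan_term, nan_term))        # True: identical terms
```

A sameTerm check placed before the value comparison therefore answers TRUE for a pair that the numeric comparison would report as unequal.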

@afs
Contributor Author

afs commented Feb 18, 2025

NaN ≠ NaN.

Good point. And also ! (NaN ≠ NaN) is true.

The operator dispatch table will have sent doubles/floats to op:numeric-equal.
If sameValue becomes callable, we should call this out as an exception, or note that they are "same value" but not =.
Added to the "callable" editors note.

See also : https://www.w3.org/TR/rdf12-concepts/#dfn-literal-term-equality

@afs
Contributor Author

afs commented Feb 23, 2025

(long) discussion point in an editors' note in #194 on "NaN"s.

tl;dr:

I don't think there is a consistent choice and there are arguments for/against any of true, false or error. I prefer the line of argument that "sameTerm implies sameValue" because of term pattern matching.

The spec has a note that = has an operator mapping to op:numeric-equal.

4 participants