[XPath] Error-free selection operator for maps or arrays, or finite-domain functions #341

Closed
dnovatchev opened this issue Feb 8, 2023 · 16 comments


@dnovatchev
Contributor

dnovatchev commented Feb 8, 2023

In March 2021 Jarno Elovirta raised on the #general channel of the XML.com Slack the problem that the existing map or array lookup operator "?" prevents a free traversal of a nested map/array object. For example, this expression results in an error:

[
  map {"k0": 1}, 
  map{"k0": [1, 2, 3]}
]  ?* ?("k0")  ?*

[XPTY0004] Input of lookup operator must be map or array: 1.


There are three possible types of reaction to this problem:

  1. Do nothing

  2. Relax the semantics of the map/array lookup operator "?" so that it can be applied on items of non-map/non-array type and in such case produce the empty sequence.

  3. Introduce a similar operator to "?" that will behave as it, but instead of producing an error when applied on items of non-map/non-array type it produces the empty sequence.

Obviously, we are not advocating the 1st choice above, or otherwise we wouldn't be raising any issue 😄

Choice 2 could be implemented, but this would have a few drawbacks:

  • it would bring a certain degree of backwards incompatibility
  • "silently returning nothing" makes unexpected results really difficult to debug or even to notice, as pointed out by @michaelhkay

This proposal is to choose alternative 3. above.

Why is it better than the 2nd one?

  • No incompatibility can be introduced, as this is a new operator.
  • The user has intentionally chosen this operator over the "?" operator, and this means that the user is well aware of the new, sometimes tricky to observe/explain/debug behavior, but the user doesn't mind these effects and is ready to deal with them.

Definition

By definition the operator "->" with left-hand-side any expression E and right-hand-side a literal string X:

   E -> X

is lexically expanded to:

   E[. instance of map(*) or . instance of array(*)]?X

Example

With the original expression provided by Jarno Elovirta, but now using the "->" operator:

[
  map {"k0": 1}, 
  map{"k0": [1, 2, 3]}
]  ->* ->("k0")  ->*

its evaluation produces the expected result (all the values within just one of the leaves of the tree), and no error:

1, 2, 3

That is, 1 ->* produces the empty sequence and no error.
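The difference between the two operators can be modelled outside XPath. The following is only a sketch in Python, not an implementation of either operator: it assumes an XDM sequence is represented as a Python list, a map as a dict, and an array as a tuple, and the names strict_lookup, lenient_lookup and LookupTypeError are made up for this illustration.

```python
# Assumed model: sequence = list, map = dict, array = tuple.
WILDCARD = "*"


class LookupTypeError(TypeError):
    """Stands in for XPTY0004 in this model."""


def lookup_item(item, key):
    """Apply one key specifier to a single map or array; return a sequence."""
    if isinstance(item, dict):
        if key == WILDCARD:
            return list(item.values())
        return [item[key]] if key in item else []
    if isinstance(item, tuple):
        if key == WILDCARD:
            return list(item)
        return [item[key - 1]]  # XPath arrays are 1-based
    raise LookupTypeError(f"Input of lookup operator must be map or array: {item!r}")


def strict_lookup(seq, key):
    """Models E?X: every item of E must be a map or an array."""
    out = []
    for item in seq:
        out.extend(lookup_item(item, key))
    return out


def lenient_lookup(seq, key):
    """Models the proposed E->X: drop non-map/non-array items, then look up."""
    kept = [item for item in seq if isinstance(item, (dict, tuple))]
    return strict_lookup(kept, key)


# The array of two maps from the example, as a one-item sequence.
data = [({"k0": 1}, {"k0": (1, 2, 3)})]

# Models: data ->* ->("k0") ->*
result = lenient_lookup(lenient_lookup(lenient_lookup(data, "*"), "k0"), "*")
print(result)
```

In this model the lenient chain yields the members of the one nested array, while the same chain built from strict_lookup raises LookupTypeError at the last step, when it reaches the atomic value 1.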

Note:

Of course, the above example can be rewritten as this equivalent XPath 3.1 expression, which gets the wanted result, but literally no one, myself included, will ever write this:

[
  map {"k0": 1}, 
  map{"k0": [1, 2, 3]}
] [. instance of map(*) or . instance of array(*)] ?*
  [. instance of map(*) or . instance of array(*)] ?k0
  [. instance of map(*) or . instance of array(*)] ?*


Thus this is all about making it possible/feasible and empowering our users!

@michaelhkay
Contributor

michaelhkay commented Feb 8, 2023 via email

@michaelhkay
Contributor

michaelhkay commented Feb 8, 2023

Thinking about the JSON context, I do think that it's probably common to use singletons and arrays interchangeably:

phone: '012345678'

vs:

phone: ['012345678', '98765432']

and with that in mind, a function such as array:wrap($x) which wraps $x into an array if it is not already an array might be a useful convenience.

The use case would then become $input ?* ?k0 => array:wrap() ?*
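To make the suggestion concrete, here is a rough Python model of what such a function might do. It is a sketch under assumptions: array:wrap does not exist in XPath 3.1, arrays are represented as tuples and sequences as lists, and the wrapping is assumed to apply to each item of the sequence individually (the names array_wrap and members are invented for the illustration).

```python
def array_wrap(item):
    """Hypothetical array:wrap: keep an array as is, wrap anything else
    in a one-member array (arrays are modelled as tuples here)."""
    return item if isinstance(item, tuple) else (item,)


def members(seq):
    """Wrap each item of a sequence, then return all array members (like ?*)."""
    out = []
    for item in seq:
        out.extend(array_wrap(item))
    return out


single = {"phone": "012345678"}
several = {"phone": ("012345678", "98765432")}

print(members([single["phone"]]))
print(members([several["phone"]]))

# Applied to the values selected by ?k0 in the original example:
print(members([1, (1, 2, 3)]))
```

Note that under this reading the atomic value 1 is kept and treated as a singleton array, so the original example would yield four values rather than three. That is a different outcome from simply skipping items that are not maps or arrays.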

@ChristianGruen
Contributor

I've proposed in another issue introducing an abstract supertype for maps and arrays; it's hard to find a good name but I would suggest "tablet". Then ?tablet() would select both maps and arrays. (Note, we would need to add this to the list of reserved function names that can't be used unprefixed).

If we extend the lookup operator to functions, as suggested in #51, we could possibly use function().

@dnovatchev
Contributor Author

I think what is needed for this use case is to provide something more selective than ?. The main problem is that filtering the results of ? to return only maps or arrays (or both) is excessively verbose.

I cannot understand why the clearly stated problem is being tweaked into something it isn't.

Isn't the syntax in the proposed solution, ->("k0"), more selective than "?*"?

@michaelhkay
Contributor

michaelhkay commented Feb 8, 2023

I cannot understand why the clearly stated problem is being tweaked into something it isn't.

Unfortunately I couldn't find Jarno's original statement of the problem, so I was having to guess what result he wanted to achieve. You showed an expression that doesn't work, but you didn't say what the original problem was.

In the end I reverse-engineered it to two possible problems: one requires us to ignore the things on the LHS that aren't maps or arrays; the other requires us to treat the things on the LHS that aren't maps or arrays as if they were arrays of length one.

@dnovatchev
Contributor Author

dnovatchev commented Feb 8, 2023

In the end I reverse-engineered it to two possible problems: one requires us to ignore the things on the LHS that aren't maps or arrays; the other requires us to treat the things on the LHS that aren't maps or arrays as if they were arrays of length one.

I believe this proposal addresses the former problem, not the latter. Thus the proposed operator -> fully corresponds to the '/' operator in the realm of XPath path expressions, and to the C# null-conditional operator ?.

And this is what really makes sense, doesn't it?

@ChristianGruen
Contributor

To be honest, I'm confused:

That is, 1 ->* produces the empty sequence and no error.

Is that really something you would expect in practice? Do we believe the use case is common enough to justify yet another syntactic sugar?

@dnovatchev
Contributor Author

To be honest, I'm confused:

That is, 1 ->* produces the empty sequence and no error.

Is that really something you would expect in practice? Do we believe the use case is common enough to justify yet another syntactic sugar?

This is no more different than an XPath path expression. With this XML document:

<a>
 <b>1</b>
 <b>
   <c>2</c>
 </b>
</a>

we can evaluate the expression /a/b/c and it raises no error for b/c on the first instance of <b>.

The evaluation simply ignores 1, because it is not an element, the same way ->* ->("k0") ->* ignores 1 because it is not a map or an array.
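For readers who want to try the XML side of this analogy, here is a small sketch using Python's standard xml.etree.ElementTree module. It only supports a limited subset of XPath, so the relative path b/c evaluated from the root element <a> stands in for /a/b/c.

```python
import xml.etree.ElementTree as ET

# The document from the example, with whitespace removed.
doc = ET.fromstring("<a><b>1</b><b><c>2</c></b></a>")

# Relative path from <a>; the first <b> has no <c> child and is simply skipped.
matches = doc.findall("b/c")
texts = [c.text for c in matches]
print(texts)
```

The first <b> contributes nothing to the result and no error is raised, which is the behavior the analogy relies on.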

@ChristianGruen
Contributor

ChristianGruen commented Feb 8, 2023

This is no more different than an XPath path expression. With this XML document:

/ operates on nodes, and ? operates on maps/arrays. In the example, we’re looking at an atomic value, so I don’t understand the analogy.

Thus the proposed operator -> fully corresponds to the '/' operator in the realm of XPath path expressions, and to the C# null-conditional operator ?. operator.

Also in C#, I would expect 1?.x to return an error, as the LHS is a primitive value. Maybe I am wrong?

@dnovatchev
Contributor Author

This is no more different than an XPath path expression. With this XML document:

/ operates on nodes, and ? operates on maps/arrays. In the example, we’re looking at a primitive value, so I don’t understand the analogy.

Thus the proposed operator -> fully corresponds to the '/' operator in the realm of XPath path expressions, and to the C# null-conditional operator ?. operator.

Also in C#, I would expect 1?.x to return an error, as the LHS is a primitive value. Maybe I am wrong?

OK, here is a better corresponding XPath expression:

let $doc :=

<a>
 <b>1</b>
 <b>
   <c>2</c>
 </b>
</a>

return
   $doc/b/node()/self::c

The evaluation of this expression produces the expected:

<c>2</c>

and although it is supposed to evaluate 1/self::c , no error is produced.

@michaelhkay
Contributor

Some observations (not an attempted resolution):

The "/" operator is defined to operate on all nodes; it fails if the LHS is not a node.

The various axes are defined to operate on all nodes (as are accessors such as name()). For example, writing @x/@y is legal, and returns an empty sequence. (Saxon gives you a warning).

There are benefits in having the axis navigation be a closed system in this way: using an axis always returns nodes, if you have nodes you can apply further axes. It does mean you can write nonsense like @x/@y, but there are still some benefits.

A tree constructed by parsing JSON doesn't have this same closure property -- you can navigate to things (strings, numbers, booleans) from which further navigation is not possible.

It's not so much the "/" operator that has this closure property, as the axes (which are all functions from nodes to nodes).

We've been looking at other issues that suggest navigation within a JSON tree should retain information about where you got to (which you lose by simply returning a string/number/boolean). Perhaps this problem is related. If ?* returned a string decorated with information about its location in the tree, then we could allow further navigation from that location.
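One way to picture that last idea is a value that remembers the path by which it was reached. The Python sketch below is purely hypothetical and is not taken from any of the related proposals: the class Located, its fields and its methods are invented here, maps are modelled as dicts, arrays as tuples, and array positions are 0-based as in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Located:
    """A value together with its location (path of keys/positions from the root)."""
    root: object
    path: tuple = ()

    @property
    def value(self):
        node = self.root
        for step in self.path:
            node = node[step]
        return node

    def children(self):
        """Navigate downwards; an atomic value has no children, so the
        result is an empty list instead of an error."""
        v = self.value
        if isinstance(v, dict):
            steps = list(v.keys())
        elif isinstance(v, tuple):
            steps = list(range(len(v)))
        else:
            steps = []
        return [Located(self.root, self.path + (s,)) for s in steps]

    def parent(self):
        """Navigate upwards, which a bare string or number cannot do."""
        return Located(self.root, self.path[:-1]) if self.path else None


tree = ({"k0": 1}, {"k0": (1, 2, 3)})
level1 = Located(tree).children()                   # the two maps
level2 = [c for m in level1 for c in m.children()]  # the two "k0" values
level3 = [c for v in level2 for c in v.children()]  # members of the nested array

print([leaf.value for leaf in level3])
print(level3[0].parent().path)
```

In this sketch downward navigation is closed in the same way the axes are for nodes: every step returns located values, and any located value accepts a further step.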

@dnovatchev
Contributor Author

This is no more different than an XPath path expression. With this XML document:

/ operates on nodes, and ? operates on maps/arrays. In the example, we’re looking at a primitive value, so I don’t understand the analogy.

Thus the proposed operator -> fully corresponds to the '/' operator in the realm of XPath path expressions, and to the C# null-conditional operator ?. operator.

Also in C#, I would expect 1?.x to return an error, as the LHS is a primitive value. Maybe I am wrong?

OK, here is a better corresponding XPath expression:

let $doc :=

<a>
 <b>1</b>
 <b>
   <c>2</c>
 </b>
</a>

return
   $doc/b/node()/self::c

The evaluation of this expression produces the expected:

<c>2</c>

and although it is supposed to evaluate 1/self::c , no error is produced.

And the following expression:

let $doc :=

<a>
 <b>1</b>
 <b>  5
   <c>2
     <d>3</d>
   </c>
 </b>
</a>

return
  (    
   $doc/b/node()/d
 )

doesn't raise any errors for 1/d or 5/d , it just gives us the wanted result:

<d>3</d>

@ChristianGruen
Contributor

I even believe the null-conditional operator in C# offers what we currently have in XPath:

  • It accepts data structures (maps, arrays) as LHS operands, and it rejects atomic items/primitives.
  • Both $xpath?X and csharp?.X don’t raise an error if the LHS is empty/null, or does not contain X.

I wonder if we want to enforce new implicitness in the language. If we envisioned an entirely new language, we could treat all types equally and define a single operator for looking up functions, nodes and atomic items. For example, //A could give us a string with the value A or single map entries with the key 'A' instead of its value… etc.:

[
  'A',
  map { 'nodes': (<A/>, [ <A>B</A> ]) },
  map { 'A': 1, 'B': 2 }
]//A
→ ('A', <A/>, <A>B</A>, map { 'A': 1 })

The implicit semantics would be comfortable (as implicitness mostly is), but type-safe-loving developers might condemn us.

@dnovatchev
Contributor Author

I even believe the null-conditional operator in C# offers what we currently have in XPath:

  • It accepts data structures (maps, arrays) as LHS operands, and it rejects atomic items/primitives.
  • Both $xpath?X and csharp?.X don’t raise an error if the LHS is empty/null, or does not contain X.

I wonder if we want to enforce new implicitness in the language. If we envisioned an entirely new language, we could treat all types equally and define a single operator for looking up functions, nodes and atomic items. For example, //A could give us a string with the value A or single map entries with the key 'A' instead of its value… etc.:

[
  'A',
  map { 'nodes': (<A/>, [ <A>B</A> ]) },
  map { 'A': 1, 'B': 2 }
]//A
→ ('A', <A/>, <A>B</A>, map { 'A': 1 })

The implicit semantics would be comfortable (as implicitness mostly is), but type-safe-loving developers might condemn us.

Not sure I want the same navigational operator both for XDM nodes and for maps/arrays -- maybe this would be too confusing.

Let's be pragmatic: for trees formed by maps/arrays, we need navigational capabilities that are similar in expressiveness and convenience to those provided by the XPath operator "/" for path expressions.

@michaelhkay is right that axes are what cause the evaluation of a path expression to filter out unwanted kinds of nodes during the navigation.

If the "child::" axis in maps/arrays navigation means "only a map or an array", and if "child::" is the main (default) axis, then we achieve the same convenience in writing a map/array path expression as we have with XPath path expressions.

@dnovatchev
Contributor Author

dnovatchev commented Feb 16, 2023

Closed as we now have a more complete proposal: "CompPath (Composite-objects path) Expressions" published in issue 350
