Maps & Arrays: Consistency & Terminology #1169

ChristianGruen · 2024-04-24T06:35:25Z

After the introduction of #1094 and #1159, and before adding more map/array operations, I think it’s time to get more serious about consistency and terminology. The current drafts employ a variety of terms that are not clearly defined, or separated from each other. We now have at least…

items, members, pairs, keys, values, entries

…which are sometimes used for maps, for arrays, or for both data structures. A first attempt to clean up while keeping the overall effort low:

A minor one: The modifier for lookups should be in singular form, analogous to node axes: item, key, value, pair.
While I first advocated the orthogonality principle for axes in lookup expressions, I now think we should stick to the existing terminology. Otherwise, we would need to revise many other existing parts of the spec. My suggestion would be to:

introduce member for arrays
only allow key, value and pair for maps
allow items for both maps and arrays

This would make it symmetric with a) the current terminology for maps and arrays, and b) enhanced for clauses, i.e. for member $m and for key $k value $v.

The reverse approach would be to drop for member $m and to also allow for key $k value $v for arrays (with for value replacing for member). In addition, we could have for pair.

With the introduction of the item axis, map:values and array:values should be renamed to map:items and array:items. → map:contents and array:contents, see #1179
I would suggest dropping array:members and array:of-members. The names don’t imply we’ll deal with records, and it’s not in line with for member $m either. If we want to keep these functions, we could rename them to array:pairs and array:of-pairs and add the integer positions as keys, and we should introduce and consistently use the term pair for maps and arrays.

Closely related: #826


michaelhkay · 2024-04-24T08:25:19Z

A minor one: The modifier for lookups should be in singular form, analogous to node axes: item, key, value, pair.

I'm fine with that.

introduce member for arrays
only allow key, value and pair for maps
allow items for both maps and arrays
This would make it symmetric with a) the current terminology for maps and arrays, and b) enhanced for clauses, i.e. for member $m and for key $k value $v.

The minimal set would probably be

item (or content) for both maps/arrays
pair for maps
members for arrays

With the introduction of the item axis, map:values and array:values should be renamed to map:items and array:items.

Can't say I like the effect much. But I'm not happy with our over-use of value either. Perhaps content is better.

I would suggest dropping array:members and array:of-members. The names don’t imply we’ll deal with records, and it’s not in line with for member $m either. If we want to keep these functions, we could rename them to array:pairs and array:of-pairs and add the integer positions as keys, and we should introduce and consistently use the term pair for maps and arrays.

Yes that seems feasible.

Now the other thing in my mind is to try and unify this with labels. If we deliver the results of a lookup as maps containing key + value, why shouldn't the result also contain accessor functions equivalent to the fields in a label: specifically parent() and ancestors()? The two features definitely have significant overlap.

ChristianGruen · 2024-04-24T09:07:04Z

Now the other thing in my mind is to try and unify this with labels. If we deliver the results of a lookup as maps containing key + value, why shouldn't the result also contain accessor functions equivalent to the fields in a label: specifically parent() and ancestors()? The two features definitely have significant overlap.

Sounds reasonable (I haven’t spent time on/with pins and labels yet, I should do so soon). Would $map?pin::* or $map?label::* make sense? Regarding the naming, I felt similar to @cmsmcq that I would tend to associate “labels” with plain strings. Alternative terms for pins and labels that show their correlation could possibly be beneficial, now that we confront our poor (or brave) users with so many new concepts.

michaelhkay · 2024-04-24T14:19:02Z

I would suggest dropping array:members and array:of-members.

The main benefit of these functions is probably as primitives that can be used to define the semantics of all the other functions concisely.

For example we currently define array:join rather concisely as

array:of-members($arrays ! array:members(.))

Perhaps private functions could serve the same purpose. But if these functions are so useful as primitives, one feels that they would be useful tools for end-users as well.

A reminder of how we got here: I started this journey by defining a parcel as an item that encapsulates a sequence, with various possible representations, including perhaps as a zero-arity function, or perhaps with the option of making the internal representation entirely opaque. But that creates questions as to how "parcels" fit into the type system, so we ended up with a concrete representation (as record(value)) instead. In many ways I would be happier with the opaque concept.

ChristianGruen · 2024-04-24T16:53:48Z

The main benefit of these functions is probably as primitives that can be used to define the semantics of all the other functions concisely.

I still find the array:join/array:split variants easier to digest, as laid out in #826, but that’s just a matter of taste, I guess. In particular, it’s the use of map constructors in the scope of array operations that seems confusing and unnecessarily verbose to me:

(: array { } :)
array:of-members($sequence ! map { 'value': . })
(: array:build :)
array:of-members($input ! map { 'value': $action(.) })
(: array:append :)
array:of-members((array:members($array), map { 'value': $member }))

(: vs :)
array:join($sequence ! array { . })
array:join($input ! array { $action(.) })
array:join((array:split($array), array { $member }))

Using a record constructor would possibly make the existing equivalencies, though.

A reminder of how we got here: I started this journey by defining a parcel as an item that encapsulates a sequence, with various possible representations, including perhaps as a zero-arity function, or perhaps with the option of making the internal representation entirely opaque. But that creates questions as to how "parcels" fit into the type system, so we ended up with a concrete representation (as record(value)) instead. In many ways I would be happier with the opaque concept.

True; I remember having mentioned (one variant of) our Java binding that wraps objects in function items and triggers the implicit conversion to XDM items by invoking the function item. The same could have been done with array members.

However, maybe there are not really use cases left for which an additional parcel/function/record representation is still required:

Thanks to atomization, arrays can already be supplied to functions that expect atomic items.
With for member $m, we can iterate over array members.
With pin/label, array members can be decorated.
Thanks to array axes, we get results in a flat or structured way.
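
To illustrate the first point with constructs that already exist in 3.1 (a small sketch, not taken from the drafts): when a function expects atomic items, an array argument is atomized, which flattens its members into a plain sequence.

```xquery
xquery version "3.1";

(
  (: fn:sum expects atomic items, so the array is atomized first :)
  sum([ 1, 2, 3 ]),                 (: 6 :)

  (: explicit atomization flattens members that are sequences :)
  data([ (1, 2), 3 ]),              (: 1, 2, 3 :)

  (: the same implicit atomization applies to string arguments :)
  string-join([ 'a', 'b' ], '-')    (: 'a-b' :)
)
```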

I think we can already be happy if the newly added concepts are utilized by a considerable number of people. Maybe we should be careful not to overdo it, and maybe we should continue to appreciate the laudable 3.1 concepts of arrays (such as the powerful implicit atomization).

ChristianGruen · 2024-04-26T15:31:54Z

If we drop array:members, we could rename array:split to array:members:

map:merge(map:entries($map))
array:join(array:members($array))

This would feel intuitive to me, as the current spec terminology (as far as I can judge) regards array members as the counterparts of map entries.

michaelhkay · 2024-04-26T15:49:53Z

Indeed, the two function pairs array:split/join and array:members/of-members differ only in the way that they represent a "parcel", that is, the way they package a sequence as a single item. There's not a big difference, and although the latter pair seems a little more "type-safe" to me (converting an array of sequences to a sequence of arrays can be a bit confusing...), it's probably true that the first pair have a wider range of application, meaning that if either pair goes, it should be the second.
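
The difference between the two parcel representations can be sketched in plain XQuery 3.1. The `local:` functions below are hypothetical stand-ins written for illustration, not the draft definitions: one pair wraps each member in a single-entry map with the key `value`, the other wraps it in a single-member array.

```xquery
xquery version "3.1";

(: Hypothetical stand-ins, not the draft functions themselves. :)

(: Parcel as a single-entry map, in the style of record(value). :)
declare function local:members($array as array(*)) as map(*)* {
  for $i in 1 to array:size($array)
  return map { 'value': $array($i) }
};

declare function local:of-members($parcels as map(*)*) as array(*) {
  array:join($parcels ! [ map:get(., 'value') ])
};

(: Parcel as a single-member array. :)
declare function local:split($array as array(*)) as array(*)* {
  for $i in 1 to array:size($array)
  return [ $array($i) ]
};

let $input := [ (1, 2), (), 'a' ]
return (
  (: both representations round-trip the original array :)
  deep-equal(local:of-members(local:members($input)), $input),  (: true :)
  deep-equal(array:join(local:split($input)), $input)           (: true :)
)
```

In this sketch the array-based pair needs no helper for the reverse direction, since array:join already does the job.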

michaelhkay · 2024-04-26T16:38:05Z

Actually, I'm a bit confused about the spec of array:split.

What is the result of array:split([ (1,2), (3,4) ])?

The notes say

The function call array:split($array) produces the same result as the expression for member $m in $array return [ $m ].

Which makes the result ( [(1, 2)], [(3, 4)] ).

But this is only in the notes, and the actual rules are very informal; and none of the examples makes this clear. The only example that touches on it is the fourth example, and that one would work equally well if the result were ( [1,2], [3,4] ) -- which is probably the result many users would expect.

More examples are needed, and the first note should be moved into the normative rules.
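
For reference, the reading implied by the note can be checked with XQuery 3.1 constructs alone. This is only a sketch: the FLWOR expression below emulates `for member $m in $array return [ $m ]` by position, and it assumes that emulation matches the intended semantics.

```xquery
xquery version "3.1";

let $input := [ (1, 2), (3, 4) ]
(: emulates: for member $m in $input return [ $m ] :)
let $parts :=
  for $i in 1 to array:size($input)
  return [ $input($i) ]
return (
  count($parts),            (: 2: one array per member :)
  array:size($parts[1]),    (: 1: the member (1, 2) stays a single member :)
  count($parts[1]?*)        (: 2: that member is a sequence of two items :)
)
```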

ChristianGruen · 2024-04-26T18:08:15Z

What is the result of array:split([ (1,2), (3,4) ])?
...
Which makes the result ( [(1, 2)], [(3, 4)] ).

Yes, that's supposed to be the result. Otherwise, the result could not be reversed (which is what I wanted to point out with “This function is the inverse of array:join.”):

(: [ 1, 2, 3, 4 ] :) 
array:join(([ 1, 2 ], [ 3, 4 ]))
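
A small 3.1 sketch of the round-trip argument, using only array:join, array:size and deep-equal: joining the flattened candidate loses the member boundaries, while joining the single-member arrays restores the original.

```xquery
xquery version "3.1";

let $original  := [ (1, 2), (3, 4) ]
let $flattened := array:join(([ 1, 2 ], [ 3, 4 ]))
let $wrapped   := array:join(([ (1, 2) ], [ (3, 4) ]))
return (
  array:size($original),               (: 2 :)
  array:size($flattened),              (: 4 :)
  deep-equal($flattened, $original),   (: false :)
  deep-equal($wrapped, $original)      (: true :)
)
```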

But this is only in the notes, and the actual rules are very informal; and none of the examples makes this clear.

True; sorry for that, and thanks for the hints. I still fail to understand what information is required to make the rules comprehensive enough (there are various functions, like array:fold-left, which don't have informal rules at all, possibly due to historical reasons?). I’ll be glad to revise the presentation once we decide which functions we want to keep.

ChristianGruen · 2024-10-16T10:27:06Z

As some of my suggestions in this thread are out of date, I suggest closing it. See #1338 for more recent comments.

ndw · 2024-10-22T16:12:27Z

At meeting 095, the CG agreed to close this issue with no further action.

ChristianGruen added the labels XPath (an issue related to XPath), XQFO (an issue related to Functions and Operators), and Enhancement (a change or improvement to an existing feature) Apr 24, 2024

ChristianGruen mentioned this issue Apr 24, 2024

1086 Editorial changes to array:values #1087

Merged

ChristianGruen mentioned this issue Apr 30, 2024

Editorial: array:values, map:values #1179

Closed

ndw added the labels PRG-hard (categorized as "hard" at the Prague f2f, 2024) and PRG-required (categorized as "required for 4.0" at the Prague f2f, 2024) Jun 5, 2024

ChristianGruen added the label Propose Closing with No Action (the WG should consider closing this issue with no action) Oct 16, 2024

ndw closed this as completed Oct 22, 2024
