Fix ECMA 262 \Z and \z tests #427

ssilverman · 2020-08-14T03:39:09Z

The old test was incorrect because "\Z" is actually valid. It matches a
literal Z in ECMA 262.

tests/draft2019-09/optional/ecmascript-regex.json

ChALkeR · 2020-08-15T03:38:05Z

tests/draft2019-09/optional/ecmascript-regex.json

+        },
+        "tests": [
+            {
+                "description": "\\Z should match literal Z",


I don't think that \Z should match literal Z is always correct, so I would argue that it's not justified here.

\Z (and other unsupported flags) behavior differs in ECMA 262 RegExp depending on the mode.
It's not a valid regexp with u flag:

> /Z/.test('Z') true > /\Z/.test('Z') true > /Z/u.test('Z') true > /\Z/u.test('Z') Thrown: SyntaxError: Invalid regular expression: /\Z/: Invalid escape

Moreover, other tests here imply that pattern matching should be run in unicode mode, which is incompatible with this test.

Can you suggest the change, then? I was going by regex101.com.

ChALkeR · 2020-08-15T03:42:22Z

I don't think that any tests for \Z should be added at all unless the specification clarifies the mode which should be used for ECMA 262 regexps (i.e. Unicode vs non-Unicode), as \Z (and other invalid escapes) behavior depends on the mode.

Moreover, other tests in this repo imply Unicode mode which directly conflicts with this change:

JSON-Schema-Test-Suite/tests/draft2019-09/optional/non-bmp-regex.json

Lines 4 to 5 in fce9e9b

    
           "comment": "Optional because .Net doesn't correctly handle 32-bit Unicode characters", 
        
           "schema": { "pattern": "^🐲*$" },

Refs:

JSON Schema Core, 6.4. Regular Expressions.
ECMAScript 2020, 21.2.3.2.2 Runtime Semantics: RegExpInitialize, steps 6-8:
different U parameter in Pattern parse goal depending on the u regexp flag.
ECMAScript 2020, 21.2.1 Patterns:
Pattern -> Disjunction -> Alternative -> Term -> Atom -> AtomEscape -> CharacterEscape -> IdentityEscape:
different allowed chars depending on the U parameter.

\Z is not always valid according to ECMA 262 regular expression dialect, its validity depends on mode.
Demanding it to be always valid here doesn't seem reasonable, I think (and conflicts with other tests).

ChALkeR · 2020-08-15T04:19:13Z

That said, a js-based impl can trivially pass both the modified test here and the unicode regex test by trying to use Unicode mode, and if that throws on RegExp construction, using the non-Unicode mode.

And, arguably, that impl should also pass these tests, unless the mode is explicitly clarified in the JSON Schema spec.

Hence I don't think that there should be any tests for \Z at all unless the spec is clarified.

Also note that those tests won't be independent though, i.e. if \Z passes, 🐲* passes, one still wouldn't be able to use \Z🐲*:

> /Z🐲*/u.test('Z')
true
> /\Z🐲*/u.test('Z')
Thrown:
SyntaxError: Invalid regular expression: /\Z🐲*/: Invalid escape
> /\Z🐲*/.test('Z')
false

ssilverman · 2020-08-15T04:52:35Z

@ChALkeR I agree with everything you're saying, but you know more about JS than I do :)

ssilverman · 2020-08-15T05:00:20Z

I put back the original test. @ChALkeR feel free to suggest what you think this should look like; I'm happy to change to be what you think is more correct.

ssilverman · 2020-08-15T05:02:34Z

I don't think any of these Javascript-specific regex tests are very useful, especially if, as @ChALkeR points out, different modes may need to be used, which makes this very implementation dependent. But... I could be wrong because I'm no Javascript expert. I was just playing around with them on some regex sites and thought the tests could be improved. Maybe not?

The old test was incorrect because "\Z" is actually valid. It matches a literal Z in ECMA 262.

Julian · 2020-08-15T18:30:18Z

If this is ambiguity that needs resolving in the spec, yes, let's axe the relevant tests entirely, get the spec clarified, and then add them to the future draft, similar to #309.

Sound reasonable?

ssilverman · 2020-08-15T19:15:21Z

Wondering if the "\S" tests are in the same boat?

Julian · 2020-08-16T16:14:05Z

Seems likely if they too differ under those two different sections.

ssilverman · 2020-08-22T17:37:16Z

I'm inclined to just remove the "\Z" test instead. See #428

Julian · 2020-08-22T18:07:30Z

lgtmed that PR since it sounds like we'd all agree with that one. Obviously folks speak up if that's not the case but it sounds like all these need confirming/clarifying in the spec and then we follow what the decision is.

ssilverman · 2020-08-22T19:18:42Z

Closing for now in favour of #428

ssilverman · 2020-08-22T19:41:57Z

I went through the blame history and it turns out this “\Z” test was copied down the chain since Draft-04, and the thinking around regex parsing may have been different.

ChALkeR · 2020-08-23T13:09:12Z

@ssilverman Sorry for the delay. Yes, just removing those \Z tests seems as a reasonable solution to me, until the spec is clarified.
I don't think that we can improve those in other way, as expecting either true or false there is controversial.

Re: \S/\s — those are well defined in both cases and should be present.

What differs between the modes is now unknown escapes are treated, e.g. \A does essentially the same thing as \Z — either acts as A in non-unicode more or thows in unicode mode:

> /\A/
/\A/
> /\A/u
Uncaught SyntaxError: Invalid regular expression: /\A/: Invalid escape

\S, on the other hand, is a known escape and denotes a character class (like \d, \D, \s, \S, \w, \W) in both unicode and non-unicode modes.

ssilverman requested review from Julian and a team August 14, 2020 03:39

notEthan reviewed Aug 14, 2020

View reviewed changes

tests/draft2019-09/optional/ecmascript-regex.json Outdated Show resolved Hide resolved

notEthan reviewed Aug 14, 2020

View reviewed changes

tests/draft2019-09/optional/ecmascript-regex.json Show resolved Hide resolved

ssilverman force-pushed the ecma-z branch from a6a0d0e to 2603133 Compare August 14, 2020 08:33

Julian reviewed Aug 14, 2020

View reviewed changes

tests/draft2019-09/optional/ecmascript-regex.json Show resolved Hide resolved

ChALkeR requested changes Aug 15, 2020

View reviewed changes

ssilverman force-pushed the ecma-z branch from 2603133 to ae1d21f Compare August 15, 2020 04:57

ssilverman requested review from ChALkeR and Julian August 15, 2020 05:00

Fix ECMA 262 \Z and \z tests

5e785c1

The old test was incorrect because "\Z" is actually valid. It matches a literal Z in ECMA 262.

ssilverman force-pushed the ecma-z branch from ae1d21f to 5e785c1 Compare August 15, 2020 16:37

ssilverman closed this Aug 22, 2020

Uh oh!

Fix ECMA 262 \Z and \z tests #427

Fix ECMA 262 \Z and \z tests #427

Uh oh!

Conversation

ssilverman commented Aug 14, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ChALkeR Aug 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ssilverman Aug 15, 2020

Choose a reason for hiding this comment

Uh oh!

ChALkeR commented Aug 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChALkeR commented Aug 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ssilverman commented Aug 15, 2020

Uh oh!

ssilverman commented Aug 15, 2020

Uh oh!

ssilverman commented Aug 15, 2020

Uh oh!

Julian commented Aug 15, 2020

Uh oh!

ssilverman commented Aug 15, 2020

Uh oh!

Julian commented Aug 16, 2020

Uh oh!

ssilverman commented Aug 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Julian commented Aug 22, 2020

Uh oh!

ssilverman commented Aug 22, 2020

Uh oh!

ssilverman commented Aug 22, 2020

Uh oh!

ChALkeR commented Aug 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ChALkeR Aug 15, 2020 •

edited

Loading

ChALkeR commented Aug 15, 2020 •

edited

Loading

ChALkeR commented Aug 15, 2020 •

edited

Loading

ssilverman commented Aug 22, 2020 •

edited

Loading

ChALkeR commented Aug 23, 2020 •

edited

Loading