Skip to content

Fix ECMA 262 \Z and \z tests #427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ssilverman
Copy link
Member

The old test was incorrect because "\Z" is actually valid. It matches a
literal Z in ECMA 262.

@ssilverman ssilverman requested review from Julian and a team August 14, 2020 03:39
},
"tests": [
{
"description": "\\Z should match literal Z",
Copy link
Member

@ChALkeR ChALkeR Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that \Z should match literal Z is always correct, so I would argue that it's not justified here.

\Z (and other unsupported flags) behavior differs in ECMA 262 RegExp depending on the mode.
It's not a valid regexp with u flag:

> /Z/.test('Z')
true
> /\Z/.test('Z')
true
> /Z/u.test('Z')
true
> /\Z/u.test('Z')
Thrown:
SyntaxError: Invalid regular expression: /\Z/: Invalid escape

Moreover, other tests here imply that pattern matching should be run in unicode mode, which is incompatible with this test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you suggest the change, then? I was going by regex101.com.

@ChALkeR
Copy link
Member

ChALkeR commented Aug 15, 2020

I don't think that any tests for \Z should be added at all unless the specification clarifies the mode which should be used for ECMA 262 regexps (i.e. Unicode vs non-Unicode), as \Z (and other invalid escapes) behavior depends on the mode.

Moreover, other tests in this repo imply Unicode mode which directly conflicts with this change:

"comment": "Optional because .Net doesn't correctly handle 32-bit Unicode characters",
"schema": { "pattern": "^🐲*$" },

Refs:

  1. JSON Schema Core, 6.4. Regular Expressions.
  2. ECMAScript 2020, 21.2.3.2.2 Runtime Semantics: RegExpInitialize, steps 6-8:
    different U parameter in Pattern parse goal depending on the u regexp flag.
  3. ECMAScript 2020, 21.2.1 Patterns:
    Pattern -> Disjunction -> Alternative -> Term -> Atom -> AtomEscape -> CharacterEscape -> IdentityEscape:
    different allowed chars depending on the U parameter.

\Z is not always valid according to ECMA 262 regular expression dialect, its validity depends on mode.
Demanding it to be always valid here doesn't seem reasonable, I think (and conflicts with other tests).

@ChALkeR
Copy link
Member

ChALkeR commented Aug 15, 2020

That said, a js-based impl can trivially pass both the modified test here and the unicode regex test by trying to use Unicode mode, and if that throws on RegExp construction, using the non-Unicode mode.

And, arguably, that impl should also pass these tests, unless the mode is explicitly clarified in the JSON Schema spec.

Hence I don't think that there should be any tests for \Z at all unless the spec is clarified.

Also note that those tests won't be independent though, i.e. if \Z passes, 🐲* passes, one still wouldn't be able to use \Z🐲*:

> /Z🐲*/u.test('Z')
true
> /\Z🐲*/u.test('Z')
Thrown:
SyntaxError: Invalid regular expression: /\Z🐲*/: Invalid escape
> /\Z🐲*/.test('Z')
false

@ssilverman
Copy link
Member Author

@ChALkeR I agree with everything you're saying, but you know more about JS than I do :)

@ssilverman
Copy link
Member Author

I put back the original test. @ChALkeR feel free to suggest what you think this should look like; I'm happy to change to be what you think is more correct.

@ssilverman ssilverman requested review from ChALkeR and Julian August 15, 2020 05:00
@ssilverman
Copy link
Member Author

I don't think any of these Javascript-specific regex tests are very useful, especially if, as @ChALkeR points out, different modes may need to be used, which makes this very implementation dependent. But... I could be wrong because I'm no Javascript expert. I was just playing around with them on some regex sites and thought the tests could be improved. Maybe not?

The old test was incorrect because "\Z" is actually valid. It matches a
literal Z in ECMA 262.
@Julian
Copy link
Member

Julian commented Aug 15, 2020

If this is ambiguity that needs resolving in the spec, yes, let's axe the relevant tests entirely, get the spec clarified, and then add them to the future draft, similar to #309.

Sound reasonable?

@ssilverman
Copy link
Member Author

Wondering if the "\S" tests are in the same boat?

@Julian
Copy link
Member

Julian commented Aug 16, 2020

Seems likely if they too differ under those two different sections.

@ssilverman
Copy link
Member Author

ssilverman commented Aug 22, 2020

I'm inclined to just remove the "\Z" test instead. See #428

@Julian
Copy link
Member

Julian commented Aug 22, 2020

lgtmed that PR since it sounds like we'd all agree with that one. Obviously folks speak up if that's not the case but it sounds like all these need confirming/clarifying in the spec and then we follow what the decision is.

@ssilverman
Copy link
Member Author

Closing for now in favour of #428

@ssilverman ssilverman closed this Aug 22, 2020
@ssilverman
Copy link
Member Author

I went through the blame history and it turns out this “\Z” test was copied down the chain since Draft-04, and the thinking around regex parsing may have been different.

@ChALkeR
Copy link
Member

ChALkeR commented Aug 23, 2020

@ssilverman Sorry for the delay. Yes, just removing those \Z tests seems as a reasonable solution to me, until the spec is clarified.
I don't think that we can improve those in other way, as expecting either true or false there is controversial.

Re: \S/\s — those are well defined in both cases and should be present.

What differs between the modes is now unknown escapes are treated, e.g. \A does essentially the same thing as \Z — either acts as A in non-unicode more or thows in unicode mode:

> /\A/
/\A/
> /\A/u
Uncaught SyntaxError: Invalid regular expression: /\A/: Invalid escape

\S, on the other hand, is a known escape and denotes a character class (like \d, \D, \s, \S, \w, \W) in both unicode and non-unicode modes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants