Clarify that \R matches UTS#18 definition of line boundaries #1

Merged (1 commit) on Aug 28, 2021

Conversation

mathiasbynens
Contributor

This doesn’t affect what’s being proposed, but it does make it explicit that this matches UTS#18. This seems nice, especially since the RegExp v flag proposal has a goal of matching UTS#18 more and more (see tc39/proposal-regexp-v-flag#42 and in particular tc39/proposal-regexp-v-flag#43), and this kind of directional alignment between proposals seems desirable.

@markusicu markusicu left a comment

+1 tnx

@RunDevelopment

I have a question regarding the equivalent regex.

From what I understand, the goal of the UTS#18 Line Boundaries section is to treat CRLF as if it were a single character. However, I believe that neither the current equivalent regex in this proposal (`(?>\r\n?|[\x0A-\x0C\x85\u{2028}\u{2029}])`) nor the one in UTS#18 (`(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}])`) fulfills that goal.

The problem I see is that they still match in between CR and LF (see tc39/proposal-regexp-v-flag#42). Consider `/^\r\R$/u.test("\r\n")`. According to the current proposal and UTS#18, this will return true. I think this is inconsistent with the behavior of the line boundary assertions (`^` and `$`) in UTS#18, where the position in between CR and LF is explicitly accounted for.
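To make the concern concrete, here is a rough sketch in today's JavaScript, which has neither `\R` nor atomic groups. The names `UTS18_R`, `PROPOSAL_R`, and `crThenR` are hypothetical. The first pattern is a hand translation of the UTS#18 expression. The second emulates the atomic group `(?>…)` with a lookahead plus backreference, which is an assumed stand-in and not something the proposal specifies.

```javascript
// Hand-translated source for the UTS#18 equivalent of \R (hypothetical name).
const UTS18_R = String.raw`(?:\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029])`;

// The proposal's (?>\r\n?|[\x0A-\x0C\x85\u{2028}\u{2029}]), with the atomic
// group emulated as (?=(...))\1 because JavaScript has no (?>...) syntax.
const PROPOSAL_R = String.raw`(?=(\r\n?|[\x0A-\x0C\x85\u2028\u2029]))\1`;

// Stand-in for /^\r\R$/u: a literal CR followed by exactly one "\R" match.
const crThenR = (r) => new RegExp(`^\\r${r}$`, "u");

const uts18MatchesInsideCrLf = crThenR(UTS18_R).test("\r\n");
const proposalMatchesInsideCrLf = crThenR(PROPOSAL_R).test("\r\n");

// Both emulations let the "\R" part consume the LF of a CRLF pair on its own.
console.log(uts18MatchesInsideCrLf, proposalMatchesInsideCrLf); // true true
```

In both cases the literal `\r` consumes the CR and the emulated `\R` then matches the remaining LF, which is the "match in between CR and LF" described above.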

Shouldn't the equivalent regexes for this proposal and UTS#18 be `(?>\r\n?|(?<!\r)\n|[\x0B-\x0C\x85\u{2028}\u{2029}])` and `(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}](?<!\u{D A}))` respectively?
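A minimal sketch of how the two suggested expressions behave, again hand-translated into current JavaScript syntax. The names `FIXED_UTS18_R`, `FIXED_PROPOSAL_R`, and `check` are hypothetical, and the atomic group is approximated with `(?=(...))\1`, so treat this as an illustration under those assumptions and not as the proposal's semantics.

```javascript
// UTS#18 expression with the suggested trailing lookbehind added.
const FIXED_UTS18_R = String.raw`(?:\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029](?<!\r\n))`;

// Proposal expression with (?<!\r)\n; atomic group emulated via (?=(...))\1.
const FIXED_PROPOSAL_R = String.raw`(?=(\r\n?|(?<!\r)\n|[\x0B-\x0C\x85\u2028\u2029]))\1`;

// Run one emulated "\R" against a few inputs.
function check(r) {
  const whole = new RegExp(`^${r}$`, "u");      // stand-in for /^\R$/u
  const afterCr = new RegExp(`^\\r${r}$`, "u"); // stand-in for /^\r\R$/u
  return {
    crlfAsOneUnit: whole.test("\r\n"),
    loneLf: whole.test("\n"),
    loneCr: whole.test("\r"),
    lfInsideCrLf: afterCr.test("\r\n"),
  };
}

const fixedUts18 = check(FIXED_UTS18_R);
const fixedProposal = check(FIXED_PROPOSAL_R);
console.log(fixedUts18, fixedProposal);
```

Under these emulations, CRLF, a lone LF, and a lone CR still match as a single unit, while the LF that follows a CR is no longer matched on its own (`lfInsideCrLf` is `false` for both).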

@rbuckton
Owner

UTS#18 affects how `^`, `$`, and `.` match as well. I would argue that `\R` should be consistent with how `^` and `$` (at least) match depending on mode. If `^` or `$` can match in between CR and LF with the `u` flag, but not with the `v` flag, then perhaps `\R` should match the behavior in each mode (i.e., allow matching in between CR and LF in `u` mode but not in `v` mode). That said, `\R` is a new construct under either mode, so we could diverge from `^` and `$` if that's reasonable.
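For reference, a small sketch of how `^` and `$` treat the position between CR and LF in today's ECMAScript when the `m` and `u` flags are set, where CR and LF each count as a line terminator on their own. The variable names are illustrative only, and nothing here says how the `v` flag will behave.

```javascript
// With the m flag, ^ matches right after any line terminator, including a lone CR.
const caretBetweenCrLf = /^\n/mu.test("\r\n");

// With the m flag, $ matches right before any line terminator, including a lone LF.
const dollarBetweenCrLf = /\r$/mu.test("\r\n");

// As a result, /^$/mu finds an "empty line" at index 2, between the CR and the LF.
const emptyLineIndex = /^$/mu.exec("a\r\nb").index;

console.log(caretBetweenCrLf, dollarBetweenCrLf, emptyLineIndex); // true true 2
```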

@rbuckton
Owner

@RunDevelopment I've referenced your comment in #2 so that it can be discussed further outside of this PR.

@rbuckton rbuckton merged commit 0aae147 into rbuckton:main Aug 28, 2021
@mathiasbynens mathiasbynens deleted the patch-1 branch August 31, 2021 16:39
4 participants