diff --git a/spec.html b/spec.html index b272c6b8ab9..1455d2114d3 100644 --- a/spec.html +++ b/spec.html @@ -588,7 +588,6 @@

Terminal Symbols

In contrast, in the syntactic grammar, a contiguous run of fixed-width code points is a single terminal symbol.

Terminal symbols come in two other forms:

@@ -16277,179 +16276,48 @@

Syntax

Unicode Format-Control Characters

The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).

It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.

-

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see ) outside of comments, string literals, template literals, and regular expression literals.

+

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. These characters can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text, they are treated as white space characters (see ) outside of comments, string literals, template literals, and regular expression literals.

- +

White Space

White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a |StringLiteral|, a |RegularExpressionLiteral|, a |Template|, or a |TemplateSubstitutionTail| where they are considered significant code points forming part of a literal value. They may also occur within a |Comment|, but cannot appear within any other kind of token.

-

The ECMAScript white space code points are listed in .

- Code Points                                           Name                       Abbreviation
- `U+0009`                                              CHARACTER TABULATION       <TAB>
- `U+000B`                                              LINE TABULATION            <VT>
- `U+000C`                                              FORM FEED (FF)             <FF>
- `U+FEFF`                                              ZERO WIDTH NO-BREAK SPACE  <ZWNBSP>
- any code point in general category “Space_Separator”                             <USP>

U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) code points are part of <USP>.

-
- -

Other than for the code points listed in , ECMAScript |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).

-

Syntax

WhiteSpace ::
- <TAB>
- <VT>
- <FF>
- <ZWNBSP>
- <USP>
+ <U+0009 (CHARACTER TABULATION)>
+ <U+000B (LINE TABULATION)>
+ <U+000C (FORM FEED)>
+ <U+0020 (SPACE)>
+ <U+00A0 (NO-BREAK SPACE)>
+ <U+FEFF (ZERO WIDTH NO-BREAK SPACE)>
+ > any code point with the Unicode General_Category “Space_Separator”
+
+

Other than for some of the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).
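The revised |WhiteSpace| production can be read as a predicate over a single code point. The following is a minimal sketch under that reading, not normative text; `isWhiteSpace` and `EXPLICIT_WHITE_SPACE` are hypothetical names introduced only for illustration, and the “Space_Separator” check relies on the engine's own Unicode data through a property escape.

```javascript
// Hypothetical helper: mirrors the alternatives of the WhiteSpace production.
const EXPLICIT_WHITE_SPACE = new Set([
  0x0009, // CHARACTER TABULATION
  0x000b, // LINE TABULATION
  0x000c, // FORM FEED
  0x0020, // SPACE
  0x00a0, // NO-BREAK SPACE
  0xfeff, // ZERO WIDTH NO-BREAK SPACE
]);

function isWhiteSpace(codePoint) {
  // Explicit alternatives listed in the production.
  if (EXPLICIT_WHITE_SPACE.has(codePoint)) return true;
  // Last alternative: General_Category "Space_Separator" (Zs).
  return /^\p{Zs}$/u.test(String.fromCodePoint(codePoint));
}

console.log(isWhiteSpace(0x3000)); // IDEOGRAPHIC SPACE, which is in Zs
console.log(isWhiteSpace(0x0085)); // NEXT LINE: has White_Space, but is not in Zs
```

U+0085 (NEXT LINE) is the kind of code point the note describes: it carries the Unicode “White_Space” property but is neither an explicit alternative nor in “Zs”, so the sketch rejects it.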

+
- +

Line Terminators

-

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (). A line terminator cannot occur within any token except a |StringLiteral|, |Template|, or |TemplateSubstitutionTail|. <LF> and <CR> line terminators cannot occur within a |StringLiteral| token except as part of a |LineContinuation|.

+

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (). A line terminator cannot occur within any token except a |StringLiteral|, |Template|, or |TemplateSubstitutionTail|. U+000A (LINE FEED) and U+000D (CARRIAGE RETURN) line terminators cannot occur within a |StringLiteral| token except as part of a |LineContinuation|.

A line terminator can occur within a |MultiLineComment| but cannot occur within a |SingleLineComment|.

Line terminators are included in the set of white space code points that are matched by the `\\s` class in regular expressions.

-

The ECMAScript line terminator code points are listed in .

- Code Point  Unicode Name          Abbreviation
- `U+000A`    LINE FEED (LF)        <LF>
- `U+000D`    CARRIAGE RETURN (CR)  <CR>
- `U+2028`    LINE SEPARATOR        <LS>
- `U+2029`    PARAGRAPH SEPARATOR   <PS>

Only the Unicode code points in are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they meet the requirements listed in . The sequence <CR><LF> is commonly used as a line terminator. It should be considered a single |SourceCharacter| for the purpose of reporting line numbers.

+

Only the Unicode code point sequences matched by |LineTerminatorSequence| are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they are matched by |WhiteSpace|. The sequence « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » is commonly used as a line terminator. It should be considered a single |SourceCharacter| for the purpose of reporting line numbers.

Syntax

LineTerminator ::
- <LF>
- <CR>
- <LS>
- <PS>
+ <U+000A (LINE FEED)>
+ <U+000D (CARRIAGE RETURN)>
+ <U+2028 (LINE SEPARATOR)>
+ <U+2029 (PARAGRAPH SEPARATOR)>

LineTerminatorSequence ::
- <LF>
- <CR> [lookahead != <LF>]
- <LS>
- <PS>
- <CR> <LF>
+ <U+000A (LINE FEED)>
+ <U+000D (CARRIAGE RETURN)> [lookahead != <U+000A (LINE FEED)>]
+ <U+2028 (LINE SEPARATOR)>
+ <U+2029 (PARAGRAPH SEPARATOR)>
+ <U+000D (CARRIAGE RETURN)> <U+000A (LINE FEED)>
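The note above says that CARRIAGE RETURN followed by LINE FEED should count as a single terminator when reporting line numbers. A rough sketch of that counting rule follows; `LINE_TERMINATOR_SEQUENCE` and `countLines` are hypothetical names, and the regular expression is an illustration of |LineTerminatorSequence|, not how any particular engine tracks lines.

```javascript
// One LineTerminatorSequence. The CR LF alternative is tried first, so the
// pair is consumed as one terminator instead of two.
const LINE_TERMINATOR_SEQUENCE = /\r\n|[\n\r\u2028\u2029]/g;

// Hypothetical helper: number of lines in a piece of source text.
function countLines(sourceText) {
  const terminators = sourceText.match(LINE_TERMINATOR_SEQUENCE);
  return (terminators ? terminators.length : 0) + 1;
}

console.log(countLines("a\r\nb")); // CR LF counts once
console.log(countLines("a\u0085b")); // NEXT LINE is not a line terminator here
```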
@@ -16560,10 +16428,10 @@

Syntax

`A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z`

UnicodeIDStart ::
- > any Unicode code point with the Unicode property “ID_Start”
+ > any code point with the Unicode property “ID_Start”

UnicodeIDContinue ::
- > any Unicode code point with the Unicode property “ID_Continue”
+ > any code point with the Unicode property “ID_Continue”

The definition of the nonterminal |UnicodeEscapeSequence| is given in .
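For readers who want to probe the |UnicodeIDStart| and |UnicodeIDContinue| productions, the two Unicode properties can be tested with property escapes. This is a small sketch only; `ID_START` and `ID_CONTINUE` are names chosen here for illustration, and the results depend on the Unicode version the engine ships.

```javascript
// Each pattern matches exactly one code point that has the named property.
const ID_START = /^\p{ID_Start}$/u;
const ID_CONTINUE = /^\p{ID_Continue}$/u;

console.log(ID_START.test("a")); // letters have ID_Start
console.log(ID_START.test("1")); // digits do not have ID_Start
console.log(ID_CONTINUE.test("1")); // digits do have ID_Continue
```

`$` and `_` do not have “ID_Start”; the identifier grammar admits them through separate alternatives, which this sketch does not model.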

@@ -17131,7 +16999,7 @@

Syntax

The definition of the nonterminal |HexDigit| is given in . |SourceCharacter| is defined in .

-

<LF> and <CR> cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n` or `\\u000A`.

+

U+000A (LINE FEED) and U+000D (CARRIAGE RETURN) cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n`, `\\x0A`, or `\\u{A}`.
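The three escape forms named in the note all yield the same one-code-unit String value, and a |LineContinuation| contributes nothing. A short illustration, with variable names chosen only for this example:

```javascript
// Three spellings of U+000A (LINE FEED) inside a string literal.
const viaSingleEscape = "\n";
const viaHexEscape = "\x0A";
const viaCodePointEscape = "\u{A}";

// A LineContinuation: the backslash and the line terminator after it
// produce the empty code point sequence, so the value is "abcd".
const continued = "ab\
cd";

console.log(viaSingleEscape === viaHexEscape && viaHexEscape === viaCodePointEscape);
console.log(continued);
```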

@@ -17202,23 +17070,17 @@

Static Semantics: SV ( ): a String

The SV of EscapeSequence :: `0` is the String value consisting of the code unit 0x0000 (NULL).
- The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the String value consisting of the code unit whose numeric value is determined by the |SingleEscapeCharacter| according to .
+ The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the String value consisting of the single code unit associated with |SingleEscapeCharacter| according to .

- Escape Sequence
- Code Unit Value
- Unicode Character Name
+ |SingleEscapeCharacter|
- Symbol
+ Code Unit

@@ -17226,13 +17088,7 @@

`\\b`
- `0x0008`
- BACKSPACE
- <BS>
+ 0x0008 (BACKSPACE)

@@ -17240,13 +17096,7 @@

`\\t`
- `0x0009`
- CHARACTER TABULATION
- <HT>
+ 0x0009 (CHARACTER TABULATION)

@@ -17254,13 +17104,7 @@

`\\n`
- `0x000A`
- LINE FEED (LF)
- <LF>
+ 0x000A (LINE FEED)

@@ -17268,13 +17112,7 @@

`\\v`
- `0x000B`
- LINE TABULATION
- <VT>
+ 0x000B (LINE TABULATION)

@@ -17282,13 +17120,7 @@

`\\f`
- `0x000C`
- FORM FEED (FF)
- <FF>
+ 0x000C (FORM FEED)

@@ -17296,13 +17128,7 @@

`\\r`
- `0x000D`
- CARRIAGE RETURN (CR)
- <CR>
+ 0x000D (CARRIAGE RETURN)

@@ -17310,13 +17136,7 @@

`\\"`
- `0x0022`
- QUOTATION MARK
- `"`
+ 0x0022 (QUOTATION MARK)

@@ -17324,13 +17144,7 @@

`\\'`
- `0x0027`
- APOSTROPHE
- `'`
+ 0x0027 (APOSTROPHE)

@@ -17338,13 +17152,7 @@

`\\\\`
- `0x005C`
- REVERSE SOLIDUS
- `\\`
+ 0x005C (REVERSE SOLIDUS)

    @@ -17719,7 +17527,7 @@

    Static Semantics: TRV ( ): a String

    -

    TV excludes the code units of |LineContinuation| while TRV includes them. <CR><LF> and <CR> |LineTerminatorSequence|s are normalized to <LF> for both TV and TRV. An explicit |TemplateEscapeSequence| is needed to include a <CR> or <CR><LF> sequence.

    +

    TV excludes the code units of |LineContinuation| while TRV includes them. « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » and « U+000D (CARRIAGE RETURN) » |LineTerminatorSequence|s are normalized to « U+000A (LINE FEED) » for both TV and TRV. An explicit |TemplateEscapeSequence| is needed to include a « U+000D (CARRIAGE RETURN) » or « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » sequence.
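The normalization described in this note can be observed from script. The sketch below is one way to do so, under the assumption that `new Function` is an acceptable way to get a real CARRIAGE RETURN and LINE FEED into source text; the variable names are illustrative only.

```javascript
// The outer string literals turn \r\n into real CR and LF code units, so the
// function bodies contain an unescaped CR LF inside a template literal.
const cooked = new Function("return `a\r\nb`;")();
const raw = new Function("return String.raw`a\r\nb`;")();

// Here the function bodies contain the escape sequences \r and \n instead.
const escapedCooked = new Function("return `a\\r\\nb`;")();
const escapedRaw = new Function("return String.raw`a\\r\\nb`;")();

console.log(JSON.stringify(cooked)); // CR LF normalized to LF in TV
console.log(JSON.stringify(raw)); // CR LF normalized to LF in TRV as well
console.log(JSON.stringify(escapedCooked)); // explicit escapes keep CR LF in TV
```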

    @@ -33126,42 +32934,14 @@

    Expanded Years

    - +

    Time Zone Offset String Format

    ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601. The format is described by the following grammar.
    - The usage of Unicode code points in this grammar is listed in .

    - Code Point  Unicode Name  Abbreviation
    - `U+2212`    MINUS SIGN    <MINUS>

    Syntax

    UTCOffset :::

    @@ -33170,11 +32950,9 @@

    Syntax

    TemporalSign Hour HourSubcomponents[~Extended]

    TemporalSign :::
    - ASCIISign
    - <MINUS>
    -
    - ASCIISign ::: one of
    - `+` `-`
    + `+`
    + `-`
    + <U+2212 (MINUS SIGN)>

    Hour :::
    `0` DecimalDigit

    @@ -35942,42 +35720,24 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    CharacterEscape :: ControlEscape
    - 1. Return the numeric value according to .
    + 1. Return the numeric value of the code point associated with |ControlEscape| in .

    - ControlEscape
    - Numeric Value
    + |ControlEscape|
    Code Point
    - Unicode Name
    - Symbol

    `t`
    - 9
    - `U+0009`
    - CHARACTER TABULATION
    - <HT>
    + U+0009 (CHARACTER TABULATION)

    @@ -35985,16 +35745,7 @@

    `n`
    - 10
    - `U+000A`
    - LINE FEED (LF)
    - <LF>
    + U+000A (LINE FEED)

    @@ -36002,16 +35753,7 @@

    `v`
    - 11
    - `U+000B`
    - LINE TABULATION
    - <VT>
    + U+000B (LINE TABULATION)

    @@ -36019,16 +35761,7 @@

    `f`
    - 12
    - `U+000C`
    - FORM FEED (FF)
    - <FF>
    + U+000C (FORM FEED)

    @@ -36036,16 +35769,7 @@

    `r`
    - 13
    - `U+000D`
    - CARRIAGE RETURN (CR)
    - <CR>
    + U+000D (CARRIAGE RETURN)

    @@ -36061,7 +35785,7 @@

    Static Semantics: CharacterValue ( ): a non-negative integer

    1. Return the numeric value of U+0000 (NULL). -

    `\\0` represents the <NUL> character and cannot be followed by a decimal digit.

    +

    `\\0` represents U+0000 (NULL) and cannot be followed by a decimal digit.
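The |ControlEscape| mapping and the `\\0` rule can be spot-checked against a regular expression engine. This is a sketch under stated assumptions: `CONTROL_ESCAPES`, `matchesOwnCodePoint`, and `nullFollowedByDigitIsRejected` are names invented for this example, and the `u` flag is used so that legacy octal escapes do not mask the restriction on `\\0`.

```javascript
// Code points associated with each ControlEscape letter, as listed in the table.
const CONTROL_ESCAPES = { t: 0x0009, n: 0x000a, v: 0x000b, f: 0x000c, r: 0x000d };

// Hypothetical helper: does the pattern \<letter> match exactly that code point?
function matchesOwnCodePoint(letter, codePoint) {
  const pattern = new RegExp("^\\" + letter + "$", "u");
  return pattern.test(String.fromCodePoint(codePoint));
}

// Hypothetical helper: in Unicode mode, \0 followed by a decimal digit
// should be rejected when the pattern is compiled.
function nullFollowedByDigitIsRejected() {
  try {
    new RegExp("\\01", "u");
    return false;
  } catch (error) {
    return error instanceof SyntaxError;
  }
}

for (const [letter, codePoint] of Object.entries(CONTROL_ESCAPES)) {
  console.log(letter, matchesOwnCodePoint(letter, codePoint));
}
console.log(/^\0$/u.test("\u0000"), nullFollowedByDigitIsRejected());
```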

    CharacterEscape :: HexEscapeSequence @@ -49631,7 +49355,6 @@

    Number Conversions

    Time Zone Offset String Format

    -