diff --git a/spec.html b/spec.html
index b272c6b8ab9..1455d2114d3 100644
--- a/spec.html
+++ b/spec.html
@@ -588,7 +588,6 @@
In contrast, in the syntactic grammar, a contiguous run of fixed-width code points is a single terminal symbol.
Terminal symbols come in two other forms:
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.
-U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see
U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. These characters can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text, they are treated as white space characters (see
White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a |StringLiteral|, a |RegularExpressionLiteral|, a |Template|, or a |TemplateSubstitutionTail| where they are considered significant code points forming part of a literal value. They may also occur within a |Comment|, but cannot appear within any other kind of token.
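The behaviour described above can be sketched in a few lines. This is only an illustration, not part of the change: the helper name `evalWithSeparator` is hypothetical, and indirect `eval` is used solely to build source text that contains the raw code points.

```javascript
// Hypothetical helper: evaluates the source text `1<sep>+<sep>2`, where <sep>
// is placed between the tokens as a raw code point.
function evalWithSeparator(sep) {
  return (0, eval)("1" + sep + "+" + sep + "2");
}

const feffResult = evalWithSeparator("\uFEFF"); // ZERO WIDTH NO-BREAK SPACE
const tabResult = evalWithSeparator("\u0009");  // CHARACTER TABULATION
const nbspResult = evalWithSeparator("\u00A0"); // NO-BREAK SPACE (category Zs)

// Inside a string literal the same code point is significant: it becomes
// part of the literal's value instead of being skipped.
const literalLength = (0, eval)('"a\uFEFFb"').length;

console.log(feffResult, tabResult, nbspResult, literalLength);
```

Each separator is skipped between tokens, while the U+FEFF inside the string literal counts towards its length.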
-The ECMAScript white space code points are listed in
-| Code Points | Name | Abbreviation |
-|---|---|---|
-| `U+0009` | CHARACTER TABULATION | <TAB> |
-| `U+000B` | LINE TABULATION | <VT> |
-| `U+000C` | FORM FEED (FF) | <FF> |
-| `U+FEFF` | ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> |
-| any code point in general category “Space_Separator” | | <USP> |
U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) code points are part of <USP>.
-Other than for the code points listed in
Other than for some of the code points listed as explicit alternatives in |WhiteSpace|, |WhiteSpace| intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”).
+Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (
Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (
A line terminator can occur within a |MultiLineComment| but cannot occur within a |SingleLineComment|.
Line terminators are included in the set of white space code points that are matched by the `\\s` class in regular expressions.
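As a rough check of the statements above, the sketch below tests individual code points against `\s`. The array and function names are illustrative only. U+0085 (NEXT LINE) is used as an example of a code point that has the Unicode “White_Space” property but is not in “Space_Separator”.

```javascript
// Code points named in the text as white space or as line terminators.
const whiteSpace = ["\u0009", "\u000B", "\u000C", "\uFEFF", "\u0020", "\u00A0"];
const lineTerminators = ["\u000A", "\u000D", "\u2028", "\u2029"];

// True when the single code point `ch` is matched by the `\s` class.
const matchesS = (ch) => /^\s$/.test(ch);

const allWhiteSpaceMatch = whiteSpace.every(matchesS);
const allLineTerminatorsMatch = lineTerminators.every(matchesS);

// U+0085 (NEXT LINE) has the White_Space property but is not in category Zs,
// so it is expected to fall outside `\s`.
const nextLineMatches = matchesS("\u0085");

console.log(allWhiteSpaceMatch, allLineTerminatorsMatch, nextLineMatches);
```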
-The ECMAScript line terminator code points are listed in
-| Code Point | Unicode Name | Abbreviation |
-|---|---|---|
-| `U+000A` | LINE FEED (LF) | <LF> |
-| `U+000D` | CARRIAGE RETURN (CR) | <CR> |
-| `U+2028` | LINE SEPARATOR | <LS> |
-| `U+2029` | PARAGRAPH SEPARATOR | <PS> |
Only the Unicode code points in
Only the Unicode code point sequences matched by |LineTerminatorSequence| are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they are matched by |WhiteSpace|. The sequence « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » is commonly used as a line terminator. It should be considered a single |SourceCharacter| for the purpose of reporting line numbers.
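One way a tool might apply the line-numbering guidance above is sketched here. The function name `countLines` is hypothetical, and the regular expression is an assumption about how to approximate |LineTerminatorSequence|, not text taken from the specification.

```javascript
// Counts the lines in `source`, treating CR LF as one terminator.
// The alternation tries "\r\n" first so that the pair is never counted twice.
function countLines(source) {
  const terminators = source.match(/\r\n|[\n\r\u2028\u2029]/g) || [];
  return terminators.length + 1;
}

const mixedLines = countLines("a\r\nb\nc\u2028d"); // CR LF, LF, and LS
const loneCrLines = countLines("a\rb");            // a lone CR still ends a line
const nextLineLines = countLines("a\u0085b");      // U+0085 is not a line terminator

console.log(mixedLines, loneCrLines, nextLineLines);
```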
The definition of the nonterminal |UnicodeEscapeSequence| is given in
The definition of the nonterminal |HexDigit| is given in
<LF> and <CR> cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n` or `\\u000A`.
+U+000A (LINE FEED) and U+000D (CARRIAGE RETURN) cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n`, `\\x0A`, or `\\u{A}`.
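The following sketch illustrates the rule above; the variable names are illustrative only. It compares the three escape forms mentioned, then uses indirect `eval` to show that a raw U+000A inside a string literal is rejected, while the same code point in a |LineContinuation| contributes nothing to the value.

```javascript
// Three escape sequences that all denote U+000A (LINE FEED).
const viaSingleEscape = "\n";
const viaHexEscape = "\x0A";
const viaCodePointEscape = "\u{A}";
const escapesAgree =
  viaSingleEscape === viaHexEscape && viaHexEscape === viaCodePointEscape;

// A raw LINE FEED inside a string literal is a syntax error. The outer
// single-quoted string uses `\n` to place a real U+000A in the source text.
let rawLineFeedRejected = false;
try {
  (0, eval)('"a\nb"');
} catch (e) {
  rawLineFeedRejected = e instanceof SyntaxError;
}

// A backslash followed by a raw LINE FEED is a LineContinuation and
// produces the empty sequence.
const continued = (0, eval)('"a\\\nb"');

console.log(escapesAgree, rawLineFeedRejected, continued);
```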
-| Escape Sequence | Code Unit Value | Unicode Character Name | Symbol |
-|---|---|---|---|
-| | `0x0008` | BACKSPACE | <BS> |
-| | `0x0009` | CHARACTER TABULATION | <HT> |
-| | `0x000A` | LINE FEED (LF) | <LF> |
-| | `0x000B` | LINE TABULATION | <VT> |
-| | `0x000C` | FORM FEED (FF) | <FF> |
-| | `0x000D` | CARRIAGE RETURN (CR) | <CR> |
-| | `0x0022` | QUOTATION MARK | `"` |
-| | `0x0027` | APOSTROPHE | `'` |
-| | `0x005C` | REVERSE SOLIDUS | `\\` |
+| |SingleEscapeCharacter| | Code Unit |
+|---|---|
+| | 0x0008 (BACKSPACE) |
+| | 0x0009 (CHARACTER TABULATION) |
+| | 0x000A (LINE FEED) |
+| | 0x000B (LINE TABULATION) |
+| | 0x000C (FORM FEED) |
+| | 0x000D (CARRIAGE RETURN) |
+| | 0x0022 (QUOTATION MARK) |
+| | 0x0027 (APOSTROPHE) |
+| | 0x005C (REVERSE SOLIDUS) |
TV excludes the code units of |LineContinuation| while TRV includes them. <CR><LF> and <CR> |LineTerminatorSequence|s are normalized to <LF> for both TV and TRV. An explicit |TemplateEscapeSequence| is needed to include a <CR> or <CR><LF> sequence.
+TV excludes the code units of |LineContinuation| while TRV includes them. « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » and « U+000D (CARRIAGE RETURN) » |LineTerminatorSequence|s are normalized to « U+000A (LINE FEED) » for both TV and TRV. An explicit |TemplateEscapeSequence| is needed to include a « U+000D (CARRIAGE RETURN) » or « U+000D (CARRIAGE RETURN), U+000A (LINE FEED) » sequence.
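The normalization described above can be observed with `String.raw`, which exposes the raw value (TRV), alongside an ordinary template evaluation, which yields the cooked value (TV). This is a sketch only: the variable names are illustrative, and indirect `eval` is used just to place raw CR and LF code points in the template source.

```javascript
// Template source containing a raw CR LF pair and a raw lone CR.
const source = "`a\r\nb\rc`";
const cooked = (0, eval)(source);             // TV
const raw = (0, eval)("String.raw" + source); // TRV

// An explicit escape sequence keeps the CR in the cooked value.
const escaped = (0, eval)("`a\\r\\nb`");

// A LineContinuation (backslash followed by a raw LF) is dropped from TV
// but kept in TRV.
const continuationSource = "`a\\\nb`";
const cookedContinuation = (0, eval)(continuationSource);
const rawContinuation = (0, eval)("String.raw" + continuationSource);

console.log(JSON.stringify([cooked, raw, escaped, cookedContinuation, rawContinuation]));
```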
ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601.
The format is described by the following grammar.
- The usage of Unicode code points in this grammar is listed in
-| Code Point | Unicode Name | Abbreviation |
-|---|---|---|
-| `U+2212` | MINUS SIGN | <MINUS> |
-| ControlEscape | Numeric Value | Code Point | Unicode Name | Symbol |
-|---|---|---|---|---|
-| `t` | 9 | `U+0009` | CHARACTER TABULATION | <HT> |
-| | 10 | `U+000A` | LINE FEED (LF) | <LF> |
-| | 11 | `U+000B` | LINE TABULATION | <VT> |
-| | 12 | `U+000C` | FORM FEED (FF) | <FF> |
-| | 13 | `U+000D` | CARRIAGE RETURN (CR) | <CR> |
+| |ControlEscape| | Code Point |
+|---|---|
+| `t` | U+0009 (CHARACTER TABULATION) |
+| | U+000A (LINE FEED) |
+| | U+000B (LINE TABULATION) |
+| | U+000C (FORM FEED) |
+| | U+000D (CARRIAGE RETURN) |
`\\0` represents the <NUL> character and cannot be followed by a decimal digit.
+`\\0` represents U+0000 (NULL) and cannot be followed by a decimal digit.
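A short sketch of this rule in a regular expression pattern follows. It assumes the `u` flag, because without that flag web-compatibility extensions may accept a digit after `\0` as a legacy octal escape. The variable names are illustrative only.

```javascript
// `\0` on its own matches U+0000 (NULL).
const matchesNull = /\0/u.test("\u0000");

// With the `u` flag, `\0` followed by a decimal digit is a syntax error.
let digitAfterNullRejected = false;
try {
  new RegExp("\\01", "u");
} catch (e) {
  digitAfterNullRejected = e instanceof SyntaxError;
}

console.log(matchesNull, digitAfterNullRejected);
```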