[lex.charset] Extract universal-character-name grammar to new subclause

AlisdairM · AlisdairM · commit df16469f45c5 · 2024-06-17T10:19:40.000+07:00
The grammar for universal-character-name is oddly sandwiched into the
middle of the subclause talking about the different character sets used
by the standard.  To improve the flow, extract that grammar into its own
subclause.

In the extraction, I make two other clarifying changes.  First, describe
this new subclause as 'a way to name any element in the translation
character set using just the basic character set' rather than simply
'a way to name other characters'.  Second, remove the 'one of' in the
grammar where there is only one option to choose.
diff --git a/source/lex.tex b/source/lex.tex
@@ -320,11 +320,69 @@
 \end{floattable}
 
 \pnum
-The \grammarterm{universal-character-name} construct provides a way to name
-other characters.
+The \defnadj{basic literal}{character set} consists of
+all characters of the basic character set,
+plus the control characters specified in \tref{lex.charset.literal}.
+
+\begin{floattable}{Additional control characters in the basic literal character set}{lex.charset.literal}{ll}
+\topline
+\ohdrx{2}{character} \\ \capsep
+\ucode{0000} & \uname{null} \\
+\ucode{0007} & \uname{alert} \\
+\ucode{0008} & \uname{backspace} \\
+\ucode{000d} & \uname{carriage return} \\
+\end{floattable}
+
+\pnum
+A \defn{code unit} is an integer value
+of character type\iref{basic.fundamental}.
+Characters in a \grammarterm{character-literal}
+other than a multicharacter or non-encodable character literal or
+in a \grammarterm{string-literal} are encoded as
+a sequence of one or more code units, as determined
+by the \grammarterm{encoding-prefix}\iref{lex.ccon,lex.string};
+this is termed the respective \defnadj{literal}{encoding}.
+The \defnadj{ordinary literal}{encoding} is
+the encoding applied to an ordinary character or string literal.
+The \defnadj{wide literal}{encoding} is the encoding applied
+to a wide character or string literal.
+
+\pnum
+A literal encoding or a locale-specific encoding of one of
+the execution character sets\iref{character.seq}
+encodes each element of the basic literal character set as
+a single code unit with non-negative value,
+distinct from the code unit for any other such element.
+\begin{note}
+A character not in the basic literal character set
+can be encoded with more than one code unit;
+the value of such a code unit can be the same as
+that of a code unit for an element of the basic literal character set.
+\end{note}
+\indextext{character!null}%
+\indextext{wide-character!null}%
+The \unicode{0000}{null} character is encoded as the value \tcode{0}.
+No other element of the translation character set
+is encoded with a code unit of value \tcode{0}.
+The code unit value of each decimal digit character after the digit \tcode{0} (\ucode{0030})
+shall be one greater than the value of the previous.
+The ordinary and wide literal encodings are otherwise
+\impldef{ordinary and wide literal encodings}.
+\indextext{UTF-8}%
+\indextext{UTF-16}%
+\indextext{UTF-32}%
+For a UTF-8, UTF-16, or UTF-32 literal,
+the implementation shall encode
+the Unicode scalar value
+corresponding to each character of the translation character set
+as specified in the Unicode Standard
+for the respective Unicode encoding form.
+\indextext{character set|)}
+
+\rSec1[lex.universal.char]{Universal Character Names}
 
 \begin{bnf}
-\nontermdef{n-char} \textnormal{one of}\br
+\nontermdef{n-char}\br
      \textnormal{any member of the translation character set except the \unicode{007d}{right curly bracket} or new-line character}
 \end{bnf}
 
@@ -358,6 +416,10 @@
     named-universal-character
 \end{bnf}
 
+\pnum
+The \grammarterm{universal-character-name} construct provides a way to name
+any element in the translation character set using just the basic character set.
+
 \pnum
 A \grammarterm{universal-character-name}
 of the form \tcode{\textbackslash u} \grammarterm{hex-quad},
@@ -399,66 +461,6 @@
 \grammarterm{universal-character-name}.
 \end{note}
 
-\pnum
-The \defnadj{basic literal}{character set} consists of
-all characters of the basic character set,
-plus the control characters specified in \tref{lex.charset.literal}.
-
-\begin{floattable}{Additional control characters in the basic literal character set}{lex.charset.literal}{ll}
-\topline
-\ohdrx{2}{character} \\ \capsep
-\ucode{0000} & \uname{null} \\
-\ucode{0007} & \uname{alert} \\
-\ucode{0008} & \uname{backspace} \\
-\ucode{000d} & \uname{carriage return} \\
-\end{floattable}
-
-\pnum
-A \defn{code unit} is an integer value
-of character type\iref{basic.fundamental}.
-Characters in a \grammarterm{character-literal}
-other than a multicharacter or non-encodable character literal or
-in a \grammarterm{string-literal} are encoded as
-a sequence of one or more code units, as determined
-by the \grammarterm{encoding-prefix}\iref{lex.ccon,lex.string};
-this is termed the respective \defnadj{literal}{encoding}.
-The \defnadj{ordinary literal}{encoding} is
-the encoding applied to an ordinary character or string literal.
-The \defnadj{wide literal}{encoding} is the encoding applied
-to a wide character or string literal.
-
-\pnum
-A literal encoding or a locale-specific encoding of one of
-the execution character sets\iref{character.seq}
-encodes each element of the basic literal character set as
-a single code unit with non-negative value,
-distinct from the code unit for any other such element.
-\begin{note}
-A character not in the basic literal character set
-can be encoded with more than one code unit;
-the value of such a code unit can be the same as
-that of a code unit for an element of the basic literal character set.
-\end{note}
-\indextext{character!null}%
-\indextext{wide-character!null}%
-The \unicode{0000}{null} character is encoded as the value \tcode{0}.
-No other element of the translation character set
-is encoded with a code unit of value \tcode{0}.
-The code unit value of each decimal digit character after the digit \tcode{0} (\ucode{0030})
-shall be one greater than the value of the previous.
-The ordinary and wide literal encodings are otherwise
-\impldef{ordinary and wide literal encodings}.
-\indextext{UTF-8}%
-\indextext{UTF-16}%
-\indextext{UTF-32}%
-For a UTF-8, UTF-16, or UTF-32 literal,
-the implementation shall encode
-the Unicode scalar value
-corresponding to each character of the translation character set
-as specified in the Unicode Standard
-for the respective Unicode encoding form.
-\indextext{character set|)}
-
 \rSec1[lex.pptoken]{Preprocessing tokens}
 
 \indextext{token!preprocessing|(}%