forked from cplusplus/draft
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathuax31.tex
132 lines (102 loc) · 4.33 KB
/
uax31.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
%!TEX root = std.tex
\infannex{uaxid}{Conformance with \UAX{31}}
\rSec1[uaxid.general]{General}
\pnum
This Annex describes the choices made in application of
\UAX{31} (``Unicode Identifier and Pattern Syntax'')
to \Cpp{} in terms of the requirements from \UAX{31} and
how they do or do not apply to this document.
In terms of \UAX{31},
this document conforms by meeting the requirements
R1 ``Default Identifiers'' and
R4 ``Equivalent Normalized Identifiers'' from \UAX{31}.
The other requirements from \UAX{31}, also listed below,
are either alternatives not taken or do not apply to this document.
\rSec1[uaxid.def]{R1 Default identifiers}
\rSec2[uaxid.def.general]{General}
\pnum
\UAX{31} specifies a default syntax for identifiers
based on properties from the Unicode Character Database, \UAX{44}.
The general syntax is
\begin{codeblock}
<Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*
\end{codeblock}
where \tcode{<Start>} has the XID_Start property,
\tcode{<Continue>} has the XID_Continue property, and
\tcode{<Medial>} is a list of characters permitted between continue characters.
For \Cpp{} we add the character \unicode{005f}{low line}, or \tcode{_},
to the set of permitted \tcode{<Start>} characters,
the \tcode{<Medial>} set is empty, and
the \tcode{<Continue>} characters are unmodified.
In the grammar used in \UAX{31}, this is
\begin{codeblock}
<Identifier> := <Start> <Continue>*
<Start> := XID_Start + @\textrm{\ucode{005f}}@
<Continue> := <Start> + XID_Continue
\end{codeblock}
\pnum
This is described in the \Cpp{} grammar in \ref{lex.name},
where \grammarterm{identifier} is formed from
\grammarterm{identifier-start} or
\grammarterm{identifier} followed by \grammarterm{identifier-continue}.
\rSec2[uaxid.def.rfmt]{R1a Restricted format characters}
\pnum
If an implementation of \UAX{31} wishes to allow format characters
such as \unicode{200d}{zero width joiner} or \unicode{200c}{zero width non-joiner}
it must define a profile allowing them, or
describe precisely which combinations are permitted.
\pnum
\Cpp{} does not allow format characters in identifiers, so this does not apply.
\rSec2[uaxid.def.stable]{R1b Stable identifiers}
\pnum
An implementation of \UAX{31} may choose to guarantee
that identifiers are stable across versions of the Unicode Standard.
Once a string qualifies as an identifier it does so in all future versions.
\pnum
\Cpp{} does not make this guarantee,
except to the extent that \UAX{31} guarantees
the stability of the XID_Start and XID_Continue properties.
\rSec1[uaxid.immutable]{R2 Immutable identifiers}
\pnum
An implementation may choose to guarantee that
the set of identifiers will never change
by fixing the set of code points allowed in identifiers forever.
\pnum
\Cpp{} does not choose to make this guarantee.
As scripts are added to Unicode,
additional characters in those scripts may become available
for use in identifiers.
\rSec1[uaxid.pattern]{R3 Pattern_White_Space and Pattern_Syntax characters}
\pnum
\UAX{31} describes how formal languages
such as computer languages should describe and implement
their use of whitespace and syntactically significant characters
during the processes of lexing and parsing.
\pnum
This document does not claim conformance with this requirement from \UAX{31}.
\rSec1[uaxid.eqn]{R4 Equivalent normalized identifiers}
\pnum
\UAX{31} requires that implementations describe
how identifiers are compared and considered equivalent.
\pnum
This document requires that identifiers be in Normalization Form C and
therefore identifiers that compare the same under NFC are equivalent.
This is described in \ref{lex.name}.
\rSec1[uaxid.eqci]{R5 Equivalent case-insensitive identifiers}
\pnum
This document considers case to be significant in identifier comparison, and
does not do any case folding.
This requirement from \UAX{31} does not apply to this document.
\rSec1[uaxid.filter]{R6 Filtered normalized identifiers}
\pnum
If any characters are excluded from normalization,
\UAX{31} requires a precise specification of those exclusions.
\pnum
This document does not make any such exclusions.
\rSec1[uaxid.filterci]{R7 Filtered case-insensitive identifiers}
\pnum
\Cpp{} identifiers are case sensitive, and
therefore this requirement from \UAX{31} does not apply.
\rSec1[uaxid.hashtag]{R8 Hashtag identifiers}
\pnum
There are no hashtags in \Cpp{}, so this requirement from \UAX{31} does not apply.