The hex escaping is case insensitive. The value may not match a surrogate or illegal character, nor a marker character.
2196
2196
The form `\u{…}` is preferred as it is the same regardless of codepoint length.
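The hex-escape rules above can be illustrated with a minimal Python sketch. It assumes the braces hold a single code point written as 1 to 6 hex digits, which this passage does not state. The name `decode_hex_escape` is hypothetical. Only the surrogate and range checks are modeled; the "illegal character" and "marker character" restrictions are not.

```python
import re

# Assumption: one code point of 1 to 6 hex digits inside the braces.
# Both upper- and lower-case hex digits are accepted (case insensitive).
_HEX_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]{1,6})\}")

def decode_hex_escape(text: str) -> str:
    """Decode a single \\u{...} escape into the character it names."""
    m = _HEX_ESCAPE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a hex escape: {text!r}")
    cp = int(m.group(1), 16)
    if cp > 0x10FFFF:
        raise ValueError(f"beyond the Unicode range: {text!r}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point: {text!r}")
    # Checks for illegal and marker characters are deliberately omitted.
    return chr(cp)
```

With this sketch, `\u{00e9}` and `\u{00E9}` decode to the same character, while `\u{d800}` is rejected as a surrogate.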
2197
2197
2198
-
-**Fixed character classes and escapes**
2198
+
-**Fixed character classes**
2199
2199
2200
-
`\s \S \t \r \n \f \v \\ \$ \d \w \D \W \0`
2200
+
`\s \S \t \r \n \f \v \d \w \D \W`
2201
2201
2202
2202
The values of these classes do not change with Unicode versions.
2203
2203
2204
2204
`\s` for example is exactly `[\f\n\r\t\v\u{00a0}\u{1680}\u{2000}-\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}\u{feff}]`
2205
2205
2206
-
`\\` and `\$` evaluate to `\` and `$`, respectively.
2206
+
-**Escapes**
2207
+
2208
+
`\$ \( \) \* \+ \. \/ \? \[ \\ \] \^ \{ \| \}`
2209
+
2210
+
For example, `\\`, `\*`, and `\$` match `\`, `*`, and `$`, respectively.
2211
+
2212
+
Some of these characters (such as `*`) have no special meaning in the keyboard transform syntax.
2213
+
However, they must still be escaped in keyboard transforms, to avoid confusion with characters that are syntax in regular expressions.
2214
+
2215
+
Sequences listed here neither as **Fixed character classes** nor as **Escapes** are disallowed.
2216
+
For example, `\0` (octal escape) and `\1` (backreference) are not allowed.
2217
+
`\a` is not defined as a character class and is also disallowed.
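One way to picture the allow-list described above is a small checker that scans a pattern for backslash sequences. The Python sketch below is illustrative only. The names `find_disallowed_escapes` and `also_allowed` are hypothetical, and forms defined elsewhere (such as the `\u{…}` hex escape) are only admitted if the caller opts in through `also_allowed`.

```python
# Letters allowed after a backslash as fixed character classes.
FIXED_CLASSES = set("sStrnfvdwDW")
# Punctuation allowed after a backslash as escapes.
ESCAPES = set("$()*+./?[\\]^{|}")

def find_disallowed_escapes(pattern: str, also_allowed: str = "") -> list:
    """Return each backslash sequence whose second character is not allowed.

    `also_allowed` is a hypothetical hook for forms defined elsewhere,
    for example "u" for hex escapes.
    """
    allowed = FIXED_CLASSES | ESCAPES | set(also_allowed)
    bad = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            nxt = pattern[i + 1 : i + 2]  # empty if the backslash is last
            if nxt not in allowed:
                bad.append("\\" + nxt)
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return bad
```

For instance, `find_disallowed_escapes(r"\0\1\a")` reports all three sequences, while `find_disallowed_escapes(r"a\*b\$")` reports none. The sketch does not model context, so it does not know that `\-` is additionally permitted inside character classes.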
- Warning: Character classes look superficially similar to [`uset`](#element-uset) elements, but they are distinct and referenced with the `$[...usetId]` notation in transforms. The `uset` notation cannot be embedded directly in a transform.
2223
+
If the character class begins with a caret (`^`) then it is a negation, matching all characters except for those listed.
2224
+
2225
+
Unicode properties such as `\p{…}` are not allowed.
2226
+
2227
+
One additional escape is allowed within character classes besides those listed above: `\-`, for escaping the hyphen character.
2228
+
2229
+
**Note**: Character classes look superficially similar to [`uset`](#element-uset) elements, but they are distinct and referenced with the `$[...usetId]` notation in transforms. The `uset` notation cannot be embedded directly in a transform.
2215
2230
2216
2231
-**Bounded quantifier**
2217
2232
2218
2233
`{x,y}`
2219
2234
2220
-
`x` and `y` are required single digits representing the minimum and maximum number of occurrences.
2221
-
`x` must be ≥ 0, `y` must be ≥ x and ≥ 1
2235
+
`x` and `y` are required single digits (`1` to `9`) representing the minimum and maximum number of occurrences.
2236
+
2237
+
`x` must be ≥ 0, `y` must be ≥ x and ≥ 1.
2238
+
2239
+
Unbounded quantifiers such as `{3,}` are not allowed.
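The bounded-quantifier constraints can be sketched as a short Python check. The function name `is_valid_bounded_quantifier` is hypothetical. The text gives the digit range as `1` to `9` but also requires `x` ≥ 0, so this sketch assumes that `0` is an acceptable digit for `x`; that reading is an assumption, not something the text settles.

```python
import re

# Matches `{x,y}` where x and y are each exactly one ASCII digit.
# Assumption: the digit 0 is accepted here, since x must be >= 0.
_BOUNDED = re.compile(r"\{([0-9]),([0-9])\}")

def is_valid_bounded_quantifier(text: str) -> bool:
    """Check the form `{x,y}` with y >= x and y >= 1."""
    m = _BOUNDED.fullmatch(text)
    if m is None:
        # Covers unbounded `{3,}`, single-number `{3}`, and multi-digit forms.
        return False
    x, y = int(m.group(1)), int(m.group(2))
    return y >= x and y >= 1
```

Under these assumptions `{0,1}` and `{2,5}` pass, while `{3,}`, `{5,2}`, `{0,0}`, and `{1,10}` are rejected.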
Not long ago, computer systems were like separate worlds, isolated from one another. The internet and related events have changed all that. A single system can be built of many different components, hardware and software, all needing to work together. Many different technologies have been important in bridging the gaps; in the internationalization arena, Unicode has provided a lingua franca for communicating textual data. However, there remain differences in the locale data used by different systems.
231
+
Not long ago, computer systems were like separate worlds, isolated from one another. The internet and related events have changed all that. A single system can be built of many different components, hardware and software, all needing to work together. Many different technologies have been important in bridging the gaps; in the internationalization arena, Unicode has provided a lingua franca for communicating textual data. However, there remain differences in the locale data used by different systems.
231
232
232
233
The best practice for internationalization is to store and communicate language-neutral data, and format that data for the client. This formatting can take place on any of a number of the components in a system; a server might format data based on the user's locale, or it could be that a client machine does the formatting. The same goes for parsing data, and locale-sensitive analysis of data.
233
234
@@ -4356,6 +4357,10 @@ Other contributors to CLDR are listed on the [CLDR Project Page](https://www.uni
4356
4357
- Added a [Time Precision](tr35-dates#Time_Precision) option, replacing the discrete time field sets.
4357
4358
- Updated the algorithm for mapping to standard skeletons.
4358
4359
4360
+
### Improvements to Keyboard Transforms
4361
+
- Added rigorous EBNF definition of transform syntax to [Transform Grammar](tr35-keyboards.md#transform-grammar).
4362
+
- Corrected and clarified escaping behavior under [Regex-like Syntax](tr35-keyboards.md#regex-like-syntax).
4363
+
4359
4364
**Changes in LDML Version 46.1 (Differences from Version 46)**