Commit 6a5e512

CLDR-18318 kbd: escaping: update to spec
- also added modification section
1 parent 54e3b84 commit 6a5e512

2 files changed

+32
-9
lines changed

docs/ldml/tr35-keyboards.md

Lines changed: 26 additions & 8 deletions
@@ -2195,30 +2195,48 @@ _Attribute:_ `from` (required)
21952195
The hex escaping is case insensitive. The value may not match a surrogate or illegal character, nor a marker character.
21962196
The form `\u{…}` is preferred as it is the same regardless of codepoint length.
21972197

2198-
- **Fixed character classes and escapes**
2198+
- **Fixed character classes**
21992199

2200-
`\s \S \t \r \n \f \v \\ \$ \d \w \D \W \0`
2200+
`\s \S \t \r \n \f \v \d \w \D \W`
22012201

22022202
The values of these classes do not change with Unicode versions.
22032203

22042204
`\s` for example is exactly `[\f\n\r\t\v\u{00a0}\u{1680}\u{2000}-\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}\u{feff}]`
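
As an illustration of what "fixed" means here, the following is a minimal Python sketch that transcribes the `\s` list above into a constant set of code points. The names `FIXED_WHITESPACE` and `is_fixed_whitespace` are hypothetical and not part of the specification. The set is copied from the list exactly as written and adds nothing that is not shown there.

```python
# Code points of the fixed `\s` class, transcribed literally from the list
# in the text above. Names here are illustrative, not defined by the spec.
FIXED_WHITESPACE = frozenset(
    [0x000C, 0x000A, 0x000D, 0x0009, 0x000B]   # \f \n \r \t \v
    + [0x00A0, 0x1680]
    + list(range(0x2000, 0x200A + 1))          # \u{2000}-\u{200a}, inclusive
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFEFF]
)


def is_fixed_whitespace(ch: str) -> bool:
    """Return True if the single character `ch` is in the transcribed set."""
    return ord(ch) in FIXED_WHITESPACE
```

Because the set is a literal constant rather than a lookup against a Unicode property database, its membership cannot drift when the Unicode version changes.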
22052205

2206-
`\\` and `\$` evaluate to `\` and `$`, respectively.
2206+
- **Escapes**
2207+
2208+
`\$ \( \) \* \+ \. \/ \? \[ \\ \] \^ \{ \| \}`
2209+
2210+
For example, `\\`, `\*`, and `\$` match `\`, `*`, and `$`, respectively.
2211+
2212+
Some of these characters (such as `*`) aren't actually used as syntax in the keyboard transform syntax.
2213+
However, they are required to be escaped in keyboard transforms, to avoid confusion or problems with characters which are syntax in regular expressions.
2214+
2215+
Sequences not listed here as **Fixed Character Classes** nor as **Escapes** are disallowed.
2216+
For example, `\0` (octal escape) and `\1` (backreference) are not allowed.
2217+
`\a` is not defined as a character class and is also disallowed.
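
The allow-list rule above can be sketched as a small Python scanner. This is only an illustration under stated assumptions: the function name `disallowed_sequences` is hypothetical, hex escapes beginning with `\u` are passed through because they are described separately, and neither the extra in-class escape `\-` nor any other sequence defined elsewhere in the specification is modeled.

```python
# Characters that may follow a backslash, taken from the two lists above.
FIXED_CLASSES = frozenset("sStrnfvdwDW")
ESCAPES = frozenset("$()*+./?[\\]^{|}")


def disallowed_sequences(pattern: str) -> list[str]:
    """Return each backslash sequence in `pattern` that is in neither list."""
    found = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "\\":
            i += 1
            continue
        nxt = pattern[i + 1 : i + 2]  # empty string for a trailing backslash
        if nxt == "u":
            pass  # assumption: hex escapes are validated elsewhere
        elif nxt not in FIXED_CLASSES and nxt not in ESCAPES:
            found.append("\\" + nxt)
        i += 2  # skip the backslash and the character it introduces
    return found
```

For instance, `disallowed_sequences(r"\0\1\a")` reports all three sequences, while `disallowed_sequences(r"a\*b\$")` reports none.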
22072218

22082219
- **Character classes**
22092220

22102221
`[abc]` `[^def]` `[a-z]` `[ॲऄ-आइ-ऋ]` `[\u{093F}-\u{0944}\u{0962}\u{0963}]`
22112222

2212-
- supported
2213-
- no Unicode properties such as `\p{…}`
2214-
- Warning: Character classes look superficially similar to [`uset`](#element-uset) elements, but they are distinct and referenced with the `$[...usetId]` notation in transforms. The `uset` notation cannot be embedded directly in a transform.
2223+
If the character class begins with a caret (`^`) then it is a negation, matching all characters except for those listed.
2224+
2225+
Unicode properties such as `\p{…}` are not allowed.
2226+
2227+
One additional escape is allowed within character classes besides those listed above: `\-`, for escaping the hyphen character.
2228+
2229+
**Note**: Character classes look superficially similar to [`uset`](#element-uset) elements, but they are distinct and referenced with the `$[...usetId]` notation in transforms. The `uset` notation cannot be embedded directly in a transform.
22152230

22162231
- **Bounded quantifier**
22172232

22182233
`{x,y}`
22192234

2220-
`x` and `y` are required single digits representing the minimum and maximum number of occurrences.
2221-
`x` must be ≥ 0, `y` must be ≥ x and ≥ 1
2235+
`x` and `y` are required single digits (`1` to `9`) representing the minimum and maximum number of occurrences.
2236+
2237+
`x` must be ≥ 0, `y` must be ≥ x and ≥ 1.
2238+
2239+
Unbounded quantifiers such as `{3,}` are not allowed.
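
The constraints on `{x,y}` can be checked with a short Python sketch. The function name `is_valid_bounded_quantifier` is hypothetical. The text gives the digit range as `1` to `9` while also stating that `x` must be ≥ 0, so this sketch assumes `0` is acceptable for `x`, which may not match the intended rule.

```python
import re

# Exactly one digit, a comma, exactly one digit, wrapped in braces.
# Assumption: `0` is accepted syntactically; the numeric rules follow below.
_BOUNDED = re.compile(r"\{([0-9]),([0-9])\}")


def is_valid_bounded_quantifier(text: str) -> bool:
    """Check `text` against the stated rules: y >= x and y >= 1."""
    m = _BOUNDED.fullmatch(text)
    if m is None:
        return False  # covers `{3,}`, `{,3}`, and multi-digit bounds
    x, y = int(m.group(1)), int(m.group(2))
    return y >= x and y >= 1
```

Requiring both digits in the pattern is what rejects unbounded forms such as `{3,}` before any numeric comparison takes place.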
22222240

22232241
- **Optional Specifier**
22242242

docs/ldml/tr35.md

Lines changed: 6 additions & 1 deletion
@@ -223,11 +223,12 @@ The LDML specification is divided into the following parts:
223223
* [Number symbols and formats without numberSystem](#number-symbols-and-formats-without-numbersystem)
224224
* [Clarified `currencyData` element ordering](#clarified-currencydata-element-ordering)
225225
* [Semantic Datetime Skeletons](#semantic-datetime-skeletons)
226+
* [Improvements to Keyboard Transforms](#improvements-to-keyboard-transforms)
226227
* [Well-formed identifiers](#well-formed-identifiers)
227228

228229
## <a name="Introduction" href="#Introduction">Introduction</a>
229230

230-
Not long ago, computer systems were like separate worlds, isolated from one another. The internet and related events have changed all that. A single system can be built of many different components, hardware and software, all needing to work together. Many different technologies have been important in bridging the gaps; in the internationalization arena, Unicode has provided a lingua franca for communicating textual data. However, there remain differences in the locale data used by different systems.
231+
Not long ago, computer systems were like separate worlds, isolated from one another. The internet and related events have changed all that. A single system can be built of many different components, hardware and software, all needing to work together. Many different technologies have been important in bridging the gaps; in the internationalization arena, Unicode has provided a lingua franca for communicating textual data. However, there remain differences in the locale data used by different systems.
231232

232233
The best practice for internationalization is to store and communicate language-neutral data, and format that data for the client. This formatting can take place on any of a number of the components in a system; a server might format data based on the user's locale, or it could be that a client machine does the formatting. The same goes for parsing data, and locale-sensitive analysis of data.
233234

@@ -4356,6 +4357,10 @@ Other contributors to CLDR are listed on the [CLDR Project Page](https://www.uni
43564357
- Added a [Time Precision](tr35-dates#Time_Precision) option, replacing the discrete time field sets.
43574358
- Updated the algorithm for mapping to standard skeletons.
43584359

4360+
### Improvements to Keyboard Transforms
4361+
- Added rigorous EBNF definition of transform syntax to [Transform Grammar](tr35-keyboards.md#transform-grammar).
4362+
- Corrected and clarified escaping behavior under [Regex-like Syntax](tr35-keyboards.md#regex-like-syntax).
4363+
43594364
**Changes in LDML Version 46.1 (Differences from Version 46)**
43604365

43614366
### Well-formed identifiers
