Commit 21a5c87

cibu authored and copybara-github committed
Tamil visual normalization rules added for flipped two-part vowel signs. Example: SIGN EE + SIGN AA -> SIGN OO
PiperOrigin-RevId: 641977604
1 parent 9300462 commit 21a5c87

2 files changed: +19 -0 lines changed

Diff for: nisaba/scripts/brahmic/data/Taml/visual_rewrite.textproto

+15
@@ -106,3 +106,18 @@ item {
   uname: ["SIGN AU", "SIGN I"] raw: "ௌி"
   to_uname: ["SIGN E", "LLA", "SIGN I"] to_raw: "ெளி"
 }
+
+# Flipped two-part vowel signs.
+# The non-flipped sequence is covered by NFC.
+item {
+  uname: ["SIGN AA", "SIGN E"] raw: "ாெ"
+  to_uname: ["SIGN O"] to_raw: ""
+}
+item {
+  uname: ["SIGN AA", "SIGN EE"] raw: "ாே"
+  to_uname: ["SIGN OO"] to_raw: ""
+}
+item {
+  uname: ["AU LENGTH MARK", "SIGN E"] raw: "ௗெ"
+  to_uname: ["SIGN AU"] to_raw: ""
+}
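
The comment in this hunk says the non-flipped sequence is already covered by NFC, which is why only the flipped order needs explicit rules. A minimal Python sketch of that distinction, using only the standard `unicodedata` module, is given here. The constant and function names are illustrative and do not come from the repository.

```python
import unicodedata

# Tamil combining vowel signs, written as code points so the order is explicit.
SIGN_AA = "\u0bbe"
SIGN_E = "\u0bc6"
SIGN_EE = "\u0bc7"
AU_LENGTH_MARK = "\u0bd7"
SIGN_O = "\u0bca"
SIGN_OO = "\u0bcb"
SIGN_AU = "\u0bcc"


def nfc(text: str) -> str:
    """Return the NFC (canonical composition) form of text."""
    return unicodedata.normalize("NFC", text)


# Canonical order: NFC composes each pair into the single two-part vowel sign.
CANONICAL = {
    SIGN_E + SIGN_AA: SIGN_O,
    SIGN_EE + SIGN_AA: SIGN_OO,
    SIGN_E + AU_LENGTH_MARK: SIGN_AU,
}

# Flipped order: these are the left-hand sides of the three new items.
FLIPPED = [
    SIGN_AA + SIGN_E,
    SIGN_AA + SIGN_EE,
    AU_LENGTH_MARK + SIGN_E,
]

if __name__ == "__main__":
    for seq, composed in CANONICAL.items():
        print("canonical composes:", nfc(seq) == composed)
    for seq in FLIPPED:
        # NFC leaves the flipped sequence as it is, so a separate rule is needed.
        print("flipped unchanged:", nfc(seq) == seq)
```

All of these signs have canonical combining class 0, so normalization does not reorder them; the flipped pairs therefore pass through NFC untouched.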

Diff for: nisaba/scripts/brahmic/testdata/visual_norm.textproto

+4
@@ -44,6 +44,10 @@ rewrite { rule: "SINH" input: "අපේ‍්‍රල්" output: "අප්
 # rewrite { rule: "TAML" input: "தமி​ழர்‌கள்‍" output: "தமிழர்கள்" }
 rewrite { rule: "TAML" input: "ஆக்‌ஷன்" output: "ஆக்‌ஷன்" }
 
+rewrite { rule: "TAML" input: "காெள்" output: "கொள்" }
+rewrite { rule: "TAML" input: "ப்ராேஷன்" output: "ப்ரோஷன்" }
+rewrite { rule: "TAML" input: "சௗெந்தர்யம்" output: "சௌந்தர்யம்" }
+
 rewrite { rule: "DEVA" input: "श्रीमान्‌को" output: "श्रीमान्‌को" }
 rewrite { rule: "DEVA" input: "गोल्‍डबर्ग" output: "गोल्डबर्ग" }
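
The three new TAML test cases can be checked against the intended mapping with a small stand-alone sketch. This is not how the project compiles or applies its rules: it swaps in plain `str.replace` over a hypothetical lookup table (`FLIPPED_TO_COMPOSED`, `rewrite_flipped`, and `CASES` are names invented here), and it assumes the expected outputs use the precomposed vowel signs. Strings are written as code-point escapes so the flipped order is unambiguous.

```python
# Hypothetical stand-in for the three new Taml rewrite items.
FLIPPED_TO_COMPOSED = {
    "\u0bbe\u0bc6": "\u0bca",  # SIGN AA + SIGN E -> SIGN O
    "\u0bbe\u0bc7": "\u0bcb",  # SIGN AA + SIGN EE -> SIGN OO
    "\u0bd7\u0bc6": "\u0bcc",  # AU LENGTH MARK + SIGN E -> SIGN AU
}


def rewrite_flipped(text: str) -> str:
    """Replace each flipped two-part vowel sequence with its composed sign."""
    for flipped, composed in FLIPPED_TO_COMPOSED.items():
        text = text.replace(flipped, composed)
    return text


# (input, expected output) pairs mirroring the three added test lines.
CASES = [
    # KA, AA, E, LLA, VIRAMA -> KA, O, LLA, VIRAMA
    ("\u0b95\u0bbe\u0bc6\u0bb3\u0bcd", "\u0b95\u0bca\u0bb3\u0bcd"),
    # PA, VIRAMA, RA, AA, EE, SSA, NNNA, VIRAMA -> ... RA, OO, SSA ...
    ("\u0baa\u0bcd\u0bb0\u0bbe\u0bc7\u0bb7\u0ba9\u0bcd",
     "\u0baa\u0bcd\u0bb0\u0bcb\u0bb7\u0ba9\u0bcd"),
    # CA, AU LENGTH MARK, E, NA, VIRAMA, ... -> CA, AU, NA, VIRAMA, ...
    ("\u0b9a\u0bd7\u0bc6\u0ba8\u0bcd\u0ba4\u0bb0\u0bcd\u0baf\u0bae\u0bcd",
     "\u0b9a\u0bcc\u0ba8\u0bcd\u0ba4\u0bb0\u0bcd\u0baf\u0bae\u0bcd"),
]

if __name__ == "__main__":
    for source, expected in CASES:
        print(rewrite_flipped(source) == expected)
```

The source sequences do not overlap and no replacement produces another source sequence, so the order in which the table is applied does not matter in this sketch.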
