Updated Thai hyphenation patterns based on LibThai 0.1.29 #53

Open
wants to merge 29 commits into
base: master
29 commits
8125910 Thai: Update dict from libthai 0.1.29 (thep, Apr 18, 2022)
7e21557 Thai: Fix missing hyphens. (thep, Apr 18, 2022)
35b5c54 Thai: Add hyphenation exceptions. (thep, Apr 18, 2022)
131df51 Thai: Adjust hyphenation dictionary. (thep, Apr 18, 2022)
df0fa38 Thai: Rearrange exception rules for -นะ. (thep, Apr 18, 2022)
3d63217 Thai: Rearrange exception rules for -ณะ. (thep, Apr 18, 2022)
e732829 Thai: Handle remaining cases of -นะ. (thep, Apr 18, 2022)
f5c4424 Thai: Rearrange exception rules for -ชะ. (thep, Apr 18, 2022)
e128b67 Thai: Rearrange exception rules for -ละ. (thep, Apr 18, 2022)
9d406e4 Thai: Add exceptions for -ระ. (thep, Apr 18, 2022)
7cdac7b Thai: Add exceptions for -ยะ. (thep, Apr 18, 2022)
327972e Thai: Add exceptions for -วะ. (thep, Apr 18, 2022)
d08b919 Thai: Add exceptions for -มี and -มะ. (thep, Apr 18, 2022)
77fab77 Thai: Add exceptions for -ฑะ and -ฑุ. (thep, Apr 18, 2022)
c8dc094 Thai: Manage -กะ. (thep, Apr 18, 2022)
75d5c64 Thai: Add exceptions for -ฑ*, -ฐี. (thep, Apr 18, 2022)
654b93d Thai: Add exceptions for -ติ, -ดี. (thep, Apr 18, 2022)
8fcce77 Thai: Add exceptions for -ลี. (thep, Apr 18, 2022)
a8d00f8 Thai: Add exceptions for -ปะ. (thep, Apr 18, 2022)
09f90a9 Thai: Manage -ทะ. (thep, Apr 18, 2022)
b815798 Thai: Manage -พี, -วี. (thep, Apr 18, 2022)
5ad346f Thai: Revert 'ปัณรสี' hyphenation. (thep, Apr 18, 2022)
a03e7d7 Thai: Add exceptions for 'ปาฏลิ'. (thep, Apr 18, 2022)
48bc638 Thai: Adjust hyphenations. (thep, Apr 18, 2022)
c105622 Thai: Add exceptions for '-รา'. (thep, Apr 18, 2022)
e5e7f8b Thai: Add exceptions for 'อุปถัมภก'. (thep, Apr 18, 2022)
07b608c Thai: Add exceptions for 'อาศิรพจน์', 'อาศิรพาท'. (thep, Apr 18, 2022)
b26d4ae Thai: Add exceptions. (thep, Apr 18, 2022)
b2f6534 Thai: Adjust hyphenation for 'ลิมปนะ'. (thep, Apr 18, 2022)
359 changes: 359 additions & 0 deletions source/th/ChangeLog
@@ -1,3 +1,362 @@
2021-12-29 Theppitak Karoonboonyanan <[email protected]>

Adjust hyphenation for 'ลิมปนะ'.

* tdict-std.txt:
- Adjust 'ลิมป-นะ' -> 'ลิม-ป-นะ' (new scheme)

2021-12-29 Theppitak Karoonboonyanan <[email protected]>

Add exceptions.

* thai-exc.pat:
- Add forcing exceptions for:
- เก~ส-รี
- จันทร~เศ-ขร
- ทวา-ทศ
- ปรัส~ส-บท
- เยีย~ร-ยง
- ฟู~จิต-สึ
- ยา~โย-อิ
- เบ-ตง
- ไอ-นุ
- รช-นิ

2021-12-29 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for 'อาศิรพจน์', 'อาศิรพาท'.

* tdict-proper.txt:
- Adjust ศิริ-พงษ์ -> ศิ-ริ-พงษ์; ศิริ-ราช -> ศิ-ริ-ราช (new scheme)
* thai-exc.pat:
- Add SUPER blocking exceptions for 'อา~ศิ-ร~พจน์', 'อา~ศิ-ร~พาท'

2021-12-29 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for 'อุปถัมภก'.

* tdict-proper.txt:
- Adjust กุม-ภ-กรรณ -> กุมภ-กรรณ (compound Sanskrit)

* thai-exc.pat:
- Add forcing exception for อุป~ถัม-ภก

2021-12-29 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for '-รา'.

* tdict-proper.txt:
- Adjust exceptions ยโส-ธรา -> ยโส-ธ-รา in accordance with วสุน-ธ-รา.

* thai-exc.pat:
- Add forcing exceptions for วสุน~ธ-รา, ยโส~ธ-รา.

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Adjust hyphenations.

* tdict-common.txt:
- Adjust ภคัน-ท-ลา-พาธ in accordance with tdict-std.txt:
- ภคัน-ทลา-พาธ -> ภคัน-ท-ลา-พาธ
- ภา-รตะ -> ภา-ร-ตะ
* tdict-std.txt:
- Adjust เม-ทนี -> เม-ท-นี in accordance with เม-ท-นี-ดล

* thai-exc.pat:
- Drop blocking exception for ภา~ร-ตะ

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for 'ปาฏลิ'.

* tdict-history.txt:
- Adjust ปาฏ-ลี-บุตร -> ปา-ฏ-ลี-บุตร in accordance with ปา-ฏ-ลิ.

* thai-exc.pat:
- Add forcing exception for ปา~ฏ-ลิ.

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Revert 'ปัณรสี' hyphenation.

* tdict-std.txt:
- Revert ปัณ-ร-สี -> ปัณ-รสี (compound Sanskrit)

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Manage -พี, -วี.

* tdict-std.txt:
- Adjust อุ-รพี -> อุ-ร-พี (new scheme)

* thai-exc.pat:
- Drop blocking exception for ทา~ร-พี
- Drop forcing exception 'ทร7พี', as '5พี.' already covers it
after the blocking 'ร6พี' was dropped.
- Drop forcing exception 'เท7พี', as '5พี.' already covers it.
- Group '-วี' exceptions together.

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Manage -ทะ.

* thai-exc.pat:
- Force '-ทะ' with '5ทะ' in general.
- Drop all other exceptions on '-ทะ'.

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ปะ.

* tdict-proper.txt:
- Adjust 'กัส-สปะ' -> 'กัส-ส-ปะ' (new scheme)

* thai-exc.pat:
- Turn blocking exceptions into forcing for:
- กัจ~ฉ-ปะ
- กัส~ส-ปะ

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ลี.

* tdict-std.txt:
- Adjust 'กติญ-ชลี' -> 'กติญ-ช-ลี' in accordance with 'อัญ-ช-ลี'.

* thai-exc.pat:
- Add forcing exceptions for:
- อัญ~ช-ลี
- ปิป~ผ-ลี

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ติ, -ดี.

* tdict-std-compound.txt:
- Adjust 'ตรี-มู-รติ' -> 'ตรี-มู-ร-ติ' in accordance with tdict-std.txt.
* tdict-std.txt:
- Adjust 'กี-รติ' -> 'กี-ร-ติ' (new scheme)

* thai-exc.pat:
- Add forcing exceptions for:
- กี~ร-ติ
- มู~ร-ติ
- อา~ร-ติ
- Add blocking exceptions for:
- อภิ~ร-ดี

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ฑ*, -ฐี.

* thai-exc.pat:
- Add forcing exceptions for:
- มณ-ฑป
- ปาณ-ฑพ
- ภัณ-ฑู
- กัณ-ฐี

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Manage -กะ.

* tdict-std.txt:
- Adjust hyphenation 'อัน-ตกะ' -> 'อัน-ต-กะ' (new scheme)
* thai-fixup.sed:
- Add fixup: 'ปิ5หก' -> 'ปิ5ห5ก'
Oddly, '1กะ' got overridden by 'ปิ5หก' without any inhibition code,
causing wrong hyphenation of 'ปิหกะ' as 'ปิ-หกะ', not 'ปิ-ห-กะ' as expected.
* thai-exc.pat:
- Drop all forcing exceptions for '-กะ', as '1กะ' already rules them all.

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ฑะ and -ฑุ.

* thai-exc.pat:
- Add forcing exceptions for:
- เคณ-ฑะ, ปิณ-ฑะ, มุณ-ฑะ
- เลณ-ฑุ, กาฐ~มาณ-ฑุ

2021-12-28 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -มี and -มะ.

* tdict-std-compound.txt:
- Adjust hyphenations '{คู่,ทาน}-บา-รมี' -> '{คู่,ทาน}-บา-ร-มี'
in accordance with tdict-std.txt.
* tdict-proper.txt:
- Adjust hyphenation 'โค-ตมี' -> 'โค-ต-มี'
- Also adjust 'โค-ตมะ' -> 'โค-ต-มะ'

* thai-exc.pat:
- Group exceptions for -มี together.
- Add forcing exceptions for:
- ปัญ~จ-มี
- โค~ต-มี
- บา~ร-มี
- ลัก~ษ-มี
- Drop blocking exceptions for:
- กู~ร-มะ (allowed in new scheme)
- Add forcing exceptions for:
- โค~ต-มะ
- Drop forcing exceptions for:
- ธรร-มะ (already ruled by '5มะ' and no longer blocked by 'ร6มะ')

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -วะ.

* tdict-common.txt:
* tdict-history.txt:
- Adjust hyphenation 'อา-สวะ' -> 'อา-ส-วะ', 'ปัล-ลวะ' -> 'ปัล-ล-วะ'.

* thai-exc.pat:
- Add forcing exceptions for:
- เก~ศ-วะ
- ปุง~ค-วะ
- ปัล~ล-วะ
- มัท~ท-วะ
- สัน~ถ-วะ
- อา~ส-วะ

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ยะ.

* thai-exc.pat:
- Add forcing exceptions for:
- ตฤ~ตี-ยะ
- มัต~ส-ยะ
- อนา~ร-ยะ, อา~ร-ยะ, เศา~ร-ยะ

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Add exceptions for -ระ.

* tdict-district.txt:
- Adjust hyphenation 'อิน-ทา-มระ' -> 'อิน-ทา-ม-ระ' for harmony with the rest.
* thai-exc.pat:
- Add forcing exceptions for:
- โล~กุ~ต-ระ
- ภา~ต-ระ
- กเล~ว-ระ
- มัต~ส-ระ
- กุ~ร-ระ
- สัง~วัจ~ฉ-ระ
- อัก~ข-ระ
- อิน~ทา~ม-ระ

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Rearrange exception rules for -ละ.

* thai-exc.pat:
- Instead of listing possible -ละ cases,
let's turn it on by default and list inhibitions instead.

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Rearrange exception rules for -ชะ.

* thai-exc.pat:
- Instead of listing possible -ชะ cases,
let's turn it on by default and list inhibitions instead.

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Handle remaining cases of -นะ.

* thai-fixup.sed:
- Fixup: 'ล5กนะ' -> 'ล5ก7นะ' for 'โล~ก-นะ' inhibited by 'ก6นะ'
in thai-exc.pat.
* thai-exc.pat:
- Add exception 'งก7นะ' for 'อัง~ก-นะ'.

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Rearrange exception rules for -ณะ.

* thai-exc.pat:
- Instead of listing possible -ณะ cases,
let's turn it on by default and list inhibitions instead.
* thai-fixup.sed:
- Fixups: 'ก5ขณะ' -> 'ก5ข5ณะ', 'ก5ขณา' -> 'ก5ข5ณา'
- This requires 'ข4ณะ' early inhibition in thai-exc.pat
before the special case 'ก5ข5ณะ' is applied.
* tdict-std.txt:
- Adjust hyphenation 'อา-ปณะ' -> 'อา-ป-ณะ' for harmony with the rest.

2021-12-27 Theppitak Karoonboonyanan <[email protected]>

Rearrange exception rules for -นะ.

* thai-exc.pat:
- Instead of listing possible -นะ cases, which have been growing,
esp. after the recent hyphenation scheme adjustment,
let's turn it on by default and list inhibitions instead.

2021-12-26 Theppitak Karoonboonyanan <[email protected]>

Adjust hyphenation dictionary.

* tdict-std.txt:
* tdict-collection.txt:
* tdict-common.txt:
* tdict-proper.txt:
* tdict-std-compound.txt:
- Adjust hyphenations.

* Makefile, +thai-fixup.sed:
- Add fixup mechanism to adjust patgen-generated rules
which cannot be corrected using exceptions.
- First fixup: '3นียะ' -> '3นี3ยะ'.

* thai-exc.pat:
- Drop 'นี7ยะ' exception, duplicated with '3นี[3]ยะ'.
- Drop 'ฉ6นี', 'าร6นี', 'วีช6นี', 'สส6นี', 'มท6นี', 'เสว6นะ' exceptions,
according to the adjusted hyphenations.

2021-12-21 Theppitak Karoonboonyanan <[email protected]>

Add hyphenation exceptions.

* thai-exc.pat:
- Move 'ง1ว' fixes to be with 'ม1ว' fixes.
- Add fixes for 'ร1ว'.

2021-12-21 Theppitak Karoonboonyanan <[email protected]>

Fix missing hyphens.

* tdict-std.txt:
- Add missing hyphens in 'กะ~รุง-กะ-รัง' and 'กะ~รุ่ง-กะ-ริ่ง'.

2021-12-21 Theppitak Karoonboonyanan <[email protected]>

Update dict from libthai 0.1.29

* Makefile.am, +tdict-currency.txt:
- Add new list.

* tdict-city.txt:
* tdict-collection.txt:
* tdict-common.txt:
* tdict-district.txt:
* tdict-geo.txt:
* tdict-history.txt:
* tdict-ict.txt:
* tdict-lang-ethnic.txt:
* tdict-proper.txt:
* tdict-science.txt:
* tdict-slang.txt:
* tdict-spell.txt:
* tdict-std.txt:
- Updated from libthai 0.1.29.

* thai-exc.pat:
- Drop '<SARA-I>7ตี' exception, duplicating generated '<SARA-I>3ตี'.

2018-08-03 Theppitak Karoonboonyanan <[email protected]>

Add hyphenation exceptions.
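The pattern notation in these ChangeLog entries ('5พี.', 'ร6พี', 'ทร7พี', '1กะ', ...) follows Liang's scheme as used by TeX and patgen: a digit scores the slot between two letters, '.' marks a word boundary, every matching pattern votes, the highest digit wins, and an odd winner allows a break while an even one blocks it. The following is a minimal Python sketch of that scheme, not the hyph-th or TeX implementation: it ignores TeX's \lefthyphenmin/\righthyphenmin limits, and `parse_pattern` and `hyphenate` are hypothetical names. It replays the "Manage -พี, -วี." entry on the letters ทรพี.

```python
DIGITS = "0123456789"


def parse_pattern(pattern):
    """Split a pattern such as 'ร6พี' into its letters and one score per slot.

    Slot i sits just before letter i; the extra last slot sits after the
    final letter. Slots without an explicit digit score 0.
    """
    letters, scores = [], [0]
    for ch in pattern:
        if ch in DIGITS:
            scores[-1] = int(ch)
        else:
            letters.append(ch)
            scores.append(0)
    return "".join(letters), scores


def hyphenate(word, patterns):
    """Hyphenate word with '-', using a simplified form of Liang's algorithm."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word + "."          # '.' marks the word boundaries
    best = [0] * (len(text) + 1)     # best[k] scores the slot before text[k]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            scores = table.get(text[start:end])
            if scores is None:
                continue
            for offset, score in enumerate(scores):
                # Every matching pattern votes; the highest digit wins.
                best[start + offset] = max(best[start + offset], score)
    pieces, last = [], 0
    for k in range(1, len(word)):    # slots strictly inside the word
        if best[k + 1] % 2 == 1:     # odd winner: break allowed before word[k]
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("ทรพี", ["5พี.", "ร6พี"]))           # 6 beats 5 -> ทรพี
print(hyphenate("ทรพี", ["5พี.", "ร6พี", "ทร7พี"]))  # 7 beats 6 -> ทร-พี
print(hyphenate("ทรพี", ["5พี."]))                   # 5 alone   -> ทร-พี
```

The last two calls show why 'ทร7พี' became redundant once the blocking 'ร6พี' was dropped: '5พี.' on its own already yields an odd winner at that slot.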
4 changes: 3 additions & 1 deletion source/th/Makefile
@@ -14,6 +14,7 @@ all: hyph-th.tex
 TDICT_SRC = \
 	$(srcdir)/tdict-common.txt \
 	$(srcdir)/tdict-collection.txt \
+	$(srcdir)/tdict-currency.txt \
 	$(srcdir)/tdict-district.txt \
 	$(srcdir)/tdict-city.txt \
 	$(srcdir)/tdict-country.txt \
@@ -46,11 +47,12 @@ y
 thai.dic: $(TDICT_SRC)
 	cat $(TDICT_SRC) | LC_ALL=C sort -u > $@
 
-thai.out: thai.dic thai.tra
+thai.out: thai.dic thai.tra thai-fixup.sed
 	rm -f thai.pat
 	touch thai.pat
 	printf "$(PATGEN_ANS)" \
 	| $(PATGEN) thai.dic thai.pat thai.out $(srcdir)/thai.tra
+	sed -f $(srcdir)/thai-fixup.sed -i thai.out
 
 thai-comb.pat: thai.out thai-exc.pat
 	cat thai.out $(srcdir)/thai-exc.pat > $@
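The updated rules build thai.dic from the tdict lists (now including tdict-currency.txt), run patgen, post-process thai.out with thai-fixup.sed, and append thai-exc.pat to form thai-comb.pat. thai-fixup.sed itself is not part of this diff, so the following Python sketch only mirrors the data flow under stated assumptions: each fixup swaps one whole generated pattern for a corrected one, using the pairs quoted in the ChangeLog, and patgen itself is not modelled. `build_dic`, `FIXUPS`, `apply_fixups`, `letters_of` and `combine` are hypothetical names. The duplicate check in `combine` is also an assumption, based on TeX rejecting two patterns with the same letters; that would be consistent with ChangeLog entries that drop exceptions such as 'นี7ยะ' as duplicates of generated patterns.

```python
# Hypothetical fixup table: whole-pattern replacements quoted in the ChangeLog.
# The real thai-fixup.sed may express these differently (e.g. as sed rules).
FIXUPS = {
    "3นียะ": "3นี3ยะ",
    "ล5กนะ": "ล5ก7นะ",
    "ก5ขณะ": "ก5ข5ณะ",
    "ก5ขณา": "ก5ข5ณา",
    "ปิ5หก": "ปิ5ห5ก",
}


def build_dic(*word_lists):
    """Mimic `cat $(TDICT_SRC) | LC_ALL=C sort -u`: merge, dedupe, byte-order sort."""
    words = {w for words_in_file in word_lists for w in words_in_file}
    # The C locale compares raw bytes, so sort on the UTF-8 encoding.
    return sorted(words, key=lambda w: w.encode("utf-8"))


def apply_fixups(generated, fixups=FIXUPS):
    """Return patgen output with matching patterns swapped for their fixups."""
    return [fixups.get(pattern, pattern) for pattern in generated]


def letters_of(pattern):
    """Strip the digits, leaving the letter sequence a pattern matches."""
    return "".join(ch for ch in pattern if ch not in "0123456789")


def combine(generated, exceptions, fixups=FIXUPS):
    """Mimic thai-comb.pat: fixed-up generated patterns, then the exceptions."""
    combined = apply_fixups(generated, fixups) + list(exceptions)
    seen = set()
    for pattern in combined:
        key = letters_of(pattern)
        if key in seen:
            # Assumed constraint: TeX's \patterns rejects duplicate letter sequences.
            raise ValueError(f"duplicate pattern letters: {key}")
        seen.add(key)
    return combined


if __name__ == "__main__":
    patgen_out = ["3นียะ", "1กะ", "ปิ5หก"]  # illustrative input, not real patgen output
    print(combine(patgen_out, ["5ทะ"]))     # '5ทะ' is named in the ChangeLog
```

Under this reading, a generated pattern whose digits are wrong cannot simply be overridden by an exception with the same letters, which is why the fixup has to edit thai.out in place before the exceptions are appended.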