Line breaking changes from UTC-181 #1046

Merged: 11 commits, Feb 17, 2025

eggrobin

Doing the TDD thing; see the last commit for the change to expectations.

F.1 6.1 Hyphens and Hebrew again: further adjustments to LB21a and LB20a [unicode-org/properties#308]

[181-C53] Consensus: Add a new Line_Break property value Unambiguous_Hyphen (short alias: HH) and assign this value to the eleven characters that have General_Category=Pd and Line_Break=Break_After in Unicode Version 16.0, listed below. Amend rules LB12a and LB21 of the Unicode Line Breaking Algorithm to treat HH like BA, and amend rules LB20a and LB21a to refer to the set of characters with lb=HH instead of singling out a single character or doing set arithmetic on the set of characters with lb=BA. In addition, amend rule LB20a to treat HL like AL. See L2/24-224 item 6.1. For Unicode Version 17.0.

    U+058A ֊ ARMENIAN HYPHEN
    U+05BE ‎־‎ HEBREW PUNCTUATION MAQAF
    U+1400 ᐀ CANADIAN SYLLABICS HYPHEN
    U+2010 ‐ HYPHEN
    U+2012 ‒ FIGURE DASH
    U+2013 – EN DASH
    U+2E17 ⸗ DOUBLE OBLIQUE HYPHEN
    U+2E40 ⹀ DOUBLE HYPHEN
    U+2E5D ⹝ OBLIQUE HYPHEN
    U+10EAD 𐺭 YEZIDI HYPHENATION MARK
    U+10D6E 𐵮 GARAY HYPHEN

[181-A138] Action Item for Robin Leroy, PAG: In UCD file PropertyValueAliases.txt, add a new Line_Break property value Unambiguous_Hyphen (short alias: HH). For Unicode Version 17.0. See L2/24-224 item 6.1.

[181-A139] Action Item for Robin Leroy, PAG: In UCD file LineBreak.txt and derived files, assign Line_Break=Unambiguous_Hyphen to the eleven characters that have General_Category=Pd and Line_Break=Break_After in Unicode Version 16.0. For Unicode Version 17.0. See L2/24-224 item 6.1.
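
As a rough illustration of the data change in 181-A139, the reassignment can be sketched as an override applied on top of the Unicode 16.0 values. The names `UNAMBIGUOUS_HYPHENS` and `line_break_with_hh`, and the `lb_16` parameter, are hypothetical and exist only for this sketch; the real change is made in LineBreak.txt and derived files.

```python
# Code points listed under 181-C53: gc=Pd and lb=BA in Unicode 16.0.
UNAMBIGUOUS_HYPHENS = frozenset({
    0x058A,   # ARMENIAN HYPHEN
    0x05BE,   # HEBREW PUNCTUATION MAQAF
    0x1400,   # CANADIAN SYLLABICS HYPHEN
    0x2010,   # HYPHEN
    0x2012,   # FIGURE DASH
    0x2013,   # EN DASH
    0x2E17,   # DOUBLE OBLIQUE HYPHEN
    0x2E40,   # DOUBLE HYPHEN
    0x2E5D,   # OBLIQUE HYPHEN
    0x10EAD,  # YEZIDI HYPHENATION MARK
    0x10D6E,  # GARAY HYPHEN
})


def line_break_with_hh(cp: int, lb_16: str) -> str:
    """Hypothetical helper: map a Unicode 16.0 Line_Break short alias
    to the value after 181-A139 is applied."""
    if cp in UNAMBIGUOUS_HYPHENS:
        return "HH"  # Unambiguous_Hyphen
    return lb_16     # every other code point is untouched by this item
```

For example, `line_break_with_hh(0x2010, "BA")` yields `"HH"`, while a code point outside the set keeps whatever value the caller passes in.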

[181-A141] Action Item for Robin Leroy, PAG: In UCD files LineBreakTest.txt and LineBreakTest.html, update rules LB12a, LB20a, LB21, and LB21a as described in L2/24-224 item 6.1. For Unicode Version 17.0.
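
The "treat HH like BA" part of 181-C53 can be sketched for LB12a and LB21 as follows. This is a simplified illustration, assuming the Unicode 16.0 forms of the rules are `[^SP BA HY] × GL` (LB12a) and `× BA`, `× HY`, `× NS`, `BB ×` (LB21); it looks only at the two adjacent classes and ignores rule ordering, LB9/LB10 handling, and the LB20a and LB21a changes. The function names are hypothetical.

```python
def lb12a_no_break(before: str, after: str) -> bool:
    """Sketch of amended LB12a: [^SP BA HH HY] × GL.

    No break before Glue, unless the preceding class is a space
    or one of the hyphen-like classes (now including HH).
    """
    return after == "GL" and before not in {"SP", "BA", "HH", "HY"}


def lb21_no_break(before: str, after: str) -> bool:
    """Sketch of amended LB21: × BA, × HH, × HY, × NS, BB ×."""
    # No break before BA, HH, HY, or NS.
    if after in {"BA", "HH", "HY", "NS"}:
        return True
    # No break after Break_Before.
    return before == "BB"
```

Under these assumptions, `lb21_no_break("AL", "HH")` is true, matching the behaviour the same characters had as BA, and `lb12a_no_break("HH", "GL")` is false, so LB12a does not forbid a break between an unambiguous hyphen and a following Glue character.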

[181-A142] Action Item for Robin Leroy, PAG: In UCD files LineBreakTest.txt and LineBreakTest.html, add realistic tests exercising the changes to the behaviour of rules LB20a and LB21. For Unicode Version 17.0. See L2/24-224 item 6.1.

F.1 6.3 UAX #‌14 CGJ should not break a combining character sequence [unicode-org/properties#317]

[181-C54] Consensus: Change the Line_Break assignment of U+034F COMBINING GRAPHEME JOINER from Line_Break=GL (Glue) to Line_Break=CM (Combining_Mark). For Unicode Version 17.0. [Ref. Section 6.3 of document L2/24-224]

[181-A144] Action Item for Robin Leroy, PAG: In LineBreak.txt and derived files, change the Line_Break assignment of U+034F COMBINING GRAPHEME JOINER from Line_Break=GL (Glue) to Line_Break=CM (Combining_Mark). For Unicode Version 17.0. [Ref. Section 6.3 of document L2/24-224]
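
The CGJ reassignment in 181-A144 is a single-code-point override, sketched below in the same spirit. `CGJ` and `line_break_with_cgj_fix` are hypothetical names for this illustration, and `lb_16` is assumed to be the Unicode 16.0 short alias supplied by the caller.

```python
CGJ = 0x034F  # COMBINING GRAPHEME JOINER


def line_break_with_cgj_fix(cp: int, lb_16: str) -> str:
    """Hypothetical helper: apply the 181-C54 change to a 16.0 value."""
    if cp == CGJ:
        return "CM"  # Combining_Mark; was GL (Glue) in Unicode 16.0
    return lb_16
```

With lb=CM, CGJ is handled like any other combining mark instead of acting as Glue, which is the point of item 6.3: it should not break a combining character sequence.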

@eggrobin eggrobin requested a review from markusicu February 14, 2025 17:38

@markusicu markusicu left a comment

thanks for the separate commits (as usual) for reviewing

@eggrobin eggrobin merged commit bca50a4 into unicode-org:main Feb 17, 2025
15 of 16 checks passed