Line breaking changes from UTC-181 #1046
Merged
Doing the TDD thing; see the last commit for the change to expectations.
F.1 6.1 Hyphens and Hebrew again: further adjustments to LB21a and LB20a [unicode-org/properties#308]
[181-C53] Consensus: Add a new Line_Break property value Unambiguous_Hyphen (short alias: HH) and assign this value to the eleven characters that have General_Category=Pd and Line_Break=Break_After in Unicode Version 16.0, listed below. Amend rules LB12a and LB21 of the Unicode Line Breaking Algorithm to treat HH like BA, and amend rules LB20a and LB21a to refer to the set of characters with lb=HH instead of singling out a single character or doing set arithmetic on the set of characters with lb=BA. In addition, amend rule LB20a to treat HL like AL. See L2/24-224 item 6.1. For Unicode Version 17.0.
[181-A138] Action Item for Robin Leroy, PAG: In UCD file PropertyValueAliases.txt, add a new Line_Break property value Unambiguous_Hyphen (short alias: HH). For Unicode Version 17.0. See L2/24-224 item 6.1.
[181-A139] Action Item for Robin Leroy, PAG: In UCD file LineBreak.txt and derived files, assign Line_Break=Unambiguous_Hyphen to the eleven characters that have General_Category=Pd and Line_Break=Break_After in Unicode Version 16.0. For Unicode Version 17.0. See L2/24-224 item 6.1.
[181-A141] Action Item for Robin Leroy, PAG: In UCD files LineBreakTest.txt and LineBreakTest.html, update rules LB12a, LB20a, LB21, and LB21a as described in L2/24-224 item 6.1. For Unicode Version 17.0.
[181-A142] Action Item for Robin Leroy, PAG: In UCD files LineBreakTest.txt and LineBreakTest.html, add realistic tests exercising the changes to the behaviour of rules LB20a and LB21. For Unicode Version 17.0. See L2/24-224 item 6.1.
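The reassignment described in 181-C53 and 181-A139 can be sketched roughly as follows. This is only an illustration, not the tooling used in this PR: `SAMPLE_16`, `parse_line_break`, and `reassign_unambiguous_hyphens` are hypothetical names, the sample is a five-line excerpt in the style of LineBreak.txt rather than the real file, and General_Category comes from Python's `unicodedata`, which reflects whatever Unicode version the interpreter ships with.

```python
import unicodedata

# Hypothetical excerpt in LineBreak.txt style (not the full file). The values
# shown are the Unicode 16.0 Line_Break assignments for these characters.
SAMPLE_16 = """\
0009          ; BA # <control-0009>
002D          ; HY # HYPHEN-MINUS
2010          ; BA # HYPHEN
2013          ; BA # EN DASH
2014          ; B2 # EM DASH
"""


def parse_line_break(text):
    """Parse 'XXXX ; VALUE' or 'XXXX..YYYY ; VALUE' lines into {code point: value}."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        cps, value = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            table[cp] = value
    return table


def reassign_unambiguous_hyphens(table):
    """Return a copy in which every gc=Pd, lb=BA entry becomes lb=HH."""
    return {
        cp: "HH" if value == "BA" and unicodedata.category(chr(cp)) == "Pd" else value
        for cp, value in table.items()
    }


updated = reassign_unambiguous_hyphens(parse_line_break(SAMPLE_16))
for cp, value in sorted(updated.items()):
    print(f"{cp:04X} ; {value}")
```

In this excerpt only U+2010 and U+2013 move to HH: U+0009 is BA but not Pd, while U+002D and U+2014 are Pd but not BA.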
F.1 6.3 UAX #14 CGJ should not break a combining character sequence [unicode-org/properties#317]
[181-C54] Consensus: Change the Line_Break assignment of U+034F COMBINING GRAPHEME JOINER from Line_Break=GL (Glue) to Line_Break=CM (Combining_Mark). For Unicode Version 17.0. [Ref. Section 6.3 of document L2/24-224]
[181-A144] Action Item for Robin Leroy, PAG: In LineBreak.txt and derived files, change the Line_Break assignment of U+034F COMBINING GRAPHEME JOINER from Line_Break=GL (Glue) to Line_Break=CM (Combining_Mark). For Unicode Version 17.0. [Ref. Section 6.3 of document L2/24-224]
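One way to see the effect of the CGJ change is through rule LB9 of UAX #14, which treats X (CM | ZWJ)* as X unless X is BK, CR, LF, NL, SP, or ZW. The sketch below implements only that collapsing step on a list of Line_Break classes; `collapse_lb9` and `sequence` are hypothetical names for illustration, and none of the other rules are modelled.

```python
# Classes that a following CM or ZWJ does not attach to under LB9.
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def collapse_lb9(classes):
    """Collapse each X (CM | ZWJ)* run to X, returning one class per unit."""
    units = []
    for cls in classes:
        if cls in {"CM", "ZWJ"} and units and units[-1] not in NO_ATTACH:
            continue  # absorbed into the preceding unit
        units.append(cls)
    return units


def sequence(cgj_class):
    """Classes for: base letter, combining mark, U+034F CGJ, combining mark."""
    return ["AL", "CM", cgj_class, "CM"]


before = collapse_lb9(sequence("GL"))  # CGJ with its old class, GL
after = collapse_lb9(sequence("CM"))   # CGJ with its new class, CM
print(before, after)
```

With CGJ as GL the sequence collapses to two units, and the trailing mark attaches to the CGJ rather than to the base. With CGJ as CM the whole sequence collapses to the single base class.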