Commit a19886d

mktables: Handle Unicode 16.0 new \d ranges
mktables does a lot of sanity checks on the data it gets fed. One of those is to make sure that every \d group of code points is 10 long. This verifies that Unicode has given us enough code points to form 0-9; it assumes that if Unicode got this much right, the numeric values are also 0-9. This check has uncovered issues with the Unicode Standard in the past. Nowadays, they've cleaned up their act, and it's been many releases since there have been problems. But our checks remain, and I think they should. What happened in Unicode 16.0 is that a range of \d characters was added that contains two consecutive groups of 0-9 values. The check could be changed to verify that the count is divisible by 10, but checking for this particular range is a bit safer.
1 parent a1b8194 commit a19886d
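
The sanity check described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the mktables code: the function names `digit_range_ok`, `digit_range_ok_loose`, and `assumed_digit_value` are hypothetical, and ranges are passed as plain start/end code points rather than mktables range objects. It contrasts the approach taken here (exempt one known range) with the looser alternative that was considered (accept any length divisible by 10).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the mktables check: a \d range passes if it
# holds exactly 10 code points, or if it is the one known exception.
sub digit_range_ok {
    my ($start, $end) = @_;

    my $count = $end - $start + 1;
    return 1 if $count == 10;

    # Unicode 16.0: two back-to-back series of ten digits in one range.
    return 1 if $start == 0x116D0 && $end == 0x116E3;

    return 0;
}

# The looser alternative mentioned in the commit message: accept any
# range whose length is a positive multiple of 10.
sub digit_range_ok_loose {
    my ($start, $end) = @_;
    my $count = $end - $start + 1;
    return ($count > 0 && $count % 10 == 0) ? 1 : 0;
}

# Assumption: each series inside a range runs 0-9 in order, so the digit
# value of a code point is its offset from the range start, modulo 10.
sub assumed_digit_value {
    my ($start, $cp) = @_;
    return ($cp - $start) % 10;
}
```

The strict version still flags an unexpected 30-code-point range, which the divisible-by-10 version would silently accept. That is presumably the sense in which the commit message calls the targeted exemption "a bit safer".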

6 files changed, +9 -5 lines changed
Diff for: charclass_invlists.inc

+1 -1
@@ -436055,7 +436055,7 @@ static const U8 WB_table[23][23] = {
  * 3f4f32ed2a577344a508114527e721d7a8b633d32f38945d47fe0c743650c585 lib/unicore/extracted/DLineBreak.txt
  * 710abf2d581ac9c57f244c0834f9d9969d9781e0396adccd330eaae658ac7d6b lib/unicore/extracted/DNumType.txt
  * 6bd30f385f3baf3ab5d5308c111a81de87bea5f494ba0ba69e8ab45263b8c34d lib/unicore/extracted/DNumValues.txt
- * 5b296d0f4540ce1853589060d595799065c01361bcb5077f8e2cfabdefd18a61 lib/unicore/mktables
+ * c1557a0885bf627ece862b3a80ee1bd24449b656e01159d4c6753c3a1ed54335 lib/unicore/mktables
  * 55d90fdc3f902e5c0b16b3378f9eaa36e970a1c09723c33de7d47d0370044012 lib/unicore/version
  * 0a6b5ab33bb1026531f816efe81aea1a8ffcd34a27cbea37dd6a70a63d73c844 regen/charset_translations.pl
  * c7ff8e0d207d3538c7feb4a1a152b159e5e902d20293b303569ea8323e84633e regen/mk_PL_charclass.pl

Diff for: lib/unicore/mktables

+4
@@ -13779,6 +13779,10 @@ END
     next if $range->start == 0x1D7CE; # This whole range was added in 3.1
     next if $range->end == 0x19DA && $v_version eq v5.2.0;
     next if $range->end - $range->start < 9 && $v_version le 4.0.0;
+
+    # 2 sequential series of 10 each were added in 16.0
+    next if $range->start == 0x116D0 && $range->end == 0x116E3;
+
     Carp::my_carp("Range $range unexpectedly doesn't contain 10"
                 . " decimal digits. Code in regcomp.c assumes it does,"
                 . " and will have to be fixed. Proceeding anyway.");

Diff for: lib/unicore/uni_keywords.pl

+1 -1 (generated file; diff not rendered)

Diff for: regcharclass.h

+1 -1 (generated file; diff not rendered)

Diff for: regexp_constants.h

+1 -1
@@ -78,7 +78,7 @@
  * 3f4f32ed2a577344a508114527e721d7a8b633d32f38945d47fe0c743650c585 lib/unicore/extracted/DLineBreak.txt
  * 710abf2d581ac9c57f244c0834f9d9969d9781e0396adccd330eaae658ac7d6b lib/unicore/extracted/DNumType.txt
  * 6bd30f385f3baf3ab5d5308c111a81de87bea5f494ba0ba69e8ab45263b8c34d lib/unicore/extracted/DNumValues.txt
- * 5b296d0f4540ce1853589060d595799065c01361bcb5077f8e2cfabdefd18a61 lib/unicore/mktables
+ * c1557a0885bf627ece862b3a80ee1bd24449b656e01159d4c6753c3a1ed54335 lib/unicore/mktables
  * 55d90fdc3f902e5c0b16b3378f9eaa36e970a1c09723c33de7d47d0370044012 lib/unicore/version
  * 0a6b5ab33bb1026531f816efe81aea1a8ffcd34a27cbea37dd6a70a63d73c844 regen/charset_translations.pl
  * c7ff8e0d207d3538c7feb4a1a152b159e5e902d20293b303569ea8323e84633e regen/mk_PL_charclass.pl

Diff for: uni_keywords.h

+1 -1 (generated file; diff not rendered)
