Prepare mktables for Unicode 15.1 and 16.0 #23133
base: blead
    if (defined (my $bmg = property_ref('Bidi_Mirroring_Glyph'))) {
        $bmg->set_to_output_map($EXTERNAL_MAP);
        $bmg->set_range_size_1(1);
    }

    property_ref('Numeric_Value')->set_to_output_map($OUTPUT_ADJUSTED);

    # These two properties have no short names and the file names for them
    # clash in DOS 8.3. Work around this by creating shorter file names that
Where are we still limited by 8.3?
On IRC the other day, I asked if we were still limited, and the answer was yes.
For Unicode filenames yes, but for ASCII filenames we aren't, AFAIK.
I'd prefer to leave this as-is, since it is trivial to do, just in case. And I have WIP which should get rid of them altogether.
(force-pushed from 4894f2a to 1f07a91)
The commit message for aa6faba has 2 misspellings. "infrastructue" lacks the second "r". In "incoroporated" the second "o" needs removal.
(force-pushed from 1f07a91 to de01c61)
This p.r. for Unicode mktables did not make it into the March 20 dev release. Does that mean we have to defer it to the 5.43 dev cycle?
The change isn't really user visible, it would only affect people who would want to patch in a more recent Unicode version.
@khwilliamson there's one unresolved conversation in this p.r. If you mark that resolved, then I think this is okay to merge.
There are more commits coming
Add comments, and rewrap comment lines to fit 80 columns
Unicode 15.1 introduces this new property, which needs the same special handling as plain NFKC_Casefold does.
These files are changed in 15.1 to have @missing lines, whereas they didn't before. This leads to some warning messages, so turn off looking at them, as we do for a number of other files.
We handle it by ignoring this file, new to Unicode 16.0. It consists of lists of characters that, to put it less delicately than Unicode would like, they regret creating. But there are no rules associated with them. It would be nice to have a \p{DoNotEmit} property so that applications could handle situations where this occurs. But I'm fearful that if we did something like this, Unicode would later come up with something that had the same intention but was subtly or unsubtly different. That has happened before, to our detriment. So I think we should wait to see what they actually do in future releases.
(force-pushed from de01c61 to 8f58648)
(force-pushed from 8f58648 to 5b52ed1)
This includes several new properties, some of which are considered "provisional" by Unicode, which means they can be heavily revised or withdrawn. These properties are designed for use by scholars of hieroglyphics.
These new properties are automatically handled, but there is a problem. They have no short form names. Files are written for them based on their names, and those files are not distinguishable on a DOS 8.3 file system. The solution here is to manually override the automatically generated file names with distinguishable ones.
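To illustrate the clash described above, here is a minimal sketch; it is not mktables' actual code. It assumes that a DOS 8.3 file system keeps only the first eight characters of the base name and ignores case. The property names, the override table, and the helpers dos_8_3_base() and find_clashes() are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical long property names with no short form; not the real ones.
my @names = qw(Example_Long_Alpha Example_Long_Beta Other_Prop);

# Assumption: an 8.3 file system keeps the first 8 characters of the base
# name and compares them case-insensitively.
sub dos_8_3_base {
    my ($name) = @_;
    return lc substr($name, 0, 8);
}

# Group names by their truncated form; return a hash ref holding only the
# groups in which two or more names collide.
sub find_clashes {
    my (@candidates) = @_;
    my %by_base;
    push @{ $by_base{ dos_8_3_base($_) } }, $_ for @candidates;
    my %clashes;
    for my $base (keys %by_base) {
        $clashes{$base} = $by_base{$base} if @{ $by_base{$base} } > 1;
    }
    return \%clashes;
}

# Hypothetical manual overrides that give each clashing name a file name
# which stays distinct after truncation.
my %override = (
    Example_Long_Alpha => 'ExLAlpha',
    Example_Long_Beta  => 'ExLBeta',
);

my $before = find_clashes(@names);
my $after  = find_clashes(map { $override{$_} // $_ } @names);

printf "clashing groups before overrides: %d, after: %d\n",
    scalar(keys %$before), scalar(keys %$after);
```

Running the clash check again after applying the overrides confirms that the hand-picked names really are distinguishable under the assumed truncation rule.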
mktables does a lot of sanity checks on the data it gets fed. One of those is to make sure any \d group of code points is 10 long. This verifies that Unicode has given us enough code points to form 0-9. It assumes that if it got this much right, their numeric values are also 0-9. This check has uncovered issues with the Unicode Standard in the past. Nowadays, they've cleaned up their act, and it's been many releases since there have been problems. But our checks remain, and I think they should. What happened in Unicode 16.0 was that there was a range of \d characters that contains two consecutive groups of 0-9 values. The check could be changed to verify that the count is divisible by 10, but checking for this particular range is a bit safer.
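The trade-off between the two options can be sketched as follows. This is an illustration under stated assumptions, not the check as it exists in mktables: the helpers runs() and check_digit_runs() are hypothetical, and the allow-listed range 1000-1013 is an arbitrary placeholder, not the actual Unicode 16.0 range.

```perl
use strict;
use warnings;

# Hypothetical allow-list of ranges known to hold more than one 0-9 group.
# The key below is a placeholder, not the real Unicode 16.0 range.
my %known_multi_group = ( '1000-1013' => 1 );

# Collapse a list of distinct code points into [start, end] runs of
# consecutive values.
sub runs {
    my @cps = sort { $a <=> $b } @_;
    my @runs;
    for my $cp (@cps) {
        if (@runs && $cp == $runs[-1][1] + 1) {
            $runs[-1][1] = $cp;          # extend the current run
        }
        else {
            push @runs, [ $cp, $cp ];    # start a new run
        }
    }
    return @runs;
}

# Return a list of complaints; an empty list means every run passed.
sub check_digit_runs {
    my @problems;
    for my $run (runs(@_)) {
        my ($start, $end) = @$run;
        my $count = $end - $start + 1;
        next if $count == 10;            # the usual case: exactly 0-9
        my $key = sprintf '%04X-%04X', $start, $end;
        # Only an explicitly listed range may hold several groups of 10.
        next if $known_multi_group{$key} && $count % 10 == 0;
        push @problems, "$key has $count digits, expected 10";
    }
    return @problems;
}

my @ok  = check_digit_runs(0x30 .. 0x39, 0x1000 .. 0x1013);
my @bad = check_digit_runs(0x30 .. 0x38);
print "ok: ", scalar(@ok), " problem(s); bad: @bad\n";
```

With the allow-list, an unexpected 20-long run elsewhere is still reported, whereas a plain divisible-by-10 test would let it through silently.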
There is already this method for lists of Ranges, so this is just so callers don't need to know which they are operating on.
(force-pushed from 5b52ed1 to 32ee519)
This has been repushed, with the new hieroglyphic properties now working
perldelta not needed until the actual releases are incorporated.