Fix fallback with language-likely script but region-unlikely script #7857
sffc wants to merge 2 commits into unicode-org:main
I think there's still more that can be done to prevent loading the same likely subtags data twice and/or prevent loading likely subtags data when it isn't needed, but let's fix the algorithm first.
datetime/names/month/buddhist/v1, sr-ME/3, -> sr-Latn-XK/3
datetime/names/month/buddhist/v1, sr-ME/3s, -> sr-Latn-XK/3
datetime/names/month/buddhist/v1, sr-XK/3, 132B, 102B, 28cb8a675f91e27b
datetime/names/month/buddhist/v1, sr-XK/3s, -> sr-XK/3
observation: changing the fallback algorithm might lead to unexpected results with data that was deduplicated under the previous algorithm, as well as with old binaries that are given data that was deduplicated under the new algorithm
I don't think so. We're adding new locales but I don't think we are removing old ones, so fallback should either hit the same thing as before or hit a better thing.
The language fallback chains changed as follows:
- `sr-Cyrl-ME, sr-Cyrl` to `sr-Cyrl-ME, sr`
- `zh-Hans-TW, zh-Hans` to `zh-Hans-TW, zh`
This means that previously, if data was in sr-Cyrl-ME and it matched the data in sr-Cyrl, it would be removed, and we'd fall back to sr-Cyrl data at runtime. But now:
- Old data, new code: sr-Cyrl-ME will fall back to sr instead at runtime, even though the data was deduplicated against sr-Cyrl
- New data, old code: sr-Cyrl-ME will fall back to sr-Cyrl at runtime, even though the data was deduplicated against sr
In fact, because sr-Cyrl and zh-Hans shouldn't contain any data (they're the default scripts), the old logic wouldn't have done any deduplication. But the new logic does do deduplication, so new data old code will break because the fallback will never reach sr/zh where the data is now.
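The mismatch described in this thread can be reproduced with a toy model. The following is only a sketch under stated assumptions: the two chain functions hard-code the `sr-Cyrl-ME` chains quoted above, the trailing `und` step and the payload strings are made up for illustration, and none of the names (`old_chain`, `new_chain`, `deduplicate`, `lookup`) are ICU4X APIs.

```rust
use std::collections::BTreeMap;

type Data = BTreeMap<String, String>;
type ChainFn = fn(&str) -> Vec<String>;

fn to_chain(steps: &[&str]) -> Vec<String> {
    steps.iter().map(|s| s.to_string()).collect()
}

/// Toy "old" chain: sr-Cyrl-ME, sr-Cyrl, then und (the und step is an assumption).
fn old_chain(locale: &str) -> Vec<String> {
    match locale {
        "sr-Cyrl-ME" => to_chain(&["sr-Cyrl-ME", "sr-Cyrl", "und"]),
        "und" => to_chain(&["und"]),
        other => to_chain(&[other, "und"]),
    }
}

/// Toy "new" chain: sr-Cyrl-ME, sr, then und (the und step is an assumption).
fn new_chain(locale: &str) -> Vec<String> {
    match locale {
        "sr-Cyrl-ME" => to_chain(&["sr-Cyrl-ME", "sr", "und"]),
        "und" => to_chain(&["und"]),
        other => to_chain(&[other, "und"]),
    }
}

/// Runtime lookup: return the payload of the first locale in the chain that has data.
fn lookup<'a>(data: &'a Data, chain: &[String]) -> Option<&'a str> {
    chain.iter().find_map(|l| data.get(l)).map(|s| s.as_str())
}

/// Datagen-style deduplication: drop an entry when the payload it would
/// inherit from its ancestors (per `chain_of`) is identical to its own.
fn deduplicate(full: &Data, chain_of: ChainFn) -> Data {
    let mut kept = Data::new();
    for (locale, payload) in full {
        let chain = chain_of(locale);
        // chain[0] is the locale itself, so inheritance starts at chain[1].
        let inherited = lookup(full, &chain[1..]);
        if inherited != Some(payload.as_str()) {
            kept.insert(locale.clone(), payload.clone());
        }
    }
    kept
}

/// Helper to build a toy data set from (locale, payload) pairs.
fn dataset(pairs: &[(&str, &str)]) -> Data {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}
```

In this model, if `sr-Cyrl` is empty and `sr-Cyrl-ME` carries the same payload as `sr`, deduplicating with `new_chain` removes `sr-Cyrl-ME`, and a lookup with `old_chain` then skips `sr` and lands on `und`. Conversely, if `sr-Cyrl` has its own payload and `sr-Cyrl-ME` matches it, deduplicating with `old_chain` removes `sr-Cyrl-ME`, and a lookup with `new_chain` returns the `sr` payload instead.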
datetime/patterns/date/buddhist/v1, <lookup>, 6444B, 1300 identifiers
datetime/patterns/date/buddhist/v1, <total>, 67336B, 47978B, 650 unique payloads
datetime/patterns/date/buddhist/v1, <lookup>, 6514B, 1313 identifiers
datetime/patterns/date/buddhist/v1, <total>, 67911B, 48345B, 657 unique payloads
Why are there more unique payloads than before?
Before, we were dropping locales like zh-Hans-HK and sr-Cyrl-ME; now we include them.
https://github.com/unicode-org/cldr/blob/main/common/main/sr_Cyrl_ME.xml
We don't "drop" locales in datagen, we deduplicate against parents. I still fail to understand how a change to the fallback algorithm can increase the number of unique data structs.
sffc left a comment:
The new locales appear to be:
- zh-Hans-HK
- zh-Hans-MO
- sr-Cyrl-ME
- ku-Latn-IQ
- yue-Hant-CN
Please add a comment somewhere, either on this PR or on the issue, explaining what the behaviour change here actually is, not just which locales are affected.
Discovered when working on #3287
Changelog
icu_locale: fix fallback with language-likely script but region-unlikely script, such as sr-Cyrl-ME and zh-Hans-HK
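To make the phrase "language-likely script but region-unlikely script" concrete, here is a minimal sketch. It assumes a tiny hand-written excerpt of likely-script data (`sr` defaults to Cyrl but `sr-ME` to Latn; `zh` defaults to Hans but `zh-HK` to Hant); the functions `likely_script` and `is_language_likely_region_unlikely` are hypothetical names for illustration and do not reflect how icu_locale loads or queries likely subtags.

```rust
/// Hand-written excerpt of likely scripts, for illustration only.
/// `region == None` means "likely script for the bare language".
fn likely_script(language: &str, region: Option<&str>) -> Option<&'static str> {
    match (language, region) {
        ("sr", None) => Some("Cyrl"),
        ("sr", Some("RS")) => Some("Cyrl"),
        ("sr", Some("ME")) => Some("Latn"),
        ("zh", None) => Some("Hans"),
        ("zh", Some("CN")) => Some("Hans"),
        ("zh", Some("HK")) => Some("Hant"),
        // Anything outside this toy table is treated as unknown.
        _ => None,
    }
}

/// True when `script` is the likely script for `language` on its own,
/// but a different script is the likely one for `language` + `region`.
fn is_language_likely_region_unlikely(language: &str, script: &str, region: &str) -> bool {
    let language_likely = likely_script(language, None) == Some(script);
    let region_unlikely = match likely_script(language, Some(region)) {
        Some(regional) => regional != script,
        None => false, // unknown pairs are not classified in this sketch
    };
    language_likely && region_unlikely
}
```

Under these assumptions, `("sr", "Cyrl", "ME")` and `("zh", "Hans", "HK")` are classified as the affected case, while `("sr", "Latn", "ME")` and `("zh", "Hans", "CN")` are not.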