Fix fallback with language-likely script but region-unlikely script by sffc · Pull Request #7857 · unicode-org/icu4x

sffc · 2026-04-08T19:17:08Z

Discovered when working on #3287

Changelog

icu_locale: fix fallback with language-likely script but region-unlikely script, such as sr-Cyrl-ME and zh-Hans-HK

sffc · 2026-04-08T19:23:17Z

I think there's still more that can be done to prevent loading the same likely subtags data twice and/or prevent loading likely subtags data when it isn't needed, but let's fix the algorithm first.

robertbastian · 2026-04-09T09:38:54Z

provider/data/datetime/fingerprints.csv

 datetime/names/month/buddhist/v1, sr-ME/3, -> sr-Latn-XK/3
 datetime/names/month/buddhist/v1, sr-ME/3s, -> sr-Latn-XK/3
-datetime/names/month/buddhist/v1, sr-XK/3, 132B, 102B, 28cb8a675f91e27b
-datetime/names/month/buddhist/v1, sr-XK/3s, -> sr-XK/3


observation: changing the fallback algorithm might lead to unexpected results with data that was deduplicated under the previous algorithm, as well as with old binaries that are given data that was deduplicated under the new algorithm

I don't think so. We're adding new locales but I don't think we are removing old ones, so fallback should either hit the same thing as before or hit a better thing.

The language fallback chains changed like:

sr-Cyrl-ME, sr-Cyrl to sr-Cyrl-ME, sr

zh-Hans-TW, zh-Hans to zh-Hans-TW, zh

This means that previously, if data was in sr-Cyrl-ME and it matched the data in sr-Cyrl, it would be removed, and we'd fall back to sr-Cyrl data at runtime. But now:

Old data new code: sr-Cyrl-ME will fall back to sr instead at runtime, even though the data was deduplicated against sr-Cyrl

New data old code: sr-Cyrl-ME will fall back to sr-Cyrl at runtime, even though data was deduplicated against sr

In fact, because sr-Cyrl and zh-Hans shouldn't contain any data (they're the default scripts), the old logic wouldn't have done any deduplication. But the new logic does do deduplication, so new data old code will break because the fallback will never reach sr/zh where the data is now.
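To make the four data/code pairings concrete, here is a minimal, self-contained sketch. It is not the icu_locale or datagen implementation: the chains are hard-coded, only their first two steps come from this thread (the trailing `und` step is an assumption), and `source_data`, `resolve`, `deduplicate`, and `run_scenarios` are hypothetical names with made-up payload strings.

```rust
use std::collections::HashMap;

type Data = HashMap<&'static str, &'static str>;

// Hypothetical chains for sr-Cyrl-ME. Only the first two steps are taken from
// the discussion above; the trailing "und" step is an assumption.
const OLD_CHAIN: [&str; 3] = ["sr-Cyrl-ME", "sr-Cyrl", "und"];
const NEW_CHAIN: [&str; 3] = ["sr-Cyrl-ME", "sr", "und"];

/// Source data before deduplication. There is no "sr-Cyrl" entry because
/// Cyrl is the default script of "sr". Payload strings are placeholders.
fn source_data() -> Data {
    HashMap::from([
        ("und", "root months"),
        ("sr", "sr months"),
        ("sr-Cyrl-ME", "sr months"),
    ])
}

/// Runtime lookup: return the first payload found while walking the chain.
fn resolve(data: &Data, chain: &[&'static str]) -> Option<&'static str> {
    chain.iter().find_map(|locale| data.get(locale).copied())
}

/// Simplified deduplication: drop the leaf's payload if walking the rest of
/// the chain would yield an identical payload anyway.
fn deduplicate(mut data: Data, chain: &[&'static str]) -> Data {
    if let Some((leaf, parents)) = chain.split_first() {
        let own = data.get(leaf).copied();
        if own.is_some() && own == resolve(&data, parents) {
            data.remove(leaf);
        }
    }
    data
}

/// Resolve sr-Cyrl-ME for every pairing of (deduplicated data, runtime chain).
fn run_scenarios() -> [(&'static str, Option<&'static str>); 4] {
    let old_data = deduplicate(source_data(), &OLD_CHAIN);
    let new_data = deduplicate(source_data(), &NEW_CHAIN);
    [
        ("old data, old code", resolve(&old_data, &OLD_CHAIN)),
        ("old data, new code", resolve(&old_data, &NEW_CHAIN)),
        ("new data, new code", resolve(&new_data, &NEW_CHAIN)),
        ("new data, old code", resolve(&new_data, &OLD_CHAIN)),
    ]
}
```

Under these assumptions, the first three pairings resolve sr-Cyrl-ME to the sr payload. Only new data with old code ends up at the root payload, because the old chain skips sr, which is where the deduplicated data now lives.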

robertbastian · 2026-04-09T09:39:55Z

provider/data/datetime/fingerprints.csv

-datetime/patterns/date/buddhist/v1, <lookup>, 6444B, 1300 identifiers
-datetime/patterns/date/buddhist/v1, <total>, 67336B, 47978B, 650 unique payloads
+datetime/patterns/date/buddhist/v1, <lookup>, 6514B, 1313 identifiers
+datetime/patterns/date/buddhist/v1, <total>, 67911B, 48345B, 657 unique payloads


why are there more unique payloads than before?

Before we were dropping locales like zh-Hans-HK and sr-Cyrl-ME, and now we include them.

https://github.com/unicode-org/cldr/blob/main/common/main/sr_Cyrl_ME.xml

We don't "drop" locales in datagen, we deduplicate against parents. I still fail to understand how a change to the fallback algorithm can increase the number of unique data structs.

sffc

The new locales appear to be:

zh-Hans-HK
zh-Hans-MO
sr-Cyrl-ME
ku-Latn-IQ
yue-Hant-CN

robertbastian · 2026-04-10T09:42:51Z

Please add a comment somewhere, either this PR or the issue, what the behaviour change here actually is, not just which locales are affected.

sffc requested review from dminor and zbraniecki as code owners April 8, 2026 19:17

sffc requested review from Manishearth and robertbastian and removed request for dminor and zbraniecki April 8, 2026 19:17

Fix fallback with language-likely script but region-unlikely script

b997428

sffc force-pushed the fallback-alg-improvement branch from 9a3657e to b997428 Compare April 8, 2026 19:19

datagen

4bd5a61

sffc requested a review from a team as a code owner April 8, 2026 20:30

Manishearth approved these changes Apr 8, 2026

robertbastian reviewed Apr 9, 2026

sffc commented Apr 9, 2026

sffc wants to merge 2 commits into unicode-org:main from sffc:fallback-alg-improvement
