version 1.2, update CLDR v43 #2

Open: wants to merge 1 commit into main
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
+# Version 1.2 (2023-??-??)
+
+- Updated to CLDR v43.
+
# Version 1.1 (2021-??-??)

- Updated to CLDR v40.
2 changes: 1 addition & 1 deletion README.md
@@ -18,7 +18,7 @@ The data included in this package is:
- The estimated population that speaks each language
- The estimated population that writes each language

-These are all extracted from the Unicode [CLDR][] data package, version 40, plus a few additional language names that fill in gaps in CLDR.
+These are all extracted from the Unicode [CLDR][] data package, version 43, plus a few additional language names that fill in gaps in CLDR.

[cldr]: http://cldr.unicode.org/

16 changes: 11 additions & 5 deletions language_data/build_data.py
@@ -440,11 +440,17 @@ def build_data():
language_data = read_cldr_name_file(langcode, 'languages')
update_names(names_fwd, language_names_rev, language_data)

-script_data = read_cldr_name_file(langcode, 'scripts')
-update_names(names_fwd, script_names_rev, script_data)
-
-territory_data = read_cldr_name_file(langcode, 'territories')
-update_names(names_fwd, territory_names_rev, territory_data)
+try:
+    script_data = read_cldr_name_file(langcode, 'scripts')
+    update_names(names_fwd, script_names_rev, script_data)
+except FileNotFoundError:
+    pass
+
+try:
+    territory_data = read_cldr_name_file(langcode, 'territories')
+    update_names(names_fwd, territory_names_rev, territory_data)
+except FileNotFoundError:
+    pass

iana_languages, iana_scripts, iana_territories = read_iana_registry_names()
update_names(names_fwd, language_names_rev, iana_languages)
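The new `try`/`except FileNotFoundError` blocks let the build skip a locale's script or territory names when that CLDR name file is absent, while a missing language-name file (read outside any `try`) still raises. The standalone sketch below shows the same pattern using `contextlib.suppress`. The JSON-per-locale layout and the `read_name_file`/`collect_names` helpers are hypothetical stand-ins for illustration, not the project's real `read_cldr_name_file` or `update_names`.

```python
import contextlib
import json
import tempfile
from pathlib import Path


def read_name_file(base: Path, langcode: str, category: str) -> dict:
    """Hypothetical reader: loads base/<langcode>/<category>.json.

    Raises FileNotFoundError if the locale has no file for this category.
    """
    with open(base / langcode / f"{category}.json", encoding="utf-8") as f:
        return json.load(f)


def collect_names(base: Path, langcode: str) -> dict:
    """Merge every name category that exists for a locale, skipping missing optional ones."""
    names = {}
    # Language names are required: a missing file propagates FileNotFoundError.
    names.update(read_name_file(base, langcode, "languages"))
    for category in ("scripts", "territories"):
        # Same effect as: try: ... except FileNotFoundError: pass
        with contextlib.suppress(FileNotFoundError):
            names.update(read_name_file(base, langcode, category))
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "xx").mkdir()
        (base / "xx" / "languages.json").write_text('{"en": "English"}', encoding="utf-8")
        # No scripts.json or territories.json for the hypothetical locale "xx": both are skipped.
        print(collect_names(base, "xx"))  # {'en': 'English'}
```

`contextlib.suppress` is only a more compact spelling of the PR's `try`/`except`/`pass`; either keeps the "optional file" decision local to each category.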
21 changes: 15 additions & 6 deletions language_data/data/languageInfo.xml
@@ -10,11 +10,11 @@ For terms of use, see http://www.unicode.org/copyright.html
<languageMatching>
<languageMatches type="written_new">
<paradigmLocales locales="en en_GB es es_419 pt_BR pt_PT"/>
-<matchVariable id="$enUS" value="AS+GU+MH+MP+PR+UM+US+VI"/>
+<matchVariable id="$enUS" value="AS+CA+GU+MH+MP+PH+PR+UM+US+VI"/>
<matchVariable id="$cnsar" value="HK+MO"/>
<matchVariable id="$americas" value="019"/>
<matchVariable id="$maghreb" value="MA+DZ+TN+LY+MR+EH"/>
-<languageMatch desired="no" supported="nb" distance="1"/> <!-- no ⇒ nb -->
+<languageMatch desired="nb" supported="no" distance="1"/> <!-- nb ⇒ no -->
<!-- languageMatch desired="ku" supported="ckb" distance="4" oneway="true"/ --> <!-- ku ⇒ ckb -->
<!-- languageMatch desired="ckb" supported="ku" percent="8" oneway="true"/ --> <!-- ckb ⇒ ku -->
<languageMatch desired="hr" supported="bs" distance="4"/> <!-- hr ⇒ bs -->
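In this hunk the `$enUS` match variable gains `CA` and `PH`. As a rough illustration of what the `+`-joined value denotes, the sketch below expands it into a set of region codes and diffs the old and new values. It handles only `+` unions of plain codes; it does not expand containment codes such as `019` (used by `$americas`), and the helper name `expand_match_variable` is hypothetical.

```python
def expand_match_variable(value: str) -> set[str]:
    """Split a '+'-joined matchVariable value into a set of region codes.

    Simplified sketch: containment codes (e.g. '019' for the Americas) are not expanded.
    """
    return {code for code in value.split("+") if code}


OLD_EN_US = "AS+GU+MH+MP+PR+UM+US+VI"        # value before this PR
NEW_EN_US = "AS+CA+GU+MH+MP+PH+PR+UM+US+VI"  # value after this PR

if __name__ == "__main__":
    added = expand_match_variable(NEW_EN_US) - expand_match_variable(OLD_EN_US)
    print(sorted(added))  # ['CA', 'PH']
```

Treating the value as a set makes the change easy to audit: nothing was removed, and the only additions are Canada and the Philippines.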
@@ -38,18 +38,23 @@ For terms of use, see http://www.unicode.org/copyright.html
<languageMatch desired="ach" supported="en" distance="30" oneway="true"/> <!-- Acoli (Southern Luo dialect in Uganda): ach ⇒ en -->
<languageMatch desired="af" supported="nl" distance="20" oneway="true"/> <!-- Afrikaans: af ⇒ nl -->
<languageMatch desired="ak" supported="en" distance="30" oneway="true"/> <!-- Akan: ak ⇒ en -->
+<languageMatch desired="am" supported="en" distance="30" oneway="true"/> <!-- Amharic ⇒ English -->
<languageMatch desired="ay" supported="es" distance="20" oneway="true"/> <!-- Aymara: ay ⇒ es -->
<languageMatch desired="az" supported="ru" distance="30" oneway="true"/> <!-- Azerbaijani: az ⇒ ru -->
+<languageMatch desired="bal" supported="ur" distance="20" oneway="true"/> <!-- Baluchi ⇒ Urdu -->
<languageMatch desired="be" supported="ru" distance="20" oneway="true"/> <!-- Belarusian: be ⇒ ru -->
<languageMatch desired="bem" supported="en" distance="30" oneway="true"/> <!-- Bemba (Zambia): bem ⇒ en -->
<languageMatch desired="bh" supported="hi" distance="30" oneway="true"/> <!-- Bihari languages (gets canonicalized to bho): bh ⇒ hi -->
<languageMatch desired="bn" supported="en" distance="30" oneway="true"/> <!-- Bangla: bn ⇒ en -->
+<languageMatch desired="bo" supported="zh" distance="20" oneway="true"/> <!-- Tibetan ⇒ Chinese -->
<languageMatch desired="br" supported="fr" distance="20" oneway="true"/> <!-- Breton: br ⇒ fr -->
+<languageMatch desired="ca" supported="es" distance="20" oneway="true"/> <!-- Catalan ⇒ Spanish -->
<languageMatch desired="ceb" supported="fil" distance="30" oneway="true"/> <!-- Cebuano: ceb ⇒ fil -->
<languageMatch desired="chr" supported="en" distance="20" oneway="true"/> <!-- Cherokee: chr ⇒ en -->
<languageMatch desired="ckb" supported="ar" distance="30" oneway="true"/> <!-- Sorani Kurdish: ckb ⇒ ar -->
<languageMatch desired="co" supported="fr" distance="20" oneway="true"/> <!-- Corsican: co ⇒ fr -->
<languageMatch desired="crs" supported="fr" distance="20" oneway="true"/> <!-- Seselwa Creole French: crs ⇒ fr -->
+<languageMatch desired="cs" supported="sk" distance="20"/> <!-- Czech ⇔ Slovak -->
<languageMatch desired="cy" supported="en" distance="20" oneway="true"/> <!-- Welsh: cy ⇒ en -->
<languageMatch desired="ee" supported="en" distance="30" oneway="true"/> <!-- Ewe: ee ⇒ en -->
<languageMatch desired="eo" supported="en" distance="30" oneway="true"/> <!-- Esperanto: eo ⇒ en -->
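Most rules in this hunk carry `oneway="true"`: a user who wants Amharic (`am`) can be offered English at distance 30, but not the reverse. The new Czech/Slovak rule omits `oneway`, so it applies in both directions. The sketch below models that lookup with the standard library's `xml.etree.ElementTree`, using a few rules copied from this file plus its `* ⇒ *` default of 80 (lower distance means a better match). It assumes, for simplicity, that the first matching rule wins, and it ignores script- and region-level rules and `$` variables, so it is an illustration, not a full CLDR matcher. Commented-out rules are skipped automatically because `ElementTree` drops XML comments by default.

```python
import xml.etree.ElementTree as ET

# A few rules copied from languageInfo.xml, plus its '* => *' default.
RULES_XML = """<languageMatches>
  <languageMatch desired="am" supported="en" distance="30" oneway="true"/>
  <languageMatch desired="ca" supported="es" distance="20" oneway="true"/>
  <languageMatch desired="cs" supported="sk" distance="20"/>
  <languageMatch desired="*" supported="*" distance="80"/>
</languageMatches>"""


def load_rules(xml_text: str) -> list[tuple[str, str, int, bool]]:
    """Parse languageMatch elements into (desired, supported, distance, oneway) tuples, in order."""
    root = ET.fromstring(xml_text)
    return [
        (m.get("desired"), m.get("supported"), int(m.get("distance")),
         m.get("oneway") == "true")
        for m in root.iter("languageMatch")
    ]


def language_distance(rules: list[tuple[str, str, int, bool]],
                      desired: str, supported: str) -> int:
    """Return the distance of the first matching rule (simplified first-match-wins model)."""
    if desired == supported:
        return 0
    for rule_desired, rule_supported, distance, oneway in rules:
        candidates = [(rule_desired, rule_supported)]
        if not oneway:
            # Symmetric rules also match with the two sides swapped.
            candidates.append((rule_supported, rule_desired))
        for d, s in candidates:
            if d in ("*", desired) and s in ("*", supported):
                return distance
    raise LookupError("no rule matched")  # not reached while a '*' rule is present
```

With `rules = load_rules(RULES_XML)`, `language_distance(rules, "am", "en")` is 30 while `language_distance(rules, "en", "am")` falls through to the default 80; `cs`/`sk` gives 20 in either order.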
@@ -88,9 +93,10 @@ For terms of use, see http://www.unicode.org/copyright.html
<languageMatch desired="lo" supported="en" distance="30" oneway="true"/> <!-- Lao: lo ⇒ en -->
<languageMatch desired="loz" supported="en" distance="30" oneway="true"/> <!-- Lozi: loz ⇒ en -->
<languageMatch desired="lua" supported="fr" distance="30" oneway="true"/> <!-- Luba-Lulua: lua ⇒ fr -->
+<languageMatch desired="mai" supported="hi" distance="20" oneway="true"/> <!-- Maithili ⇒ Hindi -->
<languageMatch desired="mfe" supported="en" distance="30" oneway="true"/> <!-- Morisyen: mfe ⇒ en -->
<languageMatch desired="mg" supported="fr" distance="30" oneway="true"/> <!-- Malagasy: mg ⇒ fr -->
-<languageMatch desired="mi" supported="en" distance="20" oneway="true"/> <!-- Maori: mi ⇒ en -->
+<languageMatch desired="mi" supported="en" distance="20" oneway="true"/> <!-- Māori: mi ⇒ en -->

<!-- CLDR-13625: Macedonian should not fall back to Bulgarian -->
<!-- languageMatch desired="mk" supported="bg" distance="30" oneway="true"/--> <!-- Macedonian: mk ⇒ bg -->
@@ -137,12 +143,14 @@ For terms of use, see http://www.unicode.org/copyright.html
<languageMatch desired="tt" supported="ru" distance="30" oneway="true"/> <!-- Tatar: tt ⇒ ru -->
<languageMatch desired="tum" supported="en" distance="30" oneway="true"/> <!-- Tumbuka: tum ⇒ en -->
<languageMatch desired="ug" supported="zh" distance="20" oneway="true"/> <!-- Uighur: ug ⇒ zh -->
+<languageMatch desired="uk" supported="ru" distance="20" oneway="true"/> <!-- Ukrainian ⇒ Russian -->
<languageMatch desired="ur" supported="en" distance="30" oneway="true"/> <!-- Urdu: ur ⇒ en -->
<languageMatch desired="uz" supported="ru" distance="30" oneway="true"/> <!-- Uzbek: uz ⇒ ru -->
<languageMatch desired="wo" supported="fr" distance="30" oneway="true"/> <!-- Wolof: wo ⇒ fr -->
<languageMatch desired="xh" supported="en" distance="30" oneway="true"/> <!-- Xhosa: xh ⇒ en -->
<languageMatch desired="yi" supported="en" distance="30" oneway="true"/> <!-- Yiddish: yi ⇒ en -->
<languageMatch desired="yo" supported="en" distance="30" oneway="true"/> <!-- Yoruba: yo ⇒ en -->
+<languageMatch desired="za" supported="zh" distance="20" oneway="true"/> <!-- Zhuang languages ⇒ Chinese -->
<languageMatch desired="zu" supported="en" distance="30" oneway="true"/> <!-- Zulu: zu ⇒ en -->

<!-- START generated by GenerateLanguageMatches.java: don't manually change -->
@@ -359,8 +367,10 @@ For terms of use, see http://www.unicode.org/copyright.html
<languageMatch desired="yue" supported="zh" distance="10" oneway="true"/> <!-- Chinese, Cantonese -->
<!-- END generated by GenerateLanguageMatches.java -->
<languageMatch desired="*" supported="*" distance="80"/> <!-- * ⇒ * -->
+<languageMatch desired="am_Ethi" supported="en_Latn" distance="10" oneway="true"/>
<languageMatch desired="az_Latn" supported="ru_Cyrl" distance="10" oneway="true"/> <!-- az; Latn ⇒ ru; Cyrl -->
<languageMatch desired="bn_Beng" supported="en_Latn" distance="10" oneway="true"/> <!-- bn; Beng ⇒ en; Latn -->
+<languageMatch desired="bo_Tibt" supported="zh_Hans" distance="10" oneway="true"/>
<languageMatch desired="hy_Armn" supported="ru_Cyrl" distance="10" oneway="true"/> <!-- hy; Armn ⇒ ru; Cyrl -->
<languageMatch desired="ka_Geor" supported="en_Latn" distance="10" oneway="true"/> <!-- ka; Geor ⇒ en; Latn -->
<languageMatch desired="km_Khmr" supported="en_Latn" distance="10" oneway="true"/> <!-- km; Khmr ⇒ en; Latn -->
@@ -382,9 +392,8 @@ For terms of use, see http://www.unicode.org/copyright.html
<languageMatch desired="uz_Latn" supported="ru_Cyrl" distance="10" oneway="true"/> <!-- uz; Latn ⇒ ru; Cyrl -->
<languageMatch desired="yi_Hebr" supported="en_Latn" distance="10" oneway="true"/> <!-- yi; Hebr ⇒ en; Latn -->
<languageMatch desired="sr_Latn" supported="sr_Cyrl" distance="5"/> <!-- sr; Latn ⇒ sr; Cyrl -->
-<languageMatch desired="zh_Hans" supported="zh_Hant" distance="15" oneway="true"/> <!-- zh; Hans ⇒ zh; Hant -->
-<languageMatch desired="zh_Hant" supported="zh_Hans" distance="19" oneway="true"/> <!-- zh; Hant ⇒ zh; Hans -->
-<!-- zh_Hani: Slightly bigger distance than zh_Hant->zh_Hans -->
+<languageMatch desired="za_Latn" supported="zh_Hans" distance="10" oneway="true"/>
+<!-- zh_Hani: Slightly bigger distance than zh_Hant->zh_Hans was before CLDR-14355 -->
<languageMatch desired="zh_Hani" supported="zh_Hans" distance="20" oneway="true"/>
<languageMatch desired="zh_Hani" supported="zh_Hant" distance="20" oneway="true"/>
<!-- Latin transliterations of some languages, initially from CLDR-13577 -->