
CLDR-19605: Add Nogai (nog) Arabic keyboard layout #5859

Open
murza-enikeeff wants to merge 1 commit into unicode-org:main from murza-enikeeff:nog-arab-keyboard


@murza-enikeeff


CLDR-19605

  • This PR completes the ticket.

Sociolinguistic and Technical Justification for Nogai Layout (nog-Arab)

Depends on CLDR-19605 (Core Data)

1. UNESCO Status and Current Linguistic Peril

The UNESCO Atlas of the World's Languages in Danger classifies the Nogai language (nog) as "Definitely Endangered." The language is under severe pressure from a historical lack of institutional support, a critical shortage of native-language schools, and systematic displacement from official and educational spheres. Providing native digital input is a necessary step toward preventing language extinction.

2. Historical Context: Forced Script Transitions as Structural Assimilation

The orthographic history of the Nogai language documents forced linguistic engineering and the nominally voluntary, in practice compulsory, Russification of minoritized indigenous peoples:

  • Pre-1928: The Nogai people utilized a highly functional Arabic-based script, maintaining deep cultural and historical ties with their heritage.
  • 1928–1938: The Arabic script was officially replaced by a Latin-based alphabet.
  • 1938–Present: As part of a centralized policy of forced cultural assimilation, the Latin script was abruptly abolished and replaced with a modified Cyrillic alphabet.

These rapid, politically driven script disruptions fractured intergenerational literacy, isolated the population from their historical literature, and acted as structural elements of linguistic ethnocide.

3. Digital Marginalization as Ongoing Assimilation

Major operating systems and input engines (including Google Gboard, iOS, and Windows) currently provide no Nogai layouts. Nogai speakers therefore depend entirely on surrogate layouts:

  • Forced Substitution: Speakers have to use either the standard Russian or the Kazakh keyboard.
  • Technical Fragmentation: Using the Russian layout forces users to enter the native digraphs (Аь, Оь, Уь, Нъ) as separate characters. This breaks digital text processing, makes spell-check and predictive text unusable, and corrupts corpus linguistics data.
  • Digital Colonialism: Forcing an endangered language community to adopt the dominant state language's layout (Russian) functions as an ongoing mechanism of digital assimilation, stripping the language of its visual autonomy.
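
To make the digraph point concrete: any text-processing tool has to treat each two-character sequence as a single letter. Below is a minimal Python sketch of that segmentation, assuming greedy left-to-right matching is good enough. The function name `nogai_letters` and the sample string are illustrative only and are not part of this PR.

```python
# Digraphs listed in the description, in lowercase Cyrillic.
DIGRAPHS = ("аь", "оь", "уь", "нъ")


def nogai_letters(text):
    """Split text into letters, keeping each listed digraph as one unit.

    Simplification: matching is greedy, so a plain 'н' followed by a
    hard sign would also be grouped as the digraph.
    """
    letters = []
    lowered = text.lower()  # Cyrillic lowercasing keeps the string length
    i = 0
    while i < len(text):
        if lowered[i:i + 2] in DIGRAPHS:
            letters.append(text[i:i + 2])
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


if __name__ == "__main__":
    sample = "куьн"  # sample string, used only to exercise the function
    print(len(sample), "code points ->", nogai_letters(sample))
```

A naive per-character split of the same string yields four units instead of three, which is the kind of mismatch that skews letter counts in corpus work.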

4. Technical Philosophy of the Arabic Layout (nog-Arab.xml)

The proposed nog-Arab.xml layout is designed to digitize and preserve the historical literary heritage of the Nogai people, while providing a native digital environment for the active Nogai diaspora (e.g., in Turkey and the Middle East) who continue to cultivate Arabic-based materials.

Rather than simply adapting the standard Arabic keyboard, this layout is built around the phonetics of the Nogai language. It draws heavily on the historical Steppe Arabic orthographic tradition (Tote zhazu / төте жазу). Key design decisions include:

  • Word-Level Softness Marker: The High Hamza ('ٴ', U+0674) is available as a longpress on the 'ئ' key. It serves as a word-level softness marker for front vowels, aligning the layout with the historical steppe Arabic standard used across Central Asian and Caucasian Turkic languages.
  • Nogai-Specific Phonetics on the Primary Layer: Kypchak phonemes are placed on the primary layer. These include the consonants 'ڭ' (ng / нъ), 'چ' (ch / ч), 'پ' (p / п), 'گ' (g / г) and the front vowels 'ۈ' (ue / уь) and 'ۆ' (oe / оь), so native vocabulary can be typed quickly without additional modifiers.
  • Classical Loanword Support via Longpress: To keep the primary layer focused on everyday Nogai phonetics, characters used mainly in classical Arabic and Islamic loanwords (such as 'ص', 'ث', 'ط', 'ظ', 'ذ') are placed as longpress options beneath their closest phonetic equivalents ('س', 'ت', 'ز').
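
As a reviewing aid, the characters named in the list above can be checked against the Unicode character database. The Python sketch below is not taken from nog-Arab.xml. Only U+0674 is stated explicitly in this description; every other code point is my reading of the quoted glyphs and should be verified against the file. All variable and function names are hypothetical.

```python
import unicodedata

# Only U+0674 is spelled out in the description; the remaining code points
# are assumptions read off the quoted glyphs.
HIGH_HAMZA = "\u0674"
HAMZA_KEY = "\u0626"  # key described as carrying the High Hamza on longpress

# Primary-layer letters with the romanizations given in the description.
PRIMARY_LAYER = {
    "\u06AD": "ng",
    "\u0686": "ch",
    "\u067E": "p",
    "\u06AF": "g",
    "\u06C8": "ue",
    "\u06C6": "oe",
}

# Letters described as longpress-only, and the base keys they sit under.
# The description does not say which letter goes under which base key,
# so no pairing is modeled here.
CLASSICAL_LONGPRESS = ["\u0635", "\u062B", "\u0637", "\u0638", "\u0630"]
CLASSICAL_BASE_KEYS = ["\u0633", "\u062A", "\u0632"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def check_layout():
    """Run basic sanity checks and return a list of problem strings."""
    problems = []
    everything = [HIGH_HAMZA, HAMZA_KEY, *PRIMARY_LAYER,
                  *CLASSICAL_LONGPRESS, *CLASSICAL_BASE_KEYS]
    for ch in everything:
        # All listed letters are expected in the Arabic block U+0600..U+06FF.
        if not 0x0600 <= ord(ch) <= 0x06FF:
            problems.append(f"{describe(ch)} is outside the Arabic block")
    if len(set(everything)) != len(everything):
        problems.append("a character is listed in more than one role")
    return problems


if __name__ == "__main__":
    for ch, roman in PRIMARY_LAYER.items():
        print(f"{roman:>3}  {describe(ch)}")
    print("longpress on", describe(HAMZA_KEY), "->", describe(HIGH_HAMZA))
    print("problems:", check_layout())
```

Running it prints the Unicode name of each letter, which makes it easy to spot a look-alike code point that was substituted by accident.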

Conclusion

By unifying these graphic systems into a cohesive, longpress-accessible architecture, this specification empowers a marginalized speech community to bypass structural barriers, reclaim their graphic history, and democratically determine the future trajectory of their language.

ALLOW_MANY_COMMITS=true
