
WIP: migrate Ultimate Geography to Rust Brain Brew rewrite with federation #736

Open: jeprecated wants to merge 6 commits into anki-geo:master from jeprecated:master



jeprecated (Member) commented May 25, 2026

Hello all! Jordan here. I changed my GitHub username from ohare93, in case you're confused 😁 Anyway, I finally got around to it! Deck Federation is now in Brain Brew!

This is a work-in-progress migration to the new Rust Brain Brew workflow, paired with jeprecated/brain-brew#60. Things are looking very bright!

What changed

  • Replaces legacy Python Brain Brew recipes with a Rust Brain Brew brainbrew.yaml manifest.
  • Adds deck.yaml as the canonical English source.
  • Adds language, variant, and Hardcore Geography overlays.
  • Adds CI to verify and export all configured targets.
  • Current local status: brainbrew verify --manifest brainbrew.yaml --all-targets --media-root media passes for 71 targets.

Related issues / unlocked work

Available for questions, more to come soon! 👀

jeprecated requested review from aplaice and axelboc on May 25, 2026 09:49
jeprecated (Member, Author)

As an example of a single translation PR in this new system, here's the outstanding PR #735 converted: jeprecated#16

If any strings were missed in the translation overlay, it would throw an error 👌

When we make changes to the base English deck, all of those strings will throw errors for each language (which is the correct behaviour!).
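A minimal sketch of that kind of check, assuming an overlay is simply a map from the English source string to its translation. The names check_overlay and OverlayReport are hypothetical, not Brain Brew's actual API. English strings with no overlay entry are reported as missing, and overlay keys that no longer match any English string (e.g. after the English was edited) are reported as stale.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Problems found when checking one language overlay against the English source.
#[derive(Debug, PartialEq)]
struct OverlayReport {
    /// English strings that have no entry in the overlay.
    missing: Vec<String>,
    /// Overlay keys that match no current English string (e.g. the English was edited).
    stale: Vec<String>,
}

/// Checks a hypothetical overlay keyed by the English source text.
fn check_overlay(english: &[&str], overlay: &BTreeMap<String, String>) -> OverlayReport {
    let source: BTreeSet<&str> = english.iter().copied().collect();
    let missing: Vec<String> = source
        .iter()
        .filter(|s| !overlay.contains_key(**s))
        .map(|s| s.to_string())
        .collect();
    let stale: Vec<String> = overlay
        .keys()
        .filter(|k| !source.contains(k.as_str()))
        .cloned()
        .collect();
    OverlayReport { missing, stale }
}

fn main() {
    let english = ["Autonomous community of Spain.", "Island of Indonesia."];
    let mut cs = BTreeMap::new();
    cs.insert("Island of Indonesia.".to_string(), "Ostrov Indonésie.".to_string());
    // Hypothetical key left over from an older English wording.
    cs.insert("Indonesian island.".to_string(), "Indonéský ostrov.".to_string());

    let report = check_overlay(&english, &cs);
    println!("missing: {:?}", report.missing); // the Spain string has no translation
    println!("stale:   {:?}", report.stale); // the old key matches no English string
}
```

Both lists coming back non-empty is exactly the "error per language after an English change" behaviour described above: the edited English string shows up as missing and its old key as stale.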


aplaice (Collaborator) commented Jun 21, 2026

Wow! This looks like a huge amount of work!

I wish my review below were more positive, but at the moment I'm not convinced that this is a net improvement.

It enables things that are currently tricky (e.g. flexibly allowing different note types!) and AFAIU generally makes future extensibility easier.

However, IMO it makes routine tasks that are currently straightforward more finicky and tedious. (OTOH it's possible that I'm misunderstanding how a new, improved workflow would work!)

I greatly appreciate how you've continued working on this for so long and I'm generally very excited about improvements to our tooling!

(Details/discussion below. I've been sitting on this for several weeks now, hesitant to post it (sorry for the resulting delay!!), but I can't think of any further improvements to the text.)

YAML

I'm not sure that replacing everything with YAML is, on the whole, advantageous, and, if yes, I'm not convinced the proposed structure is most convenient.

HTML/CSS

For the HTML/CSS files (deck descriptions and templates) it's IMO a minor but clear downgrade, both for editing (one has to worry about proper indentation for the literal blocks) and for discoverability (rather than finding them via standard directory navigation, one has to find the correct sub-key in the YAML).

CSV vs. YAML

(field data (old src/data))

Both YAML and CSV have flaws as formats, so this comparison is much trickier.

The new structure is far more verbose, but that's not necessarily a bad thing (being explicit is often valuable). (Though why do we need fields: {field.X: ...} rather than just fields: {X: ...}?)

Both CSV and YAML have their annoying quoting idiosyncrasies, though at least with CSVs users can avoid them by opening the files in a spreadsheet editor. (With the old anki-dm, this regularly led to issues due to quoting inconsistencies, but with the current BrainBrew it seems to work fine.)

I'm not sure about grouping by (country, field) rather than (country, language) (i.e. having, say, all translations of country info together). There's no perfect solution, since we're trying to flatten 3D data (country × field × language) into 2D or 1D, but I think we discussed this during the anki-dm -> BrainBrew transition and decided that, given the usual edit patterns, it's more convenient to have all translations of a field together rather than all fields of a country together, and I still agree. (i.e. we decided to have per-field CSVs (capital.csv etc.) rather than per-translation ones (french.csv etc.))
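To make the flattening trade-off concrete, here is a hypothetical sketch (not how either BrainBrew version actually stores data) that keeps each value keyed by (country, field, language) and projects it into one table per file. The project function and its axis indices are illustrative assumptions. Choosing the field as the file axis gives per-field files like capital.csv; choosing the language gives per-translation files like french.csv.

```rust
use std::collections::BTreeMap;

/// One 2D file: row -> column -> value (what a single CSV holds).
type Table = BTreeMap<String, BTreeMap<String, String>>;

/// Splits (country, field, language, value) cells into one Table per file.
/// Axis indices: 0 = country, 1 = field, 2 = language.
fn project(
    cells: &[(&str, &str, &str, &str)],
    file_axis: usize,
    row_axis: usize,
    col_axis: usize,
) -> BTreeMap<String, Table> {
    let mut files: BTreeMap<String, Table> = BTreeMap::new();
    for &(country, field, language, value) in cells {
        let axes = [country, field, language];
        files
            .entry(axes[file_axis].to_string())
            .or_default()
            .entry(axes[row_axis].to_string())
            .or_default()
            .insert(axes[col_axis].to_string(), value.to_string());
    }
    files
}

fn main() {
    let cells = [
        ("Iceland", "capital", "en", "Reykjavík"),
        ("Iceland", "capital", "de", "Reykjavík"),
        ("Norway", "country", "de", "Norwegen"),
    ];
    // Per-field files (capital.csv, ...): rows are countries, columns are languages.
    let per_field = project(&cells, 1, 0, 2);
    // Per-language files (french.csv, ...): rows are countries, columns are fields.
    let per_language = project(&cells, 2, 0, 1);
    println!("{:?}", per_field.keys().collect::<Vec<_>>()); // ["capital", "country"]
    println!("{:?}", per_language.keys().collect::<Vec<_>>()); // ["de", "en"]
}
```

The data is identical in both layouts; only the choice of which axis becomes "the file" changes, which is why the decision comes down to which edits are most common.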

English field as key

I'm also not convinced about using the English version of the field as the key for translations. It's a common, though not universal, convention in localisation software (GNU gettext does this, Fluent doesn't), and it means that the source version is available side-by-side with the translation, as well as (as you note) enforcing the updating of translations when the source changes. However, here it has several disadvantages:

  1. It's often ugly:

    e.g. 'Iceland (blue background, red and white cross), Norway (red background, blue and white cross)': 'Island (modré pozadí, červený a bílý kříž), Norsko (červené pozadí, modrý a bílý kříž)'. (Obviously, our current CSVs are worse in that instead of having two such monstrosities per line we have ~16, but in the YAML version we have this for every language (~15 times) rather than only once, and the point of flattening to "1D" should be to avoid such long lines as far as possible.)

  2. At least for the country info field, the translations aren't always intended to be direct translations, so having the English version as key is misleading and means that having to update the English key, in all translations, when it changes, is just an inconvenience (rather than a correctness check). It also means that we need empty keys (foobar: '') and "additions".

    (NB I appreciate how all these many edge-cases are handled, technically, but I imagine that from a translator PoV the inconsistencies would get annoying quickly.)

    In other words: because the translations are no longer "independent", when the translator wants to add (say) a country info, they have to go back to deck.yaml, look up whether there's an English country info there (and what it is), and either use the English country info as key or add a new addition: {notes.note.COUNTRY.fields.field.country-info: ..., instead of just searching for the country in country_info.csv and either adding a new field to an existing row or adding a new row.

  3. Lack of context (minor).

    (Again mainly for country/capital info.) In some cases it's useful to have the country name and not just the English country info. (We encourage using the terms present on the given Wikipedia page, so for instance to best translate "Autonomous community of Spain." it's helpful to know that this is for the Canary Islands, so one can look up how the Canary Islands are categorised in one's language on Wikipedia, instead of just directly translating the phrase.)

  4. (As noted above regarding grouping): when adding a new country, having to copy-paste the English keys (for each of the fields) into each of the translations sounds very tedious. (Similarly when updating, say, a capital info (e.g. after the capital was moved), and having to change both the English and the translated version in each file.)

    OTOH having separate files for each language indexed with the English key opens up a new workflow: only translate what one can do easily and leave the rest broken. These remaining translations would then be picked up by more competent translators later on. (We'd only ensure that none of the builds are broken before release.)

    This wasn't possible previously because we didn't have any way of indicating that a translation was missing (rather than just deliberately empty) or wasn't up-to-date. However, I'm not sure how "cost-effective" this is — translators later would have poorer access to the original context. Leaving translated builds broken for extensive periods of time is also not great practice...
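The distinction that makes the "leave it broken for now" workflow possible can be sketched as three states rather than two. This assumes, hypothetically, that an absent overlay key means "missing" and an empty string means "deliberately empty"; the Status enum and missing_entries function are illustrative names only, not part of Brain Brew.

```rust
use std::collections::BTreeMap;

/// Translation state of one English string in one language (hypothetical model).
#[derive(Debug, PartialEq)]
enum Status {
    Translated,
    /// Present but empty: the translator decided there should be no text.
    DeliberatelyEmpty,
    /// Absent from the overlay: nobody has translated it yet.
    Missing,
}

fn status(overlay: &BTreeMap<String, String>, english: &str) -> Status {
    match overlay.get(english) {
        None => Status::Missing,
        Some(t) if t.is_empty() => Status::DeliberatelyEmpty,
        Some(_) => Status::Translated,
    }
}

/// Lists (language, English string) pairs that are still missing.
/// A pre-release check would require this list to be empty.
fn missing_entries(
    english: &[&str],
    overlays: &BTreeMap<String, BTreeMap<String, String>>,
) -> Vec<(String, String)> {
    let mut missing = Vec::new();
    for (lang, overlay) in overlays {
        for &e in english {
            if status(overlay, e) == Status::Missing {
                missing.push((lang.clone(), e.to_string()));
            }
        }
    }
    missing
}

fn main() {
    let english = ["Autonomous region of Portugal.", "Island of Indonesia."];
    let mut da = BTreeMap::new();
    // Hypothetical: deliberately no text for this string in Danish.
    da.insert("Autonomous region of Portugal.".to_string(), String::new());
    let mut overlays = BTreeMap::new();
    overlays.insert("da".to_string(), da);

    let todo = missing_entries(&english, &overlays);
    println!("{:?}", todo); // only the Indonesia string is reported for "da"
}
```

With a single CSV cell there is no way to tell the last two states apart, which is the gap described above.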

Partial aside

The grouping and disambiguation of strings that repeat in English (say Independent state claimed by Georgia. or Island of Indonesia.) is really cool: even on a quick scan, it's allowed me to catch some errors/inconsistencies (nárokováný vs. nárokovaný in Abkhazia/South Ossetia country info:cs, 'Indonéský ostrov.' vs. 'Ostrov Indonésie.' in Bali/Java/Sumatra country info:cs, weirdness in Azores country info:da ('Selvstændig region Autonomous region i Portugal.')). (I'll need to remember to compile all such cases and fix them.)

This does point to a major advantage of using the English field as the key, but I think that the cons discussed above outweigh the pros.
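Roughly, this kind of consistency check falls out of grouping notes by their English value and looking for groups with more than one distinct translation. A minimal sketch over hypothetical (note, English, translation) rows; the inconsistent function is illustrative, and which of Bali/Java/Sumatra got which Czech wording is made up here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups (note, English, translation) rows by English text and keeps only
/// English strings whose notes received more than one distinct translation.
fn inconsistent<'a>(rows: &[(&'a str, &'a str, &'a str)]) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut groups: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for &(_note, english, translation) in rows {
        groups.entry(english).or_default().insert(translation);
    }
    groups.retain(|_, translations| translations.len() > 1);
    groups
}

fn main() {
    // Hypothetical Czech rows; the split between notes is illustrative only.
    let rows = [
        ("Bali", "Island of Indonesia.", "Indonéský ostrov."),
        ("Java", "Island of Indonesia.", "Ostrov Indonésie."),
        ("Sumatra", "Island of Indonesia.", "Ostrov Indonésie."),
    ];
    for (english, variants) in inconsistent(&rows) {
        println!("{}: {:?}", english, variants);
    }
}
```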

Hardcore geography

If I understand correctly, the way the "overlapping" notes (those that have some cards in UG and some in HG) have been made to work is that in HG the notes have the "missing" fields added (via "field-fills"), so that we only have a single copy of each note, with both the UG and (old) HG cards. (Also, HG now contains UG rather than being an addition (?).) This is pretty elegant (and clean), but I'm not sure if it's the most convenient behaviour.

The problem is that now if someone imports UG, imports HG, and then updates UG (without updating HG) all of the HG cards of the overlapping notes will disappear. I believe that they won't be deleted unless one runs Tools > Empty cards, and hence they can be recovered by updating HG (but it might vary depending on Anki version). (NB the initial "import UG" step isn't necessary, but it is how I'd expect most users to start and illustrates why people might end up confused — they'd be thinking that they're only updating their UG import and that it shouldn't affect HG.)

We could obviously warn people that if they import HG they should only update via HG, but I expect such warnings to frequently fail. The fact that the issue is (usually (?)) recoverable (if one doesn't empty cards in the meantime) makes me less sure which behaviour is the "correct" one, since the alternative of having two notes (one for UG and one for HG) in the "overlapping" cases is aesthetically displeasing.

(A similar (worse) problem would also occur if one were to use a deck that adds a completely new card (say a hypothetical currency UG adding country->currency cards). Import UG -> import currency UG -> update UG and you lose all your country->currency cards (immediately deleted, though with a warning).)
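The UG-then-HG update problem can be illustrated with a simplified model, assuming (roughly as in Anki) that a card only has content while the field its front template depends on is non-empty. Field names, template names, and the image filename are hypothetical, and real Anki card generation depends on the full rendered front side rather than on a single field; cards and with_fills are illustrative helpers, not Brain Brew functions.

```rust
use std::collections::BTreeMap;

/// Field name -> value for one note.
type Fields = BTreeMap<&'static str, &'static str>;

/// Simplified card generation: a card exists only while the field its
/// front depends on is non-empty. Templates are (card name, required field).
fn cards(fields: &Fields, templates: &[(&'static str, &'static str)]) -> Vec<&'static str> {
    templates
        .iter()
        .filter(|(_, field)| fields.get(field).map_or(false, |v| !v.is_empty()))
        .map(|(name, _)| *name)
        .collect()
}

/// HG-style "field-fills": the same note (same GUID), with extra fields filled in.
fn with_fills(base: &Fields, fills: &Fields) -> Fields {
    let mut merged = base.clone();
    merged.extend(fills.iter().map(|(k, v)| (*k, *v)));
    merged
}

fn main() {
    let templates = [("Map", "map"), ("Capital", "capital")];
    // UG leaves the capital empty; HG fills it in.
    let ug = Fields::from([("map", "<img src=\"guadeloupe.png\">"), ("capital", "")]);
    let hg_fills = Fields::from([("capital", "Basse-Terre")]);

    let after_hg_import = with_fills(&ug, &hg_fills);
    println!("{:?}", cards(&after_hg_import, &templates)); // ["Map", "Capital"]

    // Updating UG alone writes the UG field values back onto the shared note,
    // so the HG-only card is left empty.
    println!("{:?}", cards(&ug, &templates)); // ["Map"]
}
```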

Modifying "fully" UG cards

One small, but potentially very useful thing that the new brain brew facilitates for decks like HG is modifying cards that are solely in UG (i.e. not the "overlapping" notes, but notes which are in UG and have no cards in HG).

When we removed the small dependent territory capitals from UG (and moved them to HG) we also removed the capital hints from the notes which no longer had any collisions within UG (whose "conflictors" were moved to HG) — Basseterre (Saint Kitts and Nevis) which had conflicted with Guadeloupe and Georgetown (Guyana) which had conflicted with the Cayman Islands. (The capital hints on the other side — in Guadeloupe and the Cayman Islands — were kept.)

The new field-fills mechanism allows easily modifying these notes in HG to re-add the capital hints.

(Technically, we could achieve this with our current set-up by cloning the notes from UG to HG (keeping the same guid) and re-adding the capital hints to the clone, but it would be far more brittle and harder to maintain.)
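A field-fill that targets a UG-only note could be applied and validated along these lines. This is a hypothetical sketch: FieldFill, apply_fills, and the "capital hint" field name are illustrative rather than Brain Brew's actual schema, and the hint text is a placeholder. Rejecting fills that point at unknown notes or fields is one way to get the robustness that cloning notes by GUID would lack.

```rust
use std::collections::BTreeMap;

/// Note id -> field name -> value.
type Deck = BTreeMap<String, BTreeMap<String, String>>;

/// One hypothetical field-fill: overwrite `field` on `note` with `value`.
struct FieldFill<'a> {
    note: &'a str,
    field: &'a str,
    value: &'a str,
}

/// Applies fills to a copy of the base (UG) deck. Fills that point at an unknown
/// note or field are rejected, so a typo fails loudly instead of being ignored.
fn apply_fills(base: &Deck, fills: &[FieldFill<'_>]) -> Result<Deck, String> {
    let mut deck = base.clone();
    for fill in fills {
        let note = deck
            .get_mut(fill.note)
            .ok_or_else(|| format!("unknown note: {}", fill.note))?;
        let slot = note
            .get_mut(fill.field)
            .ok_or_else(|| format!("unknown field {} on {}", fill.field, fill.note))?;
        *slot = fill.value.to_string();
    }
    Ok(deck)
}

fn main() {
    let mut ug = Deck::new();
    ug.insert(
        "Saint Kitts and Nevis".to_string(),
        BTreeMap::from([
            ("capital".to_string(), "Basseterre".to_string()),
            ("capital hint".to_string(), String::new()), // hint removed from UG
        ]),
    );
    let fills = [FieldFill {
        note: "Saint Kitts and Nevis",
        field: "capital hint",
        value: "(placeholder hint text)",
    }];

    let hg = apply_fills(&ug, &fills).expect("fills should apply");
    println!("{}", hg["Saint Kitts and Nevis"]["capital hint"]);
}
```

The UG deck itself is left untouched; only the HG build sees the re-added hint.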

Nix as a requirement?

I'm hesitant about having nix as a dependency, but given that (AFAIU) it wouldn't be really required for most contributors, it's probably not a major issue.


In any case, thanks very much for your continuing efforts to improve BrainBrew and the tooling around AUG!

