[SIP-153] Translating Superset asset data using custom flask-babel extraction methods #32139

pomegranited · 2025-02-05T07:03:55Z

[SIP-153] Proposal for translating Superset asset data using custom flask-babel extraction methods

Motivation

Superset provides translation support for built-in components in the UI. However, the Open edX project also needs the user-provided terms used in the assets themselves to be translatable, e.g. dashboard and chart titles, axis labels, and metric labels.

We also need these asset translations to be easily maintained between upgrades of Superset, and to be re-deployable when translations are updated.

Proposed Change

Superset uses flask-babel to extract and compile the strings marked for translation in the backend and frontend files. We propose adding new custom extraction methods to pull out asset field values. These methods would be disabled by default, and enabled using a new feature flag, SUPERSET_TRANSLATE_ASSETS.

Superset provides the backend translations to the frontend via the template bootstrap data's language_pack field. Once user-provided values are translated, they are available to any frontend component that wraps the string in t() from @superset-ui/core.

To make version control and conflict management easier, we propose splitting the asset messages into separate files from the upstream-maintained application translations used by Superset UI. The asset translations will be concatenated with the application message files before being compiled. Thus, these asset message files can be maintained per instance on a Superset fork. (Note: compiled message .mo/.json files are generated, so are not version-controlled.)

Process

1. Configure the SUPERSET_TRANSLATE_ASSETS feature setting.
2. Run babel_update.sh to extract application and asset translations to (versioned) .po files.
3. Translators manually update the .po files, or use a tool like Transifex to provide translations in the desired languages.
4. Run babel_update.sh --compile to concatenate and compile the (unversioned) message files consumed by the app.

This process could optionally be run during the Docker image build (when BUILD_TRANSLATIONS=true), or via the command line on a deployed instance. Superset may need to be restarted to apply changes to translations on a running instance.

Configuration: SUPERSET_TRANSLATE_ASSETS

If the SUPERSET_TRANSLATE_ASSETS feature flag is not found in settings, or is falsy, the custom extraction methods will exit immediately, so that users who do not need this feature are not burdened with the overhead of extracting asset translations.

This feature flag could be a simple on/off boolean, or a more complex structure to include/exclude specific assets, depending on community feedback.

Backend: extract asset field values

Create custom asset extraction methods under superset.translations.utils which:

- iterate over the assets using the Superset data APIs
- pull out the value of each translatable asset field (see below) as the "message"
- yield a tuple for each translatable field in the asset, containing:
  - lineno: (generate something reasonable here)
  - funcname: e.g. asset_<type>_<field_name>_<uuid>
  - message: value of asset field
  - comments: generate a comment for translators to describe which asset this field is from, and where this field is used
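
As a rough illustration of the steps above, here is a minimal sketch of one extraction method. It follows the (fileobj, keywords, comment_tags, options) signature that Babel documents for custom extraction methods, and yields the (lineno, funcname, message, comments) tuples described in the list. FEATURE_FLAGS and DASHBOARDS are hypothetical in-memory stand-ins: a real implementation would read the flag from Superset's configuration and load the assets through the Superset data APIs, neither of which is shown here.

```python
from typing import Any, Iterable, Iterator

# Hypothetical stand-ins (assumptions, not Superset APIs): a real
# implementation would read the flag from Superset's config and load the
# dashboards through the Superset data APIs.
FEATURE_FLAGS: dict[str, Any] = {"SUPERSET_TRANSLATE_ASSETS": True}
DASHBOARDS: list[dict[str, Any]] = [
    {"uuid": "1234-abcd", "dashboard_title": "Course Data", "description": ""},
]

Extracted = tuple[int, str, str, list[str]]


def extract_dashboards(
    fileobj: Any,
    keywords: Iterable[str],
    comment_tags: Iterable[str],
    options: dict[str, Any],
) -> Iterator[Extracted]:
    """Yield (lineno, funcname, message, comments) for each dashboard field."""
    if not FEATURE_FLAGS.get("SUPERSET_TRANSLATE_ASSETS"):
        return  # flag missing or falsy: extract nothing
    lineno = 0
    for dashboard in DASHBOARDS:
        for field in ("dashboard_title", "description"):
            value = dashboard.get(field)
            if not value:
                continue  # empty field: nothing to translate
            lineno += 1  # synthetic counter; assets have no source lines
            funcname = f"asset_dashboard_{field}_{dashboard['uuid']}"
            comment = f"Dashboard {dashboard['uuid']}: {field}"
            yield lineno, funcname, value, [comment]
```

One caveat worth checking during implementation: Babel may look up funcname in the keywords mapping it passes to the extractor, so it should be confirmed that arbitrary names like the ones generated here are accepted.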

Frontend: use translations

Superset provides translations to the frontend via the template bootstrap data's language_pack field. Once user-provided values are translated, they are available to the frontend components via t() from @superset-ui/core, e.g.:

import { t } from '@superset-ui/core';
…
    // Where the dashboard_title variable is shown to user:
    {t(dashboard_title)}

Appendix: Translatable asset fields

We've identified the following asset fields as needing translations.

Dashboard fields

dashboard_title
description
metadata.native_filter_configuration.name
metadata.native_filter_configuration.description
position.*.meta.text
position.*.meta.code
position.*.meta.sliceNameOverride

Notes:

position.*.meta fields denote positional elements in the Dashboard, e.g. charts, headings, and markdown text.

Chart fields

slice_name
description
params.x_axis_label
params.y_axis_label
params.groupby.label

Dataset fields

metrics.verbose_name
columns.verbose_name
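
The dotted field paths in this appendix could be resolved with a small helper along the following lines. This is only a sketch under assumptions: iter_field_values is a hypothetical name, the sample dashboard is made up, and it assumes that "*" matches every key of a dictionary and that lists (such as native_filter_configuration, metrics or columns) are traversed implicitly.

```python
from typing import Any, Iterator


def iter_field_values(node: Any, path: str) -> Iterator[str]:
    """Yield every non-empty string found at a dotted path.

    "*" matches every key of a dict; lists are traversed implicitly, so
    "metrics.verbose_name" visits each metric in turn.
    """
    if isinstance(node, list):
        for item in node:
            yield from iter_field_values(item, path)
        return
    if not path:
        if isinstance(node, str) and node:
            yield node
        return
    if not isinstance(node, dict):
        return  # path continues but there is nothing left to descend into
    head, _, rest = path.partition(".")
    if head == "*":
        children = list(node.values())
    elif head in node:
        children = [node[head]]
    else:
        children = []
    for child in children:
        yield from iter_field_values(child, rest)


# Made-up sample data, shaped like the dashboard fields listed above.
dashboard = {
    "dashboard_title": "Course Data",
    "position": {
        "CHART-1": {"meta": {"sliceNameOverride": "Enrollments"}},
        "HEADER-1": {"meta": {"text": "Overview"}},
        "DASHBOARD_VERSION_KEY": "v2",
    },
    "metadata": {
        "native_filter_configuration": [{"name": "Course"}, {"name": ""}],
    },
}

headings = list(iter_field_values(dashboard, "position.*.meta.text"))
filters = list(
    iter_field_values(dashboard, "metadata.native_filter_configuration.name")
)
```

Keeping the field list as data (a list of path strings per asset type) would let the same helper serve dashboards, charts and datasets.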

New or Changed Public Interfaces

We propose updating the command-line script babel_update.sh to:

- Preserve messages.pot / messages.po files for application messages.
- Generate assets.po files for asset messages.
  - Upstream versions will be empty; forks can maintain their own versions.
- Add a --compile argument which:
  - Concatenates messages.po + assets.po to a git-ignored file, superset.po
  - Compiles each language's .mo using pybabel compile
  - Rebuilds the frontend .json files using npm run build-translations
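
The concatenation step could look roughly like the sketch below. It is an assumption about how the script might behave, not the proposed implementation: concat_po and is_header are hypothetical helpers that operate on the text of the two files, and the sketch only drops the header entry of assets.po so that the combined file has a single header. It does not merge msgids that appear in both files, which a real script would need to handle (for example with gettext's msgcat tool).

```python
def is_header(block: str) -> bool:
    """Return True if a PO entry is the catalog header (empty msgid)."""
    lines = [ln for ln in block.splitlines() if not ln.startswith("#")]
    # A multi-line msgid also starts with 'msgid ""', but is followed by
    # quoted continuation lines rather than directly by msgstr.
    return (
        len(lines) >= 2
        and lines[0] == 'msgid ""'
        and lines[1].startswith("msgstr")
    )


def concat_po(app_po: str, assets_po: str) -> str:
    """Append the entries of assets.po to messages.po, keeping one header."""
    blocks = [b for b in assets_po.strip().split("\n\n") if b.strip()]
    kept = [b for b in blocks if not is_header(b)]
    if not kept:
        return app_po  # e.g. the empty upstream assets.po
    return app_po.rstrip("\n") + "\n\n" + "\n\n".join(kept) + "\n"
```

The result would be written to the git-ignored superset.po for each language before running the compile step.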

The new custom extraction methods will be registered as setuptools entry_points in setup.py, e.g.:

"babel.extractors": [
    "superset_dashboards = superset.translations.utils:extract_dashboards",
    "superset_charts = superset.translations.utils:extract_charts",
    "superset_datasets = superset.translations.utils:extract_datasets",
]

New dependencies

None

Migration Plan and Compatibility

No database migrations or user-facing changes are required to support this change.

Rejected Alternatives

SIP-60 proposed adding extra fields to each chart to store the translated text for each user-facing field, and a custom component for locating and displaying the translated field value. This approach was contested for its UI complexity, and because it requires translators to have chart-level edit access.

One respondent to SIP-60 also suggested creating a dedicated database table for i18n which could have its own UI and permissions granted for translators. Though this solution is simpler from a data perspective, it does not simplify the UI changes required to utilize these translations. It also would distance the translator from the context in which each translated term is used, making it difficult for translators to provide appropriate translations.

Open edX currently works around this issue by providing translated copies of master charts and dashboards for each supported language. This workaround mirrors Tableau's suggested solution; however, Open edX has found this approach to be difficult to maintain, especially in an environment where operators may provide their own custom charts and dashboards.

Advantages of this proposal

No change required to the translators' current process
No additional application access/permissions required for translators
No user-visible UI/UX changes
Minimally invasive change to the codebase

Disadvantages

- Translators will need to operate on the extracted .po files instead of translating directly in the Superset UI.
  - This issue can be mitigated using tools such as Transifex, which will host open-source project translations for free and supports machine-generated translations from services like Google Translate.
- Translations are heavily context-dependent, and omitting a UI means that translators provide translations outside of the Superset environment (as they do for all other Superset UI application strings).
  - This issue will be mitigated by providing as much detail as possible in the .po comments generated for each term, so translators can understand how and where the terms are used.
- Only a single translation can be provided for each term in each language.
  - For example, if we have a chart titled "Course Data" and a dashboard titled "Course Data", we can only translate "Course Data" once per language; the translation lookup does not know which context the term was extracted from.
- Translated field names will only be visible from the rendered superset-frontend, not in the data returned by the backend Superset APIs.
  - The Superset API is based on Flask AppBuilder's REST API, which supports translating labels, descriptions and filters using the _l_ querystring parameter. However, translating the actual returned API data is not part of this feature. Addressing this issue requires a contribution to FAB. Details are outside of the scope of this proposal, but I'd love to discuss it if such a contribution would be welcome.
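
To make the single-translation limitation above concrete, the following sketch groups made-up extraction results by message text, which approximates how a message catalog keyed by msgid would treat them. The rows, UUID placeholders and the group_by_message helper are all illustrative assumptions.

```python
from collections import defaultdict

# Made-up extraction results in the (lineno, funcname, message, comments)
# shape described in this proposal; the UUID suffixes are placeholders.
extracted = [
    (1, "asset_chart_slice_name_aaaa", "Course Data", ["Chart title"]),
    (2, "asset_dashboard_dashboard_title_bbbb", "Course Data", ["Dashboard title"]),
    (3, "asset_chart_x_axis_label_aaaa", "Week", ["Chart x-axis label"]),
]


def group_by_message(rows):
    """Collapse rows into one entry per distinct message string."""
    catalog = defaultdict(list)
    for _lineno, funcname, message, _comments in rows:
        catalog[message].append(funcname)
    return dict(catalog)


entries = group_by_message(extracted)
# "Course Data" appears once, with both the chart and the dashboard as
# sources, so translators supply a single translation for both uses.
```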


Yisel-Ulloa · 2025-02-07T17:14:47Z

Good morning @rusackas. I am interested in this solution and would like to be informed about the topics discussed regarding it.

pomegranited added the sip Superset Improvement Proposal label Feb 5, 2025

dosubot bot added change:backend Requires changing the backend change:frontend Requires changing the frontend i18n Namespace | Anything related to localization labels Feb 5, 2025

rusackas added this to SIPs (Superset Improvement Proposals) Feb 6, 2025

rusackas moved this to Pre-discussion in SIPs (Superset Improvement Proposals) Feb 6, 2025

rusackas changed the title ~~[SIP] Translating Superset asset data using custom flask-babel extraction methods~~ [SIP-153] Translating Superset asset data using custom flask-babel extraction methods Feb 6, 2025

bmtcril mentioned this issue Feb 7, 2025

Implement translations for Superset assets openedx/tutor-contrib-aspects#36

Closed
