[SIP-153] Translating Superset asset data using custom flask-babel extraction methods #32139

Open
pomegranited opened this issue Feb 5, 2025 · 1 comment
Labels
change:backend Requires changing the backend change:frontend Requires changing the frontend i18n Namespace | Anything related to localization sip Superset Improvement Proposal

pomegranited commented Feb 5, 2025

[SIP-153] Proposal for translating Superset asset data using custom flask-babel extraction methods

Motivation

Superset provides translation support for built-in components in the UI. However, the Open edX project also needs the user-provided terms used in the assets themselves to be translatable, e.g. dashboard and chart titles, axis labels, and metric labels.

We also need these asset translations to be easily maintained between upgrades of Superset, and to be re-deployable when translations are updated.

Proposed Change

Superset uses flask-babel to extract and compile the translatable strings marked in its backend and frontend files. We propose adding new custom extraction methods that pull out asset field values. These methods would be disabled by default and enabled using a new feature flag, SUPERSET_TRANSLATE_ASSETS.

Superset provides the backend translations to the frontend via the language_pack field of the template bootstrap data. Once user-provided values are translated, they are available to any frontend component strings wrapped in t() from @superset-ui/core.

To make version control and conflict management easier, we propose keeping the asset messages in files separate from the upstream-maintained application translations used by the Superset UI. The asset translations will be concatenated with the application message files before being compiled. Thus, each instance can maintain its own asset message files on a Superset fork. (Note: the compiled .mo/.json message files are generated, and so are not version-controlled.)
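The merge step can be pictured with the minimal sketch below. It models each catalog as a plain dict of msgid to msgstr instead of parsing real .po files, and `merge_catalogs` is a hypothetical helper. The SIP does not specify how a msgid present in both files is resolved, so the sketch assumes the application string wins and untranslated asset entries are skipped.

```python
# Hypothetical sketch: combine the application catalog with the per-instance
# asset catalog before compiling. Catalogs are dicts (msgid -> msgstr), not .po files.


def merge_catalogs(app: dict[str, str], assets: dict[str, str]) -> dict[str, str]:
    """Return a combined catalog; application entries take precedence (assumption)."""
    merged = dict(app)
    for msgid, msgstr in assets.items():
        # Skip untranslated asset entries and never replace an entry
        # that the application catalog already defines.
        if msgstr and msgid not in merged:
            merged[msgid] = msgstr
    return merged


app_catalog = {"Save": "Enregistrer", "Dashboards": "Tableaux de bord"}
asset_catalog = {"Course Data": "Données du cours", "Enrollments": "", "Save": "Sauver"}
print(merge_catalogs(app_catalog, asset_catalog))
```

Keeping application strings authoritative means a fork's asset file can never silently change the wording of built-in UI labels.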

Process

  1. Configure the SUPERSET_TRANSLATE_ASSETS feature setting.
  2. Run babel_update.sh to extract application and asset translations to (versioned) .po files.
  3. Translators will manually update the .po files, or use a tool like Transifex to provide translations in the desired languages.
  4. Run babel_update.sh --compile to concatenate and compile the (unversioned) message files consumed by the app.

This process could optionally be run during the Docker image build (when BUILD_TRANSLATIONS=true), or from the command line on a deployed instance. Superset may need to be restarted for translation changes to take effect on a running instance.

Configuration: SUPERSET_TRANSLATE_ASSETS

If the SUPERSET_TRANSLATE_ASSETS feature flag is missing from the settings, or is falsy, the custom extraction methods will exit immediately, so that users who do not need this feature do not pay the overhead of extracting asset translations.

This feature flag could be a simple on/off boolean, or a more complex structure to include/exclude specific assets, depending on community feedback.
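A minimal sketch of how the extractors might check the flag, supporting both the boolean form and one possible structured form. The per-asset-type dict shape, the `asset_translation_enabled` helper, and `extract_dashboards_stub` are all hypothetical; the settings are modelled as a plain mapping rather than Superset's real config object.

```python
# Hypothetical sketch: SUPERSET_TRANSLATE_ASSETS as a boolean or as a dict of
# per-asset-type switches (the dict shape is an assumption; the SIP leaves it open).
from collections.abc import Mapping
from typing import Any


def asset_translation_enabled(settings: Mapping[str, Any], asset_type: str) -> bool:
    """Return True if assets of ``asset_type`` should be extracted."""
    flag = settings.get("SUPERSET_TRANSLATE_ASSETS")
    if isinstance(flag, Mapping):
        # Structured form, e.g. {"dashboards": True, "datasets": False}.
        return bool(flag.get(asset_type, False))
    # Missing or falsy disables extraction; any truthy value enables all asset types.
    return bool(flag)


def extract_dashboards_stub(settings: Mapping[str, Any]):
    """Extractor skeleton that exits immediately when the flag is off."""
    if not asset_translation_enabled(settings, "dashboards"):
        return  # the generator finishes without yielding anything
    yield (1, "asset_dashboard_dashboard_title_1b2c", "Course Data", ["Dashboard 1b2c: dashboard_title"])


settings = {"SUPERSET_TRANSLATE_ASSETS": {"dashboards": True, "datasets": False}}
print(asset_translation_enabled(settings, "dashboards"), asset_translation_enabled(settings, "datasets"))
```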

Backend: extract asset field values

Create custom asset extraction methods under superset.translations.utils which:

  • iterate over the assets using the Superset data APIs
  • pull out each translatable asset field (see below) value as the "message"
  • yield a tuple for each translatable field in the asset, containing:
    • lineno: (generate something reasonable here)
    • funcname: e.g. asset_<type>_<field_name>_<uuid>
    • message: value of asset field
    • comments: generate a comment for translators to describe which asset this field is from, and where this field is used
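The extraction method described above could look roughly like the sketch below. Babel calls custom extraction methods as `fn(fileobj, keywords, comment_tags, options)` and expects `(lineno, funcname, message, comments)` tuples; everything else here is an assumption. `load_dashboards` is a hypothetical stand-in for "iterate over the assets using the Superset data APIs", the UUIDs are made up, and the running line counter is just one "reasonable" lineno choice. Babel's extraction driver may also look `funcname` up in its keyword table, so the SIP's naming scheme is worth verifying against Babel before relying on it.

```python
# Hypothetical sketch of a custom Babel extraction method for dashboards.
from typing import Any, Iterable, Iterator


def load_dashboards() -> Iterable[dict[str, Any]]:
    """Placeholder for querying Superset; returns made-up sample data."""
    return [
        {"uuid": "1b2c", "dashboard_title": "Course Data", "description": "Weekly enrollments"},
        {"uuid": "3d4e", "dashboard_title": "Learner Activity", "description": None},
    ]


TRANSLATABLE_FIELDS = ("dashboard_title", "description")


def extract_dashboards(fileobj, keywords, comment_tags, options) -> Iterator[tuple]:
    lineno = 0
    for dashboard in load_dashboards():
        for field in TRANSLATABLE_FIELDS:
            message = dashboard.get(field)
            if not message:
                continue  # nothing to translate for empty fields
            lineno += 1  # synthetic line number, since assets have no source lines
            funcname = f"asset_dashboard_{field}_{dashboard['uuid']}"
            comments = [f"Dashboard {dashboard['uuid']}: {field}"]
            yield lineno, funcname, message, comments


for entry in extract_dashboards(None, [], [], {}):
    print(entry)
```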

Frontend: use translations

Superset provides translations to the frontend via the language_pack field of the template bootstrap data. Once user-provided values are translated, frontend components can display them by wrapping the value in t() from @superset-ui/core, e.g.:

import { t } from '@superset-ui/core';

// Wherever the dashboard_title variable is shown to the user, wrap it in t():
{t(dashboard_title)}
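The useful property of `t()` here is its fallback: an asset value with no translation is still shown as-is. The Python sketch below mirrors that lookup for illustration only; it is not the @superset-ui/core implementation. The language pack layout is an assumption modelled on Jed's `locale_data` format, and `translate` is a hypothetical helper.

```python
# Hypothetical sketch of a t()-style lookup with fallback to the source string.
from typing import Any

language_pack: dict[str, Any] = {
    "domain": "superset",
    "locale_data": {
        "superset": {
            "": {"domain": "superset", "lang": "fr"},  # header entry
            "Course Data": ["Données du cours"],
        }
    },
}


def translate(pack: dict[str, Any], message: str) -> str:
    if not message:
        return message  # the empty key holds metadata, not a translation
    messages = pack["locale_data"][pack["domain"]]
    translations = messages.get(message)
    # Fall back to the source string when it is missing or untranslated.
    if translations and translations[0]:
        return translations[0]
    return message


print(translate(language_pack, "Course Data"))
print(translate(language_pack, "Learner Activity"))
```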

Appendix: Translatable asset fields

We've identified the following asset fields as needing translations.

Dashboard fields
  • dashboard_title
  • description
  • metadata.native_filter_configuration.name
  • metadata.native_filter_configuration.description
  • position.*.meta.text
  • position.*.meta.code
  • position.*.meta.sliceNameOverride

Notes:

  • position.*.meta fields denote positional elements in the Dashboard, e.g. charts, headings, and markdown text.

Chart fields
  • slice_name
  • description
  • params.x_axis_label
  • params.y_axis_label
  • params.groupby.label
Dataset fields
  • metrics.verbose_name
  • columns.verbose_name
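The dotted paths above mix dict keys, a `*` wildcard, and lists such as `native_filter_configuration` or `metrics`. A minimal sketch of resolving them, assuming the asset has already been parsed into nested dicts and lists (as in an export YAML); `resolve`, `collect`, and the sample dashboard are hypothetical.

```python
# Hypothetical helper: resolve paths like "position.*.meta.text" against a nested asset.
# "*" matches every value of a dict; lists are traversed element by element.
from typing import Any, Iterator


def resolve(node: Any, path: list[str]) -> Iterator[Any]:
    """Yield every value reachable from ``node`` along ``path``."""
    if isinstance(node, list):
        for item in node:
            yield from resolve(item, path)
        return
    if not path:
        yield node
        return
    if not isinstance(node, dict):
        return  # the path continues but the data ends here
    key, rest = path[0], path[1:]
    if key == "*":
        children = list(node.values())
    elif key in node:
        children = [node[key]]
    else:
        children = []
    for child in children:
        yield from resolve(child, rest)


def collect(asset: dict[str, Any], dotted: str) -> list[str]:
    """Return the non-empty string values found at ``dotted``."""
    return [v for v in resolve(asset, dotted.split(".")) if isinstance(v, str) and v]


dashboard = {
    "dashboard_title": "Course Data",
    "metadata": {"native_filter_configuration": [
        {"name": "Course", "description": "Pick a course"},
        {"name": "Date range", "description": ""},
    ]},
    "position": {
        "HEADER_ID": {"meta": {"text": "Overview"}},
        "CHART-1": {"meta": {"sliceNameOverride": "Enrollments"}},
        "ROOT_ID": {"type": "ROOT"},
    },
}
print(collect(dashboard, "metadata.native_filter_configuration.name"))
```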

New or Changed Public Interfaces

We propose updating the command-line script babel_update.sh to:

  • Preserve messages.pot / messages.po files for application messages.
  • Generate assets.po files for asset messages.
    Upstream versions will be empty; forks can maintain their own versions.
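  • Add a --compile argument which concatenates the asset message files with the application message files and compiles the result into the (unversioned) .mo/.json files consumed by the app.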

The new custom extraction methods will be registered as setuptools entry_points in setup.py, e.g.:

entry_points={
    # Babel discovers custom extraction methods through this entry point group.
    "babel.extractors": [
        "superset_dashboards = superset.translations.utils:extract_dashboards",
        "superset_charts = superset.translations.utils:extract_charts",
        "superset_datasets = superset.translations.utils:extract_datasets",
    ],
},
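Each entry point string has the form `name = module:attr`. The sketch below uses the standard library's `importlib.metadata.EntryPoint` to show how those strings split into the module and function that would be imported; `parse_entry_point` is a hypothetical helper, and nothing is actually loaded because `superset.translations.utils` is only proposed.

```python
# Sketch: split setuptools-style entry point strings without importing anything.
from importlib.metadata import EntryPoint


def parse_entry_point(spec: str, group: str = "babel.extractors") -> EntryPoint:
    """Turn 'name = module:attr' into an EntryPoint object."""
    name, value = (part.strip() for part in spec.split("=", 1))
    return EntryPoint(name=name, value=value, group=group)


specs = [
    "superset_dashboards = superset.translations.utils:extract_dashboards",
    "superset_charts = superset.translations.utils:extract_charts",
    "superset_datasets = superset.translations.utils:extract_datasets",
]
parsed = [parse_entry_point(spec) for spec in specs]
for ep in parsed:
    # .module and .attr are derived from the value string; ep.load() would import it.
    print(ep.name, "->", ep.module, ep.attr)
```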

New dependencies

None

Migration Plan and Compatibility

No database migrations or user-facing changes are required to support this change.

Rejected Alternatives

SIP-60 proposed adding extra fields to each chart to store the translated text for each user-facing field, and a custom component for locating and displaying the translated field value. This approach was contested for its UI complexity, and because it requires translators to have chart-level edit access.

One respondent to SIP-60 also suggested creating a dedicated database table for i18n, which could have its own UI and permissions granted to translators. Though this solution is simpler from a data perspective, it does not simplify the UI changes required to use these translations. It would also distance translators from the context in which each term is used, making it difficult for them to provide appropriate translations.

Open edX currently works around this issue by providing translated copies of master charts and dashboards for each supported language. This workaround mirrors Tableau's suggested solution; however, Open edX has found this approach difficult to maintain, especially in an environment where operators may provide their own custom charts and dashboards.

Advantages of this proposal

  • No change required to the translators' current process
  • No additional application access/permissions required for translators
  • No user-visible UI/UX changes
  • Minimally invasive change to the codebase

Disadvantages

  • Translators will need to operate on the extracted .po files instead of translating directly in the Superset UI.
    This issue can be mitigated using tools such as Transifex, which will host open-source project translations for free and supports machine-generated translations like Google Translate.
  • Translations are heavily context-dependent, and so omitting a UI means that translators are providing translations outside of the Superset environment (as they are for all other Superset UI application strings).
    This issue will be mitigated by providing as much detail as possible in the .po comments generated for each term, so translators can understand how and where the terms are used.
  • Only a single translation can be provided for each term in each language.
    For example, if we have a chart titled "Course Data" and a dashboard titled "Course Data", we can only translate "Course Data" once per language, and the same translation is used in both places regardless of which asset the term was extracted from.
  • Translated field names will only be visible from the rendered superset-frontend, not in the data returned by the backend Superset APIs.
    The Superset API is based on Flask AppBuilder's REST API, which supports translating labels, descriptions and filters using the _l_ querystring parameter. However, translating the actual returned API data is not part of this feature. Addressing this would require a contribution to FAB. Details are outside the scope of this proposal, but I'd love to discuss it if such a contribution would be welcome.
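The "one translation per term" limitation can be illustrated with the small sketch below: extracted entries are keyed by msgid alone, so two assets sharing a title collapse into one catalog entry whose translator comments list both sources. The extracted tuples are made-up sample data in the format proposed above.

```python
# Illustration: entries keyed only by msgid collapse when assets share a title.
from collections import defaultdict

extracted = [
    (1, "asset_chart_slice_name_1b2c", "Course Data", ["Chart 1b2c: slice_name"]),
    (2, "asset_dashboard_dashboard_title_3d4e", "Course Data", ["Dashboard 3d4e: dashboard_title"]),
    (3, "asset_chart_slice_name_5f6a", "Enrollments", ["Chart 5f6a: slice_name"]),
]

catalog: dict[str, list[str]] = defaultdict(list)
for _lineno, _funcname, message, comments in extracted:
    catalog[message].extend(comments)  # same msgid -> same entry, comments accumulate

for msgid, sources in catalog.items():
    print(f"{msgid!r}: {len(sources)} source(s) -> {sources}")
```

This is also why the generated comments matter: a single "Course Data" entry carries both the chart and the dashboard as context for the translator.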
@pomegranited pomegranited added the sip Superset Improvement Proposal label Feb 5, 2025
@dosubot dosubot bot added change:backend Requires changing the backend change:frontend Requires changing the frontend i18n Namespace | Anything related to localization labels Feb 5, 2025
@rusackas rusackas changed the title [SIP] Translating Superset asset data using custom flask-babel extraction methods [SIP-153] Translating Superset asset data using custom flask-babel extraction methods Feb 6, 2025
@Yisel-Ulloa

Good morning @rusackas. I am interested in this solution and would like to be informed about the topics discussed regarding it.
