[SIP-153] Translating Superset asset data using custom flask-babel extraction methods #32139
[SIP-153] Proposal for translating Superset asset data using custom flask-babel extraction methods
Motivation
Superset provides translation support for built-in components in the UI. However, the Open edX project also needs the user-provided terms used in the assets themselves to be translatable, e.g. dashboard and chart titles, axis labels, and metric labels.
We also need these asset translations to be easily maintained between upgrades of Superset, and to be re-deployable when translations are updated.
Proposed Change
Superset uses flask-babel to extract and compile translations marked by the backend and frontend files. We propose adding new custom extraction methods to pull out asset field values. These methods would be disabled by default, and enabled using a new feature flag, SUPERSET_TRANSLATE_ASSETS.
Superset provides the backend translations to the frontend via the template bootstrap data's language_pack field. Once user-provided values are translated, they are available to any frontend component strings wrapped in @superset-ui.core.t().
To make version control and conflict management easier, we propose splitting the asset messages into separate files from the upstream-maintained application translations used by Superset UI. The asset translations will be concatenated with the application message files before being compiled. Thus, these asset message files can be maintained per instance on a Superset fork. (Note: compiled message .mo/.json files are generated, so are not version-controlled.)
Process
1. Enable the SUPERSET_TRANSLATE_ASSETS feature setting.
2. Extract the asset messages to .po files.
3. Edit the .po files, or use a tool like Transifex to provide translations in the desired languages.
4. Run babel_update.sh --compile to concatenate and compile the (unversioned) message files consumed by the app.
This process could be optionally run during the Docker image build (when BUILD_TRANSLATIONS=true), or via the command line on a deployed instance. Superset may need to be restarted to apply changes to translations on a running instance.
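As a rough illustration of the concatenate-and-compile step, the sketch below appends the asset messages to the application messages and builds a pybabel compile command. The function names and file layout are hypothetical, and it assumes assets.po contains only message entries (no header entry), so a plain text append is enough. The proposal itself does this in babel_update.sh, not in Python.

```python
from pathlib import Path
from typing import List


def concatenate_po(messages_po: Path, assets_po: Path, combined_po: Path) -> Path:
    """Append the asset messages to the application messages.

    Assumption: assets.po holds only message entries (no header entry),
    so a plain text append yields a usable catalog.
    """
    parts = [messages_po.read_text(encoding="utf-8").rstrip("\n")]
    if assets_po.exists():
        assets = assets_po.read_text(encoding="utf-8").strip("\n")
        if assets:
            parts.append(assets)
    # Entries in a .po file are separated by a blank line.
    combined_po.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return combined_po


def compile_command(combined_po: Path) -> List[str]:
    """Build (but do not run) a pybabel command that compiles the combined file."""
    mo_file = combined_po.with_suffix(".mo")
    return ["pybabel", "compile", "-i", str(combined_po), "-o", str(mo_file)]
```

The command list can be passed to subprocess.run once the paths match the real translations directory.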
Configuration: SUPERSET_TRANSLATE_ASSETS
If the SUPERSET_TRANSLATE_ASSETS feature flag is not found in settings, or is falsy, the custom extraction methods will exit immediately, so that users who do not need this feature are not burdened with the overhead of extracting asset translations.
This feature flag could be a simple on/off boolean, or a more complex structure to include/exclude specific assets, depending on community feedback.
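To make the two possible flag shapes concrete, here is a minimal sketch of how an extraction method might interpret the flag. The helper name should_extract and the "include"/"exclude" keys are hypothetical; the proposal leaves the exact structure open.

```python
from typing import Any, Mapping


def should_extract(flag: Any, asset_type: str) -> bool:
    """Decide whether assets of `asset_type` should be extracted.

    Hypothetical flag shapes (neither is fixed by the proposal):
      - a boolean: True extracts everything, False/None extracts nothing
      - a mapping with optional "include" / "exclude" lists of asset types
    """
    if not flag:
        # Missing or falsy flag: exit early, no extraction overhead.
        return False
    if isinstance(flag, Mapping):
        include = flag.get("include")
        exclude = flag.get("exclude", ())
        if include is not None and asset_type not in include:
            return False
        return asset_type not in exclude
    return True
```

For example, should_extract(True, "chart") enables everything, while should_extract({"include": ["dashboard"]}, "chart") skips charts.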
Backend: extract asset field values
Create custom asset extraction methods under superset.translations.utils which:
asset_<type>_<field_name>_<uuid>
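For readers unfamiliar with Babel's custom extraction methods: an extractor is a callable taking (fileobj, keywords, comment_tags, options) and yielding (lineno, funcname, message, comments) tuples, and Babel discovers it through the babel.extractors entry point group. The sketch below follows that interface. Everything else is an assumption made for illustration: the asset is read from a JSON export, the flag arrives through the extractor options, only two dashboard fields are handled, and the asset_<type>_<field_name>_<uuid> pattern is emitted as a translator comment.

```python
import io
import json

# Hypothetical subset of the dashboard fields listed in the appendix.
DASHBOARD_FIELDS = ("dashboard_title", "description")


def extract_dashboard_messages(fileobj, keywords, comment_tags, options):
    """Babel-style extraction method for one exported dashboard (sketch)."""
    # Assumption: the feature flag is passed in via the extractor options,
    # where Babel delivers values as strings.
    flag = str(options.get("SUPERSET_TRANSLATE_ASSETS", "")).lower()
    if flag not in ("1", "true", "yes"):
        return  # feature disabled: exit immediately
    asset = json.load(fileobj)
    uuid = asset.get("uuid", "unknown")
    for field in DASHBOARD_FIELDS:
        value = asset.get(field)
        if value:
            # Assumption: the identifier pattern is used as a translator comment.
            comment = f"asset_dashboard_{field}_{uuid}"
            # A JSON export has no meaningful line numbers, so report line 1.
            yield (1, None, value, [comment])
```

Called with a file object such as io.BytesIO holding the export and options of {"SUPERSET_TRANSLATE_ASSETS": "true"}, it yields one tuple per non-empty field.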
Frontend: use translations
Superset provides translations to the frontend via the template bootstrap data's language_pack field. Once user-provided values are translated, they are available to the frontend components via @superset-ui.core.t().
Appendix: Translatable asset fields
We've identified the following asset fields as needing translations.
Dashboard fields
dashboard_title
description
metadata.native_filter_configuration.name
metadata.native_filter_configuration.description
position.*.meta.text
position.*.meta.code
position.*.meta.sliceNameOverride
Notes:
position.*.meta fields denote positional elements in the Dashboard, e.g. charts, headings, and markdown text.
Chart fields
slice_name
description
params.x_axis_label
params.y_axis_label
params.groupby.label
Dataset fields
metrics.verbose_name
columns.verbose_name
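The dotted field paths above, including the * wildcard, could be resolved against an asset's data with a small recursive walker. This is a sketch only: the helper name iter_field_values is hypothetical, and it assumes the asset has already been loaded into plain dicts and lists, with lists (such as a list of filter configurations) traversed transparently.

```python
from typing import Any, Iterator, List


def iter_field_values(data: Any, path: str) -> Iterator[str]:
    """Yield every non-empty string found at a dotted path; "*" matches any key."""

    def walk(node: Any, parts: List[str]) -> Iterator[str]:
        if not parts:
            if isinstance(node, str) and node:
                yield node
            return
        if isinstance(node, list):
            # Lists are traversed without consuming a path segment.
            for item in node:
                yield from walk(item, parts)
        elif isinstance(node, dict):
            head, rest = parts[0], parts[1:]
            if head == "*":
                children = list(node.values())
            elif head in node:
                children = [node[head]]
            else:
                children = []
            for child in children:
                yield from walk(child, rest)
        # Any other node type (e.g. a plain string mid-path) is skipped.

    yield from walk(data, path.split("."))
```

For instance, iter_field_values(dashboard, "position.*.meta.text") visits every positional element and yields only those that carry a non-empty text value.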
New or Changed Public Interfaces
We propose updating the command-line script babel_update.sh to:
1. Create assets.po files for asset messages. Upstream versions will be empty; forks can maintain their own versions.
2. Add a compile argument which:
   - concatenates messages.po + assets.po to a git-ignored file, superset.po
   - compiles .mo files using pybabel compile
   - generates .json files using npm run build-translations
The new custom extraction methods will be registered as setuptools entry_points in setup.py.
New dependencies
None
Migration Plan and Compatibility
No database migrations or user-facing changes are required to support this change.
Rejected Alternatives
SIP-60 proposed adding extra fields to each chart to store the translated text for each user-facing field, and a custom component for locating and displaying the translated field value. This approach was contested for its UI complexity, and because it requires translators to have chart-level edit access.
One respondent to SIP-60 also suggested creating a dedicated database table for i18n which could have its own UI and permissions granted for translators. Though this solution is simpler from a data perspective, it does not simplify the UI changes required to utilize these translations. It also would distance the translator from the context in which each translated term is used, making it difficult for translators to provide appropriate translations.
Open edX currently works around this issue by providing translated copies of master charts and dashboards for each supported language. This workaround mirrors Tableau's suggested solution. However, Open edX has found this approach to be difficult to maintain, especially in an environment where operators may provide their own custom charts and dashboards.
Advantages of this proposal
Disadvantages
Translators must edit .po files instead of translating directly in the Superset UI. This issue can be mitigated using tools such as Transifex, which will host open-source project translations for free and supports machine-generated translations like Google Translate.
Translators work outside the context in which terms appear. This issue will be mitigated by providing as much detail as possible in the .po comments generated for each term, so translators can understand how and where the terms are used.
Each term can only be translated once per language. For example, if we have a chart titled "Course Data" and a dashboard titled "Course Data", we can only translate "Course Data" once per language; the translation used will not know the context it was extracted from.
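The once-per-language limitation follows from message catalogs being keyed on the source string. The toy catalog below (hypothetical names, not Superset or Babel code) shows two assets with the same title collapsing into one entry, with only the comments recording where each came from.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_catalog(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group extracted (msgid, comment) pairs by msgid, as a catalog would."""
    catalog: Dict[str, List[str]] = defaultdict(list)
    for msgid, comment in entries:
        catalog[msgid].append(comment)
    return dict(catalog)


extracted = [
    ("Course Data", "chart title"),
    ("Course Data", "dashboard title"),
]
catalog = build_catalog(extracted)
# One entry, hence one translation, shared by both assets.
```

The comments survive as hints for translators, but the chart and the dashboard necessarily share a single translated string.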
The Superset API is based on Flask AppBuilder's REST API, which supports translating labels, descriptions and filters using the _I_ querystring parameter. However, translating the actual returned API data is not part of this feature. Addressing this issue requires a contribution to FAB. Details are outside of the scope of this proposal, but I'd love to discuss it if such a contribution would be welcome.