feat(i18n): add PT-BR multilingual keyword locale support #1725
antoniocarlos97ss wants to merge 2 commits into
Adds a locales/ directory under scoring/keywords/ with full PT-BR
translations for all 9 keyword files plus a central index that handles
language detection and keyword merging.
Changes:
- packages/backend/src/scoring/keywords/locales/index.ts
Central locale registry: detectLanguage() using franc-min (< 1ms,
local, zero API calls) with MANIFEST_LOCALE env override fallback.
mergeComplexityKeywords() merges locale set into base English set
without mutating originals. English-only behavior fully preserved.
- locales/pt-BR/complexity.ts
PT-BR translations for all 14 scoring dimensions: formalLogic,
analyticalReasoning, codeGeneration, codeReview, technicalTerms,
simpleIndicators, multiStep, creative, imperativeVerbs, outputFormat,
agenticTasks, relay.
- locales/pt-BR/{calendar-management,data-analysis,email-management,
image-generation,social-media,trading,video-generation,web-browsing}.ts
PT-BR specificity keyword sets for all 8 task-type categories.
Architecture notes:
- Purely additive — no existing file modified
- English scoring unchanged when no locale match found
- New locales added by dropping files in locales/<lang>/ and
registering them in locales/index.ts
- franc-min is an optional peer dependency; scorer degrades gracefully
if not installed (falls back to English-only)
Closes mnfst#1724 (issue: Feature: Multilingual keyword support for prompt
scoring PT-BR + i18n)
Co-authored-by: Aurora (Hermes Agent) <aurora@hermes.local>
3 issues found across 10 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/backend/src/scoring/keywords/locales/index.ts">
<violation number="1" location="packages/backend/src/scoring/keywords/locales/index.ts:91">
P1: `detectLanguage` returns ISO-639-3 codes from `franc-min` (e.g., `por`), but locale lookup expects `pt`/`pt-BR`, so PT-BR auto-detection never matches.</violation>
</file>
<file name="packages/backend/src/scoring/keywords/locales/pt-BR/complexity.ts">
<violation number="1" location="packages/backend/src/scoring/keywords/locales/pt-BR/complexity.ts:8">
P2: PT-BR complexity locale uses a permissive map type and omits canonical dimensions (`questionComplexity`, `domainSpecificity`), allowing incomplete locale coverage to compile silently.</violation>
</file>
<file name="packages/backend/src/scoring/keywords/locales/pt-BR/web-browsing.ts">
<violation number="1" location="packages/backend/src/scoring/keywords/locales/pt-BR/web-browsing.ts:23">
P2: PT-BR web-browsing keywords use accented forms without unaccented variants, but trie matching is only case-insensitive (not accent-insensitive), so common unaccented inputs can be missed.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
P1 (critical): franc-min returns ISO-639-3 codes (e.g. 'por') but
locale lookup expected BCP-47 ('pt'/'pt-BR'). Added ISO639_3_TO_BCP47
map in detectLanguage() to normalise before lookup — PT-BR
auto-detection now actually works.
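A minimal sketch of the normalisation step described above, written so that it runs without franc-min installed. The three map entries come from this PR's discussion. The injected `detector` parameter, the `envLocale` parameter (standing in for the MANIFEST_LOCALE value), and the choice to try the detector before the environment value are assumptions made for illustration, not the PR's actual signature.

```typescript
// Subset of the map quoted in the PR discussion; the full map has more entries.
const ISO639_3_TO_BCP47: Record<string, string> = {
  por: 'pt',
  spa: 'es',
  fra: 'fr',
};

// A detector takes text and returns an ISO-639-3 code. franc-style detectors
// return 'und' when the language cannot be determined.
type Detector = (text: string) => string;

function detectLanguage(
  text: string,
  detector?: Detector, // undefined models "franc-min is not installed" (assumption)
  envLocale?: string, // stands in for the MANIFEST_LOCALE value (assumption)
): string {
  if (detector) {
    const raw = detector(text);
    if (raw !== 'und') {
      // Normalise to BCP-47 before any locale lookup; pass unknown codes through.
      return ISO639_3_TO_BCP47[raw] ?? raw;
    }
  }
  // No detector or no confident result: use the env value, else English.
  return envLocale ?? 'en';
}
```

With a detector that reports `por`, the function returns `pt`, which is the key the locale lookup expects. Without a detector it returns the environment value or `en`.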
P2a: PT-BR complexity locale typed as Record<string, string[]>,
silently allowing missing canonical dimensions. Changed to the new
strict ComplexityDimensions type (exported from locales/index.ts)
that lists all 14 dimensions explicitly. Added missing
questionComplexity and domainSpecificity arrays with PT-BR keywords.
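To illustrate why the strict type catches what `Record<string, string[]>` lets through, here is a rough sketch. The 14 dimension names are the ones listed in this PR. The `DIMENSIONS` constant, the `DimensionName` alias and the `missingDimensions` helper are hypothetical names, and the real `ComplexityDimensions` type in locales/index.ts may be declared differently.

```typescript
// The 12 names from the original commit message plus the two the review flagged.
const DIMENSIONS = [
  'formalLogic', 'analyticalReasoning', 'codeGeneration', 'codeReview',
  'technicalTerms', 'simpleIndicators', 'multiStep', 'creative',
  'imperativeVerbs', 'outputFormat', 'agenticTasks', 'relay',
  'questionComplexity', 'domainSpecificity',
] as const;

type DimensionName = (typeof DIMENSIONS)[number];

// Every dimension is a required key, so a locale object that omits one
// no longer type-checks when annotated with this type.
type ComplexityDimensions = Record<DimensionName, string[]>;

// Runtime counterpart for partially built locale data (hypothetical helper).
function missingDimensions(candidate: Partial<ComplexityDimensions>): DimensionName[] {
  return DIMENSIONS.filter((name) => candidate[name] === undefined);
}
```

Annotating a locale object as `ComplexityDimensions` turns a forgotten key into a compile error, while the same object typed as `Record<string, string[]>` compiles regardless of which keys are present.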
P2b: web-browsing keywords included accented forms only (e.g.
'navegue', 'faça', 'página') but trie matching is case-insensitive
and NOT accent-insensitive. Added unaccented variants alongside
every accented keyword so inputs typed without accents still match.
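The commit adds the unaccented variants by hand. As an illustration only, the same variants could be derived with a small helper like the one below. Both function names are hypothetical and this is not code from the PR.

```typescript
// Strip combining diacritical marks after canonical decomposition (NFD),
// so that an accented letter becomes its base letter.
function stripAccents(word: string): string {
  return word.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Return the original keywords plus an unaccented variant wherever it differs,
// preserving order and skipping duplicates.
function withUnaccentedVariants(keywords: readonly string[]): string[] {
  const result: string[] = [];
  for (const keyword of keywords) {
    for (const form of [keyword, stripAccents(keyword)]) {
      if (result.indexOf(form) === -1) {
        result.push(form);
      }
    }
  }
  return result;
}
```

For the keywords quoted above, 'navegue' is left alone while 'faça' and 'página' each gain an unaccented sibling ('faca', 'pagina'). An alternative design would fold accents on both the keywords and the input at match time, but that would mean changing the matcher, which this PR avoids.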
No other files modified.
Thanks for the detailed review, @cubic-dev-ai! All three issues are fixed.
P1 ✅: ISO-639-3 → BCP-47 normalisation
const ISO639_3_TO_BCP47: Record<string, string> = {
por: 'pt',
spa: 'es',
fra: 'fr',
// ... 11 more common languages pre-mapped
};
// ...
return ISO639_3_TO_BCP47[raw] ?? raw;
PT-BR auto-detection now actually resolves to the locale.
P2a ✅: Strict
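@antoniocarlos97ss These fixes look excellent and make the multilingual support much more robust. The ISO-639-3 normalization (P1) was a critical catch: without that mapping, PT-BR auto-detection would never have matched. The move to the strict ComplexityDimensions type (P2a) keeps incomplete locales from compiling silently. Finally, adding unaccented variants (P2b) to the web-browsing keywords lets inputs typed without accents still match. I don't see any other issues; this is ready to go.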
Summary
This PR implements multilingual keyword support for the Manifest prompt complexity scorer, starting with PT-BR (Brazilian Portuguese) as the first non-English locale.
Closes #1724
What changed
New files (purely additive — no existing code modified)
Architecture
locales/index.ts exposes three functions:
- detectLanguage(text): uses franc-min (< 1ms, local, zero API calls) to detect language. Falls back to MANIFEST_LOCALE env var.
- getLocaleKeywords(lang): returns the LocaleKeywords object for a given BCP-47 code, or null if unsupported.
- mergeComplexityKeywords(base, locale): merges the locale keyword set into the base English set without mutating the originals.
franc-min is an optional peer dependency; if not installed, detection silently falls back to English-only behavior.
How to integrate into the scorer
The PR intentionally does not modify the scorer itself, so that maintainers can choose the integration point. The suggested minimal integration goes in scan-messages.ts or the complexity scorer.
Adding more languages
To add a new locale (e.g. Spanish):
- Create locales/es/ with the same 9 files
- Register them in locales/index.ts under LOCALE_KEYWORDS['es']
No scorer changes needed.
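As a rough sketch of how the lookup, the merge and an integration point could fit together: the function names `getLocaleKeywords` and `mergeComplexityKeywords` and the `LOCALE_KEYWORDS` registry are named in this PR, but their bodies below, the simplified `KeywordMap` type, the `keywordsFor` wrapper and the sample Portuguese keywords are assumptions made for illustration.

```typescript
// Simplified stand-in for the PR's keyword structures (assumption).
type KeywordMap = Record<string, string[]>;

// Illustrative data only; the real PT-BR sets live under locales/pt-BR/.
const ptBR: KeywordMap = { multiStep: ['primeiro', 'depois'] };

// Hypothetical registry shape. A new locale such as 'es' would be added here.
const LOCALE_KEYWORDS: Record<string, KeywordMap> = {
  pt: ptBR,
  'pt-BR': ptBR,
};

// Return the keyword map for a BCP-47 code, or null if unsupported.
function getLocaleKeywords(lang: string): KeywordMap | null {
  const known = Object.prototype.hasOwnProperty.call(LOCALE_KEYWORDS, lang);
  return known ? (LOCALE_KEYWORDS[lang] ?? null) : null;
}

// Merge without mutating either input: a fresh object with fresh arrays.
function mergeComplexityKeywords(base: KeywordMap, locale: KeywordMap): KeywordMap {
  const merged: KeywordMap = {};
  for (const key of Object.keys(base)) {
    merged[key] = (base[key] ?? []).slice();
  }
  for (const key of Object.keys(locale)) {
    const target = merged[key] ?? (merged[key] = []);
    for (const word of locale[key] ?? []) {
      if (target.indexOf(word) === -1) {
        target.push(word);
      }
    }
  }
  return merged;
}

// Possible integration point: English is returned untouched when no locale matches.
function keywordsFor(lang: string, base: KeywordMap): KeywordMap {
  const locale = getLocaleKeywords(lang);
  return locale ? mergeComplexityKeywords(base, locale) : base;
}
```

Because the merge copies every array, the English base set can be shared across requests safely, and an unsupported language code simply yields the base set as-is.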
Testing
To manually verify that PT-BR detection and scoring work:
MANIFEST_LOCALE=pt-BR # force PT-BR without needing franc-min installed
Notes
- franc-min graceful degradation: works without the package installed
- MANIFEST_LOCALE env override for power users / testing
- es, fr, de: ~50 lines each
Summary by cubic
Adds PT-BR multilingual keyword support to the prompt complexity scorer with auto language detection and safe fallback to English. Also fixes detection mapping, enforces strict typing across all 14 dimensions, and adds unaccented variants for better web-browsing matching.
New Features
- Locale registry with language detection (franc-min, supports MANIFEST_LOCALE; registers pt and pt-BR).
Bug Fixes
- Maps franc-min ISO-639-3 outputs to BCP-47 for locale lookup; PT-BR auto-detection now works.
- Enforces strict ComplexityDimensions typing; added missing questionComplexity and domainSpecificity.
Written for commit 0cd0956. Summary will update on new commits.