Releases: neurlang/goruut
v0.6.3
🔧 Build & Infrastructure
- Golang version bumped in go-ossf-slsa3-publish.yml and go.yml.
- Integrated upstream num2words and classifier libraries.
🌐 Language Support & Models
- Hebrew3:
  - New phonemizer model, homograph model, and packed dictionary.
  - Added learned patterns.
- English:
  - Introduced homograph model.
- German:
  - Added new dictionary.
- Minnan:
  - Added new 2-series models.
- Added further new languages (not individually listed).
🧠 Core Features
- Integrated a sentencizer.
- Enabled loading of abbr.tsv into dictionaries.
- Finalized model for homograph result marking.
- Parallelized homograph test.
- Homograph-related improvements and documentation updates: HOMOGRAPH.md, ROADMAP.md, README.md, etc.
🛠️ Other Changes
- Added zip model loader.
- Improved logging: now logs errors explicitly.
- Updated quaternary dependency to v0.2.0.
- Added and updated developer contribution docs: CONTRIBUTING.md, DEVELOPING.md.
Full Changelog: v0.6.2...v0.6.3
v0.6.2
- repacked all dictionaries for forward / reverse phonemization (new patch version)
- fix bug: words whose IPA was identical to their text were vanishing
- New numToWords languages:
  - Czech
  - German
  - Spanish
  - French
  - Hungarian
  - Polish
  - Russian
  - Slovak
  - Ukrainian
Full Changelog: v0.6.1...v0.6.2
v0.6.1
- fix compilation on 386
Full Changelog: v0.6.0...v0.6.1
v0.6.0
- retrained all models for forward / reverse phonemization (breaking change, new minor version)
- new transformer-based models for homograph word inference (English)
- working study initiation from a null or {} map in language.json
- Handle numerics in Arabic and English
- Add groups of identical diacritics to hebrew2
- analysis2 normalization/diacritics sorting
- Fixed -overwrite in homograph scripts
- Split on hanzi/nonhanzi boundary
- don't crash if language dict not found in phonemization steps
- User interface:
  - toucan TTS integration
  - word editor
- New languages:
  - cantonese
  - minnan/taiwanese
  - minnan/hokkien
What's Changed
- add build.sh to cmd by @thewh1teagle in #27
- rename dirty to lexicon by @thewh1teagle in #26
- format all go files by @thewh1teagle in #29
- standard format and dev recommendation in docs by @thewh1teagle in #30
- add batchsize flag and remove append flag from learn.tsv by @thewh1teagle in #36
Full Changelog: v0.5.1...v0.6.0
v0.5.1
- retrained 49 new models for forward / reverse phonemization (non-breaking change, new patch version)
- enable normalization for Vietnamese
- fix normalization not working
- split hyphen words
- remove old analysis aligner
- fix cleaning bug and do final clean
- backend: add explain world feature
- Starting with empty/null json.Map using short keywords
- fix: resolve infinite loop and improve alignment logic with efficient memoization for padspace languages
- Add new rule SrcDuplicate to language.json
- Clear combobox on first partial search click
- Don't overwrite model when training
- Finetune italian/tamil using --rowlossimportance 6
What's Changed
- update gitignore by @thewh1teagle in #17
- add hebrew2 folder by @thewh1teagle in #13
New Contributors
- @thewh1teagle made their first contribution in #17
Full Changelog: v0.5.0...v0.5.1
v0.5.0
- retrained all 87 models for forward / reverse phonemization (breaking change, new minor version)
- new transformer based models for out-of-dictionary word inference
- feature: punctuation preservation: toggle preserve/hide punctuation mode
- feature: multiple language dictionaries for use in the same sentence based on user preference
- User interface:
  - out-of-dictionary words are underlined
  - preserved punctuation is shown in bold
- New languages:
  - english/american
  - english/british
Thanks to kokoro for generously offering their data.
Full Changelog: v0.4.0...v0.5.0
v0.4.0
- retrained all 85 models for forward / reverse phonemization (breaking change, new minor version)
- new quaternary models (should be faster)
- thread race condition fix
- dictionary processing error fix
Full Changelog: v0.3.0...v0.4.0
v0.3.0
- retrained all 85 models for forward / reverse phonemization (breaking change, new minor version)
- fixes regarding thread safety of goruut
Full Changelog: v0.2.4...v0.3.0
v0.2.4
- fix compilation on architectures without avx512
- reverse phonemization models for all provided languages (from IPA to the language)
- fix forward-direction phonemization so that it works (the model is now loaded)
Full Changelog: v0.2.3...v0.2.4
v0.2.3
- fix compilation on architectures without avx512
- reverse phonemization models for all provided languages (from IPA to the language)
Full Changelog: v0.2.2...v0.2.3