Charabia v0.9.0
Changes
(BREAKING) Simplify lang detection (#299) @ManyTheFish
- The `Language` allow list changes from a `HashMap<Script, Vec<Language>>` to a slice of `Language`: `&[Language]`.
- Add the `tokenize_with_allow_list` method to the `Tokenizer`, which lets callers pass a `Language` allow list dynamically without rebuilding the tokenizer.
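The two changes above can be pictured with a small self-contained model. This is a sketch, not charabia's API: `Script`, `Language`, `ToyTokenizer`, `flatten_allow_list`, `detect` and `detect_with_allow_list` are hypothetical stand-ins. It shows how an old per-script map flattens into the new flat list, and how a per-call allow list can override the one fixed at build time without constructing a new tokenizer.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for charabia's `Script` and `Language` enums;
// the real types have many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Script {
    Latin,
    Cyrillic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Language {
    Eng,
    Fra,
    Rus,
}

// Old shape (before v0.9.0): allowed languages grouped by script.
type OldAllowList = HashMap<Script, Vec<Language>>;

// Convert an old-style map into a flat list that can be borrowed as `&[Language]`.
fn flatten_allow_list(old: &OldAllowList) -> Vec<Language> {
    let mut langs: Vec<Language> = old.values().flatten().copied().collect();
    // HashMap iteration order is unspecified, so sort and dedup for a stable result.
    langs.sort();
    langs.dedup();
    langs
}

// Hypothetical toy tokenizer: built once with a default allow list.
struct ToyTokenizer {
    allow_list: Option<Vec<Language>>,
}

impl ToyTokenizer {
    // Uses the allow list fixed when the tokenizer was built.
    fn detect(&self, candidates: &[Language]) -> Option<Language> {
        self.detect_with_allow_list(candidates, self.allow_list.as_deref())
    }

    // Per-call override, similar in spirit to `tokenize_with_allow_list`:
    // the tokenizer is not rebuilt. `None` means "no restriction".
    fn detect_with_allow_list(
        &self,
        candidates: &[Language],
        allow_list: Option<&[Language]>,
    ) -> Option<Language> {
        candidates
            .iter()
            .copied()
            .find(|lang| allow_list.map_or(true, |allowed| allowed.contains(lang)))
    }
}

fn main() {
    let mut old: OldAllowList = HashMap::new();
    old.insert(Script::Latin, vec![Language::Fra, Language::Eng]);
    old.insert(Script::Cyrillic, vec![Language::Rus]);
    println!("{:?}", flatten_allow_list(&old)); // [Eng, Fra, Rus]

    let tokenizer = ToyTokenizer { allow_list: Some(vec![Language::Fra]) };
    let candidates = [Language::Eng, Language::Fra];
    println!("{:?}", tokenizer.detect(&candidates)); // Some(Fra)

    let only_eng = [Language::Eng];
    println!("{:?}", tokenizer.detect_with_allow_list(&candidates, Some(&only_eng[..]))); // Some(Eng)
}
```

A flat slice is simpler for callers to build and borrow than a nested map, which is what makes a cheap per-call override practical.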
Add math symbols to default separators (#301) @phillitrOSU
Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list.
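To illustrate the effect of treating math symbols as separators, here is a minimal sketch. It is not charabia's implementation: `MATH_SEPARATORS` is a hand-picked subset of Unicode category Sm (the release adds the whole category), and `split_on_separators` is a hypothetical helper that also treats whitespace as a separator.

```rust
// Hand-picked subset of Unicode general category Sm ("Symbol, math").
// Assumption: a toy list for illustration; charabia's defaults cover the full category.
const MATH_SEPARATORS: &[char] = &[
    '+', '<', '=', '>', '|', '~', '¬', '±', '×', '÷', '−', '≤', '≥', '≠', '∑', '∞',
];

// Hypothetical helper: split on whitespace or any listed math symbol,
// dropping the empty pieces produced by consecutive separators.
fn split_on_separators(text: &str) -> Vec<&str> {
    text.split(|c: char| c.is_whitespace() || MATH_SEPARATORS.contains(&c))
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    println!("{:?}", split_on_separators("a+b=c")); // ["a", "b", "c"]
    println!("{:?}", split_on_separators("x≤y × 2")); // ["x", "y", "2"]
}
```

Without these separators, an expression such as `a+b=c` would stay a single token, so a search for `b` would not match it.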
Thanks again to @ManyTheFish, @meili-bors[bot], and @phillitrOSU! 🎉