Charabia v0.9.0

@meili-bot meili-bot released this 25 Jul 13:52
· 64 commits to refs/heads/main since this release
9854134

Changes

(BREAKING) Simplify lang detection (#299) @ManyTheFish

  • The Language allow_list changes from a HashMap<Script, Vec<Language>> to a slice of Language: &[Language].
  • Adds the tokenize_with_allow_list method to the Tokenizer, which lets callers pass a Language allow list dynamically without rebuilding the tokenizer.
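
For callers that still hold an allow list in the old per-script shape, a migration could look roughly like the sketch below. It is not taken from the Charabia source: Script and Language here are small stand-in enums defined locally so the example compiles on its own, and flatten_allow_list is a hypothetical helper name. The real enums live in the crate and have many more variants.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only (assumption): the real `Script` and
// `Language` enums are provided by charabia and are much larger.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Script {
    Latin,
    Cj,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Language {
    Eng,
    Fra,
    Jpn,
}

/// Hypothetical migration helper: flatten an old-style per-script allow list
/// into a flat, deduplicated list that can be borrowed as `&[Language]`.
fn flatten_allow_list(old: &HashMap<Script, Vec<Language>>) -> Vec<Language> {
    let mut flat: Vec<Language> = old.values().flatten().copied().collect();
    // HashMap iteration order is unspecified, so sort for a stable result
    // before removing duplicates.
    flat.sort();
    flat.dedup();
    flat
}
```

Borrowing the returned Vec as a slice (&flat) gives the &[Language] shape described above. The exact signatures of the builder's allow-list setter and of tokenize_with_allow_list should be checked against the crate documentation for this release.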

Add math symbols to default separators (#301) @phillitrOSU

Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list.
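
As a rough illustration of the effect on word boundaries, the sketch below splits text on a handful of characters from the Unicode Sm category. It uses only the Rust standard library and does not call Charabia. MATH_SEPARATORS is a small hand-picked subset and split_on_math_symbols is a hypothetical name. Unlike a real tokenizer, this sketch simply discards the separators.

```rust
// Hand-picked subset of the Unicode Sm (Symbol, Math) category, for
// illustration only; the release adds the full category to the defaults.
const MATH_SEPARATORS: &[char] = &['+', '<', '=', '>', '±', '×', '÷'];

/// Split `text` wherever one of the sample math symbols occurs,
/// dropping empty pieces produced by adjacent separators.
fn split_on_math_symbols(text: &str) -> Vec<&str> {
    text.split(|c: char| MATH_SEPARATORS.contains(&c))
        .filter(|part| !part.is_empty())
        .collect()
}
```

With separators like these in place, an input such as "a+b=c" is treated as three words rather than one.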

Thanks again to @ManyTheFish, @meili-bors[bot], and @phillitrOSU! 🎉