German Language support #584
Replies: 4 comments 5 replies
-
Agreed! It is a pity that searching for parts of compound words does not work. When searching for "Abo", documents containing "Jahresabo" should also be found.
-
Hello @sa- @ErikBrendel, thank you for your interest in the project!
-
Hello! I have not tried any of these tools myself, but I can point you to this collection of resources: https://github.com/adbar/German-NLP#linguistic-processing
-
Tantivy appears to have utilities to deal with compound words, although it relies on a (stemmed) dictionary: https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.SplitCompoundWords.html
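To make the dictionary-based idea concrete, here is a rough sketch in Python. It is not Tantivy's implementation and not this project's API; the tiny `DICTIONARY`, the `LINKERS` list, and the function names `split_compound` and `index_tokens` are all made up for illustration. It only shows how a word list could be used to break a compound into known parts, so that the parts can be indexed next to the original token.

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real setup would load a large (possibly stemmed) word list.
DICTIONARY = frozenset({"jahr", "abo", "induktion", "koch", "platte", "kochplatte"})
# A few common German linking elements (Fugenelemente); this list is an assumption, not exhaustive.
LINKERS = ("", "s", "es", "n", "en")
MIN_PART_LEN = 3  # ignore very short parts to reduce accidental splits


def split_compound(word, dictionary=DICTIONARY):
    """Split `word` into dictionary words, or return [word] if no full split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Returns a tuple of parts covering word[start:], or None if impossible.
        if start == len(word):
            return ()
        # Try the longest candidate part first.
        for end in range(len(word), start + MIN_PART_LEN - 1, -1):
            part = word[start:end]
            if part not in dictionary:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                # A non-empty linker must be followed by another part.
                if linker and nxt >= len(word):
                    continue
                if word.startswith(linker, end):
                    rest = solve(nxt)
                    if rest is not None:
                        return (part,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts else [word]


def index_tokens(text):
    """Whitespace-tokenize `text` and add compound parts as extra tokens."""
    tokens = []
    for word in text.lower().split():
        tokens.append(word)
        parts = split_compound(word)
        if parts != [word]:
            tokens.extend(parts)
    return tokens


print(split_compound("Jahresabo"))
print(index_tokens("Induktionskochplatte kaufen"))
```

With the parts indexed as additional tokens, a query for "abo" or "kochplatte" would also match documents that only contain the compound. The quality of the result depends entirely on the word list, which is the same limitation mentioned above for the dictionary-based approach.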
-
Hey there, first of all thanks for building this!
I wanted to ask whether there are any plans to include better tokenization support for German than plain whitespace splitting. German can join multiple "tokens" into a single word without a space.
For example, when I search for "kochplatte", I would expect "induktionskochplatte" to be just as relevant as results that contain only "kochplatte".