Add null byte as hard context separator
This allows one to use \0 as an artificial separator, for example when concatenating many small strings into one large string. See this discussion for context: https://github.com/orgs/meilisearch/discussions/744
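A minimal sketch of the use case described above, assuming a caller wants to index several short strings (for example, a list of tags) as one large string. The helper `join_with_null` is hypothetical and is not part of charabia or Meilisearch. It also replaces any null byte already present in a part, so that only the intended boundaries remain.

```rust
/// Hypothetical helper: concatenates many small strings into one large
/// string, using the null byte as an artificial boundary between them.
/// Any '\0' already inside a part is replaced with a space so that it
/// cannot create an unintended extra boundary.
fn join_with_null(parts: &[&str]) -> String {
    let mut out = String::new();
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            out.push('\0');
        }
        out.push_str(&part.replace('\0', " "));
    }
    out
}

fn main() {
    let tags = ["rust", "search engine", "tokenizer"];
    let joined = join_with_null(&tags);

    // Rust `String`s may contain interior null bytes (unlike C strings),
    // so the separator survives as an ordinary character.
    assert_eq!(joined, "rust\0search engine\0tokenizer");

    // Splitting on '\0' recovers the original parts.
    let parts: Vec<&str> = joined.split('\0').collect();
    assert_eq!(parts, tags);
}
```

Joining with a plain space would keep the parts apart as words too, but a space is an ordinary (soft) separator, so the last word of one part and the first word of the next would look adjacent; the null byte marks a hard boundary instead.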
LukasKalbertodt authored Jun 26, 2024
1 parent c983b9f commit 5637a91
Showing 1 changed file with 1 addition and 0 deletions.
charabia/src/separators.rs
```diff
@@ -64,6 +64,7 @@ pub const DEFAULT_SEPARATORS: &[&str] = &[
 
 #[rustfmt::skip]
 pub const CONTEXT_SEPARATORS: &[&str] = &[
+"\0", // Null byte, can be used as artificial separator
 "᠆", // Mongolian Todo Soft Hyphen, mark the end of a paragraph.
 "᚛", "᚜", // Oghams, mark start and end of text
 "!", ". ", ", ", ";", "?", "¡", "§", "¶", "¿", ";", // Latin
```
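The change itself only adds one entry to a list. To illustrate what such an entry affects, here is a minimal, self-contained sketch of a tokenizer that tags separators as hard or soft, assuming (as the commit title suggests) that every entry in `CONTEXT_SEPARATORS` is treated as a hard separator. The trimmed separator lists, the `Kind` enum and the `separator_at`/`tokenize` functions are hypothetical and are not charabia's API; charabia's actual tokenizer is considerably more involved.

```rust
/// Hypothetical, heavily trimmed stand-ins for charabia's separator lists.
/// Entries here are assumed to be hard (context) separators.
const CONTEXT_SEPARATORS: &[&str] = &["\0", "᠆", "!", ". ", "?"];
/// Hypothetical soft separators (word boundaries inside one context).
const SOFT_SEPARATORS: &[&str] = &[" ", "-", "_", "'"];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Word,
    HardSeparator,
    SoftSeparator,
}

/// Returns the kind and byte length of the separator at the start of `s`.
/// Context separators are checked first, so ". " wins over " ".
fn separator_at(s: &str) -> Option<(Kind, usize)> {
    for sep in CONTEXT_SEPARATORS {
        if s.starts_with(sep) {
            return Some((Kind::HardSeparator, sep.len()));
        }
    }
    for sep in SOFT_SEPARATORS {
        if s.starts_with(sep) {
            return Some((Kind::SoftSeparator, sep.len()));
        }
    }
    None
}

/// Splits `text` into (kind, slice) tokens; runs of non-separator
/// characters become words.
fn tokenize(text: &str) -> Vec<(Kind, &str)> {
    let mut tokens = Vec::new();
    let mut word_start = 0;
    let mut i = 0;
    while i < text.len() {
        if let Some((kind, len)) = separator_at(&text[i..]) {
            if word_start < i {
                tokens.push((Kind::Word, &text[word_start..i]));
            }
            tokens.push((kind, &text[i..i + len]));
            i += len;
            word_start = i;
        } else {
            // Advance by one whole UTF-8 character, never splitting a code point.
            i += text[i..].chars().next().unwrap().len_utf8();
        }
    }
    if word_start < text.len() {
        tokens.push((Kind::Word, &text[word_start..]));
    }
    tokens
}

fn main() {
    let tokens = tokenize("rust\0search engine");
    assert_eq!(
        tokens,
        vec![
            (Kind::Word, "rust"),
            (Kind::HardSeparator, "\0"),
            (Kind::Word, "search"),
            (Kind::SoftSeparator, " "),
            (Kind::Word, "engine"),
        ]
    );
}
```

With "\0" in the context list, the boundary between "rust" and "search" is reported as hard, while the space inside "search engine" stays soft.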
