Optimized version of sliding window for semantic chunking #5
Hey nice! That looks slick. Thank you for sharing. I didn't explore a more sophisticated method (there are definitely other ways) because I was moving quickly for the tutorial MVP. I'll keep this optimized method in mind for when I update the tutorial code.
One word... amazing.
Using a plain TXT file, I got:
Change the script:
Thanks for bringing both of these up. This repo isn't actively maintained and won't be updated for a while. Apologies, but I have too many projects going on! If anyone really wants to help develop it, please contact me.
Hi Greg,
Thanks a lot for your work!
I want to share a more optimized version of your function `combine_sentences` from the tutorial about text splitting. Instead of this function:
We can use generators and the Python standard library to generate the windows more efficiently:
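A minimal sketch of that idea (an illustration, not the poster's exact code). It assumes the input is a plain iterable of sentence strings rather than the tutorial's list of dicts, and that each window should hold up to `buffer_size` sentences on each side of the current one. The `sliding_window` helper is adapted from the recipe in the `itertools` documentation; padding both ends with a sentinel removes the index bounds checks a nested-loop version needs.

```python
from collections import deque
from itertools import chain, islice, repeat
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

# Sentinel used to pad both ends, so edge windows need no index checks
_PAD = object()


def sliding_window(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Yield overlapping windows of exactly n items (adapted itertools recipe)."""
    iterator = iter(iterable)
    window = deque(islice(iterator, n - 1), maxlen=n)
    for item in iterator:
        window.append(item)  # maxlen evicts the oldest item automatically
        yield tuple(window)


def combine_sentences(sentences: Iterable[str], buffer_size: int = 1) -> Iterator[str]:
    """Lazily yield each sentence joined with up to buffer_size neighbours per side."""
    if buffer_size < 0:
        raise ValueError("buffer_size must be non-negative")
    padded = chain(repeat(_PAD, buffer_size), sentences, repeat(_PAD, buffer_size))
    # Each window of 2 * buffer_size + 1 items is centred on exactly one real sentence
    for window in sliding_window(padded, 2 * buffer_size + 1):
        yield " ".join(s for s in window if s is not _PAD)
```

For example, `list(combine_sentences(["A.", "B.", "C.", "D."]))` gives `["A. B.", "A. B. C.", "B. C. D.", "C. D."]`. The function itself only keeps one window of sentences in memory at a time; if the sentences live in dicts (a `sentence` / `combined_sentence` key layout is assumed here), the yielded strings can be zipped back onto them.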
By the way, I found that splitting on punctuation does not work in the tutorial when there is no space between a punctuation mark and the next sentence, because the regex `(?<=[.?!])\s+` requires at least one whitespace character after the punctuation. Could you please tell me whether you explored more sophisticated methods of sentence splitting, and how they affect the overall quality of semantic chunking? Thank you.
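A small sketch reproducing the splitting problem, comparing the tutorial's pattern with one possible alternative. `RELAXED_PATTERN` and `split_sentences` are hypothetical names introduced here: the relaxed pattern makes the whitespace optional but requires the next sentence to start with an uppercase letter, so a decimal such as `3.14` is left intact. It still mis-splits abbreviations like "e.g. Python", which is where a dedicated sentence tokenizer (for instance NLTK's `sent_tokenize`) would likely be a more robust choice.

```python
import re
from typing import List

# Pattern from the tutorial: needs at least one whitespace character after . ? !
TUTORIAL_PATTERN = re.compile(r"(?<=[.?!])\s+")

# Hypothetical alternative: whitespace is optional, but the next sentence must
# begin with an uppercase ASCII letter
RELAXED_PATTERN = re.compile(r"(?<=[.?!])\s*(?=[A-Z])")


def split_sentences(text: str, pattern: re.Pattern = RELAXED_PATTERN) -> List[str]:
    # re.split also splits on empty matches (Python 3.7+), which the relaxed
    # pattern produces when there is no space after the punctuation
    return [s for s in pattern.split(text) if s]
```

On `"Hello world.This is a test. Pi is 3.14!Done"`, the tutorial pattern returns only two pieces (`"Hello world.This is a test."` and `"Pi is 3.14!Done"`), while the relaxed pattern returns `"Hello world."`, `"This is a test."`, `"Pi is 3.14!"` and `"Done"`.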