💡 There is a SMARTER way to split your documents for GenAI apps
- Published Jun 12, 2024
- Learn semantic splitting in this hands-on tutorial to improve your language model's performance on document processing tasks.
We dive into a practical Python implementation for finding optimal segmentation points by meaning, essential for retrieval-augmented generation.
Code along with me following the GitHub-hosted notebook and elevate your app's efficiency with this smart splitting strategy.
GitHub Repo: github.com/bitswired/semantic...
🌐 Visit my blog at: www.bitswired.com
📩 Subscribe to the newsletter: newsletter.bitswired.com/
🔗 Socials:
LinkedIn: / jimi-vaubien
Twitter: / bitswired
Instagram: / bitswired
TikTok: / bitswired
00:00 Why Do We Split Documents?
02:02 Semantic Splitting: The Theory
05:06 Semantic Splitting: The Practice
11:28 Takeaways - Science and Technology
This is one of the most powerful videos related to AI I have ever seen. Very clear, very informative, and very useful. Thanks for the good content 🌹🌹🌹
Thank you very much for your kind words!
It means a lot to hear that the video had such a positive impact on you and it makes all the effort worth it.
Thanks again for watching and for taking the time to leave such a thoughtful comment 👍🏽
Great video bro, keep going with these fire topics!
Thanks frero 💪🏽
Let’s gooooo!
Let’s make it work and play Elden Ring soon ahah
Once all the vectors are loaded into the vector database, the text splitting no longer matters. As long as you don't split on a compound word or phrase, it doesn't really affect the vector space.
Hey :)
I see your point but I would say that in practice it’s not the case.
For instance if you embed an entire page versus multiple smaller paragraphs the resulting vectors will be different even though you’ve indexed the same text.
And it affects the similarity search.
That’s why pyramidal embeddings are a way to improve RAG performance: you index the data at different precision levels and use multiple indexes to answer queries.
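To make the point above concrete, here is a minimal sketch in Python. It assumes a toy bag-of-words `embed` function as a stand-in for a real embedding model, and the sample paragraphs, the `index` list, and the `search` helper are all hypothetical names made up for illustration. It only shows that the same text scores differently against a query depending on whether it was indexed as one page or as separate paragraphs, and how a small index holding both levels could be queried.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical sample data: one "page" made of two unrelated paragraphs.
paragraphs = [
    "cats purr and sleep all day",
    "stock markets fell sharply on monday",
]
page = " ".join(paragraphs)

query = embed("why do cats purr")
page_score = cosine(query, embed(page))
chunk_scores = [cosine(query, embed(p)) for p in paragraphs]

# Index the same text at two precision levels (page and paragraph).
index = [("page", page, embed(page))]
index += [("paragraph", p, embed(p)) for p in paragraphs]

def search(query_text, k=1):
    """Return the k best (level, text) entries across both levels."""
    q = embed(query_text)
    ranked = sorted(index, key=lambda e: cosine(q, e[2]), reverse=True)
    return [(level, text) for level, text, _ in ranked[:k]]

print(page_score, chunk_scores)
print(search("why do cats purr"))
```

With this toy data the relevant paragraph scores higher than the full page, because the page vector is diluted by the unrelated paragraph. A real embedding model will behave differently in detail, but the vectors still depend on where the text was split.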
Very interesting
Thanks big boss ❤️
Love it ❤ You know how to transmit your passion, congrats 😍🦍🔥
Merci Bella ❤️🦍🐆
EKIP au max!
Good presentation but I do not understand how it's different from document AI's that can do this automatically. Why do this manually?
Hey :)
You’re right, there are libraries that do it for you.
However, the purpose of the video was to understand how it works in depth. To do so, I proposed a simple implementation from scratch.
The goal was to help people grasp the concept.
I hope you still enjoyed the video 😁
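For readers wondering what a from-scratch version might look like, here is a minimal sketch in Python. It is not the code from the video or the notebook: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and `semantic_split`, its `threshold` parameter, and the sample text are hypothetical choices made for illustration. The idea shown is to compare each sentence with the next one and start a new chunk where the similarity drops.

```python
import math
import re
from collections import Counter

def embed(sentence):
    """Toy stand-in for an embedding model (assumption): word counts."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_split(text, threshold=0.2):
    """Group consecutive sentences; cut where adjacent similarity < threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])      # meaning shifted: start a new chunk
        else:
            chunks[-1].append(cur)    # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]

# Hypothetical sample text with one topic change in the middle.
text = (
    "Cats sleep most of the day. Cats also purr when they sleep. "
    "Stock markets fell on Monday. Markets recovered on Tuesday."
)
chunks = semantic_split(text)
for chunk in chunks:
    print(chunk)
```

The threshold is arbitrary here. With a real embedding model you would likely tune it, or pick breakpoints relative to the distribution of similarities in the document rather than a fixed value.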
Great video! But the music is totally annoying.
It makes it hard to understand you and it distracts from your great work.
If your video content were not so great, I would have stopped watching.