💡 There is a SMARTER way to split your documents for GenAI apps

  • Published 12 Jun 2024
  • Learn semantic splitting in this hands-on tutorial to improve your language model's performance on document-processing tasks.
    We dive into a practical Python implementation for finding optimal segmentation points by meaning, essential for retrieval-augmented generation; a minimal sketch of the idea appears below the description.
    Code along with me using the GitHub-hosted notebook and elevate your app's efficiency with this smart splitting strategy.
    GitHub Repo: github.com/bitswired/semantic...
    🌐 Visit my blog at: www.bitswired.com
    📩 Subscribe to the newsletter: newsletter.bitswired.com/
    🔗 Socials:
    LinkedIn: / jimi-vaubien
    Twitter: / bitswired
    Instagram: / bitswired
    TikTok: / bitswired
    00:00 Why Do We Split Documents?
    02:02 Semantic Splitting: The Theory
    05:06 Semantic Splitting: The Practice
    11:28 Takeaways
  • Science & Technology
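
A minimal sketch of the semantic-splitting idea covered in the video (not the notebook's exact code; the regex sentence splitter, the all-MiniLM-L6-v2 model, and the 0.6 threshold are illustrative assumptions): embed each sentence, compare neighbouring sentences with cosine similarity, and start a new chunk wherever similarity drops, so that chunk boundaries follow shifts in meaning rather than a fixed character count.

```python
# Illustrative sketch of semantic splitting (assumes: pip install sentence-transformers numpy).
import re
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_split(text: str, threshold: float = 0.6) -> list[str]:
    """Group consecutive sentences into chunks, breaking where adjacent sentences diverge in meaning."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    model = SentenceTransformer("all-MiniLM-L6-v2")           # assumed model choice
    embeddings = model.encode(sentences, normalize_embeddings=True)

    chunks, current = [], [sentences[0]]
    for prev_vec, next_vec, sentence in zip(embeddings[:-1], embeddings[1:], sentences[1:]):
        similarity = float(np.dot(prev_vec, next_vec))        # cosine similarity (vectors are unit-normalized)
        if similarity < threshold:                            # meaning shifts -> close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

The notebook may well use a different sentence splitter, embedding model, or breakpoint rule (for example a percentile of the similarity distribution instead of a fixed threshold); the values here are placeholders to show the mechanism.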

COMMENTS • 15

  • @HassanAllaham · 27 days ago · +1

    This is one of the most powerful AI-related videos I have ever seen. Very clear, very informative, and very useful. Thanks for the good content 🌹🌹🌹

    • @bitswired · 27 days ago · +1

      Thank you very much for your kind words!
      It means a lot to hear that the video had such a positive impact on you; it makes all the effort worth it.
      Thanks again for watching and for taking the time to leave such a thoughtful comment 👍🏽

  • @natevaub · a month ago · +2

    Great video bro, keep going with these fire topics!

    • @bitswired · a month ago

      Thanks bro 💪🏽
      Let’s gooooo!
      Let’s make it work and play Elden Ring soon ahah

  • @cyberpunkdarren · 7 days ago · +1

    Once all the vectors are loaded into the vector database, the text splitting no longer matters. As long as you don't split on a compound word or phrase, it doesn't really affect the vector space.

    • @bitswired · 6 days ago

      Hey :)
      I see your point, but in practice that's not the case.
      For instance, if you embed an entire page versus multiple smaller paragraphs, the resulting vectors will be different even though you've indexed the same text.
      And that affects the similarity search.
      That's why pyramidal embeddings are a way to improve RAG performance: index the data at different precision levels and use multiple indexes to answer queries.
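
To illustrate the reply above, here is a minimal sketch (assuming the sentence-transformers library and an arbitrary model and texts; not code from the video) showing that embedding a whole page versus its individual paragraphs yields different vectors, and therefore different similarity scores against the same query:

```python
# Illustrative sketch (assumes: pip install sentence-transformers; model and texts are arbitrary).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

paragraphs = [
    "Semantic splitting groups sentences by meaning before indexing them for retrieval.",
    "The museum's new exhibition opens next month and features baroque paintings.",
]
page = " ".join(paragraphs)  # the same text, indexed as one big chunk

query = "How should I chunk documents for retrieval-augmented generation?"
query_vec = model.encode(query)

# A single page-level vector blends relevant and irrelevant content together...
print(util.cos_sim(query_vec, model.encode(page)))

# ...while per-paragraph vectors tend to keep the relevant paragraph much closer to the query.
print(util.cos_sim(query_vec, model.encode(paragraphs)))
```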

  • @vogendo7377 · a month ago · +2

    Very interesting

    • @bitswired · a month ago

      Thanks big boss ❤️

  • @mariegautier3765 · a month ago · +1

    Love it ❤ You know how to convey your passion, congrats 😍🦍🔥

    • @bitswired · a month ago · +1

      Thanks Bella ❤️🦍🐆
      EKIP to the max!

  • @oryxchannel · a month ago · +1

    Good presentation, but I do not understand how it's different from document AIs that can do this automatically. Why do this manually?

    • @bitswired · a month ago · +2

      Hey :)
      You're right, there are libraries that do it for you.
      However, the purpose of the video was to understand how it works in depth, so I proposed a simple implementation from scratch.
      The goal was to help people grasp the concept.
      I hope you still enjoyed the video 😁

  • @MichaelScharf · 4 days ago

    Great video! But totally annoying music.

    • @MichaelScharf · 4 days ago

      It makes it hard to understand you, and it distracts from your great work.

    • @MichaelScharf · 4 days ago

      If your video content were not so great, I would have stopped watching.