Superfast RAG with Llama 3 and Groq
- Published 7 Jul 2024
- Groq API provides access to Language Processing Units (LPUs) that enable incredibly fast LLM inference. The service offers several LLMs including Meta's Llama 3. In this video, we'll implement a RAG pipeline using Llama 3 70B via Groq, an open source e5 encoder, and the Pinecone vector database.
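The pipeline described above can be sketched in a few lines: retrieved chunks are packed into a grounded prompt, then Llama 3 70B answers via the Groq API. This is a minimal sketch, not the video's exact code; the prompt template and the `build_prompt`/`rag_answer` helper names are my own assumptions (the model id `llama3-70b-8192` is Groq's, and the call needs `GROQ_API_KEY` set).

```python
# Sketch of the RAG flow: retrieved chunks -> grounded prompt -> Llama 3 70B
# on Groq. build_prompt and the template are illustrative assumptions.

def build_prompt(query: str, docs: list[str]) -> str:
    """Pack retrieved chunks into a context-grounded prompt."""
    context = "\n---\n".join(docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def rag_answer(query: str, docs: list[str]) -> str:
    """Send the grounded prompt to Llama 3 70B via Groq (needs GROQ_API_KEY)."""
    from groq import Groq  # deferred import; pip install groq
    client = Groq()  # reads GROQ_API_KEY from the environment
    res = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": build_prompt(query, docs)}],
    )
    return res.choices[0].message.content
```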
📌 Code:
github.com/pinecone-io/exampl...
🌲 Subscribe for Latest Articles and Videos:
www.pinecone.io/newsletter-si...
👋🏼 AI Consulting:
aurelio.ai
👾 Discord:
/ discord
Twitter: / jamescalam
LinkedIn: / jamescalam
#artificialintelligence #llama3 #groq
00:00 Groq and Llama 3 for RAG
00:37 Llama 3 in Python
04:25 Initializing e5 for Embeddings
05:56 Using Pinecone for RAG
07:24 Why We Concatenate Title and Content
10:15 Testing RAG Retrieval Performance
11:28 Initialize connection to Groq API
12:24 Generating RAG Answers with Llama 3 70B
14:37 Final Points on Why Groq Matters
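The "Why We Concatenate Title and Content" chapter can be sketched as a small formatting step: prepending the document title to each chunk gives the embedding more context, and e5 models additionally expect `passage:`/`query:` prefixes on their inputs. The `to_passage`/`to_query` helper names are illustrative assumptions, not the video's code.

```python
# Concatenate title + content so each chunk carries its document's context,
# and add the passage/query prefixes that e5-family encoders expect.

def to_passage(title: str, content: str) -> str:
    """Format a chunk for e5: prefix, then title prepended to the content."""
    return f"passage: {title}\n{content}"

def to_query(text: str) -> str:
    """Queries get the matching e5 prefix."""
    return f"query: {text}"

# Embedding these strings (deferred; pip install sentence-transformers):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("intfloat/e5-base-v2")
# vecs = model.encode([to_passage("Llama 3", "Meta's open-weight LLM ...")])
```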
Hi James, Microsoft just open-sourced their GraphRAG technology stack; might be cool to take a look and see how we can leverage/combine them both.
Nice thing is that you can use Groq with LangChain as well
Yes very true
What are your thoughts on adding a short summary description of the document or paper in each chunk, including the title?
It's a good idea. I haven't tried it before, but it seems sensible. You'd need to find a balance between too much summary, which might "overpower" the meaning of the chunk itself, and getting enough summary in there to be useful. But if you get that balance right, it feels like a great idea imo
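The idea from the exchange above can be sketched as a tiny formatting helper: prepend the title and a capped summary to each chunk before embedding. This is a sketch of one possible balance, and the `max_summary_chars` cap is an arbitrary knob to keep the summary from "overpowering" the chunk, as the reply warns.

```python
# Prepend title + a capped document summary to each chunk before embedding.
# max_summary_chars limits how much the summary can dominate the chunk text.

def chunk_with_summary(title: str, summary: str, chunk: str,
                       max_summary_chars: int = 200) -> str:
    """Add document-level context to a chunk without letting it dominate."""
    return f"{title}\n{summary[:max_summary_chars]}\n\n{chunk}"
```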
Groq is insanely fast
Yeah it’s wild
Is there any OSS embedding model you'd recommend over e5 for real/prod use cases? I've just used OpenAI so far
gte-base or bge-base are good in benchmarks, but you've gotta really test them on your use case. You should also fine-tune the embeddings with your use case data.
E5 has been good, I like Jina's embedding model, and I've heard some good things about BAAI bge-m3 too for hybrid search
@jamesbriggs maybe in some future video you could cover bge-m3 :)) this model sounds pretty cool (especially dense/multi-vector/sparse retrieval)
You in Bali nice! I am looking for an online job mate. I'm pretty desperate at this point
You can tell? But yes, here for a while - just work on AI stuff, get yourself out there a bit, it does take time though
Is this reusable in such a way that we can switch from calling Groq to calling OpenAI GPT-4o or other models?
Yeah it's pretty simple to swap them out, they use a similar (maybe even the same) API
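Since Groq exposes an OpenAI-compatible endpoint, the swap in the answer above can be as small as changing the `base_url` and model name passed to the OpenAI SDK. A minimal sketch; the `PROVIDERS` table and `make_client` helper are my own illustrative names, and the Groq `base_url` assumes its documented OpenAI-compatible endpoint.

```python
# Swap Groq <-> OpenAI by changing only base_url and model: Groq serves an
# OpenAI-compatible API, so the same SDK and call shape work for both.

PROVIDERS = {
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "model": "llama3-70b-8192",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o",
    },
}

def make_client(provider: str, api_key: str):
    """Return an OpenAI-SDK client pointed at the chosen provider, plus model."""
    from openai import OpenAI  # deferred import; pip install openai
    cfg = PROVIDERS[provider]
    return OpenAI(base_url=cfg["base_url"], api_key=api_key), cfg["model"]
```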