Superfast RAG with Llama 3 and Groq

  • Published 7 Jul 2024
  • Groq API provides access to Language Processing Units (LPUs) that enable incredibly fast LLM inference. The service offers several LLMs, including Meta's Llama 3. In this video, we'll implement a RAG pipeline using Llama 3 70B via Groq, the open-source E5 encoder, and the Pinecone vector database. A minimal sketch of the pipeline appears after the chapter list below.
    📌 Code:
    github.com/pinecone-io/exampl...
    🌲 Subscribe for Latest Articles and Videos:
    www.pinecone.io/newsletter-si...
    👋🏼 AI Consulting:
    aurelio.ai
    👾 Discord:
    / discord
    Twitter: / jamescalam
    LinkedIn: / jamescalam
    #artificialintelligence #llama3 #groq
    00:00 Groq and Llama 3 for RAG
    00:37 Llama 3 in Python
    04:25 Initializing E5 for Embeddings
    05:56 Using Pinecone for RAG
    07:24 Why We Concatenate Title and Content
    10:15 Testing RAG Retrieval Performance
    11:28 Initializing Connection to Groq API
    12:24 Generating RAG Answers with Llama 3 70B
    14:37 Final Points on Why Groq Matters
  • Science & Technology
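
    A minimal sketch of the pipeline described above, assuming the groq and pinecone Python clients and a sentence-transformers E5 encoder; the index name and metadata fields are illustrative, not the exact code from the linked repo:

      # RAG pipeline sketch: E5 for retrieval, Pinecone for storage, Llama 3 70B via Groq.
      from groq import Groq
      from pinecone import Pinecone
      from sentence_transformers import SentenceTransformer

      encoder = SentenceTransformer("intfloat/e5-base-v2")  # open-source E5 encoder
      pc = Pinecone(api_key="PINECONE_API_KEY")
      index = pc.Index("llama-3-rag")  # hypothetical index name
      groq_client = Groq(api_key="GROQ_API_KEY")

      def retrieve(query: str, top_k: int = 3) -> list[str]:
          # E5 models expect a "query: " prefix on queries.
          xq = encoder.encode(f"query: {query}").tolist()
          res = index.query(vector=xq, top_k=top_k, include_metadata=True)
          return [m["metadata"]["text"] for m in res["matches"]]

      def generate(query: str) -> str:
          # Concatenate retrieved chunks into the system prompt, then ask Llama 3 70B.
          context = "\n---\n".join(retrieve(query))
          chat = groq_client.chat.completions.create(
              model="llama3-70b-8192",  # Llama 3 70B on Groq
              messages=[
                  {"role": "system", "content": f"Answer using the context below.\n\n{context}"},
                  {"role": "user", "content": query},
              ],
          )
          return chat.choices[0].message.content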

COMMENTS • 15

  • @awakenwithoutcoffee
    @awakenwithoutcoffee 3 days ago +1

    Hi James, Microsoft just open-sourced their GraphRAG technology stack; it might be cool to take a look and see how we can leverage/combine the two.

  • @tiagoc9754
    @tiagoc9754 5 days ago +1

    A nice thing is that you can use Groq with LangChain as well.
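
    For reference, a minimal sketch, assuming the langchain-groq integration package (not covered in the video):

      # Using Groq through LangChain (assumes the langchain-groq package).
      from langchain_groq import ChatGroq

      llm = ChatGroq(
          model="llama3-70b-8192",  # Llama 3 70B on Groq
          api_key="GROQ_API_KEY",
      )
      print(llm.invoke("Explain RAG in one sentence.").content)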

  • @gilbertb99
    @gilbertb99 5 days ago +1

    What are your thoughts on adding a short summary of the document or paper to each chunk, in addition to the title?

    • @jamesbriggs
      @jamesbriggs 5 days ago

      It's a good idea. I haven't tried it before, but it seems sensible. You would need to find a balance between too much summary, which might "overpower" the meaning of the chunk itself, and getting enough summary in there to be useful. But if you get something good there, it feels like a great idea imo.
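
      To make the idea concrete, a minimal sketch; the summary field is hypothetical, as the video itself concatenates only title and content:

        # Building the text to embed: title + (optional) summary + chunk content.
        def build_chunk_text(title: str, content: str, summary: str = "") -> str:
            parts = [title]
            if summary:
                parts.append(summary)  # keep it short so it doesn't overpower the chunk
            parts.append(content)
            # E5 expects a "passage: " prefix on documents.
            return "passage: " + "\n".join(parts)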

  • @tiagoc9754
    @tiagoc9754 5 days ago +1

    Groq is insanely fast

  • @tiagoc9754
    @tiagoc9754 5 days ago +1

    Is there any OSS embedding model you'd recommend over E5 for real/prod use cases? I've only used OpenAI so far.

    • @juanpablomesalopez
      @juanpablomesalopez 5 days ago +1

      gte-base and bge-base are good in benchmarks, but you really have to test them on your use case. You should also fine-tune the embeddings with your own use-case data.

    • @jamesbriggs
      @jamesbriggs 5 days ago +1

      E5 has been good; I like Jina's embedding model, and I've heard some good things about BAAI's bge-m3 too for hybrid search.

    • @byczong
      @byczong 5 days ago +1

      @jamesbriggs maybe in some future video you could cover bge-m3 :)) This model sounds pretty cool (especially the dense/multi-vector/sparse retrieval).
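
      Since bge-m3 came up, a minimal sketch, assuming the FlagEmbedding package, showing the three representations mentioned above:

        # bge-m3 sketch (assumes the FlagEmbedding package).
        from FlagEmbedding import BGEM3FlagModel

        model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
        out = model.encode(
            ["Groq serves Llama 3 70B at very high throughput."],
            return_dense=True,         # one dense vector per text
            return_sparse=True,        # lexical (sparse) weights
            return_colbert_vecs=True,  # multi-vector (ColBERT-style) output
        )
        print(out["dense_vecs"].shape)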

  • @content_ai_
    @content_ai_ 5 days ago +1

    You're in Bali, nice! I am looking for an online job, mate. I'm pretty desperate at this point.

    • @jamesbriggs
      @jamesbriggs 5 days ago +1

      You can tell? But yes, I'm here for a while. Just work on AI stuff and get yourself out there a bit; it does take time though.

  • @Davorge
    @Davorge 5 days ago

    Is this reusable in such a way that we can switch from calling Groq to calling OpenAI's GPT-4o or other models?

    • @jamesbriggs
      @jamesbriggs 5 days ago

      Yeah, it's pretty simple to swap them out; they use a similar (maybe even the same) API.
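
      For illustration, a minimal sketch, assuming the official openai Python client and Groq's OpenAI-compatible endpoint; the model names are examples:

        # Swapping providers via the OpenAI-compatible chat completions API.
        from openai import OpenAI

        # Groq exposes an OpenAI-compatible endpoint, so the same client code
        # works for both providers; only base URL, key, and model name change.
        groq_client = OpenAI(
            base_url="https://api.groq.com/openai/v1",
            api_key="GROQ_API_KEY",
        )
        openai_client = OpenAI(api_key="OPENAI_API_KEY")

        def ask(client: OpenAI, model: str, question: str) -> str:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
            )
            return response.choices[0].message.content

        print(ask(groq_client, "llama3-70b-8192", "What is RAG?"))
        print(ask(openai_client, "gpt-4o", "What is RAG?"))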