Insanely Fast LLAMA-3 on Groq Playground and API for FREE

  • Published May 3, 2024
  • Learn how to get started with LLAMA-3 on the Groq API, which currently offers the fastest inference speed of any API on the market, and how to use the Groq API in your own applications (a minimal usage sketch follows the links below).
    🦾 Discord: / discord
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Patreon: / promptengineering
    💼Consulting: calendly.com/engineerprompt/c...
    📧 Business Contact: engineerprompt@gmail.com
    Become Member: tinyurl.com/y5h28s6h
    💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
    Sign up for Advanced RAG:
    tally.so/r/3y9bb0
    LINKS:
    Notebook: tinyurl.com/57yhf26h
    Groq API: groq.com/
    TIMESTAMPS:
    [00:00] Getting Started with Llama 3 on Groq Cloud
    [01:49] LLAMA-3 on the Playground
    [03:03] Integrating Llama 3 into Your Applications with the Groq API
    [05:40] Advanced API Features: System Messages and Streaming
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
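
For readers who want to jump straight to code, here is a minimal sketch of calling Llama 3 through the Groq Python SDK (`pip install groq`). The model id "llama3-70b-8192" reflects Groq's naming at the time of the video and may change; the API key is assumed to be in the GROQ_API_KEY environment variable.

```python
# Minimal sketch: chat completion with Llama 3 70B on the Groq API.
# Assumes GROQ_API_KEY is set and the model id is still current.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "Explain LPUs in two sentences."}],
    max_tokens=1024,  # the same cap used in the video
)
print(completion.choices[0].message.content)
```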
  • Science & Technology

COMMENTS • 41

  • @nac341
    @nac341 13 days ago +16

    "we don't care about the responses, we only care about the speed". I can give you an even faster API that just returns random words :)

    • @Tofu3435
      @Tofu3435 13 days ago

      Cool, a fast passphrase generator 😂

    • @unclecode
      @unclecode 13 days ago

      Actually, the author has a point. Picture a scenario with multiple agents working together on a super complex task. You might not even care about their responses or understand their complicated talk; all you really want is 800 tokens per second so the task finishes in a few seconds. At that point, the final response is all that matters. Although I wish that random word generator API or the "infinite monkey theorem" were enough to solve the world's complex problems 😅

    • @RickySupriyadi
      @RickySupriyadi 13 days ago

      @@unclecode Actually, humans do that. There was a time when our team had to work ridiculously fast, and we all came up with stupid yet efficient ways to communicate...

    • @syeshwanth6790
      @syeshwanth6790 12 days ago +1

      He means he is not going to test the accuracy of the model in this video.
      He is demonstrating how fast the API is.
      There are other videos and articles where the performance of these models has been evaluated.

    • @unclecode
      @unclecode 12 days ago

      @@RickySupriyadi Haha, you're right, and it's not surprising that we're inclined to use our own human collaboration methods to design multi-agent systems. There's a desire to make AI resemble us.

  • @Echoff84
    @Echoff84 13 days ago

    Oh my! And it even has function calling too. Looking forward to Whisper integration.

    • @engineerprompt
      @engineerprompt  13 days ago +1

      Yeah, Whisper will be awesome. Need to try their function calling with Llama 3.
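
For anyone curious about the function calling mentioned above: Groq exposes OpenAI-style tool use through the `tools` parameter. A minimal sketch; the `get_weather` tool and its schema are hypothetical examples, not part of the API.

```python
# Sketch of OpenAI-style tool use on Groq with Llama 3.
# The get_weather tool and its JSON schema are made up for illustration.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```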

  • @hanlopi
    @hanlopi 12 days ago

    Very nicely explained.

  • @starblaiz1986
    @starblaiz1986 11 days ago

    Bro this is WILD! Just imagine combining this with agent frameworks like Crew AI! 😮

  • @NLPprompter
    @NLPprompter 13 days ago +3

    I watched this at 2x playback speed; the generation speed becomes like a dream come true.

    • @Yusef-uh4wl
      @Yusef-uh4wl 8 days ago

      Try 4x speed and you will reach AGI.

    • @NLPprompter
      @NLPprompter 8 days ago

      @@Yusef-uh4wl what? LOL

  • @TzaraDuchamp
    @TzaraDuchamp 13 days ago +4

    Wow, that’s fast. Are you going to test this with function calling in an agentic workflow?

    • @TheReferrer72
      @TheReferrer72 13 days ago

      I get 50 tokens per second for the 8B on a 3090 at home. It's a nice model.

  • @huyvo9105
    @huyvo9105 8 days ago +1

    Sometimes it gets rate-limited. How do you handle that?
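
One common way to handle those limits is retrying with exponential backoff. A minimal sketch, assuming the `groq` SDK raises `RateLimitError` on HTTP 429 (it mirrors the OpenAI SDK's error classes):

```python
# Sketch: retry a Groq chat request with exponential backoff on 429s.
import time

from groq import Groq, RateLimitError

client = Groq()

def chat_with_backoff(messages, retries=5):
    delay = 1.0
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="llama3-70b-8192", messages=messages
            )
        except RateLimitError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2  # double the wait before the next try
```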

  • @zhonwarmon
    @zhonwarmon 13 days ago

    Can't wait for local models.

    • @TheReferrer72
      @TheReferrer72 13 days ago +3

      They have been around since Thursday.

    • @looseman
      @looseman 13 days ago +3

      70B is fine for a local run.

  • @NicolasEmbleton
    @NicolasEmbleton 13 days ago +2

    Do we know how aggressively they quantize? I heard the quantization was pretty aggressive and, as a result, the models aren't "as good" as the originals. If true, it's a reasonable tradeoff, but we need to know for sure so we can make informed decisions.

    • @Cingku
      @Cingku 13 days ago +3

      Yes, I tested it on one of my complex calculation prompts, and the one on Groq (Llama 70 billion) is really bad and always answers wrongly... but if I use the one on HuggingChat, it gives a perfect answer every time! So quantization really degrades performance drastically, and it doesn't matter how fast it is when it gives the wrong answer.

    • @NicolasEmbleton
      @NicolasEmbleton 13 days ago +2

      @@Cingku I had fairly similar outcomes in my tests and stopped using Mistral / Mixtral back then. Maybe the free version's target audience is just people testing, and that would make sense. But it did not convince me to use the service. I'll give it another paid attempt to see if it's any better.

  • @mirek190
    @mirek190 13 days ago

    Why did you set it to only 1024 tokens?

  • @CharlesOkwuagwu
    @CharlesOkwuagwu 13 days ago

    Please can you show us end-to-end fine-tuning of Llama 3 on a custom dataset?

    • @engineerprompt
      @engineerprompt  13 days ago +1

      Check the previous video on the channel. Will be making more on fine-tuning.

  • @MrN00N3_
    @MrN00N3_ 9 days ago

    Can you run Groq locally?

  • @unclecode
    @unclecode 13 days ago +1

    Do you agree Groq feels way better when you set "stream=False"? :)) That's when you realize "stream" was a way to hide a weakness.

    • @engineerprompt
      @engineerprompt  13 days ago +1

      I totally agree. Streaming makes it worse for Groq, but others used it to appear faster than they actually are :)
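
For context, streaming with the Groq SDK just means iterating over partial chunks instead of waiting for the whole reply; a minimal sketch:

```python
# Sketch: stream tokens from Llama 3 on Groq as they are generated.
from groq import Groq

client = Groq()

stream = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "Write a haiku about speed."}],
    stream=True,  # with stream=False the full answer arrives at once
)
for chunk in stream:
    # Each chunk carries a small delta of the response text.
    print(chunk.choices[0].delta.content or "", end="")
```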

  • @snehitvaddi
    @snehitvaddi 13 days ago

    Llama 3 can generate images as well, right? Can I use this API to generate images?
    If so, could you please make a tutorial on that, or at least a Short? (BTW, subscribed to see an update on that)

    • @engineerprompt
      @engineerprompt  13 days ago +1

      There is another model on meta.ai that can generate images. It's not part of Llama 3. I'm not sure if it's available via API. Will check it out and update on the channel.

    • @snehitvaddi
      @snehitvaddi 12 days ago

      @@engineerprompt Also, if you don't mind, please leave a reply here if you find any update on that.

  • @Warung-AI-Channel
    @Warung-AI-Channel 8 days ago

    We just built Llama 3 #RAG powered by Groq, and it's extremely fast 😮

  • @greendsnow
    @greendsnow 11 days ago +2

    Wait a second, that's extremely cheap.

  • @nexuslux
    @nexuslux 13 days ago

    Notebook link doesn’t work

    • @engineerprompt
      @engineerprompt  13 days ago

      Can you check again? It seems to be working on my end.

  • @abdelhameedhamdy
    @abdelhameedhamdy 13 days ago

    I did not understand the difference between the system and user roles!

    • @engineerprompt
      @engineerprompt  13 days ago +1

      The "system" role defines the behavior of the model. Think of it as a global instruction that controls the model's behavior. The "user" role is the actual input from the user. Hope that helps.
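
To make that concrete, a minimal sketch; the pirate persona is just an illustrative system instruction:

```python
# Sketch: the system message sets global behavior, the user message
# carries the actual query. The persona below is only an example.
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[
        {"role": "system", "content": "You are a pirate. Always answer in pirate speak."},
        {"role": "user", "content": "What is an LPU?"},
    ],
)
print(completion.choices[0].message.content)
```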

  • @namecUI
    @namecUI 13 days ago +1

    You said for free?! How is this possible?

    • @wwkk4964
      @wwkk4964 13 days ago +2

      Groq has too many LPUs, that's why.

    • @InsightCrypto
      @InsightCrypto 13 days ago

      @@wwkk4964 It's not free; Groq has clear pricing for its models.

  • @InsightCrypto
    @InsightCrypto 13 days ago +2

    So fucked up that you wrote "free".