Llama 3 70B Custom AI Agent: Better Than Perplexity AI?

  • Published 30 Sep 2024

COMMENTS • 33

  • @jasonjefferson6596
    @jasonjefferson6596 4 months ago +3

    Looking forward to the RAG series 🎉

  • @tonyppe
    @tonyppe 3 months ago +1

    You're the real MVP. Your videos are ace; you break down these other tools into low-level code.

  • @ademczuk
    @ademczuk 4 months ago +3

    Surely you could have an agent running Opus or GPT-4o perform the testing and generate a matrix for you against your expected answers.

    • @EddieAdolf
      @EddieAdolf 4 months ago +2

      The point is it's local: more secure, your data isn't being sent out or used as training data, AND you're not paying for it. And you can expand it into your own workflows, etc.

    • @tonyppe
      @tonyppe 3 months ago

      @@EddieAdolf he's the real MVP for sure.

  • @kalokali7711
    @kalokali7711 4 months ago +3

    Hi, I like the content, very informative. I have an idea for the next video and a very good showcase: agent automation (Selenium-based) to log in to any portal or interact with any page, e.g. eBay, and look for a specific item (a book), not via an API, just a tool based on the Selenium library or similar. That would set a foot in the door of RPA ;)

  • @malikrumi1206
    @malikrumi1206 4 months ago +1

    Did you get a bigger background to accommodate a bigger llama model? 😆 Ok, let me get serious and actually watch this thing ...🤣.

  • @matten_zero
    @matten_zero 4 months ago +3

    There are whole teams that fundraised on being able to do this very task.

    • @MrAhsan99
      @MrAhsan99 3 months ago +1

      And this guy gave them a run for their money.

    • @matten_zero
      @matten_zero 3 months ago +1

      @@MrAhsan99 It points to how overvalued some of these startups are.

    • @MrAhsan99
      @MrAhsan99 3 months ago

      @@matten_zero absolutely

  • @JackieUUU
    @JackieUUU 4 months ago +3

    impressive work!

  • @trafferz
    @trafferz 4 months ago

    How is it supposed to determine which city? You don't specify the UK; north of London, is Birmingham the largest city in the world?

    • @Data-Centric
      @Data-Centric 4 months ago

      I would hope it could figure it out. I'm going to try the same thing with GPT-3.5-Turbo and 4o.

  • @matten_zero
    @matten_zero 4 months ago

    For context, Perplexity has a $50M+ valuation.

  • @javiergimenezmoya86
    @javiergimenezmoya86 4 months ago

    Context windows shouldn't be broken up. The window should slide and produce summaries as it goes.

  • @SnakeCaseGuy
    @SnakeCaseGuy 4 months ago

    Hi, I see you have been doing quite some LLMing and RAGging. I was just fiddling with it, and the problem is that sometimes it generates real garbage: GitHub pages that don't exist, lines and lines of nothing, or it starts repeating itself. How do you prevent that, or catch it and stop generating? If you have some examples or videos, that would be helpful.

    • @Data-Centric
      @Data-Centric 4 months ago

      If you're using open-source LLMs, you need to be aware of the prompt formats and stop tokens. I have a video on deploying a basic Llama 3 chatbot that goes into a bit more detail.
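      The advice above about prompt formats and stop tokens can be sketched in Python. The special tokens below follow Meta's published Llama 3 Instruct chat template; the helper names (`build_llama3_prompt`, `trim_at_stop`) are illustrative, not code from the video.

      ```python
      # Sketch: assembling a Llama 3 Instruct prompt by hand, and cutting
      # a raw completion at the stop token so the model doesn't run on
      # into a hallucinated next turn. Assumes a raw-completion endpoint
      # (not a chat API that applies the template for you).

      STOP_TOKEN = "<|eot_id|>"

      def build_llama3_prompt(system: str, user: str) -> str:
          """Build the Llama 3 Instruct chat template for one user turn."""
          return (
              "<|begin_of_text|>"
              f"<|start_header_id|>system<|end_header_id|>\n\n{system}{STOP_TOKEN}"
              f"<|start_header_id|>user<|end_header_id|>\n\n{user}{STOP_TOKEN}"
              "<|start_header_id|>assistant<|end_header_id|>\n\n"
          )

      def trim_at_stop(completion: str) -> str:
          """Truncate a raw completion at the first stop token."""
          return completion.split(STOP_TOKEN, 1)[0].rstrip()
      ```

      Most inference servers also accept the stop token as a `stop` parameter, so generation halts server-side instead of being trimmed after the fact.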

  • @FelipeHoenen
    @FelipeHoenen 4 months ago

    You should add Snowflake Arctic to the comparison! Apparently its 128 experts are less prone to hallucinating

    • @Data-Centric
      @Data-Centric 4 months ago +1

      I was wondering how an MoE architecture might do. I have Mixtral coming up, but I'll consider Snowflake Arctic too!

  • @andynguyen8847
    @andynguyen8847 4 months ago

    What is the temperature set to, and what quantized version are you running? The free version on Groq managed to get the final question right, though it struggled with the Aruba one. I'm sure that with more tweaking we can make Llama 70B do quite well on these tasks.

    • @Data-Centric
      @Data-Centric 4 months ago

      The temperature setting is 0 and the model is the 16-bit version. I think the Llama models are published in 16-bit anyway, so it's completely unquantized.

  • @octadion3274
    @octadion3274 4 months ago

    What GPU are you using? I followed your earlier video to deploy on RunPod, but I can't connect on port 8000. Or does it just take more time to start? Please let me know!

    • @octadion3274
      @octadion3274 4 months ago +1

      My bad, I just needed to wait a little bit.

  • @madhudson1
    @madhudson1 4 months ago

    I haven't gone back and looked at your implementation of the agent workflow, but you talk about restrictions in context windows with your scraping. Are you using RAG with the large documents you're scraping?

    • @supercurioTube
      @supercurioTube 4 months ago

      From what I recall in the previous videos, the content is fed in full into the context, leaving the LLM to extract the information from the whole page.
      That's why some pages don't fit in the 8K-token window.
      RAG would work better by chunking and retrieving only the relevant text from the whole page, but it would also make the project's code quite a bit more complex, unless you rely on a framework.

    • @madhudson1
      @madhudson1 4 months ago +1

      @@supercurioTube aye, a RAG stage is needed then for large contexts or some form of hierarchical context summariser.
      RAG would be better though

    • @Data-Centric
      @Data-Centric 4 months ago +1

      Yes, this is pretty much it. Would probably add some latency too because you would have to create the embeddings for each webpage each time you did a new search.

    • @supercurioTube
      @supercurioTube 4 місяці тому

      @@Data-Centric good point about the latency.
      Ollama recently added the ability to keep several models loaded at the same time, which would help.
      Otherwise swapping between the models for embeddings and a 8b LLM would slow things down significantly.

    • @FutureFocused-lo1jn
      @FutureFocused-lo1jn 4 months ago

      @@Data-Centric aye, but you could check first to see if it's already indexed
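The chunk-and-retrieve approach discussed in this thread can be sketched roughly as follows. This is a minimal, dependency-free illustration: word-overlap scoring stands in for real embedding similarity, and the names (`chunk_text`, `retrieve`) are hypothetical, not code from the video.

```python
# Sketch: split a scraped page into overlapping chunks, then keep only
# the chunks most relevant to the query, so the text sent to the LLM
# fits inside a small (e.g. 8K-token) context window.

def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into word chunks of `size` words with `overlap` words shared."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by word overlap with the query; a real system
    would rank by embedding similarity instead."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]
```

As noted above, embedding every page on every search adds latency; caching embeddings per URL (checking whether a page is already indexed before re-embedding it) is the usual mitigation.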