Mixture of Agents TURBO Tutorial 🚀 Better Than GPT4o AND Fast?!

  • Published 30 Jun 2024
  • Here's how to use Mixture of Agents (MoA) with Groq to get not only incredible-quality output from MoA, but also to solve its latency problem using Groq's fast inference.
    Check out Groq for Free: www.groq.com
    UPDATE: You don't need a valid OpenAI API key for this tutorial.
    Be sure to check out Pinecone for all your Vector DB needs: www.pinecone.io/
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? 📈
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    👉🏻 Instagram: / matthewberman_ai
    👉🏻 Threads: www.threads.net/@matthewberma...
    👉🏻 LinkedIn: / forward-future-ai
    Media/Sponsorship Inquiries ✅
    bit.ly/44TC45V
    Links:
    github.com/togethercomputer/MoA
  • Science & Technology
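The workflow from the video can be sketched roughly as follows: several open models hosted on Groq each propose an answer, optionally refine over a layer or two, and a larger aggregator model synthesizes the final response. This is a minimal sketch of the MoA idea, not the repo's exact code; the model IDs are examples of models Groq serves, and `call` stands in for an OpenAI-style chat-completion request:

```python
# Minimal Mixture-of-Agents control flow. `call(model, prompt) -> str`
# stands in for a chat-completion request to Groq's API.
PROPOSERS = ["llama3-8b-8192", "gemma-7b-it", "mixtral-8x7b-32768"]
AGGREGATOR = "llama3-70b-8192"

AGG_PROMPT = ("You have been provided with responses from various models. "
              "Synthesize them into a single high-quality answer.\n\n{responses}")

def moa(call, user_prompt, layers=2):
    # Layer 1: every proposer answers the raw prompt.
    responses = [call(m, user_prompt) for m in PROPOSERS]
    # Optional middle layers: proposers refine, seeing previous answers.
    for _ in range(layers - 1):
        ctx = AGG_PROMPT.format(responses="\n\n".join(responses)) + "\n\n" + user_prompt
        responses = [call(m, ctx) for m in PROPOSERS]
    # Final layer: the aggregator synthesizes one answer.
    final_ctx = AGG_PROMPT.format(responses="\n\n".join(responses)) + "\n\n" + user_prompt
    return call(AGGREGATOR, final_ctx)

# Control-flow demo with a stub in place of real API calls:
stub = lambda model, prompt: f"<{model}>"
print(moa(stub, "Explain MoA briefly."))  # → <llama3-70b-8192>
```

With a real client, `call` would wrap `client.chat.completions.create(...)` against Groq's endpoint; the proposer calls in each layer can also be issued concurrently to cut latency further.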

COMMENTS • 170

  • @matthew_berman
    @matthew_berman  5 days ago +32

    Is this the ultimate unlock for open source models to compete with closed source?

    • @punk3900
      @punk3900 5 days ago +2

      Groq is amazing... I read their comments on Nvidia, and there is clearly huge potential in changing the architecture for LLM ASICs. Yet Nvidia would rather sell one chip to rule them all than butcher their SOTA universal chip. My last thought is that Nvidia is surely secretly working on an LLM-specific chip and will show it once the competition becomes real. Thanks Matt for sharing your findings.

    • @lucindalinde4198
      @lucindalinde4198 5 days ago +1

      @matthew_berman
      Great video

    • @jeffg4686
      @jeffg4686 5 days ago

      And they're STILL not talking about socialism yet...

    • @dahahaka
      @dahahaka 5 days ago

      Hey, what is that display that you have in the background? The one showing different animations, including snake?

    • @olafge
      @olafge 5 days ago

      I wonder how much the token count increase diminishes the cost efficiency against frontier models. Would be good to add tiktoken to the code.
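One lightweight way to do the accounting suggested above, without adding a tokenizer dependency like tiktoken: sum the OpenAI-style `usage` counts that each chat-completion response already carries (a sketch; the field names assume the OpenAI response format that Groq mirrors, and the counts below are made-up examples):

```python
# Track cumulative token usage across all MoA layer calls.
class TokenMeter:
    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, usage: dict):
        # `usage` is the OpenAI-style usage dict from one API response.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total(self):
        return self.prompt_tokens + self.completion_tokens

meter = TokenMeter()
# e.g. after three proposer calls plus one aggregator call:
for usage in [{"prompt_tokens": 120, "completion_tokens": 200}] * 4:
    meter.add(usage)
print(meter.total)  # → 1280
```

Multiplying the totals by the provider's per-token prices then gives the cost comparison against a single frontier-model call.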

  • @3dus
    @3dus 5 days ago +40

    This is a serious opportunity for Groq to just deploy this transparently for the user. They could have a great competitor to frontier models.

    • @wurstelei1356
      @wurstelei1356 5 days ago +3

      Yeah, Groq should host this MoA from Matt. Would be great.

  • @MilesBellas
    @MilesBellas 5 days ago +29

    T-shirt merchandise:
    "I am going to revoke these API keys"
    😂😅

    • @matthew_berman
      @matthew_berman  5 days ago +4

      Such a good idea!!!

    • @nikhil_jadhav
      @nikhil_jadhav 5 days ago

      @@matthew_berman Ever since I saw the keys exposed, I was just waiting for you to say those lines. Once you said it, I felt relieved.

    • @punk3900
      @punk3900 5 days ago

      @@matthew_berman It was kind of cruel to mention this the second time you showed those keys :D

    • @matthew_berman
      @matthew_berman  5 days ago +1

      @@nikhil_jadhav Lol!! I'll mention it as soon as I show them next time ;)

  • @starmap
    @starmap 5 days ago +17

    I love that open source local models are so powerful that they can compete with the giants.

    • @mikezooper
      @mikezooper 5 days ago

      Not really though. If you have ten small engines driving a car, that doesn't mean one of those engines is impressive.

    • @StemLG
      @StemLG 5 days ago +5

      @@mikezooper Sure, but you're missing the fact that those engines are free

    • @wurstelei1356
      @wurstelei1356 5 days ago +1

      @@mikezooper Also, OpenAI is using something similar to MoA.

    • @TheRealUsername
      @TheRealUsername 5 days ago

      It's just that the proprietary models have hundreds of billions of parameters, compared to open source models, which are 3B–70B.

    • @omarhabib7411
      @omarhabib7411 4 days ago +1

      @@TheRealUsername When is Llama 3 400B coming out?

  • @nomad1220
    @nomad1220 5 days ago +5

    Hey Matthew - Love your vids - they are tremendously informative.

  • @artnikpro
    @artnikpro 5 days ago +13

    I wonder how good it will be with Claude 3.5 Sonnet + GPT-4o + Gemini 1.5 Pro

    • @punk3900
      @punk3900 5 days ago

      But Groq will not run it. It can run only open source models, as they just provide the infrastructure.

    • @user-ku6oq8cn6m
      @user-ku6oq8cn6m 5 days ago

      @punk3900 It is true Groq will not run it. However, the MoA code already seems to let you run any cloud LLM with an OpenAI-formatted endpoint. And there are solutions already available to turn most cloud LLMs into an OpenAI-formatted endpoint (in the cases where one is not already provided). Personally, I don't care if it is really slow (much slower than Groq). I still want to try combining a mixture of the best models (including proprietary cloud LLMs) already out there.

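As a sketch of the idea above: the OpenAI Python client accepts a `base_url`, so each MoA layer can be pointed at any provider that exposes an OpenAI-formatted endpoint. The base URLs below are the providers' documented OpenAI-compatible endpoints, but treat the role names and model IDs as illustrative placeholders:

```python
# Map each MoA role to a provider exposing an OpenAI-formatted endpoint.
# Role names and model IDs are illustrative placeholders.
ENDPOINTS = {
    "proposer_fast": {"base_url": "https://api.groq.com/openai/v1",
                      "model": "llama3-8b-8192"},
    "proposer_alt":  {"base_url": "https://api.together.xyz/v1",
                      "model": "Qwen/Qwen2-72B-Instruct"},
    "aggregator":    {"base_url": "https://api.groq.com/openai/v1",
                      "model": "llama3-70b-8192"},
}

def client_kwargs(role: str, api_key: str) -> dict:
    """Keyword arguments for constructing openai.OpenAI(...) for a role."""
    return {"base_url": ENDPOINTS[role]["base_url"], "api_key": api_key}

print(client_kwargs("aggregator", "sk-placeholder")["base_url"])
```

With the `openai` package installed, `OpenAI(**client_kwargs("aggregator", key))` then behaves like a normal OpenAI client against that provider, so proprietary and open models can be mixed per layer.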
  • @titusblair
    @titusblair 1 day ago

    Great tutorial thanks so much Matthew!

  • @scitechtalktv9742
    @scitechtalktv9742 5 days ago +3

    Great! I will certainly try this out

  • @wardehaj
    @wardehaj 5 days ago +1

    Awesome instructions video. Thanks a lot!

  • @manuelbradovent3562
    @manuelbradovent3562 3 days ago

    Great video, Matthew! Very useful. Previously I had problems with CrewAI and max tokens on Groq and will have to check how to resolve that.

  • @jackflash6377
    @jackflash6377 5 days ago +2

    Thanks for the clear and concise instructions.
    Worked flawlessly first time.
    Now we just need a UI to work with it, including an artifacts window of course

  • @vikastripathiindia
    @vikastripathiindia 3 days ago

    Thank you. This is brilliant!

  • @AhmedMagdy-ly3ng
    @AhmedMagdy-ly3ng 5 days ago

    Wow 🤯
    I really appreciate your work, keep going pro ❤

  • @punk3900
    @punk3900 5 days ago +4

    I wonder how asking the same model multiple times at different temperatures might work. For instance, you ask the hot head, you ask the cold head, and a medium head integrates this. I think most LLM models would hate it this way, but it's clearly the future that we will have unified frameworks with several LLMs asked several times and some model integrating their answers. No single model can compensate for integrating data from several sources.
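The "hot head / cold head" idea above can be sketched as a small control-flow function. Here `call(prompt, temperature)` stands in for any LLM request; the temperature values are arbitrary, and the stub at the bottom exists only to show the flow:

```python
def gather(call, prompt, temps=(0.2, 0.7, 1.2)):
    """Sample the same model at several temperatures, then have a final
    low-temperature call synthesize the drafts."""
    drafts = [call(prompt, t) for t in temps]
    combined = "\n\n".join(f"Draft {i + 1} (T={t}):\n{d}"
                           for i, (t, d) in enumerate(zip(temps, drafts)))
    return call(f"Synthesize the single best answer from these drafts:\n\n{combined}", 0.3)

# Toy stand-in for an LLM call, just to demonstrate the control flow:
fake = lambda prompt, temperature: f"[T={temperature}] {prompt[:20]}"
print(gather(fake, "Why is the sky blue?"))
```

This is a single-model cousin of MoA: diversity comes from sampling temperature rather than from different model families.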

  • @Dmitri_Schrama
    @Dmitri_Schrama 5 days ago +3

    Sir, you are a legend.

  • @MrMoonsilver
    @MrMoonsilver 5 days ago +12

    Hey Matt, remember GPT-Pilot? Apart from revisiting the repo (it's done amazingly well), there is an interesting use case in relation to MoA. Remember how GPT-Pilot calls an API for its requests? Wouldn't it be interesting to see how it performs if it were to call a MoA "API"? It would require exposing the MoA as an API, but it would be very interesting to see, as it would enable developers to piece together much cheaper models to achieve great outcomes, the likes of 3.5 Sonnet.

    • @matthew_berman
      @matthew_berman  5 days ago +2

      Good idea. I've seen CrewAI powering a coding AI project which looked interesting!

    • @14supersonic
      @14supersonic 5 days ago +1

      This would be perfect for agentic wrappers like Pythagora, Devin, Open-Devin. Paying for those expensive APIs for the frontier models is not always the best option for most end users, especially when you're working with lots of experimental data. This could be something that's relatively simple to implement too.

    • @MrMoonsilver
      @MrMoonsilver 5 days ago +1

      @@14supersonic It might even be more accurate than the standard APIs

    • @wurstelei1356
      @wurstelei1356 5 days ago +1

      @@MrMoonsilver Plus the privacy is much higher.

    • @frinkfronk9198
      @frinkfronk9198 4 days ago

      @@wurstelei1356 Privacy is definitely better as long as it's fully local. Running Groq is still cloud. Not that you were suggesting otherwise; I just mean that, generally, people in the conversation should be aware.

  • @MrBademy
    @MrBademy 4 days ago

    This setup is actually worth it... good video man :)

  • @MrLorde76
    @MrLorde76 5 days ago

    Nice, went smoothly

  • @AlfredNutile
    @AlfredNutile 5 days ago +1

    Just a thought: if you fork the repo and upload your changes, we could just download your fork and try it out.

  • @l0ltaha
    @l0ltaha 5 days ago

    Hey @matthew_berman, how would I go about running this whole process in a Docker container, or exposing it as an API endpoint, so I can connect Groq's speech-to-text and pass the returned text into the MoA prompt? Thanks, and love the vids!
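A minimal sketch of the "expose it as an API endpoint" idea, using only the Python standard library. `run_moa` is a placeholder for the script's actual pipeline, and the port is arbitrary:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_moa(prompt: str) -> str:
    # Placeholder: call the proposer models and the aggregator here.
    return f"MoA answer for: {prompt}"

class MoAHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"prompt": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"answer": run_moa(body["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

def serve(port: int = 8000):
    HTTPServer(("0.0.0.0", port), MoAHandler).serve_forever()
```

Calling `serve()` and POSTing `{"prompt": "..."}` to `http://localhost:8000/` lets a speech-to-text front end (or anything else) treat the whole MoA pipeline as a single endpoint; wrapping this file in a Docker image is then just a `python` base image plus the script.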

  • @BigBadBurrow
    @BigBadBurrow 5 days ago +4

    Just glancing at your headshot, I thought you were wearing a massive chain, like a wide boy from the Sopranos. Then I realised it's just the way your T-shirt is folded 😂

  • @Pthaloskies
    @Pthaloskies 5 days ago +2

    Good idea, but we need to know the cost comparison as well.

  • @drlordbasil
    @drlordbasil 5 days ago +3

    We need to start comparing things to Claude 3.5 Sonnet too!
    But I love the MoA concept.

  • @danberm1755
    @danberm1755 4 days ago

    Thanks much! I might actually give that a try considering you did all the heavy lifting.
    Seems like the AI orchestrators such as Crew AI or LangChain should be able to do this as well.

  • @kevinduck3714
    @kevinduck3714 5 days ago +1

    groq is an absolute marvel

  • @EROSNERdesign
    @EROSNERdesign 5 days ago

    AMAZING AI NEWS!

  • @braineaterzombie3981
    @braineaterzombie3981 5 days ago +2

    What if we use a mixture of giant models like GPT-4o with Claude Opus and Sonnet 3.5 and other state-of-the-art models, combined with Groq?

  • @juanpasalagua2402
    @juanpasalagua2402 4 days ago

    Fantastic!

  • @positivevibe142
    @positivevibe142 5 days ago +1

    What is the best RAG app for local, private AI models?

  • @nikhil_jadhav
    @nikhil_jadhav 5 days ago

    Trying right away!! Thank you very much. I wonder what happens if I expose my personal mixture of agents to someone else and they use it as their primary model? Or thousands of such models interconnected with each other, a mesh of models looped within themselves; what will happen?

  • @labrats-AI
    @labrats-AI 5 days ago +1

    Groq is awesome !

  • @vikastripathiindia
    @vikastripathiindia 3 days ago

    Can we use OpenAI as the main engine along with Groq?

  • @nexusphreez
    @nexusphreez 5 days ago +1

    This is awesome. What would be even better is getting a GUI set up for this so that it can be used more for coding. I may try this later.

  • @vash2698
    @vash2698 5 days ago

    Any idea if this could be used as a sort of intermediary endpoint? As in, I point to a local machine hosting this script as though it were a locally hosted model. If this can perform on par with or better than GPT-4o at this kind of speed, then it would be incredible to use as a voice assistant in Home Assistant.

  • @Pregidth
    @Pregidth 5 days ago +1

    How many tokens are used when calling the OpenAI API? It would be wonderful if you could show how to leave OpenAI out. And a full benchmark test, please. Thanks Matthew!

  • @IntelliAmI
    @IntelliAmI 1 day ago

    Matthew, how are you? Groq is hosting another model, and I added it to the version of MoA that you taught us. Now it runs 5 LLMs at the same time. The new model is gemma2-9b-it.

  • @GodFearingPookie
    @GodFearingPookie 5 days ago +11

    Groq? We love local LLMs

    • @matthew_berman
      @matthew_berman  5 days ago +11

      Yes, but you can't achieve these speeds with local AI

    • @InsightCrypto
      @InsightCrypto 5 days ago +1

      @@matthew_berman Why not try that on your supercomputer?

    • @Centaurman
      @Centaurman 5 days ago +1

      Hi Matthew, if someone wanted to build a home server on a $5k budget, do you reckon a dual-3090 setup could do it?
      If not, how might a determined enthusiast make this fully local?

    • @HansMcMurdy
      @HansMcMurdy 5 days ago

      You can use local language models, but unless you are using a custom ASIC, the speed will be reduced substantially.

    • @shalinluitel1332
      @shalinluitel1332 5 days ago

      @@matthew_berman Any local, free, and open-source models which have the fastest inference time? Which is the fastest so far?

  • @sammathew535
    @sammathew535 5 days ago

    Can I make an "API" call to this MoA and use it say, with DSPy? Have you ever considered making a tutorial on DSPy?

  • @nikhil_jadhav
    @nikhil_jadhav 5 days ago

    Just wondering, how can I use this Groq setup in Continue?

  • @paul1979uk2000
    @paul1979uk2000 5 days ago

    I'm wondering, have any tests been done with much smaller models, where you have 2 or 3 running locally on your own hardware, to see if it improves quality over any of the 2 or 3 on their own?
    I ask because, with how APUs are developing, dedicating 20, 30, 40 GB or more to AI use wouldn't be that big of a deal with how cheap memory is getting.

  • @42svb58
    @42svb58 5 days ago

    How does this compare when there is RAG with structured and unstructured data???

  • @shrn680
    @shrn680 3 days ago

    would there be a way to integrate this with a front end like openwebui?

  • @KeithBofaptos
    @KeithBofaptos 5 days ago

    I've been thinking about this approach also. Very helpful vid. Thanks.🙏🏻
    I'm curious whether, on top of MoA, MCTSr would get the answer closer to 💯? And once Sohu comes online, how awesome are those speeds gonna be?!

    • @KeithBofaptos
      @KeithBofaptos 5 days ago

      This is also interesting:
      www.nature.com/articles/s41586-024-07421-0.pdf

  • @zeeveener
    @zeeveener 5 days ago +2

    All of these enhancements could be something you contribute back to the project in the form of configuration. Would make it a lot easier for the next wave of users

  • @marcusk7855
    @marcusk7855 3 days ago

    Wow. That is good.

  • @robboerman9378
    @robboerman9378 5 days ago +1

    If Matthew with his insane local machine can’t compete, I am convinced Groq is the way to go for MoA. Super interesting to see how it nails the most difficult task in the rubric and fast! ❤

  • @consig1iere294
    @consig1iere294 5 days ago

    How is this any different from Autogen or CrewAI?

  • @punk3900
    @punk3900 5 days ago +2

    Oh boy, Groq's inference time is truly impressive. Like most people, however, I thought you were talking about Grok. Groq is in fact mostly just Llama on steroids. It's a pity they can't offer larger models so far. But seeing how Groq works gave a glimpse of the speed of LLM chatbots in a year or two.

    • @matthew_berman
      @matthew_berman  5 days ago +4

      Llama 405B, I assume, is coming

    • @jewlouds
      @jewlouds 5 days ago +1

      @@matthew_berman I hope you are 'assuming' correctly!

  • @husanaaulia4717
    @husanaaulia4717 5 days ago

    We got MoE, MoA, CoE; is there anything else?

  • @kai_s1985
    @kai_s1985 4 days ago +1

    The biggest limitation of Groq is the API rate limit. After some use, you cannot use it anymore.

  • @techwiththomas5690
    @techwiththomas5690 5 days ago

    Can you explain how these many layers of models actually know HOW to produce the best answer possible? How do they know which answer is better or more correct?

  • @psychurch
    @psychurch 5 days ago +7

    Not all of those Apple sentences make sense Apple.

    • @oguretsagressive
      @oguretsagressive 5 days ago +2

      Sadly, even my favorite Llama 3 botched sentence #4. Maybe this test should specify that the output should be grammatically correct? Or make sense? Apple.

    • @wurstelei1356
      @wurstelei1356 5 days ago

      @@oguretsagressive A valid sentence has to meet certain criteria. The AI should keep track of this and not just output blah blah Apple.
      Even if you don't explicitly tell it to produce valid sentences ending with Apple.
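For the benchmark task being discussed ("generate 10 sentences that end with the word apple"), the ending condition is easy to check automatically; judging whether a sentence also makes sense is the part that still needs a human or an LLM judge. A small sketch of the mechanical check:

```python
import re

def ends_with_apple(sentence: str) -> bool:
    """True if the last word of the sentence, ignoring punctuation
    and case, is exactly 'apple'."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return bool(words) and words[-1].lower() == "apple"

sentences = [
    "I ate a shiny red apple.",
    "An apple a day keeps the doctor away apple.",  # ends correctly, reads oddly
    "The orchard was full of apples.",
]
print([ends_with_apple(s) for s in sentences])  # → [True, True, False]
```

As the middle example shows, a model can satisfy the letter of the test while producing an awkward sentence, which is exactly the complaint in this thread.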

  • @KCM25NJL
    @KCM25NJL 3 days ago

    Little tip:
    conda create -n <env_name> python=<version> && conda activate <env_name>

  • @thays182
    @thays182 5 days ago

    Need the follow up tho. What is mixture of agents? How can we use it? Do we get to edit the agentic framework and structure ourselves? What possibilities now exist with this tool? I need moooore! (Amazing video and value, I never miss your posts!)

  • @vauths8204
    @vauths8204 5 days ago

    see now we need that but uncensored

  • @piotr780
    @piotr780 3 days ago

    Why use conda and not pip?

  • @burnt1ce85
    @burnt1ce85 5 days ago +7

    The title of your video is misleading. Your tutorial shows how to set up MoA with Groq, but you haven't demonstrated how it's "Better Than GPT4o AND Fast". Why didn't you test the MoA with your benchmarks?

    • @hotbit7327
      @hotbit7327 5 days ago

      Exactly. He likes to exaggerate and sometimes mislead, sadly.

    • @tonyclif1
      @tonyclif1 3 days ago

      Did you see the question mark after the word "fast"? Sure, a little clickbait, but also misread by you, it seems.

  • @ollibruno7283
    @ollibruno7283 5 days ago

    But it can't process pictures?

  • @Ha77778
    @Ha77778 5 days ago +1

    I love that, I hate OpenAI 😅

  • @DavidJNowak
    @DavidJNowak 4 days ago

    Excellent explanation of how to write code that uses Groq as a manager of a mixture of agents. But you just went too fast for me to catch all the changes that make it work. Could you write this up in your newsletter, or create a video for the slower, more methodical programmer types? Thanks again. Groq AI is making programming more accessible for non-power users like most of us.

  • @millerjo4582
    @millerjo4582 5 days ago +2

    Is there any chance you're gonna look into the new algorithmic way to produce LLMs? It's supposedly a transformers killer. I would think that would be really relevant to viewers.

    • @matthew_berman
      @matthew_berman  5 days ago +2

      Name?

    • @millerjo4582
      @millerjo4582 4 days ago

      @@matthew_berman Also, thank you so much for responding. Ridgerchu/Matmulfree

    • @millerjo4582
      @millerjo4582 4 days ago

      @@matthew_berman I don't know if you got that.. it looks like the comments were struck.

  • @dr.ignacioglez.9677
    @dr.ignacioglez.9677 5 days ago

    Ok 🎉

  • @rinokpp1692
    @rinokpp1692 5 days ago

    What's the cost of running this with a one-million-token context, input and output?

  • @engineeranonymous
    @engineeranonymous 5 days ago

    When YouTubers have better security than Rabbit R1: he revokes API keys.

  • @D0J0Master
    @D0J0Master 2 days ago

    Is Groq censored?

  • @wholeness
    @wholeness 5 days ago

    Quietly, this is what Sonnet 3.5 is, and the Anthropic secret. That's why the API doesn't work well when using so much function calling.

    • @Kutsushita_yukino
      @Kutsushita_yukino 5 days ago

      Where in the heck did you hear this rumor?

    • @blisphul8084
      @blisphul8084 5 days ago

      If that were the case, streaming tokens would not work so well. Though having multiple models perform different tasks isn't a bad idea. That's probably why there's a bit of delay when starting an artifact in Claude.

  • @badomate3087
    @badomate3087 5 days ago

    Has anyone created a comparison between MoA and other multi-agent systems that can utilize LLMs (like Autogen)? Because to me, this looks exactly like an Autogen network with a few simplifications: no code running, and no tool or non-LLM agent usage.
    So, if this is not better, or even worse, than Autogen, then it might not be worth using, since Autogen has a lot more features (like the code running, which was mentioned in the last video).
    Also, the results compared to a single (but much bigger) LLM look kind of obvious to me. The last model receives a lot of proposed outputs alongside the prompt, and it only has to filter out the best ones. That task is a lot easier than generating a correct answer on the first try, from the prompt alone. Since that is the base idea behind MoA, the results are to be expected.

  • @jay-dj4ui
    @jay-dj4ui 5 days ago

    nice AD

  • @4.0.4
    @4.0.4 5 days ago +1

    "4. The old proverb says that eating an apple a day keeps the doctor away apple."
    🙃

  • @ryanscott642
    @ryanscott642 4 days ago

    This is cool, but can you write some real multi-document code with these things? I don't need to make 10 sentences ending in apple. Most of the things I've tried so far can't write code, and I struggle to figure out their use.

  • @4NowIsGood
    @4NowIsGood 5 days ago +2

    Interesting, but I don't know WTF you're doing, even though it looks great. Unless there's an easier setup and install, for me right now it's easier to just use ChatGPT.

    • @FriscoFatseas
      @FriscoFatseas 5 days ago

      Yeah, I'm tempted, but by the time I get this working GPT-4 will get a random update and be better lol

  • @Officialsunshinex
    @Officialsunshinex 5 days ago +1

    Brave browser local llm test o.o

  • @rudomeister
    @rudomeister 5 days ago

    That's why (given small agents vs. response time) Microsoft especially, along with all the others, has whole datacenters trying to find out how a swarm of millions of small agents can work seamlessly. What else should it be? A giant multi-trillion-parameter model with the name Goliath? haha

  • @ErickJohnson-qx8tb
    @ErickJohnson-qx8tb 5 days ago +1

    all about AI ragmodel running uncensored v2w/ MOA using groq MOA LIBRABRY blackfridays gpts library ENOUGH SAID YOUR WELCOME I WOTE MY OWN API KEY ON OPEN GUI i built LOLOLOL

  • @GoofyGuy-WDW
    @GoofyGuy-WDW 5 days ago +1

    Groq? Sounds like Grok doesn’t it?

    • @blisphul8084
      @blisphul8084 5 days ago

      Groq had the name first. Blame Elon Musk.

  • @harshitdubey8673
    @harshitdubey8673 5 days ago +1

    I tried asking MoA:
    if A=E, B=F, C=G, D=H
    Then E=?
    It got it wrong 😂
    MoA's answer: "I"
    But it's amazing 🤩

    • @wrOngplan3t
      @wrOngplan3t 4 days ago

      Would be my answer as well. What's wrong about that?

    • @harshitdubey8673
      @harshitdubey8673 4 days ago +1

      @@wrOngplan3t E is predefined

    • @harshitdubey8673
      @harshitdubey8673 4 days ago

      Logic is not always a sequence; it could be a circle ⭕️ sometimes.

    • @wrOngplan3t
      @wrOngplan3t 4 days ago

      @@harshitdubey8673 Ah okay, well there's that 🙂 Thanks!

  • @BizAutomation4U
    @BizAutomation4U 5 days ago +1

    I just read that Groq no longer wants to sell cards directly for 20K, but instead wants to offer a SaaS model? This seems to contradict the benefits of running LLMs locally for privacy reasons, because now you're sending tokens out to a 3rd-party web service. I don't know why this is an either/or decision. LOTS of SMBs can afford to invest 20K in hardware. Total outlay for a serious LLM rig would have been something like 30K, which is barely half a year's salary for an entry-level position. Bad move I say, but the good news is there will be competitors that correct this decision if Groq doesn't, and soon!

    • @cajampa
      @cajampa 5 days ago

      Dude, you have misunderstood how Groq works. Look into the details: you need maaaaaany cards to be able to fit a model. It is fast because there is very little but very fast memory on every card.
      So you need a lot of cards to fit anything useful, but then you can batch-run requests at crazy speeds against those servers.

    • @BizAutomation4U
      @BizAutomation4U 5 days ago +1

      OK... What about the whole privacy thing, which is the reason people want to run local LLMs? If there is an iron-clad way to prove to most people that using a Groq API for inference is not going to risk sharing data with a 3rd party, you might have a great business case (it's too deep for me technically to know); otherwise you end up with a different dog with the same fleas.

    • @blisphul8084
      @blisphul8084 5 days ago +1

      @@cajampa Yeah, it seems that's the reason they offer very few models. It'd be great if you could host other models on Groq, like Qwen2, as well as any fine-tunes that you'd want to use, like Magnum or Dolphin model variants.

    • @cajampa
      @cajampa 5 days ago

      @@BizAutomation4U If a business wants to run Groq because they need the speed they offer, I am pretty sure Groq can offer them an isolated instance of the servers for the right price. Groq was never about consumers running local LLMs. The hardware is just not catered to that use case in any way.

    • @cajampa
      @cajampa 5 days ago

      @@blisphul8084 I say the same to you: if a business wants to run Groq with their choice of models, I am pretty sure Groq can offer it to them for the right price.

  • @user-td4pf6rr2t
    @user-td4pf6rr2t 5 days ago

    This is called Narrow AI.

  • @Dave-nz5jf
    @Dave-nz5jf 5 days ago +1

    There's so many advances coming so fast with all of this, I wonder if the real value in your content is the rubric. Or, more accurately, improving your rubric. Right now I think it's barely version 1.0, and it needs to be v5.0. And try adding a medical or legal question for gosh sakes.

  • @sanatdeveloper
    @sanatdeveloper 5 days ago

    First😊

  • @hqcart1
    @hqcart1 5 days ago

    Dude, all your videos talk about beating GPT-4o, and we haven't seen it yet!

  • @tamelo
    @tamelo 5 days ago

    Groq is terrible, worse than GPT-3.
    Why do you keep shilling for it?

    • @matthew_berman
      @matthew_berman  5 days ago +2

      Groq isn't a model, it's an inference service. They host multiple models and offer speeds and prices that are far better than anyone else's.
      I really like Groq.

    • @TheAlastairBrown
      @TheAlastairBrown 5 days ago

      There are two different companies/products. One is Groq and the other is Grok. The one spelled with a "q" is what Matt is talking about; they are essentially a server farm designed to run 3rd-party open-source LLMs quickly so you can cheaply offload computation. The one spelled with a "k" is Elon Musk/X's version of ChatGPT.

  • @LeandroMessi
    @LeandroMessi 5 days ago

    Second

  • @christosmelissourgos2757
    @christosmelissourgos2757 5 days ago

    Honestly Matthew, why do you advertise this? It has been months and we are still stuck with their free package, with a rate limit such that you can bring no app to production yet. A waste of time and integration effort.

  • @annwang5530
    @annwang5530 5 days ago +2

    Are you gaining weight?

    • @Kutsushita_yukino
      @Kutsushita_yukino 5 days ago

      Are you his parents?

    • @AI-Rainbow
      @AI-Rainbow 5 days ago

      Is that any of your business?

    • @blisphul8084
      @blisphul8084 5 days ago

      He didn't criticize, just pointed it out. Better to know early than late, while it's easier to fix.

    • @annwang5530
      @annwang5530 5 days ago +1

      @@blisphul8084 Yeah, today pointing out anything is seen as an attack because of our glass society

  • @Tubernameu123
    @Tubernameu123 5 days ago +1

    Groq is too filtered/censored... too shameful not courageous.... too weak impotent.....

  • @finalfan321
    @finalfan321 19 hours ago

    too technical too much effort unfriendly interface

  • @Heisenberg2097
    @Heisenberg2097 5 days ago +5

    Groq is nowhere near ChatGPT or Claude... and all of them need a lot of attention and are far away from ASI. Everything current is SUPER-FLAWED and SUPER-OVERRATED.

    • @greenstonegecko
      @greenstonegecko 5 days ago +1

      I'd need to see a benchmark for proof.
      These models are super nuanced. They might score a 0/10 on task A, but a 9/10 on task B.
      You can't generalize to "they suck".
      These models can already pass the Turing Test. People cannot differentiate ChatGPT 3.5 from actual humans 54% of the time.

    • @ticketforlife2103
      @ticketforlife2103 5 days ago

      Lol they are far away from AGI, let alone ASI

    • @lancemarchetti8673
      @lancemarchetti8673 5 days ago +4

      Groq is not an LLM as such; it's a booster for AI models, like a turbo switch to get results faster. By loading your model into Groq, you save around 80% of the time you would have spent without it.

    • @Player-oz2nk
      @Player-oz2nk 5 days ago

      @@lancemarchetti8673 Thank you lol, I was coming to say this

    • @4.0.4
      @4.0.4 5 days ago +4

      This is like someone saying a car dealership is nowhere near the performance of a Honda Civic in drag racing. It only communicates that you're a bit new to this.

  • @ManjaroBlack
    @ManjaroBlack 5 days ago

    I finally unsubscribed. Don’t know why it took me so long.

  • @flb5078
    @flb5078 5 days ago

    As usual, too much coding...

    • @TheAlastairBrown
      @TheAlastairBrown 5 days ago

      Copy the GitHub files and the transcript from this video into Claude. Tell it to follow the transcript and try to create what Matt is doing.

  • @onewhoraisesvoice
    @onewhoraisesvoice 5 days ago

    @matthew_berman
    Attention, you didn't revoke keys before publishing this video!