LLaMA 3 “Hyper Speed” is INSANE! (Best Version Yet)

  • Published 20 Apr 2024
  • What happens when you power LLaMA with the fastest inference speeds on the market? Let's test it and find out!
    Try Llama 3 on TuneStudio - The ultimate playground for LLMs: bit.ly/llama-3
    Referral Code - BERMAN (First month free)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? 📈
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries ✅
    bit.ly/44TC45V
    Links:
    groq.com
    llama.meta.com/llama3/
    about. news/2024/04/met...
    meta.ai/
    LLM Leaderboard - bit.ly/3qHV0X7
  • Science & Technology

COMMENTS • 591

  • @matthew_berman
    @matthew_berman  Місяць тому +216

    Reply Yes/No on this comment to vote on the next video:
    How to build Agents with LLaMA 3 powered by Groq.

  • @marcussturup1314
    @marcussturup1314 Місяць тому +163

    The model got the 2a-1=4y question correct just so you know

    • @Benmenesesjr
      @Benmenesesjr Місяць тому +38

      Yes, if that's a "hard SAT question" then I wish I had taken the SATs

    • @picklenickil
      @picklenickil Місяць тому

      American education is a joke! That's what we solved in 4th standard, I guess!

    • @matthew_berman
      @matthew_berman  Місяць тому +5

      That's a different answer from what was shown on the SAT website

    • @yonibenami4867
      @yonibenami4867 Місяць тому +103

      The actual SAT question is: "If 2/(a-1) = 4/y, where y isn't 0 and a isn't 1, what is y in terms of a?"
      And then the answer is:
      2/(a-1) = 4/y
      2y = 4(a-1)
      y = 2(a-1)
      y = 2a-2
      My guess is he just copied the question wrong

    • @hunga13
      @hunga13 Місяць тому +31

      @@matthew_berman The model's answer is correct. If the SAT website is showing a different one, they're wrong. You can do the math yourself to check it.

  • @floriancastel
    @floriancastel Місяць тому +75

    4:55 The answer was actually correct. I don't think you asked the right question because you just need to divide both sides of the equation by 4 to get the answer.

    • @asqu
      @asqu Місяць тому +2

      4:55

    • @floriancastel
      @floriancastel Місяць тому +3

      @@asqu Thanks, I've corrected the mistake

    • @R0cky0
      @R0cky0 Місяць тому +8

      Apparently he wasn't using his brain but just copying & pasting then looking for some answer imprinted in his mind

    • @Liberty-scoots
      @Liberty-scoots Місяць тому +2

      Ai will remember this treacherous behavior in the future 😂

  • @notnotandrew
    @notnotandrew Місяць тому +15

    The model does better when you prompt it twice in the same conversation because it has the first answer in its context window. Without being directly told to do reflection, it seems that it reads the answer, notices its mistake, and corrects it subconsciously (if you could call it that).

    • @splitpierre
      @splitpierre Місяць тому +2

      Either that, or it just has to do with temperature. Going by the Groq documentation, I believe their platform does not implement memory like ChatGPT does, and the default temperature on Groq is 1, which is medium and will give varying responses, so I believe it has to do with temperature.
      Try again for deterministic results with temperature zero.
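      A minimal sketch of that temperature-zero suggestion (my illustration, not code from the video or the comment): it assumes the official `groq` Python SDK, a GROQ_API_KEY environment variable, and the `llama3-70b-8192` model id, and paraphrases the marble prompt.
      ```python
      import os
      from groq import Groq  # assumed: official Groq Python SDK

      client = Groq(api_key=os.environ["GROQ_API_KEY"])

      resp = client.chat.completions.create(
          model="llama3-70b-8192",   # assumed model id; check what your account exposes
          temperature=0,             # remove sampling randomness for repeatable answers
          messages=[{"role": "user", "content": (
              "A marble is put in a cup, the cup is placed upside down on a table, "
              "then the cup is moved into the microwave. Where is the marble?")}],
      )
      print(resp.choices[0].message.content)
      ```
      With temperature 0 the same fresh prompt should return essentially the same answer every run, which separates "the model is unsure" from "the sampler is adding noise".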

  • @tigs9573
    @tigs9573 Місяць тому +1

    Thank you, I really appreciate your content since it is really setting me up for when I'll get the time to dive into LLMs.

  • @geno5183
    @geno5183 Місяць тому +6

    Heck yeah, Matt - let's see a video on using these as Agents. THANK YOU! Keep up the amazing work!

  • @Big-Cheeky
    @Big-Cheeky Місяць тому +14

    PLEASE MAKE THAT VIDEO! :) This one was also great

  • @vickmackey24
    @vickmackey24 Місяць тому +24

    4:28 You copied the SAT question wrong. This is the *actual* question that has an answer of y = 2a - 2: "If 2/(a − 1) = 4/y , and y ≠ 0 where a ≠ 1, what is y in terms of a?"

  • @juanjesusligero391
    @juanjesusligero391 Місяць тому +22

    I'm confused. Why is the right answer to the equation question "2a-2"?
    If I understand it correctly and that's just an equation, the result should be what the LLM is answering, am I wrong?
    I mean:
    2a-1=4y
    y=(2a-1)/4
    y=a/2-1/4

  • @AtheistAdam
    @AtheistAdam Місяць тому

    Yes, and thanks for sharing.

  • @MrStarchild3001
    @MrStarchild3001 Місяць тому +3

    Randomness is normal. Unless the temperature is set to zero (which is almost never the case), you'll be getting stochastic outputs from an LLM. This is actually a feature, not a bug. By asking the same question 3 times, 5 times, 7 times, etc., and then reflecting on the answers, you'll get much better results than asking just once.
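    A rough sketch of that "ask several times, then aggregate" idea (my illustration; it reuses the assumed `groq` SDK and `llama3-70b-8192` model id from the earlier sketch, and uses a plain majority vote as the simplest stand-in for reflection):
    ```python
    import os
    from collections import Counter
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    def ask(question: str) -> str:
        resp = client.chat.completions.create(
            model="llama3-70b-8192",   # assumed model id
            temperature=1.0,           # keep sampling noise so the samples vary
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content.strip()

    question = "If 2/(a-1) = 4/y, where y != 0 and a != 1, what is y in terms of a?"
    samples = [ask(question) for _ in range(5)]
    # Naive consensus: most frequent full response. In practice you would extract
    # just the final answer from each sample before voting, or feed all samples
    # back to the model for a reflection pass.
    print(Counter(samples).most_common(1)[0][0])
    ```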

    • @roelljr
      @roelljr Місяць тому

      Exactly. I thought this was common knowledge at this point. I guess not.

  • @OccamsPlasmaGun
    @OccamsPlasmaGun Місяць тому +6

    I think the reason for the alternating right and wrong answers is that it assumes that you asked it again because you weren't happy with the previous answer. It picks the most likely answer based on that.

  • @collectivelogic
    @collectivelogic Місяць тому +2

    Your chat window is "context". That's why it's "learning". We need to see how they have the overflow setting configured; then you'll be able to tell whether it's a rolling window or a cut-out-the-middle sort of compression.
    Love your channel!

  • @dropbear9785
    @dropbear9785 Місяць тому

    Yes, hopefully exploring this 'self-reflection' behavior. It may be less comprehensive than "build me a website" type agents, but showing how to leverage groq's fast inference to make the agents "think before they respond" would be very useful...and provide some practical insights. (Also, estimating cost of some of these examples/tutorials would be a nice-to-know, since it's the first thing I'm asked when discussing LLM use cases). Thank you for your efforts ... great content as usual!

  • @Artificialintelligenceo
    @Artificialintelligenceo Місяць тому

    Great video. Nice speed.

  • @existenceisillusion6528
    @existenceisillusion6528 Місяць тому +1

    4:49 Using '2a-2' implies a = 7/6, via substitution. However, it cannot be incorrect to say (2a-1)/4 = y, because the implication would be that all of mathematics is inconsistent.

  • @ideavr
    @ideavr Місяць тому

    On the marble and cup prompt: if we consider that Llama 3 recognizes successive prompts as successive events, then Llama 3 may have interpreted the events as follows: (1) inverting the cup on the table, so the marble falls onto the table; the cup goes into the microwave and the marble stays on the table. (2) In a second response to the same prompt, when we turn the cup over, Llama may have interpreted it as "going under the table". Thus the marble, due to gravity, would be at the bottom of the cup. Then the cup goes into the microwave with the marble inside. And so on.

  • @DeSinc
    @DeSinc Місяць тому +5

    The hole digging question was made not to be a maths question, but to see if the model can fathom the idea of real-world space restrictions when cramming 50 people into a small hole. The point of the question is to trick the model into saying 50 people can fit into the same hole and work at the same speed, which is not right.
    I would personally only consider it a pass if the model addresses the space requirements of the hole relative to the number of people. Think about it: if you said 5,000 people were digging a 10-foot hole, it would not take 5 milliseconds. That's not how it works. That's what I would be looking for in that question.

    • @phillipweber7195
      @phillipweber7195 Місяць тому +1

      Indeed. The first answer was actually wrong. The second one was better, though not perfect. Although that still means it gave one wrong answer.
      Another factor to consider is possible exhaustion. One person working five hours straight is one thing. But if there are more people who can't work simultaneously but on a rotating basis...

  • @taylorromero9169
    @taylorromero9169 Місяць тому +5

    The variance in T/s can be explained by it being a shared environment. Try the same question repeatedly after clearing the prompt and I bet it ranges from 220 to 280. Also, yes, too lenient on the passes =) Maybe create a Partial Pass to indicate something the model doesn't zero-shot? It would be cool to see the pass/fails in a spreadsheet across models, but right now I couldn't trust the "Pass" based on the ones you let pass.
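    One way to run that repeat test is to time independent API calls and compute tokens per second from the usage field (a sketch under the same assumptions as the earlier snippets; the `usage.completion_tokens` field is the OpenAI-style one I assume Groq returns, and wall-clock time includes network latency, so the numbers will read lower than Groq's reported T/s):
    ```python
    import os
    import time
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    prompt = "Explain in about 200 words how transformers work."

    for i in range(10):
        t0 = time.time()
        resp = client.chat.completions.create(
            model="llama3-70b-8192",                         # assumed model id
            messages=[{"role": "user", "content": prompt}],  # fresh context every call
        )
        dt = time.time() - t0
        toks = resp.usage.completion_tokens                  # assumed OpenAI-style usage field
        print(f"run {i}: {toks} tokens / {dt:.2f}s = {toks / dt:.0f} tok/s")
    ```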

  • @I-Dophler
    @I-Dophler Місяць тому

    For sure! I'm astonished by the improvements in Llama 3's performance on Groq. Can't wait to discover what revolutionary advancements lie ahead for this technology!

  • @JimMendenhall
    @JimMendenhall Місяць тому +2

    YES! This plus Crew AI!

  • @victorc777
    @victorc777 Місяць тому +1

    As always, Matthew, love your videos. This time, though, I followed along, running the same prompts on the **Llama 3 8B FP16 Instruct** model on my Mac Studio. I think you'll find this a bit interesting; if not you, then some of your viewers.
    When following along, if both your run and mine failed or passed, I'm ignoring them, so you can assume that anything I don't bring up here did as well (or as badly) as the 70B model on Groq, which is saying something! I almost wonder if Groq is running a lower quantization, which may or may not matter, but the fact that the 8B model on my Mac is nearly on par with the 70B model is strange to say the least.
    The only questions that stick out to me are the Apple prompt, the Diggers prompt, and the complex Math Prompt (Answer is -18).
    - The very first time I ran the Apple prompt it gave me the correct answer, and I re-ran it 10 times with only one of them providing me with an error of a single sentence, not ending in Apple.
    - Pretty much the same thing with the Diggers prompt: I ran it many times over and got the same answer, except for once. It came up with the idea that digging the hole would not take any less time, which would almost make sense, but the way it explained it was hard to follow and made it seem like 50 people were digging 50 different holes.
    - The first time I ran the complex math prompt it got it wrong, close to the same answer you got the first time, but the second time I ran it I got the correct answer. It was bittersweet since I re-ran it another 10 times and could never get the same answer again.
    I'm beginning to wonder if some of the prompts you're using are uniquely too hard or too easy for the Llama 3 models regardless of how many parameters they have.
    EDIT: When running math problems, I started to change some inference parameters, which seems necessary to me, considering math problems can have a lot of repetitiveness. So I started reducing the temperature, disabling the repeat penalty, and adjusting Min-P and Top-P sampling. I am not getting the right answer, or at least I think I'm not, since I don't know how to solve the advanced math problems, but for the complex math prompt where -18 is supposedly the answer, I continue to get -22. Whether or not that is the wrong answer is not my point; my point is that by reducing the temperature and removing the repetition penalty, it at least becomes consistent, which for math problems seems like what our goal should be. Through constant testing and research, I THINK the function should be written with the "^" symbol, according to Wolfram, like this: f(x) = 2x^3 + 3x^2 + cx + 8

  • @Kabbinj
    @Kabbinj Місяць тому +1

    Groq is set to cache results. Any prompt + chat history gives you the same result for as long as the cache lives. So in your case, both the first and second answers are locked in place by the cache.
    Also keep in mind that Groq's default temperature is higher than 0. This means there will be variation in how it answers (assuming no cache). From this we can conclude that it's not really that confident in its answer, as even the small default temperature will trip it.
    May I suggest you run these non-creative prompts with temperature 0?

  • @ps0705
    @ps0705 Місяць тому

    Thanks for a great video as always, Matthew! Would you consider running your questions 10 times (not on video) if the inference speed is reasonable of course, to check the percentage of how often it gets questions right/wrong ?

  • @easypeasy2938
    @easypeasy2938 Місяць тому +1

    YES! I want to see that video! Please start from the very beginning of the process. I just found you and I would like to set up my first agentic AI. (I have an OpenAI pro account, but I am willing to switch to whatever you recommend... looking for AI to help me learn Python, design a database and web app, and design a Kajabi course for indie musicians.) Thanks!

  • @ThaiNeuralNerd
    @ThaiNeuralNerd Місяць тому

    Yes, a video showing an autonomous-agent example using Groq and whatever agent framework you choose would be awesome

  • @AINEET
    @AINEET Місяць тому +5

    The guys from Rabbit really need Groq hardware running the LLM on their servers

  • @TheHardcard
    @TheHardcard Місяць тому

    One important factor to know is the parameter specification. Are the weights floating point or integer? How many bits: 16, 8, 4, 2?
    If fast inference speeds are coming from heavy quantization, it could affect the results. This would be fine for many people a lot of the time, but it should also always be disclosed.
    Is Groq running full precision?

  • @chrisnatale5901
    @chrisnatale5901 Місяць тому

    Re: how to decide which of multiple answers is correct, there's been a lot of research on this. Off the top of my head, there's "use the consensus choice, or, failing consensus, choose the answer to which the LLM assigns the highest confidence score." That approach was used in Google's Gemma paper, if I recall correctly.

  • @dhruvmehta2377
    @dhruvmehta2377 Місяць тому

    Yes, I would love to see that

  • @TheColonelJJ
    @TheColonelJJ Місяць тому

    Which LLM, that can be run on a home computer, would you recommend for helping refine prompts for Stable Diffusion -- text to image?

  • @ministerpillowes
    @ministerpillowes Місяць тому +5

    8:22 Is the marble in the cup, or is the marble on the table: the question of our time 🤣

    • @Sam_Saraguy
      @Sam_Saraguy Місяць тому +1

      and the answer is: "Yes!"

  • @joepavlos3657
    @joepavlos3657 Місяць тому

    Would love to see the CrewAI with Groq idea. I would also love to see more content on using CrewAI agents to train and update models. Great content as always, thank you.

  • @user-cw3jg9jq6d
    @user-cw3jg9jq6d Місяць тому

    Thank you for the content. Could you point us to procedures for running LLaMA 3 on Groq, please? Also, I might have missed something, but why did you fail LLaMA 3 on the question about breaking into a car? I think it told you it cannot provide that info, which is what you want, no?

  • @falankebills7196
    @falankebills7196 Місяць тому

    Hi, how did you run the Snake Python script from Visual Studio? I tried but couldn't get the game screen to pop up. Any hints/help/pointers much appreciated.

  • @ThePawel36
    @ThePawel36 Місяць тому

    I'm just curious: what is the difference in response quality between, for example, Q4 and Q8 models? Does lower-bit quantization mean lower quality or a higher chance of errors?

  • @mhillary04
    @mhillary04 Місяць тому

    It's interesting to see an uptick in the "Chain-of-thought" responses coming out of the latest models. Possibly some new fine tuning/agent implementations behind the scenes?

  • @mshonle
    @mshonle Місяць тому

    It’s possible you are getting different samples when you prompt twice in the same session/context due to a “repetition penalty” that affects token selection. The kinds of optimizations that Groq performs (as you mentioned in reference to your interview video) could also make the repetition-penalty heuristic more advanced/nuanced. Cheers!

  • @andyjm2k
    @andyjm2k Місяць тому

    Did you modify the temperature setting? It defaults to 1 which can increase your variance

  • @hugouasd1349
    @hugouasd1349 Місяць тому +1

    Giving the LLM the question twice works, I suspect, because it doesn't want to repeat itself. If you had access to things like the temperature and other params you could likely get a better idea of why, but that would be my guess.

  • @djglxxii
    @djglxxii Місяць тому +1

    For the microwave marble problem, would it be helpful if you were explicit in stating that the cup has no lid? Is it possible it doesn't quite understand that the cup is open?

  • @MeinDeutschkurs
    @MeinDeutschkurs Місяць тому +51

    I can't help myself, but I think there are 4 killers in the room: 3 alive and one dead.

    • @sbacon92
      @sbacon92 Місяць тому +8

      "There are 3 red painters in a room. A 4th red painter enters the room and paints one of the painters green."
      How many painters are in the room?
      vs
      How many red painters are in the room?
      vs
      How many green painters are in the room?
      From this perspective you can see there is another property of the killers being checked (whether they are living) that wasn't asked about, and it doesn't specify whether a killer stops being a killer upon death.

    • @LipoSurgeryCentres
      @LipoSurgeryCentres Місяць тому +2

      Perhaps the AI understands human mortality? Ominous perception.

    • @matthew_berman
      @matthew_berman  Місяць тому +9

      That’s a valid answer also

    • @henrik.norberg
      @henrik.norberg Місяць тому +3

      For me it is "obvious" that there are only 3 killers. Why? Otherwise we would still be counting ALL killers that ever lived. Otherwise, when does someone stop counting as a killer? When they have been dead for a week? A year? A hundred years? A million years? Never?

    • @alkeryn1700
      @alkeryn1700 Місяць тому +7

      @@henrik.norberg Killers are killers forever, whether dead or alive.
      You are not going to say some genocidal historical figure is not a killer because he's dead.
      You may use "was" because the person no longer is, but the killer part is unchanged.

  • @EnricoRos
    @EnricoRos Місяць тому

    Is llama3-70B on Groq running quantized (8-bit?) or FP16? It would help to understand whether this is the baseline or less.

  • @airedav
    @airedav Місяць тому

    Thank you, Matthew. Please show us the video of Llama 3 on Groq

  • @d.d.z.
    @d.d.z. Місяць тому

    Absolutely, I'd like to see the AutoGen and CrewAI video ❤

  • @jp137189
    @jp137189 Місяць тому

    @matthew_berman A quantized version of Llama 3 is available in LM Studio. I'm hoping you get a chance to play with it soon. There was an interesting nuance to your marble question on the 8B Q8 model: "The cup is inverted, meaning the opening of the cup is facing upwards, allowing the marble to remain inside the cup." I wonder how many models assume 'upside down' means the cup's opening is up, but just don't say it explicitly?

  • @HaraldEngels
    @HaraldEngels Місяць тому

    Yes I would like to see the video you proposed 🙂

  • @StefanEnslin
    @StefanEnslin Місяць тому

    Yes, Would love to see you doing this, still getting used to the CrewAI system

  • @AnthonyGarland
    @AnthonyGarland Місяць тому

    wow! amazing

  • @christiandarkin
    @christiandarkin Місяць тому

    I think when you prompt a second time it's reading the whole chat again, and treating it as context. So, when the context contains an error, there's a conflict which alerts it to respond differently

  • @arka7869
    @arka7869 Місяць тому

    Here is another criterion for reviewing models: reliability, or consistency. Does the answer change if the prompt is repeated? I mean, if I don't know the answer and have to rely on the model (like with the math problems), how can I be sure the answer is correct? We need STABLE answers! Thank you for your testing!

  • @csharpner
    @csharpner Місяць тому

    I've been meaning to comment regarding these multiple different answers:
    You need to run the same question 3 times to give a more accurate judgement. But clear it every time and make sure you don't have the same seed number.
    What's going on: The inference injects random numbers to prevent it from repeating the same answer every time.
    Regarding not clearing and asking the same question twice: it uses the entire conversation to create the new answer, so it's not really asking the same question; it's ADDING the question to a conversation, and the whole conversation is used to trigger a new inference.
    Just remember, there's a lot of randomness too.

  • @Maltesse1015
    @Maltesse1015 Місяць тому

    Looking forward to the agent video with Llama 3 🎉!

  • @KiraIsGod
    @KiraIsGod Місяць тому

    If you ask the same somewhat-hard question twice, I think the LLM assumes the first answer was incorrect, so it tries to fix it, leading to an incorrect answer the second time.

  • @Mr_Tangerine609
    @Mr_Tangerine609 Місяць тому

    Yes, please Matt, I would like to see you put Llama 3 into an agent framework. Thank you.

  • @dewijones92
    @dewijones92 Місяць тому

    Do you know what quantization Groq is using? I'd love it if you tested the unquantized version :D

  • @JanBadertscher
    @JanBadertscher Місяць тому

    Thanks Matthew for the eval. Some thoughts, ideas and comments:
    1. For an objective eval I always clear the history.
    2. If I didn't set temp to 0, I run every question multiple times to get stochastically more comparable results, and especially to measure the distribution and get a confidence score for my results.
    3. Trying exactly the same prompt multiple times over an API like Groq? I doubt they use LLM caching or that temp is set to 0. Better to check twice whether they cache things.

  • @frederick6720
    @frederick6720 Місяць тому +4

    That's so interesting. Even Llama 3 8B gets the "Apple" question right when prompting it twice.

    • @TheReferrer72
      @TheReferrer72 Місяць тому

      Yes and on the first prompt it only got the 6th sentence wrong!
      6. The kids ran through the orchard to pick some Apples.

    • @mlsterlous
      @mlsterlous Місяць тому

      Not only that question. It's crazy smart overall.

  • @TheFlintStryker
    @TheFlintStryker Місяць тому +2

    Let’s build the agents!!

  • @user-zh3zb7fw2j
    @user-zh3zb7fw2j Місяць тому

    In the case where the model alternates between wrong and correct answers: if we give the model an additional prompt like "Please think carefully about your answer to the question," I think it would be interesting to see what happens to the answer, Mr. Berman.

  • @UVTimeTraveller
    @UVTimeTraveller Місяць тому

    Yes! Please make the video. Thank you

  • @jimbo2112
    @jimbo2112 Місяць тому

    Could the multi-inference output options serve you a random version of any one of its answers? That does not, however, explain why its explanation of the physics of the marble is inconsistent. Very bizarre...

  • @davtech
    @davtech Місяць тому

    Would love to see a video on how to setup agents.

  • @MrEnriqueag
    @MrEnriqueag Місяць тому

    I believe that by default the temperature is 0, which means that with the same input you are always going to get the same output. If you ask the question twice, though, the input is different because it contains the original question; that's why the response is different.
    If you increase the temperature a bit, the output should be different every time, and then you can use that to generate multiple answers via the API, then ask another time to reflect on them, and then provide the best answer.
    If you want, I can create a quick script to test that out.
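    Something like that quick script could look like this (my sketch, not the commenter's actual script; same assumed `groq` SDK and model id as in the earlier snippets): sample a few answers at a higher temperature, then ask the model once at temperature 0 to reflect and pick the best one.
    ```python
    import os
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    MODEL = "llama3-70b-8192"   # assumed model id

    def chat(prompt: str, temperature: float) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    question = ("A marble is put in a cup, the cup is placed upside down on a table, "
                "then the cup is moved into the microwave. Where is the marble?")
    candidates = [chat(question, temperature=1.0) for _ in range(3)]   # varied samples

    reflection = (
        f"Question: {question}\n\n"
        + "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates))
        + "\n\nReview the candidates, point out any physical or logical mistakes, "
          "and give a single best final answer."
    )
    print(chat(reflection, temperature=0))   # deterministic reflection pass
    ```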

  • @zinexe
    @zinexe Місяць тому

    Perhaps the temperature settings are different in the online/Groq version.
    For math it's probably best to have a very low temp, maybe even 0.

  • @tvwithtiffani
    @tvwithtiffani Місяць тому

    The reason you get the correct answer after asking a second and third time is the same reason chain-of-thought (chain-of-whatever) works. The subsequent inference requests take the first output and use it to reason, finding the mistake and correcting it. This is why the agent paradigm is so promising: better than zero-shot reasoning.

    • @tvwithtiffani
      @tvwithtiffani Місяць тому

      I think you are aware of this, though, because you mentioned getting a consensus of outputs. This is the same thing done in a different manner.

  • @zandor117
    @zandor117 Місяць тому

    I'm looking forward to the 8B being put to the test. It's absolutely insane how performant the 8B is for its size.

  • @8eck
    @8eck Місяць тому

    Check temperature setting. Temperature is adding randomness into the output.

  • @WINDSORONFIRE
    @WINDSORONFIRE Місяць тому

    I ran this on Ollama (70B) and I get the same behavior. In my case, and not just for this problem but for other logic problems too, it would give me the wrong answer. Then I tell it to check the answer and it always gets it right the second time. This is definitely a model that would benefit from self-reflection before answering.

  • @codescholar7345
    @codescholar7345 Місяць тому

    How can we improve inference speed locally?

  • @brunodangelo1146
    @brunodangelo1146 Місяць тому

    Would love to see the AutoGen test. I'm having a go at it myself at the moment; it would be super helpful.

  • @AntonioSorrentini
    @AntonioSorrentini Місяць тому

    I asked Llama 3 70B in LM Studio on my machine if it is multimodal and it said yes. How can I use it in a multimodal way on my local machine, either with LM Studio or another way?

  • @tvwithtiffani
    @tvwithtiffani Місяць тому +1

    Out of curiosity, does anyone know how much heat Groq hardware outputs?

  • @KEKW-lc4xi
    @KEKW-lc4xi Місяць тому +1

    Can it make card games or is that still too advanced? I think card games would be the next step up, as it would require a sense of UI, drawing the ASCII representations of the cards, etc.

  • @MagnusMcManaman
    @MagnusMcManaman Місяць тому

    I think the problem with the cup is that LLaMA "thinks" that every time you write "placed upside down on a table" you are actually turning the cup upside down, which is the opposite of what it was before.
    So, as it were, every other time you put the cup "normally" and every other time upside down.
    LLaMA takes into account the context, so if you delete the previous text, the position of the cup "resets".

  • @micknamens8659
    @micknamens8659 Місяць тому +1

    5:20 The given function f(x)=2×3+3×2+cx+8 is equivalent to f(x)=8+9+cx+8=cx+25. Hence it is linear and can cross the x-axis only once.
    Certainly you mean instead: f(x)=2x^3+3x^2+cx+8. This is a cubic function and hence can cross the x-axis 3 times.
    When you solve f(-4)=0, you get c=-18.
    But when you solve f(12)=0, you get c=-324-8/12. So obviously 12 can't be a root of the function.
    The other roots are 2 and 1/2.
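    That algebra is easy to double-check with sympy (a quick verification sketch I'm adding, assuming the intended function really is f(x) = 2x^3 + 3x^2 + cx + 8):
    ```python
    from sympy import symbols, solve, factor

    x, c = symbols("x c")
    f = 2*x**3 + 3*x**2 + c*x + 8

    c_val = solve(f.subs(x, -4), c)[0]     # f(-4) = 0  ->  c = -18
    print(c_val)                           # -18
    print(solve(f.subs(x, 12), c))         # c needed for a root at x = 12 (not -18)
    print(factor(f.subs(c, c_val)))        # (x + 4)*(x - 2)*(2*x - 1): roots -4, 2, 1/2
    ```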

  • @asgorath5
    @asgorath5 Місяць тому

    Marble: I assume that it doesn't clear your context and that the LLM assumes the cup's orientation changes each time. That means on every "even" occasion the orientation of the cup has the opening downwards and hence moving the cup leaves the marble on the table. On every "odd" occasion, the cup has its opening face upwards and hence the marble is held in the cup when the cup is removed. I therefore assume the LLM is interpreting the term "upside down" as a continual oscillation of the orientation of the opening of the cup.

  • @Luxiel610
    @Luxiel610 Місяць тому

    It's so insane that it actually wrote Flappy Bird with a GUI. It had errors in the first and second outputs, and the third was flawless. Daang

  • @kabauny
    @kabauny Місяць тому

    Can someone explain to me why Groq's responses are different from Meta's responses if they are using the same model weights?

  • @celyasi
    @celyasi Місяць тому +5

    The answer for 2a-1=4y is correct: y=(2a-1)/4. The explanation is perfect and the answer is correct!

  • @brianvh
    @brianvh Місяць тому

    Definitely would like to see this running on AutoGPT or chain/tree of thoughts, etc.

  • @loryo80
    @loryo80 Місяць тому

    Can we use Llama vision via Groq?

  • @mazensmz
    @mazensmz Місяць тому

    Hi Nooby,
    you need to consider the following:
    1. Any statements or words added to the context will affect the response, so make sure the context contains only directly relevant material.
    2. When you ask "How many words are in the response?", the system prompt also affects the number given to you; you can ask the LLM to count and list the response words and you will be surprised.
    Thx!

  • @HarrisonBorbarrison
    @HarrisonBorbarrison Місяць тому

    1:53 Haha Comic Sans! That was funny.

  • @ollantaytambur2165
    @ollantaytambur2165 Місяць тому +2

    4:56 Why is y = (2a-1)/4 not a correct answer?

  • @thelegend7406
    @thelegend7406 Місяць тому +1

    Some ready-made coffee cups have lids, so Llama gambles between the two responses.

    • @tlskillman
      @tlskillman Місяць тому

      I think this is a poorly constructed question, as you point out.

  • @NoahtheGameplayer
    @NoahtheGameplayer Місяць тому

    I'm not sure if it's only me, but when I try to log in with a Facebook account, it sends me back to the original page. I click "Try Meta AI" and it keeps sending me back to the original page.
    Any help with that? I do want to save my history with the chatbot.

  • @roelljr
    @roelljr Місяць тому

    A new logic/reasoning question for your tests that is very hard for LLMs (a brute-force check is sketched after the rules):
    Solve this puzzle:
    Puzzle: There are three piles of matches on a table - Pile A with 7 matches, Pile B with 11 matches, and Pile C with 6 matches. The goal is to rearrange the matches so that each pile contains exactly 8 matches.
    Rules:
    1. You can only add to a pile the exact number of matches it already contains.
    2. All added matches must come from one other single pile.
    3. You have only three moves to achieve the goal.
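    For reference, the puzzle can be brute-forced in a few lines; this sketch (my addition, not the commenter's) encodes a move as doubling the destination pile using matches taken from one other pile:
    ```python
    from itertools import permutations

    def solve(piles=(7, 11, 6), goal=(8, 8, 8), max_moves=3):
        def dfs(state, moves):
            if state == goal:
                return moves
            if len(moves) == max_moves:
                return None
            for src, dst in permutations(range(len(state)), 2):
                take = state[dst]              # rule 1: add exactly what dst already holds
                if 0 < take <= state[src]:     # rule 2: all of it from one other pile
                    nxt = list(state)
                    nxt[src] -= take
                    nxt[dst] += take
                    found = dfs(tuple(nxt), moves + [(src, dst, take)])
                    if found:
                        return found
            return None
        return dfs(tuple(piles), [])

    print(solve())   # [(1, 0, 7), (0, 2, 6), (2, 1, 4)]: B->A 7, A->C 6, C->B 4 gives (8, 8, 8)
    ```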

  • @abdelouahabtoutouh9304
    @abdelouahabtoutouh9304 Місяць тому

    You should remove the system prompt from the parameters on Groq, as it pollutes the input and thus affects the output.

  • @wiltedblackrose
    @wiltedblackrose Місяць тому +9

    Also, your other test with the function is incorrect (or unclear) as well. As a simple proof, check that if c = -18, then the function f doesn't have a root at x = 12:
    f(12) = 2 · 12^3 + 3 · 12^2 - 18 · 12 + 8 = 3680.
    Explanation:
    f(-4) = 0 => 2 · (-4)^3 + 3 · (-4)^2 + c · (-4) + 8 = 0 => -72 - 4c = 0, which in and of itself would imply that c = -18.
    f(12) = 0 => 2 · 12^3 + 3 · 12^2 + c · 12 + 8 = 0 => 3896 + 12c = 0, which on the other hand implies that c = -974/3 ≈ -324.7.
    Therefore there is a contradiction. This would actually be an interesting test for an LLM, as not even GPT-4 sees it immediately, but the way you present it, it's nonsense.

    • @Sam_Saraguy
      @Sam_Saraguy Місяць тому +1

      garbage in, garbage out?

    • @wiltedblackrose
      @wiltedblackrose Місяць тому

      @@Sam_Saraguy That refers to training, not inference.

  • @axees
    @axees Місяць тому

    I've tried creating Snake zero-shot too. Got pretty much the same result :) Maybe you should try testing it by asking it to create Tetris :)

  • @christiansroy
    @christiansroy Місяць тому

    @matthew_berman Remember that asking the same question to the same model will give you different answers, because there is randomness to it unless you specify a temperature of zero, which I don't think you are doing here. Also, assuming the inference speed depends on the question you ask is a bit far-fetched. You have to account for the fact that the load on the server will also impact the inference speed. If you ask the same question at different times of the day you will get different inference speeds. Good science is not about drawing quick conclusions from sparse results.

  • @dkozlov80
    @dkozlov80 Місяць тому

    Thanks. Let's try local agents on Llama 3? Also please consider self-corrective agents, maybe based on LangChain graphs. On Llama 3 they should be great.

  • @chriszuidema
    @chriszuidema Місяць тому

    Snake is getting better every week!

  • @Scarage21
    @Scarage21 Місяць тому

    The marble thing is probably just the result of reflection. Models often get stuff wrong because an earlier more-or-less-random token pushes them down the wrong path. Models cannot self-correct during inference, but they can on a second iteration. So it probably spotted the incorrect reasoning of the first iteration and never generated the early tokens that pushed it down the wrong path again.

  • @SanctuaryLife
    @SanctuaryLife Місяць тому +1

    With the marble-in-the-cup dilemma, could it be that the temperature setting on the model is a little too high, leading it to be creative?

    • @roelljr
      @roelljr Місяць тому

      That's exactly what it is. Randomness is normal. Unless the temperature is set to zero (which is almost never the case), you'll be getting stochastic outputs from an LLM. This is actually a feature, not a bug. By asking the same question 3 times, 5 times, 7 times, etc., and then reflecting on the answers, you'll get much better results than asking just once.

  • @Evolution__X
    @Evolution__X Місяць тому

    Are there any similar websites like Groq that host LLMs???

  • @wiltedblackrose
    @wiltedblackrose Місяць тому +51

    My man, in what world is y = 2a - 2 the same expression as 4y = 2a - 1? That's not only a super easy question, but the answer you got is painfully obviously wrong!! Moreover, I suspect you might be missing part of the question, because the additional information you provide about a and y is completely irrelevant.

    • @matthew_berman
      @matthew_berman  Місяць тому +2

      I used the answer on the SAT webpage

    • @wiltedblackrose
      @wiltedblackrose Місяць тому +8

      @@matthew_berman Well, you too can see it's wrong. Also, the other SAT question is wrong too. Look at my other comment

    • @dougdouglass6126
      @dougdouglass6126 Місяць тому +12

      @@matthew_berman This is alarmingly simple math. If you're using the answer from an SAT page then there are two possibilities: you copied the question incorrectly, or the SAT page is wrong. It's most likely that you copied the question wrong, because the way the second part of the question is worded does not make any sense.

    • @elwyn14
      @elwyn14 Місяць тому +7

      @@dougdouglass6126 Sounds like it's worth double-checking, but saying things like "this is alarmingly simple math" is a bit disrespectful and assumes Matt has any interest in checking this stuff. No offense, but math only becomes interesting when you've got an actual problem to solve; if the answer is already there on the SAT webpage, as he said, he's being a totally normal person by not even looking at it.

    • @wiltedblackrose
      @wiltedblackrose Місяць тому +7

      @@elwyn14 That's nonsense. "Alarming" is very fitting, because this problem is so easy it can be checked for correctness at a glance, which is what we all do when we evaluate the model's response. And this is A TEST, meaning the correctness of what we expect as an answer is the only thing that makes it valuable.

  • @CNCAddict
    @CNCAddict Місяць тому

    I may be mistaken, but on the marble question the previous answer is now part of the context... my guess is that the model reads this answer, sees that it's mistaken, and corrects it.

  • @seupedro9924
    @seupedro9924 Місяць тому

    Please make a video using agents in a graphical interface. It would be really interesting