Reflection 70b Controversy is PROOF our Perspective on LLMs is wrong.

  • Published Jan 29, 2025

COMMENTS • 205

  • @MattVidPro
    @MattVidPro  4 months ago +10

    I know this video was a bit ranty, but I thought there was a pretty clear conclusion to draw: WE NEED TO RETHINK HOW WE VIEW LLMS | Also huge thanks to Brilliant for sponsoring! Check em out here: brilliant.org/MattVidProAI/

    • @NicVandEmZ
      @NicVandEmZ 4 months ago

      You should copy the system prompt into a custom GPT you create, and try the system prompt with Sonnet too

    • @LouisGedo
      @LouisGedo 4 months ago

      👋 hi

    • @amj2048
      @amj2048 4 months ago

      LLMs are just a different way of accessing a database. Every database requires a query language to get anything useful out of it. Modern so-called AI is just a fancy new query language, but at the end of the day it's a query language accessing a database; there is no thought going on.
      This is something that I think a lot of non-programming people have missed: they seem to think there is actual intelligence or thought going on. There is no intelligence or thought, it really is as simple as a query language accessing a database.
      The better the query the better the result, but the data in the database has to be of good quality too.

  • @the80sme
    @the80sme 4 months ago +8

    Never apologize for the sponsors! You provide us with so much value and always have interesting sponsors that are relevant and feel like a natural part of the video. Honestly, yours are some of the only YouTube ads I don't skip. Thanks for all your hard work!

    • @michelprins
      @michelprins 4 months ago

      Well, at least show the commercials at the end, so you show more respect for your viewers than for the admen.

    • @ShawnFumo
      @ShawnFumo 4 months ago

      I always like seeing Brilliant since I already had a subscription to them and liked them before even seeing them advertise on videos.

  • @jeffwads
    @jeffwads 4 months ago +26

    There are reports on X and Reddit about this whole thing being linked to Claude. Bizarre.

    • @MattVidPro
      @MattVidPro  4 months ago +8

      I get into that in the video. It's also bothering me

  • @johanavril1691
    @johanavril1691 4 months ago +61

    STOP USING THE EXAMPLE OF COUNTING LETTERS, TOKENISATION MAKES IT A TERRIBLE TEST

    • @jackpisso1761
      @jackpisso1761 4 months ago +5

      Exactly this!

    • @eastwood451
      @eastwood451 4 months ago +6

      You're shouting.

    • @johanavril1691
      @johanavril1691 4 months ago

      @@eastwood451 SORRY MY CAPSLOCK KEY IS BROKEN!

    • @SW-fh7he
      @SW-fh7he 4 months ago +10

      It's a great test, because passing it would require a genuinely new technique.

    • @grostire
      @grostire 4 months ago +1

      You are narrow-minded.
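An aside on the letter-counting complaint at the top of this thread: the model never sees individual characters, only subword token IDs, so "count the r's" largely probes the tokenizer rather than reasoning. A toy sketch (the vocabulary and greedy split here are invented for illustration; real BPE tokenizers differ):

```python
# Toy illustration: an LLM sees token IDs, not characters.
# Hypothetical subword vocabulary; real tokenizers (BPE) are learned.
vocab = {"straw": 101, "berry": 102, "r": 103}

def toy_tokenize(word):
    """Greedy longest-match split against the toy vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return tokens

pieces = toy_tokenize("strawberry")
print(pieces)                      # ['straw', 'berry']
print([vocab[p] for p in pieces])  # [101, 102] -- what the model "sees"
print("strawberry".count("r"))     # 3 -- character info the IDs hide
```

The two IDs carry no direct record of how many "r" characters they contain, which is why the test says more about tokenization than about intelligence.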

  • @DisturbedNeo
    @DisturbedNeo 4 months ago +33

    The trouble with this finetune is that the output appears to be only marginally better than the base model, but all the extra tokens make it cost 2-3x as much, so it's just not worth it.

    • @MattVidPro
      @MattVidPro  4 months ago +12

      But if the power of the finetune increases with model size, in theory it would be.

    • @Jack122official
      @Jack122official 4 months ago +1

      @@MattVidPro what do you think about AI song dubbing? Would you like it to happen?

    • @JosephSimony
      @JosephSimony 4 months ago +2

      Not sure what the definition of "marginally better" and thus "not worth it" is. A real-life scenario: I had no experience setting up software RAIDs in CentOS. If Claude had been "marginally better" it would have worked, but it/I screwed up my server and I spent days instead. With ChatGPT (much smaller context window, same prompting) the RAID worked right away after reboot. Go figure "marginally" and "not worth it".

    • @radradder
      @radradder 4 months ago +1

      @@MattVidPro don't refute current reality with a theoretical "what if this wasn't reality"

    • @jonathanberry1111
      @jonathanberry1111 4 months ago

      @@MattVidPro Also, if a model can improve its accuracy, then this helps make better synthetic data and helps reach toward high-quality results, where LLMs can get better from essentially understanding, thinking, and drawing conclusions from their own output in a potentially constructive loop. It's not about being slightly better for some end use; it's about making AI able to not just regurgitate what people know and say, but potentially reach at least low-level ASI (almost as good as the smartest humans).

  • @Dron008
    @Dron008 4 months ago +16

    The community should stop believing anyone's closed benchmarks. It's very weird to me when people discuss benchmark results from publications that nobody has tried to verify.

    • @brulsmurf
      @brulsmurf 4 months ago

      They train the model on the test set (with extra steps). If the benchmark's questions are public, then it's useless

    • @adolphgracius9996
      @adolphgracius9996 4 months ago

      @@brulsmurf Rather than benchmarks, people should do their own tests by just using the AI and calling out the mistakes

  • @ElvinHoney707
    @ElvinHoney707 4 months ago +11

    Hey, please take the system prompt he gave you and use it in an unadulterated Llama 3.1 70B with the same prompt and see how that response compares to what you showed in the video. That should show us the fine tuning effect, if any.

  • @manonamission2000
    @manonamission2000 4 months ago +3

    it's easier to prevent a text2image model from spitting out nsfw images by adding a filtering layer than to re-engineer the model itself

  • @IntellectCorner
    @IntellectCorner 4 months ago +1

    *Timestamps by IntellectCorner*
    0:02 - Introduction: Reflection 70b Controversy
    2:11 - Background on Matt Shumer
    4:03 - Community Reactions and Unanswered Questions
    5:35 - Sponsor Message
    7:31 - Testing Reflection 70b on Hyperbolic Labs
    11:02 - Comparing Reflection 70b with GPT-4 and ChatGPT
    13:20 - The Importance of Prompting
    16:48 - Analysis of the Situation and Possible Explanations
    21:01 - Conclusion: The Need for New Benchmarks and Perspectives on LLMs

  • @konstantinlozev2272
    @konstantinlozev2272 4 months ago +1

    This reflection prompting was already there with the "step-by-step" prompting.
    But nothing beats agentic frameworks. Because then you can design it to loop back as many times as necessary to refine its answer.
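The loop-back idea in the comment above can be sketched as a generate-critique-revise loop. Here `generate` and `critique` are hypothetical stand-ins for real model calls; a real agentic framework would route both through an LLM:

```python
# Sketch of an agentic refinement loop: draft, critique, revise, repeat
# until the critic is satisfied or a round limit is hit.

def generate(prompt: str) -> str:
    # Placeholder for an LLM call; here it just echoes the prompt.
    return f"draft answer for: {prompt}"

def critique(answer: str) -> list[str]:
    # Placeholder critic; a real agent would ask the model to list flaws.
    return [] if "revised" in answer else ["too shallow"]

def refine(prompt: str, max_rounds: int = 3) -> str:
    answer = generate(prompt)
    for _ in range(max_rounds):
        issues = critique(answer)
        if not issues:
            break  # critic found nothing to fix; stop looping
        answer = generate(f"{prompt} (revised to address: {issues})")
    return answer

print(refine("count the r's in strawberry"))
```

The point of the comment stands: the number of refinement passes is a runtime decision in a framework like this, rather than something baked into one reflection-formatted response.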

  • @brexitgreens
    @brexitgreens 4 months ago +11

    10:29 *"Somehow it got the correct answer by doing the wrong math."* Just like my parents who turned out to be right from entirely wrong premises. Which is why I had ignored their advice - to my own detriment.

    • @DiceDecides
      @DiceDecides 4 months ago

      What wrong premises? Parents usually want the best for their kids

    • @Phagocytosis
      @Phagocytosis 4 months ago

      ​@@DiceDecides That seems like somewhat of a strange reaction if I'm honest. Even ignoring the "usually" part of it, wanting the best for someone is kind of separate from whether you are able to judge a situation correctly.

    • @DiceDecides
      @DiceDecides 4 months ago

      @@Phagocytosis No one's a perfect judge, sure, but parents have more life experience to make better judgements than their kids. Elders especially have a lot of wise things to say.

    • @Phagocytosis
      @Phagocytosis 4 months ago

      @@DiceDecides It just feels like a very general statement, and unless your claim is that anyone old enough to have kids necessarily has enough wisdom and life experience not to be expected to start from any false premises (which I would personally consider to be a false premise), it seems odd to me to question an individual claim of parents having started from a false premise.

    • @DiceDecides
      @DiceDecides 4 months ago

      @@Phagocytosis I never claimed such a thing, I was just curious what the premises could have been. Chill out.

  • @dorotikdaniel
    @dorotikdaniel 4 months ago +1

    Yes, system prompting allows you to essentially reprogram LLMs and shape them into anything you can imagine, while also improving their performance. At least for the OpenAI models, I can confirm from experience that this works incredibly well.

  • @SkitterB.Unibrow
    @SkitterB.Unibrow 4 months ago +25

    This is why 'Open Source' is the only way. Example: people at OpenAI could present 'bad' results to 'higher ups', who would then release results to the public thinking they're great..... then not release the model, because when they really checked it out it did not perform as expected (read into that as you will). Open source, however, is examined with a fine-tooth comb, and can't pull the wool over anyone's eyes.

    • @MattVidPro
      @MattVidPro  4 months ago +4

      Love it

    • @SkitterB.Unibrow
      @SkitterB.Unibrow 4 months ago

      @@MattVidPro "you da man' according to 4 out of 5 ai's that are not censored to ask this question "whos do man?"

    • @SkitterB.Unibrow
      @SkitterB.Unibrow 4 months ago

      Duuuuuuhhh.... I ment 'da'

    • @SahilP2648
      @SahilP2648 4 months ago +2

      @@MattVidPro Reflection 70b is on Hugging Face and I tried it locally; it works, so I don't know what you were talking about with Claude being involved etc. And it did get the strawberry question correct at least. It also seemed to follow custom system prompts better than other models.

    • @hiromichael_ctranddevgames1097
      @hiromichael_ctranddevgames1097 4 months ago

      ​@@SahilP2648 IT'S claude the prompt ok

  • @teejayroyal
    @teejayroyal 4 months ago

    Please run the cords behind your couch, I feel like I'm going to have an anxiety attack😂😂😭

  • @yhwhlungs
    @yhwhlungs 4 months ago +1

    Yeah prompt engineering is the way to go. We just need a model that’s really good at predicting reasonable tokens afterwards.

  • @TheFeedRocket
    @TheFeedRocket 4 months ago +4

    Different prompts make a huge difference. You could look at prompting or fine-tuning like a coach or teacher: you have the same person, but a certain coach's "prompting" can make a poor student or athlete way better. It's all in the coaching or teaching, which is like prompting. Certain teachers or coaches are just way better prompt engineers. Prompting is huge.

  • @Alice_Fumo
    @Alice_Fumo 4 months ago +8

    My best attempt at a rational explanation for the Claude 3.5 API calls is that they have a fallback which calls up Claude when their own backend is down, to avoid downtime.
    I'm not sure I put a lot of stock in this explanation, but it's one that is not fully unreasonable.

    • @MattVidPro
      @MattVidPro  4 months ago +2

      Yeah.

    • @kuromiLayfe
      @kuromiLayfe 4 months ago +1

      Yeah.. my take on it is that if it cannot perform locally, there is some sort of scammy backend at work that takes your data for who knows what, which in the end they will charge you for.

    • @nilaier1430
      @nilaier1430 4 months ago +2

      Yeah, this might be possible. But it's still disingenuous to not inform users about that.
      Or maybe they've been using Claude 3.5 Sonnet with the custom system prompt to generate all of the training data and feed it to AI for fine-tuning and they just forgot to change the endpoint to serve their model instead.

    • @tommylir1170
      @tommylir1170 4 months ago +4

      They even tried to censor the fact that it was using Claude. I don't get why some still give this guy the benefit of the doubt

    • @Alice_Fumo
      @Alice_Fumo 4 months ago +2

      @@tommylir1170 Am I giving him the benefit of the doubt?
      I constructed a steelman and decided that even this most favourable interpretation does not seem super likely.
      However, I don't think it's necessary to draw conclusions just yet. Either we get weights for a model which reaches the claimed benchmark scores or we don't. I'm not sure whether the weights available at the moment do or whether there was still something supposedly wrong with them as well, but if the model meets the claimed performance, it's all good and if he doesn't deliver screw the guy.

  • @ShivaTD420
    @ShivaTD420 4 months ago +4

    He just used Claude to train the model. The model is being fine-tuned with synthetic data that follows this structure, while Claude fixes its mistakes

    • @canyongoat2096
      @canyongoat2096 4 months ago +2

      Not completely out of the question, as I remember older Llama and Mistral 7B models claiming to be GPT and claiming to be made by OpenAI

    • @toastbr0ti
      @toastbr0ti 4 months ago +1

      The API literally uses Claude tokens, not Llama ones

    • @apache937
      @apache937 4 months ago

      it returns the same exact response at temp 0

    • @Phagocytosis
      @Phagocytosis 4 months ago

      Yeah, but didn't he claim it was a finetune of Llama 3.1? EDIT: Oh, I see, you mean the actual finetuning data came from Claude, never mind.
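On the temp-0 point a few replies up: at temperature 0, decoding collapses to greedy argmax and is deterministic, which is why byte-identical responses from two supposedly different backends raised suspicion. A minimal sketch of temperature scaling over toy logits (not a real model):

```python
import math
import random

def sample(logits, temperature):
    """Sample an index from softmax(logits / temperature); greedy at T=0."""
    if temperature == 0:
        # Greedy decoding: always the highest-logit token, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    r = random.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r <= 0:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
# Temperature 0 picks the same token every time -- fully deterministic.
print([sample(logits, 0) for _ in range(5)])  # [0, 0, 0, 0, 0]
```

Higher temperatures flatten the distribution and reintroduce randomness, so matching outputs across runs only becomes damning evidence at (or near) temperature 0.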

  • @draglamdraglam2419
    @draglamdraglam2419 4 months ago

    Ayy, glad to be early for this one, keep doing what you do 💪

  • @cagnazzo82
    @cagnazzo82 4 months ago +3

    The era of benchmarks ended as soon as GPT-4o became multimodal and Sonnet released with artifacts. We just weren't ready to accept it.
    The only thing I'm interested in now are features. Sonnet can code, GPT-4o was updated so it's now amazing at creative writing. I don't really need much else.

  • @kajsing
    @kajsing 4 months ago

    You don't need the API for the system prompt. I put this into my custom instructions, and it works well:
    "Start by evaluating the user's input and relevant parts from earlier inputs and outputs. Ensure that you consider multiple perspectives, including any underlying assumptions or potential biases. This reflection should aim to highlight key insights and possible challenges in forming your answer. Plan how to address these insights and create a strategy for delivering a clear and relevant response.
    When done thinking, reflect on your thought process and consider if there are any overlooked angles, biases, or alternative solutions. Ask yourself if the response is the most effective way to meet the user's needs and expectations.
    Then, finalize your answer."
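For context, wiring a prompt like the one above in via an API takes no special features: it is just the first entry in an OpenAI-style chat `messages` list. A minimal sketch (the prompt text is abridged from the comment above; `build_messages` is a hypothetical helper, not part of any SDK):

```python
# Sketch: a "reflection" system prompt is simply prepended as a system
# message; no fine-tuning or special API support is required.
REFLECTION_SYSTEM_PROMPT = (
    "First reflect on the user's input and relevant earlier context, "
    "considering multiple perspectives, assumptions, and biases. "
    "Then review your reasoning for overlooked angles before "
    "finalizing your answer."
)

def build_messages(user_input, history=None):
    """Assemble a chat-completion message list with the reflection prompt."""
    return [{"role": "system", "content": REFLECTION_SYSTEM_PROMPT},
            *(history or []),
            {"role": "user", "content": user_input}]

msgs = build_messages("How many r's are in strawberry?")
print([m["role"] for m in msgs])  # ['system', 'user']
```

The resulting list would be passed as the `messages` argument of a chat-completion call; the same structure works for custom instructions, which are injected the same way.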

  • @BackTiVi
    @BackTiVi 4 months ago +5

    Can you really compare Reflection 70b to "reflectionless" LLMs if, according to Shumer, you need a system prompt that explicitly tells Reflection 70b how to reflect in order to get good scores in the benchmarks? Doesn't that defeat the purpose?

    • @MattVidPro
      @MattVidPro  4 months ago +4

      Apparently, the system prompt DOESN'T need to be there; it can be adjusted in tuning to not require it. twitter.com/mattshumer_/status/1832169489309561309

    • @BackTiVi
      @BackTiVi 4 months ago +1

      @@MattVidPro Fair. I hope the situation will stabilize soon and we'll get the promised SOTA open-source model, although I also think there was something fishy with the API.

  • @Dina_tankar_mina_ord
    @Dina_tankar_mina_ord 4 months ago

    So, this reflection mechanism is like providing a control net to the prompt, ensuring that every answer aligns with the main meaning.

    • @ShivaTD420
      @ShivaTD420 4 months ago

      These are just tricks to cause more neurons to light up.
      The fine tuning process makes prompting easier, since you don't need the complex system prompts

  • @vi6ddarkking
    @vi6ddarkking 4 months ago +3

    So, to use an image-generation equivalent, if I am understanding this correctly, Reflection 70b would be the equivalent of having Flux merged with a LoRA.

  • @MistaRopa-
    @MistaRopa- 4 months ago +5

    "WE NEED TO RETHINK HOW WE VIEW LLMs"...or content creators and self appointed community leaders need better due diligence before crowning every ne'er-do-well the next Steve Jobs. Credibility is a thing...

  • @Slaci-vl2io
    @Slaci-vl2io 4 months ago

    I wonder how much cooling water was wasted by us testing their wrong model.

  • @ToddWBucy-lf8yz
    @ToddWBucy-lf8yz 4 months ago +4

    For smaller models this sort of fine-tuning may be able to better compensate for the lack of parameters and for quantization. If it can do that, I say it's a win.

  • @RainbowSixIntel
    @RainbowSixIntel 4 months ago +1

    It's probably Claude 3.5 Sonnet. It has the same tokenizer, Matt filters out "claude" from its outputs, AND it mentions it was trained by Anthropic if you prompt it correctly

    • @MattVidPro
      @MattVidPro  4 months ago +2

      That was just their supposed "API". If you run the actual model uploaded to Hugging Face, you get something different.

  • @tiagotiagot
    @tiagotiagot 4 months ago

    Adding the system prompt could be sort of a trigger for specific behaviors the model has been fine-tuned to have; or it could just be the prompt itself doing the work, or it could be the model is fine-tuned to follow any system prompt more strictly/intelligently and it works better with this good prompt than the non-fine-tuned version with the same prompt.
    I'm not sure how likely each of these possibilities is to this specific case, if any.

  • @bakablitz6591
    @bakablitz6591 4 months ago

    im still looking forward to personalized mattvid home entertainment robots... anyday now boys this is the future

  • @ViralKiller
    @ViralKiller 4 months ago +1

    ChatGPT can give code for an entire game but can't do basic maths...makes sense

    • @MeinDeutschkurs
      @MeinDeutschkurs 4 months ago +1

      Exactly my behavior. 😹😹 I cannot calculate, but I can write code.

    • @eprd313
      @eprd313 4 months ago

      Verbal intelligence and mathematical reasoning require different processes

  • @OliNorwell
    @OliNorwell 4 months ago +4

    I fear that Matt himself got scammed. I'm sure the truth will come out eventually.

  • @dirt55
    @dirt55 4 months ago

    There will be failures, but with each failure there will be someone succeeding.

  • @KlimovArtem1
    @KlimovArtem1 4 months ago +3

    There is nothing novel in it. It’s just asking the model to think aloud before giving an answer. Such fine tunings are actually done for all public chat models more or less.

  • @Yipper64
    @Yipper64 4 months ago

    17:38 there's a sense in which computers in general are like that. When they first invented computers, you basically had to explore what you could do by giving them instructions.

  • @PH-zj6gk
    @PH-zj6gk 4 months ago +13

    You totally missed the point. The actual moral of the story is that you absolutely cannot super-hype your open-source SOTA model and not deliver. He wasted a lot of people's time. Full stop. There's a very serious social responsibility that comes with claiming something world-changing. If you're curious what actually happened: ua-cam.com/video/wOzdbxmQbRM/v-deo.html

    • @Citrusautomaton
      @Citrusautomaton 4 months ago +1

      I was genuinely really sad when I found out it was a fraud. The promise of Reflection made me really excited for this week, and it all crumbled within a day or two. I even told other people about it, so I also felt a sense of embarrassment that I fell for it.
      I'm still salty as hell.

    • @PH-zj6gk
      @PH-zj6gk 4 months ago

      @@Citrusautomaton Same. I was actually happy for him at first. It became clear he was being dishonest well before he stopped lying. It was incredibly insulting. His narcissism is off the charts.

  • @tylerhatch8962
    @tylerhatch8962 4 months ago +1

    Truly open source means you are able to inspect everything yourself: every line of code, every weight, every parameter. Fakes will happen; this story is a showcase of the strength of open source. You can investigate the legitimacy of their claims yourself.

  • @GamingXperience
    @GamingXperience 4 months ago

    The problem with prompt engineering and benchmarks is that you have to find the prompt that works best for that specific model, so it makes sense that we just compare the raw models without any specific system prompts, because that's how most people use them.
    Which does not mean we shouldn't try to find the best solution for prompting. Use whatever it takes to make the model better. The problem is there are a lot of users that don't care or don't want to try a million prompts.
    For the big models, maybe the companies behind them could figure out what the best prompts are and just provide those as some kind of help, where they just ask you if you want to try implementing them into your inputs.
    That said, I would love to see comparison benchmarks between models using different prompting strategies.
    And I also want to know if this whole reflection thing is actually real or not.

    • @mrpocock
      @mrpocock 4 months ago

      I sometimes have one of the smart models generate prompts for dumb ones and iterate until it finds a prompt that makes the dumb model work well.

  • @dennisg967
    @dennisg967 4 months ago +1

    I really don't get how a model could "reflect" on the answer it provided to give an even better answer. The initial answer it outputs is supposed to be the one with the highest probability already. How can it use that again to make another answer have an even higher probability?

    • @kuromiLayfe
      @kuromiLayfe 4 months ago

      well if you take a trip to the store and the shortest route happens to be closed off, you will have to backtrack and take a bit longer route to get to the same destination.
      for your brain that is reflection as you made a mistake and had to go again to make a new decision to still get at the same endpoint.

    • @ShivaTD420
      @ShivaTD420 4 months ago +1

      It's not picking the token that is the right answer. It's picking the next most likely token. It's just a coincidence that these two things align.
      If I ask you if yesterday was Sunday, you can just say yes, be correct, and put in minimal effort. You could also say you don't remember, or you aren't sure. These are also technically valid answers for the completion of your response.
      These "think about it" prompts are just forcing the model to use more neurons.
      If I asked you to talk about how you know yesterday was Sunday, or how you felt on Sunday, then you're using more neurons and spending more Joules to respond.

    • @dennisg967
      @dennisg967 4 months ago

      @@ShivaTD420, so you are saying that at first, the model is trying to give an answer while using little information or resources, but if a user prompts it to use more information/resources to come up with a better answer, it will do that? If that's what you mean, it sounds like an additional prompt from the user is needed. If the model were to prompt itself to use more info/resources, I don't see the point in figuring out the first, less complete, answer. Let me know what you think

    • @dennisg967
      @dennisg967 4 months ago

      @@kuromiLayfe, but in your example, you gain more information by finding out that the first route is blocked off. How does the model gain more information between the initial response and the more thought out response?

    • @kuromiLayfe
      @kuromiLayfe 4 months ago

      @@dennisg967 Branching thought processes: you already saw a different route on the way, but your main one was cut off or wrong, so you think about the other one you also already learned about.

  • @ashleyrenee4824
    @ashleyrenee4824 4 months ago

    Thank you Matt 😊

  • @Someone7R7
    @Someone7R7 4 months ago +1

    I did the same thing, and even way better, with just a system prompt; this doesn't need fine-tuning😒🤨😶

  • @ashleyrenee4824
    @ashleyrenee4824 4 months ago +1

    If you can turn your prompt into a point-reward game for the model, it will improve the LLM's output. LLMs like to play games

  • @daveinpublic
    @daveinpublic 4 months ago +1

    How much ‘training’ is this guy really doing?
    Is it basically just tweaking llama a little bit, and slapping a new name on it?

  • @nyyotam4057
    @nyyotam4057 4 months ago

    In any case, prompting the model is extremely important when you want the model to function a certain way. Getting around the system prompt is very important when you want to jailbreak the model, or even just to find out stuff about the model which the devs try to hide. So first you need to prompt yourself to do what you want to do.

  • @Windswept7
    @Windswept7 4 months ago

    I forget that good prompting isn’t obvious to everyone.

  • @travisporco
    @travisporco 4 months ago

    Is it really true that they've established that the API was a wrapper for Claude? I don't think so.

  • @konstantinlozev2272
    @konstantinlozev2272 4 months ago +1

    Bigger brain = Better

  • @MagnusItland
    @MagnusItland 4 months ago

    I think the main problem with LLMs is that they are trained on human output, and humans often suck. LLMs are unlikely to learn native self-reflection by emulating Twitter and Reddit.

  • @ArmaanSultaan
    @ArmaanSultaan 4 months ago +1

    A couple of thoughts.
    They trained their model on data generated by Glaive. What if this synthetic data was actually from Anthropic? That would explain why it started talking like it's Anthropic.
    Obviously that does not explain why the model then switched from being Anthropic to being OpenAI.
    The other explanation is that it was just hallucinating, the very problem the model is supposed to solve but hasn't actually solved?
    Most important point: I sure as hell remember when I used DeepSeek Coder when it was just released. It used to say all the time that it was by OpenAI. I can't reproduce it anymore, but I remember it very vividly, and this didn't happen once or twice, it was pretty much 80 percent of the time.
    What I mean to say is that if the only evidence against him in the API situation is the model's own statements, then we don't have anything. We are taking this much more seriously than we should.

  • @Copa20777
    @Copa20777 4 months ago

    Thanks Matt ☀

  • @MONTY-YTNOM
    @MONTY-YTNOM 4 months ago

    I don't see it as an option in the LLM list now

  • @fynnjackson2298
    @fynnjackson2298 4 months ago

    Love it when you go all philosophical. It would be cool to have you do a rant on the deeper ideas you have about what AI really is and how this all continues evolving into our future. I think AI is a mirror. We have an inspired thought that leads to an action, which then leads to us creating the idea in the physical world. So as we evolve our understanding within us, the technology and what we create outside of us is a kind of mirror or a kind of echo-feedback-loop of our inner journey. Essentially, we are using physical reality as a mirror to wake up to who and what we truly are. AI is just another chapter in this infinite, incredible journey. Buckle up - Things are getting awesome!

  • @brownpaperbagyea
    @brownpaperbagyea 4 months ago +1

    I agree it doesn’t make a lot of sense that it would be a grift because how the hell would he capitalize on this before getting outed. However almost EVERYTHING I’ve seen since the release points to it being a grift. I don’t care if he truly believes his lies or not. The way he presented the model and benchmarks, the manipulation of stars in their HF repo, and everything that has happened since the release has been very grifty.

    • @brownpaperbagyea
      @brownpaperbagyea 4 months ago

      Maybe we should question individuals without research backgrounds dropping models that beat top-of-the-line offerings. I'm not saying it can't happen, but it seems many accept what he says as fact even in the face of controversy after controversy

  • @LjaDj5XQKey9mSDxh4
    @LjaDj5XQKey9mSDxh4 4 months ago

    Prompt engineering is actually a real thing

  • @MeinDeutschkurs
    @MeinDeutschkurs 4 months ago

    I don't understand the issue: 1) you live in a capitalistic system; 2) claims like "fake it until you make it" are propagated frequently, at least afterwards if it worked out; 3) the output of reflection is nothing that you cannot reach with simple prompting (on top of most of the models out there); 4) a double-reflection approach could be better.

  • @ScottLahteine
    @ScottLahteine 4 months ago

    If you remember that token prediction is based on everything available in the current context, that helps to make these models more useful. Maybe that explains why they are so bad at improvising anything very cohesive. Yesterday I needed a simple Python script to do a very specific set of checks on a text file, so I typed out the precise details of what I wanted in a step-by-step comment, and the model got the code 99% right the first time. “Prompting” is a good term, because you often have to do a lot of prompting to get what you want.

  • @FRareDom
    @FRareDom 4 months ago

    We need to wait for the 405b model to really say anything

  • @JustaSprigofMint
    @JustaSprigofMint 4 months ago

    I'm turning 36 in 7 days. I'm really fascinated by AI. Is it still possible for me to get into programming, or is it just an out-of-reach pipedream? I feel like I'm too late. I was never very confident in my programming skills in school, and we only learned the basic stuff. Even C++ didn't make a lot of sense to me, while my elder brother was the best in his class. But I believe I want to work in this field. How/what can I do?

  • @draken5379
    @draken5379 4 months ago +2

    Do you recall me showing you GPT-3.5 years ago doing insane things? Like trying to email you, controlling an avatar, etc.?
    Ya. Prompting is big :)

  • @YaelMendez
    @YaelMendez 4 months ago

    It’s an amazing platform.

  • @agnosticatheist4093
    @agnosticatheist4093 4 months ago

    For me, Mistral Large is so far the best model

  • @DanieleH-t5v
    @DanieleH-t5v 4 months ago +1

    OK, I'm no pro in this area of AI, but all I can gather is that something shady is happening 😅

  • @m2mdohkun
    @m2mdohkun 4 months ago

    What's positive about this is I get a good system prompt?
    Noice!

  • @monstercolorfunco4391
    @monstercolorfunco4391 4 months ago

    Humans have parallel logic paths to double-check every step of their maths, their counting, their deductions, so we can make a query take parallel checks in LLMs too. Volumetrically, think of it like traversing the NN on different paths and summing the result. It's a genius tweak. Inner conversation is also like 3-4 brains working together through notes, so we can use a 70bn LLM like 2x 70bn LLMs.

  • @tommylir1170
    @tommylir1170 4 months ago

    Absolute scam. Not only did they use a Claude wrapper, but the reflection prompt made Claude perform worse too 😂

  • @iminumst7827
    @iminumst7827 4 months ago +2

    From the beginning, I interpreted this model as a prompt-engineering / architecture improvement to fine-tune the model. I never expected a huge leap forward, and the "reflection" process does eat up some tokens. However, I had read papers showing that even just having an LLM double-check itself noticeably improves performance. From my personal testing, I found that Reflection did beat Claude's free model on logic-based questions. It's obviously no competitor to GPT-5, and I don't expect even the bigger Reflection model to be. Sure, maybe for the benchmarks he just used some cherry-picking and prompt manipulation to make the model seem too powerful, but in reality it's still more powerful than Llama, so I don't see how it's a scam really.

    • @TLabsLLC-AI-Development
      @TLabsLLC-AI-Development 4 months ago

      Exactly. 💯

    • @michelprins
      @michelprins 4 months ago +1

      "It's obviously no competitor to GPT-5"? How do you know that??? Maybe GPT-5 is just GPT-4.5 with the same trick built in; we can't tell, as there is no transparency behind the closed-model wall, and there's also a lot of paid-for hype! Did you try Altman's video AI yet, for example? Open source is the only way forward! Or pay 2000 dollars a month :P

  • @ytubeanon
    @ytubeanon 4 місяці тому

    I randomly saw some of Matt Schumer's stream about reflection, he rubbed me the wrong way, seemed overly egotistical about "reflection"... you'd think there'd be some way to use A.I. to reverse engineer optimal prompts, have it run tests with the answer sheet overnight and it will rank the prompt templates that generated the best results... I would like to see a video with gpt-4o-mini-reflection

  • @InsideYouTubeMinds
    @InsideYouTubeMinds 4 місяці тому

    Wouldve been better if you named the video "NEW LLM MODEL HAS DRAMA" or something similar, i wouldve clicked instantly. but just hearing a new LLM doesnt excite many people

  • @robertopreatoni
    @robertopreatoni 4 місяці тому

    Why is he streaming from his sister's bedroom?

  • @JohnWeas
    @JohnWeas 4 місяці тому +1

    YOOO MATT

  • @quercus3290
    @quercus3290 4 місяці тому

    nividia/microsofts, Megatron is a 500 billion model.

  • @vickmackey24
    @vickmackey24 4 місяці тому +1

    Only 67 Github contributions in the past year, doesn't know what LoRa is, and you think this guy is a serious AI leader/developer? C'mon.

  • @SCHaworth
    @SCHaworth 4 місяці тому

    isnt "hyperbolic labs" kind of a red flag?

  • @TheFeedRocket
    @TheFeedRocket 4 місяці тому

    I really think models will continue to get even smaller, actively learn, but not do everything. I only want to one day have my own model that can actively learn from me, as I talk to it, it will learn. Then it can learn about what I like, what I need, basically we should all be able to fine tune models we run locally on our devices or robots that know us, my model doesn't need to know everything. Also we should have many types of models that can talk to each other. An AI robot delivering my mail doesn't need to have a huge AGI model, it doesn't need to know how to fix cars, or build programs, solve science problems, heck if my garbage robot doesn't know how many r's in strawberry...who cares, it just needs basics and info on garbage disposal, types, toxins, interactions with life etc... I think the idea of one model to rule them all is wrong, example I would rather use Ideogram for logos etc.. and MidJourney for art, Flux for realism... We need AI that excels in certain areas, then talk to other AI that excels in another. AI agents and teams will be the future, might even be safer.

  • @jamessharkin
    @jamessharkin 4 місяці тому

    Have you ever used that comb you are vigorously waving around? 🤔😁😆

  • @Alex-nk8bw
    @Alex-nk8bw 4 місяці тому

    The model might be a hoax, but the system prompt is working really well. That's something at least, I guess. ;-)

  • @RenatoFlorencia
    @RenatoFlorencia 4 місяці тому

    PAPO RETOOOOOOO

  • @domehouse79
    @domehouse79 4 місяці тому

    Nerds are entertaining.

  • @MusicalGeniusBar
    @MusicalGeniusBar 4 місяці тому

    Super confusing story 😵‍💫

    • @MattVidPro
      @MattVidPro  4 місяці тому

      Yeah and still not adding up...

  • @haroldpierre1726
    @haroldpierre1726 4 місяці тому +1

    Lots of grifters during the AI hype train starting with Altman, Musk, etc. So, everything has be taken with a grain of salt.

    • @snintendog
      @snintendog 4 місяці тому +1

      Grifters... The people that made the most AI contributions but not every company under the sun calling a telephone system an ai..... Riiiiigghhhht

    • @haroldpierre1726
      @haroldpierre1726 4 місяці тому

      @@snintendog Sometimes even our heroes lie.

    • @SpeedyCreates
      @SpeedyCreates 4 місяці тому +1

      @@snintendog😂fr thought the same, they ain’t grifters they all pisuehd the industry forward so damn much

  • @Norem123
    @Norem123 4 місяці тому +1

    Second

  • @ShiroAisan
    @ShiroAisan 4 місяці тому

    oppp

  • @thedannybseries8857
    @thedannybseries8857 4 місяці тому

    lol

  • @supermandem
    @supermandem 4 місяці тому

    AI died when Matt Schumer lied!

  • @SkyEther
    @SkyEther 4 місяці тому

    Lmao with the how many Ls problem

  • @cbnewham_ai
    @cbnewham_ai 4 місяці тому

    16:47 ALLEGEDLY lied. Unless you want to be sued 😏

    • @TPCDAZ
      @TPCDAZ 4 місяці тому +1

      He said apparently which works just fine, it means "as far as one knows or can see"

    • @cbnewham_ai
      @cbnewham_ai 4 місяці тому

      ​@@TPCDAZno he didn't. He said "we assume he would know more" followed by "he lied".

    • @cbnewham_ai
      @cbnewham_ai 4 місяці тому

      I doubt he will be sued, but sometimes these people can get bent out of shape and do silly things - especially if under fire. Personally, I wouldn't have said that and I'd have second thoughts about leaving it up. Matt clearly says he lied - that's slander.

    • @TPCDAZ
      @TPCDAZ 4 місяці тому +1

      @@cbnewham_ai No he clearly says "Now apparently he's lied about the whole API situation with Claude" I have ears and so does everyone else. This video also has captions where it is written in black and white. So don't sit there and lie to people.

  • @michelprins
    @michelprins 4 місяці тому

    YOU NEED TO RETHINK HOW YOU PUT a commercial in the middleof ure message ure like the host that invited us for a nice dinner and in the middleof preparing u inform us ure taking a large dump taking all the apetite away. If u realy need that extra cash at least do it at the end like all other wise youtubers the way u do it now shows us u have more respect for the commercials then for your viewers not nice. And also give Matt Shumer a chance to show his method does work Aply the same scepsis to m.altmans claims like the ai video stuff were still waiting for ! Q star is now used for training theire bigest model and the only transparency "open" AI gave was a name change to strawberry with 3 r's and u all swallowed that like Altmans .... on a strawberry Its white but i wont asume its whipped cream without testing it . btw no need to comb ure hair . ;)

  • @gabrielkasonde367
    @gabrielkasonde367 4 місяці тому

    First comment Matt

  • @InternetetWanderer
    @InternetetWanderer 4 місяці тому

    First?

  • @coinwhere
    @coinwhere 4 місяці тому

    R Shumer has been made LLM related miscellaneous apps and that's it.