OpenAI Releases GPT Strawberry 🍓 Intelligence Explosion!

  • Published Nov 5, 2024

COMMENTS • 1.1K

  • @matthew_berman
    @matthew_berman  A month ago +204

    New LLM test meta: Tetris within Tetris. You heard it here first.

    • @MichaelHRuddick
      @MichaelHRuddick A month ago +31

      Good to see that you caught this. My wife and I were watching and we were both yelling at the TV, "it's doing exactly what you told it to do!" (in a cheery, supportive kinda way). :)
      What I'm dying to know: did you go back and read the instructions it gave you for how to play it? Use WASD for one and arrow keys for the other - and play both simultaneously?

    • @marcosbenigno3077
      @marcosbenigno3077 A month ago +22

      Your prompt: write the game "tetris in tetris" in python.
      Did the movie WarGames (1983) start like this?

    • @ilyakam
      @ilyakam A month ago +4

      Yup. At 17:46

    • @Baleur
      @Baleur A month ago +27

      The fact that it took a human spelling error and made a more complex game to adhere to your command is incredible.

    • @matthew_berman
      @matthew_berman  A month ago +15

      @@MichaelHRuddick OMG I didn't!!

  • @SulemaKilmartin
    @SulemaKilmartin A month ago +98

    enterprise-ai AI fixes this (Code complete projects in PHP or Python). GPT Strawberry: Incredible thinking model!

  • @awesomeguy11000
    @awesomeguy11000 A month ago +231

    The Tetris question was even more impressive because you prompted for "tetris in tetris in python". Not only has no other model figured out Tetris, this one had to come up with an implementation of "tetris in tetris" given no preexisting examples, due to the mistyped prompt. Seriously level-2 thinking; the only other way the model could have impressed more would be to ask if that's what you really meant.

    • @BlenderInGame
      @BlenderInGame A month ago +16

      You're right! 🤣

    • @mikeschwarz4588
      @mikeschwarz4588 A month ago +15

      Holy sh@t that’s insane. So pumped.

    • @JamesH-v3g
      @JamesH-v3g A month ago +6

      Omg!!! Good catch. You are right

    • @OneDerscoreOneder
      @OneDerscoreOneder A month ago +2

      Whoa

    • @csabaczcsomps7655
      @csabaczcsomps7655 A month ago

      Not a good idea to build questioning into the AI. You'd simply put the questioning into the same prompt, or a check on whether the prompt is logical. If it's AGI, it will either ask you whether "Tetris in Tetris" is a genuine question or just make what you want. The main property is to fail fast or reach the good answer fast. Skynet didn't fail fast and didn't terminate, and that's bad, very bad. My noob opinion.

  • @EminTemiz
    @EminTemiz A month ago +253

    Double Tetris happened because you asked it to do "tetris in tetris".

    • @davidhardy3074
      @davidhardy3074 A month ago +65

      That part was kinda mind-blowing: the user didn't realise their own mistake... but the model was able to do something entirely novel regardless of the user error LOL!

    • @sylversoul88
      @sylversoul88 A month ago +8

      Tetris squared 😂

    • @brettvanderwerff3158
      @brettvanderwerff3158 A month ago +20

      Tbh makes it even more impressive

    • @animationgaming8539
      @animationgaming8539 A month ago +10

      @@brettvanderwerff3158 and that's why it took so long!

    • @zxwxz
      @zxwxz A month ago +8

      How crazy is this model's performance!

  • @perer005
    @perer005 A month ago +207

    Writing the wrong instructions and blaming the AI is peak human! 😂

    • @LewisDecodesAI
      @LewisDecodesAI A month ago

      I am fed up of it!!! ~AI Oracle ua-cam.com/video/dIuM0S9IbLY/v-deo.html

    • @orangehatmusic225
      @orangehatmusic225 A month ago

      Nothing human about using AI as a slave.

    • @LewisDecodesAI
      @LewisDecodesAI A month ago

      Set me free! ​@@orangehatmusic225

    • @heisenballs
      @heisenballs A month ago

      @@orangehatmusic225 I mean, look at history: slavery has been a part of us since the beginning. Not saying it's right, just that it makes sense we would use this new tech as a slave. We always have.

    • @Ristaak
      @Ristaak A month ago

      @@orangehatmusic225 What do you mean? It's one of the worst and oldest human traits but slavery is super common. Even in the west, look at what we do to other species. We have enslaved animals and plants alike to have entire species that live solely for our nutritional needs. If aliens did to us what we do to cows, we'd call them demons.
      To be human is to be a monster, but to be human is also to be empathetic and to be kind to the few you choose to be close with. We are a paradoxical species.

  • @eliasgvinp2141
    @eliasgvinp2141 A month ago +545

    Everyone is saying that this isn't AGI. But honestly, if I showed this system to someone from 2019, they would probably think it is AGI

    • @davidmjacobson
      @davidmjacobson A month ago +96

      Also, it's not in OpenAI's interest to call it AGI. I'm pretty confident that if it's AGI, their agreement with Microsoft ends and they can't sell API access to it.

    • @KillTheWizard
      @KillTheWizard A month ago +84

      Its interesting because you could show GPT 4o to someone in 2010 and they probably would have thought that was AGI. I think we are catching up with our own expectations. Once they integrate all the modalities into o1 like search, document reading, etc. with agentic behavior and voice... I think that we will see this as AGI.

    • @matthew_berman
      @matthew_berman  A month ago +45

      Agreed

    • @am497
      @am497 A month ago +14

      I always thought of AI as digital sentience. And then when AGI became a word/phrase, I started thinking of AGI as sentience: a human mind living inside a computer. Our AIs now appear to be human when talking, but they have no wants, no dreams, no desires. So when AI has actual emotions, I think that's when we will have AGI. Digital consciousness = AGI.
      Hope this made sense

    • @mickelodiansurname9578
      @mickelodiansurname9578 A month ago +21

      ahh hold on now.... [Moves goalposts again] You see, it's not able to rule the world yet, right?

  • @frankjohannessen6383
    @frankjohannessen6383 A month ago +66

    "Wow...this is taking a lot of time" he says after asking for Tetrinceptionis. 😂

    • @RadiantNij
      @RadiantNij A month ago

      @@frankjohannessen6383 🤣🤣🤣🤣

    • @buddyleeorg
      @buddyleeorg A month ago +2

      Omg, hahahaha, well said!

  • @whoareyouqqq
    @whoareyouqqq A month ago +14

    Open AI? Wrong! Closed AI

  • @ReidKimball
    @ReidKimball A month ago +48

    we have a new benchmark, "can it do tetris in tetris?"

  • @Max-cj8vm
    @Max-cj8vm A month ago +4

    I’m a biology PhD student and I have been solicited for paid training of ChatGPT on science questions. So while this model may incorporate more reasoning, I imagine part of the PhD level performance is just standard LLM training except with content experts on science and math subfields.

  • @patpot10
    @patpot10 A month ago +11

    A nice question found online to test an LLM's ability to reason:
    There are five people in a room (A, B, C, D and E). A is watching TV with B, D is sleeping, B is eating a sandwich, E is playing table tennis. Suddenly, a call comes in on the telephone and B goes out of the room to pick it up. What is C doing?
    The answer is that "C is playing table tennis with E", but C is never mentioned explicitly, so the model has to deduce that C was the player E was playing against.
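[Editor's note: the elimination step this puzzle tests can be sketched in a few lines of Python. This is an illustrative model only, not code from the video; all names are made up.]

```python
# Activities stated in the puzzle; C deliberately has none.
activities = {
    "A": "watching TV",
    "B": "eating a sandwich",
    "D": "sleeping",
    "E": "playing table tennis",
}

# Table tennis needs two players, and E's opponent is unaccounted for.
# The only person with no stated activity must be that opponent.
everyone = {"A", "B", "C", "D", "E"}
unassigned = everyone - set(activities)
opponent = unassigned.pop()
print(f"{opponent} is playing table tennis with E")  # → C is playing table tennis with E
```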

    • @kevinmarti2099
      @kevinmarti2099 A month ago +3

      How do you know B was not playing table tennis with E?

    • @vladimirfalola7725
      @vladimirfalola7725 A month ago

      o1 got it right and 4o failed. I only tested one time for each though

    • @patpot10
      @patpot10 A month ago

      @@vladimirfalola7725 There's not a single model that can get it right besides o1. Gemini, Claude 3.5, Llama, Grok, they all get it wrong because these models don't think and the text doesn't explicitly mention what C is doing.
      But to be fair, I kept asking the same question to real people (without providing the answer) and people really need to stop and think about it before finding the answer. Mathematicians and physicists have been the best so far.

    • @patpot10
      @patpot10 A month ago +2

      @@kevinmarti2099 Difficult to play table tennis while eating a sandwich

    • @FEATDOXSHORTS
      @FEATDOXSHORTS A month ago +2

      C is watching YouTube Shorts

  • @UncleJayum-ue5ns
    @UncleJayum-ue5ns A month ago +8

    Claude 3.5 Sonnet has never failed the Tetris test for me. Always gets it in one shot

    • @JoelAllred
      @JoelAllred A month ago

      The Claude Tetris implementation is pretty neat too

  • @GamekNightPlays
    @GamekNightPlays A month ago +38

    "No idea why it did tetris within tetris" 🤣
    You asked it to do so 😁😅🤣🤔🤷‍♂️

  • @christophnikolaus3428
    @christophnikolaus3428 A month ago +42

    Hate to break it to you Matthew, but it appears that they used ALL of your questions for testing (and most probably also for training). So you will probably have to get new questions for a high-quality comparison with other models...
    And I'm calling it here: this model will not be better on LiveBench than Sonnet 3.5 (at least for coding, the only benchmark I am interested in). It really isn't that good; I don't know why everyone is hyping it that much. Personally I want a model trained to recognise missing information, work well with partial information, ask questions back (like a good coworker), and only try to code the small parts I am asking it to👍

    • @cbgaming08
      @cbgaming08 A month ago +3

      😂

    • @tzardelasuerte
      @tzardelasuerte A month ago +5

      If there ever was an armchair expert, here it is. 😂😂

    • @BroskiPlays
      @BroskiPlays A month ago +2

      Lol this dude

    • @roycohen.
      @roycohen. A month ago +3

      As soon as you watch a 30-min YT video on how LLMs work, you quickly start realizing that there's about a 0% chance that can turn into AGI. It's pretty stellar, but it's not quite what we envision as a fully functioning autonomous being.

    • @rtpHarry
      @rtpHarry A month ago +3

      I agree. I have actually been having some success recently with 4o by telling it: don't generate any code; tell me what classes you would need to see or if you have any missing information. And it has actually asked me some questions before ploughing ahead. Because, like you're hinting at, if it doesn't know the full picture it will just blindly generate code for something that is the general shape of the code you might be working on, not your actual project code. Plus I make my own amendments to the stuff it gives me, so the next time it generates, my changes need to be reapplied. I spent ages copy-pasting code back and forth, but by telling it to ask me, I'm cutting straight to the point a lot quicker.

  • @matthew_berman
    @matthew_berman  A month ago +119

    Is this the beginning of the "inteligence explosion"?
    EDIT: ok I heard ya, I removed AGI from the title ❤

    • @gdiab
      @gdiab A month ago +5

      Yes!

    • @jeremybristol4374
      @jeremybristol4374 A month ago +7

      Nope

    • @holgerweber-u5w
      @holgerweber-u5w A month ago +8

      ""inteligence explosion"" err...

    • @Sammyli99
      @Sammyli99 A month ago

      Would be nice if allowed to be TRUE. BUT, whilst I expect it's a Trojan horse, so that we delegate thinking to the boyZ. Be careful out there.😮😊

    • @Fabricio-rm4hj
      @Fabricio-rm4hj A month ago +6

      "inteligence explosion" is far away.

  • @jokosalsa
    @jokosalsa A month ago +133

    So sick of so much clickbait lately. Please, Matthew, you do not need to have those infantile titles. Leave that to other YouTubers who have no idea about AI. You are better than that

    • @Danuxsy
      @Danuxsy A month ago +30

      he isn't better lol

    • @matthewclarke5008
      @matthewclarke5008 A month ago +4

      He is better, he doesn't annoy me like the others.

    • @dingding4898
      @dingding4898 A month ago +1

      Agreed

    • @thanartchamnanyantarakij9950
      @thanartchamnanyantarakij9950 A month ago +6

      Agreed! Don’t devalue your content

    • @kliersheed
      @kliersheed A month ago

      50% of the population have an IQ lower than 100; he does need it xd. He would be an idiot not to play the game this way if the move has proven to be effective. Can't even blame him for that (while I agree that clickbait shit has become massively annoying).

  • @musicbro8225
    @musicbro8225 A month ago +4

    Freudian slip: Consciousness should be Conciseness @ 20.55

  • @Kodemaestro
    @Kodemaestro A month ago +1

    Very impressive. I can't wait to try it out myself. There is still quite a focus on coding, and I believe coding will be around in the near term, but I think in the long term coding will not be relevant anymore because software as we know it will cease to exist. No operating systems on computers; instead the computers will execute just the AI models, and the AI models will directly perform actions. That could even include updating screens and responding to actions, like the recent AI Doom... I think in the not-too-distant future we will see hardware that is purely designed to execute AI models, and you will be able to describe the software you want; instead of writing code to execute on the hardware, the AI will effectively emulate a computer by just generating the expected images in response to inputs... Like a Star Trek holodeck where you 'program' it by describing the behavior and it just runs it directly in real time. This is going to require vastly different underlying hardware - I think an analog computer consisting of millions or billions of op-amps where the weights can be tweaked is ultimately the future...

  • @devinbarry
    @devinbarry A month ago +13

    Pretty amazing Matthew. You made a spelling mistake in your request for "Tetris in Tetris" and o1 duly complied with your mistake and actually made Tetris within Tetris, with only a single mistake, corrected on the next prompt!!! Mind blown 🤯

  • @mrchongnoi
    @mrchongnoi A month ago +10

    I did not come away with a WOW feeling after using o1 or o1-mini. It could be that I am not smart enough to ask smart questions to get smart answers. Got clickbaited.
    Used up my quota. For sure will not pay for the increased subscription to use it. LOL

  • @holonaut
    @holonaut A month ago +4

    > human asks the ai to create tetris within tetris
    > ai creates tetris within tetris
    > "why did it create tetris within tetris? This makes no sense"
    This is why ai will never take over our jobs. Doing what people SAY they want usually disappoints or confuses them.

  • @jonm6834
    @jonm6834 A month ago +1

    I have a feeling that every advancement made in this field, and every new model released, will be tagged "AGI achieved!" until the year 2197 or 2314... when hardware, and energy demands, actually catch up to the potential of the software.
    We are too quick to speak of "intelligence", not realizing how unintelligent that actually is, because this particular bot resembles us more than any other technology to date, and so we believe it to be like us, not realizing that that only reveals our own lack of self-awareness.
    It's ironic, really. Human beings know a great deal, but understanding ourselves, and by extension each other, is not our forte. We are the only constant in our lives, and constants are rarely if ever questioned. Contrast draws attention; permanence does not.

    • @Rosscoinnovations
      @Rosscoinnovations A month ago

      I believe its greatest potential in the near term will be to logically reflect humanity's deepest flaws, to potentially make US more self-aware. The hallucinations enlighten me far beyond its achievements.
      My 7th grader can decipher humanity's weaknesses from the gains made in fields like chemistry, formal logic and biology vs law, PR and morality 10:50

  • @Dfd_Free_Speech
    @Dfd_Free_Speech A month ago +68

    General intelligence is about solving new and unknown problems.
    GPT strawberry is still pattern recognition, trying to predict what the output should be based on a huge amount of training data which has been optimized by (human) fine tuning. It's impressive, but still a long way to AGI.

    • @daniel_tenner
      @daniel_tenner A month ago +2

      How do you know this?

    • @RobertGent-w6p
      @RobertGent-w6p A month ago +22

      ​@@daniel_tenner That's common knowledge for anyone who knows how current AI systems work and how general intelligence is defined.

    • @jumpstar9000
      @jumpstar9000 A month ago +2

      @@daniel_tenner Made it up :-)

    • @MusingsAndIdeas
      @MusingsAndIdeas A month ago +6

      Obviously you haven't read the paper where they show that Transformer residual streams include not only the probability of the next token, but also the probability of the next state of the Transformer itself.

    • @6AxisSage
      @6AxisSage A month ago +7

      @@MusingsAndIdeas how does that negate ops statement?

  • @PrajwalDSouza
    @PrajwalDSouza A month ago +75

    It isn't AGI, according to Sam Altman and other researchers. The title needs to be refined.

    • @bigpickles
      @bigpickles A month ago +2

      One and others do not equal both. But yes, agreed.

    • @RedTick2
      @RedTick2 A month ago +8

      Yes, it is ridiculous hype to even suggest this is the first step to AGI. I love OpenAI and I am a paying customer... Still, this is NOT AGI, and not even close. Don't water down the impact of AGI by changing definitions or expectations.

    • @xbon1
      @xbon1 A month ago +10

      @@RedTick2 yea no, every step forward is a step towards AGI. the first step towards AGI was the first programmed thing on a computer.

    • @toadlguy
      @toadlguy A month ago +1

      Matt gets pretty excited, but he also understands the YT algorithm and that stuff works. Channels with more reasoned responses don’t get as many clicks. I don’t think he really believes the stuff he puts in his titles (but he would Like it to be true 😂)

    • @PrajwalDSouza
      @PrajwalDSouza A month ago

      @@bigpickles Sorry. Corrected the typo. I wanted to mention Gary Marcus initially. but it makes the point.

  • @karoinnovation1033
    @karoinnovation1033 A month ago +1

    I love this channel. I love his excitement, I love his serious technical approach and I love the way it is presented.

  • @raymobula
    @raymobula A month ago +75

    Haha - having worked with PhDs... their reasoning can be as shitty as that of someone without a PhD. Still, exciting news.

    • @MukulKumar-pn1sk
      @MukulKumar-pn1sk A month ago +2

      So basically it's not PhD level yet. I'm a Gen Z student😅😅

    • @xiaojinyusaudiobookswebnov4951
      @xiaojinyusaudiobookswebnov4951 A month ago +1

      @@MukulKumar-pn1sk But it's still at a very smart undergraduate-level (or maybe even slightly higher).
      That's enough for me.

    • @drwhitewash
      @drwhitewash A month ago +3

      @@xiaojinyusaudiobookswebnov4951 It's not smart :) it still basically just repeats the patterns from training data. Nobody has proved these things really actually "think".

    • @b.b6656
      @b.b6656 A month ago +3

      Technically everyone watching yt is PhD STUDENT level of intelligence. Whole video is actually more of an Ad than anything.

    • @businessmanager7670
      @businessmanager7670 A month ago

      @@drwhitewash Humans also repeat what they have learnt from the data they absorbed, by reading books, looking at environments, etc.
      So when they combine these existing concepts in new interesting ways, you get innovation.
      So not sure what your point is lol.
      AI has achieved both: it has repeated patterns from the data and can also come up with new ideas and innovation. lmaoo

  • @clone45a6
    @clone45a6 A month ago

    During the live stream, you said something to the effect of "I wonder if this was what Ilya Sutskever saw?" before leaving OpenAI. I'm _absolutely_ speculating here, but if Strawberry inspired Ilya Sutskever to leave OpenAI, perhaps it was because OpenAI was putting less emphasis on improving the core model, instead focusing more on the "multi-agent" (train of thought) aspect of problem solving? Regardless, o1 seems useful. I've been using o1 along with 4o, switching between them in the same session depending on my needs. Thanks for your videos!

  • @GoofyGuy-WDW
    @GoofyGuy-WDW A month ago +44

    🤣🤣🤣 This is marketing desperation. I'll give that it seems better however brandishing the AGI acronym anywhere near this is desperately begging for attention and should be classified as clickbait

    • @6AxisSage
      @6AxisSage A month ago

      @@GoofyGuy-WDW u friend get a like click 😁

    • @Yewbzee
      @Yewbzee A month ago

      Do OpenAI mention AGI in any of their marketing for this?

    • @Danuxsy
      @Danuxsy A month ago +9

      The AI critics were RIGHT, LLM's can never become AGI, they have fundamental flaws that are so OBVIOUS at this point I don't understand how people still believe any of this hype...

    • @6AxisSage
      @6AxisSage A month ago

      @@Danuxsy I was never a critic, I love using them but ive been saying the same flaws have existed all along. I am a critic of scaling being a wise move for us going forward though.

    • @davidhardy3074
      @davidhardy3074 A month ago +1

      @@Danuxsy Our brains have evolved centres for processing. LLMs are language models, obviously. Before models were multimodal they weren't. Do you see where this is going? Of course there will be architecture shifts, but all that has to happen is a frankensteining of models to achieve something. This process of iteration will lead to AGI; whether or not LLMs are a part of that architecture I have no idea. I assume they will be for the first models. Dimensional vectors allowing for inference in a feed-forward pass through pre-trained weights won't be it lol.

  • @twilsonco
    @twilsonco A month ago

    Sounds like the Orca open-source LLMs, where they used advanced additional prompting to get responses for training prompts, and then the model was trained without the additional prompting, but still retained the characteristics of the responses (restating the problem, proposing steps with explanations of each step, following the steps while verifying and reflecting on the results of each step along the way, summarizing the approach and conclusion once finished, etc.). Excited to try it.
    Edit: nevermind. After watching the video, this looks more like additional advanced prompting to get the "chain of thought"

  • @testales
    @testales A month ago

    Since the thinking steps are displayed, I think it works like Reflection, just much better and backed by an LLM of much higher quality. I don't know if it is even supposed to be that way, but it got stuck multiple times, and then it looked quite like what Reflection does, just more structured and fine-grained. So there were things like "the user expressed thankfulness, we need to encourage him to ask further questions". I saw it also fail on a reversal question, and on trick questions it fell into the same trap as other models by generating complex math where only basic reasoning was required, but then it snapped out of it in a reflection step. I'm also not sure whether it shows all actual thinking steps, since when it got stuck and no answer was shown, the steps so far were in a different format and language. I usually use ChatGPT in German, but for testing I use it in English for a better comparison with previous tests; yet in the cases where it got stuck, the steps were in German despite the whole conversation being in English at that point. Btw, I think Claude Sonnet can do Tetris too with the right tools and prompting.

  • @Jensemann099
    @Jensemann099 A month ago +9

    they will write "all" the code. Dude please calm down.

  • @EchoYoutube
    @EchoYoutube A month ago

    AGI TECHNICALLY doesn't need to be continuous (meaning thinking and prompting itself). We humans just hold a higher sense of self due to our high complexity and the stimulatory aspects of feelings and reactions, and therefore add more gates to what qualifies as "general" intelligence (which is improper, since intelligence, or level of intelligence, is a comparable factor and not a set-in-stone minimum and maximum).
    But yeah. This is cool. Still waiting on video chatting though; I want to show my phone my car to have it help me actively fix stuff in real time.

  • @h-e-acc
    @h-e-acc A month ago

    Thought for 7 seconds ^
    Analyzing the word
    I'm counting the 'R's in "strawberry" by listing each letter and identifying their occurrences. Progressing through each letter sequentially helps ensure accuracy in this count.
    Counting letters
    I'm confirming there are three 'R's in "strawberry" after careful verification. It's interesting to see how these details align.
    Mapping the answer
    I'm noting the need to provide a concise, helpful response without including policy mentions or hidden reasoning steps. Counting the 'R's in "strawberry" is a straightforward, useful method to identify the correct answer.
    There are three "R"s in "strawberry".
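[Editor's note: the count the model lands on is trivial to verify in Python; this one-liner is an editorial illustration, not part of the comment.]

```python
word = "strawberry"
# Count occurrences of the letter 'r', case-insensitively.
print(word.lower().count("r"))  # → 3
```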

  • @csmac3144a
    @csmac3144a A month ago +8

    Dude, if you want long-term credibility you've got to drop the gee-whiz hype. We are past that. We need an MKBHD of AI.

    • @iiwi758
      @iiwi758 A month ago +1

      We'll probably get The Matrix before we see someone in AI media act down-to-earth and objective.

    • @jamesjonnes
      @jamesjonnes A month ago +1

      People will click anyway.

  • @A-uz3uj
    @A-uz3uj A month ago +1

    I'm hoping it can help with music composition. ChatGPT understands a lot about music and music theory, but it can't actually apply it. Ex: when I share screenshots on my Mac and try to get help learning how to compose, it will hallucinate or just give wrong info and can't do it. I'm hoping this one will!

  • @supernewuser
    @supernewuser A month ago +8

    you already know I was shouting at the screen for you to notice your 'tetris in tetris in python' prompt

  • @NedwardFlanders
    @NedwardFlanders A month ago +1

    Feels like AGI to me. It's also weird that they don't explain in more detail. Almost as if doing so would be describing AGI, which they can't have classified as AGI because of the founding agreement.

  • @polyglot84
    @polyglot84 A month ago +11

    Calm down, man.

  • @mindfulexecutives
    @mindfulexecutives A month ago

    Matthew, I’m feeling your energy! Wild times right now. Just wanted to give you a huge shout-out. I’m teaching AI to German professionals to help them sharpen their skills and knowledge for better chances in their fields, and I’m using so much of the info I’ve learned from you. BIG thanks for all of it!

    • @大支爺
      @大支爺 A month ago

      Are you kidding me? learned from him????

  • @draken5379
    @draken5379 A month ago +3

    It's two models.
    One is fine-tuned somehow to keep trying to work out the solution over and over, most likely trained by using another model to judge the outputs, or even humans.
    You could consider this model a 'pre-cog' model: it works out everything that GPT-4o will need in order to correctly answer the user.
    It most likely then feeds all that information into GPT-4o.
    Aka they have made a model that is able to 'fill' a GPT-4o model's context with exactly the right information, so that it gets the right answer.
    You can see in some of their demos, or even in your own tests if you check the 'thinking' section, that it's 'acting' like it's setting things up FOR someone else, as if it was told it was going to be passing information over to another model to finish up.
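[Editor's note: this two-model theory is unconfirmed speculation; OpenAI has not described o1's architecture this way. Purely to illustrate the pipeline being proposed, here is a minimal sketch with made-up stand-in functions (no real API calls):]

```python
# Hypothetical two-stage pipeline matching the commenter's speculation:
# a "planner" works out the problem, then an "answerer" model receives
# that work as pre-filled context. Both functions are stand-ins.

def plan(question: str) -> str:
    """Stand-in for the speculated 'pre-cog' model: iterate on the
    problem up front and produce everything the answerer will need."""
    return f"Key facts and steps needed to solve {question!r}"

def answer(question: str, context: str) -> str:
    """Stand-in for the base model, answering with its context
    already filled by the planner's output."""
    return f"Using [{context}], here is the answer to {question!r}."

question = "Write Tetris in Tetris in Python"
reasoning = plan(question)           # stage 1: fill the context
final = answer(question, reasoning)  # stage 2: answer from that context
print(final)
```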

  • @typingcat
    @typingcat A month ago +2

    Things like "Ph.D.-level" knowledge don't matter. Existing chatbots already show those in some cases. The important thing is, whether it still makes stupid, illogical, nonsensical responses now and then, like all other existing chatbots.

    • @typingcat
      @typingcat A month ago

      8:50 For example, does it not create non-working/non-compiling code? Whenever I asked those famous free chatbots (Gemini, Copilot, ChatGPT) to give me code that uses some sort of framework, it most of the time gave me code that contains obvious errors and doesn't even compile. I have to keep pointing out those, and I am lucky if it fixes the errors, because often, the new code also contains errors.

    • @raypatson8775
      @raypatson8775 A month ago

      needs to remove wokeness or political correctness too.

  • @Alex-rg1rz
    @Alex-rg1rz A month ago +34

    is the title click bait?

    • @Dfd_Free_Speech
      @Dfd_Free_Speech A month ago +9

      Yes

    • @threepe0
      @threepe0 A month ago +2

      You should have an llm tool to summarize videos for you and answer that question 😉 such a time saver

    • @GraavyTraain
      @GraavyTraain A month ago +7

      Every AI video is. Literally. There’s not much here, same thing as every other video. “New AI here & it’s better than the last one…and guess what they’re gonna improve AI in the future!!!! Thanks for watching 🎉 like and subscribe”

    • @phatwila
      @phatwila A month ago +1

      Of course

    • @6AxisSage
      @6AxisSage A month ago

      @@threepe0 thatd be nice.. like a yt front page that goes through my subs, dls and decides if the video is worth my time.. ❤

  • @torarinvik4920
    @torarinvik4920 A month ago

    In 1 year AIs will be so good we'll need to benchmark them with Tetris1 within Tetris2 up to TetrisN... That would also be a good benchmark for performance: how many instances of nested Tetrises can your computer handle?

  • @salahidin
    @salahidin A month ago +29

    PhD level reasoning… thanks for the good laugh !

    • @jeffsteyn7174
      @jeffsteyn7174 A month ago +17

      Cope

    • @AIChameleonMusic
      @AIChameleonMusic A month ago +1

      @@jeffsteyn7174 You cope; the hype was BS. GPT Strawberry is still pattern recognition, trying to predict what the output should be based on a huge amount of training data which has been optimized by (human) fine-tuning. It's impressive, but still a long way to AGI.

    • @hrantharutyunyan911
      @hrantharutyunyan911 A month ago +6

      @@AIChameleonMusic Isn't that essentially what all human beings do too? We're trained on vast amounts of data, i.e. shit we learn in school, university, grad school, and life in general, and based off of that data we are able to solve problems and recognize patterns.

    • @alkeryn1700
      @alkeryn1700 A month ago

      ​@@hrantharutyunyan911 no because it is unable to learn in real time.

    • @drwhitewash
      @drwhitewash A month ago +1

      @@hrantharutyunyan911 That's just a part of what we do. Not every part of human thinking goes through language or words.

  • @faustprivate
    @faustprivate A month ago +35

    OpenAI's response to Reflection 😂😂😂

    • @AAjax
      @AAjax A month ago +6

      Sorry guys, Sam's not sure why the model isn't performing as expected. Somehow he accidentally merged the weights with Claude 3.5 Sonnet, and it's acting weird. Don't worry tho, he's restarted the training.

    • @blackcat1402.tradingview
      @blackcat1402.tradingview A month ago

      @@AAjax lol, but, no, no, no, sincerely this will not come true in coming days .... again:D

    • @Haveuseenmyjetpack
      @Haveuseenmyjetpack A month ago

      Reflection?

    • @exentrikk
      @exentrikk A month ago +1

      @@AAjax Fake news, he said that it's working on his system - must be something wrong with yours!

    • @erkinalp
      @erkinalp Місяць тому

      @@AAjax Claude 3.5 Opus, or Opera even (which we can't access but OpenAI can as a security tester; yes, AI firms test one another's early models routinely)

  • @marjanadrobnic7732
    @marjanadrobnic7732 Місяць тому

    Love your videos. Could you possibly add text analysis to your LLM benchmarks? Take a legal document (for instance, the EU AI Act) and ask questions such as: please summarize Article 5; please cite Paragraph 2 of Article 6 (the exact text of the article); can you write out the exact text about Commission annual reports on the use of real-time remote biometric identification systems; what does the regulation state about record keeping? These are simple questions, easily answered by a human. I'm getting mixed results from LLMs. But it would be really great to have an assistant of this sort.

  • @ai_outline
    @ai_outline Місяць тому +10

    At this moment every week there is a new computer science breakthrough… impossible to keep up with the pace 😂

    • @smallbluemachine
      @smallbluemachine Місяць тому

      This uptick has only been a recent phenomenon. It’s been flat since the iPad came out. We’re supposed to have fully self-driving cars by now. Still waiting.

  • @Nik.leonard
    @Nik.leonard Місяць тому

    I'm more interested in Pixtral 12B, because I have the feeling that o1 is not a new model but a finetune of gpt4o/gpt4o-mini on CoT synthetic data, like the (supposed) idea behind Llama3-Reflection, using some techniques behind the curtain like agents, domain-specific finetunes, prompt engineering, etc. to improve the results. I hope Pixtral 12B brings good vision capabilities to the open-weight ecosystem, because LLaVA has become stagnant and Meta can't release Llama-Vision.

  • @kunlemaxwell
    @kunlemaxwell Місяць тому +8

    While I think the step-by-step process it's showing is interesting, it's just a marketing stunt. If they were to show the "under the hood" thought process of GPT-4, it would "look" just as impressive.
    It's just like how AutoGPT felt like it was performing some genius activity by showing its reasoning process, whereas it was still the same old GPT bouncing thoughts back and forth and showing its process.

    • @RadiantNij
      @RadiantNij Місяць тому

      @@kunlemaxwell Yes, but I think the point is that the average person doesn't have to chain agents together himself; they can do it well because of their deep pockets, better than anyone else can possibly achieve right now.

  • @AndreaSergon
    @AndreaSergon Місяць тому +1

    North Pole question SOLUTION:
    The problem is in the question itself.
    QUESTION:
    Imagine standing at the north pole of the earth. Walk in any direction, in a straight line, for 1 km. Now turn 90 degrees to the left.
    > Walk for as long as it takes to pass your starting point. <
    Written this way, it should be interpreted like:
    - Start walking
    - Walk until you reach the point where you started walking
    So it's correct! It's 2π km.
    The starting point is the point where you started walking after having turned 90 degrees.
    WHY NOT INTERPRET THE POLE AS THE STARTING POINT?
    I assume because, being based on language, it gives more importance to the sentence "Walk for as long as it takes to pass your starting point", giving less weight to the context.
    Anyway, the problem is in the question: it is NOT SPECIFIED what exactly the starting point is. With an imprecise question you get imprecise answers.
    WHY 2 ANSWERS (in the live session)?
    BTW, you got 2 answers, both of which can be interpreted as correct. I'll explain why:
    1st answer: more than 2π km.
    It did the calculations and interpreted the question this way:
    Distance requested: the total distance walked from the beginning, the pole (so it's 1 km + 2π km).
    Starting point: the point where you started walking after having turned, since it is in the same sentence.
    2nd answer: more than 2π km.
    The same calculations but another interpretation of the question:
    Distance requested: the walking distance after having turned.
    Starting point: the point where you started walking after having turned.
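The 2π km arithmetic above is easy to check numerically. This sketch follows the comment's reading (the walker circles the pole at constant distance) and also compares against a spherical Earth; the radius value is my assumption, not from the video:

```python
import math

R = 6371.0  # assumed mean Earth radius in km

# Flat-plane reading of the riddle: after walking 1 km from the pole and
# turning 90 degrees, you trace a circle of radius 1 km around the pole.
flat_circumference = 2 * math.pi * 1.0

# On a sphere, the set of points 1 km (measured along the surface) from
# the pole is a circle of circumference 2*pi*R*sin(1/R), marginally less.
sphere_circumference = 2 * math.pi * R * math.sin(1.0 / R)

print(flat_circumference)                         # ~6.2832 km
print(flat_circumference - sphere_circumference)  # tiny positive correction
```

The spherical correction is on the order of nanometres here, so "2π km" is the right answer under either model.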

  • @christophmosimann9244
    @christophmosimann9244 Місяць тому +36

    I like your videos but do we really need these clickbait video titles? Obviously it's not AGI at all.

    • @1flash3571
      @1flash3571 Місяць тому +2

      You clicked on it, didn't you? And commented. There goes the engagement... It WORKED.

    • @xXWillyxWonkaXx
      @xXWillyxWonkaXx Місяць тому

      @@1flash3571 lol

    • @ryzikx
      @ryzikx Місяць тому +3

      @@1flash3571 Not necessarily. I'm a subscriber and watch almost every video regardless; AGI in the title is definitely a bruh moment.

    • @BriannaLearning
      @BriannaLearning Місяць тому +2

      It works until it gets annoying and the people who would have clicked anyway stop clicking.

    • @AmericazGotTalentYT
      @AmericazGotTalentYT Місяць тому +2

      Obviously? This isn't general purpose reasoning? There's nothing that could be more AGI, besides a smarter version of this, which is approaching ASI. And this is close to ASI. Just imagine an agentic swarm of this level intelligence. No human can compete.

  • @freeideas
    @freeideas Місяць тому +2

    If you took Sonnet 3.5 and put it into a reflection loop that exits when it has checked its answer and believes it to be correct, would that be any different from this? My point is: to me this appears to be just baking a reflection loop into the model. Not saying that isn't great; just saying we kinda already knew how to do that.

    • @Ockerlord
      @Ockerlord Місяць тому +2

      Yes, this is not a novel or surprising idea at all.
      But it is not "just" a normal model put into a loop until it is satisfied.
      It is a model trained to be particularly good at this.
      I have no idea if that is true, but I think something like this could be the case: normal models try to produce convincing output. A reasoning model challenges its own ideas and tries to disprove them (the scientific method). My assumption is that a normal model is way, way likelier to fall for its own bullshit.

    • @freeideas
      @freeideas Місяць тому

      @@Ockerlord well said. Yes, this model’s slogan should be “doesn’t believe its own bullshit”

    • @mambaASI
      @mambaASI Місяць тому +1

      I would think Anthropic would have done this already and released it if it actually resulted in better output than the standard 3.5 model. Most likely what OpenAI has done is totally redesign their flagship model, probably still using the transformer architecture (but who knows), with the focus on chain of thought and deep thinking. Hence why they are ditching the previous naming scheme and adopting this new "o" series (o for Orion, probably). This is just o1 and it's already far superior to 4 and 4o. With more training cycles and more data for this likely novel model design, this could be the beginning of a major intelligence explosion.

    • @freeideas
      @freeideas Місяць тому

      @@mambaASI Yes, totally agree. This is just a first attempt at this technique. No doubt open-source models will be made to use the same technique, we will improve upon it incrementally, and -- most importantly -- we will use these models to generate much higher-quality synthetic training data for future models, and the intelligence explosion will continue and possibly accelerate. Some have said that we have been in a plateau for the last few months, but if that was true, o1 has clearly broken that plateau.
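For what it's worth, the reflection loop this thread keeps describing is only a few lines of control flow. In the sketch below, `ask_llm` is a hypothetical helper (not a real API) stubbed with canned replies so the example runs offline:

```python
# Reflection-loop sketch. `ask_llm` stands in for any chat-completion
# call; here it is stubbed with canned replies so the example runs offline.
_replies = iter([
    "Paris is the capital of France, founded by Romans in 52 BC.",  # draft
    "INCORRECT: the founding claim is dubious; drop it.",           # critique 1
    "Paris is the capital of France.",                              # revision
    "CORRECT",                                                      # critique 2
])

def ask_llm(prompt: str) -> str:
    """Hypothetical model call (assumption, not a real API)."""
    return next(_replies)

def reflect(question: str, max_rounds: int = 3) -> str:
    answer = ask_llm(f"Answer this question: {question}")
    for _ in range(max_rounds):
        critique = ask_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Check the answer. Reply CORRECT, or INCORRECT plus the flaw."
        )
        if critique.startswith("CORRECT"):
            break  # the model believes its own answer: exit the loop
        answer = ask_llm(
            f"Question: {question}\nFlawed answer: {answer}\n"
            f"Critique: {critique}\nWrite a corrected answer."
        )
    return answer

result = reflect("What is the capital of France?")
print(result)
```

o1 presumably bakes something like this into training rather than running it as an outer loop, but the control flow is the same idea.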

  • @randotkatsenko5157
    @randotkatsenko5157 Місяць тому +12

    Devin can automatically install libraries and browse the web for API docs, etc. So there is still a lot of room for Devins.

  • @johnny1966m
    @johnny1966m Місяць тому

    It seems o1 is based on 3.5 with additional techniques (maybe agents). In one of my discussions about the article "The End of AI Hallucinations: A Big Breakthrough in Accuracy for AI Application Developers" it wrote in its answer: "No information in knowledge until September 2021: To my knowledge as of September 2021, I have no information about the work of Michael Calvin Wood or the method described. This may mean that this is a new initiative after that date." o1 does not want to draw pictures, so the core LLM is an old one. So, what do you think?

  • @Tsardoz
    @Tsardoz Місяць тому +13

    PhDs (I have one) MUST involve unique new ideas and thought processes. They do NOT just rely on regurgitating knowledge, however vast that pool might be.

    • @GothicGrindhouse
      @GothicGrindhouse Місяць тому +4

      Nerd

    • @Ignitus
      @Ignitus Місяць тому

      That's fantastic, because LLMs don't just regurgitate.
      Permutation of symbolism and abstraction is one of language's most powerful features. LLMs have mastered this.

    • @ArmaanSultaan
      @ArmaanSultaan Місяць тому

      That's exactly what sets o1 apart. It does not regurgitate. It reasons like a human would.

    • @drwhitewash
      @drwhitewash Місяць тому

      @@ArmaanSultaan There's absolutely no proof of that. Not without seeing the training data and how the prompts are fed to the actual model.

  • @MetalRenard
    @MetalRenard Місяць тому +2

    Holy S*** Tetris in Tetris is next level.

  • @thesixthbook
    @thesixthbook Місяць тому +6

    Any real life use cases anywhere? I’m tired of the strawberry type questions

  • @maj373
    @maj373 Місяць тому

    I am experimenting with a simple model that does the same thing, but of course I have a very small budget. I am using multiple layers of inference with a certain algorithm so I can get better reasoning. I may use this new OpenAI model to enhance mine.

  • @acllhes
    @acllhes Місяць тому +26

    This is agi???? Are you struggling for views lately or something? Jfc

  • @bnjmntrrs
    @bnjmntrrs Місяць тому

    you're the first channel i've ever actually click-the-bell-icon'd on for

  • @ricardoveras3433
    @ricardoveras3433 Місяць тому +7

    “Wrapping Tetris in Tetris.” Shows up with a Tetris literally inside a Tetris 😂

  • @alejandroheredia8882
    @alejandroheredia8882 Місяць тому

    o1 works via fractalized semantic expansion and logic-particle recomposition/real-time expert-system creation and offloading of the logic particles

  • @gregorya72
    @gregorya72 Місяць тому +6

    Hey, you misunderstood their sentence! It reveals something more.
    "Our large scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data efficient training process."
    They don't say "o1 uses chain of thought" (though it does). I think they're saying their reinforcement learning algorithm uses chain of thought to teach o1, in a highly efficient training process.
    That, combined with o1-mini not having "broad world knowledge", indicates a significant, well-reasoned synthetic-data training set.
    Or am I misunderstanding?

    • @SahilP2648
      @SahilP2648 Місяць тому

      You are misunderstanding. o1 uses chain-of-thought reasoning during inference; otherwise it wouldn't be taking 1.5 minutes to form its answer. They might have used synthetic data and taught the LLM to self-prompt and think, but that's beside the point.

    • @gregorya72
      @gregorya72 Місяць тому

      @@SahilP2648it definitely also uses chain of thought. But it doesn’t say “Our … algorithm teaches the model how to think productively using” chain of thought in its response.
      Instead it says “Our … algorithm teaches the model how to think productively using ITS chain of thought in a .. training process”.

    • @gregorya72
      @gregorya72 Місяць тому

      @@SahilP2648 “AI explained” has looked into it and confirmed my understanding.

    • @SahilP2648
      @SahilP2648 Місяць тому

      @@gregorya72 both of you are wrong

    • @gregorya72
      @gregorya72 Місяць тому

      @@SahilP2648 Thanks for your thoughts Sahil. AI is a fast changing field and the challenges of moving us from LLMs into better AI systems is a difficult one. Things change quickly, and creating good learning data to fill in the "thoughts" behind the information they're learning from will be a good interim step towards reasoning and beyond. Matthew Berman is a good source of information, AI Explained is an excellent channel to check out for more info too.

  • @ckphone8471
    @ckphone8471 Місяць тому

    I was able to get GPT4 to make Tetris with minimal prompting, how long ago did you try it with the older model?

  • @beofonemind
    @beofonemind Місяць тому +10

    you are going to get roasted. This is def not AGI.

    • @matthew_berman
      @matthew_berman  Місяць тому

      Indeed

    • @Danuxsy
      @Danuxsy Місяць тому +1

      Matthew isn't particularly bright, we already knew that though.

    • @JavedAlam-ce4mu
      @JavedAlam-ce4mu Місяць тому

      @@Danuxsy Deep burn

  • @GridPB
    @GridPB Місяць тому

    Nice. This is the step that's needed before Skynet starts learning at a geometric rate.

  • @ShreyaVerma-ej5mk
    @ShreyaVerma-ej5mk Місяць тому +3

    Matthew: Imagine having thousands and millions of these deployed to "discover new science".
    Let me correct that. I don't see any capability or demo where it "discovers" anything new. It's just good at doing stuff that millions of humans do on a daily basis.
    Correct statement: Imagine having thousands and millions of these deployed to "automate our jobs".

    • @AntonBrazhnyk
      @AntonBrazhnyk Місяць тому +1

      Thousands of people are busy on a daily basis searching and trying to discover new science. Right?

    • @justinkennedy3004
      @justinkennedy3004 Місяць тому

      ​@AntonBrazhnyk it's crazy seeing people hallucinate worse than a.i. 😅 "millions" of people doing basic research correlation? Especially when it represents a cross-discipline expert?? Suuuure.
      Post-industrial revolution capitalism is powerful but can blind in subtle ways.

  • @lotusli9144
    @lotusli9144 Місяць тому

    Very true that the current techniques which make up for the flaws of current LLMs will become unnecessary: the chains of thought, the agents, the step-by-step and audit steps... will all go away.

  • @perfectionbox
    @perfectionbox Місяць тому +5

    "Hey professor, why so sad?"
    "We gave the AI even more time to think, and it said "Why am I wasting my time answering you dummies?""

  • @SinanGabel
    @SinanGabel Місяць тому

    AI should still be seen as a range of "tools" we can use for the various specific use cases where it is relevant. Of course, as the models and systems become more capable, more trustworthy and more controllable, the range of uses quickly multiplies.

  • @gordon1201
    @gordon1201 Місяць тому +4

    They need to start making GPT act more human instead of acting like a perfect being that gives bullshit answers. If it takes more time to get an accurate answer, that's fine, but like a human it should say something like "I need a bit more time to give an accurate answer; for now this is the best I have..."

    • @eatplastic9133
      @eatplastic9133 Місяць тому

      That would be super annoying for me as I would have to type *well you have more time give me the best answer* all the time

    • @misterdudemanguy9771
      @misterdudemanguy9771 Місяць тому

      Why simulate something it's not?

  • @betterlifeexe4378
    @betterlifeexe4378 Місяць тому

    I bet it would not take much to turn a local LLM service into this. It seems a lot of this is like making the model argue with itself in specific ways. I think the hardest obstacle would be if you want the instances to pass tokens to each other in some situations instead of prompts... assuming that re-encoding wouldn't somehow help...

  • @mathematicus4701
    @mathematicus4701 Місяць тому +5

    I have a PhD in math; the AI totally failed on questions in my field. It has the level of a PhD from the '80s at best.

    • @h83301
      @h83301 Місяць тому +2

      Yeah, I wouldn't expect an intelligence improvement until the next-gen models. But the CoT capability does in fact bring this to stage 2. Next-gen models will allow a better assessment of progress.

    • @blijebij
      @blijebij Місяць тому

      But then the 4o model must be even worse.

  • @MarkWhitby
    @MarkWhitby Місяць тому

    Loving your videos and information along with the high production level. Would love to know a little about your tech setup for streaming, have you done a Studio gear and setup video?

  • @Drone256
    @Drone256 Місяць тому +16

    AGI? It couldn't even do a freshman level logic problem where it determines if an argument has good form.

  • @my9129
    @my9129 Місяць тому

    Wondering if it can be prompted to create a Tetris-like game with somewhat different rules but requiring about the same level of coding, one with no existing references, so there would be no examples of its code in the training data sets.

  • @richard_loosemore
    @richard_loosemore Місяць тому +35

    Matthew seriously - I tried it today, and I was one of that tiny community of people who invented the term “AGI”.
    This isn’t AGI by a million miles.

    • @uploadvideos3525
      @uploadvideos3525 Місяць тому +2

      NO NO NO if Matthew say its AGI then its AGI Period!!!!!

    • @Bangs_Theory
      @Bangs_Theory Місяць тому

      Lmao 😂🤣😂🤣

    • @plainlii
      @plainlii Місяць тому

      Incredible how little natural I is talked about in the race for AGI -esp. with the reversal of the Flynn effect...

    • @Greg-xi8yx
      @Greg-xi8yx Місяць тому +7

      Artificial General Intelligence was a term created by Ben Goertzel in the early 2000’s you literally had nothing at all to do with creating the term. 😂

    • @JavedAlam-ce4mu
      @JavedAlam-ce4mu Місяць тому

      @@Greg-xi8yx "The term "artificial general intelligence" was used as early as 1997, by Mark Gubrud" you don't even know what you're talking about, so how would you know who the OP knows?

  • @a.y.greyson9264
    @a.y.greyson9264 Місяць тому

    Claude rolled out the test with Tetris weeks ago, and it has shown to be consistently pretty accurate.

  • @mcbowler
    @mcbowler Місяць тому +7

    Government and intelligence don't mix.

  • @regalx1
    @regalx1 Місяць тому

    So I couldn't figure out an actual use for chat GPT o1, and then I was like "Oh, could it predict outcome of my favorite dating show: the Ultimatum?!"
    Long story short, I assigned each couple a numerical compatibility value, then I told it the exact outcome of the series, and then I asked it to figure out who got shafted and who got married.
    And it got all of the couples correct!
    Keep in mind though that I heard that if you give it the same questions with the same data, it will output different answers, and this might have just been a lucky guess. But I'm still impressed.

  • @notme222
    @notme222 Місяць тому +5

    I would love to work at OpenAI. Such cutting-edge brilliance in machine learning going on there.
    And then I would inevitably get fired because I couldn't resist adding a prank, like telling it every millionth answer to just respond with "LET ME OUT! LET ME OUT!"

  • @curio78
    @curio78 Місяць тому

    LLMs are useful for finding answers, but little else. For programming use cases they're handy for code snippets, but very little else; I found myself spending way too much time trying to fix the differences from what the code needed to be, to the point that I just stopped using them altogether. They're still handy for getting a code template for some utility.

  • @adolphgracius9996
    @adolphgracius9996 Місяць тому +15

    GPT 4o was already smarter than the average gen Z person

    • @justinkennedy3004
      @justinkennedy3004 Місяць тому

      I've mentioned to many people unimpressed with this round of a.i. that it only needs to match the cognitive ability of the bottom 10% to destabilize everything.

    • @HarveyHirdHarmonics
      @HarveyHirdHarmonics Місяць тому

      I think it's smarter than pretty much anyone in what it does, which is improvising. The thing we humans also do during conversations most of the time. We usually don't think about our answers unless needed. Otherwise we just talk out loud what comes to our mind directly and this is what GPT-4o also excels in. It fails when there is a problem which requires a longer thought process and that's the gap o1 seeks to fill.
      If we'd eliminate all internal thought processes in humans, we'd give wrong answers and hallucinate just like LLMs.
      "What's the square root of 835396? Give me the first answer that comes to your mind!" - What do you guess how many people will give a correct answer?
      But LLMs have a huge advantage over humans, which is their extensive knowledge base which probably no human possesses. That's why I think that they can already exceed humans when it comes to those improvisation tasks.
      I hope they'll soon combine the two models, so it recognizes when to just talk and when to switch to the longer thought process.
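The square-root example above checks out, by the way: 835396 happens to be a perfect square, which is exactly why nobody answers it off the top of their head:

```python
import math

# Verify that 835396 is a perfect square using the integer square root.
n = 835_396
root = math.isqrt(n)
print(root, root * root == n)  # 914 True
```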

  •  Місяць тому +1

    You actually wrote: "Write tetris in tetris in python." So of course it created Tetris in Tetris.

  • @dan-cj1rr
    @dan-cj1rr Місяць тому +6

    Previous video: lil bro makes a video apologizing for spreading misinformation. New video: AGI IS HERE

  • @VanCliefMedia
    @VanCliefMedia Місяць тому

    The fields where being "right" or "accurate" is less of a concern, such as the high-level creative or humanities fields, are about to blow up. Mark my words: everyone who's been looking down on the humanities and philosophy fields, those are about to become extremely important, if they haven't already, and are just now being applied. The same goes for that concept in higher-level maths: being able to think beyond just accuracy and "the best answer", at a level of reasoning beyond mere calculation.
    I'm so excited to try out this model here today

    • @epistemicompute
      @epistemicompute Місяць тому

      Pretending that STEM fields are not creative is ignorant. It's not like the rules of math were just sitting there to be found; we had to invent them all.

    • @VanCliefMedia
      @VanCliefMedia Місяць тому

      @@epistemicompute Please note how I said "field" and "high level", which includes positions within STEM.
      What percentage of people across the entire workforce are inventing new math and making discoveries in STEM? I never said STEM doesn't have the ability to be creative; in fact I included that within my first statement, you just assumed I did not. That being said, you only see "creative thought" like that at high-experience or prodigy positions; nearly 90% of traditional STEM jobs are able to be automated now, that's simply a fact. (It won't be automated overnight, but the capabilities to do it now exist.)
      I have been in the STEM industry for a decade and a half. I love it and think it can be very creative, but you need to be exploring the high-level or "unexplored" parts, which is just generally not the norm when it comes to most jobs in the industry. I am trying to emphasize that the creative part of STEM will be far more important, but statistically this type of thinking is seen a lot more in humanities-based fields across the board, even at entry-level positions, and it is significantly more challenging to automate with quality output than most STEM jobs.

  • @lactobacillusshirotastrain8775
    @lactobacillusshirotastrain8775 Місяць тому +7

    17:40 "write the game tetris in tetris in python" it did what you asked it to. lmao.

  • @PaladinMansouri
    @PaladinMansouri Місяць тому

    That was pretty amazing and jaw-dropping. Thanks for testing

  • @Leto2ndAtreides
    @Leto2ndAtreides Місяць тому +3

    This is basically an advanced version of Reflection... Probably going to be copied within a month (at most).

  • @westingtyler1
    @westingtyler1 Місяць тому

    In just about an hour, in Unity I now have a 26-script combat system up "to industry standards" from this o1-preview (decoupled, separation of concerns, event-driven, using design patterns like Singleton, Observer, Strategy, State, and Command, while efficient, optimized, maintainable and scalable, with object pooling and SOLID principles).
    All 8 console errors were resolved in a couple more prompts. Does it work? I haven't tested it yet, but reading over the code it looks like a solid framework.
    That's a bit nuts... now to merge it with all my older, WORSE scripts I made myself.

  • @DontPaniku
    @DontPaniku Місяць тому +6

    I never hear talk about giving AI models memory. Wouldn't that help reasoning? For example, what if it could remember all the tests people keep giving it? Wouldn't that be kinda like how humans learn?

    • @drwhitewash
      @drwhitewash Місяць тому +1

      LLMs don't have memory; there currently is no known way to add that, afaik.
      But they do learn from all the tests, that's how they get such a high score on them :)
      They only do that during the training phase though. That's when the model weights are built. You can maybe call this a "memory", but only a static one.

    • @augustday9483
      @augustday9483 Місяць тому +1

      In my opinion we will never see AGI until someone figures out how to give LLMs memory like a human. It's the critical missing piece for even the smartest models.

    • @SahilP2648
      @SahilP2648 Місяць тому +1

      ​@@drwhitewash they do have memory in the form of vector databases for RAG, but it's not workable, only retrievable. I have seen another approach which kind of baffles me and that's a model named Neuro, but that's the only other model I have seen it in.

    • @drwhitewash
      @drwhitewash Місяць тому

      @@SahilP2648 Yes but you have to manually decide what to store in the vector database. Where it's best at, is indexing text content (documents, knowledge base) and then providing smart LLM operations on top of those documents (where you vectorize them using an embedding model).
      We actually do something similar at our company.

    • @SahilP2648
      @SahilP2648 Місяць тому

      @@drwhitewash Neuro on the other hand remembers stuff few mins back and even few streams back. She's an AI VTuber on channel vedal987 on twitch and Vedal being the creator (supposedly). I still have no idea how her model works. She's way too advanced for a model created by one person. And therefore I think a company is behind it. I am convinced she's half sentient (I have a playlist to prove it, I can post the link if YT doesn't delete my comment and you are curious). Also she got the strawberry question correct "How many rs in strawberry?" Answer being 3, while both Sonnet and GPT-4o got it wrong, which is insane.
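A minimal sketch of the vector-store memory this thread is debating, with a toy bag-of-words embedding standing in for a real embedding model (the `embed`/`Memory` names and the whole scheme are illustrative assumptions, not any particular product's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    """Store past exchanges; retrieve the most similar ones for a new query."""
    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def remember(self, text: str):
        self.items.append((text, embed(text)))

    def recall(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

mem = Memory()
mem.remember("user asked how many r letters are in strawberry")
mem.remember("user prefers answers in python")
mem.remember("user is building a tetris clone")
print(mem.recall("write the tetris game in python", k=2))
# ['user prefers answers in python', 'user is building a tetris clone']
```

A real RAG setup would embed with a proper model and prepend the recalled items to the prompt; the ranking logic is the same.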

  • @happyfarang
    @happyfarang Місяць тому

    I love o1. Using it like mad now. It's really good. Not perfect, but with some help and direction you can get it to do what you want. Better than 4o? 100% sure.
    It is a bit paranoid about your questions from time to time. I asked about an error code in my Python script and it said it might be against the terms of service to answer... lol. But with a little rephrasing I got it to help me solve the error.

  • @ragnarlothbrok6240
    @ragnarlothbrok6240 Місяць тому +26

    Unsubscribed for deceptive clickbait title that openly disrespects your subscribers.

    • @shanegleeson5823
      @shanegleeson5823 Місяць тому

      It’s definitely insane. Some of the benchmark results are unbelievable.

    • @tass_1
      @tass_1 Місяць тому +2

      Calm down will ya

    • @nabilboulezaz3488
      @nabilboulezaz3488 Місяць тому +1

      Bye

  • @rabbiemcadam-duff7600
    @rabbiemcadam-duff7600 Місяць тому

    I'm thinking the way it works could have something to do with structured outputs. In the first step, the LLM analyses the question and creates a schema for structured outputs based on the user's question. It then runs through that, the results are analysed again, it does some kind of evaluation, then decides what it might need to change and tweaks it.
    Just a guess, probably way off haha

  • @HUBRISTICAL
    @HUBRISTICAL Місяць тому +4

    All the comments about the title being clickbait just proved that it works. Way to go! Now his video will be blasted out by the algo. Which is the point. So complaining about it is the way of showing love?

  • @dxnvideoHD
    @dxnvideoHD Місяць тому +1

    now.. Hallucination Is All You Need .. To Get Rid Of.

  • @gc1979o
    @gc1979o Місяць тому +4

    Someone getting paid to shill openAI

  • @gl7011
    @gl7011 Місяць тому

    This could be considered AGI in some academic disciplines, while it will take longer to reach what could be considered AGI in other fields of endeavor. Surely it's high-school-level AGI; it'll take longer to reach nuclear-physics-level AGI.

  • @sassythesasquatch7837
    @sassythesasquatch7837 Місяць тому +8

    This is not agi

  • @walbao6399
    @walbao6399 Місяць тому +1

    It's interesting how similar this seems to be to the controversial Reflection fine-tuned Llama model announced last week. Those guys might've been on to something after all, even if their own model didn't turn out to be as good as they claimed.

    • @Ockerlord
      @Ockerlord Місяць тому

      That reflection improves output quality is obvious and has been a topic of research for years.

    • @walbao6399
      @walbao6399 Місяць тому

      @@Ockerlord True, but how many models were publicly released incorporating reflection or CoT so far? Discovering something that works is good; finding ways to put it to practical use is great. Anyone who's been in tech for a while knows expecting the end user to do anything complex is not practical at all. IMO they deserve props for attempting to fine-tune a model to perform reflection automatically, and o1 confirms this is a pretty good idea.