AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution

  • Published 5 Dec 2024

COMMENTS • 426

  • @CalConrad
    @CalConrad 1 day ago +473

    The best part about the next 12 days will be your 12 videos breaking it down.

    • @aiexplained-official
      @aiexplained-official 1 day ago +163

      God that might be a bit much! But I will be following everything scrupulously, don't worry

    • @countofst.germain6417
      @countofst.germain6417 1 day ago +76

      @@aiexplained-official We expect 12! No more, no less.

    • @a.thales7641
      @a.thales7641 1 day ago +13

      I want at least 3 videos. 4 is enough.

    • @daveogfans413
      @daveogfans413 1 day ago +1

      @@countofst.germain6417 Best he can give is one 12 minute video or twelve 1 minute videos.

    • @Dannnneh
      @Dannnneh 1 day ago +2

      Maybe in the format of those one-minute shorts.

  • @kairi4640
    @kairi4640 1 day ago +109

    I appreciate that you're one of the few AI YouTubers who actually admits there hasn't been big news recently and doesn't overhype the same stuff just for views.

  • @anywallsocket
    @anywallsocket 1 day ago +139

    learning a 'bag of heuristics' rather than explicit maths is how i skipped through my undergraduate degree lol

    • @aiexplained-official
      @aiexplained-official 1 day ago +11

      Nice

    • @sitrakaforler8696
      @sitrakaforler8696 1 day ago +12

      Like all Engineers hahahahah

    • @Kazekoge101
      @Kazekoge101 17 hours ago

      @@sitrakaforler8696 What is this mythical "bag of heuristics" you speak of? (asking as an 11th-grade math dropout)

  • @testservergameplay
    @testservergameplay 1 day ago +131

    By far the best AI YouTube channel. No unnecessary hype, no baseless rumors, no creepy AI art for thumbnails, just pure objective analysis.

    • @cashitortrashit9939
      @cashitortrashit9939 1 day ago

      That is so funny. I can hardly understand him; he talks like he has a mouthful of marbles, no doubt 🙄

    • @Adam-nw1vy
      @Adam-nw1vy 1 day ago

      The creepy AI art for thumbnails is god awful

    • @marpleka
      @marpleka 1 day ago

      Are you serious, or just naive?

    • @testservergameplay
      @testservergameplay 1 day ago

      @@marpleka I'm especially serious about the AI art thumbnails from other AI news channels

    • @nand3kudasai
      @nand3kudasai 9 hours ago

      No distracting music.

  • @Keatwonobe
    @Keatwonobe 1 day ago +151

    Being under a million subs this far in is insane. Your 'spidey sense' with the nuance of ai progress is unbelievably precise.

  • @Jasonknash101
    @Jasonknash101 1 day ago +27

    Thanks again for your great content. I love the fact that you can avoid clickbait and still give us compelling headlines.

  • @gubzs
    @gubzs 1 day ago +118

    If we solve hallucinations I am kicking off my personal "we are now in the future" project. I have spent the last year developing a procedural immersive fantasy world simulator _with the entire design portfolio_ formatted as instructions for agentic AI. I have roughly 400 pages of instruction from power balance formulas, to world history instantiation, magic systems, UI/UX, an LoD system for simulation granularity and information retention, on and on. it's been bounced off Claude from start to finish at each step so I know interpretation is solid.
    Such a thing _will_ exist. I will make certain of it.

    • @TheBouli
      @TheBouli 1 day ago +4

      nice! Would love to test it out when it becomes playable :)

    • @dot1298
      @dot1298 1 day ago +3

      me too!

    • @marc_frank
      @marc_frank 1 day ago +2

      cool

    • @maciejbala477
      @maciejbala477 1 day ago +2

      exciting! I'd definitely want to try it out once it comes out. So far, the only real game that is AI-driven is AI Roguelite, as far as I'm aware (I bought it but don't necessarily want to try it out just yet, as I'm waiting for 3rd party API key support since the dev told me he is considering adding it and it might come in a few weeks). AI Dungeon doesn't count, it's not really a game. Some could argue AI Roguelite isn't either (yet). That's one of the things I'm most excited about.
      But also, on the other hand, I don't actually think solving hallucinations is a trivial problem and it might never occur without architecture change, so your caveat about that definitely tempers my excitement. Would love to believe, though, lol

    • @anywallsocket
      @anywallsocket 1 day ago +2

      @@maciejbala477 indeed, as the video explains, hallucinations are in some sense necessary for model creativity. if you want something to generalize, it needs to know how to fantasize -- this is not avoidable since it does not know the latent space you expect it to generalize to, it must guess it.

  • @AlexanderMoen
    @AlexanderMoen 1 day ago +21

    I think the hallucination problem drastically drops once reliable, high-quality agent and function calling are out and accessible through an LLM. We just need something like a beefy o1 that has access to tons of tools that it calls reliably, and as long as those tools work properly, it'd be a huge leap forward. Fingers crossed that in the next 12 days OpenAI has something agent-related out.
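
The tool-calling idea in the comment above can be sketched in a few lines. This is a minimal illustration, not any specific vendor API: the tool names and JSON call format here are invented.

```python
import json

# Hypothetical tool registry: the model emits a JSON tool call,
# and the harness executes it deterministically with real code.
TOOLS = {
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

def run_tool_call(model_output: str):
    """Parse a model-emitted tool call and execute it, so the final
    answer comes from the tool, not from token statistics."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](*call["args"])

# The model only has to get the *call* right; the arithmetic is exact.
print(run_tool_call('{"name": "multiply", "args": [123456, 789]}'))  # 97406784
```

The point of the sketch: as long as the model can reliably emit the call, the answer's correctness depends on the tool, which is exactly the leap the comment is hoping for.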

  • @therainman7777
    @therainman7777 1 day ago +30

    The OpenAI announcement AND a new AI Explained video? Merry Christmas everyone! 🎄

  • @tituscrow4951
    @tituscrow4951 1 day ago +16

    The hallucination problem is this: it will be a long time before a model can be the fail-safe for a process that has to judge physics or maths and get it right one-shot every time. That puts a lot of uses an AI would be perfect for out of the picture, for the foreseeable future anyway.

    • @anywallsocket
      @anywallsocket 1 day ago

      These are, by definition, guessing machines: their errors don't vanish (as they would if it learned to perform actual mathematics), they only shrink.

    • @fabim.3167
      @fabim.3167 1 day ago +5

      @@anywallsocket The same is true for the best human mathematicians!

    • @anywallsocket
      @anywallsocket 1 day ago

      @@fabim.3167 Haha, very true! But not the same for typical functional programming -- which is what people colloquially associate AI with, and hence all their surprise when it gets stuff wrong.

  • @micbab-vg2mu
    @micbab-vg2mu 1 day ago +9

    I turned the quiet period into a blueprint. By mapping every workflow (medical department in big pharma), I've created a roadmap for the AI agents that are already knocking at our door.

  • @breadbro0004
    @breadbro0004 1 day ago +12

    You are so fast! Best AI news channel imo

  • @David-tp7sr
    @David-tp7sr 1 day ago +7

    You'll never be able to reduce "hallucinations" to 0 in most cases (unless you have a subject that you can use a verifier with, but that's not the LLM part doing the work). It's a feature of the encoding mechanism of neural networks. The brain works similarly: it hallucinates reality and then grounds it with evidence.

    • @puppergump4117
      @puppergump4117 16 hours ago

      As I understand it, the only way to reduce hallucinations is by limiting the diversity of the output. So if you ask how to make a nuke, it'll either hallucinate and give a bunch of misordered steps, or give an overview that's more accurate.
      You could probably make the thing not hallucinate just with the right context.
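
The diversity-vs-accuracy trade-off the comment describes is usually controlled through sampling temperature. A minimal sketch of how lower temperature concentrates probability mass on the top token (the logit values are arbitrary examples):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the next-token distribution, so sampling
    becomes less diverse and closer to the single most likely answer."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
diverse = softmax_with_temperature(logits, 1.5)  # flatter: more varied output
greedy = softmax_with_temperature(logits, 0.2)   # peaked: near-deterministic
assert max(greedy) > max(diverse)  # less diversity at low temperature
```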

  • @reza2kn
    @reza2kn 1 day ago +3

    That feeling when you open YouTube and there's an AI Explained video in your feed! 🔥😍

  • @phronese
    @phronese 1 day ago +4

    good insights from those papers that temper the expectations, looking forward to the full review of those papers

  • @N8O12
    @N8O12 1 day ago +16

    Literally like an hour ago I was scrolling through YouTube and thought 'I wish AI Explained would upload again'.
    Awesome video as always, by the way.

  • @Isabelle-w7i
    @Isabelle-w7i 1 day ago +8

    Glad you mentioned China’s push in AI towards the end of your video. Despite working with comparatively limited hardware, they’ve managed to shrink what could’ve been a multi-year "moat" to almost nothing. It’ll be fascinating to see how this plays out in the future!

    • @adanufgail
      @adanufgail 1 day ago

      @@Isabelle-w7i I mean, Llama is at GPT-3 levels and is open source, so it's not hard to see how the world's largest electronics-manufacturing country could catch up.

    • @theforsakeen177
      @theforsakeen177 23 hours ago

      @adanufgail Open source or closed source means nothing to China; if it's stored digitally, they've got it.

  • @jackdurose3542
    @jackdurose3542 1 day ago +4

    Re: Genie 2, I don't think a high fidelity world model is necessary for embodied agents. I'd see it as kind of like imagination. We don't need to model physics accurately in our heads to know roughly what will happen if we drive a car off a cliff, and I'd guess, in the context of embodied agents, the purpose of something like this is similar.

  • @vectoralphaSec
    @vectoralphaSec 1 day ago +4

    I'm overly excited about all your coverage over these next 12 days. Hopefully OpenAI surprises us with really exciting and cool announcements.

  • @ricosrealm
    @ricosrealm 1 day ago +9

    Did anyone notice the defects in the aerial shot of the neighborhood? There are some driveways that don't actually connect to the street, along with other unrealistic aspects.

    • @anonymes2884
      @anonymes2884 1 day ago +4

      Yeah, the roofs of the houses in the middle blend into the road etc. It's impressive but it's a lot like every other AI image generator - seems great until you _really_ look at it and then there are almost always weird flaws.
      (which is great in one sense - we can still _pretty much_ tell an AI image from reality, maybe except in mostly neutral categories like landscapes etc.)

    • @anywallsocket
      @anywallsocket 1 day ago

      @@anonymes2884 yeah it's just HD slop

  • @keeganpenney169
    @keeganpenney169 1 day ago +3

    Knocking the real news out of the park; that's why we love you and the channel and community here, Phil! ❤

  • @alan2here
    @alan2here 1 day ago +13

    All humans use the bag of heuristics approach, it's called thought. Even top physicists find counterintuitive physics in the world, and think in a collection of rules of thumb.

    • @juandesalgado
      @juandesalgado 1 day ago +2

      I was going to say, humans also hallucinate spherical cows. :) Reductionism is a thing.
      Also, I have the impression that the hallucination problem is more a matter of expression than of psychosis. We all imagine in our minds, then choose (hopefully) what to say. The models should be free to "hallucinate" inference-time tokens, but then choose to voice out loud only facts that they can confirm, or at least qualify those facts that they cannot.

  • @juandesalgado
    @juandesalgado 1 day ago +1

    In the "games / interactive videos" at 6:38, it would be interesting to see if the model recognizes the boundary before a body of water; that is, if it prevents you from falling into the water, or if the character begins to wade through (or walk over the surface?!), or if it switches to swimming.

  • @TheForbiddenLOL
    @TheForbiddenLOL 1 day ago +15

    Very interesting. "High probability but not reliability" seems to implicate "System 1 thinking", and would mean that an architecture change to allow LLMs to collaborate with classical symbolic computing (something more robust than function calling) would be required - which I know is an idea that's been brought up a lot, but it seems like people are still focusing on trying to get the "instinctual world model predictions" to be 100% free of hallucinations, despite humans frequently making mistakes with their immediate, unconscious intuitions about the physical world.

    • @aiexplained-official
      @aiexplained-official 1 day ago +1

      Well put

    • @Lerc0
      @Lerc0 1 day ago +2

      Does the "high probability but not reliability" extend to chain of thought, though? There is theoretically the possibility that an LLM could systematically write out and perform each of the individual stages of a classical symbolic process with a probability high enough that it could be considered reliable.
      Once you get into running a variable number of iterations modifying a state, there is a much greater possibility for controlled 'thinking'. It becomes conceptually quite similar to RNNs. Chain-of-thought is at the blunt-instrument level, but what happens if, instead of tokenising the inner monologue when it is talking to itself, it adds custom embeddings to the context? Or even just looping through the same layers multiple times. These are easy to implement; it's the training process that gets hard here. It's like training on the next text token alone has Sapir-Whorf'd ourselves.

    • @MarkusRessel
      @MarkusRessel 20 hours ago

      I thought this would happen when OpenAI announced plugins for things like WolframAlpha, but nothing has changed since. I am highly skeptical of statements like "hallucinations will be eliminated in 2025".

  • @11Petrichor
    @11Petrichor 1 day ago +1

    The way you described how the AI multiplies numbers reminded me of Daniel Tammet. He's a savant who can multiply long numbers in his head with very high precision. The way he described how he did it was like synesthesia: he visualizes numbers with textures and shapes and somehow combines them into an output.

  • @En1Gm4A
    @En1Gm4A 1 day ago +9

    That's exactly the problem with LLMs: they need a graph backbone (a middle layer) for solid representation of functions, symbolic abstractions, and the world. The creativity is well needed for creating that middle layer and for creative work, but not for most tasks.

    • @anywallsocket
      @anywallsocket 1 day ago +1

      Probably they could be interfaced with a game engine, which they could learn to control to generate situations, but invariably the engine would compute the resulting physics -- then you can feed back on that and could get some mildly effective self-optimizations.

    • @skierpage
      @skierpage 20 hours ago +3

      Maybe, but that's easier said than done! Decades of AI research in "solid representation and symbolic abstractions" got us approximately nowhere while in only a few years generative language models mastered language and almost anything that can be expressed as a sequence of characters.
      Meanwhile, make LLM a tool user. It writes mini programs to compute answers, process data it pulls from the web, etc. It's the difference between talking to a slightly-drunk extremely smart person at a cocktail party vs. asking her things while she's sitting in front of a computer.

    • @En1Gm4A
      @En1Gm4A 20 hours ago +1

      @@skierpage Yeah, true, but it really seems like that solid abstraction might be useful, and potentially much more power-saving. Or maybe solid abstraction is just the retrospect of a process that looks totally different. Let's see what turns out to be true. I am here for the ride.

    • @anywallsocket
      @anywallsocket 20 hours ago

      @@skierpage lmao I like that metaphor!

    • @anywallsocket
      @anywallsocket 20 hours ago

      @@En1Gm4A in terms of power saving it’s really cooling these data centers, which I hear burn through pools of water every day just to keep gpt and the like online. The issue as I see it is that we’ve got computation down for now, but we have no good memory systems - ie like the brain, so we don’t have to re-compute the same sort of prompts all the time. Biological memory is leagues beyond the artificial stuff, unlike neural network computations. Neuromorphic is likely the way forward.

  • @FakoyedeTimilehin
    @FakoyedeTimilehin 1 day ago +5

    Another banger from Phillip. Bravo!!

  • @JustSuds
    @JustSuds 1 day ago +3

    The solution to having one apparent model that does well at hallucinating creative work and also reliably does physics is just a higher-level mixture of experts. The user converses with a model that is specialised in interfacing with the user, and it has access to other specialised expert models as tools. That way the prose model can level up independently from the physics model and the biology model, and so on. It comes back to that jagged frontier.
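
A toy sketch of the routing idea in the comment above. Everything here is invented for illustration: in a real system both the router and the specialists would be models, not Python functions, and real mixture-of-experts routing happens inside the network per token.

```python
# Stand-ins for specialist back-end models behind a user-facing front end.
# eval() is only safe here because the inputs are our own fixed examples.
SPECIALISTS = {
    "math": lambda q: str(eval(q)),                    # exact arithmetic expert
    "prose": lambda q: f"A short story about {q}...",  # creative-writing expert
}

def route(query: str) -> str:
    # Trivial heuristic router: digits mean "math"; a real router
    # would itself be a learned model deciding which expert to call.
    domain = "math" if any(c.isdigit() for c in query) else "prose"
    return SPECIALISTS[domain](query)

print(route("17 * 19"))  # answered exactly by the math specialist
print(route("dragons"))  # answered creatively by the prose specialist
```

The design point matches the comment: each specialist can improve independently, and the front end only needs to learn when to hand off.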

  • @ExploreTheMind-kg8je
    @ExploreTheMind-kg8je 4 hours ago +1

    AI Explained: Cut Through The Hype And Straight To the Point. Always informative, entertaining and grounded in reality, thank you!!!

  • @ekstrajohn
    @ekstrajohn 1 day ago +12

    Explaining the heuristics mechanics is gold

  • @toddwmac
    @toddwmac 1 day ago +1

    The reason I look to you for the real news....period. Thanks and happy holidays.

  • @OriginalRaveParty
    @OriginalRaveParty 1 day ago +2

    Thank you for being the voice of integrity in a space full of overhyped clickbait 🏆

  • @Rawi888
    @Rawi888 1 day ago +1

    Glad to be here. Great reporting my brother. 🩶

  • @AngeloWakstein-b7e
    @AngeloWakstein-b7e 1 day ago +1

    Been waiting for SOOOOOO long for one of your videos! love them and cannot wait for more

  • @serg331
    @serg331 1 day ago +3

    You are the best ai channel. You are the one whose videos I look forward to. You are the one who doesn’t waste my time, and doesn’t lie to me. Thank you.

  • @lizardrain
    @lizardrain 1 day ago +2

    9:23
    Isn't that why humans have a logical side of the brain and a creative side of the brain? They shouldn't be mixed together as one model, you need models to represent different parts of the brain.

  • @wck
    @wck 1 day ago +2

    Hey, I'm wondering if your SimpleBench tests include a disclaimer to the LLMs that the question is designed to confuse LLMs, with a warning to look for irrelevant information and distractors? Because Andrew Mayne recently wrote a blog post criticizing Apple's AI reasoning paper by showing that their questions get 90% better results if the prompt has a simple notice like that at the top.

    • @aiexplained-official
      @aiexplained-official 1 day ago +1

      Yes we cover that in the technical report, it does boost results a little bit to warn them, some more than others, but only to the 50-55% range

  • @Charles-Darwin
    @Charles-Darwin 1 day ago +5

    I've been saying this "bag of heuristics" is the key for over a year now. Think about it: LLMs, at the root of their inner workings, are heuristics of language - the semantics... the stuff "in-between". Then, clearly, this applies to audio-visual systems. Their patterns are heuristics, and since there's this common root across the domains we know of, zooming in and out of scopes of any type is possible. Heuristics are heuristics because they apply to every level of our own universe (in their concentration)... it's just the process of gleaning or deducing and then isolating and proofing; it gradually becomes an immutable truth.
    What's even weirder about all of our human active systems is that we learn these things innately, years and years before we can even begin to define them at their essential parts. You learn what the force of gravity is within a month or two, and how to exert yourself on it/with it. You understand the way light works within about that same time... but it took centuries of generations before a human actually proved these things out mathematically. It's just extremely weird to think about that 'inversion', yet we all know it so well.

  • @Kleddamag
    @Kleddamag 1 day ago +1

    Congratulations on reaching 300k subscribers!

  • @GrindThisGame
    @GrindThisGame 1 day ago +2

    Congrats on 300k subs!

  • @timeflex
    @timeflex 1 day ago +1

    12:50 Yes, they don't. In order for the model to do that it must be able to analyze itself (its data), detect the pattern and then extract it as a new usable entity.

  • @michaelwoodby5261
    @michaelwoodby5261 1 day ago +3

    "We have to ship faster than the goalposts move" is such a killer line, and that really is the source of all these 'A.I. winter' forecasts. Progress has been insane, but expectations continue to surpass it.

  • @joelalain
    @joelalain 1 day ago

    You're back! I was worried; we need AI updates! lol. I can't wait to see if OpenAI finally releases o1 - I'd love to see that, as I hate the limits on o1-preview, I feel so limited. I also noticed that Grok 2 seems to be getting really good fast, and I absolutely love it for new events as it seems way more neutral than most. This is an exciting time to be alive. And I say that while currently using an AI to help me understand convolutional neural networks, writing the code, understanding the settings and improving the results. That's very meta.

  • @JarrydRLee
    @JarrydRLee 1 day ago

    Really appreciate the appropriate levels of hype on this channel.

  • @TheLegendaryHacker
    @TheLegendaryHacker 1 day ago +6

    6:50 Philip has never played Elden Ring 😔

  • @elibullockpapa9012
    @elibullockpapa9012 1 day ago +10

    Please benchmark the Amazon Nova models!!

  • @chrisworth1625
    @chrisworth1625 1 day ago +1

    Another truly excellent video. Wonder how you get the time to plan such a rich narrative of information each week!

    • @aiexplained-official
      @aiexplained-official 1 day ago

      Thanks Chris, will have less time with these 12 days of announcements!

  • @GotGooped
    @GotGooped 1 day ago +2

    When you mentioned how current LLMs can't generalize from one type of reasoning to another, I was surprised you didn't mention grokking, which is supposed to allow for that to happen. When I heard about it originally (from Bycloud's video on it, recommended if you haven't seen it yet), I assumed that it would be used in a model at some point, and be the next "big thing", but it's been years since it's been discovered and I haven't heard of a single model to do it.
    Obviously doing this naturally would require ~10x the compute which is pretty ridiculous, but supposedly there's ways to speed it up (I believe I saw something called FastGrokking or something similar). Meta getting 10x the compute from llama 3 to 4 has me excited at the possibility of it being used in a large model finally, but only time will tell. Maybe there's been some news on it since then? Would like to know what you think of this.

  • @ckq
    @ckq 1 day ago +1

    12:00
    The heuristics are the cool part cause they're much more computationally feasible.
    If we want precise answers just use a calculator or generate some python code.
    We don't want an LLM based calculator, we want an LLM based super smart human.
    LLMs aren't supposed to be deterministic the randomness is what leads to creativity.
    As an analogy to chess, I've been interested in an LLM specialized on chess not because it would be better than stockfish (it won't), but because it helps us understand the heuristics humans use and ideally can be integrated with an LLM that speaks English to help humans get better at chess.

  • @burnytech
    @burnytech 1 day ago +6

    Jensen's solution to everything is buying more GPUs 😂

  • @erniea5843
    @erniea5843 22 hours ago

    All of the recent AI accounts are hilarious. Your channel has remained high quality and not just hype.

  • @sanesanyo
    @sanesanyo 1 day ago +2

    Been waiting for this ❤❤

  • @stephenrodwell
    @stephenrodwell 1 day ago +1

    Thanks, excellent content, as always. 🙏🏼

  • @faolitaruna
    @faolitaruna 1 day ago +13

    1:50 What are you watching, son?

    • @Maximillian_Space
      @Maximillian_Space 1 day ago +7

      Science project, on Egyptian Gods, dad!

    • @anywallsocket
      @anywallsocket 1 day ago +5

      @@Maximillian_Space there is no science there son, only the sweet scent of sin

    • @apeman939
      @apeman939 1 day ago +4

      There’s nothing in the prompt that calls for this level of fox mommy

    • @Maximillian_Space
      @Maximillian_Space 1 day ago

      @@apeman939 That's just a sad realisation of the content it was trained on

    • @sirsamiboi
      @sirsamiboi 1 day ago +2

      Furries are gonna have a field day with Sora 😭

  • @Wheezy_calyx
    @Wheezy_calyx 22 hours ago

    The genie 2 model makes me think of a robot looking at a construction zone and in a split second mapping out its path up a scaffold.

  • @Kmykzy
    @Kmykzy 1 day ago

    9:48 My feeling is that somewhere between now and a 2-3 order-of-magnitude increase in computation, this will be covered by an emergent function of the system, if we take the human brain into consideration. You don't need to make the silicon brain able to simulate a close-to-reality copy of the real world; it just needs to be good enough to simulate basic interactions while pulling physics from a second model heavily trained on spotting and flagging those physics.
    I think the breakthroughs will come either by artificially separating the tasks our brain does into synthetic nuclei, like our organic brain, and having this solved with only 10-100x the current processing power, or by just increasing computation and letting it naturally sort itself out over a 4-5 order-of-magnitude increase. In either case this will be solved by scale, and maybe just hurried along by the compartmentalization into cores for the more complex specialized jobs like object permanence, physics simulations, logical chain of thought, etc.

  • @Devorkan
    @Devorkan 1 day ago +1

    Why do they write million as MM? Million is M in the SI. It's almost 2025, maybe it's time we stopped using the Roman numeral for thousands which conflicts with the official SI unit for millions?

  • @christopherblare6414
    @christopherblare6414 1 day ago +2

    I think genie 2 is definitely a step towards embodied AI, but just a step.
    If you can get safe robots in very simple environments, then you have safe robots. Once you have irl AI robots doing a thing, then you get big data for irl examples.
    I think bad simulations could definitely bootstrap AI robots for non safety critical actions.

  • @greendra
    @greendra 1 day ago +1

    Aside from Tesla FSD 13 releasing, I agree there haven't been many AI updates. Would be great if you could do a video on it / FSD in general, seeing as it's by far the best real-world AI.

    • @a.thales7641
      @a.thales7641 1 day ago

      Suno v4 + some video tools I guess? And some kind of open-source o1.

    • @greendra
      @greendra 1 day ago

      Oh yeah good shout. Suno V4 is great

  • @Dron008
    @Dron008 1 day ago +1

    How can you be sure your benchmark isn’t being leaked when you feed it to an API? Cloud providers could detect it and fine-tune their models specifically for the test.

  • @sitrakaforler8696
    @sitrakaforler8696 1 day ago +1

    Timestamps (Powered by Merlin AI)
    00:05 - AI news resurgence: OpenAI announces exciting releases over the next 12 days.
    02:10 - OpenAI's new model shows promise but raises questions amid rapid AI advancements.
    04:08 - Genie 2 enhances interactive environments using AI for gaming and web applications.
    05:57 - AI generation quality comes with limitations and unexpected errors.
    07:57 - Concerns about AI's hallucinations affecting reliability in training embodied agents.
    09:53 - Transformers struggle with robust learning of algorithms and physics.
    11:53 - AI models learn procedures but struggle with generalization across reasoning types.
    13:45 - Updates on AI models and tools, including Gemini and QWQ performance.

  • @ozten
    @ozten 1 day ago

    Vibes for math is fascinating! As a layman... it seems like having LLMs memorize multiplication and addition tables up to 12 x 12, and then shell out to tool use for anything more complicated. I don't understand higher maths, but I assume this is the wrong approach in those domains. Always insightful, thank you.
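
The memorize-small-tables-then-use-tools suggestion in the comment above, as a sketch. The 12 x 12 cutoff comes from the comment; everything else is illustrative.

```python
# "Memorized" small products, exact tool computation for anything bigger.
TIMES_TABLE = {(a, b): a * b for a in range(13) for b in range(13)}

def multiply(a: int, b: int) -> int:
    if (a, b) in TIMES_TABLE:  # fast path, like rote recall
        return TIMES_TABLE[(a, b)]
    return a * b               # "shell out" to exact tool arithmetic

print(multiply(7, 8))      # 56, from the memorized table
print(multiply(123, 456))  # 56088, computed exactly
```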

  • @dansplain2393
    @dansplain2393 23 hours ago

    14:33 I’m sorry that I can’t do a shock faced thumbnail… Matthew Berman mentioned?

  • @draken5379
    @draken5379 1 day ago

    There was tons of AI news in the last couple of days. Just because something isn't promoted like an OpenAI or Google reveal doesn't mean things aren't happening.
    The best open-source video model released this week, for one.

  • @icegiant1000
    @icegiant1000 1 day ago +1

    Sora is still amazing. Reading the prompts they used, it occurred to me we are super close to being able to give Sora/o1-preview a novel and let it make an entire movie on its own. Imagine hearing of a book you have never read, hitting the render button, and out comes a complete movie. Then imagine telling the AI to research people's opinions, reviews, or scholarly papers about a novel, and work those insights into the movie to improve it. Or perhaps you don't like how a movie ended; just tell the AI to change it to how you want it. Or you love a book, you love the AI-rendered movie, so you ask the AI to generate a sequel movie, one that no one has ever written. Exciting times, but I worry we are never gonna leave our houses; we will just be glued to our computers.

    • @Hexanitrobenzene
      @Hexanitrobenzene 13 hours ago

      ...and your last sentence is on to something. Dunno, I have a bad feeling all this is very unwise. Humans' intelligence exceeds their wisdom, as proven time and time again.

  • @dylancope
    @dylancope 1 day ago

    10:05 sounds a lot like human folk-physics, and physical hallucinations sound a lot like dreams.
    To be honest, a simulator that hallucinates random kind of implausible transitions between heuristically related physical states doesn't sound like such a bad thing for RL training

  • @DavidMCammack
    @DavidMCammack 1 day ago

    14:33 subtly throws some well-deserved shade at
    the many YouTubers over-hyping AI, like 😱 OMG 🤯

  • @PakistanIcecream000
    @PakistanIcecream000 1 day ago +2

    I look forward to the day when you update Simplebench with the performance results of Gemini experimental 1123. I know you say it is rate limited but still.

  • @SvetlinNikolovPhx
    @SvetlinNikolovPhx 6 hours ago

    As a guy who's been writing physics simulators and AI for the past 20 years:
    It's an offense to simulators to call this thing a "simulator".
    It's an imitator.

  • @trentondambrowitz1746
    @trentondambrowitz1746 1 day ago +2

    No AGI yet? I'm disappointed.
    I hope one of OpenAI's goodies is an improved visual reasoning model!

  • @LukeJAllen
    @LukeJAllen 1 day ago

    Thanks for the upload!!
    almost 300k subs

  • @cacogenicist
    @cacogenicist 1 day ago +2

    Reasons with a bunch of heuristics, eh? We're now complaining about these models being _too_ similar to human minds. 😊
    They need tools. And/or we need modular assemblages of models trained on narrower domains.

  • @Arcticwhir
    @Arcticwhir 1 day ago +2

    2:29 0.3 is basically randomness

  • @EwanNeeve
    @EwanNeeve 1 day ago

    Can I just ask: you mention at 7:45 that you've covered this SIMA agent on your channel before, but I watch your channel religiously and I can't remember it being mentioned before. Can you please advise what the title of the relevant video is?

    • @aiexplained-official
      @aiexplained-official 1 day ago

      It was not the full focus of a video; it was a mention from a paper. It would be roughly around the time of RT-2, sorry I can't be more specific.

    • @Hexanitrobenzene
      @Hexanitrobenzene 12 hours ago

      ​@@aiexplained-official
      Could be handy to have a searchable database of the transcripts of your videos...
      It's interesting, though, how you produce videos. Don't you have a document where you outline briefly what you will be talking about ?

  • @TheMirrorslash
    @TheMirrorslash День тому +3

    So, Genie 2 isn't really a game generator: since it's not real time, it isn't truly interactive. You put in your control input first and then it generates a response. There's no player; you don't respond to an obstacle by jumping, the obstacle is generated because you inputted jump beforehand. There's no goal and no challenge. Or am I missing something?

    • @zoeherriot
      @zoeherriot 1 day ago +3

      We should really stop calling them games - there is no concept of game rules in these, and no good way to make them. It's just a walking simulator at best. I think by "not real time" - they are implying there is latency from your input to the actual generation of the next frame.

  • @anonymes2884
    @anonymes2884 1 day ago +3

    Our regular dose of sanity :). Yeah, I'm yet to be convinced that hallucinations are solvable, or can even be reliably reduced to some chosen percentage. And I think it's starting to be suggestive that they ALL know that IF they solve them, their stock skyrockets, and yet none of them _have_ up to now, nor do they _really_ seem substantively closer IMO.
    Might well be coloured by my suspicion at this point that the best outcome for humanity _may well_ be if AI is mostly hype, though; maybe I'm seeing what I "want" to see :).

    • @TheSCBGeneral
      @TheSCBGeneral 1 day ago

      As he said in the video, reducing AI hallucinations to 0% would require fundamentally new architectures and training methods. Hallucinations in LLMs aren't a bug that can be simply patched out with enough time and resources; they're a symptom of the limitations of LLMs as a whole. That's why I think OpenAI is moving away from models like ChatGPT and closer towards models with improved reasoning like o1.

    • @CoolIcingcake3467
      @CoolIcingcake3467 1 day ago

      "maybe i'm seeing what I "want" to see :)."
      Which is confirmation bias?

  • @jsalsman
    @jsalsman 1 day ago +2

    Never trust a math answer until it's confirmed with e.g. sympy in code execution. (Oh sorry o1-*)
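    The comment's suggestion can be sketched without sympy too: a minimal stdlib verifier that re-computes an arithmetic expression deterministically and compares it against the model's claimed answer. The function names (`safe_eval`, `check_model_answer`) and the worked numbers are illustrative, not from the video.

    ```python
    import ast
    import operator

    # Safe evaluator for simple arithmetic expressions, used to
    # double-check a model's claimed answer instead of trusting it.
    OPS = {
        ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg,
    }

    def safe_eval(expr: str) -> float:
        """Evaluate +,-,*,/ expressions without using eval()."""
        def ev(node):
            if isinstance(node, ast.Expression):
                return ev(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.operand))
            raise ValueError(f"unsupported expression: {expr!r}")
        return ev(ast.parse(expr, mode="eval"))

    def check_model_answer(expr: str, claimed: float, tol: float = 1e-9) -> bool:
        # Recompute deterministically and compare with the model's claim.
        return abs(safe_eval(expr) - claimed) <= tol

    print(check_model_answer("226 - 68", 158))  # True: correct answer
    print(check_model_answer("226 - 68", 168))  # False: hallucinated answer
    ```

    Sympy's `sympify` would cover symbolic answers as well; the point is the same either way: the check runs as code, not as another generation.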

  • @sharpcircle6875
    @sharpcircle6875 1 day ago +4

    - Babe! New-
    - Already watching it ;)

  • @ellielikesmath
    @ellielikesmath 1 day ago

    the reason they have hallucinations is that they're still neural networks approximating an infinite rule with a finite model. you may want to dial back the hype by a couple orders of magnitude.

  • @timwang4659
    @timwang4659 1 day ago +2

    Been watching this channel since pretty much the beginning, back when the hype around LLMs was through the roof and you were thinking AGI 2025. But now, it feels like we are hitting the ceiling of current AI technology. The inherent flaws of transformers (hallucinations) are starting to become more prevalent. These models aren't simply regurgitating training data like skeptics claim, but they are not thinking clearly and logically either. It really is just "vibes/heuristics" that these models are using. They generalize from the training data, and if the training data is large enough, they can generalize pretty well. But in the end, it's not really "thinking". We definitely need a new paradigm.

    • @maciejbala477
      @maciejbala477 1 day ago

      yeah, I just made a comment as well in a similar vein. It's one aspect of LLMs where I don't see any improvement, nor do I see a solution to fix it being talked about. It's probable that hallucinations will always be a problem no matter what with transformer-based models

    • @aiexplained-official
      @aiexplained-official  1 day ago

      Great points but I never said AGI 2025. The one time I guessed a figure for a proto-AGI was 2028, and that would be for LLM-Modulo systems

  • @DanielSeacrest
    @DanielSeacrest 1 day ago

    Regarding the paper "PROCEDURAL KNOWLEDGE IN PRETRAINING DRIVES REASONING IN LARGE LANGUAGE MODELS", I'd be curious to see an investigation of the phenomenon of grokking with this methodology.
    I feel like the occurrence of grokking within models kind of answers the "not at 100%" question, i.e. can they truly ever be completely reliable? From what I understand a fully grokked model would be, but it makes sense that in the meantime, as a sort of learning phase, they would use a kind of approximated heuristics.

  • @arssve4109
    @arssve4109 1 day ago +1

    Why push LLMs to do physics and maths problems at all? It is obvious they should instead be querying dedicated calculation tools to construct their math answers. You do not need to run huge GPUs for 1789 - 12.463; a calculator from the '60s can do it

    • @maciejbala477
      @maciejbala477 1 day ago

      I assume because you can? it's a nice challenge to overcome if they could do maths by themselves. Obviously you are right, and currently that's totally what one should be doing with the available LLMs, but it's a weird flaw for an entity that's supposed to be able to "think" logically, and so it's a challenge to solve for the future

    • @arssve4109
      @arssve4109 1 day ago

      @@maciejbala477 It is fun to see people try initially, but it has been a while, and it is simply not the way to do it, because transformers generate probabilistic sequences, while the number of possible sequences specifying simple math operations with a variable digit count exceeds the model's parameter count by many orders of magnitude. That is why math is about learning the rules, not about the probabilistic next number in a sequence like 3.673 + 1.746e3 = ... It is obviously not the way to do it... And any engine that can execute the rules on numbers qualifies as a calculator

  • @Milennin
    @Milennin 1 day ago

    I don't believe hallucinations will be gone, but they'll probably be reduced even further. They're already less common in current models than they were 1-2 years ago, so that's good.

  • @impolitevegan3179
    @impolitevegan3179 1 day ago

    people have been reporting very good results with qwq. Have you used any system prompt?

  • @dijikstra8
    @dijikstra8 1 day ago

    The research looking into which neurons are activated is very interesting, and sort of reminiscent of the kind of research done on human brains, e.g. investigating which parts of the brain are activated by certain impressions. It makes sense to me that something like "226-68" would activate neurons around 150-200; that's pretty similar to how humans can make a rough estimate before they start actually analyzing the question and calculating the answer more accurately.
    I don't think we can ever expect neural networks to always give perfect answers, humans certainly don't with our very very advanced neural networks, but we can perhaps expect them to come closer. A more likely route to me though is using agents for something like this, the neural network simply has to understand that "this question can be solved by a calculator", and then call an external calculator function which can do the calculation in a deterministic and much more efficient way.
    In a similar way, perhaps we could have e.g. physics agents which the neural network can interact with in order to get the simulation right. It's not like humans are great at imagining the exact physics of e.g. gravity without actually calculating the path an object would take.
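    The routing idea in the comment above can be sketched as a toy dispatcher: the model's only job is to recognize "this is arithmetic" and emit a structured tool call, while a deterministic host function does the work. The `ToolCall` shape and function names here are hypothetical, not any real framework's API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class ToolCall:
        """A structured request a model emits instead of guessing digits."""
        name: str
        args: dict

    def calculator(args: dict) -> float:
        # Deterministic arithmetic: no sampling, no hallucination.
        a, b, op = args["a"], args["b"], args["op"]
        return {"add": a + b, "sub": a - b, "mul": a * b, "div": a / b}[op]

    # Registry of tools the host is willing to run on the model's behalf.
    TOOLS = {"calculator": calculator}

    def execute(call: ToolCall) -> float:
        return TOOLS[call.name](call.args)

    # The model's output reduces to producing this structure:
    call = ToolCall("calculator", {"a": 226, "b": 68, "op": "sub"})
    print(execute(call))  # 158
    ```

    A physics-simulation tool would slot into the same registry; the neural network supplies intent, the tool supplies exactness.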

  • @executivelifehacks6747
    @executivelifehacks6747 1 day ago +3

    AIexplained just dropped fam! Before GTA6!

    • @captain_crunk
      @captain_crunk 1 day ago

      Yeah, well, that's because GTA6, just like birds, will never exist. Not now, not ever.
      _[flies away like a bir, er, pterodactyl]_

  • @mAny_oThERSs
    @mAny_oThERSs 1 day ago +1

    I'm curious how they will differentiate GPT-5 and o1. It wouldn't make sense to release o1 if GPT-5 were smarter than it anyway, but at the same time coming up with a new GPT model that isn't even SOTA is still kind of weak.

    • @rousabout7578
      @rousabout7578 1 day ago

      Your question presumes GPT 5

    • @mAny_oThERSs
      @mAny_oThERSs 1 day ago

      @rousabout7578 well eventually it'll be there

    • @rousabout7578
      @rousabout7578 1 day ago

      @@mAny_oThERSs Have they confirmed they will continue the naming convention? How do you know o1 isn't it?

    • @mAny_oThERSs
      @mAny_oThERSs 1 day ago

      @rousabout7578 they said they will continue GPT models and o models separately

  • @Stephen_Lafferty
    @Stephen_Lafferty 1 day ago +1

    Gosh, the twelve days of AI-mas! I wonder if there will be leaps forward or incremental upgrades?

  • @steffenaltmeier6602
    @steffenaltmeier6602 1 day ago

    Did you run the Qwen models on your benchmark as well? The 72B-parameter one seems to do very well on most benchmarks, especially considering its size. And there is also the new QwQ model with chain of thought baked in. It's still an early version, but it's quite interesting as it's clearly o1-inspired.

    • @steffenaltmeier6602
      @steffenaltmeier6602 1 day ago

      wow.... i stopped the video the second you mentioned QwQ... XD

    • @steffenaltmeier6602
      @steffenaltmeier6602 1 day ago

      do you have any idea why QwQ did poorly? is it maybe stuck in reasoning loops as is warned about on their website?

  • @LucaCrisciOfficial
    @LucaCrisciOfficial 1 day ago

    Autonomy and self-improvement are also big steps toward superintelligence. There have been some steps forward in these fields in recent weeks

  • @Antremodes
    @Antremodes 1 day ago

    On the QwQ result, did you include "You should think step-by-step." in the system prompt? I noticed it tends to behave like the normal Qwen 32B otherwise, and they should have made a note in the model card.

  • @BunnyOfThunder
    @BunnyOfThunder 1 day ago +1

    There hasn't been AI news? Maybe not general intelligence news but video, image, and integrations have been going pretty steady. Like Anthropic's MCP. Some of the progress is going to be elbow grease but it's no less important.

  • @ryzikx
    @ryzikx 1 day ago +1

    i dont think hallucinations will ever go away. they will decrease but never hit 0%

  • @ginogarcia8730
    @ginogarcia8730 1 day ago

    is o1-preview now a little bit better? people are now getting this thing where it sometimes stops in the middle of its chain-of-thought to try and 'correct' itself and starts another chain-of-thought

  • @Nore_258
    @Nore_258 1 day ago

    I think the last thing that came out, which I use nearly every day, was the DeepSeek-R1-Lite Preview, about half a month ago (14 days). From OpenAI, I guess that would be the o1 preview. Honestly, though, I prefer DeepSeek R1 Lite. There was also that botched update to 4o, which resulted in worse performance except for creative writing. Hopefully, the next 12 days will bring some much-needed new models and features from OpenAI. I'm actually excited for the Pro plan; I've never needed such high rate limits anyway.

  • @Chef_PC
    @Chef_PC 14 hours ago

    Just imagine what we'll be talking about two papers down the road.

  • @user-on6uf6om7s
    @user-on6uf6om7s 1 day ago

    When you say these generators won't replace AAA development any time soon, when is soon? Because we've gone from a couple seconds of super low-res platformers to this in less than a year. I can't imagine this technology in 5 years which is pretty soon by any normal metric.

  • @user-pt1kj5uw3b
    @user-pt1kj5uw3b 1 day ago +2

    Can't wait to get 30 seconds of video generation per week or something like that

  • @walidoutaleb7121
    @walidoutaleb7121 1 day ago

    is accounting for API cost a sensible thing for benchmarks like SimpleBench? for me, performance per compute is as interesting as raw performance, and I haven't seen anyone do it.

  • @nPr26_50
    @nPr26_50 1 day ago +4

    Damn you're fast