AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution

  • Published 5 Dec 2024

COMMENTS • 426

  • @CalConrad
    @CalConrad 1 day ago +473

    The best part about the next 12 days will be your 12 videos breaking it down.

    • @aiexplained-official
      @aiexplained-official 1 day ago +163

      God that might be a bit much! But I will be following everything scrupulously, don't worry

    • @countofst.germain6417
      @countofst.germain6417 1 day ago +76

      @@aiexplained-official We expect 12! No more, no less.

    • @a.thales7641
      @a.thales7641 1 day ago +13

      I want at least 3 videos. 4 is enough.

    • @daveogfans413
      @daveogfans413 1 day ago +1

      @@countofst.germain6417 Best he can give is one 12 minute video or twelve 1 minute videos.

    • @Dannnneh
      @Dannnneh 1 day ago +2

      Maybe in the format of those one-minute shorts.

  • @kairi4640
    @kairi4640 1 day ago +109

    I appreciate that you're one of the few AI YouTubers who actually admits there hasn't been big news recently and doesn't overhype the same stuff just for views.

  • @anywallsocket
    @anywallsocket 1 day ago +139

    learning a 'bag of heuristics' rather than explicit maths is how i skipped through my undergraduate degree lol

    • @aiexplained-official
      @aiexplained-official 1 day ago +11

      Nice

    • @sitrakaforler8696
      @sitrakaforler8696 1 day ago +12

      Like all Engineers hahahahah

    • @Kazekoge101
      @Kazekoge101 17 hours ago

      @@sitrakaforler8696 What is this mythical "bag of heuristics" you speak of? (asking as an 11th-grade math dropout)

  • @testservergameplay
    @testservergameplay 1 day ago +131

    By far the best AI YouTube channel. No unnecessary hype, no baseless rumors, no creepy AI art for thumbnails, just pure objective analysis.

    • @cashitortrashit9939
      @cashitortrashit9939 1 day ago

      That is so funny. I can hardly understand him; he talks like he has a mouthful of marbles, no doubt 🙄

    • @Adam-nw1vy
      @Adam-nw1vy 1 day ago

      The creepy AI art for thumbnails is god awful

    • @marpleka
      @marpleka 1 day ago

      Are you serious, or just naive?

    • @testservergameplay
      @testservergameplay 1 day ago

      @@marpleka I'm especially serious about the AI art thumbnails from other AI news channels

    • @nand3kudasai
      @nand3kudasai 9 hours ago

      No distracting music.

  • @Keatwonobe
    @Keatwonobe 1 day ago +151

    Being under a million subs this far in is insane. Your 'spidey sense' with the nuance of ai progress is unbelievably precise.

  • @Jasonknash101
    @Jasonknash101 1 day ago +27

    Thanks again for your great content. I love the fact that you can avoid clickbait and still give us compelling headlines.

  • @gubzs
    @gubzs 1 day ago +118

    If we solve hallucinations I am kicking off my personal "we are now in the future" project. I have spent the last year developing a procedural immersive fantasy world simulator _with the entire design portfolio_ formatted as instructions for agentic AI. I have roughly 400 pages of instruction from power balance formulas, to world history instantiation, magic systems, UI/UX, an LoD system for simulation granularity and information retention, on and on. it's been bounced off Claude from start to finish at each step so I know interpretation is solid.
    Such a thing _will_ exist. I will make certain of it.

    • @TheBouli
      @TheBouli 1 day ago +4

      nice! Would love to test it out when it becomes playable :)

    • @dot1298
      @dot1298 1 day ago +3

      me too!

    • @marc_frank
      @marc_frank 1 day ago +2

      cool

    • @maciejbala477
      @maciejbala477 1 day ago +2

      exciting! I'd definitely want to try it out once it comes out. So far, the only real game that is AI-driven is AI Roguelite, as far as I'm aware (I bought it but don't necessarily want to try it out just yet, as I'm waiting for 3rd party API key support since the dev told me he is considering adding it and it might come in a few weeks). AI Dungeon doesn't count, it's not really a game. Some could argue AI Roguelite isn't either (yet). That's one of the things I'm most excited about.
      But also, on the other hand, I don't actually think solving hallucinations is a trivial problem and it might never occur without architecture change, so your caveat about that definitely tempers my excitement. Would love to believe, though, lol

    • @anywallsocket
      @anywallsocket 1 day ago +2

      @@maciejbala477 indeed, as the video explains, hallucinations are in some sense necessary for model creativity. if you want something to generalize, it needs to know how to fantasize -- this is not avoidable since it does not know the latent space you expect it to generalize to, it must guess it.

  • @AlexanderMoen
    @AlexanderMoen 1 day ago +21

    I think the hallucination problem drastically drops once reliable, high-quality agent and function calling are out and accessible through an LLM. We just need something like a beefy o1 that has access to tons of tools that it calls reliably, and as long as those tools work properly, it'd be a huge leap forward. Fingers crossed that in the next 12 days OpenAI has something agent-related out.
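
The tool-calling idea in the comment above can be sketched in a few lines. This is a minimal illustration, not any specific vendor API: the tool names and JSON call format here are invented.

```python
import json

# Hypothetical tool registry: the model emits a JSON tool call,
# and the harness executes it deterministically with real code.
TOOLS = {
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

def run_tool_call(model_output: str):
    """Parse a model-emitted tool call and execute it, so the final
    answer comes from the tool, not from token statistics."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](*call["args"])

# The model only has to get the *call* right; the arithmetic is exact.
print(run_tool_call('{"name": "multiply", "args": [123456, 789]}'))  # 97406784
```

The point of the sketch: as long as the model can reliably emit the call, the answer's correctness depends on the tool, which is exactly the leap the comment is hoping for.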

  • @therainman7777
    @therainman7777 1 day ago +30

    The OpenAI announcement AND a new AI Explained video? Merry Christmas everyone! 🎄

  • @tituscrow4951
    @tituscrow4951 1 day ago +16

    The hallucination problem is this: it will be a long time before a model can be the fail-safe for a process that has to judge physics or maths and get it right one-shot every time. That puts a lot of uses an AI would be perfect for out of the picture, for the foreseeable future anyway.

    • @anywallsocket
      @anywallsocket 1 day ago

      These are, by definition, guessing machines: their errors don't vanish (as they would if it learned to perform actual mathematics), they only shrink.

    • @fabim.3167
      @fabim.3167 1 day ago +5

      @@anywallsocket The same is true for the best human mathematicians!

    • @anywallsocket
      @anywallsocket 1 day ago

      @@fabim.3167 Haha, very true! But not the same for typical functional programming -- which is what people colloquially associate AI with, and hence all their surprise when it gets stuff wrong.

  • @micbab-vg2mu
    @micbab-vg2mu 1 day ago +9

    I turned the quiet period into a blueprint. By mapping every workflow (medical department in big pharma), I've created a roadmap for the AI agents that are already knocking at our door.

  • @breadbro0004
    @breadbro0004 1 day ago +12

    You are so fast! Best AI news channel imo

  • @David-tp7sr
    @David-tp7sr 1 day ago +7

    You'll never be able to reduce "hallucinations" to 0 in most cases (unless you have a subject that you can use a verifier with, but that's not the LLM part doing the work). It's a feature of the encoding mechanism of neural networks. The brain works similarly: it hallucinates reality and then grounds it with evidence.

    • @puppergump4117
      @puppergump4117 16 hours ago

      As I understand it, the only way to reduce hallucinations is by limiting the diversity of the output. So if you ask how to make a nuke, it'll either hallucinate and give a bunch of misordered steps, or give an overview that's more accurate.
      You could probably make the thing not hallucinate just with the right context.
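
The diversity-vs-accuracy trade-off the comment describes is usually controlled through sampling temperature. A minimal sketch of how lower temperature concentrates probability mass on the top token (the logit values are arbitrary examples):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the next-token distribution, so sampling
    becomes less diverse and closer to the single most likely answer."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
diverse = softmax_with_temperature(logits, 1.5)  # flatter: more varied output
greedy = softmax_with_temperature(logits, 0.2)   # peaked: near-deterministic
assert max(greedy) > max(diverse)  # less diversity at low temperature
```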

  • @reza2kn
    @reza2kn 1 day ago +3

    That feeling when you open YouTube and there's an AI Explained video in your feed! 🔥😍

  • @phronese
    @phronese 1 day ago +4

    good insights from those papers that temper the expectations, looking forward to the full review of those papers

  • @N8O12
    @N8O12 1 day ago +16

    Literally like an hour ago I was scrolling through YouTube and thought 'I wish AI Explained would upload again'.
    Awesome video as always, by the way.

  • @Isabelle-w7i
    @Isabelle-w7i 1 day ago +8

    Glad you mentioned China’s push in AI towards the end of your video. Despite working with comparatively limited hardware, they’ve managed to shrink what could’ve been a multi-year "moat" to almost nothing. It’ll be fascinating to see how this plays out in the future!

    • @adanufgail
      @adanufgail 1 day ago

      @@Isabelle-w7i I mean, Llama is at GPT-3 levels and is open source, so it's not hard to see how the world's largest electronics-manufacturing country could catch up.

    • @theforsakeen177
      @theforsakeen177 23 hours ago

      @adanufgail Open source or closed source means nothing to China; if it's stored digitally, they've got it.

  • @jackdurose3542
    @jackdurose3542 1 day ago +4

    Re: Genie 2, I don't think a high fidelity world model is necessary for embodied agents. I'd see it as kind of like imagination. We don't need to model physics accurately in our heads to know roughly what will happen if we drive a car off a cliff, and I'd guess, in the context of embodied agents, the purpose of something like this is similar.

  • @vectoralphaSec
    @vectoralphaSec 1 day ago +4

    I'm overly excited about all your coverage over these next 12 days. Hopefully OpenAI surprises us with really exciting and cool announcements.

  • @ricosrealm
    @ricosrealm 1 day ago +9

    Did anyone notice the defects in the aerial shot of the neighborhood? There are some driveways that don't actually connect to the street, along with other unrealistic aspects.

    • @anonymes2884
      @anonymes2884 1 day ago +4

      Yeah, the roofs of the houses in the middle blend into the road etc. It's impressive but it's a lot like every other AI image generator - seems great until you _really_ look at it and then there are almost always weird flaws.
      (which is great in one sense - we can still _pretty much_ tell an AI image from reality, maybe except in mostly neutral categories like landscapes etc.)

    • @anywallsocket
      @anywallsocket 1 day ago

      @@anonymes2884 yeah it's just HD slop

  • @keeganpenney169
    @keeganpenney169 1 day ago +3

    Knocking the real news out of the park; that's why we love you and the channel and community here, Phil! ❤

  • @alan2here
    @alan2here 1 day ago +13

    All humans use the bag of heuristics approach, it's called thought. Even top physicists find counterintuitive physics in the world, and think in a collection of rules of thumb.

    • @juandesalgado
      @juandesalgado 1 day ago +2

      I was going to say, humans also hallucinate spherical cows. :) Reductionism is a thing.
      Also, I have the impression that the hallucination problem is more a matter of expression than of psychosis. We all imagine in our minds, then choose (hopefully) what to say. The models should be free to "hallucinate" inference-time tokens, but then choose to voice out loud only facts that they can confirm, or at least qualify those facts that they cannot.

  • @juandesalgado
    @juandesalgado 1 day ago +1

    In the "games / interactive videos" at 6:38, it would be interesting to see if the model recognizes the boundary before a body of water; that is, if it prevents you from falling into the water, or if the character begins to wade through (or walk over the surface?!), or if it switches to swimming.

  • @TheForbiddenLOL
    @TheForbiddenLOL 1 day ago +15

    Very interesting. "High probability but not reliability" seems to implicate "System 1 thinking", and would mean that an architecture change to allow LLMs to collaborate with classical symbolic computing (something more robust than function calling) would be required - which I know is an idea that's been brought up a lot, but it seems like people are still focusing on trying to get the "instinctual world model predictions" to be 100% free of hallucinations, despite humans frequently making mistakes with their immediate, unconscious intuitions about the physical world.

    • @aiexplained-official
      @aiexplained-official 1 day ago +1

      Well put

    • @Lerc0
      @Lerc0 1 day ago +2

      Does the "high probability but not reliability" extend to chain of thought, though? There is theoretically the possibility that an LLM could systematically write out and perform each of the individual stages of a classical symbolic process with a probability high enough that it could be considered reliable.
      Once you get into running a variable number of iterations modifying a state, there is a much greater possibility for controlled 'thinking'. It becomes conceptually quite similar to RNNs. Chain-of-thought is at the blunt-instrument level, but what happens if, instead of tokenising the inner monologue when it is talking to itself, it adds custom embeddings to the context? Or even just looping through the same layers multiple times. These are easy to implement; it's the training process that gets hard here. It's like training on the next text token alone has Sapir-Whorf'd ourselves.

    • @MarkusRessel
      @MarkusRessel 20 hours ago

      I thought this would happen when OpenAI announced plugins for things like WolframAlpha, but nothing has changed since. I am highly skeptical of statements like "hallucinations will be eliminated in 2025".

  • @11Petrichor
    @11Petrichor 1 day ago +1

    The way you described how the AI multiplies numbers reminded me of Daniel Tammet. He's a savant who can multiply long numbers in his head with very high precision. The way he described how he did it was like synesthesia: he visualizes numbers with textures and shapes and somehow combines them into an output.

  • @En1Gm4A
    @En1Gm4A 1 day ago +9

    That's exactly the problem with LLMs: they need a graph backbone (a middle layer) for solid representation of functions, symbolic abstractions, and the world. The creativity is well needed for creating that middle layer and for creative work, but not for most tasks.

    • @anywallsocket
      @anywallsocket 1 day ago +1

      Probably they could be interfaced with a game engine, which they could learn to control to generate situations, but invariably the engine would compute the resulting physics -- then you can feed back on that and could get some mildly effective self-optimizations.

    • @skierpage
      @skierpage 20 hours ago +3

      Maybe, but that's easier said than done! Decades of AI research in "solid representation and symbolic abstractions" got us approximately nowhere while in only a few years generative language models mastered language and almost anything that can be expressed as a sequence of characters.
      Meanwhile, make LLM a tool user. It writes mini programs to compute answers, process data it pulls from the web, etc. It's the difference between talking to a slightly-drunk extremely smart person at a cocktail party vs. asking her things while she's sitting in front of a computer.

    • @En1Gm4A
      @En1Gm4A 20 hours ago +1

      @@skierpage Yeah, true, but it really seems like that solid abstraction might be useful, and potentially much more power-saving. Or maybe solid abstraction is just the retrospect of a process that looks totally different. Let's see what turns out to be true. I am here for the ride.

    • @anywallsocket
      @anywallsocket 20 hours ago

      @@skierpage lmao I like that metaphor!

    • @anywallsocket
      @anywallsocket 20 hours ago

      @@En1Gm4A in terms of power saving it’s really cooling these data centers, which I hear burn through pools of water every day just to keep gpt and the like online. The issue as I see it is that we’ve got computation down for now, but we have no good memory systems - ie like the brain, so we don’t have to re-compute the same sort of prompts all the time. Biological memory is leagues beyond the artificial stuff, unlike neural network computations. Neuromorphic is likely the way forward.

  • @FakoyedeTimilehin
    @FakoyedeTimilehin 1 day ago +5

    Another banger from Phillip. Bravo!!

  • @JustSuds
    @JustSuds 1 day ago +3

    The solution to having one apparent model that does well at hallucinating creative work and also reliably does physics is just a higher-level mixture of experts. The user converses with a model that is specialised in interfacing with the user, and it has access to other specialised expert models as tools. That way the prose model can level up independently from the physics model and the biology model, and so on. It comes back to that jagged frontier.
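
A toy sketch of the routing idea in the comment above. Everything here is invented for illustration: in a real system both the router and the specialists would be models, not Python functions, and real mixture-of-experts routing happens inside the network per token.

```python
# Stand-ins for specialist back-end models behind a user-facing front end.
# eval() is only safe here because the inputs are our own fixed examples.
SPECIALISTS = {
    "math": lambda q: str(eval(q)),                    # exact arithmetic expert
    "prose": lambda q: f"A short story about {q}...",  # creative-writing expert
}

def route(query: str) -> str:
    # Trivial heuristic router: digits mean "math"; a real router
    # would itself be a learned model deciding which expert to call.
    domain = "math" if any(c.isdigit() for c in query) else "prose"
    return SPECIALISTS[domain](query)

print(route("17 * 19"))  # answered exactly by the math specialist
print(route("dragons"))  # answered creatively by the prose specialist
```

The design point matches the comment: each specialist can improve independently, and the front end only needs to learn when to hand off.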

  • @ExploreTheMind-kg8je
    @ExploreTheMind-kg8je 4 hours ago +1

    AI Explained: Cut Through The Hype And Straight To the Point. Always informative, entertaining and grounded in reality, thank you!!!

  • @ekstrajohn
    @ekstrajohn 1 day ago +12

    Explaining the heuristics mechanics is gold

  • @toddwmac
    @toddwmac 1 day ago +1

    The reason I look to you for the real news....period. Thanks and happy holidays.

  • @OriginalRaveParty
    @OriginalRaveParty 1 day ago +2

    Thank you for being the voice of integrity in a space full of overhyped clickbait 🏆

  • @Rawi888
    @Rawi888 1 day ago +1

    Glad to be here. Great reporting my brother. 🩶

  • @AngeloWakstein-b7e
    @AngeloWakstein-b7e 1 day ago +1

    Been waiting for SOOOOOO long for one of your videos! love them and cannot wait for more

  • @serg331
    @serg331 1 day ago +3

    You are the best ai channel. You are the one whose videos I look forward to. You are the one who doesn’t waste my time, and doesn’t lie to me. Thank you.

  • @lizardrain
    @lizardrain 1 day ago +2

    9:23
    Isn't that why humans have a logical side of the brain and a creative side of the brain? They shouldn't be mixed together as one model, you need models to represent different parts of the brain.

  • @wck
    @wck 1 day ago +2

    Hey, I'm wondering if your SimpleBench tests include a disclaimer to the LLMs that the question is designed to confuse LLMs, with a warning to look for irrelevant information and distractors? Because Andrew Mayne recently wrote a blog post criticizing Apple's AI reasoning paper by showing that their questions get 90% better results if the prompt has a simple notice like that at the top.

    • @aiexplained-official
      @aiexplained-official 1 day ago +1

      Yes we cover that in the technical report, it does boost results a little bit to warn them, some more than others, but only to the 50-55% range

  • @Charles-Darwin
    @Charles-Darwin 1 day ago +5

    I've been saying this "bag of heuristics" is the key for over a year now. Think about it: LLMs, at the root of their inner workings, are heuristics of language - the semantics... the stuff "in-between". Then, clearly, this applies to audio-visual systems. Their patterns are heuristics, and since there's this common root across the domains we know of, zooming in and out of scopes of any type is possible. Heuristics are heuristics because they apply to every level of our own universe (in their concentration)... it's just the process of gleaning or deducing and then isolating and proofing; it gradually becomes an immutable truth.
    What's even weirder about all of our human active systems is that we learn these things innately, years and years before we can even begin to define them at their essential parts. You learn what the force of gravity is within a month or two, and how to exert yourself on it/with it. You understand the way light works within about that same time... but it took centuries of generations before a human actually proved these things out mathematically. It's just extremely weird to think about that 'inversion', yet we all know it so well.

  • @Kleddamag
    @Kleddamag 1 day ago +1

    Congratulations on reaching 300k subscribers!

  • @GrindThisGame
    @GrindThisGame 1 day ago +2

    Congrats on 300k subs!

  • @timeflex
    @timeflex 1 day ago +1

    12:50 Yes, they don't. In order for the model to do that it must be able to analyze itself (its data), detect the pattern and then extract it as a new usable entity.

  • @michaelwoodby5261
    @michaelwoodby5261 1 day ago +3

    "We have to ship faster than the goalposts move" is such a killer line, and that really is the source of all these 'A.I. winter' forecasts. Progress has been insane, but expectations continue to surpass it.

  • @joelalain
    @joelalain 1 day ago

    You're back! I was worried; we need AI updates! lol. I can't wait to see if OpenAI finally releases o1 - I'd love to see that, as I hate the limits on o1-preview, I feel so limited. I also noticed that Grok 2 seems to be getting really good fast, and I absolutely love it for new events as it seems way more neutral than most. This is an exciting time to be alive. And I say that while currently using an AI to help me understand convolutional neural networks, writing the code, understanding the settings and improving the results. That's very meta.

  • @JarrydRLee
    @JarrydRLee 1 day ago

    Really appreciate the appropriate levels of hype on this channel.

  • @TheLegendaryHacker
    @TheLegendaryHacker 1 day ago +6

    6:50 Philip has never played Elden Ring 😔

  • @elibullockpapa9012
    @elibullockpapa9012 1 day ago +10

    Please benchmark the Amazon Nova models!!

  • @chrisworth1625
    @chrisworth1625 1 day ago +1

    Another truly excellent video. Wonder how you get the time to plan such a rich narrative of information each week!

    • @aiexplained-official
      @aiexplained-official 1 day ago

      Thanks Chris, will have less time with these 12 days of announcements!

  • @GotGooped
    @GotGooped 1 day ago +2

    When you mentioned how current LLMs can't generalize from one type of reasoning to another, I was surprised you didn't mention grokking, which is supposed to allow for that to happen. When I heard about it originally (from Bycloud's video on it, recommended if you haven't seen it yet), I assumed that it would be used in a model at some point, and be the next "big thing", but it's been years since it's been discovered and I haven't heard of a single model to do it.
    Obviously doing this naturally would require ~10x the compute which is pretty ridiculous, but supposedly there's ways to speed it up (I believe I saw something called FastGrokking or something similar). Meta getting 10x the compute from llama 3 to 4 has me excited at the possibility of it being used in a large model finally, but only time will tell. Maybe there's been some news on it since then? Would like to know what you think of this.

  • @ckq
    @ckq 1 day ago +1

    12:00
    The heuristics are the cool part cause they're much more computationally feasible.
    If we want precise answers just use a calculator or generate some python code.
    We don't want an LLM based calculator, we want an LLM based super smart human.
    LLMs aren't supposed to be deterministic the randomness is what leads to creativity.
    As an analogy to chess, I've been interested in an LLM specialized on chess not because it would be better than stockfish (it won't), but because it helps us understand the heuristics humans use and ideally can be integrated with an LLM that speaks English to help humans get better at chess.

  • @burnytech
    @burnytech 1 day ago +6

    Jensen's solution to everything is buying more GPUs 😂

  • @erniea5843
    @erniea5843 22 hours ago

    All of the recent AI accounts are hilarious. Your channel has remained high quality and not just hype.

  • @sanesanyo
    @sanesanyo 1 day ago +2

    Been waiting for this ❤❤

  • @stephenrodwell
    @stephenrodwell 1 day ago +1

    Thanks, excellent content, as always. 🙏🏼

  • @faolitaruna
    @faolitaruna 1 day ago +13

    1:50 What are you watching, son?

    • @Maximillian_Space
      @Maximillian_Space 1 day ago +7

      Science project, on Egyptian Gods, dad!

    • @anywallsocket
      @anywallsocket 1 day ago +5

      @@Maximillian_Space there is no science there son, only the sweet scent of sin

    • @apeman939
      @apeman939 1 day ago +4

      There’s nothing in the prompt that calls for this level of fox mommy

    • @Maximillian_Space
      @Maximillian_Space 1 day ago

      @@apeman939 That's just a sad realisation of the content it was trained on

    • @sirsamiboi
      @sirsamiboi 1 day ago +2

      Furries are gonna have a field day with Sora 😭

  • @Wheezy_calyx
    @Wheezy_calyx 22 hours ago

    The genie 2 model makes me think of a robot looking at a construction zone and in a split second mapping out its path up a scaffold.

  • @Kmykzy
    @Kmykzy 1 day ago

    9:48 My feeling is that somewhere between now and a 2-3 order-of-magnitude increase in computation, this will be covered by an emergent function of the system, if we take the human brain into consideration. You don't need to make the silicon brain able to simulate a close-to-reality copy of the real world; it just needs to be good enough to simulate basic interactions while pulling physics from a second model heavily trained on spotting and flagging those physics.
    I think the breakthroughs will come either by artificially separating the tasks our brain does into synthetic nuclei, like our organic brain, and having this solved with only 10-100x the current processing power, or by just increasing computation and letting it naturally sort itself out over a 4-5 order-of-magnitude increase. In either case this will be solved by scale, and maybe just hurried along by the compartmentalization into cores for the more complex specialized jobs like object permanence, physics simulations, logical chain of thought, etc.

  • @Devorkan
    @Devorkan 1 day ago +1

    Why do they write million as MM? Million is M in the SI. It's almost 2025, maybe it's time we stopped using the Roman numeral for thousands which conflicts with the official SI unit for millions?

  • @christopherblare6414
    @christopherblare6414 1 day ago +2

    I think genie 2 is definitely a step towards embodied AI, but just a step.
    If you can get safe robots in very simple environments, then you have safe robots. Once you have irl AI robots doing a thing, then you get big data for irl examples.
    I think bad simulations could definitely bootstrap AI robots for non safety critical actions.

  • @greendra
    @greendra 1 day ago +1

    Aside from Tesla FSD 13 releasing, I agree there haven't been many AI updates. Would be great if you could do a video on it / FSD in general, seeing as it's by far the best real-world AI.

    • @a.thales7641
      @a.thales7641 1 day ago

      Suno v4 + some video tools I guess? And some kind of open-source o1.

    • @greendra
      @greendra 1 day ago

      Oh yeah good shout. Suno V4 is great

  • @Dron008
    @Dron008 1 day ago +1

    How can you be sure your benchmark isn’t being leaked when you feed it to an API? Cloud providers could detect it and fine-tune their models specifically for the test.

  • @sitrakaforler8696
    @sitrakaforler8696 1 day ago +1

    Timestamps (Powered by Merlin AI)
    00:05 - AI news resurgence: OpenAI announces exciting releases over the next 12 days.
    02:10 - OpenAI's new model shows promise but raises questions amid rapid AI advancements.
    04:08 - Genie 2 enhances interactive environments using AI for gaming and web applications.
    05:57 - AI generation quality comes with limitations and unexpected errors.
    07:57 - Concerns about AI's hallucinations affecting reliability in training embodied agents.
    09:53 - Transformers struggle with robust learning of algorithms and physics.
    11:53 - AI models learn procedures but struggle with generalization across reasoning types.
    13:45 - Updates on AI models and tools, including Gemini and QWQ performance.

  • @ozten
    @ozten 1 day ago

    Vibes for math is fascinating! As a layman... it seems like having LLMs memorize multiplication and addition tables up to 12 x 12, and then shell out to tool use for anything more complicated. I don't understand higher maths, but I assume this is the wrong approach in those domains. Always insightful, thank you.
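
The memorize-small-tables-then-use-tools suggestion in the comment above, as a sketch. The 12 x 12 cutoff comes from the comment; everything else is illustrative.

```python
# "Memorized" small products, exact tool computation for anything bigger.
TIMES_TABLE = {(a, b): a * b for a in range(13) for b in range(13)}

def multiply(a: int, b: int) -> int:
    if (a, b) in TIMES_TABLE:  # fast path, like rote recall
        return TIMES_TABLE[(a, b)]
    return a * b               # "shell out" to exact tool arithmetic

print(multiply(7, 8))      # 56, from the memorized table
print(multiply(123, 456))  # 56088, computed exactly
```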

  • @dansplain2393
    @dansplain2393 23 hours ago

    14:33 I’m sorry that I can’t do a shock faced thumbnail… Matthew Berman mentioned?

  • @draken5379
    @draken5379 1 day ago

    There was tons of AI news in the last couple of days. Just because something isn't promoted like an OpenAI or Google reveal doesn't mean things aren't happening.
    The best open-source video model released this week, for one.

  • @icegiant1000
    @icegiant1000 1 day ago +1

    Sora is still amazing. Reading the prompts they used, it occurred to me we are super close to being able to give Sora/o1-preview a novel and let it make an entire movie on its own. Imagine hearing of a book you have never read, hitting the render button, and out comes a complete movie. Then imagine telling the AI to research people's opinions, reviews, or scholarly papers about a novel, and work those insights into the movie to improve it. Or perhaps you don't like how a movie ended; just tell the AI to change it to how you want it. Or you love a book, you love the AI-rendered movie, so you ask the AI to generate a sequel movie, one that no one has ever written. Exciting times, but I worry we are never gonna leave our houses; we will just be glued to our computers.

    • @Hexanitrobenzene
      @Hexanitrobenzene 13 hours ago

      ...and your last sentence is on to something. Dunno, I have a bad feeling all this is very unwise. Humans' intelligence exceeds their wisdom, as proven time and time again.

  • @dylancope
    @dylancope 1 day ago

    10:05 sounds a lot like human folk-physics, and physical hallucinations sound a lot like dreams.
    To be honest, a simulator that hallucinates random kind of implausible transitions between heuristically related physical states doesn't sound like such a bad thing for RL training

  • @DavidMCammack
    @DavidMCammack 1 day ago

    14:33 subtly throws some well-deserved shade at
    the many YouTubers over-hyping AI, like 😱 OMG 🤯

  • @PakistanIcecream000
    @PakistanIcecream000 1 day ago +2

    I look forward to the day when you update Simplebench with the performance results of Gemini experimental 1123. I know you say it is rate limited but still.

  • @SvetlinNikolovPhx
    @SvetlinNikolovPhx 6 hours ago

    As a guy who's been writing physics simulators and AI for the past 20 years:
    It's an offense to simulators to call this thing a "simulator".
    It's an imitator.

  • @trentondambrowitz1746
    @trentondambrowitz1746 1 day ago +2

    No AGI yet? I'm disappointed.
    I hope one of OpenAI's goodies is an improved visual reasoning model!

  • @LukeJAllen
    @LukeJAllen 1 day ago

    Thanks for the upload!!
    almost 300k subs

  • @cacogenicist
    @cacogenicist 1 day ago +2

    Reasons with a bunch of heuristics, eh? We're now complaining about these models being _too_ similar to human minds. 😊
    They need tools. And/or we need modular assemblages of models trained on narrower domains.

  • @Arcticwhir
    @Arcticwhir 1 day ago +2

    2:29 0.3 is basically randomness

  • @EwanNeeve
    @EwanNeeve 1 day ago

    Can I just ask: you mention at 7:45 that you've covered this SIMA agent on your channel before, but I watch your channel religiously and I can't remember it being mentioned before. Can you please advise what the title of the relevant video is?

    • @aiexplained-official
      @aiexplained-official 1 day ago

      It was not the full focus of a video; it was a mention from a paper. It would be roughly around the time of RT-2, sorry I can't be more specific.

    • @Hexanitrobenzene
      @Hexanitrobenzene 12 hours ago

      ​@@aiexplained-official
      Could be handy to have a searchable database of the transcripts of your videos...
      It's interesting, though, how you produce videos. Don't you have a document where you outline briefly what you will be talking about ?

  • @TheMirrorslash
    @TheMirrorslash День тому +3

    So, Genie 2 isn't really a game generator: since it's not real time, it isn't truly interactive. You put in your control input first and then it generates a response. There's no player; you don't respond to an obstacle by jumping, the obstacle is generated because you inputted jump beforehand. There's no goal and no challenge. Or am I missing something?

    • @zoeherriot
      @zoeherriot 1 day ago +3

      We should really stop calling them games - there is no concept of game rules in these, and no good way to make them. It's just a walking simulator at best. I think by "not real time" - they are implying there is latency from your input to the actual generation of the next frame.

  • @anonymes2884
    @anonymes2884 1 day ago +3

    Our regular dose of sanity :). Yeah, I'm yet to be convinced that hallucinations are solvable, or can even be reliably reduced to some chosen percentage. And I think it's starting to be suggestive that they ALL know that IF they solve them, their stock skyrockets, and yet none of them _have_ up to now, nor do they _really_ seem substantively closer IMO.
    Might well be coloured by my suspicion at this point that the best outcome for humanity _may well_ be if AI is mostly hype, though; maybe I'm seeing what I "want" to see :).

    • @TheSCBGeneral
      @TheSCBGeneral 1 day ago

      As he said in the video, reducing AI hallucinations to 0% would require fundamentally new architectures and training methods. Hallucinations in LLMs aren't a bug that can be simply patched out with enough time and resources; they're a symptom of the limitations of LLMs as a whole. That's why I think OpenAI is moving away from models like ChatGPT and closer towards models with improved reasoning like o1.

    • @CoolIcingcake3467
      @CoolIcingcake3467 1 day ago

      "maybe i'm seeing what I "want" to see :)."
      Which is confirmation bias?

  • @jsalsman
    @jsalsman 1 day ago +2

    Never trust a math answer until it's confirmed with e.g. sympy in code execution. (Oh sorry o1-*)
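    The comment's suggestion can be sketched without sympy too: a minimal stdlib verifier that re-computes an arithmetic expression deterministically and compares it against the model's claimed answer. The function names (`safe_eval`, `check_model_answer`) and the worked numbers are illustrative, not from the video.

    ```python
    import ast
    import operator

    # Safe evaluator for simple arithmetic expressions, used to
    # double-check a model's claimed answer instead of trusting it.
    OPS = {
        ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg,
    }

    def safe_eval(expr: str) -> float:
        """Evaluate +,-,*,/ expressions without using eval()."""
        def ev(node):
            if isinstance(node, ast.Expression):
                return ev(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
                return OPS[type(node.op)](ev(node.operand))
            raise ValueError(f"unsupported expression: {expr!r}")
        return ev(ast.parse(expr, mode="eval"))

    def check_model_answer(expr: str, claimed: float, tol: float = 1e-9) -> bool:
        # Recompute deterministically and compare with the model's claim.
        return abs(safe_eval(expr) - claimed) <= tol

    print(check_model_answer("226 - 68", 158))  # True: correct answer
    print(check_model_answer("226 - 68", 168))  # False: hallucinated answer
    ```

    Sympy's `sympify` would cover symbolic answers as well; the point is the same either way: the check runs as code, not as another generation.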

  • @sharpcircle6875
    @sharpcircle6875 1 day ago +4

    - Babe! New-
    - Already watching it ;)

  • @ellielikesmath
    @ellielikesmath 1 day ago

    the reason they have hallucinations is that they're still neural networks approximating an infinite rule with a finite model. you may want to dial back the hype by a couple orders of magnitude.

  • @timwang4659
    @timwang4659 1 day ago +2

    Been watching this channel since pretty much the beginning, back when the hype around LLMs was through the roof and you were thinking AGI 2025. But now, it feels like we are hitting the ceiling of current AI technology. The inherent flaws of transformers (hallucinations) are starting to become more prevalent. These models aren't simply regurgitating training data like skeptics claim, but they are not thinking clearly and logically either. It really is just "vibes/heuristics" that these models are using. They generalize from the training data, and if the training data is large enough, they can generalize pretty well. But in the end, it's not really "thinking". We definitely need a new paradigm.

    • @maciejbala477
      @maciejbala477 1 day ago

      yeah, I just made a comment as well in a similar vein. It's one aspect of LLMs where I don't see any improvement, nor do I see a solution to fix it being talked about. It's probable that hallucinations will always be a problem no matter what with transformer-based models

    • @aiexplained-official
      @aiexplained-official  1 day ago

      Great points but I never said AGI 2025. The one time I guessed a figure for a proto-AGI was 2028, and that would be for LLM-Modulo systems

  • @DanielSeacrest
    @DanielSeacrest 1 day ago

    Regarding the paper "PROCEDURAL KNOWLEDGE IN PRETRAINING DRIVES REASONING IN LARGE LANGUAGE MODELS", I'd be curious to see an investigation of the phenomenon of grokking with this methodology.
    I feel like the occurrence of grokking within models kind of answers the "not at 100%" question, i.e. can they truly ever be completely reliable? From what I understand a fully grokked model would be, but it makes sense that in the meantime, as a sort of learning phase, they would use a kind of approximated heuristics.

  • @arssve4109
    @arssve4109 1 day ago +1

    Why push LLMs to do physics and maths problems at all? It is obvious they should instead be querying dedicated calculation tools to construct their math answers. You do not need to run huge GPUs for 1789 - 12.463; a calculator from the '60s can do it

    • @maciejbala477
      @maciejbala477 1 day ago

      I assume because you can? it's a nice challenge to overcome if they could do maths by themselves. Obviously you are right, and currently that's totally what one should be doing with the available LLMs, but it's a weird flaw for an entity that's supposed to be able to "think" logically, and so it's a challenge to solve for the future

    • @arssve4109
      @arssve4109 1 day ago

      @@maciejbala477 It is fun to see people try initially, but it has been a while, and it is simply not the way to do it, because transformers generate probabilistic sequences, while the number of possible sequences specifying simple math operations with a variable digit count exceeds the model's parameter count by many orders of magnitude. That is why math is about learning the rules, not about the probabilistic next number in a sequence like 3.673 + 1.746e3 = ... It is obviously not the way to do it... And any engine that can execute the rules on numbers qualifies as a calculator

  • @Milennin
    @Milennin 1 day ago

    I don't believe hallucinations will be gone, but they'll probably be reduced even further. They're already less common in current models than they were 1-2 years ago, so that's good.

  • @impolitevegan3179
    @impolitevegan3179 1 day ago

    people have been reporting very good results with qwq. Have you used any system prompt?

  • @dijikstra8
    @dijikstra8 1 day ago

    The research looking into which neurons are activated is very interesting, and sort of reminiscent of the kind of research done on human brains, e.g. investigating which parts of the brain are activated by certain impressions. It makes sense to me that something like "226-68" would activate neurons around 150-200; that's pretty similar to how humans can make a rough estimate before they start actually analyzing the question and calculating the answer more accurately.
    I don't think we can ever expect neural networks to always give perfect answers, humans certainly don't with our very very advanced neural networks, but we can perhaps expect them to come closer. A more likely route to me though is using agents for something like this, the neural network simply has to understand that "this question can be solved by a calculator", and then call an external calculator function which can do the calculation in a deterministic and much more efficient way.
    In a similar way, perhaps we could have e.g. physics agents which the neural network can interact with in order to get the simulation right. It's not like humans are great at imagining the exact physics of e.g. gravity without actually calculating the path an object would take.
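    The routing idea in the comment above can be sketched as a toy dispatcher: the model's only job is to recognize "this is arithmetic" and emit a structured tool call, while a deterministic host function does the work. The `ToolCall` shape and function names here are hypothetical, not any real framework's API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class ToolCall:
        """A structured request a model emits instead of guessing digits."""
        name: str
        args: dict

    def calculator(args: dict) -> float:
        # Deterministic arithmetic: no sampling, no hallucination.
        a, b, op = args["a"], args["b"], args["op"]
        return {"add": a + b, "sub": a - b, "mul": a * b, "div": a / b}[op]

    # Registry of tools the host is willing to run on the model's behalf.
    TOOLS = {"calculator": calculator}

    def execute(call: ToolCall) -> float:
        return TOOLS[call.name](call.args)

    # The model's output reduces to producing this structure:
    call = ToolCall("calculator", {"a": 226, "b": 68, "op": "sub"})
    print(execute(call))  # 158
    ```

    A physics-simulation tool would slot into the same registry; the neural network supplies intent, the tool supplies exactness.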

  • @executivelifehacks6747
    @executivelifehacks6747 1 day ago +3

    AIexplained just dropped fam! Before GTA6!

    • @captain_crunk
      @captain_crunk 1 day ago

      Yeah, well, that's because GTA6, just like birds, will never exist. Not now, not ever.
      _[flies away like a bir, er, pterodactyl]_

  • @mAny_oThERSs
    @mAny_oThERSs 1 day ago +1

    I'm curious how they will differentiate GPT-5 and o1. It wouldn't make sense to release o1 if GPT-5 were smarter than it anyway, but at the same time coming up with a new GPT model that isn't even SOTA is still kind of weak.

    • @rousabout7578
      @rousabout7578 1 day ago

      Your question presumes GPT 5

    • @mAny_oThERSs
      @mAny_oThERSs 1 day ago

      @rousabout7578 well eventually it'll be there

    • @rousabout7578
      @rousabout7578 1 day ago

      @@mAny_oThERSs Have they confirmed they will continue the naming convention? How do you know o1 isn't it?

    • @mAny_oThERSs
      @mAny_oThERSs 1 day ago

      @rousabout7578 they said they will continue GPT models and o models separately

  • @Stephen_Lafferty
    @Stephen_Lafferty 1 day ago +1

    Gosh, the twelve days of AI-mas! I wonder if there will be leaps forward or incremental upgrades?

  • @steffenaltmeier6602
    @steffenaltmeier6602 1 day ago

    Did you run the Qwen models on your benchmark as well? The 72B-parameter one seems to do very well on most benchmarks, especially considering its size. And there is also the new QwQ model with chain of thought baked in. It's still an early version, but it's quite interesting as it's clearly o1-inspired.

    • @steffenaltmeier6602
      @steffenaltmeier6602 1 day ago

      wow.... i stopped the video the second you mentioned QwQ... XD

    • @steffenaltmeier6602
      @steffenaltmeier6602 1 day ago

      do you have any idea why QwQ did poorly? is it maybe stuck in reasoning loops as is warned about on their website?

  • @LucaCrisciOfficial
    @LucaCrisciOfficial 1 day ago

    Autonomy and self-improvement are also big steps toward superintelligence. There have been some steps forward in these fields in recent weeks

  • @Antremodes
    @Antremodes 1 day ago

    On the QwQ result, did you include "You should think step-by-step." in the system prompt? I noticed it tends to behave like the normal Qwen 32B otherwise, and they should have made a note in the model card.

  • @BunnyOfThunder
    @BunnyOfThunder 1 day ago +1

    There hasn't been AI news? Maybe not general intelligence news but video, image, and integrations have been going pretty steady. Like Anthropic's MCP. Some of the progress is going to be elbow grease but it's no less important.

  • @ryzikx
    @ryzikx 1 day ago +1

    i dont think hallucinations will ever go away. they will decrease but never hit 0%

  • @ginogarcia8730
    @ginogarcia8730 1 day ago

    is o1-preview now a little bit better? people are now getting this thing where it sometimes stops in the middle of its chain-of-thought to try and 'correct' itself and starts another chain-of-thought

  • @Nore_258
    @Nore_258 1 day ago

    I think the last thing that came out, which I use nearly every day, was the DeepSeek-R1-Lite Preview, about half a month ago (14 days). From OpenAI, I guess that would be the o1 preview. Honestly, though, I prefer DeepSeek R1 Lite. There was also that botched update to 4o, which resulted in worse performance except for creative writing. Hopefully, the next 12 days will bring some much-needed new models and features from OpenAI. I'm actually excited for the Pro plan; I've never needed such high rate limits anyway.

  • @Chef_PC
    @Chef_PC 14 hours ago

    Just imagine what we'll be talking about two papers down the road.

  • @user-on6uf6om7s
    @user-on6uf6om7s 1 day ago

    When you say these generators won't replace AAA development any time soon, when is soon? Because we've gone from a couple seconds of super low-res platformers to this in less than a year. I can't imagine this technology in 5 years which is pretty soon by any normal metric.

  • @user-pt1kj5uw3b
    @user-pt1kj5uw3b 1 day ago +2

    Can't wait to get 30 seconds of video generation per week or something like that

  • @walidoutaleb7121
    @walidoutaleb7121 1 day ago

    is accounting for API cost a sensible thing for benchmarks like SimpleBench? for me, performance per compute is as interesting as raw performance, and I haven't seen anyone do it.

  • @nPr26_50
    @nPr26_50 1 day ago +4

    Damn you're fast