You deserve so much credit for making your research and experience available to your audience. A super job! Thank you
I appreciate that! Glad I can help
Thanks for this detailed explanation. I just shared it with a colleague who was also wondering about o1’s architecture
Share the love and share the knowledge 😎
Thank you for helping us understand open AI models better ❤
Just doing what I love 🤓
Brilliantly condensed and fast-paced explanation of O1, mixing facts with clear logic. Thank you for demystifying such a complex concept!
Thank you for noticing! Glad it was helpful!
Love the straightforward presentation style. Well done. I think with several viewings of this video, I'll have at least a bit of a grasp regarding the architecture, function, etc.
Definitely! It’s a lot I packed into it, to make it as comprehensive as possible :)
Woah that was such a good breakdown. Great to understand o1 (and LLMs) on a deeper level. Thank you
Glad it was helpful!
This video is incredible! Exactly what I was looking for.
Great explanation of "Let's Verify Step by Step" and how that research was applied. Thank you so much for sharing.
Really excited to see what others do with this and how far scaling this can take us
Glad you liked it! +1 on the scaling breakthrough
If accurate, this is the best explanation of this I have seen so far, thank you for sharing!
10+ for the topic, content and presentation skills.
I appreciate that!
@@TheTechTrance Great!
the algorithm got me here. looks to be extremely up my alley, great!
and you're an ML engineer…
subbed done ✅
Welcome! :)
In my opinion, the problem with having to use so many methods and steps, as you've described, is that we first set up a complex model (we build it easily, whether we understand it or not), and that model is a "black box", so we have to adjust a lot of things without knowing what's inside. In the end, it becomes difficult. This is a great episode, thanks a lot ❤
Agreed, the models are growing in complexity. As of now, each of the steps serves a purpose, and later down the line a simplified version of the design will likely be developed. We will see! Thanks for watching!
@@TheTechTrance Yes, I agree.
Is o1 actually reasoning, or are we just getting better at mistaking noise for intelligence?
If it's just as useful, who cares?
The mechanisms for reasoning are there (called RL-Tree-Q* unofficially), so it's getting more ~intelligent.
That said, its hallucinations are also getting more ~intelligent.
x.com/DrJimFan/status/1837174801435349131
I'd say that calling it "reasoning" is marketing - we need to focus on accuracy. This technique is engineered to increase accuracy.
It can only reason well on things that are already in its training set, and the problem is, we the consumers aren't told what exactly is in there, so you roll the dice when you ask it to do something. It'll do it brilliantly if it's seen it before; otherwise you'll get a load of crap back.
@@tollington9414 the multi-step training also steers the model away from unknown topics - the effect is similar to how a student might reply with problem-adjacent information without solving the problem. The errors are harder to find in some cases, or it's clearly not addressing the problem in others.
Ohhh I get a full education every time I come to your channel 📚🤓
No detail left behind 🤓
Wow! What a great explanation!! 🤩
Thank you! 😊
Very cool. Glad I discovered your channel. Keep up the good work.
Welcome! And thank you :)
Thank you so much, that was awesome.
@@jesussaeta8383 my pleasure, glad you enjoyed!
Looking forward to Graph of Thoughts inference
Awesome breakdown 🙏
Good video. Some bits of information I think a lot of people hadn't heard.
Yea a lot of concepts that this taps into!
Very informative video, thanks for making it!
My pleasure. Glad you liked it!
Awesome explanation!
Another thing I wonder is whether the model looks at the world through statistics, through the real world (physics), or a hybrid. I think it's all good, depending on whether it's useful to us or not. Great episode! 🎉
I believe it was trained only on text, audio, and image/video data - to develop comprehension and responses. Physical data would be used more in the context of robotics - to develop spatial understanding and take actions
@@TheTechTrance Yes, you're right.
How does it feel to work in a field which is seeing such explosive growth at this point in history?
Thanks for the explanation.
It feels invigorating! Also overwhelming at times, since it’s going at such fast speeds. But I guess there’s no slowing down in sight so we do our best :)
I am impressed by your presentation from a first-principles perspective; you could go to OpenAI as presales. Looking forward to a deep discussion on o2.
I appreciate that! OpenAI can contact me anytime haha
Whoa thanks for your explanation!
If this is how o1 was trained, do you think it's the most effective & efficient way?
And what do you think could be improved with memory, caching, and context window?
You're welcome!
In terms of effectiveness, I think RL is a great way to achieve/emulate System 2 thinking.
In terms of efficiency, I wonder why OpenAI keeps the model purely LLM-based. They could also be incorporating logic-based languages... like programming languages into their chain of thought. Then o1 would falter less on "how many r's are in strawberry" and "when is 9.11 greater than 9.9" type questions.
No thoughts on their memory and such, I'm more familiar with their model architecture/design :)
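To illustrate the idea, here's a minimal sketch (hypothetical, not OpenAI's actual pipeline) of offloading those two failure cases from the chain of thought to actual code:

```python
# Hypothetical sketch: the model delegates exact counting/comparison to
# code instead of reasoning over tokens, avoiding the classic
# "how many r's are in strawberry" and "9.11 vs 9.9" mistakes.

def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word`, character by character."""
    return sum(1 for ch in word if ch == letter)

def larger_decimal(a: str, b: str) -> str:
    """Compare two decimal strings numerically, not lexically."""
    return a if float(a) > float(b) else b

print(count_letter("strawberry", "r"))  # 3
print(larger_decimal("9.11", "9.9"))    # 9.9
```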
@@TheTechTrance that's such a good idea. The question is, how do we know the LLM is doing nothing when it's in idle mode? The more I learn about this AI stuff by reading books and papers, having no computer science degree, the more I feel I understand nothing.
This was great thank you
Who tf can dislike this video?
!!! 🥺🥺
Probably Sam Altman
great analysis, thanks
My pleasure!
So interesting and helpful
So is there a chance that they’re going to eventually be able to drastically improve on things like hallucinations and inaccuracies by simply increasing the inference time?
That's what we're seeing with o1 already! Of course more improvements are still needed, but this is in the right direction
Sheesh… you put the open back into the openAI 😅
hahah this made me laugh
somebody had to do it!
I wouldn't be surprised if there are 1-30 instances of gpt-4o-mini running behind the scenes simultaneously, and one gpt-4o instance deciding which are correct
That would be the majority vote approach at 05:12 (similar to rolling a die and seeing which number we land on most often), but o1 is instead doing Tree of Thoughts (a more elegant approach)
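For anyone curious, here's a minimal sketch of that majority-vote (self-consistency) idea - hypothetical code, with `sample_answer` standing in for one full sampled model run:

```python
# Minimal sketch of majority voting over sampled answers, assuming a
# hypothetical sample_answer(prompt) that runs the model once with
# sampling enabled and returns its final answer.
from collections import Counter

def majority_vote(sample_answer, prompt: str, n_samples: int = 20) -> str:
    """Sample n independent answers and return the most common one."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Tree of Thoughts instead scores and expands partial reasoning steps, rather than only voting on final answers.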
@@TheTechTrance thanks!
Well done!
great video!
While I acknowledge o1 is super good, I just feel that the reasoning method can be replicated with an agents framework like CrewAI or AutoGen. It's only a matter of time before someone shares their project on GitHub.
The agents frameworks are great for getting tasks done, but I'm not so sure about solving problems, e.g. crosswords, math problems, coding exercises, etc. o1 is geared towards solving problems via reasoning
Love the video! But please make sure ding.mp3 is not way louder than the rest of the video 🙏
Noted, thanks for the feedback!
I think Q* stands for Quiet-STaR (quiet thinking, Self-Taught Reasoner), which is another paper, not Q-learning with A*
I believe you are right, good catch!
@@TheTechTrance wow thanks, wasn't expecting that 😳
If correct, a lot of human feedback is still necessary in the AI training loop.
Not with strawbrary (joke spelling).
With Strawberry they used synthetic data and an expert agent: basically, the synthetic data would generate search trees in steps, and the expert would only reward when the correct answer was arrived at in the fewest steps.
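A minimal sketch of that reward scheme as described - hypothetical, not the actual Strawberry implementation: only correct final answers earn reward, and shorter paths earn more:

```python
# Hypothetical sketch of the described expert reward: zero for a wrong
# final answer, otherwise scaled so fewer steps means more reward.
def expert_reward(steps: list[str], final_answer: str, correct_answer: str) -> float:
    if final_answer != correct_answer:
        return 0.0
    return 1.0 / max(len(steps), 1)  # fewer steps -> larger reward
```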
There are two separate moments when human feedback is used for reinforcement learning:
- RLHF, but now that's been transitioned to RLAIF (at 10:40)
- RL-Tree-Q* (unofficial name): to train its Process Reward Model, a human labels whether the steps of a solution are correct, incorrect, or neither (at 13:58)
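To make that concrete, here's a rough sketch of what such step-level labels could look like for PRM training - illustrative data only, not OpenAI's actual format:

```python
# Illustrative PRM training example: each step gets a human label of
# +1 (correct), -1 (incorrect), or 0 (neither), in the spirit of
# "Let's Verify Step by Step".
labeled_solution = {
    "problem": "What is 12 * 15?",
    "steps": [
        {"text": "12 * 15 = 12 * 10 + 12 * 5", "label": +1},
        {"text": "12 * 10 = 120 and 12 * 5 = 60", "label": +1},
        {"text": "120 + 60 = 190", "label": -1},  # arithmetic slip
    ],
}
# The PRM learns to predict each step's label from the problem and the
# steps so far, so at inference it can score partial reasoning paths.
```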
I do not think that humans are going to be a bottleneck with synthetic data.
If you read the Google paper on universal provers, they demonstrate that a simple implementation of Occam's razor removes the dependence on humans for feedback.
5:57 I also have this book, and I also read it, kind of 😆
haha guilty!
thank uuuuuuu
Are you sure about the active learning part, with iterative human labeling of the examples it messed up?
o1 is good at coding and math, both problems where the final answer can be checked automatically. So yes, active learning would make sense, but the system can check itself whether the answer was correct and only use the paths that led to the true answer. It could also look for the path with the fewest steps leading to the correct answer; likely this is also the best path. All of this needs no human labeling and would explain why math and coding got so much better. (In my testing, coding did not get that much better; Anthropic's Sonnet often does a better job. Math seems to see bigger gains, but even there it often failed to solve my problems.)
The active learning is for solutions with the wrong final answer but highly rated steps. The existence of these solutions can be automatically checked for, but their steps would still need human labeling - to see how and at which step it arrived at the wrong final answer
@@TheTechTrance thanks for the reply.
I would agree that human labeling makes sense in some cases, like:
1. The model never converges for some problem types.
2. Improving performance on one type of problem reduces performance on others.
3. We need to validate reasoning patterns that could transfer to non-verifiable domains.
However, I question the need for human labeling by default in math/coding problems. If highly rated steps lead to wrong answers, those steps were fundamentally wrong for that type of problem and should be rated lower. Since we can automatically explore paths and verify answers, the system can find optimal reasoning patterns on its own. The only situation where rating paths lower doesn't work is when it hurts performance on other tasks.
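As a rough sketch of that self-checking loop - hypothetical functions throughout (`sample_solution` and `check_answer` are stand-ins): sample many solution paths, keep only those whose final answer verifies automatically, and prefer the shortest:

```python
# Hypothetical sketch of answer-verified path filtering: explore many
# sampled solution paths, keep those whose final answer passes an
# automatic check (unit tests for code, exact match for math), and
# take the shortest survivor as the training target.
def best_verified_path(sample_solution, check_answer, problem, n: int = 64):
    verified = []
    for _ in range(n):
        steps, answer = sample_solution(problem)   # one sampled attempt
        if check_answer(problem, answer):          # automatic verification
            verified.append(steps)
    return min(verified, key=len) if verified else None
```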
Wouldn't omni be the 'cortex', and not the 2nd brain? I would think GPT-4/Turbo, since they're quite good and have that deep breadth to them, like our own 2nd-brain function. I think they've just shifted 4/Turbo to the 2nd-brain tasks and have omni out front for input streams. The reason I think so is that our cortex needs to operate at very fast rates, and omni is clearly magnitudes faster and handles all modalities, just like our own. (I don't see too many people discussing the speed with this new liquidity of inputs; by far the most impressive aspect of omni, imo.)
See a plant 🌵 and you immediately know it's a plant (omni/cortex/1st-brain driven), but what variant/type of plant? Can you eat it? Well, that's where you contemplate and ponder it (2nd brain) by tapping into all relevant knowledge and deducing: well, maybe it's prickly, pricklies hurt, it might be quite the ordeal to eat it despite it probably being safe to.
I think o1 is all things held constant (model-wise); they've just added CoT to the cluster, and maybe, based on the scientists' comments, there might be some novel new RLHF replacement.
[Wrote this while listening; I see you mention this toward the end] 😅
Your thoughts were spot on!
How confident are you that this is actually how the model was created?
I'm very confident. I did my research and cited my sources, and it's in consensus with other industry leaders. Of course there are details not included that only an OpenAI researcher would have, but hopefully this video gave you a better understanding of how o1 was designed and trained, and its impact w.r.t. the neural scaling laws.
Good luck with the channel. I love seeing women engineers.
thank you, just getting started :)
Holy Based
This was f*cking awesome! I have my throat, head, hands, and other parts tattooed, and in my own way, I understood it. My problem: the ability to extend compute time during inference will undercut accessibility and the democratization of AI technology.
Fancy people like her can still get their hair did and do their fancy AI stuff. But the rest of us - ugh.
Did anyone catch it? I was getting high, whilst listening to some banging dubstep! But she mentioned the "o1" models and the Q* algorithm. This is speculative stuff ATM, ya? The Q-learning, Monte Carlo Tree searchin'... holy sherlock, homie. I mean, this limb needs more branches.
Come on.. if we learned anything from Q: never trust an intelligent woman. The candy is not a reward, it is a trap. Yet over and over and over the same mistakes are made. rawr, people.
(sorry)
!
You are the most gorgeous model; in the end, the scaling laws can't account for that 🌹
Loser
Wow, are you married?
PILLAMEEOWR