You deserve so much credit for making your research and experience available to your audience. A super job! Thank you
I appreciate that! Glad I can help
Thanks for this detailed explanation. I just shared it with a colleague who was also wondering about o1’s architecture
Share the love and share the knowledge 😎
Thank you for helping us understand open AI models better ❤
Just doing what I love 🤓
Brilliantly condensed and fast-paced explanation of O1, mixing facts with clear logic. Thank you for demystifying such a complex concept!
Thank you for noticing! Glad it was helpful!
Love the straightforward presentation style. Well done. I think with several viewings of this video, I'll have at least a bit of a grasp regarding the architecture, function, etc.
Definitely! It’s a lot I packed into it, to make it as comprehensive as possible :)
Woah that was such a good breakdown. Great to understand o1 (and LLMs) on a deeper level. Thank you
Glad it was helpful!
This video is incredible! Exactly what I was looking for.
Great explanation of "Let's Verify Step by Step" and how that research was applied. Thank you so much for sharing.
Really excited to see what others do with this and how far scaling this can take us
Glad you liked it! +1 on the scaling breakthrough
If accurate, this is the best explanation of this I have seen so far, thank you for sharing!
10+ for the topic, content and presentation skills.
I appreciate that!
@@TheTechTrance Great!
the algorithm got me here. looks to be extremely up my alley, great!
and you're an ML engineer…
subbed done ✅
Welcome! :)
In my opinion, the problem with having to use so many methods and steps, as you've described, is that we first set up a complex model (we build it easily, whether we understand it or not), and that model is a "black box", so we have to adjust a lot of things without knowing what's inside. In the end, it becomes difficult. This is a great episode, thanks a lot ❤
Agreed, the models are growing in complexity. As of now, each of the steps serves a purpose, and later down the line a simplified version of the design will likely be developed. We will see! Thanks for watching!
@@TheTechTrance Yes, I agree.
Is o1 actually reasoning, or are we just getting better at mistaking noise for intelligence?
If it's just as useful, who cares?
The mechanisms for reasoning are there (called RL-Tree-Q* unofficially), so it's getting more ~intelligent.
That said, its hallucinations are also getting more ~intelligent.
x.com/DrJimFan/status/1837174801435349131
I'd say that calling it "reasoning" is marketing - we need to focus on accuracy. This technique is engineered to increase accuracy.
It can only reason well on things that are already in its training set, and the problem is, we the consumers aren't told what exactly is in there, so you roll the dice when you ask it to do something. It'll do it brilliantly if it's seen it before; otherwise you'll get a load of crap back.
@@tollington9414 the multi-step training also steers the model away from unknown topics - the effect is similar to how a student might reply with problem-adjacent information without solving the problem. The errors are harder to find in some cases, or it's clearly not addressing the problem in others.
Ohhh I get a full education every time I come to your channel 📚🤓
No detail left behind 🤓
Wow! What a great explanation!! 🤩
Thank you! 😊
Very cool. Glad I discovered your channel. Keep up the good work.
Welcome! And thank you :)
Thank you so much, that was awesome.
@@jesussaeta8383 my pleasure, glad you enjoyed!
Looking forward to Graph of Thoughts inference
Awesome breakdown 🙏
Good video. Some bits of information I think a lot of people hadn't heard.
Yea a lot of concepts that this taps into!
Very informative video, thanks for making it!
My pleasure. Glad you liked it!
Awesome explanation!
Another thing I wonder is whether the model looks at the world through statistics, through the real world (physics), or a hybrid. I think it's all good, depending on whether it's useful to us or not. Great episode! 🎉
I believe it was trained only on text, audio, and image/video data - to develop comprehension and responses. Physical data would be used more in the context of robotics - to develop spatial understanding and take actions
@@TheTechTrance Yes, you're right.
How does it feel to work in a field which is seeing such explosive growth at this point in history?
Thanks for the explanation.
It feels invigorating! Also overwhelming at times, since it’s going at such fast speeds. But I guess there’s no slowing down in sight so we do our best :)
I am impressed by your presentation from a first-principles perspective; you could go to OpenAI as presales. Looking forward to a deep discussion on o2.
I appreciate that! OpenAI can contact me anytime haha
Whoa thanks for your explanation!
If this is how o1 was trained, do you think it's the most effective & efficient way?
And what do you think could be improved with memory, caching, and context window?
You're welcome!
In terms of effectiveness, I think RL is a great way to achieve/emulate System 2 thinking.
In terms of efficiency, I wonder why OpenAI keeps the model purely LLM-based. They could also be incorporating logic-based languages... like programming languages into their chain of thought. Then o1 would falter less on "how many r's are in strawberry" and "when is 9.11 greater than 9.9" type questions.
No thoughts on their memory and such, I'm more familiar with their model architecture/design :)
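To illustrate the idea, here's a minimal sketch (hypothetical, not OpenAI's actual pipeline) of offloading those two failure cases from the chain of thought to actual code:

```python
# Hypothetical sketch: the model delegates exact counting/comparison to
# code instead of reasoning over tokens, avoiding the classic
# "how many r's are in strawberry" and "9.11 vs 9.9" mistakes.

def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word`, character by character."""
    return sum(1 for ch in word if ch == letter)

def larger_decimal(a: str, b: str) -> str:
    """Compare two decimal strings numerically, not lexically."""
    return a if float(a) > float(b) else b

print(count_letter("strawberry", "r"))  # 3
print(larger_decimal("9.11", "9.9"))    # 9.9
```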
@@TheTechTrance that's such a good idea. The question is, how do we know the LLM is doing nothing when it's in idle mode? The more I learn about this AI stuff by reading books and papers, having no computer science degree, the more I feel I understand nothing.
This was great thank you
Who tf can dislike this video?
!!! 🥺🥺
Probably Sam Altman
great analysis, thanks
My pleasure!
So interesting and helpful
So is there a chance that they’re going to eventually be able to drastically improve on things like hallucinations and inaccuracies by simply increasing the inference time?
That's what we're seeing with o1 already! Of course more improvements are still needed, but this is in the right direction
Sheesh… you put the open back into the openAI 😅
hahah this made me laugh
somebody had to do it!
I wouldn't be surprised if there are 1-30 instances of gpt-4o-mini running behind the scenes simultaneously, and one gpt-4o instance deciding which are correct
That would be the majority vote approach at 05:12 (similar to rolling a die and seeing which number we land on most often), but o1 is instead doing Tree of Thoughts (a more elegant approach)
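For anyone curious, here's a minimal sketch of that majority-vote (self-consistency) idea - hypothetical code, with `sample_answer` standing in for one full sampled model run:

```python
# Minimal sketch of majority voting over sampled answers, assuming a
# hypothetical sample_answer(prompt) that runs the model once with
# sampling enabled and returns its final answer.
from collections import Counter

def majority_vote(sample_answer, prompt: str, n_samples: int = 20) -> str:
    """Sample n independent answers and return the most common one."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Tree of Thoughts instead scores and expands partial reasoning steps, rather than only voting on final answers.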
@@TheTechTrance thanks!
Well done!
great video!
While I acknowledge o1 is super good, I just feel that the reasoning method can be replicated with an agents framework like CrewAI or AutoGen. It's only a matter of time before someone shares their project on GitHub.
The agents frameworks are great for getting tasks done, but I'm not so sure about solving problems, e.g. crosswords, math problems, coding exercises, etc. o1 is geared towards solving problems via reasoning
Love the video! But please make sure ding.mp3 is not way louder than the rest of the video 🙏
Noted, thanks for the feedback!
I think Q* stands for Quiet-STaR (quiet thinking, Self-Taught Reasoner), which is another paper, not Q-learning with A*
I believe you are right, good catch!
@@TheTechTrance wow thanks, wasn't expecting that 😳
If correct, a lot of human feedback is still necessary in the AI training loop.
Not with strawbrary (joke spelling).
With Strawberry they used synthetic data and an expert agent: basically, the synthetic data would generate search trees in steps, and the expert would only reward when the correct answer was arrived at in the fewest steps.
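A minimal sketch of that reward scheme as described - hypothetical, not the actual Strawberry implementation: only correct final answers earn reward, and shorter paths earn more:

```python
# Hypothetical sketch of the described expert reward: zero for a wrong
# final answer, otherwise scaled so fewer steps means more reward.
def expert_reward(steps: list[str], final_answer: str, correct_answer: str) -> float:
    if final_answer != correct_answer:
        return 0.0
    return 1.0 / max(len(steps), 1)  # fewer steps -> larger reward
```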
There are two separate moments when human feedback is used for reinforcement learning:
- RLHF, but now that's been transitioned to RLAIF (at 10:40)
- RL-Tree-Q* (unofficial name): to train its Process Reward Model, a human labels whether the steps of a solution are correct, incorrect, or neither (at 13:58)
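To make that concrete, here's a rough sketch of what such step-level labels could look like for PRM training - illustrative data only, not OpenAI's actual format:

```python
# Illustrative PRM training example: each step gets a human label of
# +1 (correct), -1 (incorrect), or 0 (neither), in the spirit of
# "Let's Verify Step by Step".
labeled_solution = {
    "problem": "What is 12 * 15?",
    "steps": [
        {"text": "12 * 15 = 12 * 10 + 12 * 5", "label": +1},
        {"text": "12 * 10 = 120 and 12 * 5 = 60", "label": +1},
        {"text": "120 + 60 = 190", "label": -1},  # arithmetic slip
    ],
}
# The PRM learns to predict each step's label from the problem and the
# steps so far, so at inference it can score partial reasoning paths.
```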
I do not think that humans are going to be a bottleneck with synthetic data.
If you read the Google paper on universal provers, they demonstrate that a simple implementation of Occam's razor removes the dependence on humans for feedback.
5:57 I also have this book, and I also read it, kind of 😆
haha guilty!
thank uuuuuuu
Are you sure about the active learning part, with iterative human labeling of the examples it messed up?
o1 is good at coding and math, both problems where the final answer can be checked automatically. So yes, active learning would make sense, but the system can check itself whether the answer was correct and only use the paths that led to the true answer. It could also look for the path with the fewest steps leading to the correct answer; likely this is also the best path. All of this needs no human labeling and would explain why math and coding got so much better. (In my testing, coding did not get that much better; Anthropic's Sonnet often does a better job. Math seems to see bigger gains, but even there it often failed to solve my problems.)
The active learning is for solutions with the wrong final answer but highly rated steps. The existence of these solutions can be automatically checked for, but their steps would still need human labeling - to see how and at which step it arrived at the wrong final answer
@@TheTechTrance thanks for the reply.
I would agree that human labeling makes sense in some cases, like:
1. The model never converges for some problem types.
2. Improving performance on one type of problem reduces performance on others.
3. We need to validate reasoning patterns that could transfer to non-verifiable domains.
However, I question the need for human labeling by default in math/coding problems. If highly rated steps lead to wrong answers, those steps were fundamentally wrong for that type of problem and should be rated lower. Since we can automatically explore paths and verify answers, the system can find optimal reasoning patterns on its own. The only situation where rating paths lower doesn't work is when it hurts performance on other tasks.
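As a rough sketch of that self-checking loop - hypothetical functions throughout (`sample_solution` and `check_answer` are stand-ins): sample many solution paths, keep only those whose final answer verifies automatically, and prefer the shortest:

```python
# Hypothetical sketch of answer-verified path filtering: explore many
# sampled solution paths, keep those whose final answer passes an
# automatic check (unit tests for code, exact match for math), and
# take the shortest survivor as the training target.
def best_verified_path(sample_solution, check_answer, problem, n: int = 64):
    verified = []
    for _ in range(n):
        steps, answer = sample_solution(problem)   # one sampled attempt
        if check_answer(problem, answer):          # automatic verification
            verified.append(steps)
    return min(verified, key=len) if verified else None
```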
Wouldn't omni be the 'cortex', and not the 2nd brain? I would think GPT-4/Turbo, since they're quite good and have that deep breadth to them, like our own 2nd-brain function. I think they've just shifted 4/Turbo to the 2nd-brain tasks and have omni out front for input streams. The reason I think so is that our cortex needs to operate at very fast rates, and omni is clearly magnitudes faster and handles all modalities, just like our own. (I don't see too many people discussing the speed with this new liquidity of inputs; by far the most impressive aspect of omni, imo.)
See a plant 🌵 and you immediately know it's a plant (omni/cortex/1st-brain driven), but what variant/type of plant? Can you eat it? Well, that's where you contemplate and ponder it (2nd brain) by tapping into all relevant knowledge and deducing: well, maybe it's prickly, pricklies hurt, it might be quite the ordeal to eat it despite it probably being safe to.
I think o1 is all things held constant (model-wise); they've just added CoT to the cluster, and maybe, based on the scientists' comments, there might be some novel new RLHF replacement.
[Wrote this while listening; I see you mention this toward the end] 😅
Your thoughts were spot on!
How confident are you that this is actually how the model was created?
I'm very confident. I did my research and cited my sources, and it's in consensus with other industry leaders. Of course there are details not included that only an OpenAI researcher would have, but hopefully this video gave you a better understanding of how o1 was designed and trained, and its impact w.r.t. the neural scaling laws.
Good luck with the channel. I love seeing women engineers.
thank you, just getting started :)
Holy Based
This was f*cking awesome! I have my throat, head, hands, and other parts tattooed, and in my own way, I understood it. My problem: the ability to extend compute time during inference will undercut accessibility and the democratization of AI technology.
Fancy people like her can still get their hair did and do their fancy AI stuff. But the rest of us - ugh.
Did anyone catch it? I was getting high, whilst listening to some banging dubstep! But she mentioned the "o1" models and the Q* algorithm. This is speculative stuff ATM, ya? The Q-learning, Monte Carlo Tree searchin'... holy sherlock, homie. I mean, this limb needs more branches.
Come on.. if we learned anything from Q: never trust an intelligent woman. The candy is not a reward, it is a trap. Yet over and over and over the same mistakes are made. rawr, people.
(sorry)
!
You are the most gorgeous model; in the end, the scaling laws can't account for that 🌹
Loser
Wow, are you married?
PILLAMEEOWR