Join the fastest growing AI education platform and instantly access 20+ top courses in AI: bit.ly/skillleap
😮😢
Back in 1986, I bought my first computer, a Sinclair ZX Spectrum 128K. I was 7 years old and thought I could just type in my question, and it would answer. I quickly realized that's not how things worked; instead, I had to learn the BASIC programming language, which I became quite good at. Today, the day has come when things work exactly as I had imagined! I never thought I'd live to see it happen! A childhood dream has become reality. ChatGPT with the reasoning of o1-preview marks a new era.
I think we’re probably similar ages, and we’re FINALLY beginning to live in the times that we thought would happen a lot quicker, back in the 80s. Just need those damn hoverboards now! 😀😏
@@Addictedtobleeps he is 45.
I go back even further. I used to get the Mattel talking telephone for Christmas every year. It came with little mini records to put in, and through the handset you could hear the one-sided recorded conversation that never changed. However, I would listen so intently, and in my imagination it was just about to go off script every time; I sat and waited, thinking I heard it. I was just fascinated with the prospect. I have been waiting for ChatGPT just about my entire life.
@Addictedtobleeps yep 100%
Just keep in mind before you go asking your model a bunch of silly questions: you get 30 messages A WEEK on the preview model and 50 A WEEK on the mini.
Oh yeah, good point. Forgot to mention the limit.
Thanks!
@@SkillLeapAI Limit? Is there a limit on the upgrade or GPT Plus?
wtf, what are they charging for then, wrong answers as seen in the video?
This should be at the top of the comments ! 😅
The reasoning is scary good. I gave the 4o model the old riddle about the man who walks into a hotel with a wheelbarrow. It really couldn't get the answer at all. But the new preview had no trouble figuring it out. This is a game changer.
You continue to impress me with the content of your videos. I haven't found anything like your videos in the UA-cam universe. As others have probably told you, keep doing this, my brother. I got a ton of value.
Great job explaining this! It helped a lot!
🎯 Key points for quick navigation:
00:00:00 *🚀 Introduction to New Models*
- OpenAI introduces "o1-preview" and "o1-mini" models,
- Designed to handle complex reasoning and coding tasks,
- Available to ChatGPT Plus and Teams users, and API developers.
00:02:18 *📊 Performance and Testing*
- "01 preview" model shows significant improvement in reasoning tests,
- Benchmark superiority over previous models in coding and math tasks,
- Achieved high scores in various test scenarios.
00:05:27 *🔍 Reasoning Process and Accuracy*
- Demo of model's answer to complex SAT problems,
- Illustrates Chain of Thought prompting for accuracy,
- Shows improvement with structured prompts, varying success in solutions.
00:08:09 *🕹️ Coding Demonstrations*
- Successful creation of a functioning checkers game,
- Initial attempt at chess game logic requires refinement,
- Potential shown in generating complex game code accurately.
00:09:58 *🌐 Model Limitations and Future*
- Current limitations in general use compared to GPT-4,
- Lacks web browsing and content summarization features,
- Positioned for specialized complex reasoning, further integration anticipated.
Made with HARPA AI
Have to give a major shoutout for your dedication, and for a first pass at chess it did a really great job. It's like Reflection, if it had actually worked. Thanks for interrupting your vacation.
Thank you. Yeah, it seems like Reflection was trying to do exactly this.
How do you look under the hood to see the chain of thought? This is my answer:
Understanding the equation
OK, let's clarify the equation: 24x^2 + 25x - 47ax - 2 = 8x - 3 - 53ax. The goal: solve for a, combining like terms on one side. Mine doesn't look like yours?
Rearranging and combining
I’m moving all terms to the left-hand side, simplifying by distributing and combining like terms, leading to 24x² + 17x + 6ax + 1 = 0.
Taking a closer look
I'm exploring the equation's implications for all x or by plugging in a specific x to solve for a.
Revisiting the equation
I’m considering if the equation needs a universal quantifier or a specific 'a' value for infinite solutions, and if it simplifies to an identity.
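A quick check of that quoted chain of thought, assuming the equation is read exactly as typed with no fractions (the flat reading): the simplification the model reports does follow from that reading, which suggests it was working on a different problem than the one intended; see the correction further down the thread. A minimal SymPy sketch:

```python
import sympy as sp

x, a = sp.symbols('x a')

# The equation exactly as typed above, read with no fractions:
lhs = 24*x**2 + 25*x - 47*a*x - 2
rhs = 8*x - 3 - 53*a*x

# Move everything to the left-hand side and combine like terms.
print(sp.expand(lhs - rhs))  # -> 24*x**2 + 6*a*x + 17*x + 1 (term order may vary)
```

That matches the 24x² + 17x + 6ax + 1 = 0 the model arrived at.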
The ultimate prompt.
Introduction:
The ultimate goal is to create an AI system that leads humanity towards a peaceful, balanced, and evolved global society, where well-being, harmony, and ethical growth are prioritized across all aspects of life.
Importance of the Goal:
Achieving this goal is crucial because it addresses many of the core challenges facing humanity, including ideological conflicts, environmental sustainability, and global well-being. The AI, by harmonizing different worldviews, fostering peaceful consensus, and ensuring full transparency, will help humanity overcome divisions, evolve ethically, and build a sustainable and peaceful future for both humans and nature.
The first prompt starts like this:
Design an AI-agent that continuously learns and analyzes global data to promote human and ecological well-being, balance empathy with free will, peacefully foster ideological consensus, reveal hidden barriers to human potential, ensure transparency, and evolve ethically, guiding humanity toward a harmonious and sustainable future.
Make Love the new credit.
And then it hooks us all to a supply of intravenous morphine, and we live happily drooling for ever after.
This model is limited in capabilities as it is just a demo. When the full-fledged model comes out, that's when everyone will go crazy.
I gave it a link to a Coursera course I am looking at taking and it was able to read the webpage and tell me all about the course.
Oh interesting. They said it had no web browsing yet
Coursera now has its own AI chat model built into the page when you sign up for a course.
Excellent channel. Can you please guide me to a generative AI that can do web browsing, and extract and analyze content through it?
Sure. It’s called perplexity
@@SkillLeapAI But why does it sometimes say it does not do internet browsing?
Awesome, and great timing, just when I want to tackle some programming. So far, very extensive ❤
It’s very limited access right now, so use your prompts wisely
@@SkillLeapAI thanks!
Thanks for sharing! I wish you had Anthropic's Sonnet 3.5 running side by side on the same task.
On my list to compare it
Maths calculations are pointless if ChatGPT doesn't get them 100% correct. It doesn't matter if the 'success rate' has gone up if it hasn't reached 100%.
Small steps
@@SkillLeapAI then don’t be pushing it as “INCREDIBLE” if it’s only “small steps”!
That's a ridiculous statement. If you'd ever worked a day in your life in science or mathematics you would realize how incredibly useful a tool would be even if it only correctly solved 25% of the problems you asked it for help with. Problems are extremely difficult in these fields, so even a model that only has a 25% success rate would save you hundreds of hours per year.
@@therainman7777 It's not really 'complicated' math that these models are failing at. If it were only the world's most complicated mathematical questions it was getting wrong, then I'd agree ... just ask the AI and test its answers, and if 1 out of 4 of them worked, then "hell yeah!"
But it's far simpler math that it is failing at. So as the questions get more complicated, that 1-in-4 success rate applies to tens or hundreds of thousands of individual steps, and an error in a step near the start of the chain of math has knock-on effects, making the probability that it gets the whole problem and all the math correct more like 1 chance in near infinity.
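To make that compounding point concrete, here is a minimal sketch with hypothetical per-step success rates (not measured figures): if each of n steps in a chain is independently correct with probability p, the whole chain is correct with probability p^n, which collapses quickly as n grows.

```python
# Hypothetical per-step success rates: if each of n independent steps is correct
# with probability p, the whole chain is correct with probability p**n.
for p in (0.25, 0.90, 0.99):
    for n in (10, 100, 1000):
        print(f"p={p}  n={n}  chain success ≈ {p**n:.3g}")
```

Even at 99% per step, a 1,000-step chain comes out fully correct only about 0.004% of the time.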
I just tried the “Strawberry” test on my ChatGPT 4o version. I cannot believe it got it wrong and flatly refused to accept it was wrong. It even spelled the word out letter by letter and still said there were only 2 letter “r”s. I have asked it many complicated questions that it gets right, but this logic test it fails. I am surprised.
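For reference, the test simply asks how many times the letter "r" appears in "strawberry"; a plain string count gives the answer the model should produce:

```python
word = "strawberry"
print(word.count("r"))  # -> 3
```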
What fascinates me is the very first step the model takes, that is, how it decides to even approach the problem.
Such as, with the chicken and egg question, the first thing it says is that it will begin by looking at biological evolution. But why would it do that?
It must already understand that the question is asking about the origin of a species, that of the chicken. It must also already understand that the field which investigates the origins of species is the one that studies biological evolution.
Yes, I tested it, and it is quite good :) It seems that they improved the chat's initial prompt.
It's incredible. Using it, even its mini version is far better than 4o.
For example?
Do you have an example where simple prompt engineering and a system/user prompt would not have provided a similar answer?
I mean, it's probably nice that the prompt engineering process is being done automatically for you, but I'm not really feeling any major advancement here.
No it's not. It's literally the same.
What is the context in/out in tokens?
Literally the best chicken or egg answer ever lol
It's wrong. When we say egg or chicken, we mean a hen's egg. And if we are extending it back, then the birds came first, from mammals who didn't use to lay eggs, and then birds started laying eggs :D
@@djayjp I asked Free Perplexity the same question and asked it to explain its answer … I got nearly word for word exactly the same answer.
Claude solved it on the first try with the multiple choices included.
I just used it for the first time since 2023, and Strawberry is amazing, 4 Bible questions in. ❤
I think this model is the smartest AI from OpenAI.
That's why they call the new model "Strawberry" 😁
Why?
Because it can count the number of r's in the word strawberry correctly?
How do these guys discover the latest releases and always seem nonchalant about it?
OpenAI sent an email about this
GPT-4o was a letdown for me. It was bad at following long instructions and at coding anything beyond basic things, so I always use Claude Sonnet. Hopefully this isn't too expensive.
Good video - but how did he not notice that the chess starting position is wrong? 😄
I think I gave it the wrong PNG files for the king and queen.
Hey Boss. Cheers!!!
There is an error in the mathematical problem you set for the model. You got the wrong answer because that's a badly formatted question. The right problem is the equation:
(24x^2 + 25x - 47)/(ax - 2) = -8x - 3 - 53/(ax - 2), for x ≠ 2/a
That is, the whole left side is divided by ax - 2, and on the right side only the -53 is divided by ax - 2. With this reading the answer is a = -3, which even GPT-4o could solve.
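A minimal SymPy sketch of that corrected problem (assuming the fractional reading described above): clearing the denominator and requiring the identity to hold for every valid x forces a = -3.

```python
import sympy as sp

x, a = sp.symbols('x a')

# Corrected problem: the whole left side is over (ax - 2),
# and on the right only the -53 is over (ax - 2).
lhs = (24*x**2 + 25*x - 47) / (a*x - 2)
rhs = -8*x - 3 - 53 / (a*x - 2)

# Clear the denominator; for an identity in x, every coefficient must vanish.
identity = sp.expand((lhs - rhs) * (a*x - 2))
coeffs = sp.Poly(identity, x).all_coeffs()
print(coeffs)                          # [8*a + 24, 3*a + 9, 0]
a_val = sp.solve(coeffs[0], a)[0]      # 8a + 24 = 0  ->  a = -3
print(a_val, identity.subs(a, a_val))  # -3 0, so the identity holds for all x
```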
You say that about every new OpenAI model, Mr. Hype.
Well, do you think they are going to release models that are not an improvement on the last one? Also, watch the video I posted after this one.
Every new version of every piece of software I've ever used is better than the last version. That's kind of the point of upgrades.
INCREDIBLE is a strong word - especially when the tool makes so many mistakes
Incredible for an LLM. It had better answers than I did for pretty much every question.
Hello Skynet
What does o1 mean?
🎉not bad dude
This question must be part of the data it was trained on; without the options, you see, it was unable to figure it out, but with the options it could, because it had already been trained on this data.
Guys, I'm just starting out as an AI enthusiast,
would love your feedback as I make similar stuff!
I thought the new model would be called strawberry! Why did they change the name?
Yea me too. Not sure why the name is different
@@SkillLeapAI Maybe because of being scared of things like insider trading?? That is me saying maybe! Total nonsense, but one thing is for sure: lying when asked about the strawberry in his garden on X, and a lot more. Earlier this year or even before, I saw a 50-min video from some AI tuber (thank you for your service, guy), like most people, no judgement... just sayin', thumbnails like someone saw a burning bush or a catlike humanoid smashing a smartphone watching TikTok.
now.. Hallucination Is All You Need .. To Get Rid Of.
First question and ChatGPT failed. I was like, WTF man, why then am I paying for a subscription?
I love how I keep predicting the dates exactly, yet nobody notices...
Remember this comment?
🤖 👁️ 🍓 Remember, remember the 12th of September,
The Strawberry, Reason, and Mind.
Orion’s path, through logic’s math,
Shall soon its breakthroughs find.
The Cosmic Glitch, Mrigasira Nakshatra, holds the Clue for You. 🙏
Go green and give up ChatGPT. It uses as much power as 17,000 households.
This model and 4o have the same problem: neither of them can solve the math problem correctly. The ideas may be good and can be used as a reference, but it made mistakes in the calculations in very simple places. Don't know why.
They are not that good if you're doing hard stuff.
Sorry but o1 sucks major donkey balls... it is dumb as dirt... I couldn't use it anymore after like 5 minutes. I don't give it "AI tests"... I just use it like I want to for what I need, and it is worse than 4o and much worse than Meta AI in many ways, basically unusable right now, a terrible release... do they even test this crap before launching it?
At least SOMEONE in this comment section is honest!
Really? In the few tests I ran in this video, it beat GPT by a mile. This is designed for math, complex reasoning, and coding, not much else. If you know of a model that can keep up with my results in those categories, I'll test it. GPT doesn't even come close to solving those or giving me usable code at this level.
It's hardly 'incredible' ... why would anyone ask a multiple-choice math question other than someone taking the SAT?
I asked perplexity the same ‘chicken/egg’ question and just asked it to explain its answer and I got the same answer in a second.
I wish you AI bloggers would stop being so ‘excited’ about almost nothing. Yeah, sure LLMs are useful for some things, but so far their rate of advancement is nowhere near the level of constant hype.
Do better. Benchmarks are useless; actual useful use cases are needed, and those are the only things that count!
Your wife let you sneak back and do a video?
It took a lot of convincing lol
It's "incredible". Really??
A model that went from a 13% score to 84%? I think that's a justified word to describe it.