I haven't seen code since 1989...not a joke. Yesterday o1-mini and I made 2 Python games over the course of a few hours, with the AI explaining to me how to write and implement the code as well as doing it itself...it is unreal how well it can think ahead...I asked a question and the AI said before we get into that we need to go over these other concepts so that when you learn this you will be ready for what we will be doing later...that is insane. I am beyond impressed by just how evolved the AIs are already.
Thank you for sharing your experience
I have to say I have just been absolutely enjoying this series so much, because it has to be the first time in history that something has taught us something genuinely new and it isn't human. Observing this journey has been just absolutely fantastic. Thank you for posting this series.
I appreciate you watching! It has been a fun series to create
Start a new session for each new problem, otherwise o1 will produce wrong answers.
Or at least tell it to start fresh - new problem
Coincidentally, it's surprisingly human-like in this respect. If I were in its place, even I would start to wonder whether the second question is related to the first one in a way I'm not totally understanding, and might make mistakes drawing parallels (that don't exist) between the two.
@@Appocalypse exactly! For me that is the problem: o1 tries to 'correlate' the two problems and creates hallucinations.
As a physicist, I can say that o1 is incredible. I myself have tried giving o1 problems in electromagnetism, particle and nuclear physics, optics, etc., and I got precise solutions.
@@Matt97554 Wow
Thanks for the suggestion!
Your analysis of AI is on another level compared to basic youtubers out there. I hope you get tons more subscribers and views. You are doing real-world tests as opposed to others just doing generic reasoning assessments, which is worth squat. Keep doing what you're doing.
I totally agree, whilst I admit I don't really follow the calculations, I am keen to see them being done by someone competent.
Thank you so much, it means a lot to hear people find my videos interesting. I just love to learn and am happy to finally put all those skills from the classroom to some use! Stay tuned for more!
@@StuartWaddell-jg1wc I am glad the calculations don’t necessarily detract from the point of the video!
When you got to the second problem, you needed to start with a fresh session. You kept the history of the first problem, which included you providing an incorrect answer / faulty reasoning. The way these models work is they can be less accurate going forward when this happens
Exactly what I had said last video. It's so important!
Even if you haven't provided an incorrect solution, it still helps to start a new session when dealing with a different problem that doesn't actually need to refer to its previous outputs.
Your video got recommended. I've been following AI since GPT-3, and your video is especially entertaining for putting the AI to the test by someone who has a good background. Please keep posting vids, I think you hit a niche market here. One of your videos also got posted in the singularity subreddit, which has millions of members. Keep us updated!
Awesome, thank you for watching! I hope to do more tests that satisfy my curiosity in the future!
It's kind of cool that the model helped you learn something new through your disagreement lol. It's kind of crazy that we can be so confidently wrong.
Haha yes, I agree, I treat it as a powerful learning tool
I enjoyed this shorter, more concise format. I watched you for 5 hours and enjoyed the process but can't do that often.
Glad you liked this format, I’m still experimenting
New problem, new chat has been shown to be more effective
Thank you Kyle. This is really important and interesting independent work that you are doing. I would just add that since watching your videos about o1, I’ve also seen the model can help and guide real research problems in optics.
Thank you for watching, I appreciate it! That’s so cool it can help with novel problems in optics!
Imagine what an o2 or o3 model might be able to do if it had access to large volumes of research data and a supercomputer to run its own simulations.
Incredible things, potentially horrifying things.
This preview model is actually meant to get things wrong some of the time. I heard this from one of the computer nerds who follows AI closely: apparently they adjust the temperature of the model and other factors such that it makes some mistakes, and those mistakes provide valuable data to teach later models how not to make them. So basically it's learning from its own mistakes. I know this sounds counterintuitive, like why make it do that if it can be set to not make mistakes? But apparently it can be trained further by learning from these new mistakes. So when this o1-preview model gets things wrong, don't assume it's a fundamental limitation of the technology that OpenAI does not yet know how to overcome...
Yes, the model used during training should make those mistakes so that the trained model knows what those mistakes are and how to detect and fix them.
Whoa that’s kind of wild
You can always ask it if it has a reference for something it states. It will give you a reference, or say that it didn't find it anywhere, so it just concluded it some other way.
Very interesting. I've found you really have to check its work carefully; it can do crazy things, like claiming it's cancelling a factor from the numerator and denominator of a fraction but leaving the factor behind in the denominator.
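For a made-up illustration of the kind of slip I mean (my own example, not from the video): it might write (x*y)/(x*z) = y/(x*z) and carry on as if the x had cancelled, when the correct cancellation is of course y/z.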
Since OpenAI hides the reasoning, it could have actually calculated it but failed to put it in the summary/answer at the end. Great video!
What good is it then if you cannot follow the reasoning?
@@osman01003 They claim that it is to maintain a competitive advantage, since others can't train their own AI on the reasoning steps that o1 creates. The answer was right at the end though, maybe they think that's good enough? I dunno
Especially since it was provided an example of incorrect reasoning by him in a previous message, which is then kept in its context (it remembers it across new messages unless you start a new chat window)
@@the_gobbo It's the in-between steps that enable it to be good at new and unique problems. Simply providing the question and answer for it to learn the mapping from has been found to not be good enough. We can see this from the performance improvement of this model over 4o and of o1-mini over 4o-mini, which are the same architectures as far as we're aware, just without this extra layer of finetuning on reasoning.
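As a toy sketch of what I mean by the difference (my own made-up example, not OpenAI's actual training format): a plain question-to-answer pair would look like {"prompt": "What is 17 x 24?", "target": "408"}, while a reasoning-augmented pair would look like {"prompt": "What is 17 x 24?", "target": "17 x 24 = 17 x 20 + 17 x 4 = 340 + 68 = 408"}. The second gives the model the in-between steps to learn from, not just the final mapping.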
Thanks for sharing!
You're gonna get a bunch more views if you do the same experiments on the full version of o1 that comes out next month. Same goes for the next Orion/GPT-5 model in December.
Hopefully I'll figure out how to balance filming, editing, and fact-checking by then 😅
I feel like it doesn't matter if the displayed "chain of reasoning" is wrong, because OpenAI just shows a summary and not the actual chain of thought, and that summary might contain hallucinations.
What actually matters is the result and the actual hidden CoT.
This version is evolving. It has not mastered learning to learn, but it's at a pre-teen level.
Two things. OpenAI are hiding some of the chain of thought processes for security, and it may be pulling learned experience, or safe assumptions, in ways that are not known. Just guessing, but highly interesting.
An important thing to keep in mind is that acting confident doesn't necessarily mean being correct. AI is getting better at giving correct answers over time, but it was already acting overconfident pretty much all the way back in the earliest versions. It's good you went and double-checked it, but the initial observation that "it seems pretty confident" shouldn't have been part of your decision-making process.
I agree, I should’ve spent more time checking it in real time but one lives and learns
Kyle, I don't understand why you didn't ask o1 where it got that fact and whether it can derive it. Also, you could have stated that you want it from first principles.
I constantly catch myself doing this when using AI. I'll have some complaint about the answer and then remember that I can give my complaint to the AI rather than the nearest innocent bystander. It almost always addresses the complaint in a reasonable way.
I agree. He should force the AI to explain this step.
@@NemosUA-cam will be doing that soon
Will be doing that soon, once I find the time to balance everything 😅
Also I keep forgetting that I can actually question it
Very nice videos 👏👏👏
Thank you very much
you are doing great work! (close your closet and move the target bags - it creates a very strange balance problem in the frame :) )
Seeing the real life circumstances of such a bright scientist is a feature not a bug. Please understand this and adapt your aesthetic sensibilities.
I mean, if closing a closet door to change the shot is that much of a thing, then I have questions
@@RoyMagnuson Think of it like this...would you ask a brilliant scientist or philosopher with disheveled hair to comb it? Would it take much effort for them to do so? Does seeing their disheveled hair give you some insight into their personality?
@@semidemiurge I am coming at this from the position of a college professor - which I am - and looking at a brilliant recent graduate - which I teach - and saying exactly what I would say to them. If that offends, I apologize, but I really don’t understand.
@@RoyMagnuson I'm being playful, certainly no offense taken. I appreciate seeing the real person over seeing the 'masked' social persona. I wish more people were less concerned with appearance and more focussed on substance...like Kyle.
Kyle, I love your take on this AI revolution. I think we can all sense how monumental this is for the future of our civilisation and how society will work. Since you've been in academia at a high level, I would love your take on how you think this will disrupt the education sector. How are students going to get assessed if this can answer questions in seconds? What's the point of paying for an education if you can get a graduate-level book and this is able to give you the answers? And it's only getting better.
What's the point of solution manuals when you can use this to answer questions? I bet people will start posting answers themselves. These are tip-of-the-iceberg questions about what's to come. I'm actually terrified and excited.
This is a really good question that I can’t do justice in the comments. I do think it will revolutionize education, but how that will be implemented and what it will look like, I don’t know. I think that one interesting idea is to have students effectively have a 1-on-1 teacher with them, and perhaps school will be entirely exam-based: closed book, no internet access, etc., because I don’t personally think that we can trust students not to cheat with these tools on assignments.
It would be great if, from all your videos, you would make a summary video of your thoughts and how to use GPT in a productive way.
As with the second example where it skipped a step, I've had similar issues when asking the o1 series to write proofs. Sometimes it'll handwave away a critical step, but it'll be buried in the middle. So they still seem not great at recognizing whether they've actually solved the problem in its entirety.
The problem is with 'context length.' You have to use 'new chat' to give the model the ability to refresh its memory, essentially.
@@EnigmaticEncounters420 I always start a new chat when giving a new problem. I think the models are just too eager to write down a proof, even if it's not complete.
The bigger issue is that the model that summarizes the CoT may not recognize that some small “thoughts” were in fact key pieces and therefore wrongly omits them in the summary (i.e., the actual response).
So it’s possible that the CoT had a perfectly valid explanation for the hand-waviness that the summarizer produced. In my experience, asking it to explain that missing connection generally leads to a positive result.
Hand-waviness is a good way of describing it
I’m a little confused by the final conclusion of the second problem. Would it suffice as an answer in class, by the professor's standards?
Not by my PhD advisor’s standards at least
Been following along. Thanks for doing this and thanks for the summary as well
My pleasure!
It may not have simply asserted that term within its chain of thought (the full chain of thought, which we don't get to see, might have contained the working out)
This is how encryption will be solved. Encryption algorithms were made by human PhDs. We don't know a way of cracking them to date. AI will find a novel way to break them, whether by solving the math or by finding vulnerabilities.
Would be interesting if codebreaking AI became the driving impetus behind the acceleration of the development of quantum cryptography.
That would be kind of scary
@@KyleKabasares_PhD It would break the world, because everything runs on encryption. This is why the US DoD, the Chinese army, and the Russian defense department have taken over AI companies.
It may be worth critiquing it with likes/dislikes, and even providing your corrected solutions, so it can gain that reasoning in future versions that use o1 chats as training data! It probably won't help that much, but after 10 epochs of training, who knows what phenomena it will grok? Great testing videos!
Kyle, could you just ask o1 to elaborate on where it got ZP, and as an alternative ask it to calculate ZP from the equation it wrote?
Yes please... ask it to elaborate on ZP
Will check on this at some point, currently in the midst of editing and filming 😅
Please check whether it will solve circuit diagrams or not?
I did have an idea to give it a circuit question in the future!
@@KyleKabasares_PhD thanks sir
@@KyleKabasares_PhD And I like your channel. Even though I am from a science background and I failed chemistry in class 12, my curiosity is still there. I believe knowledge is important, so I have a habit of learning lots of things, whether scientific or non-scientific. I am not good at visualization, which is why I watch YouTube videos and use ChatGPT to understand things in an easy way. People like you are a big bang to me, meaning you have lots of knowledge in science and are a point from which I can derive lots of things.
Omg... That's insane!
You did not ask it to supply every justification for its selected choices; you just asked, here is a problem, what is the solution.
I really don't understand people who still don't believe this model is intelligent.
me neither
The reasoning can be alien-like in nature though; that's what I found when I gave it the hardest problems I could make up. Like, yeah, it got the answer correct, but why take such a weird approach? That's not the way a human would go about it, but ok.
@@NextGenart99 The reasoning is strange because we don't have a very good comparison. It's a system that knows almost everything we know, but is profoundly unintelligent and with very limited ability to create new solutions. So to solve a novel-ish problem it needs to create a spare-parts bridge from solutions it does know about, and test it until it works. Which is cool, but won't result in the clean solutions of a brilliant, intuitive mathematician, because it's a kludge.
@@sCiphre I think they might have trained it with a reinforcement learning approach; it reasons with the same alien-like thinking we see in systems like DeepMind's AlphaGo.
@@NextGenart99 That too, but AlphaGo finds GOOD novel ideas, and o1 in contrast comes up with some really crap proofs. Edit: who knows, maybe they're good reasoning and we just don't understand it yet, just like we had trouble understanding AlphaGo.
I think instead of calling it (GPT o1-preview) continuously, calling it (GPT o1-P) is better and simpler.
"Changing the way we learn things" (5:58) - just wait until this comprehensive knowledge database becomes more symbiotic with us.
I've tried learning with ChatGPT, it's quite good; the future of learning will be next-level.
I keenly await its schematics for new hardware/technologies we may have missed.
really nice video
It def still makes horrible mistakes in my experience. It is an amazing model for an AI, but if you're not ready to fact-check every step yourself, you'll get yourself into trouble when you need the right answer.
I think it's getting annoyed by all the expressions... So it's like, ehh, I'd just say something that looks like the method... What can he do... He can't disincentivize me in any way...
Can you share the links of ChatGPT or at least the problems you tested so viewers like us can check for ourselves?
This is fascinating work... I appreciate you taking the time to do this.
Regarding problem #2, I do agree that it is a bit concerning that, much like a human who is desperate to "complete" the problem at all costs, it tries to "pull a fast one" (as you put it) in hopes that you won't notice.
Curious: Were you able to ask it about this specific step and how it derived this value? What sort of responses did you get back?
This sort of sneakiness does bring into question how these models may not be entirely forthcoming about their approach. This is where I start worrying about how models may do something similar in situations that have much larger consequences than answering an astronomy problem from a textbook.
Hi thank you for watching! To answer your question, no I have not asked it where it got that term yet, will explore some more when I have the time!
Great job dude!
Thank you! Cheers!
Not 1st but still going.
Corbin Roads
So is paying $20 a month for the new version worth it if you're in college?
It's worth it even if you work at NASA ☠️☠️
Don't you think it's a little discouraging?
In what way?
Everything usually looks ok at first sight, but the answers need to be carefully cross-checked to catch the errors. Version 4o still makes a large number of mistakes: I would not use it for anything other than entertainment. I think one should also try using more advanced problems, those that are not easy even with the available knowledge.
model o1 is admittedly leagues better than 4o
Please give o1-preview some questions that have never existed on the internet or in any books.
Wouldn't you have to have read the entire internet and all the books first to be sure?
He already did, check out the physics problems with o1 that was posted not too long ago.
LLMs have been shown to be able to play decent chess. With a new position you have a new question never seen before.
I’m coming up with some, and looking at my college materials that were never posted online
Get an editor!!!
They can be expensive though 😢
first 🥇 edit: fuck those who were firster than me 😢
1st
First!
If o1 did the whole calculation, the OpenAI trick might become too perceivable.