Fact-Checking OpenAI o1-preview on Graduate-Level Astronomy Problems

Kyle Kabasares

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 16 лис 2024

КОМЕНТАРІ • 147

@averylawton5802 Місяць тому ⁺⁹
I haven't seen code since 1989...not a joke. Yesterday o.1 mini and I made 2 python games over the course of a few hour with the AI explaining to me how to write and implement the code as well as doing it itself....it is unreal how well it can think ahead....I asked a question and the AI said before we get into that we need to go over these other concepts so that when you learn this you will be ready for what we will be doing later...that is insane. I am beyond impressed just how evolved the AIs are already.
@KyleKabasares_PhD Місяць тому ⁺¹
Thank you for sharing your experience
@SirajFlorida Місяць тому ⁺⁴
I have to say have just been absolutely enjoying your cast with this so much because it has to be the first time in the history that something has taught us a novel edification and it isn't human. Observing this journey has been just absolutely fantastic. Thank you for posting this series.
@KyleKabasares_PhD Місяць тому
I appreciate you watching! It has been a fun series to create
@Matt97554 Місяць тому ⁺⁵⁸
Start a new session for each new problem otherwise o1 will produces wrong answers.
@NicholsonNeisler-fz3gi Місяць тому ⁺⁹
Or at least tell it to start fresh - new problem
@Appocalypse Місяць тому ⁺⁸
Coincidental surprisingly human-like on this aspect. Even I would start to wonder if the second question is related to the first one in a way that I’m not totally understanding if I were in its place, and might make mistakes drawing parallels (that don’t exist) between the two.
@Matt97554 Місяць тому ⁺¹⁰
@@Appocalypse exactly! For me that is the problem, O1 tries to 'correlate' the two problems and creates hallucinations.
As a physicist, I can say that O1 is incredible. I myself have tried to give O1 problems in electromagnetism, particle & nuclear physics, optics etc etc and I got precise solutions.
@jimpresser3438 Місяць тому
@@Matt97554Wow
@KyleKabasares_PhD Місяць тому ⁺⁴
Thanks for the suggestion!
@lifes_magic_moments Місяць тому ⁺⁹
Your analysis of AI is on another level compared to basic youtubers out there. I hope you get tons more subscribers and view. You are doing real world test as apposed to others just doing generic reasoning assessments which is worth squat. Keep doing what you doing.
@StuartWaddell-jg1wc Місяць тому ⁺¹
I totally agree, whilst I admit I don't really follow the calculations, I am keen to see them being done by someone competent.
@KyleKabasares_PhD Місяць тому ⁺¹
Thank you so much, it means a lot to hear people find my videos interesting. I just love to learn and am happy to finally put all those skills from the classroom to some use! Stay tuned for more!
@KyleKabasares_PhD Місяць тому
@@StuartWaddell-jg1wcI am glad the calculations don’t necessarily detract from the point of the video!
@Bird1502 Місяць тому ⁺¹⁸
When you got to the second problem, you needed to start with a fresh session. You kept the history of the first problem, which included you providing an incorrect answer / faulty reasoning. The way these models work is they can be less accurate going forward when this happens
@lio1234234 Місяць тому ⁺⁴
Exactly what I had said last video. It's so important!
Even if having not provided an incorrect solution it still helps to start from a new session when dealing with a different problem and not actually needing to refer to its previous outputs.
@rptlee Місяць тому ⁺⁵
Your video got recommended, I've been following AI since GPT 3, and your video is especially entertaining to put the AI to test by someone who has a good background. Please keep posting vids, I think you hit a niche market here. One of your video also got posted in the singularity subreddit that has millions of members. Keep us updated!
@KyleKabasares_PhD Місяць тому ⁺¹
Awesome, thank you for watching! I hope to do more tests that satisfy my curiosity in the future!
@EnigmaticEncounters420 Місяць тому ⁺⁸
It's kind of cool that the model helped you learn something new through your disagreement lol. It's kind of crazy that we can be so confidently wrong.
@KyleKabasares_PhD Місяць тому
Haha yes, I agree, I treat it as a powerful learning tool
@trevthea5781 Місяць тому ⁺⁵
I enjoyed this shorter, more concise format. I watched you for 5 hours and enjoyed the process but can't do that often.
@KyleKabasares_PhD Місяць тому ⁺¹
Glad you liked this format, I’m still experimenting
@Pixel165 Місяць тому ⁺⁷
New problem new chat is proven to be more effective
@NemosYouTube Місяць тому ⁺²
Thank you Kyle. This is really important and interesting independent work that you are doing. I would just add that since watching your videos about o1, I’ve also seen the model can help and guide real research problems in optics.
@KyleKabasares_PhD Місяць тому
Thank you for watching, I appreciate it! That’s so cool it can help with novel problems in optics!
@netscrooge Місяць тому ⁺⁶
Imagine what an o2 or o3 model might be able to do if it had access to large volumes of research data and a supercomputer to run its own simulations.
@KyleKabasares_PhD Місяць тому ⁺³
Incredible things, potentially horrifying things.
@hypersonicmonkeybrains3418 Місяць тому ⁺⁸
This preview model is actually meant to get things wrong some of the time, i heard this from one of the computer nerds who follows AI closely, apparently they adjust the temperature of the model and other factors such that it makes some mistakes and thus provides valuable data from these mistakes in order to teach later models how not to make mistakes.. So basically learning from its own mistakes.I know this sounds counter intuitive like why make it do that if it can be set to not make mistakes? but apparently it can be trained further by learning from these new mistakes. So when this o1 preview model gets things wrong, don't assume it's a fundamental limitation with the technology that OpenAI do not know yet how to overcome...
@senetcord6643 Місяць тому
Yes, the model that trains should make those mistakes so that the trained model knows what mistakes are, how to detect and solve them.
@KyleKabasares_PhD Місяць тому
Whoa that’s kind of wild
@ingmarkronfeldt6174 Місяць тому ⁺²
You can always ask it if it has a reference for something it states. It will give you a reference, or say that it didn't find it anywhere, so it just concluded it some other way.
@JohnSmall314 Місяць тому ⁺¹
Very interesting. I've found you really have to check its work carefully, it can do crazy things, like claiming it's cancelling a factor from the numerator and denominator of a fraction, but leaving behind the factor in the denominator.
@the_gobbo Місяць тому ⁺⁹
since openAi hides the reasoning behind, it could have actually calculated it, but failed to put it on the summary/answer at the end, great video!
@osman01003 Місяць тому
What good is it then if you cannot follow the reasoning?
@the_gobbo Місяць тому ⁺⁴
@@osman01003 They claim that it is to maintain a competitive advantage, since others cant train their own AI on the reasoning steps that o1 creates. The answer was right at the end though, maybe they think thats good enough? I dunno
@lio1234234 Місяць тому ⁺¹
Especially since it was provided an example of incorrect reasoning by him in the past message he had sent which is then kept in its context (it remembers it with new messages without a new chat window)
@lio1234234 Місяць тому
@@the_gobbo It's the in-between steps that enable it to be good at good and unique projects. Simply providing the question and answer for it to learn the mapping from has been found to not be good enough. We can see this from the performance improvement in this model and in o1-mini over 4o or 4o-mini respectively, which are the same architectures as far as we're aware, just without this extra layer of finetuning on reasoning.
@KyleKabasares_PhD Місяць тому
Thanks for sharing!
@雪鷹魚英語培訓的領航 Місяць тому ⁺⁶
You're gonna get a bunch more views if you do the same experiments on the full version of 01 that comes out next month. Same goes for the next Orion/GPT-5 model in December.
@KyleKabasares_PhD Місяць тому
Hopefully will figure out how to balance filming, editing, and fact checking by then 😅
@calliped-co5mj Місяць тому ⁺⁵
i feel like it doesn't matter if the "chain of reasoning" is wrong because OpenAI just shows a summary and not the actual chain of thought and that might contain hallucinations.
what actually matters is the result and the actual hidden COT
@user-eg2oe7pv2i Місяць тому
This version is evolving . It has not mastered learn to learn but its at pre teen level
@mradford10 Місяць тому ⁺¹
Two things. OpenAI are hiding some of the chain of thought processes for security, and it may be pulling learned experience, or safe assumptions, in ways that are not known. Just guessing, but highly interesting.
@tiagotiagot Місяць тому ⁺⁴
An important thing to keep in mind is acting confident doesn't necessarily mean being correct. AI is getting better at giving correct answers over time, but it was already acting overconfident pretty much all the way back to the earliest versions. It's good you went and double-checked it; but the initial observation that "it seems pretty confident" should've not been part of your decision-making process.
@KyleKabasares_PhD Місяць тому ⁺⁵
I agree, I should’ve spent more time checking it in real time but one lives and learns
@percy9228 Місяць тому ⁺⁶
Kyle I don't understand, why didn't you ask o1 where it got that fact, and if it can derive it. Also you could have stated you want it from first principles
@duaneeitzen1025 Місяць тому ⁺¹
I constantly catch myself doing this when using ai. I'll have some complaint with the answer and then remember that you can give your complaint to the AI rather than the nearest innocent bystander. It almost always addresses the complaint in a reasonable way.
@NemosYouTube Місяць тому
I agree. He should force the AI to explain this step.
@KyleKabasares_PhD Місяць тому
@@NemosUA-camwill be doing that soon
@KyleKabasares_PhD Місяць тому
Will be doing that soon when I find the time to balance everything at the moment 😅
@KyleKabasares_PhD Місяць тому
Also I keep forgetting that I can actually question it
@gnsdgabriel Місяць тому ⁺⁴
Very nice videos 👏👏👏
@KyleKabasares_PhD Місяць тому ⁺¹
Thank you very much
@RoyMagnuson Місяць тому ⁺¹
you are doing great work! (close your closet and move the target bags - it creates a very strange balance problem in the frame :) )
@semidemiurge Місяць тому ⁺¹
Seeing the real life circumstances of such a bright scientist is a feature not a bug. Please understand this and adapt your aesthetic sensibilities.
@RoyMagnuson Місяць тому
I mean, if closing a closet door to change the shot is that much of thing then I have questions
@semidemiurge Місяць тому
@@RoyMagnuson Think of it like this...would you ask a brilliant scientist or philosopher with disheveled hair to comb it? Would it take much effort for them to do so? Does seeing their disheveled hair give you some insight into their personality?
@RoyMagnuson Місяць тому
@@semidemiurge I am coming at this from the position of a college professor - which I am - and looking at a brilliant recent graduate - which I teach - and saying exactly what I would say to them. If that offends, I apologize, but I really don’t understand.
@semidemiurge Місяць тому ⁺¹
@@RoyMagnuson I'm being playful, certainly no offense taken. I appreciate seeing the real person over seeing the 'masked' social persona. I wish more people were less concerned with appearance and more focussed on substance...like Kyle.
@percy9228 Місяць тому
Kyle I love your take on this AI revolution. I think we can all sense how monument this is for the future of our civilisation and how society will work, Since you've been in academia to a high level, I would love your take on how you think this will disrupt the education sector. How are students going to get assessed if this can answer questions in seconds, what's the point of paying for an education if you can get a graduate level book and this is able to give you answers, and it's only getting better.
what's the point of solution manuals when you can use this to answer questions, I bet people will start posting answers themselves. These are tip of the iceberg questions of what's to come. I'm actually terrified and excited.
@KyleKabasares_PhD Місяць тому
This is a really good question that I can’t do justice in the comments. I do think it will revolutionize education, but how that will be implemented and look like, I don’t know. I think that one interesting ideas is to have students effectively having a 1 on 1 teacher with them and perhaps school will be entirely exam based, closed book, no internet access etc, because I don’t personally think that we can trust students to not cheat with these tools on assignments.
@mohammadtorikh Місяць тому
it would be great if from all your video, you would make a summary video of your thought and how to use gpt in productive way
@blueblimp Місяць тому ⁺²
As with the second example where it skipped a step, I've had similar issues when asking the o1 series to write proofs. Sometimes it'll handwave away a critical step, but it'll be buried in the middle. So they seem still not great at recognizing whether they've actually solved the problem in its entirety.
@EnigmaticEncounters420 Місяць тому
The problem is with 'context length.' You have to use 'new chat' to give the model the ability to refresh it memory essentially.
@blueblimp Місяць тому
@@EnigmaticEncounters420 I always start a new chat when giving a new problem. I think the models are just too eager to write down a proof, even if it's not complete.
@Appocalypse Місяць тому ⁺²
The bigger issue is that the model that summarizes the CoT may not recognize that some small “thoughts” were in fact key pieces and therefore wrongly omits them in the summary (ie actual response).
So it’s possible that the CoT had a perfectly valid explanation for the hand-waviness that the summarizer produced. In my experience, asking it to explain that missing connection generally leads to a positive result.
@KyleKabasares_PhD Місяць тому ⁺¹
Hand-waviness is a good way of describing it
@elsah3339 Місяць тому ⁺²
I’m a little confused by the final conclusion of the second problem. Would it suffice as an answer in class by the professor?
@KyleKabasares_PhD Місяць тому
Not by my PhD advisor’s standards at least
@vincebracken3872 Місяць тому
Been following along. Thanks for doing this and thanks for the summary as well
@KyleKabasares_PhD Місяць тому
My pleasure!
@haileycollet4147 Місяць тому
It may not have simply asserted that term within its chain of thought (the full chain of thought which we don't get to see might have contained the working out)
@tzardelasuerte Місяць тому ⁺¹
This is how encryption will be solved. Encryption algorithms were made by humam PhDs. We don't know a way of cracking them to date. AI will find a novel solution to breaking them whether it's solving it or finding vulnerabilities.
@ThreeChe Місяць тому
Would be interesting if codebreaking AI became the driving impetus behind the acceleration of the development of quantum cryptography.
@KyleKabasares_PhD Місяць тому
That would be kind of scary
@tzardelasuerte Місяць тому
@@KyleKabasares_PhD it would break the world. Because everything runs on encryption. This is why the US dod the Chinese army and the Russian defense department have taken over AI companies.
@OpenSourceAnarchist Місяць тому
It may be worth it to critique it with likes/dislikes and even providing your corrected solutions so it can gain that reasoning in future versions that use o1 chats as training data! Probably won't help that much, but after 10 epochs of training who knows what phenomena it will grok? Great testing videos!
@Maxi-xw1jb Місяць тому ⁺²
Kyle, but could you just ask o1 to elaborate from where it got ZP and ask as alternative calculate ZP from the equation it wrote ??
@trevthea5781 Місяць тому ⁺¹
Yes please... ask it to elaborate on ZP
@KyleKabasares_PhD Місяць тому ⁺²
Will check on this at some point, currently in the midst of editing and filming 😅
@integrateeverything Місяць тому ⁺²
Please check weather it will do circuit diagram solve or not ?
@KyleKabasares_PhD Місяць тому ⁺¹
I did have an idea to give it a circuit question in the future!
@integrateeverything Місяць тому
@@KyleKabasares_PhD thanks sir
@integrateeverything Місяць тому
@@KyleKabasares_PhD and I like your channel even I am from science background and I failed In class 12 in chemistry but my curiosity still there, I believe knowledge is important so I have habit learning lots of things wether scientific or non scientific I am not good at visualization that why I watch youtube video and chatgpt to understand things in easy way and people like you are big bang to me means you have lots of knowledge in science and a point where I can derive lots of things
@nickbobrowski Місяць тому ⁺¹
Omg... That's insane!
@user-eg2oe7pv2i Місяць тому ⁺¹
you did not ask to supply every justification for o1 selected choice , you just asked here is a problem , what is the solution
@duduzilezulu5494 Місяць тому ⁺²
I really dont understand people who still don't believe this model is intelligent.
@kennyphan9612 Місяць тому
me neither
@NextGenart99 Місяць тому
The reasoning can be Alien-like in nature tho that's what I found when I gave it the hardest problems I could make up. Like yea got the answer correct but why take such a weird approach, like that's not the way a human would go about it but ok.
@sCiphre Місяць тому
@@NextGenart99the reasoning is strange because we don't have a very good comparison. It's a system that knows almost everything we know, but is profoundly unintelligent and with very limited ability to create new solutions. So to solve a novel-ish problem it needs to create a spare parts bridge from solutions it does know about, and test it until it works. Which is cool, but won't result in the clean solutions of a brilliant, intuitive mathematician, because it's a kludge.
@NextGenart99 Місяць тому
@@sCiphre I think they might have trained it with a reinforcement learning approach, it reasons with the same alien-like thinking we see in systems like deep mind Alphago.
@sCiphre Місяць тому
@@NextGenart99 that too, but alphago finds GOOD novel ideas, and o1 in contrast comes up with some really crap proofs. Edit: who knows, maybe they're good reasoning and we just don't understand it yet, just like we had trouble understanding alphago.
@AlfarrisiMuammar Місяць тому ⁺³
I think Instead of calling it( gpt o1 preview ) Continously .
Calling it (gpt o1 P) is better & simpler
@computergoboom-dg9co Місяць тому ⁺²
"Changing the way we learn things" (5:58) - just wait until this comprehensive knowledge database becomes more symbiotic with us.
I've tried learning with chatgpt, it's quite good, the future of learning will be next-level.
I keenly await it's schematics for new hardwares/technologies we may have missed.
@johanavril1691 Місяць тому ⁺¹
really nice video
@drhxa Місяць тому ⁺²
It def still makes horrible mistakes in my experience. It is an amazing model for an AI but if you're not ready to fact check every step yourself you'll get yourself into trouble if you need to have the right answer
@parthasarathyvenkatadri Місяць тому ⁺¹
I think its getting annoyed by all the expressions ... So its like ehh id just say something that looks like the method .. what can he do ... He cant disincentivize me in any way ...
@HelloCorbra Місяць тому ⁺³
Can you share the links of ChatGPT or at least the problems you tested so viewers like us can check for ourselves?
@RoryM-m1x Місяць тому
This is fascinating work... I appreciate you taking the time to do this.
Regarding problem #2, I do agree that it is a bit concerning that, much like a human who is desperate to "complete" the problem at all costs, it ties to "pull a fast one" (as you put it) in hopes that you won't notice.
Curious: Were you able to ask it about this specific step and how it derived this value? What sort of responses did you get back?
This sort of sneakiness does bring in to question how these models may not be entirely forthcoming about their approach. This is where I start worrying about how models may do something similar in situations that have much larger consequences than answering an astronomy problem from a text book.
@KyleKabasares_PhD Місяць тому
Hi thank you for watching! To answer your question, no I have not asked it where it got that term yet, will explore some more when I have the time!
@mogiba4180 Місяць тому
Great job dude!
@KyleKabasares_PhD Місяць тому
Thank you! Cheers!
@HybridHumaan Місяць тому
Not 1st but still going.
@LisaDominguez-y5e Місяць тому
Corbin Roads
@bskarpa Місяць тому ⁺²
So is paying $20 a month for the new version worth it if your in college?
@dariusdragomir9414 Місяць тому ⁺¹⁵
Its worth it even if you work at nasa☠️☠️
@Sergio_From_Spain Місяць тому
Don't you think it's a little discouraging?
@KyleKabasares_PhD Місяць тому
In what way?
@ivnisandzic Місяць тому ⁺¹
Everything usually looks ok at the first sight, but the answers need to be carefully crosschecked to see the errors - version 4o still makes large number of mistakes: I would not use it for anything rather than entertaiment. I think one should also try using more advanced problems, those that are not easy even with the available knowledge.
@mr.gullible2506 Місяць тому ⁺³
model o1 is admittedly leagues better than 4o
@nani3209 Місяць тому ⁺²
Please give o1 preview some questions which never existed on internet or in any books
@MrC0MPUT3R Місяць тому ⁺⁴
Wouldn't you have to have read the entire internet and all the books first to be sure?
@elsah3339 Місяць тому ⁺³
He already did, check out the physics problems with o1 that was posted not too long ago.
@Linshark Місяць тому ⁺¹
LLM's have been shown to be able to play decent chess. With a new position you have a new question never seen before.
@KyleKabasares_PhD Місяць тому
I’m coming up with some, and looking at my college materials that were never posted online
@NicholsonNeisler-fz3gi Місяць тому ⁺⁴
Get an editor!!!
@KyleKabasares_PhD Місяць тому ⁺²
They can be expensive though 😢
@elon-69-musk Місяць тому ⁺²
first 🥇 edit: fuck those who were firster than me 😢
@phen-themoogle7651 Місяць тому
1st
@motomanta Місяць тому
FIrst!
@user-eg2oe7pv2i Місяць тому
if o1 did the whole calculation . Open ai trick might become too perceivable