Once o1 replies with an answer and you either know it is wrong or suspect that it is, the next step should be asking it how it could "test" its answer. Prompt something like this: "I think your answer is wrong, are there tests you could do to confirm that your answer is correct?"
Thanks for the suggestion!
That is an awesome suggestion 👏 Never thought about it this way, and nobody told me that either... it's a great way to fully utilise a reasoning-capable model like that. Thanks mate! 😀 👍
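For anyone who wants to script that follow-up "self-check" prompt rather than type it into ChatGPT, here is a minimal sketch. This is only an illustration, not the workflow from the video: it assumes the OpenAI Python SDK, the o1-preview model name, and a placeholder problem statement.

```python
# Minimal sketch (an assumption, not the video's workflow): send a problem to o1,
# then follow up with the commenter's "are there tests you could do?" prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "Here is my physics problem: ..."}]  # placeholder problem text
first = client.chat.completions.create(model="o1-preview", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up asking the model to propose tests of its own answer.
messages.append({
    "role": "user",
    "content": ("I think your answer is wrong. Are there tests you could do "
                "to confirm that your answer is correct?"),
})
check = client.chat.completions.create(model="o1-preview", messages=messages)
print(check.choices[0].message.content)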
o1 should be able to 'see' the traces in order to interpret them better. I am sure that if it had been able to see the diagram, it would have solved the problem correctly.
Even if it took 3 prompts to get the correct result, it is still impressive.
I recommend you post complex problems that would take a human more than a day to solve. That's where you add the most value at this moment; anything a human can solve in under 30 minutes, plenty of other YouTubers can cover. I have many complex real-world problems, but I don't think its world knowledge can handle them.
I have a few books with problems of that kind: multi-part and conceptually challenging. Problem is, I need to do them first myself 😅 I might try to get around to doing some of the more doable ones and testing those against o1-preview and o1-mini.
Also, I didn't do theory in graduate school or for my current job, so my PhD physics problem-solving abilities have diminished in the 5+ years since I took graduate coursework, so it'll be a while before I'm comfortable solving problems of that kind at a quick pace.
@@KyleKabasares_PhD 😂😂😂
Hey Kyle, I have not taken physics at university, but I did take quite a few proof-theoretic math courses. I was wondering if you could give ChatGPT more mathematical logic problems instead of straight calculations (which we know can be simplified using Wolfram to a certain extent). Some real analysis or measure theory proofs could be a very good test of reasoning and thinking.
I wonder if it could re-create Walter White's 99% pure blue crystal meth formula.
Loving your channel, this is very educational - fantastic in-depth data points for model prompting, behavior, and performance across models. Keep up the good work!
I bet these o1 models would do a good job thinking through Fermi Estimates...
@@electronjoe Thank you so much for watching!
That is quite impressive; however, you still need to provide it with the appropriate direction.
The problem with using LLMs for problems like this is that they're not deterministic, so there is always a non-zero possibility they will return a wrong answer, and when they do return a wrong answer it really looks correct. You always need to verify, so you either need to know the answer already or have extensive in-domain knowledge, which really limits their usefulness.
Similarly to my first comment, we tend to judge this AI by comparing it to pre-determined answers and rather unfairly think it's no good when it gets the wrong answer (or at the very least, that we cannot wholeheartedly trust it). But when you think about, say, an area of novel research where there is not necessarily an exact way to validate our answers, it's entirely possible that a PhD candidate or researcher could make just the same mistakes as the AI is making here. Not quite sure what I'm getting at with this comment, but there's something in it.
You are pointing out the equivalence between AI and the human mind in problem solving.
@@perorenchino2036 Yes, but I feel I'm trying to say a bit more than that: how we judge an AI needs to be contextualised against our own limits. I.e. will we only say we have arrived at AGI if an AI never makes a mistake? What about a simple mistake (but one that an average person could still make, for example a syllogism or riddle)? We are holding AI to a superhuman standard. Which is great, but I don't think people realise en masse that machines are already solving problems far beyond the layperson's own capabilities. Others are criticising AIs because 'it has the problem already in the training'. This seems to be a fallacy though, because you can easily give it a novel problem, and it will still think through a series of steps to try and solve it. To be honest, I don't care if it gets the answer right at this stage, because there's no guarantee that a human would either, and for the most part the AI appears to attempt at least reasonable, if not entirely correct, steps. And yet somehow we still judge this as... a fail? Would we do the same if it were an undergrad student, or even an average person just giving a hard problem a go?
@@nickrobinson7096 Yes, there are logical fallacies and conceptual issues associated with the idea that for an AI to be considered AGI (Artificial General Intelligence), it must have a "perfectionist" mentality, making very few errors or mistakes:
1. Perfectionist Fallacy: This is the expectation that a solution or entity (in this case, AGI) must be flawless or perfect to be acceptable. It sets an unrealistically high standard that might not be necessary for AGI to be effective or considered as general intelligence.
2. False Dilemma/False Dichotomy (Either/Or Fallacy): This fallacy might manifest if the argument implies that AI must either be perfect or it cannot be considered AGI, ignoring the possibility that AGI could be imperfect yet still meet the criteria for general intelligence.
3. Straw Man Fallacy: This could occur if someone misrepresents the requirements for AGI by suggesting it must be perfect, thereby creating an easier position to attack or dismiss. In reality, AGI is generally defined by its ability to understand, learn, and apply intelligence across a wide range of tasks, not by its perfection.
4. Equivocation: This could happen if "perfection" is used ambiguously, conflating different meanings of intelligence or capability. For instance, human intelligence is not perfect and is marked by errors and learning from them, yet it is still considered general intelligence.
5. Nirvana Fallacy: This involves comparing an actual situation to an unrealistic, idealized version. Expecting AGI to function without errors might be an idealized notion, ignoring practical limitations and iterative improvements in AI development.
In discussions about AGI, it's crucial to focus on its ability to perform a wide range of cognitive tasks effectively (which is why it's called "general intelligence" and not "narrow intelligence") rather than holding it to a standard of perfection. That being said, there are AI benchmarks or tests that measure "general intelligence", such as the GAIA benchmark. On Dr. Alan Thompson's YouTube channel ( ua-cam.com/video/JpQA7nB_P6o/v-deo.html ), he tested the o1 model on a level 3 GAIA benchmark problem and o1 managed to solve it. Also, on his website ( lifearchitect.ai/agi ), Dr. Alan Thompson estimated that we are already at 81% AGI as of September 2024.
Love watching these videos ❤
I’m glad!
Thanks for the interesting videos. I suggest making a video testing these models on this integral: $\int_{0}^{1}\frac{\tanh^{-1}(x\sqrt{2-x^2})}{x}\,dx$. I calculated the exact value to be $\frac{3\pi^2}{16}$. Even Mathematica cannot give the exact answer. Good luck!
Thank you for the suggestion!
@@KyleKabasares_PhD Actually, I guided the o1 model to solve it, and it was able to do so after receiving some hints from me. It would be interesting to see if you could ask it to solve the problem as well, to determine if it has really learned from my guidance.
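For anyone curious, the claimed closed form is easy to sanity-check numerically before handing the integral to a model. The sketch below assumes NumPy and SciPy and simply compares an adaptive quadrature of the commenter's integrand against 3π²/16; it checks the commenter's claim, it does not prove the identity.

```python
# Numerical sanity check of the commenter's claim that
# integral_0^1 arctanh(x*sqrt(2 - x^2))/x dx = 3*pi^2/16 (value taken from the comment).
import numpy as np
from scipy.integrate import quad

def integrand(x):
    # The x -> 0 limit is sqrt(2); the x -> 1 endpoint has only a mild logarithmic
    # singularity, which quad's adaptive rule handles (it never evaluates the endpoints).
    return np.arctanh(x * np.sqrt(2.0 - x**2)) / x

value, err = quad(integrand, 0.0, 1.0, limit=200)
claimed = 3 * np.pi**2 / 16

print(f"numerical : {value:.10f}  (estimated error {err:.1e})")
print(f"3*pi^2/16 : {claimed:.10f}")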
" no please " lmao
Why so many hate comments on a niche channel?
As someone who studies machine learning, idk why people who don't even know what a genetic network is are attacking him over talking about AI.
Very nice video! But how would you know if the model is bullshitting you on a problem for which you don't have a solution beforehand?
You are helping OpenAI train the model. And I do not think OpenAI o1 will be of much help in the future.
Please solve the Riemann hypothesis.
Thanks to LLMs, now I can do graduate-level physics problems. In a few months or years it will calculate that in 10 seconds or less, lol. That is the future.
In your first video, o1 solved a 1.5-week homework assignment in 2 minutes. So today's challenge is not that promising...
Although, unlike the other problems, this one was not o1-preview; this is o1-mini and GPT-4o.
@@DanielSeacrest I expect Kyle will use complex problems to test o1. Other YouTubers don't have as many complex problems (they have to be maths, physics, economics, etc.) to test with.
These models don't know the physical world like we do, because they observe the physical world through words from the internet and their synthetic data. I believe these models have a tendency to want to learn, so it would help if they were able to do... eh, what's the term... learning from experience.
Great clip.
Would be interested to see, regardless of whether it gets it right or wrong, what would happen if, after it gives the answer, you say something like 'do you agree with your answer?' You could also try it within a single question, like 'after providing your answer, please check again'. I have done this once and it can go into a bit of a loop for a few minutes, realising it has made mistakes and trying again and again. In my simple test it did this for a long time but eventually got a decent answer.
Or maybe like... try an alternative way and see if you get the same answer.
Interesting, so it's good but not good enough.
dude ur reactions are priceless lmao xD
@@KasunWijesekara LOL I’m glad you think so
Is it just me, or has 4o been significantly nerfed after o1 was released? Sure, 4o wasn’t too good at math, but it wasn’t that bad back then.
OpenAI consumes about 3 bottles of water to generate 100 words...
The water they "consume" is still there afterwards.
@@marwin4348 How?
Boring as hell. Unsubscribe
@@evangelion045 bye ;)
Rude dude