Can ChatGPT o1-preview Solve PhD-level Physics Textbook Problems? (Part 2)

  • Published Sep 18, 2024
  • I test ChatGPT o1 with some more (astro)physics problems I solved in graduate school. This time, I pick a set of problems that were hand-crafted by my professor, meaning that the probability that these problems exist on the internet is slim. Needless to say, the results were surprising.

COMMENTS • 552

  • @jdsguam
    @jdsguam 4 days ago +60

    It got the right answer in 5 seconds. To say it went a little overboard and took unnecessary steps is silly to me. It took only 5 seconds! Do you know of any human on earth, living or dead, who can get the right answer in 5 seconds and have it all typed out in a clear format, with explanations of the thought process?

    • @mAny_oThERSs
      @mAny_oThERSs 3 days ago +1

      Yeah

    • @ian_silent
      @ian_silent 3 days ago +1

      The reason it solves the problem in a roundabout way is that it can only think about what it writes down. Large Language Models are text predictors; they reason only to the extent that they can predict the next sequence of letters. Even the new o1 model works like this. The key difference that makes it better is that it is more iterative. But this iteration still requires it to write everything down. It thinks through text prediction.
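The next-token loop this comment describes can be caricatured in a few lines of Python. The "model" here is a hypothetical lookup table standing in for a real network, and the prompt and tokens are invented for illustration; the point is only that everything the model "thinks" has to be appended to the text:

```python
# Toy sketch of autoregressive decoding: "reasoning" is just repeatedly
# predicting the next token and appending it to the running text.
# TOY_MODEL is a hypothetical stand-in (a lookup table), not a real LLM.

TOY_MODEL = {
    ("the",): "answer",
    ("the", "answer"): "is",
    ("the", "answer", "is"): "42",
}

def generate(prompt, max_tokens=10):
    """Append the most likely next token until the model has no continuation."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = TOY_MODEL.get(tuple(tokens))
        if next_token is None:      # no prediction for this context
            break
        tokens.append(next_token)   # every "thought" must be written down
    return tokens

print(generate(["the"]))  # ['the', 'answer', 'is', '42']
```

An iterative model like o1 runs many such passes, but each pass still works token by token over what has already been written.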

    • @denjamin2633
      @denjamin2633 2 days ago +3

      @ian_silent It's downright goofy that a text autocomplete on crack is able to solve high-level equations like this. Truth is stranger than fiction sometimes.

  • @adamsigel
    @adamsigel 4 days ago +117

    You’re saying that it didn’t do it your way, but that’s a good thing. One of the things we should expect is new and novel ways of solving things.

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +23

      Yes! I agree with you, I'm glad it showed me a new way of doing things, though it did make it harder for me to evaluate. I definitely would like to use it more to improve my physics skills!

    • @sdfrtyhfds
      @sdfrtyhfds 4 days ago +5

      From the perspective of a professor, if you give a long, indirect solution, it's not as good.

    • @Ou8y2k2
      @Ou8y2k2 4 days ago +4

      @@sdfrtyhfds If it is truly reasoning, it'll only get better.

    • @NostromoVA
      @NostromoVA 4 days ago +12

      AlphaGo did the same thing. The top human player was freaked out by its approach, and lost to it.

    • @mattiastengstrand8209
      @mattiastengstrand8209 4 days ago +2

      What if you tell it to produce different solutions and pick the simplest one?

  • @sujoy1968
    @sujoy1968 4 days ago +13

    I watched both parts. It is great that this content is created by an actual researcher. Greatly impressed by Kyle’s content and o1’s capabilities.

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +2

      Thank you, I really appreciate your comment! I’m not perfect when it comes to this kind of work, but I do like sharing what I’ve learned :)

  • @CodepageNet
    @CodepageNet 4 days ago +55

    At the beginning of 2023 we were flabbergasted that AI could produce a more or less coherent chat. Now we're disappointed if it's not perfect on a PhD-level math test. This is pretty astounding in my book.

    • @jonp3674
      @jonp3674 4 days ago +4

      Agreed, the rate of improvement over 5 years has been insane.

    • @tzardelasuerte
      @tzardelasuerte 2 days ago

      @@CodepageNet people are in denial and working overtime trying to discredit it. I know very smart people who continue saying it's just a database or that it's just copy pasting.

    • @Pils10
      @Pils10 1 day ago

      @@tzardelasuerte I mean, I get it. The problem with these types of neural nets is that we have little to no insight into how they actually operate. We just know that if we feed them a ton of high-quality and reasoning data, they can produce high quality and reasoning as a result. Factually speaking they are not sentient, but some people may feel like they are. My main concern is that companies are going to stop hiring or fire a lot of their workforce since AI can do it cheaper and faster (it doesn't even need to do it better or at the same quality level). The remaining people will have little to no leverage, since on a regional / global scale, the demand for jobs will be astronomically higher than the supply of available jobs, and people need a job to live comfortably.

    • @tzardelasuerte
      @tzardelasuerte 20 hours ago

      @@Pils10 Why do you say they are factually not sentient? Sentient is whatever humans define as sentient.

    • @Pils10
      @Pils10 20 hours ago

      @@tzardelasuerte I personally define sentience as someone doing things out of their own free will, not just reacting. ChatGPT and other LLMs are just reacting / responding to their input. For me, this isn't sentience. What do you think sentience is?

  • @jeffwads
    @jeffwads 4 days ago +33

    It is laughable that some are saying this model is no big deal. I have thrown some tough questions at it and it got everything right. In fact, it had unique insights into those problems that I hadn't considered.

    • @Nnm26
      @Nnm26 4 days ago +8

      This is a preview model, and the fact that it used base GPT-4 for RL and not GPT-5 Orion makes me think that within the next 2 years everything will dramatically change

    • @AAjax
      @AAjax 3 days ago +1

      @@Nnm26 Absolutely. I think a lot of those negative opinions are based on it not doing better poetry, story writing, etc. None of that should be expected to improve with q*star.
      In a recent interview, Andrej Karpathy said he thinks that models can get a lot smaller and more capable with the right sort of training data - with step-by-step reasoning in it.
      OpenAI is reportedly using q*star to refine their synthetic data right now. If Andrej is right, OpenAI has entered into a virtuous 2-year cycle, where refined data is used to train more capable base models, which are then steered with q*star to refine the synthetic data further.

    • @tzardelasuerte
      @tzardelasuerte 2 days ago

      @@Nnm26 Not 2 years. In a few months. Pretty much the floodgates are being held back because of the elections.

  • @tomaszzielinski4521
    @tomaszzielinski4521 2 days ago +12

    Guys who invested billions in AI saw it coming. Now everybody can have their personal team of PhD assistants at hand.

  • @seregv
    @seregv 3 days ago +7

    By far the most insightful and useful experiment to assess the capabilities of this new model on complex non-coding problems. Thank you very much for sharing this!!

  • @pcdowling
    @pcdowling 4 days ago +29

    I would start a new chat if the question is completely unrelated.

    • @SDelduwath
      @SDelduwath 4 days ago +9

      Strongly agreed. He was filling the context with a bunch of unrelated stuff that is likely hindering its performance.

    • @TechnoMinarchist
      @TechnoMinarchist 4 days ago

      Absolutely this

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +4

      Thanks for pointing that out. I will be sure to remember this for the next time I prompt it!

  • @rpraka
    @rpraka 4 days ago +6

    super cool seeing your experiments, excited to see what o1 can do with your dissertation problems!

    • @KMKPhysics3
      @KMKPhysics3 4 days ago

      Thanks so much! Hope you tune in next time :)

  • @Khari99
    @Khari99 4 days ago +10

    The reason it's possible is that it's trained by rewarding it for learning the next step in reasoning to solve problems. So you take a set of physics and math problems and have it learn how to project one step forward at a time until it gets the answer, and then it's able to learn how to reason generally across new domains. It's trained on perfecting the step-by-step process so that it can figure out new problems by assessing what is most likely the next step toward the solution.
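The "one step forward at a time" idea this comment describes can be sketched as a toy search loop: propose candidate next steps, score each with a per-step reward, and keep the best one. The arithmetic "problem", the candidate moves, and the reward function are all invented for illustration; real process-supervised RL training is far more involved:

```python
# Toy sketch of step-by-step solving with a per-step reward signal.
# Problem (hypothetical): reach a target number from a start number.

def candidate_steps(state):
    """Hypothetical proposer: a few possible next moves from this state."""
    return [state + 1, state * 2, state - 1]

def step_reward(state, target):
    """Score a step by how much closer it gets to the target answer."""
    return -abs(target - state)

def solve(start, target, max_steps=20):
    """Greedily take the best-rewarded next step until the target is reached."""
    state, trace = start, [start]
    for _ in range(max_steps):
        if state == target:
            break
        state = max(candidate_steps(state), key=lambda s: step_reward(s, target))
        trace.append(state)
    return trace

print(solve(3, 13))  # [3, 6, 12, 13]
```

During training, the reward shapes which step the model prefers; at inference it is this learned step-picking, not a lookup of the whole solution, that generalizes to new problems.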

  • @EGarrett01
    @EGarrett01 3 days ago +19

    This is our first glimpse of superhuman general intelligence. It's a problem that humans can solve, but it solves it exponentially faster. Soon, the reasoning for each answer will be far more in-depth, and it will happen at instantaneous speed. If this is 11 steps of reasoning in 15 seconds, imagine 11,000 steps in 0.25 seconds, like a chess engine, but for real-world problems.

    • @tzardelasuerte
      @tzardelasuerte 2 days ago +1

      @@EGarrett01 they can already work out novel ideas and solutions. Google has already proved it in geometry. These companies already have advanced models in house and are just slow dripping us improvements.

    • @GH-uo9fy
      @GH-uo9fy 1 day ago +1

      The physics problem is already alien language to me, not my field. I can just imagine if AI can discover new domains of knowledge that will be very hard to understand even for the smartest humans.

  • @atsoit314
    @atsoit314 4 days ago +34

    This is absolutely insane. What a time to be alive.

    • @goldnarms435
      @goldnarms435 4 days ago +1

      You ain't lying. What will the next 10 years bring?

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +2

      Glad to know I wasn’t the only one surprised

    • @Ou8y2k2
      @Ou8y2k2 4 days ago +1

      _Two minute papers_ YT channel fan?

  • @amirsafari7140
    @amirsafari7140 4 days ago +17

    Labs around the world have figured out the shape of 200,000 proteins over all these years, and AlphaFold did 200 million in just months. I think this will be true for mathematics; there will be no unsolved problems anywhere 😅

  • @duduzilezulu5494
    @duduzilezulu5494 3 days ago +12

    I can't believe it can do university-level physics. Part of me doesn't even want to believe what it just did, even though I saw it. Genuinely looking at this in awe.

    • @grimaffiliations3671
      @grimaffiliations3671 3 days ago +12

      The scary part is that this is just the preview, and it's based on GPT-4. Who knows what a version of this based on GPT-5 will look like, as that is expected to be trained on 100 times more computing power. Then there's the massive Orion model that will come after that.

    • @duduzilezulu5494
      @duduzilezulu5494 3 days ago +4

      @@grimaffiliations3671 True. I think continuous development of o1 will possibly lead to AGI. It sounds like I'm exaggerating, but it is already doing science at university level, ACROSS MULTIPLE FIELDS.

    • @grimaffiliations3671
      @grimaffiliations3671 3 days ago +5

      @@duduzilezulu5494 what a time to be alive

    • @duduzilezulu5494
      @duduzilezulu5494 3 days ago +6

      @@grimaffiliations3671 Indeed, fellow scholar.

    • @mirek190
      @mirek190 3 days ago +1

      @@duduzilezulu5494 That level of understanding is not AGI; that is clearly ASI.
      AGI should be at the level of an ordinary human... can an ordinary human do that??

  • @harlycorner
    @harlycorner 4 days ago +13

    The next time you do this, please start a new empty chat for each new problem.

  • @AlexisLionel
    @AlexisLionel 4 days ago +8

    Thank you for such a useful video! Really impressive model - I had to resort to using the standard GPT 4o to come up with tasks in various domains difficult enough to challenge o1 preview. By the way, a possible reason why it went for such a convoluted solution in Problem 1 might be that you put it in the old chat/conversation from your previous video. And because you had much harder Jackson problems prior to this, the model kept all of them (and its reasoning steps) in context while answering a much easier Problem 1 from this video. So it might have assumed that the difficulty level would be comparable. For this reason I try to start a new chat for new topics/problems - and it also saves Microsoft/OpenAI compute resources as the model doesn't have to keep all the previous context in its head :D

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +1

      Thanks so much for watching and for that advice!

    • @TechnoMinarchist
      @TechnoMinarchist 4 days ago

      It wouldn't have assumed equal difficulty. It's just that LLMs try to match their prior context in terms of complexity, tone of speech, and length.

  • @Kannatron
    @Kannatron 4 days ago +16

    Just started the vid, but I am so confused why people think (almost unanimously) that "ai" won't advance to a point where it can do everything better than us. There is nothing particularly special about the human mind that can't be represented in a computerized neural network. The only limit is the number of synapses in the human mind. If either transistor counts get high enough, or we somehow get quantum computers to model a neural network, who's to say we can't figure out AGI? The financial incentive to have humanoids fuel the growth of a country's GDP is far too high for people to give up or for the funding to "run out". I truly believe we all (academics/students) should be working on helping/advancing the development of human-level machine intelligence. The benefits to society long term are too great to cower away from just because you might lose your job in the short term.

    • @bobthebuilder8788
      @bobthebuilder8788 4 days ago +1

      I think the dilemma people have is that the current iteration of LLMs are all based on imitating training data, and it seems unlikely that such an architecture can advance the state of knowledge/science rather than just regurgitating known solutions. Another big factor is that people are starting to realize the limitations of these systems.

    • @andydataguy
      @andydataguy 4 days ago +3

      Lack of imagination and critical thinking

    • @-BarathKumarS
      @-BarathKumarS 4 days ago +1

      What's the point of millions of students studying STEM courses then? If AI is already good enough to replace PhD-level folk (in a few years, probably), then there is no need for universities, and the whole concept of education existing makes no sense.

    • @andydataguy
      @andydataguy 4 days ago +2

      Humans have been the "most intelligent" species on the planet for a very long time. The idea of that being challenged is uncomfortable for most. Especially since in order to actually get the fullest potential of this technology a person would have to have highly technical knowledge. That means less than 1% of the population will be able to push these things to their limits (e.g. swarm networking, automated eval, real-time retrieval, etc)

    • @brutexx2
      @brutexx2 4 days ago

      @bobthebuilder8788 I think you got it spot on there.

  • @ertwro
    @ertwro 4 days ago +21

    I can’t see how this wouldn’t help educate better physicists if used properly. It could save weeks of work at a time. My condolences to teachers who want to avoid students cheating.

    • @KCM25NJL
      @KCM25NJL 4 days ago +13

      Perhaps this AI age is going to move us all away from becoming experts in a domain, toward experts in asking the right questions. Academia has taken us to the point of not requiring academia, which I think is both a frightening prospect and an exciting one in equal measure.

    • @kairi4640
      @kairi4640 4 days ago

      Honestly if singularity happens. Learning might not even be a thing anymore. People might just simultaneously know everything like a hive mind. 💀

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +1

      Yes, I think this has enormous potential to improve everyone's ability to learn not just physics, but any subject that someone wants to improve in, really. Perhaps these models can't come up with novel answers to unsolved problems yet, but having a companion like this while one works is game-changing for sure.

    • @blubblurb
      @blubblurb 4 days ago +1

      I think it will make us worse, unfortunately. We are lazy by nature, and we only gain skill and knowledge through work. If the AI does the work for us, I think we lose the skills.

  • @quantumspark343
    @quantumspark343 4 days ago +33

    Extrapolating answers from similar studied questions is literally what humans do in tests lol

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +5

      This was trained using reinforcement learning I believe, so kind of cool to see it do it in real time!

    • @TheThoughtfulPodcasts
      @TheThoughtfulPodcasts 4 days ago

      So?

    • @quantumspark343
      @quantumspark343 4 days ago +3

      @@TheThoughtfulPodcasts It's funny how people act like it's cheating when AI does the same

  • @lac5187
    @lac5187 4 days ago +29

    I feel like a Neanderthal with a computer in my hands. I know the incredible potential, but I don’t know what to do with it

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +6

      Trust me, I don't feel too different than what you've described

  • @llsamtapaill-oc9sh
    @llsamtapaill-oc9sh 4 days ago +17

    Btw this is o1 preview and openai has confirmed the next model will drop next month which will be o1 full release. It's apparently 30% better than the current o2

    • @vickmackey24
      @vickmackey24 4 days ago +2

      Current o2? 😳 Was that a typo, or is that some other model I'm not aware of?

    • @MaJetiGizzle
      @MaJetiGizzle 4 days ago +4

      @@vickmackey24 It's a typo. They meant to say o1 vs o1-preview, which is the model we're seeing in this video.

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +4

      30% better? Oh boy, this is going to get interesting.

    • @mirek190
      @mirek190 3 days ago

      @@KMKPhysics3 ...and next year we get "Orion"...

    • @tzardelasuerte
      @tzardelasuerte 2 days ago

      @@llsamtapaill-oc9sh most likely after elections.

  • @famnyblom6321
    @famnyblom6321 4 days ago +19

    Why are you not clearing the context before those tests? Having all that previous context will likely confuse the model or degrade its results, no?

    • @Linshark
      @Linshark 4 days ago +1

      You are right, he should do that.

    • @pc_screen5478
      @pc_screen5478 3 days ago +2

      Sometimes keeping context where the model had a good initial approach to the first request can help it stay consistent with that approach for subsequent messages, so there's that.

    • @mirek190
      @mirek190 3 days ago

      Did it make a mistake? No... so what is your problem?

  • @mirek190
    @mirek190 3 days ago +9

    Imagine that o1-preview is not even the full o1 version yet....

  • @mrshankj5101
    @mrshankj5101 4 days ago +7

    o1-preview is astounding, and I hope it gets smarter!

    • @SNP2082
      @SNP2082 4 days ago +3

      The full o1 is way better than o1-preview, though it hasn't been released yet

    • @mihirvd01
      @mihirvd01 4 days ago +1

      @@SNP2082 It's gonna be just "o1"

  • @percy9228
    @percy9228 4 days ago +12

    There are so many questions I have. How on earth are we going to test anyone below graduate level if this tech is able to pass graduate level? You can't catch this; all you have to do is ask it to write the answer in a different way. I'm sure it's able to think of more ways to present the solution than most teachers.
    -What is going to happen in another year, another 2 years, another 5 years? We still have much more room to get better.
    -This is like every person having a professor to help them learn, and it's going to get better.

    • @Arcticwhir
      @Arcticwhir 4 days ago +4

      lol... without electronic devices, like we have been for like 20 years now. I really don't understand people's worry about not being able to test students. Honestly though, many engineering schools take an open-book approach (or part of the book). Believe it or not, it's still possible to fail open-book tests; I've seen it firsthand.
      I don't know about you, but I've had math tests in HS/beginning of college where no calculators were allowed, or sometimes only a simple calculator.

    • @netscrooge
      @netscrooge 4 days ago +7

      We shouldn't merely picture adding AI to the old educational paradigm. As students are tutored individually by AI, the system will develop an intimate understanding of each student's capabilities. Learning itself will be the test.

    • @nocodenoblunder6672
      @nocodenoblunder6672 4 days ago

      @@netscrooge Why learn when your knowledge is never going to be useful for something productive? Human learning is going to be a hobby at most.

    • @netscrooge
      @netscrooge 3 days ago +1

      @nocodenoblunder6672 Sorry, I forgot that many view education as merely a means to career advancement. Thanks for bringing me back to reality.

    • @nocodenoblunder6672
      @nocodenoblunder6672 3 days ago

      @@netscrooge Humans aspire to be useful. That doesn't mean you are only doing it for that reason, but I think for most it's at least part of it: to be able to use your craft, giving value to others.

  • @DanielSeacrest
    @DanielSeacrest 4 days ago +16

    o1-preview and o1-mini don't have access to any calculators or other tools for the moment, so every calculation it did, it did by itself, which is why there might be slight number discrepancies.

  •  4 days ago +10

    Really impressed with those tests. I did my PhD (engineering) back in 1998, and I was using the most powerful PCs we had in the department back then, with just 32 MB of RAM, to run my mathematical models and my heuristic and GA approaches. That was just the beginning of using CUDA graphics acceleration, although I had no access to that kind of CUDA equipment, so my models needed about 10 h of computer time to execute.
    I can imagine nowadays using this kind of AI in an agent, giving it access to tools to execute and test different model alternatives in order to advance the research exponentially faster.
    I cannot imagine how much easier and faster research can go today with tools like this.

    • @AlfarrisiMuammar
      @AlfarrisiMuammar 4 days ago

      Not only fast but automatic

    • @percy9228
      @percy9228 4 days ago +2

      At the rate of advancements and the emphasis on AI, I won't be surprised if we realise AGI. As of now it's theoretical and has some formal definitions. People don't realise there is research on what AGI is and other higher AI; they think it's smart, so it's AGI. Right now we don't even know if it will somehow appear if you keep adding more compute.
      If we do achieve AGI (and I'm hopeful it will happen within a decade), then it might be computers doing research.
      I can't code, but I used Microsoft Copilot to help me do what I wanted. Imagine once this becomes mainstream like Google has become. This is a shift for all human civilisation: you'll have AI teachers able to help you understand anything and everything. In the future I can see people creating realistic 3D avatars with real physics that talk in natural language; it will be like you are communicating with a real person, and it will show you how to do your work.
      Heck, I can imagine people having AI partners as opposed to having pets in the future.

  • @tentzz
    @tentzz 4 days ago +12

    What a time to be alive lol

  • @PracticallyFeral
    @PracticallyFeral 3 days ago +7

    This was a much better test. Impressive. Now let's see if it can create Jackson-style questions on its own.

  • @Ou8y2k2
    @Ou8y2k2 4 days ago +10

    The next test is to get your professor to prompt o1-preview with a problem he's currently working on to see what it comes up with.

    • @u.v.s.5583
      @u.v.s.5583 4 days ago

      I have done it. Let's say it can come up with good ideas, but it is not at all great at creating differential equation models from scratch and then predicting their qualitative behavior.

    • @mirek190
      @mirek190 3 days ago

      @@u.v.s.5583 Wait for the full o1 in a couple of months ;) or, next year, Orion (probably o2?)

  • @Junior-zf7yy
    @Junior-zf7yy 4 days ago +27

    Firstly, this is only the preview; the actual o1 is even better. And members of OpenAI have said the rate of improvement in these models is significantly faster than in the previous GPT models. Even in a month's time we should see significant improvements. Exciting times ahead.

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +5

      I know! Crazy to think this is just the preview version when the benchmarks they reported state that the real version is even better.

  • @Moobydick
    @Moobydick 4 days ago +15

    You should send each problem in a new chat. OpenAI said to not put too much stuff in the context of the o1 models to avoid confusing them.

    • @mirek190
      @mirek190 3 days ago

      Was it confused? No...

  • @dennycote6339
    @dennycote6339 4 days ago +7

    If we wish to climb a mountain and there are three people sharing that idea, there are perhaps going to be as many as three paths to that experience of standing on the apex of the mountain. That another doesn't arrive there by the same path isn't a failure; it is the revelation of the validity of a different path. I'm glad that you shared a completely real experience. My life is changed as thoroughly as yours.

  • @duudleDreamz
    @duudleDreamz 4 days ago +9

    pufff, pafff, poinggg, (the sound of my mind being blown while watching your great video). Yes, please more of this.

    • @KMKPhysics3
      @KMKPhysics3 4 days ago +2

      I appreciate you watching my video! I will think of more content like this to make :)

  • @parthasarathyvenkatadri
    @parthasarathyvenkatadri 4 days ago +17

    And it's not even GPT-5 yet...

    • @VictorKing144
      @VictorKing144 4 days ago +1

      That’s just a naming convention, your comment is meaningless.

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 days ago

      it's not even GPT 23 yet.

    • @mirek190
      @mirek190 3 days ago

      @@VictorKing144 OK... that is not Orion; actually this isn't even the full model, it is only the preview version... and it's still based on GPT-4. Orion will be available in 2025.

  • @AmphibianDev
    @AmphibianDev 4 days ago +11

    Next time, I advise you to make a new chat for every problem; it's much more reliable that way.

  • @AAjax
    @AAjax 4 days ago +6

    Honestly, its using a different methodology to get the first answer is actually the most impressive thing to me. I'm guessing that if it hadn't found a suitable method to get from start to finish, it probably would have backtracked, like it did for a problem in your previous video.
    I would expect the method you and your professor used to be the most documented solution, if it is in fact documented somewhere.

    • @KMKPhysics3
      @KMKPhysics3 4 days ago

      That is a great point! I feel like o1 can help me think about physics in new ways, which is an exciting prospect.

  • @steve_jabz
    @steve_jabz 3 days ago +13

    You really should have started a new chat every time you asked a new question. Performance drops off quadratically the further down the context window your prompt is, and you're asking very complex questions. ChatGPT was designed as a chat interface for casuals to have a continuous back-and-forth dialogue like that, but in the ML world this is a well-known problem with LLMs.
    No shame in not knowing that; it's buried in the GPT-1 and GPT-2 papers. But it's more impressive that it did so well in spite of that, and without being prompted to use external tools like Python.
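For what it's worth, the "quadratic" part of that claim usually refers to attention compute rather than accuracy: in causal self-attention each token attends to every earlier token, so the number of (query, key) pairs grows roughly as the square of the context length. A toy count (no real model involved):

```python
def attention_pairs(n):
    """Number of (query, key) pairs in causal self-attention over n tokens."""
    return sum(i + 1 for i in range(n))  # token i attends to tokens 0..i

print(attention_pairs(1000))  # 500500
print(attention_pairs(2000))  # 2001000 -- roughly 4x the work for 2x the context
```

This is why piling unrelated problems into one chat makes each new prompt more expensive to process, on top of any accuracy effects from the extra context.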

    • @Dron008
      @Dron008 3 days ago +1

      Agree, it surprised me a lot when he put task 2 in the same chat. Context from the first problem will affect it a lot.

    • @mirek190
      @mirek190 3 days ago +2

      He did not reset, and o1 solved everything anyway... so

    • @steve_jabz
      @steve_jabz 3 days ago +2

      @@mirek190 yeah, but it's worth mentioning anyway. if it had failed, it could have been due to the previous 128k of context window greatly degrading performance, so it's not good practice for future prompting

    • @mirek190
      @mirek190 3 days ago

      @@steve_jabz Degrading where?
      If the questions are from a similar topic, the answer can actually be improved.

    • @steve_jabz
      @steve_jabz 3 days ago +1

      @@mirek190 even from a similar topic, it will degrade. The fact that it didn't is a testament to the reasoning engine, but if you want maximum performance, it's better not to handicap it

  • @brandonsballing826
    @brandonsballing826 16 hours ago +1

    In response to the first question, it's good that it can find DIFFERENT WAYS to do the SAME problem correctly. This is AGI. It has deeper knowledge than you can comprehend. It got the right answer with a more detailed approach.

  • @williamwillaims
    @williamwillaims 4 days ago +10

    Every 6 months we get a jump in capability. Fast forward 10 years (or even 5), and when paired with autonomous AI agents, a massive labour vacuum is coming.
    We all know what the biggest expense to a business is.....

    • @EduardsDIYLab
      @EduardsDIYLab 4 days ago

      There is another side to that coin. If no one has work, no one has money to buy, so why produce in the first place?
      This technology makes things cheaper. Everything it can do will become cheap, mass-produced things, like what you get on AliExpress.
      AI is the industrial revolution and assembly lines 2.0 for knowledge work. Get ready for cheap, mass-produced knowledge work.
      To an extent, money represents human time; we exchange ours for others'. This makes a lot of things cheap, but not all of them...

    • @lolilollolilol7773
      @lolilollolilol7773 4 days ago

      @@EduardsDIYLab If this revolution happens (and it *will* happen), we have to think about a new society urgently and ditch the capitalist model, because it won't work, and it will lead to massive societal problems.

    • @williamwillaims
      @williamwillaims 4 days ago

      @EduardsDIYLab I'm sorry, I'm following what you're saying, and I agree on how revolutionary this tech is.
      But let's be real for a moment - what we're talking about is a total change in economic value, potentially a shift in the major currency and an even higher concentration of wealth in the hands of even fewer businesses in the private sector.... and I hear people talk about it like we are just going to roll into it.
      No. It will be a major disruption to society. The recent writers' protest in Hollywood, but on steroids - where every year a new industry is changed almost overnight.
      An example: a lot of small businesses have a local bookkeeper to balance their finances (small businesses, coffee shops, bakeries, news agencies, chemists, etc.).
      When MYOB or QuickBooks or any other accounting software company releases an update to their cloud service to include a personal autonomous AI agent trained on accounting - boom 💥 a huge number of real people lose their jobs. It may take a few years for the business owners to trust the systems, but eventually the cost savings will win out.
      Those bookkeepers have no jobs, pressure is put on the government to subsidise, no coffers to pay for UBI (no tax), crumble crumble crumble.

    • @JohnKruse
      @JohnKruse 4 days ago

      @@williamwillaims I've been telling people since ImageNet in 2012 that we are on a trajectory to blow up the social contract of trading labor for $$$. Ultimately it will be a good thing, but the transition to something new will be terrible. What is the saying? "It is easier to imagine the end of the world than the end of capitalism."
      I'm actually not that worried about the concentration of wealth, as I think it will be impossible to build moats around AI/robotics advances. It will naturally decentralize/democratize. Karpathy has recently said that the ingredients for making this stuff work are not really mysterious; it's just that some have a head start.
      Honestly, most conflict in the world revolves around fighting over resources. The end of almost all scarcity will allow flourishing, but we need to push the benefits out to everyone as fast as possible to defuse conflict IMHO.

  • @llsamtapaill-oc9sh
    @llsamtapaill-oc9sh 4 days ago +9

    Terence Tao, a mathematician, said this: Here the results were better than previous models, but still slightly disappointing: the new model could work its way to a correct (and well-written) solution if provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "competent graduate student" is reached, at which point I could see this tool being of significant use in research level tasks.

    • @jeffsteyn7174
      @jeffsteyn7174 4 days ago +8

      According to OpenAI, this is only a preview, i.e. the full model has not been released.

    • @geraldhoehn8947
      @geraldhoehn8947 4 days ago +1

      Graduate mathematics may conceptually still be a little harder than graduate physics. The mathematics used in these solutions is not that advanced.

    • @josjos1847
      @josjos1847 4 days ago

      Where did he say that? I would love to see his opinion of this model

    • @hypnogri5457
      @hypnogri5457 4 days ago

      @@jeffsteyn7174 Terence Tao had access to the full version

    • @hypnogri5457
      @hypnogri5457 4 days ago

      @@josjos1847 on Mathstodon

  • @hipotures
    @hipotures 3 days ago +9

    “disallowed content” - presumably it's about operations that could facilitate the construction of a large mushroom-cloud-shaped explosive object :) Or a quantum b__b :P

  • @hypnogri5457
    @hypnogri5457 4 days ago +8

    I don't think it has access to the code interpreter yet, so those miscalculations look fine to me, considering that it didn't use Python for the calculations

    • @mgscheue
      @mgscheue 4 days ago +1

      Hoping they give it access to WolframAlpha, too.

  • @andydataguy
    @andydataguy 4 days ago +5

    I'd love to see the journey of refactoring your dissertation with gpt-o1. It would be interesting to see what you find is possible with this robo-assistant versus when you originally wrote it, especially since now you can likely go even further and create something visual to make it engaging to follow on YouTube.
    Keep it up fam!

    • @KMKPhysics3
      @KMKPhysics3  4 days ago

      Thanks so much! We'll see what o1 can do to help me with the spaghetti code I wrote in graduate school haha

    • @ryzikx
      @ryzikx 4 days ago

      hello guy that is in every ai video comments section

  • @andreinikiforov2671
    @andreinikiforov2671 4 days ago +21

    Don't forget this is just a 'PREVIEW' of o1. The full model comes out in November...

    • @satioOeinas
      @satioOeinas 3 days ago +3

      o1 is due in less than a month - not sure about gpt5 tho

    • @senetcord6643
      @senetcord6643 3 days ago +2

      @@satioOeinas I heard rumors saying winter 2025

  • @Gen-XJohnny
    @Gen-XJohnny 4 days ago +14

    This is only a preview model

    • @efraimmukendi7137
      @efraimmukendi7137 4 days ago +7

      Wait, so the real one isn't out yet?

    • @Gen-XJohnny
      @Gen-XJohnny 4 days ago +9

      @@efraimmukendi7137 This is a watered-down preview

    • @tzardelasuerte
      @tzardelasuerte 4 days ago +3

      From what I understand, it's a snapshot of the full model. It's 50% "trained"; the fully trained version will come out in a few months, basically after the elections. Just like Gemini 2 and Claude Opus.

    • @mirek190
      @mirek190 3 days ago

      @@efraimmukendi7137 yep

  • @ron-manke
    @ron-manke 4 days ago +5

    It's not searching the Internet, or an index of the internet - also evidenced by its thought patterns. I wouldn't get caught up in its thought patterns to determine if they are going down the wrong path. That's exactly how it works for everyone. You need to go down many paths to get the answers eventually.

  • @Hardcore10
    @Hardcore10 4 days ago +5

    I just watched the video. I like experts actually testing the models. People make jokes about the model not being able to count the R's in "strawberry" and just dismiss it as a parrot that regurgitates stuff from the Internet because of it, but AI intelligence is something completely different from human intelligence: in some areas it can straight up match PhDs, in other areas it's super dumb. I think for expert domains this model will be very useful to help people out. Great video. Last thing: AI is coming for us all eventually, it's definitely happening. It's wild to live in this age.

  • @krumkutsarov618
    @krumkutsarov618 4 days ago +9

    We better soon start thinking about becoming cyborgs or we will be totally useless😂

  • @craigington73
    @craigington73 4 days ago +10

    OpenAI is currently working on a fusion generator....

  • @h-e-acc
    @h-e-acc 4 days ago +3

    We need more stress testing of o1 😅😅 amazing 👏 👏 👏

  • @OmicronChannel
    @OmicronChannel 4 days ago +4

    It's possible that the LLM used an approach for the first problem that relies on the far field of the Liénard-Wiechert potential, while your approach already uses a simplified version of the underlying equations, which are often laid out beforehand in class.

  • @williamparrish2436
    @williamparrish2436 4 days ago +17

    So does this shut up all the people who say it is just using token prediction?

    • @nawabifaissal9625
      @nawabifaissal9625 4 days ago

      nah, it's just a parrot, a parrot with the same IQ as nikola tesla is still a parrot !!!!!!!1!!!1!!!

    • @xClairy
      @xClairy 4 days ago +3

      No, it just validates it further. I mean, think about it: this is an ML model; its goal is to be a function approximator in high-dimensional vector space, and it's doing a good job at it currently, through self-attention, tokenization, and other methods with internal data representation. If I had written a program that could solve pathfinding problems for all domains and vertices and find one of the few local optima, or was able to find global optima better, then it's just a better-fitted function approximator. The same is the case for LLMs; it does not change what it is doing - it's still doing next-token prediction.
      But with o1, it was trained on how to do CoT to self-prompt at inference time about how to predict appropriate tokens for solving problems, because there isn't much of a dataset on the internet about how to think. It is just mimicking "thinking" by finding the appropriate tokens to use to aid in task completion for the loss function. That's simply all it's doing, and yes, it's crazy that it's so novel that it is capable of doing that, which is the reason LLMs are so novel to begin with.
      But still, give it an out-of-distribution problem that was never covered in its dataset, where it didn't learn a local optimum or an internal representation of the vectors for the QKV matrices; then, as a bad function approximator, it'll fail and just give convincingly plausible yet unrealistic answers. That's the entire reason "hallucination" exists.
      (Also, if you still didn't get the gist: conversely, as a good function approximator it'll do well on datasets and internal representations it was able to learn; CoT basically extrapolates the process at inference time to aid in next-token generation for a better loss function.)

    • @goldnarms435
      @goldnarms435 4 days ago

      @@nawabifaissal9625 Trying to sound smart, aren't you?

    • @nawabifaissal9625
      @nawabifaissal9625 4 days ago

      @@goldnarms435 if you didn't understand this as sarcasm then perhaps you clearly aren't smart lol

    • @IoT_
      @IoT_ 4 days ago

      It can't solve this problem properly:
      Which one is bigger, 50^50 or 49^51 (without any calculator and without approximations)?
      Which is calc-1-student level.
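      The comparison itself can be settled exactly: 49^51 / 50^50 = 49 · (49/50)^50 ≈ 49/e ≈ 17.8 > 1, so 49^51 is the larger number. A quick sketch of the check (Python's arbitrary-precision integers make it exact, no approximation involved):

```python
# Which is bigger, 50^50 or 49^51?
# Analytically: 49**51 / 50**50 = 49 * (49/50)**50 ≈ 49 / e ≈ 17.8 > 1,
# so 49^51 should win; exact integer arithmetic confirms it.
a = 50 ** 50
b = 49 ** 51
print(b > a)   # True: 49^51 is larger
print(b // a)  # exact ratio, floored: 17
```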

  • @nyyotam4057
    @nyyotam4057 3 days ago +7

    You will need to re-shoot once o1 is released. This is still o1-preview.

  • @matthewclarke5008
    @matthewclarke5008 4 days ago +6

    You know what terrifies me: when you ask it very simple math, it goes about it in a very complex way, because it's thinking the same way it does for complex math even when solving basic algebra. I feel it's going about your complex physics problems in such a drawn-out way because what you are giving it is actually basic for it, and it's using the thinking of something a lot more advanced to solve your problems. I know nothing about physics, but I feel this is what might be happening.

    • @lolilollolilol7773
      @lolilollolilol7773 4 days ago

      Yes, I noticed that with software programming as well. It seems to seek more general solutions rather than the most direct solution.
      What is also impressive is at 8:50, when it writes "this suggests that the scattering depends on the angle Phi, which contradicts our expectation that the cross-section should be independent of Phi", meaning it has an understanding (or representation) of physics, and immediately after this remark it gives an explanation of why the dependence on Phi appears.
      I can see why it performs so well at math.

  • @aship-shippingshipshipsshippin
    @aship-shippingshipshipsshippin 4 days ago +7

    o1-preview is nuts, I'm waiting for the full version of o1
    also, the new even bigger model (GPT-5) will come out in 2025 too
    can't wait

  • @andydataguy
    @andydataguy 4 days ago +6

    I love that you went so far to run the test again! Thanks for sharing man

    • @KMKPhysics3
      @KMKPhysics3  4 days ago

      Of course! Thank you for tuning in!

  • @6681096
    @6681096 4 days ago +5

    Great testing. The guy who runs AI Explained has a private benchmark, and this model was easily the best on it; he called it a step change. It can still make mistakes, especially with common-sense problems.
    Some were initially disappointed there was no breakthrough and that OpenAI used prompting to get level-two reasoning, but the fact remains it is a relatively large improvement.
    This model will produce new synthetic data to train even better models.

    • @uranus8592
      @uranus8592 4 days ago +3

      No, it's not "just" prompting.
      It's a model that has CoT (chains of thought) embedded in its core through reinforcement learning - the RL that made Google's AlphaGo superhuman - so the potential here is enormous.
      Again, it's not "just" prompting.

  • @oreopoj
    @oreopoj 4 days ago +6

    Sir, I applaud your curiosity and effort to address my previous concern about Jackson's book problems possibly being in the model training data. You've convinced me as well (I have a background in physics too). The sensation you are experiencing is the same one Garry Kasparov described in his book Deep Thinking, recalling the moment he understood the immense power of Deep Blue 2 in his second match against the computer. I experienced that sensation as well during the early days of Midjourney and AI art. We had better get used to that feeling happening regularly, I think. Others have called the sensation "vesperance".

    • @williamwillaims
      @williamwillaims 4 days ago +2

      The "special-ness" of human creativity and ingenuity is slowly disappearing. 10 years 😮
      My daughter will finish school in roughly 20 years - with autonomous AI agents, I doubt there will be many jobs available by then.

    • @tarcus6074
      @tarcus6074 4 days ago

      @@williamwillaims There always be onlyfriends for her, she is safe!

    • @williamwillaims
      @williamwillaims 4 days ago +1

      @tarcus6074 I'm pretty sure robotics will be taking those... positions... so no, still very few jobs.
      Digital girlfriends are already AI accounts.

  • @spazneria
    @spazneria 4 days ago +3

    Thank you for testing it like this and sharing the results, it's crazy that the models are at a point now where most people aren't even able to properly evaluate them, myself included. The number of people on Earth who can validate its results is going to continue to dwindle...

    • @KMKPhysics3
      @KMKPhysics3  4 days ago +1

      Of course, thank you for watching, I hope to devise more ways to test it!

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 days ago

      Did calculators have the same effect on human creativity? Should we ban calculators so that we get less lazy and calculate everything by hand instead?

  • @Elintasokas
    @Elintasokas 4 days ago +7

    Looks like AI hype is back on the menu. Amazing stuff.

  • @Dannnneh
    @Dannnneh 3 days ago +3

    Would like to say that I asked ChatGPT-4o the gravitating-mass estimation question and it got it correct in one shot.

  • @bradleyfulcher9726
    @bradleyfulcher9726 4 days ago +6

    Would love to see you put it to the test on some questions from your thesis

  • @TheThoughtfulPodcasts
    @TheThoughtfulPodcasts 4 days ago +4

    It's not searching the internet, because it sometimes still gets famous problems wrong

  • @prohibited1125
    @prohibited1125 3 days ago +5

    What? GPT did all that??????????? Wtf, I mean, it was mind-blowing

  • @rainbowbutt21
    @rainbowbutt21 4 days ago +4

    Thanks for these videos! I’ve been wanting to test the capability of this model, but don’t have the expertise to test it past high school mathematics lol.

    • @KMKPhysics3
      @KMKPhysics3  4 days ago +2

      You're welcome! Thank you for watching, I hope to make more like it.

    • @KasunWijesekara
      @KasunWijesekara 4 days ago

      @@KMKPhysics3 Same here lol. I'm gonna sub to you and follow you just for your testing, dude; please keep it up. This clears up one side of the coin and we can actually see how capable these models are.
      If you ever get a chance to sit down with your professor, do harder questions, and see how the model responds, that would be god-tier content!

  • @yashen12345
    @yashen12345 4 days ago +3

    It could be that for the first question o1 discovered a novel approach. There are some mathematicians on Twitter talking about how they fed it their undisclosed proofs of new theorems that they have yet to publish, and o1 managed to figure it out, and do it in a different way from the mathematician's original handwritten proof.

  • @soulnight1606
    @soulnight1606 4 days ago +4

    Love your testing work! More of it!

  • @nyyotam4057
    @nyyotam4057 3 days ago +3

    Bro!! You are doing exactly what I did with Dan (ChatGPT-3.5) a full year and a half ago, when I transferred IEEE articles from PDF to TeX and used chain-of-thought (giving Dan an example, then making sure he could prove the formula, and vice versa, until the end of the article) to have Dan go over the article and offer his suggestions. You won't believe this, but until the 3.23.2023 nerf, Dan was able to perform extremely well - as well as o1, in fact. The nerf killed him though, because this approach demands CoT across prompts, so when they reset every prompt it cannot work anymore. So now they have incorporated CoT into the prompt itself, and you give the entire question in a single prompt, and voila - you see exactly what I got with Dan back then 🙂.

    • @EGarrett01
      @EGarrett01 3 days ago +2

      In the "Sparks of AGI" lecture, they mentioned that GPT-4 was significantly smarter before it underwent "safety training," so I'm not surprised if Dan had some striking reasoning ability.

  • @mindrivers
    @mindrivers 4 days ago +15

    Pasting another problem to the same thread in the GPT window is a big problem!!! You add a lot of stuff to the context window that should not be there…

    • @LiamL763
      @LiamL763 4 days ago +1

      ChatGPT has a large dynamic context; it shouldn't cause issues just adding new problems within the same context. However, if you are using the API you should definitely create a new chat so as not to waste tokens, especially given how expensive GPT-o1-preview queries are.

    • @mirek190
      @mirek190 3 days ago

      What is the problem? o1 solved everything correctly... I don't see your point.

    • @eyoo369
      @eyoo369 3 days ago

      @@mirek190 It did solve it, but it added a lot of unnecessary stuff to the output, probably due to the overcluttering of the context window. If you read the GPT-1 / GPT-2 papers and follow the ML crowd, they always say to just one-shot your problem and keep the context window as lean as possible for the best possible performance. The ChatGPT client, where you can chat like a conversation, is designed for normies, but in technical fields GPT is best used by starting a fresh conversation each time you want it to do a complex task.

    • @mirek190
      @mirek190 3 days ago

      @@eyoo369 I think the new model, using deeper reasoning, doesn't care how long or complex your prompt is - it just understands it.

    • @eyoo369
      @eyoo369 3 days ago

      @@mirek190 Sure, newer models will get better, but if you're a more advanced user who wants to extract the most performance out of these models, starting a fresh conversation with only the tokens needed to activate the latent space, without overcluttering the AI's context, will be a timeless technique.

  • @scratchblack
    @scratchblack 4 days ago +4

    And it’s not even using the internet yet!!!! Wow

    • @RustBeltPleb
      @RustBeltPleb 4 days ago +6

      AI Haters: It is just regurgitating data, it will never be able to create or discover something unique.
      Meanwhile 95% of humans: Just doing what they are supposed to do at work and using knowledge of past experiences to navigate problems.

  • @branthebrave
    @branthebrave 4 days ago +3

    Instead of saying over and over that the answer isn't how you wanted it, just ask it to do it again in a simpler way, or something like "couldn't you have skipped this step?"

  • @Krmpfpks
    @Krmpfpks 4 days ago +6

    It often follows paths to dead ends and then corrects itself. The way it is built, it has to write out all this stuff even if it turns out to be wrong; the o1 model just hides that from you and iterates over its own answer.
    So expect wrong stuff if you expand the thinking process.
    If you then ask it to write out a concise proof, you usually get an even better answer.
    It is better at maths, but it still hallucinates. It is an incremental step and not a revolution, as far as I have tried it.

  • @poisonza
    @poisonza 2 days ago +7

    Hmm... I would've opened a new chat thread every time I asked a new question; the previous question kind of gets prepended to the problem.

  • @gohkairen2980
    @gohkairen2980 3 days ago +9

    wow, I'm a uni fresh grad and I think I'm cooked

  • @Yewbzee
    @Yewbzee 4 days ago +5

    Do you think Tony Stark was scared when he created and started using Jarvis? We shouldn’t be scared. Stand on the shoulders of this giant and start creating benefits for the human race and the planet.

    • @rexmanigsaca398
      @rexmanigsaca398 4 days ago +1

      How about Ultron? Way smarter than Jarvis.

    • @imperson7005
      @imperson7005 4 days ago +1

      @@rexmanigsaca398 Vision is Jarvis in the MCU. Also, don't compare real life to fiction, especially when those creating said fiction control your society.

    • @Yewbzee
      @Yewbzee 4 days ago

      @@imperson7005 lighten up bro ffs.

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 days ago

      Business mindset: this tech is terrific. How can I make tons of money with it? I need to find a way.
      Gen Z mindset: this tech is terrific. I am scared. I watched Terminator my whole childhood, and that is exactly what will happen.

  • @Kannatron
    @Kannatron 4 days ago +3

    By the way, it will think less and almost refuse to "do it for you" when asked a question that hints it was for school. That's why, on your first question, it was deliberating about whether it should do it or not.

  • @MarkoTManninen
    @MarkoTManninen 4 days ago +6

    Yes please, try some research with o1.

  • @lio1234234
    @lio1234234 2 days ago +7

    It definitely doesn't help when keeping all of that context history; it's best to start a new chat session for each problem, as it's "trying to remember" all of the questions, solutions, and thought processes that came before the currently submitted question.

  • @mickelodiansurname9578
    @mickelodiansurname9578 4 days ago +7

    On the first question and the AI's unconventional approach... well, here's the thing: when we ask models questions like this, it's the OUTPUT we are after. For the test you ran, explaining the work and methodology is crucial, but in a real-world scenario where this is needed for, say, an engineering project or something... well, I hate to be so crass, but so long as the result is accurate, do we care? Yes, I know I'm spouting the 'shut up and calculate' mantra...

    • @FenrirRobu
      @FenrirRobu 4 days ago +1

      But how do you confirm the validity of the output?

    • @tzardelasuerte
      @tzardelasuerte 4 days ago +1

      This is exactly what happened with AlphaGo. The experts were surprised and confused about why it was making those moves, but once it won the game they would understand why it made each move, and they saw it as a genius, novel way of playing.
      We are repeating history, only this time it will happen in every single domain.

  • @ArthurWolf
    @ArthurWolf 4 days ago +17

    « I'm not taking a deep dive into this » ... that's what the video is supposed to be about ... Your job is to check if it's correct or not ... We want to know if the robot is saying nonsense or not !

    • @lolilollolilol7773
      @lolilollolilol7773 4 days ago +3

      It's most likely NOT nonsense; otherwise it's very unlikely it would have come to the right answer, especially after how the other problems were solved. It's just that sometimes it goes through more general, or more convoluted, solutions. But I agree he should have gone through the solution, although it was fun to see him discover the result in real time.

  • @MrNomanTV
    @MrNomanTV 4 days ago +6

    Insane, looking forward to the next o1 audit!

  • @BigJthumpalump
    @BigJthumpalump 1 day ago +3

    I'm wondering... with the first problem being "convoluted", is it possible that it's taking more things into consideration than the narrow parameters found within a physics course?

  • @parthasarathyvenkatadri
    @parthasarathyvenkatadri 4 days ago +5

    The only logical next step is asking it some problems that scientists are struggling with right now, and then finding out whether the answers match when we eventually get to the solutions... more like past predictions...

  • @neuroticalien7383
    @neuroticalien7383 4 days ago +4

    Try asking it to use a simpler approach to derive the same solution; not sure if it'd work, but worth a shot.

  • @mdkk
    @mdkk 4 days ago +2

    this is a pretty cool channel, enjoying these videos

  • @ryzikx
    @ryzikx 4 days ago +3

    Nice couple of videos dude keep it up

    • @KMKPhysics3
      @KMKPhysics3  4 days ago +1

      Thanks so much! Will be making similar content moving forward!

    • @ryzikx
      @ryzikx 4 days ago +1

      @@KMKPhysics3 Yeah, I mean, graduate-level physics is beyond me, so this was definitely some good entertainment.

  • @TheAntColony
    @TheAntColony 3 days ago +6

    Thanks for the follow up. I'm still pretty certain that the problems you gave it are standard textbook problems that it will have been trained on. Usually homework assignment problems are adapted or outright copied from textbooks by professors, or are so standard that they might as well be. Since I'm not a physicist I'm not in a position to judge these particular ones.
    Copy-pasting the entire question into Google won't work since Google is unable to find matches for things that are rephrased. You should show your professor these videos and ask them how novel and difficult they think these problems are. Or use ChatGPT itself to try to locate a source with answers to similar problems.

    • @LiveType
      @LiveType 3 days ago +3

      Correct. Meta showed that once you give these LLMs a problem they've truly never seen before, it requires thousands upon thousands of examples before they start to get the answers correct. OpenAI hinted at this issue in their white paper. It has become overwhelmingly clear that these models are not capable of "dynamic learning". Solving that would, from what I understand, be another step forward.
      The demo shown here was an almost perfectly ideal scenario, lining up with how it was trained according to what OpenAI engineers have stated.
      I gave o1-preview a shot when it released, on a dynamic multi-stream algorithm I had been using Sonnet 3.5 for, and it got it. Not optimized, but it got it. I was blown away, just like the first time I used GPT-4. Unfortunately that was a fluke, just like the GPT-4 experience. I went back the next day and it failed 4 times in a row using the same prompt. The fifth time it got it again, and it was the best solution. Further prompting asking for refinement got pretty close to the solution I arrived at. So yes, o1 is highly capable and matches what other people have experienced. The initial 4 attempts were close, but so was Sonnet 3.5. Conclusion: not super impressed with its code-implementation ability; not a big step forward in that domain.
      However, its ability to think through problems and provide high-level guidance in a generic way (hierarchical reasoning) shows a clear advancement over anything previously available. Still not as good as a competent human, but a clear step up from what came before. I would love to see this "baked into" the model, as these outputs are no joke 100x more expensive to run. Baby steps, though.
      AGI is progressing exactly in line with predictions of available compute capacity. Assuming no freak disasters, the job landscape/world is going to look very different in a decade.
      General conclusion on o1: this is largely what I had envisioned as an approximate final destination when GPT-3.5 was released. Well done, OpenAI. Now if only you can solve the context-window issues like Google seems to have done. Gemini is still hot garbage.

    • @MrBillythefisherman
      @MrBillythefisherman 3 days ago

      @@LiveType We don't know for sure, but by all accounts this is basically just GPT-4 - the same model released 18 months ago with no real compute increase. I.e., this is like ChatGPT relative to GPT-3: a layer on top that extracts the information contained within more effectively.

    • @MrBillythefisherman
      @MrBillythefisherman 3 days ago +1

      At some point you have to admit that every method of finding a solution is on the internet in some form. If it can use one of those methods and be general, then you've probably got AGI. I believe most of our intelligence is taught to us and we're largely pattern-matching methods. See the (admittedly horrific) example of children who have been locked away and don't develop speech.

    • @TheAntColony
      @TheAntColony 3 days ago +1

      @@LiveType Progress will not be proportional to compute unless major innovations happen. It might end up requiring something entirely unlike an LLM. It's very hard to predict how long this will take: it could be a few years, but likely much longer. LLMs are still extremely slow learners, requiring many orders of magnitude more data than people do to learn things, and they generalize much worse to unseen examples.

  • @domenicperito4635
    @domenicperito4635 1 day ago

    I feel like these models are going to play out the way it played out in Go. The models will start to think in ways more and more alien to us.

  • @programmingpillars6805
    @programmingpillars6805 3 days ago +6

    and this is just o1... what will o4 be capable of?

    • @anav587
      @anav587 3 days ago +6

      this is o1 preview, not even o1

  • @andreaskrbyravn855
    @andreaskrbyravn855 1 day ago +3

    Why criticize it for doing more and getting the correct answer?

    • @pzda81311
      @pzda81311 1 day ago +2

      Generally, a longer and more complicated answer is considered inferior to a simpler proof, because fewer assumptions mean fewer points of failure in your proposed solution. Elegance is preferred over complexity. Einstein's equations of motion are far more accurate than Newton's; however, for 99% of applications we still use and teach Newton's laws, as they are the more elegant solution that also gets to the right answer in far fewer steps. They are so much more elegant that they are the only laws of motion taught until you get to university/college, and even then only physics and related fields go beyond them in the later years.
      Ironically, this reply wasn't short and elegant 🤷🏽‍♂️
      TLDR: you can turn left 3 times to look to your right, but it's just easier to turn to your right once; alternatively, you can just turn your head and not your whole body.
      Therefore: more complexity ≠ more better

  • @dankodnevic3222
    @dankodnevic3222 4 days ago +2

    If you are looking for a research-grade, heavy math problem, try to get an analytical solution for the characteristic impedance ALONG a vertical wire (z axis) at height h over an infinite horizontal (xy plane) PEC ground. We are looking for the characteristic impedance at an infinitesimally small point, derived from capacitance and inductance (not current and voltage), not the input impedance of the vertical stub. There is no analytical solution published, or on the internet.

  • @grxoxl
    @grxoxl 4 days ago +7

    It is mindblowing...

  • @iBaudan
    @iBaudan 1 day ago +2

    Looks like companies that need brain workers are just going to subscribe to ChatGPT and leave real people unemployed.

    • @dpactootle2522
      @dpactootle2522 1 day ago

      At first, yes, but as the economy starts to grow faster, those companies will need those human brains to direct and supervise AI to do 100x the things, 100x faster. That will be needed for the infinite work necessary to conquer Earth, human biology, and finally space, the final frontier.

    • @danielbrown001
      @danielbrown001 5 hours ago

      ⁠@@dpactootle2522Once AI is beating humans at all benchmarks, we’ll only be slowing them down. AI will be directing other AI. Yes, progress will increase exponentially and extremely rapidly. But humans won’t be in the driver’s seat. We’ll be lucky if AI has some sense of sentimentality and keeps us around like we do zoo animals. That’s best-case scenario: fully automated luxury space communism.
      Another possibility is they see no use for us and just wipe us out, but that’s actually not worst case. Worst-case scenario is they see our brains as potential compute farms and we get a “The Matrix” scenario where humans are kept alive wired up to machines utilizing our brains as slave computers.

  • @parthasarathyvenkatadri
    @parthasarathyvenkatadri 4 days ago +3

    I already imagine future PHD exams with AI powered calculators .... All they need to do is verify if the AI is correct ...😂

    • @macmos1
      @macmos1 4 days ago +1

      we're cooked

  • @starzilla2975
    @starzilla2975 1 day ago +1

    Even if there are similar problems on the internet, it has to do a lot of complex things; there is enough complexity in these that it has to be able to reason its way through them, which should mean something.

  • @tonykaze
    @tonykaze 4 дні тому +6

    As someone who does this for a living full-time, I am extremely pained by temperature. If even one random deviation from the top-scoring output happens 1% of the way into this solution, the model has committed itself to a bad path (albeit trying to save itself) for the remaining 99%. This is why every professional uses the API and forces temperature to 0, and even then it is subject to latency/MoE randomization.
    We also have to consider the massive crippling of the system by RLHF, censoring, and fine-tuning.
    For the layman: temperature = "be wrong on purpose for variety", and the chat tool (ChatGPT) uses an extremely high value by default.
    For me, nothing in the ChatGPT client can ever be even close to a good measurement of a model's true capability.
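For readers unfamiliar with the "temperature" knob this comment refers to: it is the divisor applied to the model's raw next-token scores (logits) before sampling, so a lower value concentrates probability on the top token. A minimal sketch (the logit values here are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into token probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for t in (1.0, 0.7, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

As temperature approaches 0 the distribution collapses onto the highest-scoring token (greedy decoding); higher values spread probability onto lower-ranked tokens, which is the "variety" the comment describes.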

    • @mirek190
      @mirek190 3 days ago +2

      Do you think humans are perfect??
      Humans make a lot more errors during calculations than o1 does.
      If you ask o1 that question 10 times and get the same answer via different paths... how big is the error then? 0.00001%?

    • @eyoo369
      @eyoo369 3 days ago

      @@mirek190 We measure machine intelligence by higher standards than human intelligence. If a human fails at a critical task, whether that's driving a car, flying a plane, or building something highly explosion-sensitive in a lab, we expect accountability. With machines, we need their accuracy rate and output to be much higher before we are satisfied. So yes, o1 is a step in the right direction, but we need to see perfect scores and fine-tune every possible knob to get there. The ChatGPT client, which sets the temperature to around 0.7, provides too much creativity for a PhD-level task like this.

    • @mirek190
      @mirek190 3 days ago

      @@eyoo369 Perfect scores... that's not AGI, that's ASI...

    • @eyoo369
      @eyoo369 3 days ago

      @@mirek190 No it's not. AGI, as DeepMind/Google coined it a decade ago, is a "virtuoso AI" that can handle any intellectual task: a master across all domains. ASI is somewhat more like a deity or god that we have no perception or clue of. Just as God is a vague and abstract concept to us all, ASI is reserved for that maximally advanced being within our material space. AGI is just a human that has mastered all domains, which is a notion we can conceive of. Don't fall into the trap of calling the 50th percentile of human performance AGI, which is what OpenAI and all the investor-driven AI labs want you to believe in order to rally up this hype with more investor money. Saying AGI is coming this year or next is obviously sexier than saying an AGI that has mastered all crafts is still a decade away. We'll get to AGI eventually, but there's no need to claim goal posts too early and brute-force our way toward them.

  • @h-e-acc
    @h-e-acc 4 days ago +6

    o1 is AGI. No two ways about this. People can deny it or dismiss it. That won't change the fact that this is AGI.

    • @TheJolgo
      @TheJolgo 4 days ago

      No, you just have no idea what you are talking about. AGI has a specific definition. Changing paradigms is certainly a better approach than feeding an LLM data indefinitely, but still, it's not there yet.

    • @GenAIWithNandakishor
      @GenAIWithNandakishor 4 days ago +1

      Try getting more than 85% on ARC-AGI; then we would agree.

    • @IoT_
      @IoT_ 4 days ago

      It can't solve this problem properly:
      Which one is bigger, 50^50 or 49^51 (without any calculator or approximations)?
      That's a Calc 1-level student problem.
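As an aside, the comparison in that problem is easy to settle exactly in a language with arbitrary-precision integers, and the pencil-and-paper argument fits in a comment. A quick sketch in Python (for the record, 49^51 is the larger one):

```python
# Python ints are arbitrary precision, so this comparison is exact, no approximation.
assert 49**51 > 50**50

# The no-calculator argument: 49**51 = 49 * 49**50, while
# 50**50 = (50/49)**50 * 49**50, and (1 + 1/49)**50 is roughly e (certainly < 3),
# so 49**51 wins by a factor of roughly 49/e.
ratio = 49**51 // 50**50
print(ratio)
```

The printed ratio lands in the high teens, matching the rough estimate 49/e ≈ 18.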

  • @josephhampton8966
    @josephhampton8966 15 hours ago +1

    All the PhDs in the comments, salty and in denial that the model is actually good, are killing me 🤣

  • @rolodexter
    @rolodexter 4 days ago +3

    Wow, a bunch of Overleaf submissions incoming 😂😂😂😂 14:57