OpenAI o1: Testing on Complex Business Problems and Logic

  • Published 8 Nov 2024

COMMENTS • 29

  • @andromeda3542 1 month ago +4

    Thank you very much for your insightful contribution. I can see you are quite well-versed in this field.
    However, as you know, this is merely the first stage, o1, correct? This means, on the one hand, we can expect improvements. These improvements will likely occur in cycles, approximately every six to twelve months. On the other hand, we still haven't reached AGI, artificial general intelligence. What we have with this O-model, if I may put it that way, is certainly a step in the right direction.
    The current goal is to elevate it to a human level through scaling and various techniques. But we are not yet at a fully human level. Once we reach that, we would have AGI, true general artificial intelligence. Within a year, however, we will likely see above-average answers to such questions from the model.
    What you are expecting, and indeed what most people are anticipating, is a kind of superintelligence-level response. But the system is not there yet. Naturally, it would require access to far more data, particularly in this area. Eventually, though, and likely in the very near future, it will deliver astonishing results, much like AlphaGo did with move 37.
    Move 37 in AlphaGo’s historic match against Lee Sedol represents a profound moment not just in the world of artificial intelligence, but for humanity's understanding of intelligence itself.
    In this game, AlphaGo, a machine, played a move that was so unconventional, so unexpected, that even the grandmaster Lee Sedol, one of the world's greatest Go players, was taken aback. Experts initially thought the move was a mistake, as it defied traditional human logic and strategy. But as the game progressed, it became clear that this move was not only valid but brilliant, leading to AlphaGo's victory.
    For humanity, Move 37 symbolizes the potential of AI to transcend human intuition and knowledge. It demonstrated that machines could think in ways fundamentally different from us, exploring possibilities that the human mind might overlook or deem improbable.

    • @GiveMeTheMic22 1 month ago

      Now I feel I know little to nothing
      Thanks for an amazing educational comment!

  • @hypersonicmonkeybrains3418 1 month ago +1

    I have no idea how it works, but from a layman's perspective, from what I've heard, I think it is using prompts behind the scenes that are designed specifically to tackle complex problems, like 'chain of thought', 'agents', and several others. It's like it's working through a flow chart, trying these techniques and keeping track of the results to see if they conform to the answer demanded by the prompt. Finally, after all these behind-the-scenes prompts, it outputs its final answer.

    • @GiveMeTheMic22 1 month ago +1

      That makes sense and you put it in a great way!

  • @MyWatermelonz 1 month ago +6

    So, the issue is that this is still an LLM. So if you throw in a prompt that is just a question out of a business textbook with all the business jargon/terminology, it's gonna still throw out the typical MBA answer.
    o1 isn't perfect, as you've seen, but as with other models, prompting matters just as much. I find better results with a shorter, more concise prompt with bullet points on what it needs to do or figure out, and then telling it to think like a genius CEO. This helps the CoT use higher-level words in its thinking process and keeps it on track with the concise bullet points, without all the jargon.

    • @adolphgracius9996 1 month ago +4

      Keep in mind that this LLM does not have real information about your business. Imagine you go to a friend and ask him how you should write your wedding vows, but without telling him anything real about the bride.

    • @byrnemeister2008 1 month ago +1

      @adolphgracius9996 Good point, you're absolutely correct. But I have seen marketing pitches that do the same 😊

  • @10NightLord 1 month ago

    OpenAI's latest model, known as o1, introduces a feature called Chain of Thought (CoT), which allows the AI to think through problems more effectively. It's likely built on something like GPT-4.5 (raw), and they've trained it to reflect on its own reasoning at test time instead of just relying on pre-training and fine-tuning. This means the model can adapt and improve its thinking on the fly, using patterns learned from users. While we don't know the full details of their "Strawberry" architecture, it's said that they use reinforcement learning to reward the best reasoning processes. With enough computing power, this model can do some pretty impressive things, and according to Terence Tao, its ability to handle complex math has really improved. But it's not AGI, and it lacks a lot for some use cases...

    • @GiveMeTheMic22 1 month ago

      This is the type of thing I've read about it, but it doesn't really say how it works as a neural network, especially if this is not a weights-based model built on fine-tuning.
      They need to be careful: open source models are good and getting better, and when those advance, they do so fast… I get why they don't say exactly how it works, but if users don't really understand it, they remain skeptical.

  • @amdenis 1 month ago

    Great vid! I wouldn't say out of nowhere, as we have been talking about it for months now as an incremental release before October's next-gen full release of Strawberry.

    • @GiveMeTheMic22 1 month ago

      Agreed, but that was through marketing plays to push the hype.
      But I am not sure this is working for them! Also, if you go on X, the hype they pushed would have made you think the model is something more.

  • @humphuk 1 month ago +1

    I guess I am like many: it's impressive that it knows who killed Aunt Agatha, but does it bring value in the business world over the current models? I guess I may be looking in the wrong places, as I find little testing in this space apart from Samer and the likes of Ethan Mollick. I guess I just need to try. I have seen much on the need to prompt these models differently from the GPT series, so more learning to do. Thanks, Samer, for putting this out there.

    • @GiveMeTheMic22 1 month ago +1

      Thank you for the comment.
      I want to give it another try with one problem, with much more context, intensive prompting, and follow-up, to see if I judged too fast.

  • @ToolmakerOneNewsletter 1 month ago +3

    You might want to do some research before you make your next "test" video. It might help you align your purpose with your tests as they relate to the stated strengths of the model. Your comments about SORA and voice also show you have not been keeping up with news on these topics (studio contract talks, red team testing, etc.).

    • @GiveMeTheMic22 1 month ago +2

      Thank you for watching through and for the feedback.
      Again, here I tested the model on a non-typical case, which is always interesting to try. Through this kind of testing, and a more detailed test I published with more detailed prompting, I came to realize the capacity of such a model and how it works differently.
      For SORA and the voice model, I keep track when I can, but regardless, OpenAI has overpromised and either failed or underdelivered so far. That is obvious to the market, and as they go after another round of funding soon, we will all see what goes on.
      Apple will surely back them in order to prop up a mediocre Apple iPhone and make it look ahead of the competition; others might play along.
      But my comment isn't unique; it's definitely one many others are making.

  • @Giovanni2862 1 month ago

    Test on LaTeX-TikZ site: negative!

  • @the_proffesional1713 1 month ago +1

    No way, are you Misha Charoudin's twin?

    • @GiveMeTheMic22 1 month ago

      You think? I checked him out and I didn't think so 😂

  • @sunofson0 1 month ago

    don't know, i was expecting something impressive

    • @mAny_oThERSs 1 month ago +3

      Give the model access to information about your product and company, give it internet access, and it will deliver something impressive. Unless you are specific, it won't be able to magically create outlandishly good solutions that solve all of your problems. Some random guy testing the model won't find the same value in it as a company would, and the model also has the word "preview" in its name for a reason...

    • @sunofson0 1 month ago

      @mAny_oThERSs The hype behind o1 and the time it takes to "think" before responding raised the bar of expectation, even without feeding it a company's data. Millions of people are on edge waiting for an AI that resolves their problems without them handing over confidential data. This o1 is not impressive.

    • @cepisyies6667 1 month ago

      You should ask it to choose the 3 immediate actions it thinks are most important.

    • @mAny_oThERSs 1 month ago

      @sunofson0 Did you even read anything I said? Even just o1 preview is massively impressive from a company standpoint, not for the consumer. Right now o1 preview has huge capability, but no flexibility. Once it gains flexibility close to 4o, like internet access and file uploads, it will also be groundbreaking for the average Joe. You are just acting massively impatient and childish if you don't understand that fact and start crying whenever you even just see (not even use) something that doesn't have outlandishly good performance in all aspects. All companies are currently extremely thirsty for o1 while you say it's not a big deal... And again, it's a preview for a reason. Have you ever googled what the word preview means? It shows you what a model CAN do, not what it does. It shows you what it will do once it's not a preview anymore, and that's also the point in time where it's meant to be used by you and where it will be massively better than everything else.

    • @GiveMeTheMic22 1 month ago

      Will try to test further based on what others suggested. Again, I think the overhype marketing game is a dangerous one, and they play it badly.