OpenAI's o1: Has It Surpassed Claude 3.5 Sonnet? Testing with Cursor

  • Published 7 Nov 2024

COMMENTS • 53

  • @sjkba · 1 month ago · +32

    Interesting. o1 is probably not the model for small fixes. Imagine o1 doing the planning and telling Claude which individual code snippets to produce. This is so wild!

  • @gabrielsandstedt · 1 month ago · +2

    Claude 3.5 Sonnet is a very well-trained model.

  • @Björn-w7v · 1 month ago · +9

    At 3:32 you wondered why the inline edit was a bit slower. There's a reason for that: you didn't run the "Please Fix" prompt with o1; you were using Claude 3.5 at that moment.

  • @PatrickSteil · 1 month ago · +2

    Thanks for the video! Going to be testing it today as well!

  • @sib431 · 1 month ago · +3

    Thank you for sharing

  • @57aistreet · 1 month ago · +5

    🎯 Key points for quick navigation:
    00:00:00 *🖥️ Overview of OpenAI's o1 Model Integration with Cursor*
    - Overview of integrating OpenAI's o1 model with Cursor.
    - Explanation of how to add models and API keys in Cursor settings.
    00:01:23 *🛠️ Using the o1 Preview Model for Web Development*
    - Comparison of the o1 Preview and o1 Mini models for coding tasks.
    - Limitations of the o1 Preview model regarding streaming responses.
    - Demonstration of generating web page content using Cursor's composer.
    00:02:59 *🧩 Customizing UI Elements and Performance Considerations*
    - Customization of generated web pages and UI components using Cursor.
    - Discussion of the speed and responsiveness of Cursor's integration with OpenAI models.
    - Considerations on cost and performance when using the o1 models for development tasks.
    Made with HARPA AI

  • @crispinrovere · 1 month ago · +1

    Really interesting to see all this leapfrogging.

  • @Sergio-Sanchez-com · 1 month ago · +3

    I wonder how it performs on more complex tasks. The strong suit should be logic.
    Thanks for the video

  • @GrahamAnderson-z7x · 1 month ago · +5

    There are probably over 5 billion website boilerplate examples for the model to learn from. I don’t know. When you actually have to build non-boilerplate, it gets complicated, fast. I do use Cursor on a daily basis.

  • @cbgaming08 · 1 month ago · +2

    Does it surpass Claude 3.5 Sonnet in meeting your specific use cases?

    • @reboundmultimedia · 1 month ago · +3

      Sonnet is an all-around better model for web development. The problem is that o1-mini still struggles with syntax errors, Next.js errors, etc. It's a very small model. If you give it lots of solid, concise instructions, though, it has some impressive problem-solving capabilities. I think o1-mini will serve well in agentic coders as a “planner”, and then LLMs like Sonnet can do the coding work.

    • @cbgaming08 · 1 month ago · +2

      @reboundmultimedia
      Right on the mark! LiveBench's benchmarks seem to confirm your conclusions. It's fascinating to see an LLM demonstrate this kind of reasoning. I'm excited for the full release of o1 (the non-preview version) and to see how far this new family of OpenAI models can go. I'm also looking forward to seeing how Anthropic responds with their upcoming Claude 3.5 Opus. It's an exciting time in AI development!

    • @DevelopersDigest · 1 month ago · +1

      I agree with this! I have been testing a bit further over the weekend, and o1-mini does excel at planning. Examples I posted on Twitter here:
      x.com/dev__digest/status/1835116196347146445?s=46&t=6e0Os0xZqOpcsvlzGBwpow
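The planner/coder split suggested in this thread can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Cursor or any other tool works: `plan_then_code`, `stub_planner`, and `stub_coder` are hypothetical names, and the stubs stand in for real API calls to a reasoning model (the planner) and a faster coding model (the coder).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interfaces. In practice each would wrap an API call:
# the planner to a reasoning model, the coder to a coding model.
Planner = Callable[[str], List[str]]
Coder = Callable[[str, str], str]


@dataclass
class StepResult:
    step: str
    code: str


def plan_then_code(task: str, planner: Planner, coder: Coder) -> List[StepResult]:
    """Ask the planner once for concrete steps, then ask the coder for one
    snippet per step, passing the overall task along as context."""
    steps = planner(task)
    return [StepResult(step, coder(task, step)) for step in steps]


def stub_planner(task: str) -> List[str]:
    # Hypothetical stand-in for a reasoning model's plan.
    return ["create the HTML skeleton", "add a hero section", "style the hero with CSS"]


def stub_coder(task: str, step: str) -> str:
    # Hypothetical stand-in for a coding model; a real one would return generated code.
    return f"<!-- TODO: {step} -->"


if __name__ == "__main__":
    for result in plan_then_code("build a landing page", stub_planner, stub_coder):
        print(f"{result.step}: {result.code}")
```

The point of the split is that the expensive planning call happens once per task, while the cheaper coding model handles each individual snippet.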

  • @digidope · 1 month ago · +1

    It was in my Cursor without any setup yesterday.

    • @DevelopersDigest · 1 month ago · +1

      I saw they launched official initial support shortly after I published this - was glad to see that 🙂

  • @damien2198 · 1 month ago · +6

    For me Sonnet is still miles ahead

    • @DevelopersDigest · 1 month ago · +1

      I think for a lot of, if not most, app and web development tasks it should do. I still want to test some more involved instructions and see how each of them performs.

  • @w.2550 · 1 month ago · +2

    Can I buy $1000 of credits to get access to tier 5 or do I have to spend $1000?

    • @DevelopersDigest · 1 month ago

      Great question; I am not sure. I believe Cursor is now giving Pro subscribers 10 o1-mini credits per day.

  • @lev1ato · 1 month ago · +3

    The problem with Next.js syntax was always interesting to me: you can train the model on new data, but what if it still has old data/docs inside it? In that case, does the model know the timeline of the information, so that it is aware which data is newer? If that makes sense.

    • @DevelopersDigest · 1 month ago · +3

      Yes, absolutely, it makes sense. This can happen with frameworks, libraries, API documentation, etc. The model might know about the App Router, for example, but be biased toward the Pages Router. Being more explicit will help, but passing in context for new docs and APIs is something we will have to keep doing.

  • @nastastic · 1 month ago · +2

    Is Cursor free?

  • @juhu3709 · 1 month ago · +1

    What does "stream back" even mean?

    • @DevelopersDigest · 1 month ago

      Neither o1 nor o1-mini supports streaming responses (yet).

    • @justtiredthings · 1 month ago · +1

      The response is delivered all at once, in one big chunk.
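A minimal sketch of the difference described in these replies, using a hypothetical hard-coded list of tokens in place of a real model API (the names `TOKENS`, `stream_response`, and `non_streaming_response` are illustrative only): with streaming, the client can display each chunk as soon as it is generated; without it, the client receives the whole answer in one piece once generation has finished.

```python
from typing import Iterator, List

# Hypothetical model output. A real API would deliver these over the network.
TOKENS: List[str] = ["Here ", "is ", "your ", "fixed ", "code."]


def stream_response(tokens: List[str]) -> Iterator[str]:
    """Streaming: hand each chunk to the caller as soon as it is produced."""
    for token in tokens:
        yield token


def non_streaming_response(tokens: List[str]) -> str:
    """Non-streaming: wait for the full answer, then return it as one chunk."""
    return "".join(tokens)


if __name__ == "__main__":
    # Streaming lets a UI render text incrementally.
    for chunk in stream_response(TOKENS):
        print(chunk, end="", flush=True)
    print()
    # Without streaming, nothing is shown until the whole answer is ready.
    print(non_streaming_response(TOKENS))
```

The final text is identical either way; streaming only changes when the user starts seeing it, which is why a non-streaming model feels slower in an editor even at the same total latency.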

  • @rickandelon9374 · 1 month ago · +6

    We have figured out how to build AGI, but we are still limited by computation. Once we painstakingly get to AGI, the AGI will work to make computing better, and the two will feed each other. 2030 is the best guess for when we will get AGI. I think the o-series is a complete architecture for AGI, unlike the GPT series. The way OpenAI has made tokens work as reasoning seems interesting. It acts like the internal monologue of a human brain. o1 continuously doubts itself when it "thinks" inside its chain of thought. It proposes alternatives, explores the "buts", and also sticks to a solution if it sounds sufficiently promising. Overcoming context limits and making this inference-time compute indefinite will get us to AGI. Maybe o5 will be AGI.

  • @VaibhavShewale · 1 month ago · +2

    Still waiting for my humanoid bot.

  • @ToolmakerOneNewsletter · 1 month ago · +3

    Or you could just do the logical thing and give it an example of the type of web site you want it to emulate, but we both know there are more dedicated web site building AI platforms better suited to this purpose.

    • @DevelopersDigest · 1 month ago · +1

      Fair! If anyone has ideas for more involved examples they'd like to see, just lmk and I'll try to put one together.

  • @samuelmarndi · 1 month ago · +3

    Every time I try other gen AI tools and give them a chance, I am always disappointed. That's why I keep coming back to ChatGPT. Of course they can answer basic to moderately complex things, but based on my usage they fail terribly. I tried Gemini too, but was always disappointed. Claude is better than Gemini IMO, but it costs money, and I find it frustrating that it doesn't give enough tokens, RPM, etc. The free tier is too limited to use.

  • @vivekbansal3903 · 1 month ago · +2

    Bro, can you try one thing? A snake game in Python with a genetic algorithm. It's a good task for evaluation.

    • @DevelopersDigest · 1 month ago · +1

      I need some really hard ideas to test… 💡 😬

    • @vivekbansal3903 · 1 month ago · +2

      @DevelopersDigest OK, what about snake game + computer vision + deep learning?
      Snake game just because OpenAI posted a snake game video for o1-preview.

  • @Jh3a7 · 1 month ago · +2

    Meh, didn't see anything special. Sonnet 3.5 already does all this, if not better.

  • @avi7278 · 1 month ago · +4

    honestly o1 is garbage for what it's supposed to be. I haven't found a single use case where it's worth the time or extra cost to use it... With some extra effort in prompting, Claude performs just as well or better. I really don't get what everyone is so excited about.

    • @DevelopersDigest · 1 month ago · +2

      I have been testing it more on harder coding problems and have found that, with very specific direction, it can solve quite challenging problems. I have been testing with about 1,000 lines of pretty involved code, and o1 was the first LLM to solve some of the issues I had in it. That said, it took a number of shots, and I had to give increasingly specific direction with each turn. I think Sonnet 3.5 will likely still be preferred for most day-to-day coding tasks. I am still testing, though!

    • @avi7278 · 1 month ago

      @DevelopersDigest Especially if your codebase is even somewhat clean, with established patterns and conventions, you don't need an advanced reasoning model for 99 percent of requests. Claude is more than sufficient. Plus, I find that for most requests the advanced reasoning doesn't produce significantly better answers. The increased token output is better, but that's also a solved problem with Claude: Aider generates unlimited output with Claude by intelligently stitching together multiple outputs, so the end result is the same.

    • @DevelopersDigest · 1 month ago

      @avi7278 Thanks! I haven't had a chance to try Aider. How do you like Aider compared to Cursor and the alternatives?

    • @reboundmultimedia · 1 month ago

      @DevelopersDigest I think this is due to the size of the model and thus sparsity in the training data, plus the knowledge cutoff. I think a more up-to-date cutoff and a larger model would crush Sonnet.
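The output-stitching idea mentioned in this thread (continuing a truncated answer across several calls and concatenating the pieces) can be sketched as follows. This is an illustration under loud assumptions, not Aider's actual implementation: `generate_with_continuation` and `make_fake_model` are hypothetical, the fake model treats one character as one token, and its `(chunk, hit_limit)` return value stands in for the stop or finish reason a real API would report.

```python
from typing import Callable, Tuple

# Hypothetical model interface: given the full context and a token limit, return
# (chunk, hit_limit), where hit_limit is True if output was cut off by the limit.
Generate = Callable[[str, int], Tuple[str, bool]]


def generate_with_continuation(generate: Generate, prompt: str,
                               max_tokens: int, max_rounds: int = 10) -> str:
    """Call the model repeatedly, feeding back the partial answer, until it
    finishes on its own; return the stitched-together output."""
    output = ""
    for _ in range(max_rounds):
        # Append the partial answer so the model continues where it stopped.
        chunk, hit_limit = generate(prompt + output, max_tokens)
        output += chunk
        if not hit_limit:
            return output
    raise RuntimeError("output still truncated after max_rounds calls")


def make_fake_model(prompt: str, full_answer: str) -> Generate:
    """Toy stand-in for a model that always 'wants' to write full_answer but
    can emit at most max_tokens characters per call (1 char = 1 token here)."""
    def generate(context: str, max_tokens: int) -> Tuple[str, bool]:
        already_written = context[len(prompt):]
        remaining = full_answer[len(already_written):]
        return remaining[:max_tokens], len(remaining) > max_tokens
    return generate


if __name__ == "__main__":
    prompt = "Write the file:\n"
    answer = "line1\nline2\nline3\n"
    model = make_fake_model(prompt, answer)
    # With a 5-token limit the answer arrives in four pieces and is stitched back together.
    print(repr(generate_with_continuation(model, prompt, max_tokens=5)))
```

The `max_rounds` cap is a design choice to avoid looping forever if a model never signals that it has finished.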