Interesting. o1 is probably not the model for small fixes. Imagine o1 doing the planning and telling Claude which individual code snippets to produce. This is so wild!
This is the way...
Love this 💡
I just did that. It works best this way.
o1 mini for coding, preview for structure
Claude 3.5 sonnet is a very well trained model
It is!
At 3:32 you wondered why the inline edit was a bit slower. That's because the "Please Fix" prompt didn't go to o1; you were using Claude 3.5 at that moment.
Good catch Ty!
Thanks for the video! Going to be testing it today as well!
ty - I am curious what you think!
Thank you for sharing
Thank you for watching
🎯 Key points for quick navigation:
00:00:00 *🖥️ Overview of OpenAI's o1 Model Integration with Cursor*
- Overview of integrating OpenAI's o1 model with Cursor.
- Explanation of how to add models and API keys in Cursor settings.
00:01:23 *🛠️ Using the o1 Preview Model for Web Development*
- Comparison of o1 Preview and o1 Mini models for coding tasks.
- Limitations of the o1 Preview model in streaming responses.
- Demonstration of generating web page content using Cursor's composer.
00:02:59 *🧩 Customizing UI Elements and Performance Considerations*
- Customization of generated web pages and UI components using Cursor.
- Discussion on the speed and responsiveness of Cursor's integration with OpenAI models.
- Considerations on cost and performance when using the o1 models for development tasks.
Thanks
Really interesting to see all this leapfrogging.
I wonder how it performs on more complex tasks. The strong suit should be logic.
Thanks for the video
There are probably over 5 billion website boilerplate examples for the model to learn from. I don’t know. When you actually have to build non-boilerplate, it gets complicated, fast. I do use Cursor on a daily basis.
Agree 💯
Does it surpass Claude 3.5 Sonnet in meeting your specific use cases?
Sonnet is all-around a better model for web development. The problem is o1-mini still struggles with syntax errors, Next.js errors, etc. It's a very small model. If you give it tons of solid, concise instructions, though, it has some impressive problem-solving capabilities. I think o1-mini will serve well in agentic coders as a "planner", with LLMs like Sonnet doing the actual coding work.
@@reboundmultimedia
Right on the mark! Livebench's benchmarks seem to confirm your conclusions. It's fascinating to see an LLM demonstrate this kind of reasoning. I'm excited for the full release of o1 (non-preview version) and to see how far this new family of OpenAI models can go. Also, I'm looking forward to seeing how Anthropic responds with their upcoming Claude 3.5 Opus. It's an exciting time in AI development!
I agree with this! I have been testing a bit further on the weekend and o1-mini does excel at planning. Examples I posted on Twitter here:
x.com/dev__digest/status/1835116196347146445?s=46&t=6e0Os0xZqOpcsvlzGBwpow
It was in my cursor without any setup yesterday.
I saw they launched official initial support shortly after I published this - was glad to see that 🙂
For me Sonnet is still miles ahead
I think for a lot, if not most, app and web development tasks it should do. I still want to test some more involved instructions and see how each performs.
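The planner/coder split discussed in this thread could be sketched as a simple loop: a reasoning model drafts the plan, and a coding model implements each step. This is a minimal sketch with the model calls stubbed out; `call_planner` and `call_coder` are hypothetical placeholders standing in for real API calls to models like o1-mini and Claude 3.5 Sonnet, not actual SDK functions.

```python
def call_planner(task: str) -> list[str]:
    # Hypothetical stub: in practice this would send `task` to a
    # reasoning model (e.g. o1-mini) and parse its numbered plan.
    return [f"1. Scaffold the project for: {task}",
            f"2. Implement the core logic for: {task}"]

def call_coder(step: str) -> str:
    # Hypothetical stub: in practice this would send `step` to a
    # coding model (e.g. Claude 3.5 Sonnet) and return a snippet.
    return f"# code implementing: {step}"

def build(task: str) -> str:
    # Planner drafts the steps; coder produces one snippet per step.
    plan = call_planner(task)
    snippets = [call_coder(step) for step in plan]
    return "\n".join(snippets)

result = build("todo app")
```

The point of the split is cost and fit: the expensive reasoning model runs once per task, while the cheaper coding model handles each snippet.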
Can I buy $1000 of credits to get access to tier 5 or do I have to spend $1000?
Great question, I'm not sure. I believe Cursor is now giving Pro subscribers around 10 o1-mini credits per day.
The problem with Next.js syntax was always interesting to me: you can train the model on new data, but what if it still has old data/docs inside the model? In that case, does the model know the timeline of the information, so it's aware which data is newer? If that makes sense.
Yes, absolutely it makes sense. This can happen for frameworks, libraries, API documentation, etc. It might have knowledge of, for example, the App Router but be biased toward the Pages Router. Being more explicit will help, but passing in context for new docs and APIs is something we will have to continue to do.
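One common way to work around a stale knowledge cutoff, as described above, is to paste the current docs directly into the prompt so they override the model's memorized (possibly older) version. A minimal sketch, where the prompt layout and doc snippet are purely illustrative, not a specific Cursor feature:

```python
def build_prompt(question: str, fresh_docs: str) -> str:
    # Prepend up-to-date documentation so the model favors it over
    # whatever older version it memorized during training.
    return (
        "Use ONLY the documentation below; it supersedes anything "
        "from your training data.\n\n"
        "--- DOCS ---\n" + fresh_docs + "\n--- END DOCS ---\n\n"
        "Question: " + question
    )

prompt = build_prompt(
    "How do I define a route?",
    "Next.js App Router: a route is a folder under app/ with a page.tsx.",
)
```

Tools like Cursor's docs indexing do essentially this automatically: retrieve the relevant fresh documentation and stuff it into the context ahead of your question.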
Is cursor free?
Only for free prompts
There is a free trial!
What does "stream back" even mean?
Neither o1 nor o1-mini supports streaming responses (yet).
The response is delivered all at once in one big chunk.
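The difference can be illustrated with plain Python generators; this is a toy simulation of the two behaviors, not the actual OpenAI SDK:

```python
def streamed_response(tokens):
    # Streaming: each chunk is yielded as soon as it is generated,
    # so a UI can render partial output immediately.
    for tok in tokens:
        yield tok

def non_streamed_response(tokens):
    # No streaming (current o1 behavior): the caller blocks until
    # generation finishes, then gets the whole response at once.
    return "".join(tokens)

tokens = ["Hello", ", ", "world", "!"]
first_chunk = next(streamed_response(tokens))  # available immediately
full = non_streamed_response(tokens)           # arrives as one block
```

That's why o1 in Cursor feels slower to start: nothing appears until the entire answer is finished.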
We have figured out how to build AGI, but we are still limited by computation. Once we painstakingly get to AGI, the AGI will work to make computing better, and those will feed each other. 2030 is the best guess for when we will get AGI. I think the O series is a complete architecture for AGI, unlike the GPT series. The way OpenAI has made tokens work as reasoning seems interesting; it acts like the internal monologue of a human brain. o1 continuously doubts itself when it "thinks" inside its chain of thought. It proposes alternatives, raises objections, and also sticks to a solution if it sounds sufficiently promising. Overcoming context limits and making this inference-time compute indefinite will get us to AGI. Maybe O5 will be AGI.
Thank you for the thoughtful comment ❤️
still waiting for my humanoidbot
me too..
Or you could just do the logical thing and give it an example of the type of web site you want it to emulate, but we both know there are more dedicated web site building AI platforms better suited to this purpose.
Fair! If anyone has ideas for more involved examples they would like to see, just let me know and I can try to make one.
Every time I try other gen AI and give them a chance, I am always disappointed. That's why I keep coming back to ChatGPT. Of course they can answer some basic to somewhat complex things, but based on my usage they fail terribly. I tried to use Gemini too, but I was always disappointed. Claude is better than Gemini IMO, but it costs money, and I feel frustrated that it doesn't give enough tokens or RPM etc. The free one is too limited to use.
Bro, can you try one thing? A snake game in Python with a genetic algorithm. It's a good task to evaluate.
I need some really hard ideas to test… 💡 😬
@@DevelopersDigest ok, what about a snake game + computer vision + deep learning?
Snake game, just because OpenAI posted a snake game video for o1-preview.
meh, didn't see anything special - sonnet 3.5 is already doing all this, if not even better.
honestly o1 is garbage for what it's supposed to be. I haven't found a single use case where it's worth the time or extra cost to use it... With some extra effort in prompting, Claude performs just as well or better. I really don't get what everyone is so excited about.
I have been testing it more on harder coding problems and have found that, with very specific direction, it can solve quite challenging problems. I have been testing with 1000 lines of fairly involved code, and o1 was the first LLM to solve some issues I had within it. With that said, it took a number of shots where I had to give increasingly specific direction with each turn. I think Sonnet 3.5 is still likely to be preferred for many, if not most, day-to-day coding tasks. I am still testing though!
@@DevelopersDigest esp if your codebase is even somewhat clean with established patterns and conventions, you don't need an advanced reasoning model for 99 percent of requests. Claude is more than sufficient. Plus I find that with most requests the advanced reasoning doesn't produce significantly better answers. The increased token output is better but that's also a solved problem with Claude. Aider generates unlimited output with Claude by intelligently stitching together multiple outputs so the end result is the same.
@@avi7278 ty - I haven't had a chance to try Aider. How do you like Aider compared to Cursor / alternatives?
@@DevelopersDigest I think this is due to the size of the model and thus sparsity in the training data, plus the knowledge cutoff. I think a more up-to-date cutoff and a larger model would crush Sonnet.