Using o1 for setting up a project and Claude for doing the small edits is the best of both worlds. I love how cleanly formatted o1's code is!
Indeed! The o1 model is definitely superior for coding; so far it has been flawless and has surpassed the preview version, as expected. The latency is a small price to pay. However, I am keen to see how Anthropic responds 🙂
@19LloydG In my experience, Claude is still better.
I still get better results with Claude. However, I agree on the clean formatting.
A really underrated practice is to generate examples with whatever model is best at that and feed them as context to your workhorse model.
Having a dataset of good examples works wonders for any LLM application.
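To make that tip concrete, here is a minimal sketch of the two-step workflow, assuming the OpenAI and Anthropic Python SDKs are installed with their API keys set; the model names, prompts, and the exact split of work between the two models are illustrative assumptions, not something from the video or the comment above.

```python
# Minimal sketch: generate reference examples with a "best" model, then feed them
# as context to a cheaper workhorse model. Model names and prompts are assumptions.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()      # assumes OPENAI_API_KEY is set
claude_client = Anthropic()   # assumes ANTHROPIC_API_KEY is set

# Step 1: ask the stronger model (here: o1) once for well-formatted reference examples.
examples = openai_client.chat.completions.create(
    model="o1",
    messages=[{
        "role": "user",
        "content": "Write three small, cleanly formatted TypeScript functions with doc "
                   "comments. They will be used as style examples for another model.",
    }],
).choices[0].message.content

# Step 2: reuse those examples as context for the workhorse model (here: Claude)
# on day-to-day tasks.
task = "Add an input-validation helper in the same style as the examples above."
response = claude_client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Here are style examples to imitate:\n\n{examples}\n\nTask: {task}",
    }],
)
print(response.content[0].text)
```

The examples only need to be generated once and can be cached and reused as a small few-shot dataset across prompts.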
So, how much did that o1 usage cost you?
Cursor charges 40 cents per o1 request.
That's the right question.
You get 10 o1-mini requests per day with the Pro subscription; dunno about o1.
@ArcanoIncantatore Isn't it unlimited?
@multi_variate If true, that is actually insanely expensive hahahaha
What was the API cost for these chats, or is it included in the Cursor subscription?
Looks very impressive. It would be interesting to see a direct comparison with Claude using the same prompts.
OHH WOW THE AUDIO TRACK TOO??? 🎉🎉😮
10:45 You opened the HTML file directly from the drive instead of accessing the Node server. IMHO the one-shot worked.
BTW you can also use gemini-2.0-flash-exp and gemini-2.0-flash-thinking-exp-1219 in Cursor (you need to add a Google API key and add them manually under Models).
In my first tests these models perform very well :)
I think you should compare it with Claude for the same tasks. Giving Claude the same tasks would allow you to compare both models in terms of response quality, speed, cost, and overall performance.
However, I don't think the two tasks you provided are very representative, as the documentation appears to be well-structured and comprehensive. These tasks seem to mainly involve reading the documentation and combining the provided examples.
It might also be valuable to test the same tasks on GPT-4 and even less advanced models, to better evaluate the actual level of difficulty.
Even more interesting would be assigning o1 and Claude more complex real-world tasks, such as working within an existing codebase to add a new feature or solve a specific problem. Most models are quite good at generating standalone files like HTML, JS, or CSS, but they tend to struggle when dealing with an existing codebase.
You should use Composer rather than Chat. There's also an agent option there which might have worked better for the one-shot attempt.
Also, you can add docs in Cursor itself under Features in the settings.
You can only use GPT-4 or Sonnet 3.5 in Composer.
@ Ah! Good to know, thanks. I wonder if that will change in the future.
Do you always have to scroll through the chat to find the different code snippets and apply them individually, one by one? Doesn't Cursor provide any better experience for dealing with this?
Instead of Chat, he could have used Composer, which will edit multiple files per prompt. All you have to do is approve or reject the changes. Sometimes Composer updates files when you're not expecting it, and I often find myself telling it 'don't change anything' when iterating on a strategy for building a feature. I could probably get the same result by switching to Chat, but Composer feels smarter; dunno if that's true. Anyone have thoughts on that?
You really should use Composer with the agent option enabled in Cursor.
It's much better for tasks like the one you did in the video.
Isn't o1 like super expensive?
It follows Cursor rules much better than any other model, but the cost is high; it's easily $10/day.
Do you know that you can just paste the URL of the website into the Cursor chat? No need to copy the entire content over.
Thanks. Was this costly?
Does anyone know whether o1 pro mode will also be available through the API and, if so, how much Cursor might charge for it?
Kris, I do not have access either - maybe because we are from the EU.
I have the o1 model available in the model options. I am from Germany.
@superlama6452 Woke regulators are going to keep you away from innovation. Bring the change.
@multi_variate Define woke?
Is Cursor much better than just VS Code with roo-cline?
For example, with roo-cline you can connect an Obsidian MCP server so it can search and take in knowledge during its work.
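For anyone unfamiliar with MCP, below is a minimal sketch of what such a note-search server could look like, using the official `mcp` Python SDK (FastMCP). The vault path and the single `search_notes` tool are hypothetical; this is not the actual Obsidian MCP server the comment refers to.

```python
# Minimal sketch of an MCP server that exposes note search over a local Markdown
# vault, using the official `mcp` Python SDK. The vault location and the single
# `search_notes` tool are assumptions for illustration only.
from pathlib import Path
from mcp.server.fastmcp import FastMCP

VAULT = Path.home() / "ObsidianVault"  # assumed location of the notes vault
mcp = FastMCP("obsidian-notes")

@mcp.tool()
def search_notes(query: str, max_results: int = 5) -> list[str]:
    """Return short snippets from Markdown notes whose text contains the query."""
    hits: list[str] = []
    for note in VAULT.rglob("*.md"):
        text = note.read_text(encoding="utf-8", errors="ignore")
        if query.lower() in text.lower():
            hits.append(f"{note.name}: {text[:200]}")
        if len(hits) >= max_results:
            break
    return hits

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio so a client like roo-cline can attach to it
```

A client such as roo-cline would launch this script as a configured MCP server and call `search_notes` while working on a task.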
You know, I really liked the idea of Claude, and I even coded with it for a time. But holy shit, their business model is annoying as hell within the actual Claude app. Even as a paying customer I had a limit per chat, meaning I'd get really far into a project, get cut off mid-project, and have to start a new conversation with zero context.
I use chats as projects. Funnel my entire limit into one chat for Pete’s sake.
That's impressive.
I think it would be better to use Sonnet on a regular basis and solve the harder problems with o1, otherwise your wallet is gonna be broken 😂😂
Amazing
You’re all doing AI coding completely wrong! 😂😂😂😂
Please add Arabic to the audio track.
All AI are useless, just for boilerplate.
If one was trained on an open-source repo, then you would get this same type of AI.
Devs, you will get what I'm saying if you're not just copy-pasting.
"Just for boilerplate" doesn't sound useless; boilerplate is very useful.