o3-mini is really good (but does it beat deepseek?)
- Published Feb 7, 2025
- OpenAI just released their new reasoning model o3 mini, with some very clear responses to the crazy stuff Deepseek's been up to 👀
Thank you Ragie AI for sponsoring! Check them out at: soydev.link/ragie
Try out o3 mini for free: soydev.link/chat
o3 mini announcement: openai.com/ind...
Check out my Twitch, Twitter, Discord more at t3.gg
S/O @bmdavis419 for the awesome edit 🙏
The fact they dropped it so quickly tells you that OpenAI has had the ability to make great cheap models for awhile now but just didn't want to due to lack of competition.
exactly, it's much more than this. they just want to release new products to make money. it doesn't matter to them to put their best foot forward
they announced this a while ago bruh
Except they announced a late January release for o3-mini back in December 2 months ago? They might’ve made it cheaper because of r1 but release date has nothing to do with it.
@@arotobo agree, but we never know the scope of what's going to be released. it's not tangible exactly. so they might even just keep the hype cycle up and keep raising funds and selling us mediocre products.
@@voicevy3210 we always knew it was o3 mini, they literally said it in the announcement video that finished their “12 days of openai”. I see how my wording is confusing tho so I will fix it.
App devs should send DeepSeek team a thank-you letter
why?
@@mikitoburrito Because they forced OpenAI to lower the price for o3-mini to be competitive again. Otherwise they would probably have started at $100 per 1M tokens lol
@@mikitoburrito you mean you have no clue???
"After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."
@@myintmaunmaun DeepSeek DeepSeek DeepSeek DeepSeek DeepSeek DeepSeek
I'm sure they've planned to make o3 more expensive, but they've had to come up with a cheaper pricing due to R1. I'm also sure Google wanted to increase their pricing of the experimental "thinking" Gemini Flash model once it comes out of the preview phase, but now they'll need to adjust as well. Thank you DeepSeek!
DeepSeek helps people in the US bring the AI price down, which makes closeAI follow suit. Someday, closeAI may become the true OpenAI
DeepSeek helps people all around the world to bring the AI price down.
DeepSeek made their source code open for everyone to use, proving that we don't actually need Project Stargate, but smarter ways to train models.
"After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."
@@kukuricapica not true
@@kukuricapica And if OpenAI ever decides to actually sue, AI would finally get regulated and they may shoot themselves in the foot.
Theo, to fix your issue with markdown: OpenAI is looking for a key in the system message. This is from their new docs on reasoning models.
"Markdown formatting: Starting with o1-2024-12-17, reasoning models in the API will avoid generating responses with markdown formatting. To signal to the model when you do want markdown formatting in the response, include the string Formatting re-enabled on the first line of your developer message."
This past week in the world of AI, is a great example of free market competition principles!
No none of these players are free market based. While ChatGPT has its big investors, DeepSeek has Chinese government in the shadows. Both will steal your data and it is a matter of opinion if you want it to go to corporate thieves or CCP autocratic thieves.
yup. hard to see in other fields. Once AI is figured out, innovation won't be as disruptive to bigger players. I'm rooting for open source.
The irony is that it was sparked by communist China, lol.
Not true at all. Giving your stuff away for free as open source is as far from capitalism as it can get. Capitalism was what was in power 3 weeks ago.
yeah, EV is the opposite example of free market competition principles!
“Two devs in a trenchcoat” is such a good way to describe early startups 😂
15:00 of course he knows this, no typical user is running such a demanding task, so they don't need to push the model so hard, which makes it less error-prone
Markdown: "Formatting re-enabled" on the first line of your developer message, to enable markdown.
OpenAI was like "What's their price? *DOUBLE IT*"
fr, I had to confirm😅
I'm rooting for DeepSeek and similar opensource companies. If opensource wins the AI race, we all win. If OpenAI wins, we all lose. That sounds extreme, but that's honestly how it looks right now.
Oh noooo... OpenAI wins, we looose Oh noooo 🤣 🤣 🤣
Don't we benefit either way from the competition? I mean, by the end, one of them will have a model that will surpass all of them, and we will benefit from that model.
We? Who is we?
We win either way.
opensource will never win, because every time opensource discovers new methods, closedsource will just copy them silently since they can access them, but when closedsource discovers a new method, opensource can't access it
The best thing about DeepSeek is that it looks like they've been able to do (at least to some degree) what the rest of the industry has been hounding OpenAI to do (unsuccessfully) forever: Reveal their chain-of-thought and go open source. They aren't doing either yet, but they are relaxing their "moat" a bit and giving more detailed, but still high level chain-of-thought and they are considering actually open sourcing some of their code.
These are the kind of vids that are ripe for ai video summarisation
yup used o3 for this case lol so good!
which one are we using?
Yes ChatGPT UI is bad we get it
To be fair, with all the funds that they have, that bad of a UI deserves that kind of criticism.
It’s not just bad. It doesn’t work for long-running tasks
He's just trying to sell his product.
i challenge t3 chat to maintain its performance with the amount of traffic chatgpt has
Nope. First, deepseek still outperforms o3 mini on tons of problems that i gave it. Second, it's free.
It's not free. It costs 3000 USD for a graphics card :D and electricity.
@@tomirkplYou can use it for completely free on their website... Are you dumb?
@@tomirkpl Not to mention the model you can run on a 4090 (or 5090 if you can even get a chance to buy one), is only the 70b model AT BEST, with super slow speed, and far dumber than the 671b model hosted on their website, and without a search function that you will have to implement by yourself, which can be far inferior to their native one.
@@brockoala2994 bro 0.001$ per API token is basically free you don't need to host anything
It doesn't. ))) I tried to use o3-mini-high to write one simple Python script for me and it still failed to work after 15 additional questions, while deepseek wrote me a working script after 15 questions.
On the first question every model failed.
I would suggest you check the internal thoughts to see what's going on.
When did theo become an AI bro 😂
hype is views
he has an chat app...
End of 2024, I guess
bc he's a grifter
He is literally a Cursor Editor investor, he always made videos about them since copilot.
DeepSeek R1 is Better & Free
And it’s open source what more could you ask for
Deepseek is not free, it charges for every token you call through api
@childe2001 he talks about the web app and mobile app not api
Uhhh no. o3 smokes DeepSeek
@@TheWarehouseDude true, the code it gave me was crazy good. o3 mini high is much superior, but having deepseek make openai scared is very good for us users
Stop the hype. I used it all day on real-world coding problems and it's not much different from 3.5 sonnet. Even there, most of the improvement over o1 isn't coming from the model, it's coming from the software layer on top of the model.
it isn't really an upgrade from o1 performance wise afaik. It's just similar/same performance with greater efficiency and speed.
Claude is still the best coding model for real world tasks
same here, it is not too much...
Exactly
Bingo, the model is basically the same imho, there's effectively just some built in "are you sure" and "outline the steps" prompts. Agree 3.5 sonnet still seems to pull ahead in real-world coding tasks.
o3 fails at the marble cup question....fail....deepseek gets it right
o3 is specialized for coding and STEM, not marbles
@@moonasha if its logic fails at marble, cup, table - it's shite
No, if you are smart enough to use the right model, it's better than DeepSeek.
Some people don't even know which model to use for coding and smaller tasks.
@@Vedant-df9zo Nope, it should at least have the 'reasoning' to understand that the marble falls out of the cup and onto the table when turned upside down. If it fails at this, it won't be any good at 'STEM'. Imagine the reasoning errors it will make with fluid dynamics.
Still R1 is just as good and cheaper. 👍👍👍
I mean... OpenAI did say back in mid-December that they were launching o3-mini at the end of January...
But the emergence of V3 at that time made OpenAI reconsider the release timing. No one knows what OpenAI was doing with o3 during this period.
but not with a massive price drop
You're a legend for offering o3-mini on free tier, thanks so much for that!
After extensive experience with this model and with DeepSeek: DeepSeek literally thinks longer and gives better, more accurate answers in long contexts. Most importantly, I can download it locally and use it. Also, the new OpenAI model is not a complete model; it cannot even view files, and it is really stupid and worthless. Even on normal questions it did not answer correctly
I have to say, Claude does that; it has been reported that it sometimes ignores first prompts so the interaction in that specific chat lasts longer. If you give it the same prompt in another chat it might give you better results.
Honestly, after some tests online and testing myself, o3 is underwhelming. Like, by A LOT. R1 still manages to beat it in half if not more of the tasks, especially making a game in html.
Also, I've noticed a SIGNIFICANT drop in all chatgpt models. What do I mean? They seem to respond in such stupid ways, they don't actually follow what the user is saying. This started happening after r1 released (I did not use R1 at that time, so there's no bias on my side. I came to this conclusion before using R1 or even hearing about it.)
hey just a quick shout out to you guys the t3 chat is amazing, tried it for the first time today and responses were asap. Great work
o3 mini-high is actually insanely good. been playing for a while. absolutely mind-blowing.
No it's not it has better pricing than the other ChatGPT models but nowhere near being mind blowing.
o3 mini and the high version are not completely free though, unlike DeepSeek R1, so just use that model, you dumbass.
Using both o3 mini high and deepseek for 8 hours yesterday, I can confidently say Deepseek is better at doing what you tell it to. All GPT wants to do is give you // Fill in the rest here comments. I am cancelling my GPT subscription
With the amount of hype i see from YouTubers about AI, i thought even the old gpt 4 could easily complete all of these AoC tasks with ease, especially considering the results are everywhere online; the fact that the latest models can't was shocking to me. And i'm out here wondering why everyone is sucking off the Cursor ide while it's struggling with my simple react codebase. So much empty hype around AI it's insane
Bro make a video on how are you so productive
Just start coding when you're seven years old and you're good to go.
He has a team. He just hired Ben Davis who is insanely productive.
I always suspected OpenAI to mine BTC in the background of the page or something 😂
Another scam from mister charlatan Altman. Before they program gpt to change the reply, here is what I got by asking ChatGPT o3-mini "Which model am I talking to?" It replied: Let's break down the answer into simple points:
- **I am GPT-4:**
I run on the GPT-4 architecture. That's my main model.
- **About "o3-mini":**
There is no version called "o3-mini" in my design.
My technology is entirely based on GPT-4.
So, to answer your question directly: No, I'm not "o3-mini." I'm GPT-4.
It's not weird that o3 Mini costs less per token than 4o. It's probably the equivalent of 4o mini but with reasoning capabilities.
It ultimately spits out MUCH more tokens per prompt and you're still paying for them even if you don't see them over the API.
"After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."
But does that mean they officially recognized DeepSeek as good 😙.
Open source wins all day every day. Sam can only keep the Potemkin Village standing for so long before all of his skeletons come flying out of the closet.
Strange how I've found sonnet 3.5 to still be the best at my coding tasks
Maybe others are good at Python and React, but when it comes to coding in a less popular stack like Drupal/PHP or SwiftUI, Claude still impresses me.
Honestly, if I were building an app right now, it would take a huge, huge leap in capabilities for me to even consider any OpenAI (or Google, or Anthropic) model. The cost-effectiveness, the ability to self-host, the ability to apply LoRA to fine-tune for specific capabilities; these are high-value things when you're building an app, and it would take a substantial increase in capabilities from OpenAI before I would even start to debate giving them up.
What’s with the fake tweet thumbnail?
short answer: yes, it really beats deepseek.
I personally haven't bumped into any of Theo's issues, I feel sorry for him.
If you give an LLM an open ended problem with tons of requirements they will miss something unless you prompt them super specifically
Reasoning models are just really good at prompting themselves very specifically
I am a genius and I write amazing prompts. That's why I actually don't use the o1 model. I use the old GPT-4 model and it works better for me because GPT-4 actually gives me precisely what I want; o1 thinks instead of me, which is of course worse because I'm superior to you or any other human being.
@tubeyou6794
I'm not sure if you are implying I said anything of the sort. Or you are actually saying that you do that which wouldn't be the smartest thing to do.
But technically, if you broke the problem down into very clear step-by-step instructions, you'd realize how much easier that would be for the AI, no?
If you want to test this, you can use any reasoning model that actually gives you all the "thinking" part.
Ask a question to V3 that it can't do consistently but R1 can.
Now ask R1, then take the content inside the think tags and give it to V3 along with the question
Watch V3 get the question right.
But if you want to do it even better, break down the problem in tiny steps yourself, and ask it to do the steps 1 by 1 and you'll probably do a better job than R1
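A rough sketch of that experiment in TypeScript, assuming an OpenAI-compatible DeepSeek endpoint and the model names deepseek-reasoner (R1) and deepseek-chat (V3). Some hosts return R1's reasoning inside <think>…</think> tags and others in a separate field, so the extraction step here is illustrative.
```ts
import OpenAI from "openai";

// Assumed: an OpenAI-compatible endpoint and these model names.
const client = new OpenAI({
  baseURL: "https://api.deepseek.com", // assumed endpoint
  apiKey: process.env.DEEPSEEK_API_KEY,
});

async function askWithBorrowedReasoning(question: string) {
  // 1. Ask R1 and capture its raw output, reasoning included.
  const r1 = await client.chat.completions.create({
    model: "deepseek-reasoner",
    messages: [{ role: "user", content: question }],
  });
  const r1Text = r1.choices[0].message.content ?? "";

  // 2. Pull out whatever sits inside the <think> tags (if your host emits them).
  const thinking = r1Text.match(/<think>([\s\S]*?)<\/think>/)?.[1] ?? "";

  // 3. Hand V3 the question plus R1's reasoning and see if it now answers correctly.
  const v3 = await client.chat.completions.create({
    model: "deepseek-chat",
    messages: [
      {
        role: "user",
        content: `${question}\n\nHere is a chain of reasoning you can follow:\n${thinking}`,
      },
    ],
  });
  return v3.choices[0].message.content;
}
```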
I didn’t know that Claude was so expensive. We use it at work & it honestly does so well that we always assumed they updated it to be a reasoning model, but after watching this video I will be suggesting several changes
Well, all grads' and current students' careers just went up in flames. What a good prank it was.
Tf?
Interestingly, OpenAI o3's reasoning process inevitably shows a Chinese thinking process, which looks like a trick that is not hidden well.
Have you tried with new gemini thinking model?
Your videos taught me so much that I know almost nothing about. Thank you, Theo
@ 6:24, when GPT cut off that last response after “Setting the parameter” paragraph , why didn’t you then just ask something along the lines of “your response got cut off after (copy/paste last paragraph). Continue from there.”
You’d have had the ability to objectively evaluate o3 mini’s coding capabilities if you had written a prompt like that b/c that would’ve generated a final stable version of the script
If GPT agents will replace all developers, why haven't all those companies fixed their UI yet?
T3 Chat needs a toggle for "Just answer, don't explain the answer." and it should default to on.
Can you open source a version of T3chat, or some boilerplate that uses the same stack? I am curious how you’ve married nextjs and react router, counter to what everyone says you should do, yet you seem to be getting a good result.
Claude just shows that everything that Amazon touches (or invests in) ends up being promising and then sucks.
It's still the best model
It is probably still the best non-reasoning model and it works the fastest.
@ there is no such thing as a 'reasoning' model, that's just a marketing term
@@RomeTWguy best model for normies. Nobody with a serious novel or hard problem is going to choose Claude, it just makes stuff up confidently because it can't reason.
@@RomeTWguy then there is such a thing guy
who else skips like crazy when you hear 'this day's sponsor' and doesn't listen to ads at all and is not affected by them?
very sure deepseek is cooking r2 in silence. The next distillation will set back openAI. But competition is good for us consumers. Let them fight
After testing, the o3 is now inferior to the r1.
@@sasa-tg4odJUST SHUT UP!!!!!!!
The pagination is weird indeed. In the network tab, you can see that AT LEAST 2 requests are made for each page, sometimes more are made.
Did you try adding "Formatting re-enabled" on the first line of your developer message to re-enable Markdown?
I just stumbled onto T3 today and wanted to get signed up, but it's missing even basic functionality such as a system prompt? i understand you want to run lean and mean, but couldn't you stash it somewhere in advanced? And folders are a must if you are running 20 queries a day.
Yeah. It's unusable.
I don't know why you should compare a shelled deepseek clone with the native deepseek.
4o is still better for general knowledge, trivia, etc.
reasonable dad jokes
The word Reasonable came from teaching a son to eat by reading a story about a bowl. (Read-son-a-bowl)
The o3 mini prices are either BS they use as a loss leader OR someone forgot to pull the plug on GPT-4.
For sure a loss leader. They are in panic mode. Alas, R1 is still completely free, so OpenAI can f off for all I care
@@lukasz96 O3 is in the free tier too
How are you so good at Advent of Code to have pretty good timings?
Do you have experience with algorithmic problems and competitive programming, or are you naturally extremely gifted?
NVIDIA vs AMD || DeepSeek vs OpenAI .... what a strange world we live in
if it wasn't for deepseek, o3 mini probably wouldn't have released for another year, the exact thing they did with sora
This is the beginning of the plateau. No increase in result accuracy, but making it cheaper. Wild to me that claude 3.5 is still superior to both r1 and o3 when it comes to coding lol
Guys! What tier do I need to use o3-mini through the API?
3
Google has been cooking with Gemini models recently and adding them to the exact comparison would be very nice
Their UI is hilariously bad. I can’t agree more with that! 😊
9:30 That was a really hard problem.
3 medium problems layered.
It's definitely hard. Thought it would be easier. But the trick is that it's a combinations problem, not a greedy problem.
You can greedily get the combinations to reduce space. After that realization it's kinda easy. Just a lot of writing. Incredibly fun problem.
Had no idea recursive optimal pathways could be so different with such obvious and seemingly fixed optimal paths.
I gave o3 mini functional, working, simple code to evaluate. It had improvement ideas that sounded fine, so I asked it to improve the code. It was like dealing with GPT-3: it broke the existing code and provided no updates. After 5 more prompting sessions it still could not even duplicate the existing code that worked. Not sure what the hype is yet. What may I be missing? Thanks
15:36 I'm pretty sure that the reasoning UI given by OpenAI is actually gaslighting. The actual reasoning tokens are not exposed to us the user. Instead they have yet another process that is summarizing the reasoning in order to obfuscate their techniques.
I've been using o3 occasionally, but I still like r1 more for most prompts. r1 tells you when your question is too large for the context window, o3-mini just forgets that you asked a question if it's before 5000 lines of code. o1 answers best.
Agent Smith: ... The perfect world was a dream that your primitive cerebrum kept trying to wake up from. Which is why the Matrix was redesigned to this: the peak of your civilization. I say your civilization, because as soon as we started thinking for you it really became our civilization, which is of course what this is all about. (Matrix quotes)
Ned Flanders knows what he's saying.
Obviously. Why could o3 come out shortly after DeepSeek with a lower training cost? o3 incorporated the key features of DeepSeek's code
We should just build a decoder model to convert that fun formatting of its (R1's) output to markdown or any other formatting. I am pretty sure any basic GPT can already decode it that way.
now closed-source AI will copy what deepseek has done, because they can access it since deepseek is completely open source, and then sell it to the public
What happens when China develops and releases a free version of Sora?
😮
server cost will be too much. It won't be free. But it will be open source we can run on our local machine
The Chinese Sora equivalents, Kling and MinMax, far surpass America's Sora in capability. Though not free to use, the United States has already lost ground in this domain of technological competition.
@ Thank you for letting us know! Very interesting.
Let them keep their expensive models to themselves
DeepSeek R1 is now very slow. And DeepSeek R1 (Nitro) which is fast is $7 in $7 out.
The real ranking:
o3 mini - deepseek r1 - claude sonnet 3.5 - o1 - o1 mini - qwen 2.5. Used em all. qwen is not there yet, and gemini, even the latest gemini 2, isn't even in the list; it's the worst of the ranking at the moment
They can keep their closed source trash
I tested it for about 4 hours yesterday and for now o1 Pro is just better due to being more compliant and on task when presented with long and complex prompt scripts and tasks. It's not so much about hallucinations at this point, it's more like it selectively ignores parts of the script, even with the most extreme reinforcement. Like, it understands the full context but will do what it wants past a certain point instead of following the full letter of what you're asking for, unless you go back to chunking your answer and go piecemeal.
It's just my opinion, but I feel that R1's answer after reasoning is better than o3-mini's.
Like, more detailed and structured
I tested o3-mini (low) [free-tier] and Deepseek R1 on some math competitions. Deepseek R1 is able to solve many problems from the Chinese National High School Math League First Round, but fails miserably on the Second Round (harder problems). On the other hand, o3-mini (low) solves all problems from the Second Round @2 (those I threw to it), but fails on the National Team Selection Test (extremely hard problems). And o3-mini (low) is clearly faster than Deepseek R1. So at least for math, o3-mini (low) is better than Deepseek R1.
Without deepseek o3 would cost $25 per million tokens
Can it run locally on my off-grid base out in nowhere? No?
Good bye. Hi this is your homie Tony from LCSign.
Mistral Small 3 > DeepSeek. No normal user has any use for a highly censored model like DeepSeek that needs a giant server to even run it properly.
Can someone tell me which one is better, or do they both have their own advantages?
Still prefer DeepSeek
Why would anyone in their right mind support that kind of scummy behavior when they could have released the cheaper option to begin with?
Seems like we have to download and train our own. Primeagen is looking at this. Maybe Internet of Bugs could join. Like taking a bright 7-year-old kid and slowly bringing him along. Clearly the way is for good programmers to pick a language like Zig and group-teach a new model. Lots of work.
I find the naming of OpenAI models extremely confusing: you have three o3 models, mini/medium/high, but the mini also has 3 sub-models: mini-low, mini-medium and mini-high. So now when we are looking at benchmarks, we have no idea what the benchmarks refer to. Especially when you do like the video creator here and call it "o3", it is not clear anymore whether we are still looking at the o3 mini models, and if yes, which of the sub-models. Deepseek seems to be better than o3 mini-low and mini-medium, but not mini-high (which is currently only for pro subscribers). But of course deepseek r1 can be downloaded, whereas the o3 models cannot. And when we think of all the downtime chatgpt has had the last few weeks, it becomes tempting to run it offline. Especially because with deepseek we can use PDFs; for some reason the o3 models don't support files.
i think the formatting output issue is a tell that this was not as polished as they wanted before release and they did a rushed release following the deepseek fallout. they never wanted to release this for free but have been forced into it. it's a lot cheaper for us but they are probably doing this at a big loss
Instead of being confused why most of your costs are from claude, you could maybe just conclude that claude is really well liked....
The reason why claude is used so much, is because it is incredibly well aligned for programming. You really feel that they put a lot of effort in their rlhf for programming tasks and it works really well on cursor
Deepseek r1 is, though, much better at reasoning about the code, but not that good at creating nice code without significant prompting
Do API users (or T3) have to pay for the tokens used in the weird formatting of outputs? Like those weird lines of dashes would presumably consume tokens despite having zero functional value in the output.
o3 mini might be cheap, but it generates lots of output tokens for reasoning; deepseek v3 is the best compromise for generation
In my test scenario o3-mini solved the problem fast, but R1 spent 10 minutes and gave me code that doesn't work at all. All other models I tried were also not able to solve my test task. So o3-mini is the favorite for now. Have not tried o1, just in case.
With o1 you are paying for output tokens that you don't get to see. That sounds like a scam to me
calling deepseek distilled from their own model and then dropping a model comparable to r1 for so cheap doesn't make sense at all, it's like openai is completely falling apart out of desperation💀
Why doesn't t3 chat have QWEN2.5? And why doesn't it have the qwen mini distilled versions of Deepseek?