o3-mini is really good (but does it beat deepseek?)

Поділитися
Вставка
  • Опубліковано 7 лют 2025
  • OpenAI just released their new reasoning model o3 mini, with some very clear responses to the crazy stuff Deepseek's been up to 👀
    Thank you Ragie AI for sponsoring! Check them out at: soydev.link/ragie
    Try out o3 mini for free: soydev.link/chat
    o3 mini announcement: openai.com/ind...
    Check out my Twitch, Twitter, Discord more at t3.gg
    S/O ‪@bmdavis419‬ for the awesome edit 🙏

КОМЕНТАРІ • 392

  • @amihartz
    @amihartz 6 днів тому +1117

    The fact they dropped it so quickly tells you that OpenAI has had the ability to make great cheap models for awhile now but just didn't want to due to lack of competition.

    • @voicevy3210
      @voicevy3210 6 днів тому +79

      exactly, it's much more than this. they just want to release new products just for making money. it doesn't matter for them to put the best foot forward

    • @urmom8322
      @urmom8322 6 днів тому

      they announced this a while ago bruh

    • @arotobo
      @arotobo 6 днів тому +58

      Except they announced a late January release for o3-mini back in December 2 months ago? They might’ve made it cheaper because of r1 but release date has nothing to do with it.

    • @voicevy3210
      @voicevy3210 6 днів тому +18

      @@arotobo agree, but we never know the scope of what's going to be released. it's not tangible exactly. so they might even just keep the hype cycle up and keep raising funds and selling us mediocre products.

    • @arotobo
      @arotobo 6 днів тому +13

      @@voicevy3210 we always knew it was o3 mini, they literally said it in the announcement video that finished their “12 days of openai”. I see how my wording is confusing tho so I will fix it.

  • @jiachen1078
    @jiachen1078 6 днів тому +630

    App devs should send DeepSeek team a thank-you letter

    • @mikitoburrito
      @mikitoburrito 6 днів тому +5

      why?

    • @rodjenihm
      @rodjenihm 6 днів тому +74

      @@mikitoburrito Because they forced OpenAI to lower the price for o3-mini to be competitive again. Otherwise the would probably start with $100 per 1m tokens lol

    • @itsmeGeorgina
      @itsmeGeorgina 6 днів тому +14

      ​@@mikitoburrito you mean you have no clue???

    • @myintmaunmaun
      @myintmaunmaun 6 днів тому +19

      "After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."

    • @deathrace-bx5ne
      @deathrace-bx5ne 6 днів тому

      @@myintmaunmaun DeepSeek DeepSeek DeepSeek DeepSeek DeepSeek DeepSeek

  • @damians.7859
    @damians.7859 6 днів тому +154

    I'm sure they've planned to make o3 more expensive, but they've had to come up with a cheaper pricing due to R1. I'm also sure Google wanted to increase their pricing of the experimental "thinking" Gemini Flash model once it comes out of the preview phase, but now they'll need to adjust as well. Thank you DeepSeek!

  • @leuhenry8031
    @leuhenry8031 6 днів тому +203

    DeepSeek helps US's people to bring the AI price down, that make closeAI follow up with. someday, closeAI maybe become the true OpenAI

    • @dibu28
      @dibu28 6 днів тому +19

      DeepSeek helps people all around the world to bring the AI price down.

    • @kukuricapica
      @kukuricapica 6 днів тому +15

      DeepSeek made their source code open for everyone to use proving that we dont actually need Project Stragate, but smarter ways to train models.

    • @myintmaunmaun
      @myintmaunmaun 6 днів тому +6

      "After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."

    • @pneumonoultramicroscopicsi4065
      @pneumonoultramicroscopicsi4065 6 днів тому +3

      ​@@kukuricapica not true

    • @apierror
      @apierror 6 днів тому

      ​​@@kukuricapicaAnd if OpenAI ever decides to actually sue, AI would finally get regulated and they may shoot themselves in the foot.

  • @bryce1017
    @bryce1017 6 днів тому +38

    Theo to fix your issue with markdown OpenAI is looking for a key in the system message, this is from their new docs on reasoning models.
    "Markdown formatting: Starting with o1-2024-12-17, reasoning models in the API will avoid generating responses with markdown formatting. To signal to the model when you do want markdown formatting in the response, include the string Formatting re-enabled on the first line of your developer message."

  • @ashleymccarthy6232
    @ashleymccarthy6232 6 днів тому +115

    This past week in the world of AI, is a great example of free market competition principles!

    • @gamernerd7139
      @gamernerd7139 6 днів тому

      No none of these players are free market based. While ChatGPT has its big investors, DeepSeek has Chinese government in the shadows. Both will steal your data and it is a matter of opinion if you want it to go to corporate thieves or CCP autocratic thieves.

    • @mcwornex2123
      @mcwornex2123 6 днів тому +4

      yup. hard to see in other fields. Once AI is figured out, innovation won't be as much of a disruptive to bigger players. I'm rooting for open source.

    • @VV-nw4cz
      @VV-nw4cz 6 днів тому

      The irony is that it was sparked by communist China, lol.

    • @vedranlekic9725
      @vedranlekic9725 6 днів тому +6

      Not true at all. Giving your stuff for free as open source is as far from capitalism that it can get. Capitalism was what was in power 3 weeks ago.

    • @天天去溜达
      @天天去溜达 6 днів тому +1

      yeah, EV is the opposite example of free market competition principles!

  • @Lucas-gt8en
    @Lucas-gt8en 6 днів тому +17

    “Two devs in a trenchcoat” is such good a way to describe early startups 😂

    • @devviz
      @devviz День тому

      15:00 of course he knows this, no typical user is running such a demanding task so they do not need to push the model so hard, thus less error prone

  • @haha7567
    @haha7567 4 дні тому +2

    Markdown: "Formatting re-enabled" on the first line of your developer message, to enable markdown.

  • @tedarcher9120
    @tedarcher9120 6 днів тому +34

    OpenAI was like "What's their price? *DOUBLE IT*"

  • @sbowesuk981
    @sbowesuk981 6 днів тому +48

    I'm rooting for DeepSeek and similar opensource companies. If opensource wins the AI race, we all win. If OpenAI wins, we all lose. That sounds extreme, but that's honestly how it looks right now.

    • @tomirkpl
      @tomirkpl 6 днів тому +2

      Oh noooo... OpenAI wins, we looose Oh noooo 🤣 🤣 🤣

    • @generalegg6778
      @generalegg6778 5 днів тому +1

      Dont we benefit either way by the competition? I mean by the end, one of them will have a model that we suprass all of them, and we will benefit from the model.

    • @Pepe12345-c
      @Pepe12345-c 5 днів тому

      We? Who is we?

    • @Jaroslav-f9o
      @Jaroslav-f9o 4 дні тому

      We win either way.

    • @shining_cross
      @shining_cross 4 дні тому +4

      opensource will never win, because every time opensource discovers new methods, closedsource will just copy it silently since they can access to it, but when closedsource discovers new method, opensource can't access it

  • @BruceWayne15325
    @BruceWayne15325 6 днів тому +7

    The best thing about DeepSeek is that it looks like they've been able to do (at least to some degree) what the rest of the industry has been hounding OpenAI to do (unsuccessfully) forever: Reveal their chain-of-thought and go open source. They aren't doing either yet, but they are relaxing their "moat" a bit and giving more detailed, but still high level chain-of-thought and they are considering actually open sourcing some of their code.

  • @HDshrimpkick
    @HDshrimpkick 6 днів тому +34

    These are the kind of vids that are ripe for ai video summarisation

  • @Rsanda
    @Rsanda 6 днів тому +57

    Yes ChatGPT UI is bad we get it

    • @haroldcruz8550
      @haroldcruz8550 6 днів тому +23

      To be fair with all the funds that they have, that bad of a UI deserve that kind of criticism.

    • @fatal510
      @fatal510 6 днів тому +2

      It’s not just bad. It’s doesn’t work for long running tasks

    • @Mezielz
      @Mezielz 5 днів тому

      He's just trying to sell his product.

    • @devviz
      @devviz 14 годин тому

      i challege t3 chat to maintain its performance with the amount of traffic chatgpt has

  • @the_proffesional1713
    @the_proffesional1713 6 днів тому +20

    Nope. First, deepseek still outperform o3 mini with tons of problems that i gave. Second, its free.

    • @tomirkpl
      @tomirkpl 6 днів тому +5

      It's not free. It cost 3000 USD for a graphics card :D and electricity.

    • @RedEyeLazer
      @RedEyeLazer 6 днів тому

      ​@@tomirkplYou can use it for completely free on their website... Are you dumb?

    • @brockoala2994
      @brockoala2994 6 днів тому

      @@tomirkpl Not to mention the model you can run on a 4090 (or 5090 if you can even get a chance to buy one), is only the 70b model AT BEST, with super slow speed, and far dumber than the 671b model hosted on their website, and without a search function that you will have to implement by yourself, which can be far inferior to their native one.

    • @haythemsandel8303
      @haythemsandel8303 6 днів тому

      @@brockoala2994 bro 0.001$ per API token is basically free you don't need to host anything

  • @dibu28
    @dibu28 6 днів тому +10

    It does't.))) I tried to use o3-mini-hard to write my one simple python script and it failed to work after 15 additional questions while deepseek wrote me a working script after 15 questions.
    On the first question every model faild.

    • @lost4468yt
      @lost4468yt 6 днів тому +1

      I would suggest you check the internal thoughts to see what's going on.

  • @subhashpeshwa2997
    @subhashpeshwa2997 6 днів тому +134

    When did theo become an AI bro 😂

    • @sid4579
      @sid4579 6 днів тому +38

      hype is views

    • @richielickie
      @richielickie 6 днів тому +24

      he has an chat app...

    • @nwsome
      @nwsome 6 днів тому +2

      End of 2024, I guess

    • @urmom8322
      @urmom8322 6 днів тому

      bc he's grifter

    • @Frozander
      @Frozander 6 днів тому

      He is literally a Cursor Editor investor, he always made videos about them since copilot.

  • @_Factboy_Sunny_
    @_Factboy_Sunny_ 6 днів тому +86

    DeepSeek R1 is Better & Free

    • @fufuu
      @fufuu 6 днів тому +14

      And it’s open source what more could you ask for

    • @childe2001
      @childe2001 6 днів тому +7

      Deepseek is not free, it charges for every token you call through api

    • @th-redattack
      @th-redattack 6 днів тому

      @childe2001he talks about the web app and mobile app not api

    • @TheWarehouseDude
      @TheWarehouseDude 6 днів тому +4

      Uhhh no. o3 smokes DeepSeek

    • @th-redattack
      @th-redattack 6 днів тому

      @@TheWarehouseDude true the code it gave me was crazy good o3 mini high is much superior but having deepseek make openai scared is very good for us users

  • @matthewwoodard9810
    @matthewwoodard9810 6 днів тому +21

    Stop the hype. I used it all day on real world coding problems and it’s not much different from 3.5 sonnet. Even there, most of the improvement from 0-1 isn’t coming from the model, it’s coming from the software layer on top of the model.

    • @mikitoburrito
      @mikitoburrito 6 днів тому +3

      it isn't really an upgrade from o1 performance wise afaik. It's just similar/same performance with greater efficiency and speed.

    • @RomeTWguy
      @RomeTWguy 6 днів тому +2

      Claude is still the best coding model for real world tasks

    • @LucasSouzaDev
      @LucasSouzaDev 6 днів тому

      same here, it is not too much...

    • @ismaelplaca244
      @ismaelplaca244 6 днів тому

      Exactly

    • @furycorp
      @furycorp 6 днів тому

      Bingo, the model is basically the same imho, there's effectively just some built in "are you sure" and "outline the steps" prompts. Agree 3.5 sonnet still seems to pull ahead in real-world coding tasks.

  • @jeystone2159
    @jeystone2159 6 днів тому +63

    o3 fails at the marble cup question....fail....deepseek gets it right

    • @moonasha
      @moonasha 6 днів тому +3

      o3 is specialized for coding and STEM, not marbles

    • @jeystone2159
      @jeystone2159 5 днів тому +3

      @@moonasha if it's logic fails at marble, cup, table - it's shite

    • @Vedant-df9zo
      @Vedant-df9zo 4 дні тому

      No, if you are smart engouh to use right model its better than deep seek.
      Some people dont even know which model to use for coding and smaller tasks.

    • @ViralKiller
      @ViralKiller 4 дні тому +1

      @@Vedant-df9zo Nope, it should at least have the 'reasoning' to understand that the marble falls out of the cup and onto the table when turned upside down. If it fails at this, it won't be any good at 'STEM'. Imagine the reasoning errors it will make with fluid dynamics.

  • @leeyouyun7728
    @leeyouyun7728 6 днів тому +32

    Still R1 is just as good and cheaper. 👍👍👍

  • @vassovas
    @vassovas 6 днів тому +43

    I mean... OpenAI did say they were launching o3-Mini at end of January in Midish December...

    • @PrinceofUnderpants
      @PrinceofUnderpants 6 днів тому

      But the emergence of V3 at that time made O3 reconsider the release time. No one knows what O3 was doing during this period.

    • @neoglacius
      @neoglacius 5 днів тому +4

      but not with a massive price drop

  • @gr33nDestiny
    @gr33nDestiny 6 днів тому +2

    You're a legend for offering o3-mini on free tier, thanks so much for that!

  • @m3nafsy
    @m3nafsy 6 днів тому +2

    After extensive experience with this model with DeepSeek Literally deepseek Thinks longer and gives better and more accurate answers in long context Most importantly, I can download it locally and use it Also, the new OpenAI model is not a complete model, it cannot even view files, and it is really stupid and worthless, even in normal questions it did not answer correctly

  • @VidoviDroga
    @VidoviDroga 6 днів тому +3

    I have to say, Claude does that, it has been reported that it sometimes ignores first prompts so the interaction in that specific chat would last longer. If you give it the same promt in another chat it might give you better results.

  • @Darkenz000
    @Darkenz000 5 днів тому +2

    Honestly, after some tests online and testing myself, o3 is underwhelming. Like, by A LOT. R1 still manages to beat it in half if not more of the tasks, especially making a game in html.
    Also, I've noticed a SIGNIFICANT drop in all chatgpt models. What do I mean? They seem to respond in such stupid ways, they don't actually follow what the user is saying. This started happening after r1 released (I did not use R1 at that time, so there's no bias on my side. I came to this conclusion before using R1 or even hearing about it.)

  • @md.manzeralam6508
    @md.manzeralam6508 6 днів тому

    hey just a quick shout out to you guys the t3 chat is amazing, tried it for the first time today and responses were asap. Great work

  • @pikachu-mx6hi
    @pikachu-mx6hi 6 днів тому +16

    o3 mini-high is actually insanely good. been playing for a while. absolutely mind-blowing.

    • @haroldcruz8550
      @haroldcruz8550 6 днів тому +7

      No it's not it has better pricing than the other ChatGPT models but nowhere near being mind blowing.

    • @RedEyeLazer
      @RedEyeLazer 6 днів тому +1

      03 mini and the high version are not completely free though, unlike DeepSeek R1, so just use that model, you dumbass.

  • @markbond08
    @markbond08 6 днів тому +1

    Using both o3 mini high and deepseek for 8 hours yesterday I can confidently say Deepseek is better at doing it what you tell it to. All GPT want's to do is give you // Fill in the rest here comments. I am cancelling my GPT subscription

  • @chriss87878
    @chriss87878 4 дні тому +1

    With the amount of hype i see from UA-camrs about AI i thought even the old gpt 4 could easily complete all of these AoC tasks with ease, especially considering the results are everywhere online, the fact that the latest models can't was shocking to me. And i'm out here wondering why everyone is sucking off the Cursor ide while it's struggling with my simple react codebase. So much empty hype around AI it's insane

  • @LEONARDO-xs2ke
    @LEONARDO-xs2ke 7 днів тому +16

    Bro make a video on how are you so productive

    • @VeaceslavBARBARII
      @VeaceslavBARBARII 6 днів тому +1

      Just start coding when you're seven years old and you're good to go.

    • @furycorp
      @furycorp 6 днів тому +1

      He has a team. He just hired Ben Davis who is insanely productive.

  • @perguth
    @perguth 6 днів тому +16

    I always suspected OpenAI to mine BTC in the background of the page or something 😂

  • @GigaSimp
    @GigaSimp 5 днів тому +1

    Another scam from mister charlatan Altman. Before they program gpt to change the reply, here is what I got by asking ChatGPT o3-mini "Which model am I talking to? " It replied : Let's break down the answer into simple points:
    - **I am GPT-4:**
    I run on the GPT-4 architecture. That's my main model.
    - **About "o3-mini":**
    There is no version called "o3-mini" in my design.
    My technology is entirely based on GPT-4.
    So, to answer your question directly: No, I'm not "o3-mini." I'm GPT-4.

  • @TheHronar
    @TheHronar 6 днів тому +7

    It's not weird that o3 Mini costs less per token than 4o. It's probably the equivalent of 4o mini but with reasoning capabilities.
    It ultimately spits out MUCH more tokens per prompt and you're still paying for them even if you don't see them over the API.

  • @myintmaunmaun
    @myintmaunmaun 6 днів тому +6

    "After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."

  • @tongducthanhnam
    @tongducthanhnam 6 днів тому +4

    But does that mean they officially reconized DeepSeek good 😙.

  • @keyser021
    @keyser021 5 днів тому

    Open source wins all day every day. Sam can only keep the Potemkin Village standing for so long before all of his skeletons come flying out of the closet.

  • @Versus-A
    @Versus-A 6 днів тому +11

    Strange how I've found sonnet 3.5 to still be the best at my coding tasks

    • @DanielMetille
      @DanielMetille 6 днів тому

      Maybe others are good in Python and React, but when coming to code for some less popular language as Drupal/PHP or SwiftUI, Claude still impress me.

  • @nikomancer69
    @nikomancer69 4 дні тому

    Honestly, if I were building an app right now, it would take a huge, huge leap in capabilities for me to even consider any Open AI (or Google, or Antrhopic) model. The cost-effectiveness, the ability to self-host, the ability to apply LORA to fine tune for specific capabilities; these are high-value things when you're building an app and it would take a substantial increase in capabilities from Open AI before I would even start to debate giving them up.

  • @df_all
    @df_all 6 днів тому +5

    What’s with the fake tweet thumbnail?

  • @WiseWeeabo
    @WiseWeeabo 5 днів тому

    short answer: yes, it really beats deepseek.
    I personally haven't bumped into any of Theo's issues, I feel sorry for him.

  • @MrEnriqueag
    @MrEnriqueag 6 днів тому +11

    If you give an LLM an open ended problem with tons of requirements they will miss something unless you prompt them super specifically
    Reasoning models are just really good at prompting themselves very specifically

    • @tubeyou6794
      @tubeyou6794 6 днів тому +1

      I am a genius and I write amazing prompts. That’s why I actually don’t use the oh one model. I use the old GPT four model and it works better for me because the GPT four actually gives me precise what I want the oh one things instead of me, which is of course worse because I’m super superior to you or any other human being.

    • @MrEnriqueag
      @MrEnriqueag 6 днів тому

      @tubeyou6794
      I'm not sure if you are implying I said anything of the sort. Or you are actually saying that you do that which wouldn't be the smartest thing to do.
      But technically if you broke down the problem into very clear step by step instructions you realize how that would be easier for the AI no?
      If you want to test this, you can use any reasoning model that actually gives you all the "thinking" part.
      Ask a question to V3 that it can't do consistently but R1 can.
      Now ask it to R1, then take the content inside the think tags and dive it with the question to V3
      Watch V3 get the question right.
      But if you want to do it even better, break down the problem in tiny steps yourself, and ask it to do the steps 1 by 1 and you'll probably do a better job than R1

  • @OhsoLosoo
    @OhsoLosoo 6 днів тому

    I didn’t know that Claude was so expensive. We use it at work & it honestly does so well that we always assumed they updated it to be a reasoning model, but after watching this video I will be suggesting several changes

  • @Jenkkimie
    @Jenkkimie 6 днів тому +1

    Well all grads and current students careers just went up in flames. What a good prank it was.

  • @Alistair1217
    @Alistair1217 4 дні тому

    Interestingly, OpenAI O3's reasoning process inevitably shows Chinese thinking process, which looks like a trick that is not hidden well.

  • @VikasKapadiya1993
    @VikasKapadiya1993 6 днів тому +1

    Have you tried with new gemini thinking model?

  • @ItsNicolau
    @ItsNicolau 6 днів тому

    Your videos taught me so much that I know almost nothing about. Thank you, Theo

  • @bigmedge
    @bigmedge 5 днів тому

    @ 6:24, when GPT cut off that last response after “Setting the parameter” paragraph , why didn’t you then just ask something along the lines of “your response got cut off after (copy/paste last paragraph). Continue from there.”
    You’d have had the ability to objectively evaluate o3 mini’s coding capabilities if you had written a prompt like that b/c that would’ve generated a final stable version of the script

  • @VV-nw4cz
    @VV-nw4cz 6 днів тому +1

    If GPT agents will replace all developers, why did not all those companies fix their UI yet?

  • @tjblackman08
    @tjblackman08 6 днів тому

    T3 Chat needs a toggle for "Just answer, don't explain the answer." and it should default to on.

  • @frosty129
    @frosty129 2 дні тому

    Can you open source a version of T3chat, or some boilerplate that uses the same stack? I am curious how you’ve married nextjs and react router, counter to what everyone says you should do, yet you seem to be getting a good result.

  • @misterJBD
    @misterJBD 6 днів тому +4

    Claude just shows that everything that Amazon touches (or invests in) end up being promising and then they suck.

    • @RomeTWguy
      @RomeTWguy 6 днів тому

      Its still the best model

    • @Frozander
      @Frozander 6 днів тому +3

      It is probably still the best non-reasoning model and it works the fastest.

    • @RomeTWguy
      @RomeTWguy 6 днів тому

      @ there is no such thing as a 'reasoning' model, that's just a marketing term

    • @wwkk4964
      @wwkk4964 6 днів тому

      ​@@RomeTWguybest model for normies. Nobody with a serious novel or hard problem is going to choose Claude, it just makes stuff up confidently because it can't reason.

    • @josephvictory9536
      @josephvictory9536 6 днів тому

      ​@@RomeTWguythen there is such a thing guy

  • @SjarMenace
    @SjarMenace 5 днів тому

    who else skips like crazy whe you hear 'this days sponsor' and dont listen to ads at all and is not affected by them?

  • @HillaryNamanya
    @HillaryNamanya 6 днів тому +1

    very sure deepseek is cooking r2 in silence. The next distillation will setback openAI. But completion is good for us consumers. Let them fight

    • @sasa-tg4od
      @sasa-tg4od 6 днів тому +1

      After testing, the 03 is now inferior to the r1.

    • @RedEyeLazer
      @RedEyeLazer 6 днів тому +1

      ​@@sasa-tg4odJUST SHUT UP!!!!!!!

  • @omarmady5582
    @omarmady5582 5 днів тому

    The pagination is weird indeed. In the network tab, you can see that AT LEAST 2 requests are made for each page, sometimes more are made.

  • @alexbowe3411
    @alexbowe3411 6 днів тому +1

    Did you try adding "Formatting re-enabled" on the first line of your developer message to re-enable Markdown?

  • @Schlafen-wx1kx
    @Schlafen-wx1kx 6 днів тому +1

    I just stumbled onto T3 today, and wanted to get signed up but its missing even basic functionality such as a system prompt? i understand you want to run lean and mean, but couldnt you stash it somewhere in advanced? And folders are a must if you are running 20 queries a day.

  • @YixuanLi-h3i
    @YixuanLi-h3i 4 дні тому

    I don't know why you should compare a shelled deepseek clone with the native deepseek.

  • @lost4468yt
    @lost4468yt 6 днів тому +1

    4o is still better for general knowledge, trivia, etc.

  • @SithLordBishop
    @SithLordBishop 7 днів тому +6

    reasonable dad jokes

    • @trappedcat3615
      @trappedcat3615 6 днів тому +1

      The word Reasonable came from teaching a son to eat by reading a story about a bowl. (Read-son-a-bowl)

  • @AlucardNoir
    @AlucardNoir 6 днів тому +3

    The 03 mini prices are either BS they use as a loss leader OR someone forgot to pull the plug on GPT4.

    • @lukasz96
      @lukasz96 6 днів тому +2

      For sure loss leader. They are in panic mode. Alas, R1 is still completelly free, so OpenAI can f off for all I care

    • @josjos1847
      @josjos1847 6 днів тому +1

      ​@@lukasz96 O3 is in the free tier too

  • @bladekiller2766
    @bladekiller2766 5 днів тому

    How are you so good at Advent of code to have pretty good timings?
    Do you have experience with Algorthmic Problems and Competitive Programming, or you are naturally extremely gifted?

  • @ishaat_plays
    @ishaat_plays 6 днів тому +1

    NVDIA vs AMD || Deep seek vs Open AI .... what a strange world we live in

  • @FRareDom
    @FRareDom 5 днів тому

    if it wasnt for deepseek, o3 mini probably wouldnt release for another year, exact thing they did with sora

  • @GhostHack_1
    @GhostHack_1 4 дні тому

    This is the beginning of the plateau. No increase in result accuracy, but making it cheaper. Wild to me that claude 3.5 is still superior to both r1 and o3 when it comes to coding lol

  • @mokoboko2482
    @mokoboko2482 6 днів тому +10

    Guys! What tier do I need to use o3-mini through the API?

  • @emirtunahanalim2748
    @emirtunahanalim2748 6 днів тому

    Google has been cooking with Gemini models recently and adding them to the exact comparison would be very nice

  • @spetz911
    @spetz911 6 днів тому +2

    Their UI is hilariously bad. I can’t agree more with that! 😊

  • @attentioncestpaslegal7847
    @attentioncestpaslegal7847 6 днів тому +2

    9:30 That was a really hard problem.

    • @josephvictory9536
      @josephvictory9536 6 днів тому

      3 medium problems layered.
      Its definitely hard. Thought it would be easier. But the trick is that its a combinations problem not a greedy problem.
      You can greedily get the combinations to reduce space. After that realization its kinda easy. Just a lot of writing. Incredibly fun problem.
      Had no idea recursive optimal pathways could be so different with such obvious and seemingly fixed optimal paths.

  • @Worldkiajoliet
    @Worldkiajoliet 6 днів тому

    I gave 03 mini functional, working, simple code to evaluate. It had improvement ideas that sounded fine, so I asked it to improve. It was like dealing with gpt 3 ...it broke the existing code and provided no updates. After 5 more prompting sessions it still could not even duplicate the existing code that worked. Not sure what the hype is yet. What may I be missing? Thanks

  • @daniellyons6269
    @daniellyons6269 6 днів тому

    15:36 I'm pretty sure that the reasoning UI given by OpenAI is actually gaslighting. The actual reasoning tokens are not exposed to us the user. Instead they have yet another process that is summarizing the reasoning in order to obfuscate their techniques.

  • @nathanbanks2354
    @nathanbanks2354 6 днів тому

    I've been using o3 occasionally, but I still like r1 more for most prompts. r1 tells you when your question is too large for the context window, o3-mini just forgets that you asked a question if it's before 5000 lines of code. o1 answers best.

  • @ПетроБойко-ц3б
    @ПетроБойко-ц3б 6 днів тому +1

    Agent Smith: ... The perfect world was a dream that your primitive cerebrum kept trying to wake up from. Which is why the Matrix was redesigned to this: the peak of your civilization. I say your civilization, because as soon as we started thinking for you it really became our civilization, which is of course what this is all about. (Matrix quotes)

  • @GonzaloGuevaraFreire
    @GonzaloGuevaraFreire 6 днів тому +5

    Ned Flanders sabe lo que dice.

  • @fredshum7521
    @fredshum7521 5 днів тому

    Obviously, why O3 can be out shortly after Deepseed with lower training cost ? O3 incorporated the key feature of Deepseed's code

  • @IvanBrandonOwonoMbarga
    @IvanBrandonOwonoMbarga 6 днів тому

    We should just build a decoder model to convert that fun formatting of he’s (r1) , to markdown or any other formatting. I am pretty sure any basic gpt can already decode in such a way.

  • @shining_cross
    @shining_cross 4 дні тому

    now closed source ai will copy what deepseek has done because they can access it because deepseek is completely open source, then will sell it to the public

  • @divinelyindifferent
    @divinelyindifferent 6 днів тому +2

    What happens when China develops and releases a free version of Sora?

    • @stephenlflf3871
      @stephenlflf3871 6 днів тому +2

      😮

    • @vaibhav5783
      @vaibhav5783 6 днів тому

      server cost will be too much. It won't be free. But it will be open source we can run on our local machine

    • @sasa-tg4od
      @sasa-tg4od 6 днів тому +1

      The Chinese Sora equivalents, Kling and MinMax, far surpass America's Sora in capability. Though not free to use, the United States has already lost ground in this domain of technological competition.

    • @divinelyindifferent
      @divinelyindifferent 6 днів тому

      @ Thank you for letting us know! Very interesting.

  • @alexleo4863
    @alexleo4863 6 днів тому +1

    Let them keep their expensive models to themselves

  • @dibu28
    @dibu28 6 днів тому

    DeepSeek R1 is now very slow. And DeepSeek R1 (Nitro) which is fast is $7 in $7 out.

  • @nikyabodigital
    @nikyabodigital 4 дні тому

    The real ranking.
    o3 mini - deepseekr1 - claude sonnet 3.5 - o1 - o1 mini - qwen 2.5. Used em all qwen is not there yet and gemini even the latest gemini 2 isnt even in the list its worse among ranking at the moment

  • @hendrx
    @hendrx 6 днів тому +5

    They can keep their closed source trash

  • @al2935
    @al2935 6 днів тому

    I tested it for about 4 hours yesterday and for now 01 Pro is just better due to being more compliant and on task when presented with long and complex prompt scripts and tasks. It's not so much about hallucinations at this point, it's more like it selectively ignores parts of the script, even with the most extreme reenforcement. Like, it understands the full context but will do what it wants past a certain point instead of following the full letter of what you're asking for it unless you go back to chunking your answer and go peacemeal.

  • @shahswatpandey5427
    @shahswatpandey5427 6 днів тому

    It's just my opinion,I feel that R1's answer after reasoning is better than o3-mini.
    LIke more detailed and structured

  • @slzzzzzzzz
    @slzzzzzzzz 6 днів тому

    I tested o3-mini (low) [free-tier] and Deepseek R1 on some math competitions. Deepseek R1 is able to solve many problems from the Chinese National High School Math League First Round, but fails miserably on the Second Round (harder problems). On the other hand, o3-mini (low) solves all problems from the Second Round @2 (those I threw to it), but fails on the National Team Selection Test (extremely hard problems). And o3-mini (low) is clearly faster than Deepseek R1. So at least for math, o3-mini (low) is better than Deepseek R1.

  • @ccyberhub
    @ccyberhub 6 днів тому

    Without deepseek o3 would cost $25 per million tokens

  • @ryanlee2091
    @ryanlee2091 6 днів тому

    Can it run locally on my off-grid base out in nowhere? No?
    Good bye. Hi this is your homie Tony from LCSign.

  • @sophieedel6324
    @sophieedel6324 6 днів тому

    Mistral Small 3 > DeepSeek. No normal user has any use for a highly censored model like DeepSeek that needs a giant server to even run it properly.

  • @seye46
    @seye46 6 днів тому

    Can someone tell me which one is better, or do they both have their own advantages?

  • @dltn42
    @dltn42 6 днів тому +2

    Still prefer DeepSeek

  • @jasonchang8601
    @jasonchang8601 6 днів тому

    Why would anyone in their right mind support that kind of scummy behavior when they could have released the cheaper option to begin with?

  • @kellyaquinastom
    @kellyaquinastom 3 дні тому

    Seems like we have to download and train our own. Prime agen is looking at this. Maybe internet of bugs could join. Like taking a 7 year old bright kids and slowly bringing him along. Clearly the way is for good programmers to pick a language like ziggy and group teach a new model. Lots of work.

  • @sebkeccu4546
    @sebkeccu4546 6 днів тому

    I find the naming of open ai extreemly confusing, you have three o3 models mini/medium/high but the mini also has 3 sub models: mini-low mini-medium and mini-high. So now when we are looking at benchmarks, we have no idea what the benchmarks refer to. Especially when you do like the video creator here, naming it "o3" , it is not clear anymore if we are still looking at the o3 mini models, and if yes, which of the sub models. Deepseek seem to be better then o3 mini-mini and mini-medium, but not mini-high (which is currently only for pro subscribers). But offcourse deepseek r1 can be downloaded, whereas the o3 models cannot be downloaded. And when we think of all the dowtime chatgpt has the last few, weeks. It becomes tempting to run it offline. Especially because with deepseek we can use PDF's, for some reason the o3 models don't support files.

  • @lyndonsimpson1056
    @lyndonsimpson1056 6 днів тому

    i think the formating outpout issue is a tell that this was not as polished as they wanted before release and did a rushed release following deepseek fallout. they never wanted to release this for free but have been forced into it. it's a lot cheaper for us but they are probably doing this at a big loss

  • @mdxggxek1909
    @mdxggxek1909 6 днів тому

    Instead of being confused why most of your costs are from claude, you could maybe just make out that claude is really well liked....
    The reason why claude is used so much, is because it is incredibly well aligned for programming. You really feel that they put a lot of effort in their rlhf for programming tasks and it works really well on cursor
    Deepseek r1 is though much better at reasoning about the code, but not that good in creating good nice code without significant prompting

  • @jonklaric
    @jonklaric 6 днів тому

    Do API users (or T3) have to pay for the tokens used in the weird formatting of outputs? Like those weird lines of dashes would presumably consume tokens despite having zero functional value in the output.

  • @benx1326
    @benx1326 6 днів тому

    o3 mini might be cheap but it generates lots of output token for reasoning deepseek v3 is the best compromise for generation

  • @KirowOnet
    @KirowOnet 6 днів тому

    In my test scenario o3-mini solved the problem fast, but R1 spent 10 minutes and gave me code that don’t work at all. All other models I tried also was not able to solve my test task. So o3-mini favorite for now. Have not tried o1 just in case.

  • @joshix833
    @joshix833 5 днів тому

    With o1 you are paying for output tokens that you don't get to see. That sounds like a scam to me

  • @legelf
    @legelf 6 днів тому

    calling deepseek dilluted from their own model and then dropping a model comparable to r1 for so cheap doesnt make sense at all, its like openai is completely falling apart out of desperation💀

  • @sasjadevries
    @sasjadevries 6 днів тому

    Why doesn't t3 chat have QWEN2.5? And why doesn't it have the qwen mini distilled versions of Deepseek?