o3-mini is the FIRST DANGEROUS Autonomy Model | INSANE Coding and ML Abilities

Wes Roth

Published 31 Jan 2025

COMMENTS • 302

@albinobadguy 2 hours ago ⁺⁶¹
This is the year of the Snake, after all.
@brjohow 39 minutes ago
this will cure cancer and ageing, trust me bro, just don't optimize it and pay sam altman and ignore deepseek and free/open models.
@IvelLeCog 10 minutes ago
Kinda profound perhaps?
@Penrose707 2 hours ago ⁺³¹
It was fun watching your snake game get to almost superhuman level. Then to get o3 to build its own functional machine learning model for the same game it created.... Damn Wes
@BruceWayne15325 3 hours ago ⁺³⁵
I used it in real world coding today for about half a day. It seems to be incrementally better than o1 at coding. I had a coding problem that I ran into yesterday that o1 couldn't solve no matter how many times I prompted it. I gave it to o3-mini-high and it also struggled with it, but after about 3 tries it figured it out. I'm very happy with it. It's not there yet as far as something that I can rely on, but it's getting there.
@spinninglink 2 hours ago ⁺¹⁵
Question is, could you solve it yourself? Or did you need o3 to solve it for you?
@hao21291 2 hours ago
how is it compared to deepseek? (if you have tested it)
@autohmae 2 hours ago
What was interesting: in Cursor's tweet they had a small note at the bottom saying that, to their surprise, developers told them they still prefer Sonnet
@telegrphavenuetv 2 hours ago ⁺¹
How do I get o3-mini-high? I'm only seeing o3-mini
@coinholio470 1 hour ago
I haven't tried o3 yet, was disappointed with o1 coding performance so have been back to using Claude Sonnet 3.5 since. Have you used Claude at all and if so how do you think it stacks up to o3-mini?
@RonBarrett1954 2 hours ago ⁺¹⁰
First, the outro was... AWESOME! Stayin' alive, baby!
Secondly, I'm definitely going to watch your video multiple times. There was so much there to grasp. Yes, for coding o3-mini high does feel like, "Whoa, wait, what!?!"
@Pure_Science_and_Technology 2 hours ago ⁺²¹
I just put a single python file of 5,322 lines of code in high mode and had it explain the code and refactor it. wow! This is the best coding model hands down. And it has internet access so you can have it read updated api docs or anything else. It’s an available model in cursor now. 😮
@TheChromePoet 24 minutes ago
I'm no coder or have any idea what you just said, but I'm pumped.
@dennisg.9785 1 hour ago ⁺¹⁴
o3 is incredible. It basically walked me step by step to install Python, then I asked it to make a checkers game... 1st try, and it works! It's definitely an amazing tool. What a wonderful world 😅
@brjohow 38 minutes ago
and if it makes a bug in complicated code you won't have a clue how to fix it.
@De_Ramen_Shaman 29 minutes ago ⁺⁴
@@brjohow found the guy that's getting replaced next
@bestofhacks1514 28 minutes ago ⁺¹
@@brjohow Working backwards from a bug in some code is easier than writing it from scratch, so this still greatly lowers the barrier to entry. Also, a layman's first reaction to running into a bug would be to change the prompt
@A.I.Amplify 1 hour ago ⁺⁷
You're explaining it perfectly. To be able to say something as simple as "create me a game that plays itself"... as well as the code writing itself next to perfect (up until a point obviously)... normal people can ask a simple question... copy and paste a few things... and create something that most experienced coders have difficulty with. Not to mention the fraction of the time to write the code.. with numerous hours of troubleshooting that would be needed a few years ago. This is amazing and just a great time to be alive and have an interest in this field. Every day is honestly like Christmas. Few people really understand that ChatGPT was released like 2 years ago, because the progress makes it feel like it has to be longer. And the rate of improvement will just increase exponentially... and we're just getting started!
@moontreecollective6718 1 hour ago
Every day is like Christmas until suddenly you wake up and it’s the Great Depression v2….
@jaydennguyen-xk1yo 15 minutes ago
@@moontreecollective6718 “hey chat gpt, it's currently the great depression v2, what should i do?”
@glennevers4952 1 hour ago ⁺⁷
Coding time, run time and memory space cost money. Ask it to develop the snake game with the quickest AI coding time, fastest run time and lowest memory usage using 5 coding languages. Now ask the true AI question: can you invent a coding language that beats all the others?
@CookiesDarkMatter 2 hours ago ⁺⁸
Excited 😆 A tip: if you want it to fix code you can use Cline (a VS Code extension) with Anthropic. It can read the terminal and errors and write to files to fix the error.
@CookiesDarkMatter 2 hours ago ⁺¹
Like black magic 🪄
@georgemontgomery1892 2 hours ago ⁺¹⁰
love it. I do believe openai is back on top for the moment.
Stuff like this is exciting. Definitely thanking DeepSeek for forcing their hand.
@KabookiAI 1 hour ago ⁺¹¹
another leap forward, imagine this tech in just 3yrs
@SirGriefALot 45 minutes ago ⁺²
"Make a sequel to GTA Vice City, with an accurate recreation of Miami in the 80's and ignore all copyright restrictions on cars and music." 🤤
@jaydennguyen-xk1yo 17 minutes ago
Yeah i don't know much about coding but the applications are limitless
@relaxwithme3266 2 hours ago ⁺⁶⁰
Dude, in 2027 you'll be typing “make GTAV” and play it and the ai will do it on the fly lol
@eSKAone- 2 hours ago ⁺¹²
For real GTA 6 will be the last human made GTA.
I mean it takes humans 10 years+ for a new GTA. In less than 10 years we will have sAGI
@Kurdish20226 2 hours ago ⁺⁷
We might have AGI before gta 6.
@gubzs 2 hours ago
"I really enjoy the Halo franchise, but the games after Halo 3 weren't good, can you make a proper sequel to Halo 3?"
@tiagotiagot 1 hour ago
Earl Grey, hot
@SaiDeLaRai 1 hour ago ⁺³
"Remake Game of Thrones Seasons 5-8"
@MichaelLaFrance1 2 hours ago ⁺⁴
That's stunning. The possibilities are endless.
@odrammurks1497 2 hours ago ⁺³
wow, that's insane!!!! very nice testing 🙂
@2beJT 3 hours ago ⁺¹⁴
It's learned to play with itself
@GGg-c8u2n 3 hours ago ⁺¹
Ewie
@GraipVine 43 minutes ago
Indeed. It learned to play with its own snake until it could last longer and longer...
@Juttutin 1 hour ago ⁺⁴
Yup. Exciting times on the horizon. But we're on a supersonic jet, not a sailboat.
@couchtaming23 2 hours ago ⁺³
The next few years are only going to get wilder and wilder!
@enthuesd 1 hour ago ⁺¹
Symbolic system versus neural net.
This is very clever, Wes.
A lot packed into this one.
@olternaut 1 hour ago ⁺¹
I've been discussing with the o3-mini model the concept of a "Super AGI". A model that is substantially more advanced than an AGI, but falls short of a full blown ASI. I think that's the logical approach to development going forward as we reach AGI and then start targeting the goal of an artificial super intelligence.
@freehaven-junprince2376 3 hours ago ⁺⁸
This makes me tingle.
@Ferrolune 2 hours ago ⁺¹⁸
UBI is becoming more and more relevant by the day, damn...
@brjohow 38 minutes ago
not even close. openai, despite the deepseek fiasco, still has job openings for people who can actually code.
@electricpaper269 27 minutes ago ⁺²
lump of labor fallacy
@jameslincs 10 minutes ago ⁺¹
UBI will never be U, you won’t qualify because you didn’t vote the right way 😢
@ZappyOh 4 minutes ago
Culling is more likely.
@smanqele 1 hour ago
Man what a trip this presentation is. Totally worth it 🙏
@Diaspora_UA 3 hours ago ⁺⁵
Interesting. Thank you!
@here-ethereal 1 hour ago ⁺¹
this is really awesome! my first session using o3 has been very impressive as well.
you asked for other ideas to play with; I strongly encourage ARC Prize type puzzles. it's a fascinating challenge!
@superfliping 1 hour ago
Great that you use the same test each time; it gives insight into how much it's grown. Thank you Wes 😊
@ccdj35 2 hours ago ⁺⁴
The first thing I did was to make a game. While the code worked 100%, it didn't understand the instructions very well. The reason your snake game test worked so well may be that it has already been trained heavily on that type of query.
@gregorykarsten7350 1 hour ago ⁺¹
Agree, this is a major step up
@Craznar 1 minute ago
I just finished getting it to write some complex SQL Server stored procedures for a job I'm doing.
I chatted to it for a bit, asked it some questions, clarified some things - and now I have a solid stored procedure I can now use in my Delphi code.
Around 8 hours work in around 20 minutes.
It even came up with a better approach after it said there were some issues and I offered it that option.
@adarmawan1977 3 hours ago ⁺⁵
Amazing content🎉 very interesting.
@agenticmark 1 hour ago
pretty fucking awesome to see you training models man. o3 is a game changer.
@jantuitman 15 minutes ago ⁺¹
First time I was really really impressed by a Wes Roth video. I am curious how it will do with coding challenges where the idea is not common knowledge but something new, and you have to explain the idea to the AI
@Di5functi0n3l_playp3n 1 hour ago
Great news. Ty. Just what I needed to hear.
@tvwithtiffani 48 minutes ago
You've gotten me excited 🤸‍♀️ now i cannot sleep for thinking about all of the possibilities
@uss9f 2 hours ago ⁺³
AWESOME VID THANKS
@evdm7482 1 hour ago
This growth reminds me of Siri, whose response is still: ‘I found this on the web for “Siri you’re the worst WTF, how did you even understand that”’
@Holphana 2 minutes ago
a 3D maze that generates infinitely. Low poly assets. A physics engine isn't necessarily required. It could just generate an emulation of a maze.
@eSKAone- 2 hours ago ⁺¹
The "What's next?" moment is pretty sick I must say
@RevealAI-101 2 hours ago ⁺¹
Oh my word Wes, you are literally giddy with excitement
😊
@wwkk4964 2 hours ago
That was an excellent review Wes!
@henrischomäcker 11 minutes ago
This is really impressive!
@jchastain789 1 hour ago
Love the vids as usual mane
@relaxedrelaxed 3 hours ago ⁺³
you're the best, Wes!! TY from husband and I!
@Jacobk-g7r 2 hours ago ⁺³
2:11 i just had a wild thought. Anybody could use this, upload their data such as drone specs and parameters and have it train in the simulation or it could learn how to operate models of cars or how to build models to operate devices. Like build specific models and then upload them and then have autonomous drones but upload something like they have to obey non violent or harmful requests from the admin or something. I’m thinking next level. We all build models and create a library for everyone. Open source model library, the library of Alexander lol
@k1m198 2 hours ago
THANK YOU for doing the AI playing game test! I've been working with o1 pro on a shining force AI! Looks about to get supercharged! You rock Wes!!
@blijebij 1 hour ago
Loved this so much, you're absolutely not insane! o3 mini is the first model I am impressed with! Great promise for the future.
@vmb326 40 minutes ago
Wes - good NextGen quote there my hat is off to you 😂
@Spark877 1 hour ago
This was great!
@wongjimmy9195 2 hours ago ⁺⁴
I feel it's not a big breakthrough (since they have deployed big resources, including people). The main value is in open source (showing how to do this), which gives everyone who wants to get involved in the AI business a chance to achieve this or whatever they want. Forget about which one is stronger or better than the other.
@Zerobytexai 3 hours ago ⁺⁹³
Stop doing the snake game because every top tier model can do that now. It's not even a test. Instead you need to start asking it to create a Super Mario game from scratch and have it play it. Super Mario is just enough of a challenge to push the top tier models to their limits but still make it possible.
My YouTube channel goes DEEP into this.
@jansenncuber8009 3 hours ago ⁺¹²
Agree, it’s more important than ever to test models on things that specifically are not in their training data, rather than one of the most prevalent bite sized “tests” in coding
@Zerobytexai 3 hours ago ⁺⁸
@@jansenncuber8009 Exactly, and I use Super Mario as an example because some models are able to pull it off slightly but just not enough, and deepseek was able to do it barely. Super Mario in python is right at the brink of challenge for top tier model capabilities right now. I'm not sure why we are sticking to the snake game, it's too simple now. As these models become more advanced, our testing methods should become more advanced.
@jeltoninc.8542 2 hours ago ⁺⁷
I want it to make Quake 3 Arena
@aaronhhill 2 hours ago ⁺⁵
It's a baseline. Everybody has done it, so it becomes the perfect measurement of progress. Like prompting Will Smith Eating Spaghetti.
@scroopynooperz9051 2 hours ago ⁺³
Lol I'd be happy with Street fighter 2: champion edition 😂
@17cmmittlererminenwerfer81 2 hours ago ⁺²
It's means it is. Its is the possessive form of it.
@vmb326 14 minutes ago
IDEA => "AI in a Sandbox" is a self-learning environment where O3 Mini generates, executes, and refines AI models in real time. Running in a safe local sandbox, it iterates on tasks like gameplay, navigation, and problem-solving, using feedback loops, reinforcement learning, and a visual dashboard to autonomously improve AI performance.
@kostailijev7489 1 hour ago
Ok, congrats, as a layman, I totally enjoyed this video!
@badca52 1 hour ago
"pretty good script right?"... after a two sentence prompt. Yeah.. yeah Wes, that's pretty fuckin good!
@ricka4678 3 hours ago ⁺²
incredible times we live in
@Sedokun 2 hours ago ⁺⁶
30:32 - "I want to..."
- "No, you don't"
It's amazing and depressing at the same time. A lot of discoveries were made by doing stuff you're not supposed to.
@JanKowalski-nh3gi 1 hour ago
For the last 2 years I've been seeing the snake test; does that mean we'll still be testing this thing with snake in 2030? Amazing progress!
@mrgoober6320 1 hour ago
There are societies (on Earth right now) that are going to leapfrog over a thousand years of technological progress, straight to task-capable AI. We're going to witness what is effectively a violation of Starfleet's prime directive. It was bad enough when we were introducing firearms and cell phones. Just fascinating to consider.
@Wanderer2035 24 seconds ago
Every time they release a new model, hundreds of thousands of people around the world will get laid off
@dadadadada17 35 minutes ago ⁺²
I'm sure the american administration is working hard to ensure that every engineer gets his UBI lmao
@couchtaming23 3 hours ago ⁺³
We're just lifting off. This rocket launch is only the beginning, and o3 is just the first small step.
@Juttutin 1 hour ago
By o5, I expect to be able to train an AI to just comment on my behalf on YouTube. Think of the time savings!
And the lack of going back to edit or the typo!
@couchtaming23 17 minutes ago
@@Juttutin maybe in 2025 they can upgrade to o6... Really don't know.
@sqwert654 1 hour ago ⁺¹
I'm keen on using learning models in my game to create emergent gameplay. After each game (a shooter, say) the gameplay is assessed, then the NPCs get trained for the next game.
@leafdriving 2 hours ago ⁺²
The Game "Asteroids" - but 1 vs 1 - It can train on itself, or you - Actual asteroids optional - include space ship enemies.... like the original - that fits your intelligence window
@peace5850 1 hour ago
Spacewar!
@mihirvd01 1 hour ago
WHAT A TIME TO BE ALIVE !!! 🥺 ALL HAIL THE "REAL" INTELLIGENCE OVERLORDS !!!
@patrickwhite9902 1 minute ago
Wes, I used this model today to build a novel ANN and train it. Pretty much single shot; it's training itself right now using a teacher model. Let's see how it pans out.
@mematron 2 hours ago ⁺²
STAYIN' ALIVE!
@darklodus99 1 hour ago
Oh man, I use Asteroids as a model test instead of snake, just seems like it is more involved. I got mind blown just from o3 making the game. Then I saw your video and said to myself, naa...I gotta try this....I had it make a neural net AI same as you in your video. MY GOD! I just had it do a 1k training run and then had it auto play Asteroids, this is just NUTZ!
@alexanderpoplawski577 1 hour ago
The debugger from Turbo C back in the 90's had a fake option: debug and find error, where it put out some response like, dream on. This now has become a reality.
@ericlori8231 3 hours ago
right on im working on it now and I have a few ideas
for learning
@SFJayAnt 3 hours ago ⁺⁵
Wow!
@GospelProgressionsUniversity 3 hours ago ⁺³
Hey Wes, what happened to your intro music? That was a whole vibe.
@WesRoth 3 hours ago ⁺⁷
I know!
I loved that, got hit with a copyright, even though I had rights to it :(
I need to generate my own intro with AI music, I think....
@frogz 2 hours ago ⁺¹
@@WesRoth seriously, i am more and more amazed what suno and other ai music al gore's can do
@Gabebox 2 hours ago ⁺¹
@@frogz Can you give me a brief overview of where to look for interesting information/results out of audio models?
@Zanthous_ 34 minutes ago
A test you could do in the future would be something like the video "AI Learns to Play SUIKA GAME" if you are trying to get it to build a more sophisticated base game that is still reasonable to do for a future model. It's a step up in complexity from snake and requires physics simulation. For a new game developer it should be easy to make still, and it might still generate okay search traffic
@uw10isplaya 2 hours ago
Bit delayed (from 2 videos ago you mentioned this), but thought you had an interesting question (and the answer is interesting) about Deepseek v3/r1, and the relationship between the two:
Deepseek v3 created -> r1 "reasoning" process creates a reasoning model based off v3 -> r1 gets incorporated back into v3's architecture as an "expert" in math/coding (or any prompt that is classified at the orchestrator layer to benefit from the reasoning process) in its MoE architecture.
It's basically gpt4o and o1 in the same architecture. We should probably expect similar from OpenAI soon, unless they think having multiple standalone models gives off the vibe of a better/more robust product line.
@pikachu-mx6hi 2 hours ago
been using o3-mini-high today.. absolutely like a beast
@RockyFretz 1 hour ago
Awesome!
@nightcrows787 2 hours ago
Hrmm let me go check this thing out ;-)
Thank you for your video.
@phizc 39 minutes ago
22:19 We've had AI/computer tools that can play chess better than any human for quite some time now. Chess tournaments haven't gone away.
@tgray1 2 hours ago
for visuals, i think some kind of 3d game that shows ghost laps of all the training runs would be sick
@NoxAllan 3 hours ago ⁺¹
Woah, alrighty, AGI in ten years time😂
@TheChromePoet 23 minutes ago
If it wasn't for DeepSeek we wouldn't have this model so soon.
@imqqmi 6 minutes ago
A lunar landing game, like the one from Dominic Doty, where you have to program your own auto pilot. I think this would be really interesting for AI to solve and it's visually satisfying to see.
Without AI I got it to a point where it could land even with quite an extreme initial velocity, angular momentum etc. and be efficient with fuel.
@Shy--Tsunami 2 hours ago
Very very crazy!
I'm saving up money to get back into paying for the skool community membership. I'm really hoping to get help with a TTRPG AI that will help DMs get game content ready, but also act as a DM for players who don't have a group and want to play solo in whatever world they can come up with, with the help of the TTGM
@OccultDemonCassette 1 hour ago
I wish they'd release a model that was actually decent at creative writing. Something that would avoid using the word "tapestry" in every other output would be nice.
@peace5850 1 hour ago
Yes, we really need to delve into this problem!
@Jacobk-g7r 58 minutes ago
Just had a better idea than anything i might have had previously, we all use this ai to fabricate models related to tools, household appliances, vehicles, games, unreal engine or other game engine variants, and make a library for everyone and free.
More ideas, farm tools, video designers, i mean imagine a model or something needed to be done. Deciphering languages and sounds and being able to communicate with animals would be cool. And to hack our own genetics and use neuralink and editing tools to communicate with the body. So many potentials. A library of models would be amazing because it wouldn’t need any specific app to run and we could share the files online or maybe ask Nvidia to do this and it could benefit everyone. Imagine the world advancing instead of being stun locked by money and greedy people with low ambitions.
@MCA5EY 2 hours ago
have it build a maze container and do a fluid sim filling it up, maybe add rotation, or fill it with multiple colored fluids and have them mix randomly in the maze as it fills and rotates
@paulharding1172 2 hours ago
Build a Donkey Kong type game. Where AI Super Intelligence throws Nvidia 5090s at Ilya Sutskever as he tries to rescue Sam Altman.
@dannyzaze9126 1 hour ago ⁺¹
I am fairly certain that a few of us would be really interested to see this unfold layered over Minecraft, if it could program in Java. Not sure, but I assume it can. This would be great fun to watch.🤔
@dannyzaze9126 1 hour ago ⁺¹
It could populate the world with its own AI agents. Learn to mine and craft. It could populate the world with its own horror MOBS, creating change in population density. And as this unfolds generationally, add rewards to incentivize expertise and invention. maybe🤔
@VictorGallagherCarvings 1 hour ago
Just think, in the future we won't have to think anymore.
@Crittek 3 hours ago ⁺¹
My dream is that co-pilot is my real time AI desktop assistant and can help do tasks in any software by literally just reading the documentation (if it hasn’t already) That horizon seems to always get closer.
@DJRYGAR1 3 hours ago
you dream about being jobless?
@memegazer 2 hours ago ⁺¹
@@DJRYGAR1
maybe they work for themselves so they are not worried about getting fired?
@DJRYGAR1 2 hours ago
@ if AI is able to read documentation and do the task, it means there is no work. Does not matter that work was outsourced to someone who's self-employed. It disappears.
@memegazer 2 hours ago ⁺¹
@
if the finished product is valuable then it stands to reason that you could still sell it
@mikea3076 2 hours ago
I was just thinking during the video that the current models just start trying to do what is asked and may not give you what you want in cases where a human professional might use their knowledge to ask for clarification on what you actually want before starting the task. You mention the same thing at the end. This would be next level if they can do this.
@robertkeyes258 25 minutes ago
How about - take a photo of night sky and have it figure out the location it was taken at.
@vslaykovsky 2 hours ago
When I see that "AGI" is praised for creating a snake game without bugs, I understand that the software engineer profession is secure, no need to worry.
@Juttutin 1 hour ago
Wes, yes I share your excitement, but you always give the AI the benefit of the _doubt_ when it comes to the stuff that might be called 'common sense'.
In this case, it was the green fruit thing. There is this gaping hole in _all_ the models we've seen so far, that really doesn't matter when there is a human wrangler to provide this thing, that would prevent any child trying to get better at snake by chasing its tail in circles.
I'd have preferred you told it to realise what it was doing, and see what resolution it adopted, for the sake of the experiment, rather than the incredibly human-feeling solution you, its human, had to give it (i.e. no green fruit).
But regardless of all that, an extraordinary demonstration of the cusp we live on. Even if every AI still needs a human to fill in the _doubt_ gap, that doesn't limit the productivity gains that seem to be almost here.
@FuZZbaLLbee 3 hours ago
Soon : Generating RL gyms to create thought chains to train the model on.
Once an AI gets the ability to train itself without human intervention it will get very interesting.
From what I understand AI is already very good at generating reward functions
@DK-yz9xk 1 hour ago
I love deepseek, helping make OpenAI a little more open 😂
@mrd6869 1 hour ago
You want visual, have this thing paired up with Operator and have it complete actions
in your browser.
@RockPowerUSA 1 hour ago
I'm an old timer, I'm sticking with Deepseek. I think it really understands me most, and it has the best sense of humor, when you ask it to respond with humor in its thinking and in its answer. I'm going with the Old Reliable Deepseek for RockPower. I can't keep changing to these wannabes that will keep popping up because that's the cool part of free coding and free sourcing and free viewing and free love.❤
@peace5850 1 hour ago
Lol. Seriously awesome troll. Old. Reliable. Wannabes. Lots of funny word choices to trigger people.
@MojaveHigh 1 hour ago
You should go back to past episodes and pull out all of the snake game making segments and put them into a single video to show progress.
@Jacobk-g7r 51 minutes ago
10:41 imagine every cell phone has an ai model specific to the phone and can use WiFi and Bluetooth to communicate with other devices and communicate with us and be a bridge. Imagine the ai detects and sees using WiFi and could assist like in video games. Seriously lmao. I’m geeking over here bro. Seriously let that image sink in. Destiny, ai companion that is specific to that ghost. Cellphone can have model similar. WiFi vision and scanning capabilities would be on par with gaming maybe better or worse depending how we scan. If we use an ai model like the mini itself then it can generate links between or bridges that connect to and learn on the fly. It’s actually a dream come true and it’s possible. It could connect to cars and computers and all sorts of things. It could communicate with other devices and models and link to operate and share data. Like the car already has a model for operating and stuff but your model would connect and they would temporarily merge and then the companion would disconnect and link broken or separation function to decouple without breaking or corrupting like the usbs and data drives.
@sqwert654 1 hour ago
The AI on my 4 player Mahjong game on Steam is only 40 lines of C#. Don't know if machine learning would play better. But it would be a fun project.
@JeremyTuck-r5k 37 minutes ago
What scares me is this is what YouTubers get access to. Can you imagine what DARPA has?
