Open Source "Thinking" Models Are Catching Up To OpenAI o1 Already...
- Published 11 Feb 2025
- Try Mammouth now for only $10/mo! mammouth.ai
With the release of the full o1 and o3 models, this video is unfortunately already outdated! (I am too slow as usual, oops.) But these are still some really good models which you should keep your eyes on, cuz they are gonna be the ones you are able to use locally in the future ;)
Check out my newsletter!
mail.bycloud.ai/
QwQ
[Blog] qwenlm.github....
DeepSeek R1
[Chat] chat.deepseek....
[X Thread] x.com/deepseek...
LLMs Do Not Think Step-by-step in Implicit Reasoning
[Paper] arxiv.org/abs/...
LlaVa-CoT
[Paper] arxiv.org/abs/...
Marco-o1
[Paper] arxiv.org/abs/...
O1 Replication Journey: A Strategic Progress Report -- Part 1
[Paper] arxiv.org/abs/...
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
[Paper] arxiv.org/abs/...
This video is supported by the kind Patrons & UA-cam Members:
🙏Andrew Lescelius, Ben Shaener, Chris LeDoux, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Marcelo Ferreira, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam, Kadhai Pesalam, Tim Schulz, jiye, Anushka, Henrik Sundt, Julian Aßmann, Raffay Rana, Thomas Lin, Sid_Cypher, Mark Buckler, Kevin Tai, NO U, Gonzalo Fidalgo, Igor Alvarez, Alon Pluda, Clément Veyssière, Sander Zwaenepoel, etrotta
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Profile & Banner Art] / pygm7
[Video Editor] @bhargavesque
[Bitcoin (BTC)] 3JFMJQVGXNA2HJE5V9qCwLiqy6wHY9Vhdx
[Ethereum (ETH)] 0x3d784F55E0bE5f35c1566B2E014598C0f354f190
[Litecoin (LTC)] MGHnqALjyU2W6NuJSSW9fTWV4dcHfwHZd7
[Bitcoin Cash (BCH)] 1LkyGfzHxnSfqMF8tN7ZGDwUTyBB6vcii9
[Solana (SOL)] 6XyMCEdVhtxJQRjMKgUJaySL8cGoBPzzA2NPDMPfVkKN
[Ko-fi] ko-fi.com/bycl...
I am also making a summary on o3, o1 full/12 days of OpenAI, so stay tuned!
can't wait for the o3 video
Forgot to pin it lol 😅
The service looks nice, but a cap of 3 uses of o1 per day on the 10 euro/month tier is a deal-breaker. I'd much rather have a per-week amount of uses like in ChatGPT, because 99% of the time when you need advanced reasoning models (or any models, really), you burn through your uses in short bursts of multiple questions. When I'm working on something, I rarely have the liberty to wait until tomorrow
Otherwise the 10/20 euro a month tier looks nice
@@elgatodelamuerte There is a 10 euro tier?
I checked out Mammouth, which seems fair enough, except for the fact that it has o1-preview and not full o1, so I am waiting for it to get an upgrade.
Comments owe you an apology
DUDE, you are CRIMINALLY underrated.
How long till DeepSeek replicates o3, you reckon?
deepseek is a joke
Not very long actually, they just have to use test time training at scale.
Deepseek failed to make me a snake game in python with deep thinking enabled 😂😂😂
By the time it replicates o3, we'll already have o5 or o7.
@@ThreeChe that's not a reason to not be excited about it?
I like how this video is just outdated immediately after o3 announcement
o3 is a sham, not impressed, you should be glad openai don't have good models and never get agi
There is a gap between o1 and o3 for sure. Not sure why o1 pro isn't even mentioned; should we expect an o3 pro?! 100% on the GPQA Diamond benchmark already?!
@@plouismarie It's only up from here
@@fontenbleau Did you take one look at how it performed compared to other models
@@joonpk867 I don't need to, I'm testing them. Do you know why they don't disclose the nasty stuff? There's no such thing as a helpful assistant outside the filters.
I'm surprised by the results you got on the September 12th to November 27th question. I tried it on o1-mini, 4o, 3.5 Sonnet, and they all got it right the first time. I then tried the exact same prompt locally with llama3.2 and qwq, llama3.2 got it right but qwq absolutely lost its freaking marbles. It produced a 127-line answer (6,580 characters), doubting itself over and over and trying to get at the answer in lots of different ways: "But wait, let me double-check", "Wait, perhaps using the day-of-year method would be better", "Wait, but earlier I got 77 days. There's a discrepancy here." It tried maybe 7 different techniques like counting over an entire year and removing 365 days. It eventually landed on 77 but did notice that it depended on whether it's inclusive. Fascinating.
overthinking final boss
Overthinking, and it has a hard time figuring out its own mistakes. I also encountered this when it made a mistake in the algebraic manipulation step of finding g(x) from (f∘g) with the respective f(x).
If it doesn't notice at first, tell it the right step and it will get it, but yeah, at that point it's just the same as using an intuition model lol.
@@onlyms4693 It's interesting you mention this because I've also had trouble with function composition being recognized by LLMs. Because the use of ∘ as a symbol is often difficult to keep when using an LLM, we use the letter `o` instead and that seems to trigger its own set of problems. I've found that I had to be explicit about the notation each time, like start my prompt with "Consider the notation `fog` as the composition of functions `f` and `g`" - that can sometimes help. Sometimes they don't even understand `∘` or use some other unicode character… overall, consistent and predictable math notation is not quite there yet, it seems. Have you tried different approaches to make sure it understands you? And composition is just one topic, there are many more where it can be difficult to convey the exact meaning.
you know that ai is moving fast when your video is majorly ultra outdated by the time it even comes out
Exactly
Literally
O3 doesnt outdate this video, if thats what you mean
It really is not. What OpenAI has in the works, other companies have as well. Besides, it's Google's Gemini models that are SOTA now.
Savage
@@TheLastElderDragon they probably do, but none of the competitors have released anything
liked for the sponsor. Finally something that is useful.
Qwen is insane locally. Can’t wait to try deepseek R1 😊
The smallest qwen is insanely bad. Since that's all I can run locally I'll stick to 4o
Deepseek v3
We can't expect open-weight models to compete forever on consumer hardware, but that's okay; we're already at a great point.
Companies can release open weight models that can only be run by companies. LLama-405b is open weights.
@ccash3290 Yes, that's what I'm saying. While NPUs, more VRAM, and eventually different model architectures could maybe lessen the gap, I have a hard time believing we won't always be playing catch-up at best.
jewvidia mindset
@@tetros5265 we will always be playing catch up because AI is at this point only limited by hardware. Issues related to training data are now overcome by increasing the computing power available to AI
I won't be happy until I can run an ASI on my 15yo PC.
Love your videos man
Your test of asking the number of days between two dates was a poor test because it doesn't involve the type of abilities that o1 was specifically designed to be good at. o1 is a reasoning model, and counting is not reasoning. I see so many videos and tweets like this, like the ever-present "how many r's are in strawberry" being used to completely dismiss a new model, when meanwhile that model can perform incredible feats of coding, math, writing, or reasoning. If you actually want to properly evaluate a model before discussing it, then use proper tests.
It's really strange how someone can have a revolutionary idea, and then the moment they release it, that work gets picked apart and built on so rapidly. The world starts moving your idea at maximum development speed, and before you know it the original creator of the idea falls into obscurity. Steve Jobs with the iPhone, Ivan Sutherland with VR; Sam Altman with OpenAI is heading that way. The first isn't always the best, and that is how life is supposed to be: sparks of genius, clouded by sparks of efficiency. That's the industrial era for you.
man if i had money id pay to speed that process up. sam altman needs to be forgotten about and made redundant like yesterday.
a lot of this is because first movers stop thinking about innovating and instead start to worry about how to protect their moat, this already happened with sora and now its lackluster when it would've been groundbreaking last year
Sam altman didn't have a revolutionary idea. We were studying AI in the 80s
Altman really isn't that revolutionary. A good marketer maybe and don't get me wrong that is a skill. But I don't really know what you are referring to. We have been talking about AI since the 80s and the base theory was also established then. And Altman has little to do with machine learning let alone with developing the concept of transformers. Both of those where mostly done by google.
The iPhone also isn't revolutionary for the reasons we remember. Sure, it was the first to introduce multitouch, but that is far less revolutionary than introducing touchscreens or apps or any of the modern features we assume are part of a "smartphone". Jobs was a visionary, there is no denying that, but his biggest innovation was introducing the concept of a 'visionary tech CEO' who then markets the shit out of these non-cost-effective products so that they can mass-produce and actually become feasible and profitable.
I’m sorry, did you just say that Steve Jobs and the iPhone fell into obscurity? Did you really mean to say that, or was this a typo of some kind? Also, Altman and OpenAI are nowhere near falling into obscurity. While the rest of the world scrambles to catch up to o1-preview, OpenAI has released o1 full, and o1 pro, and just previewed o3 which completely destroyed every other model in existence on all the most important benchmarks we have. Now the world will start trying to catch up to o3, and by the time they do OpenAI will be releasing o4, or o5, or something new entirely. I really don’t understand how people can have these kinds of takes.
R1
Love your channel
Thanks for the great content 🎉
LLaVA-CoT's stages really reminded me of the OODA loop. Interesting.
If someone is basing their product only on copying someone else then they will never 'catch up' they will always be one step behind at least.
it's open source, weirdo
it's been too long; your analyses and jokes are too good to pass up now that o3 has been announced
2:20, doesn't 4o have tools and code execution to do it instead?
yeah
I love your videos so much
O3 is already a thing I’m pretty sure Sam Altman is still ahead
it was literally announced yesterday, it is practically AGI unless they shift the goalposts again.
@@jwilliamcase it's not 😂 or Elon with all his Starships would be obsolete
@@jwilliamcase its nowhere near AGI, because the very simple goalpost that was set a very, very long time ago has yet to even be approached. The hint is in the name; General. It is still just a language model and very little more. But it's definitely closer
Even if it is AGI, it costs $6000 per prompt💀
there is no moat
The number of days from Sep 12 to Nov 27 is 77, inclusive.
The number of days between Sep 12 and Nov 27 is 75, exclusive.
I’m not sure how you counted
GPT-4o keeps counting exclusive-inclusive, even though it says it’s counting inclusive, so it’s wrong. You can make your question clearer by saying inclusive or exclusive explicitly.
He got the number by counting the days the way people always count days, i.e. exclusive on one end and inclusive on the other. This is evident by the fact that people will say the interval between today and tomorrow is one day.
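The inclusive/exclusive ambiguity in this thread can be checked directly with Python's `datetime`; a quick illustration (the specific year is arbitrary here, since September through November doesn't touch a leap day):

```python
from datetime import date

# Plain date subtraction counts exclusive on one end, inclusive on the other,
# i.e. the everyday way people count days between dates.
diff = (date(2025, 11, 27) - date(2025, 9, 12)).days
print(diff)      # 76: the "standard" answer
print(diff + 1)  # 77: inclusive of both endpoints
print(diff - 1)  # 75: exclusive of both endpoints
```

This matches the two counts quoted above: 77 inclusive and 75 exclusive, with 76 as the plain difference.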
Man, I would love to get my greedy hands on o3. I am really looking forward to the QwQ iterations.
*greasy
+1 for "Schwarzschild"
-1 for "Reimann" (sic)
It would be fun to rephrase your question to "Q. How many days are between January 12th and March 27th?" and see if it picks up on the fact that it would need to know if it was a leap year or not.
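For what it's worth, the rephrased question really would require knowing the year; a quick `datetime` check (illustrative, not from the video):

```python
from datetime import date

# Jan 12 -> Mar 27 crosses February, so the leap day matters.
print((date(2023, 3, 27) - date(2023, 1, 12)).days)  # 74 (2023, non-leap year)
print((date(2024, 3, 27) - date(2024, 1, 12)).days)  # 75 (2024, leap year)
```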
man if you released this video two days ago it could still somewhat hold up. o3 has changed the entire game now
Experience so far shows that you can be relatively confident that the competition will have caught up in a few months. So yes, the specific models in the video are probably no longer the best, but the situation itself will soon repeat itself.
One month later: Lmao
@@manuffls1756 Who will win the super bowl please bro
Another proof CloseAI is 90% hype
o3?
With the overvaluation and the insane AI investment bubble going on everywhere (along with content creators addicted to *shocking* AI news), I don't believe any news, to be honest, especially not if benchmarks are broken (again). Benchmarks do not represent skill or reality at all. (By not believing, I mean yes, the scores are real [probably], but that doesn't imply performance.) This and other channels who know what they are talking about have already pointed out that all benchmarks are specifically curated and presented in those papers to show "progress and good results". I think the whole o1 approach is beating a dead horse. Most people don't know that OpenAI has a big team of 5000+ people who *daily* write specific training data etc. for the models, but if your "AI" is based on statistics you will play an endless game of whack-a-mole, because any real-world problem has basically endless nuances. You can't memorize or reason about every aspect of reality, so both approaches have fundamental flaws and, if at all, will be a small part of a true, strong AI system.
You’re silly if you’re still clinging onto the “bubble” theory. It’s not a bubble. Wake up.
@@therainman7777 Though the core technology is useful, it's still overvalued currently.
VERSES AI (VRSSF) killed all other AI versions in a challenge
I'm sticking with my prediction of AGI by 2026
i counter bet 2025
yay now we weeb can hallucinate more realistically
My prediction before o3 was somewhere in the 2030s but now I think it's somewhere before that
@@DistortedV12 u cooked i counter bet 2026. This is the year
We already achieved it
I tried the DeepSeek model; I think it is better at some tasks, but I don't think it is as impressive as it sounds
The Open Sourcemen
The deepseek r1 dissing aged like rotten milk 😭
o3 is fine if you want to spend over $1000 per task. I wonder how these open-source models would perform if you also just spent over $1000 per task.
The price will go down soon, compute is slowly becoming more affordable.
What a time to be alive.
I see that everyone here is writing about the new o3 model while it is not even publicly available yet. So far all we know about this new o3 model is a handful of propaganda, some closely directed footage and various charts that don't really mean anything.
The model itself is only supposed to go into general use in a few months, but whether that will be the case remains to be seen. Let me remind you how many times before such announcements were just idle talk.
what about BLT and COCONUT?????...
I'd like to see dedicated AI tools for programming that are optimized based on the type of task.. frontend, embedded (C, HDL), etc. And under the hood, are AI tools like Aide IDE, Devin ai, Cursor ai, codeium windsurf ai, and Pair AI) creating their own models, or do they leverage existing tools (OpenAi, Grok, Gemini, Claude, Deepseek, qwen-ai, llava cot , marco-o1, Amazon Nova?)
What programming AI might work best in creating an Augmented-Reality App for Android (using Flutter), or at summarizing and answering questions on a 3-hour YT video that DOESN'T have a transcript (ex. JRE #2237)?
Chinese models are rapidly catching up to American ones; I'm sure they'll achieve Claude's level in 2025. The latest DeepSeek is amazing. I've tested it offline using 270 GB of RAM in Q8, and it was the only open model which was able to repair and complete code. Qwen and Marco-o1 weren't able to, but those are also very high-quality models, much smaller than DeepSeek.
mammouth is an animal with a different way of saying its name :)
Can't recommend mammouth; it gives errors: "Unknown error. There was probably and error with your prompt but we couldn't figure what specifically given the provider response."
What about QwQ?
Idk bro it just randomly started speaking mandarin
Gemini Flash inserted some random Russian words into our conversation for no reason yesterday
We craft AI solutions that make a difference 😉
Conscious Quantum Vacuum:
Let's propose a mathematical framework for this concept:
Define a conscious vacuum state |Ω_c⟩ as a superposition of all possible vacuum fluctuations:
|Ω_c⟩ = ∑_i c_i |φ_i⟩
Where |φ_i⟩ are basis states representing different vacuum configurations, and c_i are complex amplitudes.
The "thought process" of the vacuum can be described by a quantum operation:
T: |Ω_c⟩ → U|Ω_c⟩
Where U is a unitary operator representing the evolution of the vacuum state.
This framework suggests that quantum fluctuations are not random, but part of a coherent, conscious process. This could lead to new approaches in quantum field theory and cosmology, potentially explaining phenomena like dark energy and the cosmological constant problem.
I kinda get it, but the math is too complicated and I'm too lazy. If only there were YouTube videos that explained today's consciousness research, like Hoffman's, so even mortals like me could understand it intuitively.
"quantum fluctuations are not random, but part of a coherent conscious process" ahaha
You had me in the first half, then I realized it's bs
Hi BycloudAI, I wanted to ask you: how is the way humans think through problems different from AI? Why are humans so much better than AI models at understanding problems that are outside of what we already know? Is it because AI models rely on 0s and 1s and are discrete, while neurons carry more continuous information? Or maybe just that AI models don't have enough data and complexity yet compared to human brains? I think I'm just still confused about exactly what the difference is between human reasoning and AI reasoning and why AIs struggle.
No worries if that's a convoluted question, I don't know a whole lot about the field :). Awesome video though, I enjoy your content!
No one knows at all.
@@cherubin7th hahaha makes sense
o3....
r1....
Crazy to think o1 is already outdated.
nice video, im curious how long this kind of video style editing takes...
I hope hes using AI cause it looks like a lot of effort
@@IsZomg maybe the whole channel is just a AI bot being meta 🤷♂
@@Deepish-io Its a bit too much for me tbh
Days and time always mess it up
WAS THAT THE-
Ok, so they're catching up to o1...
Oh look, here comes o3!
😂😂😂
My model can eat when it's hungry ahah
please don't refer to System 1 thinking like it is truly a thing. There's no proof that thinking is truly categorical just because the problems are categorical.
Think about it as thinking fast and slow
@ oh, that’s so smart.
Think about walking fast and slow
79% = About half of it
Unfortunately this is a tiny bit of light in a world of stupid people. 99% won't see superintelligence coming right before it happens, just like the 2008 financial crisis. Unknowing people are the loudest, those who neither understand tech nor reason from first principles. There is so much darkness in the world right now; we have to struggle through it just a bit longer, and either the event will kill us or light will come.
And you're so certain that super intelligence will bring "light".
Lol.
How dare open source think they are catching up
Trust me they will 😉
yeah but as great as open source is, no one has the building's worth of GPUs required to prompt good models.
Sorry its pronounced Mah-Muth not mahmoooth
So much corporate bootlicking in this comment section
rEImann hypothesis 💀
Even if these were remotely close to o1, o3's high-compute mode won't ever meaningfully be replicated by open-source models. Why? Because if a model like o3 requires $3k-$5k of compute per task, as shown by their demo, nobody is running that model on their computer.
By the time anyone even comes remotely close to o3 with something "open source" (that nobody can run, so people have to pay a provider an absurd amount of money to run it on a cloud server), OpenAI's latest reasoning models (the o#-mini variants) will be smarter, faster, and cheaper anyway.
So this perspective is worthless and you shouldn’t be hopeful in open source. Just practice utilizing the SoTA closed-source and dominate the industry rn.
Bro o3 is out
7:38
3:05
yeah, they can't really.
1:34 correctly answered by gpt-4o
Just me or does that thumbnail oddly scream horSEMEN with the framing?
Probably unintentional but got a ??? from me at a glance haha
Watching too much porn bro. Gross. Hope you don’t have a girlfriend or children
4 Horse-men 😀 4 Hor-semen 😶🌫️
none of them come close to o1-preview. i've tested all of them on close to 30+ prompts. what is this video
OpenAI: *laughs in O3*
qwq is not great
What is the best open-source, self-hosted website to run Qwen 32B on a GPU? I already have two 24GB Tesla M40 GPUs.
Can we get o1 pro mode with mammouth?
This video format is hard to understand, feels like word spitting at times
You should compare DeepSeek to Gemini 2.0 Flash Thinking model
o3 LOL
so its not mid anymore?
13th view
Delete this video, o3 exists
delete comment r1 exists
@sebirocs r1 is not anywhere close to o3, it's not even at o1 pro levels
Shitty version of The Code Report