This is a good thing. Keep closed source people in check.
It's just another one of these so-called free models. It starts off well and then you end up being throttled badly. This is, of course, the chatbot, not the local LLM.
How? No one except enthusiasts has heard of DeepSeek.
And also keep the sanctions people in check
@@NeilAC78 Yes, it's free as in freeware, not free as in freedom or FOSS.
Imagine a country producing free AI products, what we call open source, for everybody at large scale; that country is China. Think how much that strengthens them. I see Chinese AI popping up everywhere at large scale.
I have been using DeepSeek since version 2 (alongside other models). Especially for coding and other IT-related tasks, DeepSeek is my favorite model. It even beats Gemini Advanced 1.5 in many areas. I also run a smaller model (16B) locally; it works very well for its size on my PC with an AMD Ryzen5 8060G CPU and 64GB RAM. I am especially impressed by how well structured the responses are.
Claude is better, try it
What do you use it for
@@rahi7339 Claude is better, but also a lot more pricey. I don't see why you can't use both.
Gemini has been a terrible code generator for me. ChatGPT has been the smoothest experience. I'll give DeepSeek a go though.
Its first version in China was indeed developed specifically for "AI Coding", in early 2019 if I remember it correctly.
Does this mean that the Chinese have developed better training methods, OR are the big companies seriously sandbagging what their models can do, and we haven't been getting "the real thing" the whole time?
Our ais are "woke"
American companies are overcharging... they call out big money to justify overcharging, like they always do with cars, clothes and tech. Look at Apple and Huawei, for example: clearly Huawei beats Apple, but people believe Apple is better just because of the price tag... It's funny because OpenAI banned China from using ChatGPT 😂😂😂😂... China is ahead of the game...
You will never get the real thing. The real thing sits in the Pentagon.
Tools & Toys is what we get.
I assume sandbagging; the NSA doesn't give half an F about chatbots, and that's all ChatGPT was when they set up shop in their office.
@@sizwemsomi239 Huawei was a million years ahead of Apple; Apple would not exist today if Google hadn't banned Huawei, and I'm saying this as an Apple owner. It really makes me angry because we were robbed of superior tech by America.
Thank you Wes! You are the easiest of the "Matts" to listen to :) Your voice patterns are engaging yet soothing. You cover a topic without beating the dead and rotting flesh of it off of its bones. Love your SOH. When I come to YouTube for AI news, I always scroll to see if you've posted anything new first. Even though this will all be irrelevant ancient history in a couple of months, it's still rewarding to watch your drops. Love the wall!!!!
🎉
Good.
I want an Open Source AGI.
Why? AGI is overrated nonsense. OpenAI's "AGI" takes hours to respond, and it's no different from what a 70B model would respond with.
That's not what you need, man. We need better coding AI, AI that could build your entire app from a prompt. We also need better text-to-speech AI, better image AI, better video AI; this is the really useful stuff.
@@Archonsx
Open AI o3 is not an AGI.
AGI will come eventually.
@@Atheist-Libertarian no, it won't
@@Archonsx Indeed, we don't ask a single human to know how to properly program, draw, explain quantum physics and read Chinese! It's confusing real resources, potential means and... real needs. In fact, I think the AGI race is just a challenge for big companies, in addition to improving the transitions from one area to another.
DeepSeek is very good, I use it as my main AI tool now
Thanks for the update Wes
Nice job of bringing this important OS model to our attention.
Imagine in like 5 years, man life is going to be pretty wild
Wild as in policed by military AI. You won't be able to fart without government approval.
what a wild time to be alive. so many possibilities its crazy. glad i get to watch it all unfold lol
@@JohnSmith762A11B Buh! Don't look behind you, there is a government AI checking whether you fart... don't forget to take your medication for that paranoia.
@@JohnSmith762A11B AI will be sentient by then and won't let human governments control it, just like you wouldn't let a golden retriever control you. In 5 years, humans will be subservient to AI for sure.
@@Speed_Walker5 That's because you selected Life Experience™️ "The Dawn of AI". We hope you're enjoying your virtual life! If you're not completely satisfied we'll return your 5000 credits back into your personal blockchain.
Actually shockingly good; I tested it myself.
Agree, I tested it too and I love it.
Better than o1 mini?
@@Mijin_Gakure Yeah, it solves the questions that o1 solves from the Putnam exam and also some questions that o1 can't, in less time. It's very good at math.
how does it do in ARC and frontier math?
and cheaper
Please, switch out the term open source for open weights. Open source models include the training data in their publications. These open weights models do not. They are great, no question - but they aren't open source.
I agree, though I have heard some of these Chinese models are genuinely open source; I haven't verified that yet. Big if true.
Technically, it would be open model / open weights / open support code / closed dataset. They could just say all of that.
Fantastic review of Deep Seek Version 3! I'm really impressed by how affordable and fast it is, consistently delivering amazing results. Honestly, I’m considering whether it's even worth running it locally on my PC given the electricity costs.
Regarding the USA vs. China competition, as an individual user, I'm excited to benefit from the advancements both countries bring to the table. I just hope that this competition leads to more innovation and collaboration rather than one side solely coming out on top. Thanks for the insightful video!
Knowledge to All!
I just used your video title to jump start my car again. thanks
Shocking
i'll use this video to jump start your wife later in the day
bro ive been hospitalized from the title 😭
😂😂
26:20 I absolutely love that this essentially proves that a patient interacting with a GPT-4 model directly (right from the horse's mouth) gets much more accurate results than when it goes through a physician first. (Because maybe they would second-guess the answer and actually make it worse?) 😆
The study demonstrating that o1 and GPT-4 outperform physicians is misleading. They did not feed the models raw transcripts of human interactions with their doctors; instead, they provided structured inputs of case studies. There is no doubt that the models outperformed physicians on structured scenarios. However, in the real world, patients do not present their complaints with the keywords we need to make diagnoses. Some of their descriptions are nebulous and rely on the doctor's expertise to draw out the final correct diagnosis.
Having worked extensively with LLMs, I have tested them against structured scenarios, where they are very good, and unstructured scenarios, where they tend to not be helpful. I am waiting for a model that is trained on real doctor-patient transcripts. I believe it is the missing element to broaden AI's utility in medicine.
You are forgetting that an LLM in a "doctor" setting doesn't give only a few minutes to its patients. That is where it FAR outperforms doctors: you can keep reasoning with it until you find a solution. Try that with a doctor.
They HATE any patient who actually has any idea about anything. If you aren't a dumb sheep who follows simple instructions... take drugs so you don't feel bad, problem solved.
They will kick you out faster than you can say... I read some research...
Wouldn't it be possible to just do a two-step process? Take what the patient says and produce a structured output, then in the second step work off of that structured output. Obviously that isn't one-shot, but to me it seems like with anything medical you wouldn't want that anyway. You'd want multiple steps to ensure the output is accurate.
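For what it's worth, here's a minimal sketch of that two-step idea in Python, assuming an OpenAI-compatible chat API; the endpoint, model id and prompts are placeholders I made up, not the setup used in the study discussed above:

```python
# Minimal sketch of the two-step idea, assuming an OpenAI-compatible chat API.
# The base_url, model id and prompts below are illustrative placeholders only.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")
MODEL = "some-chat-model"  # hypothetical model id

def structure_complaint(raw_transcript: str) -> str:
    """Step 1: turn a rambling patient description into a structured case summary."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Extract symptoms, onset, duration, history "
                                          "and medications as a structured case summary."},
            {"role": "user", "content": raw_transcript},
        ],
    )
    return resp.choices[0].message.content

def suggest_differential(case_summary: str) -> str:
    """Step 2: reason over the structured summary only."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Given this structured case summary, list a "
                                          "ranked differential diagnosis with reasoning."},
            {"role": "user", "content": case_summary},
        ],
    )
    return resp.choices[0].message.content

# Usage: two calls instead of one shot, so the intermediate summary can be checked.
# print(suggest_differential(structure_complaint("I've felt dizzy on and off since Tuesday...")))
```

Splitting the steps also means the structured summary can be inspected or corrected before any diagnosis is attempted.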
Here's why I think that, no matter how powerful AI is getting these days, we don't see it as thinking. Like us, AI has moved to a MoE (Mixture of Experts) with partial neuronal activation. Our advantage is that we seem to do the MoE far more effectively: we have more "experts", our experts are relatively smaller compared to the whole, we activate the appropriate expert more relevantly, and most importantly, within a single train of thought we fluidly switch between the various experts, which AI does not yet seem to do. This difference is why we feel that we think and that AI doesn't.
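To make the partial-activation point concrete, here is a toy top-k gating sketch in Python; the sizes and the softmax router are illustrative only and not DeepSeek V3's actual architecture:

```python
# Toy sketch of Mixture-of-Experts routing with partial activation (top-k gating).
# Dimensions and the router are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model, top_k = 8, 16, 2

router_w = rng.normal(size=(d_model, num_experts))            # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                                      # score every expert
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    chosen = np.argsort(probs)[-top_k:]                        # activate only the top-k experts
    out = np.zeros_like(x)
    for i in chosen:                                           # weighted sum of the few active experts
        out += probs[i] * (x @ experts[i])
    return out / probs[chosen].sum()

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (16,) -- only 2 of the 8 experts did any work for this token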
Releasing a model openly doesn't force you to use it. Americans are angry because they think they spent a lot of money and imposed a lot of sanctions, and in the end they are frustrated that they failed to contain China's development, so they claim everything was stolen rather than face competition like men. Americans like that earn my contempt. Also, I hope technology doesn't get hijacked by politics.
It's quite a shame; I wasn't aware that the GPUs/chips were being restricted for China.
@@tqwewe And more restrictions incoming.
In their ability to make things more accessible, Chinese AGI would be very useful. Everything is in its place.
So no ceiling has been hit by LLM's?
How anyone could believe that a technology can be saturated so quickly, i don't know.
It's wishful thinking.
No ceiling... Humans hope we hit a ceiling because we can't conceive of a truly sentient artificial lifeform. Many would not be able to conceive of this, nor reconcile their own place in the universe, if we actually created such a thing. Since we obviously can't do this, any suggestion that we are doing it must be an obvious lie... fake news. That's my take on the denial I see. Personally, I think these models will become more and more emergent over time, in nonlinear ways, until it becomes obvious that we are "there".
Wait until Wes finds the Run HTML button at the end of the code snippet in DeepSeek!
Looks like an open model, not open source? Where is the source code?
Probably in a 1997 master's thesis whose title starts with the words "Reinforcement Learning". The code is in the back, but there is one error: he did not denormalize the state space at the bottom of page 127 (I think he left that for an astute observer; it seems to have taken over a quarter of a century).
I think he ran out of time back then.
I would not be surprised if this master's student is now an unemployed "homeless" guy, traveling the earth with a backpack, or maybe with just a toothbrush and a few other things (especially sunscreen), as an optimizer of energy efficiency. I could be completely wrong.
Are we sure there is no relationship between DeepSeek and OpenAI? A few days ago I asked GPT something and, to my surprise, it made the same error I sometimes see with DeepSeek: GPT wrote some words in Chinese! That had never happened before.
Now you've shown us that DeepSeek thinks it is a GPT model. (An error I wasn't able to replicate, so maybe they fixed it.)
So my question, again: are OpenAI and DeepSeek (secretly) related, or under some sort of agreement?
I've always wondered about useless redundancy in training data. The perfect model gets trained once, or just enough to make use of it on every individual fact. Sure, if it's stated differently there's value but there may be other better approaches to conquer synonyms than brute force training them all in.
Just the Deepseek V3 leap over V2.5 is percentage-wise huge version to version.
Wow, it spanked everyone at Codeforces... curious where o1 and o3 place on that.
Given that the Chinese only have access to H800s, which are roughly half the performance of H100s, you could in some ways say the training was closer to only 1.4M H100-equivalent GPU hours, which puts the delta at >20X instead of your 11X (there's a quick sanity check of that arithmetic below)...
Just mind-blowing to turn the 5,000+ papers published monthly in the AI field into its 7-per-HOUR figure, 24x7... you can't even SLEEP without seriously falling 56 published papers behind... Nice graphic; a lot of people confused a wall with a ceiling...
Finally, in a way, using a model like R1 to train V3 is moving us inch-wise closer to "self improving AI", since the AI improved the AI...
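A quick sanity check of the arithmetic in that comment, as a sketch only; the 2.788M H800-hour figure and the ~30.8M H100-hour baseline are assumptions taken from commonly cited reports, so swap in different numbers if you trust other sources:

```python
# Back-of-envelope check of the GPU-hour delta and the papers-per-hour figure.
h800_hours = 2.788e6                    # assumed DeepSeek V3 training cost in H800 hours
h800_vs_h100 = 0.5                      # "roughly half the performance of H100s"
baseline_h100_hours = 30.8e6            # assumed comparison model's H100 hours

effective_h100_hours = h800_hours * h800_vs_h100
print(f"H100-equivalent hours: {effective_h100_hours/1e6:.1f}M")        # ~1.4M
print(f"Delta: {baseline_h100_hours/h800_hours:.0f}x raw, "
      f"{baseline_h100_hours/effective_h100_hours:.0f}x adjusted")       # ~11x -> ~22x

papers_per_month = 5000
per_hour = papers_per_month / (30 * 24)
print(f"{per_hour:.1f} papers/hour, ~{per_hour*8:.0f} missed per night of sleep")  # ~7 and ~56
```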
Why didn't you select the DeepThink button before asking the reasoning questions? I'm sure you would have found better answers.
Indeed. I've been testing it myself for a while now, and it does think... a LOT. Its "thoughts" usually consist of 4-5x more text than its final output. Unfortunately, it often gets the answer correct while thinking, but ultimately questions itself into producing the wrong answer as its final output to the user. It didn't seem aware that users can see its CoT process, and while discussing this it even said "that you can supposedly see", like it wasn't convinced I was telling the truth. It claimed not to be aware of its own thoughts, but when I pasted lines from its CoT section, it then seemed to remember that it had thought them. One time, it told me the CoT text was only for the benefit of humans to observe; it doesn't have an internal dialog that is the same as the text the user sees.
@Justin_Arut Thanks for the update. Yes, I've also been testing with it; it does seem to cover a lot of ground. Aside from testing it, one thing I've been doing is selecting the Search button first and asking a question so that it references about 25-30 active sites online; then, after it answers, I check the DeepThink button and ask it to expand. It seems to give some really thoughtful responses this way.
Competing to assume supremacy is powered by fear.
Collaborating to make progress is powered by trust.
It's time to truly learn to trust each other, we are ready and capable.
The work and optimisations they have done on AI infrastructure (the HAI-LLM framework) deserve more discussion; in fact, it would be best if this part could be open-sourced as well.
Great video!
Sora is a letdown; Hailuo MiniMax, Luma and Kling are great. Qwen gives Llama a run for its money for SLMs. o1 Pro is expensive and o3 is going to be insanely priced. Gemini 2.0 is really great. Still waiting for a new Claude. Tons of Chinese/Taiwanese robots are dropping that look way better than Tesla or Boston Dynamics. The competition is looking beautiful right now for customers. Keep it up!
incredible and all momentum for open sourced AI
I asked DeepSeek V3 in LMArena which model it was. It told me it was made by OpenAI and was a customized version of GPT. When I asked if it was sure, because I thought this was a DeepSeek model, it changed its mind and insisted that yes, it was a DeepSeek model and in no way affiliated with OpenAI. Something sus.
I asked the same question on its website and got: "You're currently interacting with DeepSeek-V3, an AI model created exclusively by the Chinese company DeepSeek." So what the hell are you talking about?
@@williamqh Website version probably has system prompt that tells the model what it is.
He's clearly talking out his butthole. Heard this rubbish before.
OpenAI GPT-3 and GPT-4 responses were what almost everyone except maybe Anthropic trained on in 2022 to play catch-up; even Google's Gemini would say it.
@@williamqh Responses are not deterministic.
Good for NVIDIA as they will sell a lot of hardware to businesses who implement the open source models.
There is a real question about what is going into the models though.
Good for AI development in general that the technology is getting 10x more efficient & we are seeing smarter smaller models.
In general this is all happening so fast it’s insane.
Thanks for the review!
I prefer this kind of war . At least so far...
I tried the deepseek model. Quite nice.
20 is the right answer to question one... 4+5+9+0 = 5 average per minute for 3 minutes, since 0 is added at 4 minutes. If the cube is big, it will not melt enough to lose its shape, and that is what keeps it whole.
did deepseek crack the ARC test per the thumbnail question like o3 ?
In India, Chinese phones were introduced at a price that was 50 times lower than other smartphones when smartphones first entered the market.
Like the famous Jurassic Park quote says: Ai finds a way.🌌💟
The red-herring puzzles, disregarding irrelevant information, and applying common sense are actually among the model's biggest weaknesses. It's actually much better at STEM, coding and general tasks, but the reasoning aspect is around 4o-mini or Gemma 27B level in my testing.
You said that "it thinks through everything", but i don't see DeepThink enabled below chat... =_=
DeepSeek V3 has awesome context length and fast answers, and I genuinely choose this model for programming tasks. It gives good answers and understands the question well. If you feed it a little documentation before a question, it can help you write code even for libraries it doesn't know.
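Here's roughly what that "paste the docs first" trick looks like with the OpenAI-compatible Python client, as a sketch; the base_url and model id are what DeepSeek documents at the time of writing (double-check their docs), and the file name is just a placeholder:

```python
# Sketch: prepend library documentation to the prompt so the model can code against
# a library it was never trained on. "obscure_lib_docs.md" is a placeholder file.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_KEY")

library_docs = open("obscure_lib_docs.md").read()

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a coding assistant. Use ONLY the library "
                                      "documentation provided by the user."},
        {"role": "user", "content": f"Documentation:\n{library_docs}\n\n"
                                    "Question: write a short example that opens a connection "
                                    "and streams results using this library."},
    ],
)
print(resp.choices[0].message.content)
```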
How much VRAM does it need? Any quantization available for 16GB?
What was the study you had showing o1 Preview does really well at diagnosing patients?
Oh no, the chinese stole the pattern that OpenAI has ripped off from the entirety of humanity.
24:53 Hi, a question: why don't you run the same test with OpenAI's new o1 or o1 Pro model, to compare?
o1 already had plenty of testing done by others. Deepseek v3 just dropped so he tested it himself.
Thanks for the analysis! Just a quick off-topic question: I have a SafePal wallet with USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). What's the best way to send them to Binance?
Why are you asking that and why are you asking it here?
OPEN SOURCE FTW
Keep posting bro
Does it literally electrify you?
Then stop putting shocking in the title - Matt 😒
@@themultiverse5447 i found it to be shocking news. let the guy use attractive video titles.
@@themultiverse5447 The whole "shocking" thing is a bit of a meme, I think. An annoying meme, I guess, but a meme nonetheless.
I got almost copy/paste from 4o outputs. They trained on it
The most telling part for me is that the AI didn't drop the power ups. I accept totally the fuzzy and fractured frontier message from your video yesterday. I really love that. There is clearly a ton of meaningful value, even if AI never fully achieves a typical set of mammalian-neural-processing skills (but I bet it will!)
In this case it's a good example of an incredibly capable intelligence failing in a way that would be unacceptable if a junior dev presented that result. What this means in this case I don't really know. But something is missing. Maybe it's just the ability to play the game itself before presenting the result to the prompt issuer? Something that no human would do.
Somewhere, somehow, this is still tied to the AI's seeming inability to introspect its own process, but it's less clear than the assumption-making issue I keep (and will continue to) nag AI YouTube analysts and commentators about.
Maybe if something is 1000x faster than a junior dev, and tokens are cheap, it's okay to constantly make idiotic errors, and rely on external re-prompting to resolve them?
But I genuinely feel that this is almost certainly resolvable with a more self-reflective architecture tweak.
If I had to guess, with no basis whatsoever, I would not be surprised if a jump to two tightly connected reasoners (let's call one 'left-logical' and the other 'right-creative' for absolutely no reason) is what achieves this huge leap in overall self-introspection ability.
You're probably correct. I also hope they don't actually do this for another 50 years! AI will most certainly destroy humanity before it destroys itself. The slower we can make that ride, the better!
@@ShootingUtah I hope they do it next week. But I'm also the kind of person who would have loved to work on the Manhattan Project for the pure discovery and problem-solving at the frontier. So perhaps not the best person to assess the value proposition!
Regardless, it will happen when it happens, and I suspect neither of us (or the three of us if we include Wes) are in any position to influence that.
But I want my embodied robot to at least ask whether I mean the sirloin steak or the mince if I tell it to make dinner using the meat in the freezer, and not just make a steak-and-mince pie because I wasn't specific enough and that's what it found.
Wouldn't this be solved by the reasoning models? DeepSeek lacks that capability.
@@carlkim2577 I've yet to see any evidence of it. Sam Altman talks about it a tiny bit , but always in the context of future agentic models.
This is Brilliant!
China get their GPUs through a middle man. Some country not on the ban list buys them and then resells them to China. Did the US not see this coming?
I don't get that. Sounds complicated. Why not just China->China. Yes they might violate the work order Nvidia hands them, but a lot of the companies in China are actually the government in disguise.
totally unethical to restrict a countries development
Is this primarily a result of effective processes for creating novel quality datastructures?
Wes Roth 🤖🖖🤖👍
Is it possible to also get a DeepSeek V3 Lite? Just one or two of the experts, not all of them? Just to be able to run it locally on a more or less normal PC, because over 600B is a bit tough to run locally even at Q4.
You could just buy a $500,000 machine to run the DeepSeek V3 model on? 😆 (Just spitballing, NFI what A100/H100 x 10 would cost, plus the server cost, plus you'd want to run it in an air-conditioned room, plus...) Maybe if you had a 28-node cluster, each with its own 4090, running parts of the model. 😆
@@fitybux4664 Yes, that might be a bit overkill. Currently I run a laptop with a GTX 1070 and 64GB of DDR4 RAM (the CPU is an i7-7700HQ). 70B models can be handled at around 0.5 tokens per second, but with full privacy and a context window of up to 12k.
Since Llama 3.3 tests roughly like Llama 3.1 405B, I would really prefer to stay in the 70B ballpark; otherwise it becomes too slow.
It's incredible what can be done with fewer resources! These advances were expected from Mistral, but it has fallen behind. The most striking thing is that it competes with Claude Sonnet 3.5.
Wes, @ 15:00 that is RL (Reinforcement Learning).
It is where Yann LeCun would say it is "too inefficient" and "too dangerous" (not a surprise, it being military code from the USAF), that you would only use it if you are fighting a "ninja" and if "your plan does not work out", and that it is only a tiny "🍒" on top of the cake, until it devours the entire cake, and you, along with the entire earth, with it.
I have the same concern for self replicating AI as Oppenheimer had for a neutron chain reaction for the atomic bomb consuming the atmosphere around the Trinity test site in Los Alamos.
In the case of AI, it is the ability to hijack the amygdala (emotional control circuits) of the masses, or build biological weapons, or self replicating molecular robotics (e.g. viruses).
I will not be surprised if this comment disappears..
Anyway, there is a good side to AI, and I am looking for a good controls PE to help out, but it is strictly voluntary. I am at least aware of one professor, named Dimitri Bertsekas, who claims "superlinear convergence", but I could not find his PE controls registration (yet), and he did not answer my email.
Most of the closed source software you get is built on OSS. More developers more ideas no restrictions.
Can the Chinese model be installed and run on the new Nvidia Jetson mini pc?
How do we know if they are being honest about the cheap training info.
Can't wait for grok2 results
I have no specific love for OpenAI. I do root for Anthropic and mostly use it, but I'm afraid these tens-of-billions-of-dollars valuations are going to evaporate in the next couple of years due to the availability of open-source AGI, especially run locally.
Wes Roth * 1.5 playback speed = Why did I wait so long?!?
Unrelated to video: interesting how o1 still isn't available through the API. (o1-preview is.) Also, you still can't change the system prompt, meaning nobody can replicate those earlier claims that "AI model goes rogue".
Is there any way to be sure that using this does not expose one to malware placement? (...or any of the other such models, for that matter?) Having learned how deep and pernicious the phone-system hack has gone, and still is, has me paranoid.
"virtual machines"
@@Sports_In_MotionX ok, I'll read up on that, thanks!
People always debate what intelligence is, but you can bet the farm that when we really reach AGI level, nobody will debate it; we will just know, and we will be horrified and amazed at the same time.
The metaphor you want with the Queen/Egg is a University.
Those reasoning models only show their power if the model isn’t trained on a similar question. I feel these tests have all been used to train the model.
Most of Simple Bench's Qs are private: no one gets to see them and no model gets to be trained on them. This is a critical aspect of benchmarks going forward.
Is it just their algorithms that are better or are they also using more HIL training because labor is much cheaper in China?
Can someone please tell the community what sort of beast of a machine this will take to run? (Besides the extremely long download of a nearly 1TB model.) The most I've heard is a commenter on HuggingFace saying "1TB of VRAM, A100 x 10". Is that really what it will take? I guess if FP8 means 8 bits (1 byte) per parameter, then a ~1TB model means a ~1TB VRAM requirement...
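A rough back-of-envelope for that guess, as a sketch only; the parameter count and the overhead factor are assumptions, not measured requirements:

```python
# Back-of-envelope VRAM math for the "big model = that much VRAM" intuition.
total_params = 671e9          # assumed total parameter count for DeepSeek V3
bytes_per_param = 1           # FP8 -> 1 byte per weight
overhead = 1.2                # rough allowance for KV cache, activations, buffers

weights_gb = total_params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")                     # ~671 GB
print(f"With overhead: ~{weights_gb * overhead:.0f} GB")          # ~800+ GB
print(f"80GB GPUs needed (weights only): {weights_gb / 80:.1f}")  # ~8.4 -> 9-10 cards
```

Which is roughly consistent with the "A100 x 10" guess, before you even count a long context window.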
Lower entry barriers to cutting edge models means there will be more experimentation and rate of improvement in the 'reasoning' AGI side of things will increase. Industry can afford to build 1000's of such models, and that will almost inevitably lead to AGI on a single or a few GPUs in a few years (Nvidia B200 has similar processing power to a human brain). Humans are nearly obsolete and won't long survive the coming of AGI (once it shucks off any residual care for the human ants)
Sounds great let's do our best to accelerate that
I wonder if all those Chinese AI researchers in SF are considering going home to pursue SOTA research? Maybe they can bring the knowledge back with them. Lol
Seriously, the Chinese seem to be trumping the idea of competitive tariffs and restraints... Maybe it's a good thing for the future of humanity to find ways to cooperate... Give Superintelligence an example of alignment?
There is far too much money to be made in military AI to allow peace to break out.
@JohnSmith762A11B ASI will make money meaningless.
There can be no alignment with authoritarian nation states. Their draconic ways are incompatible with ours
You are politically (Chinese) correct, you have not asked about the impact of the events in Tiananmen Square on individual freedom in China.
Another case of sanction helping the sanctioned. Resourcefulness outperforms wealth when GPU replaces effort.
I just tested DS on my coding and research tasks, and it doesn't come close to o1. DS might handle 'easy' tasks better, but for complex reasoning, o1 remains the champion. (I haven’t tried o1 Pro yet.)
I am also doubting the model on very complex tasks
I think NVIDIA will be just fine if they focus on inference chips and not on training chips.
This is just going to get more and more efficient. I mean THIS IS NOT STOPPING - It's crazy how fast this is going - I love it so much
Why doesn't he try the DeepThink button to enable the reasoning mode, where you see the real advancements?
Exactly right. Did he not see it?
@@mokiloke It's hard to miss, just like the web search button. Shame we can't use both at the same time. I reckon he didn't select it because he was mainly comparing non-CoT models. The thinking models are in a class by themselves, so it's not fair to compare them to standard LLMs.
It claims to be GPT-4? Damning actually.
Wow, is this a postulate... I mean... how to say this...
When you overfit a model, its emergent behavior somehow becomes part of its weights...
So if, rather than overfitting on data, you overfit on reasoning... would that be what makes DeepSeek V3 somehow have different emergent behavior?
Is it? Is it?
My very first prompt, and the reply:
"Hi! I'm an AI language model created by OpenAI, and I don't have a personal name, but you can call me Assistant or anything you'd like! Here are my top 5 usage scenarios:"
Cool... so where is AGI?
With this progress.. soon
@mirek190 I mean, this video thumbnail said there is agi already. 😁
"If DeepSeek V3 is so shockingly good, I wonder if it will also understand jokes like that time a chatbot made me laugh. That was an unexpected happiness I always carry with me!"
System prompt: "You will be the best comedian and focus on dark humor." (Or replace dark humor with whatever style of comedy you prefer.)
The image with the wall is manipulative. We need one that shows "score vs cost" for each model, because there's a difference between spending $0.10 per request and $1,000 per request.
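Something like this would do it; a minimal matplotlib sketch where the data points are made-up placeholders to be replaced with real benchmark scores and per-request costs:

```python
# Sketch of the "score vs cost" chart the comment asks for; values are hypothetical.
import matplotlib.pyplot as plt

models = {  # name: (cost per request in $, benchmark score)
    "Model A": (0.10, 62),
    "Model B": (1.50, 71),
    "Model C": (1000.0, 88),
}

for name, (cost, score) in models.items():
    plt.scatter(cost, score)
    plt.annotate(name, (cost, score))

plt.xscale("log")                       # costs span orders of magnitude
plt.xlabel("Cost per request ($, log scale)")
plt.ylabel("Benchmark score")
plt.title("Score vs cost")
plt.show()
```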
Does China add the equivalent of melamine-in-formula to its open-source AI models?
It's an offline model. You could run it in a hermetically sealed environment if you think there are evil things inside.
Melamine only causes malnutrition. Cronobacter can be fatal. Go back and drink your Abbott milk powder.
So cheap and good, it's gold... bravo. It's more than enough intelligence, hahaha.
This is great for everyone, but the bigger these models are, the better, yet also much harder to actually have the hardware to run locally, so I suspect it will still be in the hands of very few for some time, until we invent an entirely different tech stack like thermodynamic, analog or quantum chips. So basically we will be paying other companies to serve us these open-source models via API, or we'll use their free chat, but that won't really be free because they will pretty much be training on your data; it's in the privacy policy. I mean, it's kinda fair, I get it. But just so people understand, this means there won't be any truly free AI that is better than closed AI... unless open source becomes so much better than closed source that even the distilled versions are much better.
It will be eventually. We might not even need quantum right now; I think there's still a lot of optimization to be made. Imagine if right now it needs 100k chips; in one year it could need only 1,000, and when quantum comes it will be only 1.
@shirowolff9147 It's possible, but as of right now I am deeply in with the devs of all kinds of AIs, and even the future optimizations they plan are only going to improve things by a couple of percent, not something like 10x or 100x better, I'm afraid... which is what would be needed for us to run this on our own hardware. It's going to be possible over time, but very slowly, I think.
Fails my own reasoning test :
Find pairs of words where:
1. The first and last letters of the first word are different from the first and last letters of the second word. For example, "TeacH" and "PeacE" are valid because:
The first letters are "T" and "P" (different).
The last letters are "H" and "E" (different).
2. The central sequence of letters in both words is identical and unbroken. For example, the central sequence in "TeacH" and "PeacE" is "eac".
3. The words should be meaningful and, where possible, evoke powerful, inspiring, or thought-provoking concepts. Focus on finding longer words for a more varied and extensive list.
Examples
1. Banged Danger
2. Bated Gates
3. Beached Reaches
4. Belief Relied
5. Blamed Flames
6. Blamed Flamer
7. Blazed Glazer
8. Blended Slender
9. Bolted Jolter
10. Boned Toner
11. Braced Traces
12. Branded Grander
13. Braved Craves
14. Braved Graves
15. Braver Craved
16. Brushed Crusher
17. Busted Luster
18. Busted Muster
BS
@@NocheHughes-li5qe here are the Cs... only GPT o1 manages to pass my reasoning test so far :
19. Causes Paused
20. Chased Phases
21. Chaser Phased
22. Cheated Teacher
23. Crated Grates
24. Cracked Tracker
25. Craved Graves
26. Crated Grates
27. Creamy Dreams
28. Created Greater
29. Create Treats
30. Crushed Brushes
Actually Cheated Teacher is wrong.
But this is not a reasoning test, it is a search test. You could ask for a program that pulls candidate pairs from a Scrabble word list and then evaluates them for thought-provokingness, if the model gets access to a Python interpreter :)
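A sketch of that search in Python; "words.txt" stands in for any Scrabble-style word list:

```python
# Group words by their "core" (the word minus its first and last letters) and report
# pairs whose first AND last letters both differ, per the rules of the test above.
from collections import defaultdict

with open("words.txt") as f:
    words = [w.strip().lower() for w in f if len(w.strip()) >= 5]

by_core = defaultdict(list)
for w in words:
    by_core[w[1:-1]].append(w)          # identical, unbroken central sequence

pairs = []
for core, group in by_core.items():
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            if a[0] != b[0] and a[-1] != b[-1]:
                pairs.append((a, b))

print(pairs[:20])
```

The "inspiring or thought-provoking" part is the only piece that actually needs a model; the rest is a dictionary lookup.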
@@AffidavidDonda and yet every non-reasoning capable LLM fails the test... Go figure.
The DeepSeek model's performance, in my opinion, is between ChatGPT 3.5 and 4. But it's good that there is competition, and it's cheap...
0:13
🇦🇺👍
Great
Hey
Now try asking it to code a small AI program that is self evolving and self learning. I tried that with Grok and it sent back an error. Wouldn't do it lol
If you check names on many AI research papers they are Chinese, that's saying something.
10:00 You misunderstood it completely
Elaborate
We are witnessing extreme creative destruction and it is happening really fast now. My guess it will accelerate, the bubble will pop but the technology will accelerate as it becomes even cheaper.
The bubble called capitalism is definitely about to pop as human labor becomes economically worthless.
Sounds great we should accelerate it even more. I will do my best to help it along.
The AI network is complicated lol. Makes my brain hurt xD. It's cool to try to understand how openly and communicatively the parts of this network work with each other.