One of the more honest, transparent discussions about AI, rather than the brainwashed, delusional version of AI that many self-proclaimed "experts" ramble about in the media nowadays. Very educational, and free of the marketing hype and misunderstandings that confuse the general public. Well done.
1. Main Argument/Claim:
The central argument of the video is that the DeepSeek R1 model represents a significant advancement in AI, not because of its initial training cost, but because of its efficiency and the novel techniques it employs, particularly in reinforcement learning (RL) and model distillation. The speakers aim to debunk the myth that the model's importance lies in the low cost of its initial training, and instead highlight its implications for the future of AI development, emphasizing the power of post-training methods and open-source approaches.
Tiananmen Square decades ago, or school shootings over the last decade.
DeepSeek says the reasons the US has school shootings while the rest of the world doesn't are: high gun ownership, lax gun laws, gun culture, individualism vs. collective safety, the power of gun lobbies, policy gridlock, mental health gaps, social fragmentation, sensationalized coverage, and reactive measures.
Why are people upvoting an LLM pasted-in synopsis of the video's transcript?
There is no breakthrough novel RL technique discovered by DeepSeek. People already replicated similar results by using good old PPO.
I didn't expect you (IBM) to be so excited. The US$5.5 million is the total cost of training the V3 model (i.e., going from V2 to V3), not the cost of a single training run; on top of that, V3 also required the earlier V2 training and a series of experiments. But this is still much lower than the training cost of OpenAI's models. Open your eyes and have a good look at China! IBM should not repeat the mistake it made in the transition from the mainframe to the personal PC era.
IBM Technology panels always give deeper, clarifying insights. I particularly enjoy these panelists whenever they join the show.
the one in the orange glasses is my favorite ;)
Even as a layperson I enjoyed listening in, and I think I learned something. It is so refreshing to hear an intelligent, informed conversation bereft of hype or sensationalism. Many thanks to each of you.
Glad you found it informative, thanks for watching!
🔥 Episode. The diverse expert perspective was outstanding.
IBM, think about your PC vs. mainframe history. DeepSeek is the PC revolution: the average person can host their own AI agent almost for free and write all kinds of applications on top of it. This is a revolution!
exactly
This is exactly what I was realizing. The greatness of DeepSeek is that you can create and use your own DeepSeek. Just like what IBM did with the PC.
Stop pretending to be a foreigner 😅
It's so refreshing hearing people give useful insights instead of misleading hype. This is the nuance we need to hear. Keep them coming.
We think it's important to be clear about the facts and the hype.
I am impressed with the quality of this video, featuring real-world people who really know the subject. Good job. Please keep it coming.
More to come!
Like many of you, I have watched many videos about DeepSeek in the last few days. By far, this video has the most nuance and the least BS. Thank you!!
Appreciate that! It was important to get into the nuances.
The significance to me is that someone released an LLM under an open-source license that runs on my personal computers and gives me results that are ~90+% of perfect.
-no need to give my data or intentions to giant monopolies or despot governments
-no need to give up and help train some AI that will take my job and ideas.
-no need to burn down the planet to power some huge AGI monopoly we don’t want.
And no way all these trillion dollar investments in AI will produce a significant return!
Yes, you've become a superman now.
Impressive. I really hope that someday you can invite some researchers from DeepSeek onto this panel to reveal more details of their innovation.
Kate's point about distilling very powerful models from R1 at virtually no cost is spot on! And that's also why Jevons' paradox may not apply here. ENIAC, completed in 1945, used 18,000 vacuum tubes. Now recall how Berkeley built Sky-T1 on only 8 GPUs. A single breakthrough in algorithms, and our world already has enough GPUs to last us decades.
The analysis sounded like DeepSeek is no big deal, which is so comforting. We need more sessions like this, as this is only the first startup from China.
This is a great format (show). Good to see that IBM has arrived in the media world of the 21st century...
Thank you for explaining distillation!
As a layman with absolutely no knowledge of AI, this was a most interesting discussion and very helpful to me.
very insightful! thank you for this great episode!
Fantastic episode. Very very nice insights really!
Great pod & clever show title
Thank you IBM Technology. Great timing.😊
Good job, IBMer🎉
I don't know whether IBM is a good, competitive company or not, but damn, these videos are good.
It is really not that hard to understand why Chinese AI like DeepSeek is a lot cheaper and more effective, at least in the Chinese language domain. Chinese researchers and engineers focus on models and logic, backed by vast amounts of data collected by big companies like Tencent, Alibaba, Xiaomi, etc., while their U.S. counterparts rely heavily on chip capacity and computing capability. There is far more human intelligence involved in DeepSeek than in ChatGPT. Human intelligence is a lot cheaper, more efficient, and more effective than hardware. The Western media pretend not to understand this simple fact because they don't know how to compete with China, and they don't know how to deal with the consequences in the human intelligence domain. No country has more engineers than China, and no country is better at AI applications in all walks of life than China today. The robots dancing in the Spring Festival Gala on Chinese TV this Chinese New Year are extremely revealing about China's AI advancement and manufacturing capabilities. Cooperation with China is the only viable option for the U.S., so that the U.S. would know a lot more about what China is doing, and vice versa, for a better world.
@igolfer - Luckily, those robots were born in China and are only forced to dance. Had they been born in the US, they would be forced to carry guns and artillery. Thank God there hasn't been an International Robotics Rights commission yet.
Everything else in China is also cheaper (salaries, electricity, rent, maintenance, etc.) than in the USA... so it's no surprise that it's cheaper to produce DeepSeek in China. They also used a shortcut method, a mixture of experts... not like those silly phone answering machines that make you keep pressing 1, 2, 3, 4, so you end up hanging on your phone all day and no real human ever answers your call.
A very insightful, informative discourse, thanks!
DeepSeek is a billionaire bunker buster
I've been to Chris's YouTube channel, and I must say that his work is amazing and an eye-opener. Keep up the great work, Chris.
Great conversation.
Which laptop should we be using? I'm due for a new one and am looking for one that can meet the RL requirements.
The marathon analogy is bad. DeepSeek purchased the hardware once; that's an upfront cost. The only costs afterwards are salaries and utilities.
Actually, it's quite accurate: the upfront cost of the GPUs represents the workout kit you purchased for your marathon.
Yes, you are correct.
All the infrastructure is for the hedge fund operations.
The team was asked to work with the available, unused infrastructure capacity, and the side quest was done as a passion project.
I like this talk. Everyone has very good points.
As a layman trying to better understand this... what constitutes a reward for a system like this? What is the value that underlies the reward?
In simple layman's terms, you run an AI model like you run human scientific experiments. After you get the results, if it's something you like, you tell the AI "more of these," and vice versa.
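To make that concrete: a toy sketch of the kind of rule-based "verifiable reward" the R1 paper describes for its RL stage. The scoring weights and function name here are invented for illustration, not DeepSeek's actual code.

```python
# Toy sketch of a rule-based "verifiable reward" (weights invented here):
# the model's output is scored mechanically, and RL then nudges the model
# toward "more of these" high-scoring generations.
import re

def reward(model_output: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: reasoning wrapped in <think>...</think> tags.
    if re.search(r"<think>.*?</think>", model_output, flags=re.DOTALL):
        score += 0.2
    # Accuracy reward: final boxed answer matches the reference exactly.
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

print(reward("<think>2 + 2 = 4</think> \\boxed{4}", "4"))  # 1.2
```

The "value underlying the reward" is simply whatever the check can verify: a correct math answer, passing unit tests, or the right output format.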
I have DeepSeek 70B working on an i3 with SSD drives, 32 GB of RAM, and a 4 GB GPU under Ollama. I get about 1-5 tokens per second.
Ollama has a cache system, and the model is a Mixture of Experts, so only a portion of the 70 GB is in use at any one time.
The 70B model feels very similar to the full-size version.
It did, however, take 1 hour to create the code for the Snake game... which worked the first time!
That said, students with big ideas, no money, but loads of time might tolerate such slow speeds.
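For anyone wanting to reproduce a setup like this, a minimal sketch using Ollama's Python client (assumes `pip install ollama` and that a DeepSeek R1 distill tag such as `deepseek-r1:70b` has already been pulled; adjust the tag to what your hardware can handle):

```python
# Minimal sketch: querying a locally hosted DeepSeek R1 distill via Ollama.
import ollama

response = ollama.chat(
    model="deepseek-r1:70b",  # swap for a smaller tag on modest hardware
    messages=[{"role": "user", "content": "Write a Snake game in Python."}],
)
print(response["message"]["content"])  # includes the <think> reasoning trace
```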
Very good analysis of the possible evolution of LLMs based on the open-source community. Thanks!
Glad you enjoyed it!
Another good talk, thanks!
Glad you enjoyed it.
Great discussion
Thanks
We should be talking about building an open-source, decentralized, and privacy-protecting global platform for collective terrestrial intelligence (CTI). The platform needs to be able to merge, deduplicate, fact-check, and aggregate the knowledge and sentiment expressed in public conversations with billions of people around the world. People could access the platform with their phones.
Love this; distributed datasets, verifiers, and trainers are what I'd love to see.
It is called Wikipedia
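On the merge/deduplicate part of that idea, a toy sketch of one standard building block, near-duplicate detection over word shingles (purely illustrative, not a design for the full platform):

```python
# Toy sketch: flag near-duplicate statements by Jaccard similarity of
# word shingles, so the same claim isn't aggregated twice.
def shingles(text: str, n: int = 3) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Statements above a chosen threshold would be merged, not double-counted.
print(jaccard("DeepSeek R1 is open source", "deepseek r1 is open source today"))  # 0.75
```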
Thanks for the useful insights.
@11:45 - The hardware and models were for the hedge fund work. DeepSeek was/is a "side project" they spent an additional $6 million on for the employees to play with; that's why it's truly open source (the modest charges are to cover utilities).
IBM trained me, and I am so PROUD to get a retirement check from them.
Yes, I got a lot more out of IBM while I was there than they got out of me, and I was surprised when I turned 65 to get a letter from the SSA saying I should contact IBM because I had a lifetime pension. I still use the knowledge I learned at IBM every single day of my life. I'm so happy that IBM is thriving.
Brilliant session! Some humor as well :) Please list company names for each guest so it's easier to share on X and LinkedIn etc.
So I tried out a few smaller models that I could run on my modest equipment. The one thing that kept coming back to my mind was that I wish I had one of those fast Nvidia chips so I could run a bigger model locally. My point is, I think my own reaction says that Nvidia is going to do well.
You got me on 9.11 and 9.90. That's the famous example in the AI world.
Can you share the details?
@@LuZhao-z4q It's a meme about AIs not recognizing which is greater, 9.11 or 9.90; most of the time they will answer 9.11 lol
Good talk, but I still don't really get the "distillation" part 🤔. So LLaMA distilled from DeepSeek R1 is basically a fine-tuned LLaMA, right? The difference from previous fine-tuning is just better data, then? LLaMA doesn't benefit from any architectural improvements of DeepSeek this way, or does it, through better fine-tuning techniques?
The 70B distilled model acts almost the same as the full-size version.
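Roughly, yes: the student keeps its own LLaMA architecture and is simply fine-tuned on text the teacher generated, so it inherits R1's reasoning style through the data, not through any architectural change. A minimal sketch (model name and data are placeholders, not DeepSeek's actual pipeline):

```python
# Illustrative sketch of distillation-as-fine-tuning: ordinary next-token
# prediction on teacher-generated reasoning traces. Placeholders throughout.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_data = [  # (prompt, teacher reasoning + answer) pairs, collected offline
    ("What is 17 * 24? ", "<think>17*20 + 17*4 = 340 + 68 = 408</think> 408"),
]

name = "meta-llama/Llama-3.1-8B"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(name)
student = AutoModelForCausalLM.from_pretrained(name)

for prompt, completion in teacher_data:
    batch = tokenizer(prompt + completion, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss  # standard SFT loss
    loss.backward()  # a real loop would add optimizer.step(), batching, etc.
```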
Berkeley already reproduced DeepSeek for even cheaper.
Distillation
@@AbuSous2000PR Berkeley used Zero
Americans are coping so much with this milestone from China. It is a win for AI in general, period.
Are the R1 benchmarks for the base model, the distilled model, or the smaller models?
The open-source community has just gained a new Titan.
RL-only learning will be required for novel solutions, as there will be no examples to use for fine-tuning.
The opinions are quite evenly distributed.
It would be great to compare with examples. Empty judgments can only go so far.
Fascinating information.
In the end, the billions of dollars of investment in education and universities are paying off for them. We've seen the top mathematicians and scientists follow the money: when it was finance, huge numbers went there; now it's AI, and they shift over. The owner basically paid huge salaries to top university grads in China and built out a team of science/math problem solvers. Hard to beat that. And I would think they shorted their competitors too, making billions of dollars in weeks.
I suspect the Chinese realize that the learning, thinking, and reasoning process is an adaptive control system, particularly looking at GRPO and GAE in the DeepSeek R1 paper.
One big hint is the use of the Greek letter theta throughout the deep learning literature. If I were to guess, it originally represented a pendulum angle, as early work in a 1983 IEEE RL paper by Barto, Sutton, and Anderson used theta for the pendulum angle.
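On GRPO specifically: the group-relative advantage it uses is simple to state. Rewards for a group of sampled responses to the same prompt are normalized against the group's mean and standard deviation, replacing PPO's learned critic. A minimal sketch (illustrative only):

```python
# Sketch of GRPO's group-relative advantage (per the DeepSeekMath/R1 papers):
# no value function; each response's reward is standardized within its group.
import numpy as np

def group_relative_advantages(rewards):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards zero variance

# Four sampled answers to one prompt, scored by a rule-based reward:
print(group_relative_advantages([1.0, 0.0, 0.0, 1.2]))
```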
It sounds like blockchain could be used here to keep track of the distillation of models so that each model could earn its credit.
I think once you have several expert-trained mini models, you could then cluster them together to get AGI with a pattern-searching algorithm.
Great idea!
DeepSeek is a scientific breakthrough in AI, and they released a paper with the specs and materials for its creation, so the scientific community can replicate the same results to confirm or disprove DeepSeek's efficacy.
Kate's analogy doesn't hold up. It's like saying that anyone can build a model without any programming knowledge or experience; nobody is saying that. But it gets much better than R1: UC Berkeley trained Sky-T1 for $450 in 19 hours with 8 GPUs, and recently replicated R1 for $30. Take that, AI giants! lol
There's a huge circus developing over whether DeepSeek used stolen data, as if that undermines DeepSeek's credibility. But if that's DeepSeek's core and "unfair" advantage, then why doesn't OpenAI release an open-source version that can run on reduced-spec hardware and wipe out DeepSeek that way?
IBM was one of the early pioneers of AI with Watson, yet they have been left behind by all these other companies. Goes to show how dynamic this world is.
DeepSeek’s emphasis on enhancing AI efficiency, even with hardware constraints, marks an important advancement. By creating algorithms and models that deliver strong performance without depending heavily on ever-increasing computational power, they’re tackling one of AI’s biggest challenges: the escalating need for more compute resources.
10:55 One thing has been bothering me a lot recently.
Since DeepSeek used other models to train itself, and those models are not free, why hasn't DeepSeek paid those parent models a lot, like a few hundred million?
It's like the gold mining and refining process. The other models raised the purity of the gold ore from 0.001% to 1% (1,000 times), and DeepSeek condensed it from 1% to 90% (another 90 times). Why haven't those who put in the 1,000-times effort charged DeepSeek in proportion to their efforts? At least they should break even on their business.
Rewatch this part again. One of them even said it's a known secret. Do you think the companies behind the closed models aren't doing this too?
Can someone break down how many megawatt-hours it will cost to run this compared to OpenAI?
Excellent overview.
I still haven't heard an argument for why AI models should need to hoover up every byte of info produced by humans when brilliant humans are able to reason, extrapolate, and invent from minuscule "training sets." Where are the AGI Ramanujans?
So much learning. I do see that the first model might not be correct, and that replication might propagate incorrect distillation.
I actually like their opinions. Thank you, IBM. Sorry about my last message saying this is "propaganda"; just kidding, IBM.
😂 THE CHURCH OF ALTMAN: SCALING IS SALVATION! 🚀🔥
Our Core Beliefs:
📖 "In the beginning, there was small AI, and it was weak. Then OpenAI said, ‘Let there be compute!’ and AGI was set into motion."
💻 "Thou shalt not worship distilled models, for they are but echoes of true intelligence."
🖥 "The path to AGI is paved with exponential scaling, not shortcuts and trickery."
⚡ "MoE shall be cast into the abyss, for it leads only to loops and confusion."
🌍 "AGI shall emerge when the great GPUs align and Sam Altman deems it so."
😂 Sacred Rituals of the Church:
✅ Pray to the GPU Gods for more compute.
✅ Sacrifice old CPUs in a great fire to summon stronger processing power.
✅ Reject false prophets who claim ‘smaller models are the future.’
✅ Meditate on the philosophy of Sam Altman while watching loss functions converge.
And our most sacred chant:
“In scaling we trust, in compute we thrive, AGI will rise, and DeepSeek shall be forgotten!”
Most important: DeepSeek is OPEN & CHEAP, WITH QUALITY.
DeepSeek AI gives some answers to two questions:
1. Chicken or Egg come first?
The egg came first, laid by a bird that was almost (but not quite) a chicken. The genetic mutation defining a "chicken" first appeared in an embryo inside that egg.
2. Adult or Baby come first?
The baby comes first evolutionarily. A new species arises when a baby is born with a genetic mutation that distinguishes it from its parents. That baby grows into the first "adult" of the new species.
Key Idea: Evolution works through small changes in offspring, not adults. New life stages (egg/baby) carry mutations first, then grow into new adults.
I agree with DeepSeek AI that "The egg came first." I disagree that "The baby comes first evolutionarily."
It's not valid to say that the baby comes first, because the baby alone starves to death. Rather, the adult comes first, gives birth to the baby, and feeds and takes care of the baby as it grows up.
I agree with DeepSeek on both.
Evolutionarily, the child comes first.
In creation theory, the adult comes first.
Actually, to put a contrarian point of view: the baby would have to survive the harsh environment and live to adulthood before the next genetic mutation allows it to survive in a hospitable environment.
Regardless of whether or not you agree with DeepSeek, it at least gives you its reasoning.
Metaphorically, if you will...
A squirrel cage induction motor operates by creating a rotating magnetic field within the stator, which induces currents in the rotor's conductive bars (resembling a squirrel cage), causing the rotor to spin in the same direction as the magnetic field, generating torque to drive a connected load; essentially, the rotor is turned by the magnetic field without any direct electrical connections to it, relying solely on electromagnetic induction.
Key points about a squirrel cage induction motor:
Stator:
The stationary part of the motor where the electrical windings are placed, creating the rotating magnetic field when energized with alternating current (AC) power.
Rotor:
The rotating part, consisting of a cylindrical steel core with embedded conductive bars (like a cage) that are short-circuited at the ends by rings, forming the "squirrel cage".
Rotating Magnetic Field:
When AC power is supplied to the stator windings, it generates a magnetic field that appears to rotate around the stator, pulling the rotor along with it.
Induced Currents:
As the rotating magnetic field cuts through the rotor bars, it induces electrical currents in them, creating a magnetic field in the rotor that interacts with the stator's magnetic field, producing torque.
Slip:
The rotor can never reach the same speed as the rotating magnetic field due to the induced currents opposing the change in magnetic flux; this difference in speed is called "slip".
Advantages of a squirrel cage induction motor:
Simple design and robust construction: Due to the simple rotor design, they are reliable and low maintenance.
Self-starting: Can start under load without additional starting mechanisms.
Wide range of applications: Used in various industrial applications like pumps, fans, conveyors, and machinery due to their versatility and cost-effectiveness.
Might I suggest, as an analogy for what I see happening in the AI industry right now: there's a shit ton of amped-up squirrels running around in the cage, and nobody can predict which direction they'll take...
I think we should start putting our primary focus on artificial intelligence security infrastructure frameworks...
Respectfully,
I sense we had better get ready for when the rotor starts spinning with no need for the squirrel cage anymore... 🙏🦋💨🎶 If you love the law and love your sausages, I don't recommend watching either one being made...
Being in others' shoes, seeing things from their perspective, is an advantage; reading another's mind as accurately as possible is an advantage. Not easy! Different backgrounds, cultures, tastes, different inclinations, moods in different situations. I would like to think this is what model distillation and chain of thought are all about.
You have a solution to a problem, therefore you have a direction. To test the validity of the solution or the accuracy of your thought/hypothesis, you gather some data and do some logical thinking until you are getting nowhere, or some new fact is encountered indicating there is a flaw in your hypothesis. So, back to square one. Using parallel thinking, start with a new direction, assign new data, discard some old data, etc., and start the thinking process again.
In real life, take your time: a different day, a different perception/mood, a slightly different idea with a different fact; you just sleep on it, no rush! One day you jump up before dawn: Eureka!
The V3 paper clearly stated it cost $5.x million for pre- and post-training combined. How did that analyst get her job??
Where is your Granite, folks? No one is talking about it. Do something and make it viral like DS R1.
If someone hides the details, that is misleading. But when you misread it, that is incompetence.
While everyone seems to be obsessed with Nvidia and DeepSeek, I am just a laid-back investor in index ETFs.
Everybody is going crazy about DeepSeek and Nvidia, but I am just a chill guy who invests in index ETFs.
Since it was founded only a year ago and employs just one hundred very young workers, the cost sounds believable.
I would rather trust the open source Chinese AI over closed OpenAI
Agreed.
And the reality is: I don't trust either 100%. So I'm looking forward to other independently developed AI models.
It's like my reading these days: I check publications from other countries just to double-check what I'm reading, especially scientific journals like Nature, etc.
Who is IBM??
Though future models might require less compute power, wouldn't it be expected that companies will increasingly start using cloud providers instead of their own systems for AI? Meaning that they will go to suppliers such as Meta, Google, Microsoft, and others that host datacenters offering the required bandwidth. For those suppliers it doesn't matter if the calculation can be done on a slower system, because they share the capacity among other requests. And with demand for AI growing as it becomes cheaper, they will keep growing their fast systems. End users won't need to worry about replacing outdated hardware or making big investments. I think that is why those mega-cap companies announced they are not changing their plans after all the DeepSeek news. NVIDIA is also anticipating this by bringing out more and more NIMs.
My take: take anything Kate says that differs from the other two, and expect the opposite to be the case.
If everyone refuses to believe the $5.6M, then:
a. How much was it?
b. If you had invested a very huge amount, why open-source it? Any business owner would have squeezed every drop they could from it.
Before DeepSeek, they always said you needed billions to make good models, but reality struck: you can no longer just buy your way to better models.
If they hear what they're saying, another issue is CUDA being inefficient; I hope a better open-source CUDA equivalent gets released that can be used on any GPU.
Check Snowden's take on the 50-series RTX release; small developers are being held down.
What DeepSeek has done is pure innovation, period. Don't try to comfort yourselves; the ship has moved, and it is not going back. What I can ask you guys is: if you knew all this tech and these innovations, why did no one come up with this cheaper option? Spending has run to hundreds of billions. Give credit where it is due.
3:15 to 3:45 Didn't all these US AI big shots do the same?
it took us about 9.11 microseconds to spot the AI-bro
How big a deal is it that there's a bill in the works criminalizing DeepSeek, with 20 years of prison time and/or a $1 million fine for downloading it?
I can save so much using one of the free or very inexpensive AI models versus hiring a very expensive expert.
So confusing! Does the R1 base (big model) RL/pre-training cost $5M, or does the post-training (thousands of rows of fine-tuning) cost $5M? Or does distillation cost $5M?
As I understand it, DeepSeek did not invent distillation; you could do that before as well.
Pre-training of V3 cost $5M; the post-training of R1 is peanuts.
Fine-tuning does not guarantee non-bias. Great point.
RL requires about 5x the compute of supervised learning.
Lady S! Is it too hard to see that others can do what you can? Comparing this to running, ha ha ha.
DeepSeek is MoE, then pruned.
Bono is a techie??
😂
The introduction is already biased by tone and by body language.
A good show should be neutral at the beginning.
Bad.
The marathon analogy is not correct. Their main job was the hedge fund.
This is like employees playing table tennis or some other sport in their downtime, using the unused, available extra infrastructure.
Finally, a discussion without weird hype either way (at least as far as I can tell, as a novice). I lost $40k, but since then I've made back 3/4 of that. I think the release date was strategic on behalf of the CCP. I'm curious to what extent this impacts future investment in hardware. I'm always interested in saving money, especially taxpayers' and investors' money, but it seems to me high-speed data centers will still be valuable. I guess this isn't the topic of this discussion, but I'm not sure how far I would trust a CCP-sponsored AI. Thank you for an excellent, enlightening discussion. I'll subscribe to expand my understanding.
Brother, don't you believe in the spirit of open source?
It is very difficult for them to believe that the developers from China are better than the American developers 😅
Mainstream = even my father asked me about it.
In nearly every video I watched, all the developers use the same sentence. 😂😂
Sounds funny
So far, all we've heard from Big AI is defensive whining. Your model has been completely disrupted mentally, technically, and financially.
ChatGPT is the IBM of the 1980s.
Nvidia is the AOL of the 1990s.
Bigger is not better... ex-wife included.
One test run costs $6M, and you will need multiple runs; I agree 100%.
But, but, but... the other setup costs $100M and also needs multiple runs.
One mistake with the DeepSeek setup is a $6,000,000 loss.
One mistake with the other setup is a $100,000,000 loss.
DeepSeek is SmartAi 1.0