The AI Hardware Arms Race - Are We Reinventing Computing?

  • Published 3 Jun 2024
  • Check out Gamma.app now using this link here!
    gamma.1stcollab.com/bycloud
    CS-3
    [Blog] cerebras.net/press-release/ce...
    Groq
    [Groq online Inference] groq.com/
    [Groq LPU Blog] groq.com/wp-content/uploads/2...
    Truffle-1
    [Store] preorder.itsalltruffles.com/
    Intel Max1550 Paper
    [paper] arxiv.org/abs/2403.17607
    [Tweet] x.com/main_horse/status/17728...
    Extropic
    [Blog] www.extropic.ai/future
    [Andrew's Explainer] x.com/Andercot/status/1767252...
    [Documentary] x.com/jasonjoyride/status/178...
    [Documentary Retweet] x.com/0xKyon/status/178459142...
    [Tweet Drama Thread 1] x.com/BasedBeffJezos/status/1...
    [Tweet Drama Thread 2] x.com/fluxtheorist/status/178...
    This video is supported by the kind Patrons & YouTube Members:
    🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock
    [Discord] / discord
    [Twitter] / bycloudai
    [Patreon] / bycloud
    [Music] Massobeats - Glisten
    [Music] Massobeats - Lush
    [Profile & Banner Art] / pygm7
    [Video Editor] Silas
    0:00 Intro
    0:35 CS-3
    1:52 Groq
    2:48 Truffle-1
    4:09 Intel Max 1550 Research
    5:18 Extropic Chip
    8:08 Gamma.app (sponsor)
  • Science & Technology

COMMENTS • 84

  • @bycloudAI
    @bycloudAI 1 month ago +9

    Check out Gamma.app now using this link here! gamma.1stcollab.com/bycloud

    • @Superfastisfast
      @Superfastisfast 1 month ago

      no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, no, yes, maybe?

  • @2CSST2
    @2CSST2 1 month ago +140

    I don't think the number of likes on Twitter should even be mentioned when it comes to determining whether a new innovation path is likely to succeed or not.

    • @user-aeb87825
      @user-aeb87825 1 month ago +14

      Yeah that's just the hype meter

    • @o_2731
      @o_2731 1 month ago +3

      I believe he was being sarcastic ...

    • @BooleanDisorder
      @BooleanDisorder 1 month ago +1

      True. Especially since Twitter doesn't even exist anymore!

    • @Words-.
      @Words-. 1 month ago +1

      Maybe I missed a part in the video, though I think you've misunderstood the point of that segment, which I'm assuming is at 5:55. He said they got ratio-ed, which may signify skepticism amongst those who know about the tech; nowhere did he say that this was a good predictor of the company's success. I feel like this criticism is out of place.

  • @imerence6290
    @imerence6290 1 month ago +61

    You missed the 900 T/s generation speed on Groq while using Llama 3 8B. It's insane.

    • @catcoder12
      @catcoder12 1 month ago +6

      Finally I can start writing my 1000 word essay at 11:59PM and submit it on time.

  • @20xd6
    @20xd6 1 month ago +25

    explaining particle physics with spongebob is next level

    • @WoolyCow
      @WoolyCow 1 month ago +1

      so kids, if you fold some paper in half and then stab it with a pencil you get a wormhole!

  • @TheDragonshunter
    @TheDragonshunter 1 month ago +8

    We also need that guy who found a way to use our current fiber optic infrastructure for way faster internet to start cooking.

  • @MrValgard
    @MrValgard 1 month ago +31

    Forgot to mention photonic chips!

    • @kushalvora7682
      @kushalvora7682 19 days ago

      There's no performance test so far like there is for Groq and the rest. But once photonic chips are perfected for real-world applications, electronic chips will feel like century-old technology.

  • @Guedez1
    @Guedez1 1 month ago +19

    The Extropic chip thing seems iffy at best. While it probably actually can run that fast, it will probably be filled with compounding inaccuracies that super-lobotomize the models.

    • @mu11668B
      @mu11668B 1 month ago

      I'd say nothing's iffy there. It's a blatant scam. Simply running an RNG and hoping that a trained AI model would come out of it is like trying to implement the infinite monkey theorem without the "infinite" part.

  • @BernhardVoogenberger-tl5ox
    @BernhardVoogenberger-tl5ox 1 month ago +18

    No light chips mentioned.

    • @XenoCrimson-uv8uz
      @XenoCrimson-uv8uz 1 month ago

      sadly.

    • @truthwillout2371
      @truthwillout2371 1 month ago +1

      I think they're a bit far out for now.

    • @user-io4sr7vg1v
      @user-io4sr7vg1v 1 month ago

      Mach-Zehnder.

    • @oxey_
      @oxey_ 1 month ago +1

      The developments on them are cool but we're still very far off actually using them for any sort of AI training or inference

    • @SirusStarTV
      @SirusStarTV 1 month ago

      I've only seen the Taichi chip do something useful.

  • @BooleanDisorder
    @BooleanDisorder 1 month ago +2

    Imagine attosecond pulse based photonic chips! With multiple wavelengths in the same pathways for extra oomph

  • @JazevoAudiosurf
    @JazevoAudiosurf 1 month ago +2

    Nvidia advertises that their chips basically work as one unit. To me it's inevitable, if physically possible, that we will get larger and larger chips because the models also keep growing. We will probably get much larger than wafer scale, possibly magnitudes larger.

  • @wiiztec
    @wiiztec 1 month ago +1

    really surprised this wasn't about quantum computers

  • @jameshughes3014
    @jameshughes3014 1 month ago +1

    This stuff is so cool. Personally I love the idea of some kind of analog-digital hybrid as the most efficient path forward, but all these different solutions are interesting. The Truffle looks cool. I wish they'd make one half as powerful and half as expensive. Imagine being able to run a 60B model at home without adding to your PC's load.

  • @ncking5414
    @ncking5414 1 month ago +5

    A lot of ifs in the tech industry lately; I'll just wait and see actual physical outcomes.

  • @Napert
    @Napert 1 month ago +2

    300 tokens per second of what model? I can get 1000+ tokens per second on my 3060 Ti if I use some of the smaller models (Gemma 2B q2_k or Qwen 0.5B q2_k).
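
As a rough sanity check on throughput claims like these (a sketch with assumed ballpark figures, not measurements from the video): single-stream LLM decoding is usually memory-bandwidth-bound, so an upper bound on speed is roughly memory bandwidth divided by the bytes of weights read per token.

```python
# Upper-bound tokens/s for a memory-bandwidth-bound decoder (assumed numbers).
def tok_per_s_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    return bandwidth_gb_s / weights_gb

cases = [
    # (description, assumed bandwidth in GB/s, assumed weight size in GB)
    ("3060 Ti (~448 GB/s), Qwen 0.5B @ ~2-bit (~0.3 GB)", 448.0, 0.3),
    ("3060 Ti (~448 GB/s), Gemma 2B @ ~2-bit (~1.0 GB)",  448.0, 1.0),
    ("3060 Ti (~448 GB/s), 8B model @ fp16 (~16 GB)",     448.0, 16.0),
]
for name, bw, gb in cases:
    print(f"{name}: ~{tok_per_s_ceiling(bw, gb):,.0f} tok/s ceiling")
# ~1,500, ~450, and ~28 tok/s respectively -- a tokens/s figure means little
# without naming the model and precision.
```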

  • @user-tg2or8wy2n
    @user-tg2or8wy2n 1 month ago +3

    Hi! Great video.
    I'm a little bit lost on some of the numbers, though. Regarding the huge Cerebras chip, how on Earth are they training a 4 trillion parameter model, when at bfloat16 (2 bytes/parameter) the model alone (without counting activations and optimizer state) weighs 8 TB?!
    The same applies to the Truffle part, where they claim you can run a 100B parameter model in just 60GB of RAM. Sure, if you use int8 or less as the model's precision then it makes sense, but the standard is 2 bytes, so they should at least be more specific when giving their values. Is there something related to the 'shared memory' component that throws my calculation off?
    What am I missing? Thanks again for the video!
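
The arithmetic behind this question, as a minimal sketch (parameter counts and byte widths are just the ones mentioned above; weights only, no activations, gradients, or optimizer state):

```python
TB = 1e12
GB = 1e9

def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone."""
    return n_params * bytes_per_param

# 4 trillion parameters at bfloat16 (2 bytes/param) -> 8 TB of weights.
print(weight_bytes(4e12, 2) / TB, "TB")

# 100B parameters at different precisions vs. a 60 GB memory budget.
for name, bpp in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"100B @ {name}: {weight_bytes(100e9, bpp) / GB:.0f} GB")
# bf16 -> 200 GB, int8 -> 100 GB, int4 -> 50 GB: only ~4-bit weights fit in 60 GB.
```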

    • @novantha1
      @novantha1 1 month ago +3

      Model size in GB != the amount of data that has to be in memory to perform operations on that model.
      As an extreme example, there was a paper where they managed to run a 70B model (if memory serves) on a 4GB GPU by only moving the matrices actively being operated on into VRAM. The whole model was something like 140GB, but realistically, if you divide that across ~70 or so layers it was more like 2GB per layer, plus some change. On top of that, the attention matrices and feed-forward matrices could be further split up as needed.
      Granted, in that case it was pretty slow (1 hour for a forward pass, lmao), but if you were planning a compute chip around that there are ways to deal with it; significantly faster RAID arrays to load data quickly, for example, would not be an unreasonable solution for a company like Cerebras.
      With that said, there are other approaches too. Things like Mixture of Experts are much more effective if you're evaluating inference accuracy per memory access, for instance, or efficiency of parallel compute per unit of interconnect traffic. If you look at something like Snowflake Arctic, it's pretty efficient in the sense that you only load 2 out of 128 (if memory serves) experts, meaning that while the whole model could be 900+ GB, you could get away with running it on a 32GB GPU, given a sufficiently fast method to load the experts onto the GPU per forward pass.
      In reality, there are a lot of solutions for dealing with large models, and hardware companies in particular have access to a lot of talent that can sort out these optimizations for them, so even if it sounds crazy, usually there's a way to do what they're saying they can do.
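
A minimal sketch of the layer-streaming idea described above, using toy sizes, made-up filenames, and plain NumPy rather than any real inference stack: only one layer's weights are resident at a time, so peak weight memory is roughly one layer rather than the whole model.

```python
import numpy as np

HIDDEN = 1024      # hypothetical hidden size
N_LAYERS = 8       # hypothetical layer count
rng = np.random.default_rng(0)

# Pretend these are per-layer weight shards saved to disk.
layer_files = []
for i in range(N_LAYERS):
    fname = f"layer_{i}.npy"
    np.save(fname, 0.01 * rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32))
    layer_files.append(fname)

def forward(x: np.ndarray) -> np.ndarray:
    """Run all layers, loading each weight matrix only while it is needed."""
    for fname in layer_files:
        w = np.load(fname)          # stream one layer's weights from storage
        x = np.maximum(x @ w, 0.0)  # matmul + ReLU as a stand-in for a block
        del w                       # drop the weights before the next layer
    return x

out = forward(rng.standard_normal((1, HIDDEN), dtype=np.float32))
print(out.shape)  # (1, 1024); peak weight memory was one HIDDEN x HIDDEN matrix
```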

    • @user-tg2or8wy2n
      @user-tg2or8wy2n 1 month ago

      @@novantha1 Thanks for the response! I'm aware of these variants where you only partially load certain parts of the model; there's also the option Apple is pushing of storing most of the model in flash and loading only what's necessary higher up in the hierarchy.
      But I don't think many of these options (if any) actually work in practice, right? Not that they aren't technically possible, but they're barely practical; the latency caused by the continuous up/down streaming of data between memory tiers is too much to handle, am I right?

    • @novantha1
      @novantha1 1 month ago

      @@user-tg2or8wy2n It depends heavily on the (no pun intended) context.
      In the case of the paper I mentioned where they ran a 70B model on a 4GB GPU, I'm pretty sure they had extremely slow storage, relatively speaking. A typical GPU for ML has between 500 and 2000GB/s of memory bandwidth, but the storage they used had 0.3GB/s (a hard drive or SATA SSD, IIRC).
      On the other hand, if you were planning a piece of enterprise hardware (Cerebras), my assumption is that for a chip that costs over $1 million, shelling out for a RAID array of 32 PCIe gen 5 SSDs (320GB/s, easily 64TB of capacity) is by no means insane, and I wouldn't be surprised if they had even faster data-delivery systems.
      In a case where you're doing something in enterprise and plan it out, there's no reason that streaming "cold" parameters (i.e. the unchanging values that are multiplied against) couldn't be doable on modest GPU hardware, or a wafer-scale engine, or what have you.
      My assumption is that the wafer-scale engine is fast enough to handle the active parameters in memory, and has a well-thought-out delivery system for streaming new data in as it's needed. I could be wrong, but the alternative would seem like a large oversight given the scale of their operation.
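
To put those numbers in perspective, a quick sketch of how long one full pass over roughly 140 GB of weights would take at the bandwidths mentioned in this thread (approximate peak figures, assumed rather than measured):

```python
model_gb = 140.0  # ~70B parameters at fp16

bandwidths_gb_s = {
    "HDD / SATA SSD (~0.3 GB/s)":         0.3,
    "PCIe 4.0 x16 (~32 GB/s)":            32.0,
    "32x PCIe 5.0 SSD RAID (~320 GB/s)":  320.0,
    "On-package HBM/SRAM (~2000+ GB/s)":  2000.0,
}

for name, bw in bandwidths_gb_s.items():
    print(f"{name:36s} -> {model_gb / bw:9.2f} s per full pass over the weights")
# ~467 s, ~4.4 s, ~0.44 s, and ~0.07 s respectively.
```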

    • @StefanReich
      @StefanReich 1 month ago

      @@novantha1 1 hour for a forward pass... that's a very scalable solution 😄

  • @Laszer271
    @Laszer271 1 month ago +1

    I thought you would talk about that light-based hardware, the kind that uses photons rather than electrons to basically 100x the performance of chips.

  • @pn4960
    @pn4960 1 month ago +3

    The question isn't whether or not Extropic can succeed in the current market environment, but whether the current market environment (AI bubble) will sustain itself long enough for Extropic to succeed.

    • @Alexisz998
      @Alexisz998 1 month ago

      A bubble? Where do you see a bubble?

    • @MaakaSakuranbo
      @MaakaSakuranbo 29 days ago +1

      @@Alexisz998 Everyone hype-jumping in on AI when we have very few actually reliable use cases for it yet?
      Images are cool, but still full of flaws and other issues.
      Text is cool, but hallucinations and stuff.
      Music is cool, but far from perfect.

    • @Alexisz998
      @Alexisz998 29 days ago

      @@MaakaSakuranbo I share your point of view, but remember that AI only took off a year and a half ago... and everything is getting better and better every month. AI won't limit itself to video, song, and text generation; I think it'll go far beyond that with humanoid robotics. And big tech companies like Microsoft, Google, Amazon, and Meta are already making huge profits using Nvidia GPUs by lowering their operating costs.

    • @MaakaSakuranbo
      @MaakaSakuranbo 29 days ago

      @@Alexisz998 Humanoid robots require fixing the inherent issues in the current generation first

    • @Alexisz998
      @Alexisz998 29 days ago

      @@MaakaSakuranbo What do you mean by the current generation?

  • @setop123
    @setop123 1 month ago

    very good video 👏

  • @monad_tcp
    @monad_tcp 1 month ago +1

    1:54 Why do I care as a consumer? When the AI spending hype is over, I want to have real-time ray tracing in my games!

  • @WeirdInfoTV
    @WeirdInfoTV 1 month ago +1

    Your editing is very funny

  • @kylemorris5338
    @kylemorris5338 1 month ago

    I completely forgot, until you pointed it out with arrows, that THAT'S the company whose CEO got outed as the account holder behind Beff Jezos. Not holding my breath that that guy holds the key to the next big thing since sliced transistors.

  • @H1kari_1
    @H1kari_1 1 month ago +1

    Whatever happened to Mythic AI Chips?

  • @invizii2645
    @invizii2645 1 month ago

    Nice

  • @kabargin
    @kabargin 1 month ago

    what about Tesla Dojo's D1 chip?

  • @DrW1ne
    @DrW1ne 1 month ago +4

    Good video. I'm surprised you didn't talk about analog computers for AI.
    Okay, you kinda did...

  • @cdkw2
    @cdkw2 1 month ago +1

    Every time one of these videos rolls around, I think: why am I even doing my CS degree? Why not just become, like, a math teacher?

    • @cdkw2
      @cdkw2 1 month ago

      @@OverbiteGames oh

  • @superfliping
    @superfliping 28 days ago

    Build your team. Prove your LLM Super?
    1. CodeCraft Duel: Super Agent Showdown
    2. Pixel Pioneers: Super Agent AI Clash
    3. Digital Duel: LLM Super Agents Battle
    4. Byte Battle Royale: Dueling LLM Agents
    5. AI Code Clash: Super Agent Showdown
    6. CodeCraft Combat: Super Agent Edition
    7. Digital Duel: Super Agent AI Battle
    8. Pixel Pioneers: LLM Super Agent Showdown
    9. Byte Battle Royale: Super Agent AI Combat
    10. AI Code Clash: Dueling Super Agents Edition

  • @pladselsker8340
    @pladselsker8340 1 month ago

    I am so sceptical about Extropic. I'll just put them in the "physics-based voxel engines" and "flying cars" box and wait until I can buy a PCIe expansion card that reliably and purposefully accelerates generative models 1000x. I don't care anymore.

  • @user-zc6dn9ms2l
    @user-zc6dn9ms2l 1 month ago +1

    lol. The issue is the extremely monochrome nature of binary code.

  • @monad_tcp
    @monad_tcp 1 month ago +3

    3:44 Hahah, TAKE THAT NVIDIA.
    Nvidia is selling overpriced VRAM; someone actually did what I've been complaining about for years: why not use normal, cheap memory from the system via a shared bus? (Yeah, I know about the Intel problem with PCs and the unified memory architecture that never comes; just don't use Intel then.)

    • @randomobserver9488
      @randomobserver9488 1 month ago +1

      Because it's slow af. It works for LLM inference, where the problem is fitting the models in memory at all, but it would be insanely slow in most dGPU applications.

    • @monad_tcp
      @monad_tcp 1 month ago +1

      @@randomobserver9488 No, I was talking about a computer that uses unified memory.
      On PCs, the problem is the bandwidth of the PCIe bus, which is never the same bandwidth as the GDDR.
      But GDDR itself isn't that much slower than normal DDR5. With smart caching done by the "motherboard", it's totally possible to pull that off.
      Nvidia really knows what they're doing with this myth that GDDR is expensive. It is not; they're the ones selling it as expensive, way above what it really costs, because you can't upgrade the memory.
      That's why GPU memory paging to system RAM would be slow in a PC: there's not enough bandwidth in the memory controller inside the GPU and even less on the PCIe bus, and I think that's on purpose.
      There's another detail: GPU memory is accessed 1024 bits in parallel, while an Intel CPU only accesses 512 bits in parallel.
      That's easily solvable using more memory channels.
      With some cache and some driver smartness, it would totally be possible for an ARM CPU to share its RAM with the GPU.
      Unified addressing, as long as the GPU doesn't touch the memory directly, is doable.
      Not only is it doable, it has been done before: the PlayStation 5 runs like that; it has an I/O controller that sits between the memory and both the GPU and CPU to arbitrate the bus.

    • @monad_tcp
      @monad_tcp 1 month ago

      Also, on mobile, all GPUs/CPUs share memory that way. It's not a matter of the main memory being slow; it's a matter of the architecture of the hardware. That's what I'm complaining about.

    • @monad_tcp
      @monad_tcp 1 month ago

      I want "VESA Local Bus" back!

    • @randomobserver9488
      @randomobserver9488 1 month ago

      @@monad_tcp CUDA has supported unified memory (a single address space with on-demand paging) since about 2017, and it is limited by PCIe 4.0 x16 to 32GB/s, which is roughly half of DDR5 bandwidth on consumer platforms. The GDDR bandwidth even on a mid-to-high-end gaming GPU is more than 10x the DDR bandwidth. You might as well use the CPU or iGPU when DDR bandwidth would limit the dGPU to the same speed. The number of use cases that benefit from a large amount of very slow memory and need dGPU levels of compute is tiny.
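
Checking those ratios with rough peak figures (assumed, not measured; sustained bandwidth is lower in practice):

```python
pcie4_x16_gb_s = 32.0                  # PCIe 4.0 x16, one direction
ddr5_dual_gb_s = 5600 * 8 * 2 / 1000   # DDR5-5600, two channels -> ~89.6 GB/s
gddr6_gb_s = 448.0                     # e.g. a 256-bit GDDR6 card at 14 Gbps

print(f"PCIe4 x16 vs dual-channel DDR5: {pcie4_x16_gb_s / ddr5_dual_gb_s:.2f}x")
print(f"GDDR6 vs dual-channel DDR5:     {gddr6_gb_s / ddr5_dual_gb_s:.1f}x")
# PCIe is roughly a third to a half of system-RAM bandwidth, while even a
# mid-range GPU's GDDR is ~5x system RAM (high-end cards ~10x+), which is why
# paging over PCIe only pays off when fitting the model at all is the bottleneck.
```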

  • @average_snmp_user
    @average_snmp_user 1 month ago

    Funny how he didn't mention the fastest AI chip that you can actually buy right now, which is AMD's MI300. And don't tell me that the B200 is faster; I know, but if even Mark Zuckerberg can't buy it right now, then the B200 is not out, it is only announced. Even on Nvidia's website you can't rent a GB200.

  • @latrechetaher6340
    @latrechetaher6340 28 days ago

    Aaah my brain

  • @itzhexen0
    @itzhexen0 1 month ago +1

    Can't wait to see all of these people try to shove all of this into people's lives. Grab the popcorn.

  • @carterprince8497
    @carterprince8497 1 month ago

    I don't see why you wouldn't just buy a 4090 instead of preordering the Mac thing.

  • @rogerc7960
    @rogerc7960 1 month ago

    New Operating systems

    • @monad_tcp
      @monad_tcp 1 month ago

      No operating systems, just hardware and an external scheduler service.

  • @Aurora-bv1ys
    @Aurora-bv1ys 1 month ago +1

    I want to be a part of the AI industry; how can I approach it?

    • @Raphy_Afk
      @Raphy_Afk 1 month ago +2

      Watch the choices that key figures, preferably young researchers, made, and do the same.

    • @JazevoAudiosurf
      @JazevoAudiosurf 1 month ago +3

      Like anything else, learn it from first principles. Learn how neural nets and transformers work, learn how to code, how to use the OpenAI API, etc.

    • @Raphy_Afk
      @Raphy_Afk 1 month ago

      @@JazevoAudiosurf Learning how to code at a low level may be a bad idea; at a high level, software engineering and algorithms are necessary, but I doubt coding in Python or C will be for long.

    • @JazevoAudiosurf
      @JazevoAudiosurf 1 month ago

      @@Raphy_Afk Personally I chose TypeScript; any language works for the API. I did some neural networks but figured that understanding how they work is more valuable than implementing them (ML is hard). I just use the OpenAI API now for my projects.

  • @boxeryy6661
    @boxeryy6661 17 hours ago

    Why do you associate scientific theories with companies or any single organisation? Thermodynamic computing has been around for some time, and it has been pursued by academics too.

  • @axl1002
    @axl1002 1 month ago

    Tenstorrent cough cough...

  • @jyuseries8313
    @jyuseries8313 1 month ago

    true? true random??

  • @joeyhandles
    @joeyhandles 1 month ago +1

    There was a major slowdown, and China is the only country that is going to push us forward.