Next-Gen CPUs/GPUs have a HUGE problem!

  • Published 10 Jun 2024
  • With TSMC's upcoming N3E process node shrink, SRAM cells are not scaling in size for the first time ever. In this video I will discuss why SRAM is so important for all modern CPUs & GPUs, how it will affect basically everyone, including AMD, Nvidia, Intel and Apple, and why chiplets are a possible workaround.
    Follow me on Twitter: / highyieldyt
    0:00 Intro
    0:41 SRAM Explained
    1:58 Why is SRAM scaling so important?
    5:30 Chiplets as a possible solution
    WikiChip Article: fuse.wikichip.org/news/7343/i...
  • Science & Technology

COMMENTS • 896

  • @flioink
    @flioink 1 year ago +894

    Nowadays CPUs have more cache memory than my
    first PC had RAM.
    It's amazing how far we've come in terms of processing power.

    • @uvuvwevwevweossaswithglasses
      @uvuvwevwevweossaswithglasses 1 year ago +14

      486 :D

    • @TremereTT
      @TremereTT 1 year ago +17

      I think once we find a better approach than speculatively computing parts of the program ahead of time, in parallel, for every possible outcome, and then throwing away all but one of the cached results once the pipeline resolves, we will need far less cache.

    • @soylentgreenb
      @soylentgreenb 1 year ago +8

      But the L1 is still small, and shrinking it no longer makes it faster.

    • @bricaaron3978
      @bricaaron3978 1 year ago +4

      @@uvuvwevwevweossaswithglasses How much RAM did a gaming 486 have, and roughly how much did a megabyte of RAM cost?

    • @kellyshea92
      @kellyshea92 1 year ago +2

      I just built my first PC the other day and got it to POST on the first try. It literally takes 1 second to boot up. I didn't think the new i9 was so strong.

  • @mkatakm
    @mkatakm 1 year ago +475

    That's why AMD is starting to use 3D V-Cache, which is basically stacking multiple cache RAM layers vertically in the same space. As it did with the Ryzen 7 5800X3D, the same technology is coming soon to the 7000-series AMD CPUs as well.

    • @ClaimClam
      @ClaimClam 1 year ago +26

      techno gobbledygook

    • @baldwindomestic2267
      @baldwindomestic2267 1 year ago +90

      @@ClaimClam more cache, but stack like burger patty, more stack, more cache/burger

    • @ClaimClam
      @ClaimClam 1 year ago +37

      @@baldwindomestic2267 understand

    • @robojimtv
      @robojimtv 1 year ago +19

      Wouldn't be surprised if GPUs get V-Cache one day too. I think it could solve a number of issues with the RDNA3 chips.

    • @guytech7310
      @guytech7310 1 year ago +19

      The issue is dissipating heat when stacking dies vertically. I don't know how much heat SRAM produces, but I suspect it will be a problem. Maybe they can get by with a double stack, but I suspect any additional layers won't have the means to dissipate the heat.

  • @Chillst0rm
    @Chillst0rm 1 year ago +395

    This is why MCM (multi-chip modules) combined with 3D V-Cache will be so important moving forward. L4 cache will probably make a return as well, as something much farther from the die compared to L1 through L3.

    • @GewelReal
      @GewelReal 1 year ago +31

      If L4 could work as RAM, that would be a revolution. A few GB of L4 would make buying RAM for light use obsolete. And even with extra RAM it would be a massive performance benefit.

    • @Coecoo
      @Coecoo 1 year ago +37

      You say "important", but there is legitimately no excuse for more powerful consumer hardware outside of extreme VR / 4K. Graphical fidelity peaked years ago at the currently mainstream polygon counts. If anyone bought any remotely mid-range computer within the last 1-2 years and they experience performance issues in games, it is 100% optimization / functionality related.

    • @Technicellie
      @Technicellie 1 year ago +11

      @@Coecoo I agree with you from where we stand now, but I wouldn't set it in stone just yet.
      I don't see what can be improved in graphical fidelity.
      But just because we don't see it doesn't mean there is nothing.

    • @dkis8730
      @dkis8730 1 year ago +21

      @@Coecoo Completely path-traced games are the future, though. And you need the most powerful hardware today to run 4K/144fps, which gives you optimal smoothness with visibly much better graphics.

    • @ThylineTheGay
      @ThylineTheGay 1 year ago +9

      @@Coecoo companies should definitely be aiming for efficiency, but they don't, and probably won't, because "this won't destroy the planet" doesn't market as well as "oooooh, shiiiiny"
      Classic capitalism 🙃

  • @damienlobb85
    @damienlobb85 1 year ago +97

    AMD definitely doesn't get enough credit for their forward thinking in this regard. And as highly regarded as Jim Keller and his work on Zen are, it was an engineer (Sam Naffziger) who was responsible for persuading the senior execs to use chiplets on Zen and future AMD products.

    • @ledoynier3694
      @ledoynier3694 1 year ago +5

      ...maybe because they did not invent the wheel? Every foundry has had MCM designs and chip-stacking technologies in the works for the past 10-15 years. We're only just starting to see them hit the market.

    • @BruceCarbonLakeriver
      @BruceCarbonLakeriver 1 year ago +19

      @@ledoynier3694 And yet Intel was talking about "we're not gluing our chips together..." (although they've been doing it for Xeon for a while...)

    • @CommanderRiker0
      @CommanderRiker0 1 year ago +5

      Didn't Intel do this long ago with the "Crystal Well" 128 MB cache chip, years and years ago?

    • @HighYield
      @HighYield 1 year ago +5

      Broadwell i7-5775C

    • @1000area
      @1000area 1 year ago +6

      @@HighYield But that's an L4 cache, a known solution that adds cache next to the chip, not stacked cache like what AMD and TSMC are working on right now.

  • @DigBipper188
    @DigBipper188 1 year ago +82

    AMD had cache scaling as one of a few reasons they decided to split dies. Cache and some interfaces such as the memory controller don't scale well when going down a node, which is why their later EPYC and R7000 parts have the IO and some cache levels split from the cores. That way the actual cores can stay diminutive, while anything that doesn't scale well (e.g. the memory controller, L3 cache and so forth) can be produced on another die on a cheaper, lower-resolution process node (say, 5nm for the CCDs and then 16nm or even 20nm for the MCD / IO dies). This is also why the Memory Cache Die (MCD) of RDNA3 is a thing: it doesn't scale well on the current 5nm node, so AMD has opted to use a larger node for those parts to reduce cost, and to reserve the 5nm node for the GCD itself, where they can still see density benefits from the increased resolution of that lithography node.

    • @Jaker788
      @Jaker788 1 year ago +8

      Well, they don't want to go quite as far back as 16-20nm; they've kept progressing their non-logic die. For Ryzen 7000 it's 6nm for the IO die (and IO + L3 for RDNA3): a high-yielding, efficient node that's cheaper than 7nm, basically a refined, faster-to-manufacture 7nm thanks to multiple layers using EUV. It seems like they'll stay there for a while on IO (and cache for RDNA) and keep shrinking logic on new cutting-edge nodes.
      While density doesn't scale anymore for IO, and now memory, supply, tooling, and energy efficiency still factor in. 20nm planar silicon wouldn't be as efficient for L3 or IO.

    • @rocket2739
      @rocket2739 1 year ago +5

      "Reduce cost", yeah, for them. Because on the consumer end, we have yet to see prices go down...

    • @Jaker788
      @Jaker788 1 year ago +7

      @@rocket2739 Technically we saw RX7000 prices drop a bit below the previous generation RX6000.
      But really, reduced cost means it won't increase as much as any competition that isn't doing the same thing. If this pays off for AMD, and Nvidia takes years to get their own implementation then they'll be at a cost disadvantage.

    • @josephsteffen2378
      @josephsteffen2378 1 year ago +3

      @@Jaker788 Nvidia enjoyed its day in the sun. I remember when the Titanium series (or whatever it was called) was released. It was just by chance that I read an article in some online computer magazine, and I recognized the jump in technology/speed/value. Nvidia shot ahead of the pack, not by a few feet or seconds, more like they "lapped" the competition. The stock had just moved from $17/share to $20/share. Somehow I got it all together and told everyone I knew "BUY NVIDIA!". It had just reached $27. I guessed it could go up to maybe $129; that was as far as my skill could guess. I don't know jack about the stock market or trading. It was the only time in my life I predicted a stock profit, or suggested a purchase. NAILED IT!

    • @peceed
      @peceed 7 months ago

      @@josephsteffen2378 The same with AMD. Unfortunately I didn't have money to invest.
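The cost argument at the top of this thread (cheap node for cache/IO, expensive node for logic) can be sketched with a toy model. All prices and areas below are invented for illustration, not real foundry figures, and the model ignores yield effects entirely:

```python
# Toy cost sketch of the monolithic-vs-chiplet trade-off. Hypothetical
# assumption: a leading-edge node costs far more per mm^2 than an older
# node, and cache/IO area barely shrinks on the new node anyway.

def die_cost(area_mm2: float, cost_per_mm2: float) -> float:
    """Toy model: cost scales linearly with area (ignores yield)."""
    return area_mm2 * cost_per_mm2

LEADING_NODE = 0.30   # $/mm^2 on the cutting-edge logic node (made up)
TRAILING_NODE = 0.10  # $/mm^2 on the older cache/IO node (made up)

logic_area = 70.0     # mm^2 of logic: shrinks on the new node
cache_io_area = 80.0  # mm^2 of SRAM + IO: barely shrinks at all

# Monolithic: everything pays the leading-edge price.
monolithic = die_cost(logic_area + cache_io_area, LEADING_NODE)

# Chiplet split: only the logic pays the leading-edge price.
chiplet = die_cost(logic_area, LEADING_NODE) + die_cost(cache_io_area, TRAILING_NODE)

print(f"monolithic: ${monolithic:.2f}, chiplet split: ${chiplet:.2f}")
# -> monolithic: $45.00, chiplet split: $29.00
```

The gap widens as the non-scaling (cache/IO) share of the die grows, which is the commenter's point about RDNA3's MCDs.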

  • @coladict
    @coladict 1 year ago +41

    Engineering is always a balancing act. Improving one aspect comes with drawbacks in another. There may be ways to mitigate those drawbacks, but eventually when using the same principle of a technology you will hit its physical boundaries.

  • @jabadahut50
    @jabadahut50 1 year ago +22

    Magnetoresistive memory (MRAM) is nearly as fast as SRAM, and there are methods out there for using it in an analog mode, allowing a single cell to hold 8 bits. It would be interesting to see if this tech gets adopted in the future.

    • @diegorosario2040
      @diegorosario2040 1 year ago +1

      Wouldn't it require on-chip error correction to store 8 bits per cell?

    • @jabadahut50
      @jabadahut50 1 year ago

      @@diegorosario2040 Depends on the design, but it might. I'm not 100% sure how it works, but to my understanding it's a sort of magnetic potentiometer with a kind of ADC that is hardwired to the 256 possible outputs.

    • @diegorosario2040
      @diegorosario2040 1 year ago

      @@jabadahut50 The deal with non-binary encoding is that it worsens the signal-to-noise ratio. Error-correcting code would be needed to mitigate that problem.

    • @jabadahut50
      @jabadahut50 1 year ago

      @@diegorosario2040 Likely, and I'm sure that might trade off some speed, but ECC memory is already usually denser and slower than non-ECC memory anyway, so I don't think it'd be a huge trade-off for 8x capacity per chip.

    • @diegorosario2040
      @diegorosario2040 1 year ago

      @@jabadahut50 It will work storage-wise, but I am curious whether it could compromise bandwidth.
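The analog-cell idea discussed in this thread can be sketched in a few lines. This is a toy numeric model, not a real MRAM interface: the voltage range, level count, and function names are all hypothetical, but it shows why the signal-to-noise margin shrinks as one cell holds more bits:

```python
# Toy sketch of the "analog cell + 256-level ADC" idea: a cell stores one
# of 256 evenly spaced analog levels, and an 8-bit read-out quantizes the
# level back to a byte. All ranges are made up for illustration.

def encode(byte: int, v_min: float = 0.1, v_max: float = 0.9) -> float:
    """Map a byte (0-255) onto one of 256 evenly spaced analog levels."""
    assert 0 <= byte <= 255
    return v_min + (v_max - v_min) * byte / 255

def adc_8bit(voltage: float, v_min: float = 0.1, v_max: float = 0.9) -> int:
    """Quantize a (possibly noisy) read-out back to the nearest level."""
    code = round((voltage - v_min) / (v_max - v_min) * 255)
    return min(255, max(0, code))

# A noise-free write/read round-trips perfectly...
assert all(adc_8bit(encode(b)) == b for b in range(256))

# ...but noise larger than half a level spacing flips the stored value,
# which is why ECC keeps coming up in the discussion above.
level_spacing = (0.9 - 0.1) / 255
assert adc_8bit(encode(100) + 0.6 * level_spacing) != 100
```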

  • @miweneia
    @miweneia 1 year ago +25

    This channel is criminally underrated; presenting so much data and such key points in such a digestible and short manner is commendable!
    That aside, it's actually crazy to think that humanity has existed for thousands of years, but in only the past 50 years we've gone from creating the first CPU to hitting the actual physical limitations of its cache, and in another 15 or so years we'll probably hit the physical limitations of the CPU's transistor size. Really makes me wonder what technology and chips will look like 50 years from now... Hopefully I'll find out firsthand!

  • @marsovac
    @marsovac 1 year ago +119

    Nice video! But you didn't explain what "SRAM scaling" means in this context and why it is happening. I guess it means that the size of an SRAM cell does not get smaller as the process node gets smaller. But considering that the same applies to some other parts of the chip, like interconnects, this is nothing new.
    Currently TSMC 7nm and 5nm have almost the same feature sizes, but density is increased on the smaller node. Logic circuits are not packed as close to each other as possible, and this is where they get their scaling.
    SRAM has no room to get denser, since it is already as dense as it can be, in a perfect grid. At some point, logic circuits in the chip will end up having the same problem.
    So the real problem is that processes are getting fewer nanometers in their names while the transistor gate pitch remains the same. They keep decreasing the numbers in the process names, but those numbers are not nanometers anymore, and this is what is causing the SRAM problem: something that is already as dense as it gets doesn't benefit from increased packing density, only from smaller transistors.
    Maybe you want to talk about this "cheating" in process names. The name of the process no longer correlates with the distance between transistor gates. A video about process shrinking and how it has changed in the last 10 years would be informative.

    • @adityasalunkhe8156
      @adityasalunkhe8156 1 year ago +15

      ^Exactly. He should have said SRAM stopped scaling in density, rather than just "scaling". Also remember that the register file and the microcode controller are implemented as SRAM in the execution pipeline, and more delay accessing the register file would mean less IPC. Why would you pair faster ALUs with a slower register file or microcode controller? Makes no sense.

    • @larion2336
      @larion2336 1 year ago +9

      Yeah, I don't know that this is as significant as he makes it sound. The entire reason AMD went with a chiplet design in RDNA3 is that things like IO, and memory to an extent, already don't scale well on lower-nm processes. So they make the core GPU chip on the smaller node and use a larger node for the parts where shrinking doesn't bring any real performance benefit, saving money in the process. Well, that and it means they can stitch chips together.

    • @dex6316
      @dex6316 1 year ago +8

      This video mentioned that other components of a processor are also suffering from scaling issues. However, this is especially problematic for SRAM cells. SRAM not scaling means that to boost performance one must use more silicon. That’s very bad for the high performance microprocessor industry, which is the premise of this video. Other components not scaling well isn’t as impactful on the final designs because processors aren’t dependent on massive growth of these components; look at the cache growth to see why SRAM not scaling is really bad. Also logic cells don’t get denser by optimizing how they are packed together. The cells are reconstructed using different materials to hit desired performance targets at smaller sizes. Logic transistors are in fact getting smaller.

    • @kotekzot
      @kotekzot 1 year ago +1

      If feature sizes remain almost the same, what is it about new processes that enables them to reduce wasted space to increase density?

    • @johndododoe1411
      @johndododoe1411 1 year ago +1

      @@dex6316 How do material changes allow smaller logic gates without allowing smaller SRAM cells?
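The density arithmetic behind this thread can be sanity-checked in a few lines. A minimal sketch, assuming the widely reported bitcell areas (TSMC N5 high-density SRAM around 0.021 µm², with N3E reported at essentially the same area, per the WikiChip article the video cites); the function is illustrative, and real macro density is lower because of array overhead:

```python
# Rough arithmetic behind "SRAM stopped scaling in density": if a node
# shrink leaves the bitcell area unchanged, the SRAM density gain is zero.
# Bitcell areas below are approximate reported figures, in um^2.

def sram_bits_per_mm2(bitcell_um2: float) -> float:
    """Ideal bitcell-limited density (real macros add overhead on top)."""
    um2_per_mm2 = 1_000_000
    return um2_per_mm2 / bitcell_um2

n5 = sram_bits_per_mm2(0.021)   # ~47.6 Mbit/mm^2, bitcell-limited
n3e = sram_bits_per_mm2(0.021)  # same bitcell area => 0% density scaling

print(f"N5: {n5/1e6:.1f} Mbit/mm^2, N3E: {n3e/1e6:.1f} Mbit/mm^2")
# -> N5: 47.6 Mbit/mm^2, N3E: 47.6 Mbit/mm^2
```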

  • @6SoulHunter9
    @6SoulHunter9 1 year ago +128

    The information quality of this channel is astounding; I cannot believe it has only 3.4k subscribers.
    The presentation quality is also very good, and it's improving :)

    • @marsovac
      @marsovac 1 year ago +7

      You would be astonished by how many US viewers won't watch these videos simply because of the accent. I've seen people who don't want to watch videos by Aussie or British English creators because of the accent, and those are much closer to American English.

    • @6SoulHunter9
      @6SoulHunter9 1 year ago +4

      @@marsovac I know. The accent was always fine for me, but after seeing some harsh criticism I started paying attention, and I think this channel is improving in that regard; the accent used to be thicker.
      And while I don't mind the accent, I know there are channels I sometimes watch without being very interested, just because the voice is smooth and mesmerizing. I am sure that would help this channel take off.
      Me? I don't mind; my English accent isn't the best either.

    • @RM-el3gw
      @RM-el3gw 1 year ago

      Yes, it's crazy underrated. The YouTube algorithm sometimes fails to bring quality content like this to the front, where it belongs.

    • @padnomnidprenon9672
      @padnomnidprenon9672 1 year ago

      Lol yes. I just realized he has 4k subs. I thought it was 90k at least.

    • @stevewiley3832
      @stevewiley3832 1 year ago

      For me it is the sensationalist wording. He used the words "...approaching death", which implies that SRAM has a functionality problem, even though the issue is a scaling problem.

  • @aylim3088
    @aylim3088 1 year ago +41

    I'd really want to see what a more mature chiplet GPU with 3D cache could do. A bit of a shame that the RX 7900 was a bad launch, but I'm definitely hopeful for the future; besides, I would have been suspicious if the first-ever chiplet GPU didn't launch with teething problems. A shame its issues can't really be called 'just' teething problems, but I'll keep playing the waiting game.

    • @TheCustomFHD
      @TheCustomFHD 1 year ago +3

      It seems the AMD GPUs' hotspot temperature is relatively easy to reduce. Mounting the card vertically seems to fix it, and so does more thermal paste. Look at der8auer's video.

    • @JJAB91
      @JJAB91 1 year ago +2

      The hotspot issue only seems to affect AMD's own cards; partner cards don't have such issues.

  • @pacifi5t
    @pacifi5t 1 year ago +4

    Thank you for breaking down this issue. I thought I knew a lot about hardware, but it seems I've only seen the tip of this iceberg.

  • @JosephArata
    @JosephArata 1 year ago +5

    Die stacking will get rid of this problem: they can use a larger process node for the SRAM while the GPU/CPU cores use the smallest node possible. They'll also likely start using HBM once they go to full PC-on-a-single-chip designs.

  • @K11...
    @K11... 1 year ago +174

    Your channel will grow through the roof soon. You have amazing content.

    • @theminer49erz
      @theminer49erz 1 year ago +9

      I know, it's great to see so many people interacting and "liking" so fast. The number has grown steadily for some time now, which is great! He deserves it for sure!

    • @HighYield
      @HighYield 1 year ago +28

      It also makes the whole video creation process a lot more fun if I know ppl are actually gonna watch it!

    • @nutzeeer
      @nutzeeer 1 year ago +2

      Just got a front page recommendation and i will sub

    • @nutzeeer
      @nutzeeer 1 year ago +1

      3841st sub :)

    • @Hunter_Bidens_Crackpipe_
      @Hunter_Bidens_Crackpipe_ 1 year ago +2

      Nah

  • @b130610
    @b130610 1 year ago +28

    AMD certainly seems to have an advantage in the chiplet space because of their past successes with zen, but I have to wonder how much longer that advantage will last. It would be pretty ironic if nvidia integrates chiplets into their cards before AMD can leverage that advantage for a clear win at the high end. It seems like they really had a golden opportunity with rdna3, but it obviously hasn't really worked out that well so far.

    • @ag687
      @ag687 1 year ago +6

      It's not a chiplet, but Nvidia is already leveraging entire datacenters of cards working together as though they're one supersized GPU. Which means they probably already have the tech they need to do chiplets without too much of an issue.

    • @b130610
      @b130610 1 year ago +5

      @@ag687 AFAIK, the chiplet tech AMD is using has at least a couple orders of magnitude more bandwidth than Nvidia's datacenter networking solutions (although those are impressive in their own right). The chiplet interconnects are developed in coordination with TSMC though, so it's not inconceivable that Nvidia could use tech similar to AMD's as long as they stay in TSMC's good graces.

    • @sudeshryan8707
      @sudeshryan8707 1 year ago +2

      I think AMD has already patented most of the practical approaches to chiplet design, which leaves others very little room for innovation. Intel struggling for years with their tile design shows how much harder it is for others to be competitive.

    • @b130610
      @b130610 1 year ago +2

      @@sudeshryan8707 I'm inclined to agree with you there, but I'm not ready to rule out something new built on TSMCs packaging technologies for high speed interconnects. Last year I thought no other chip design firms were even close to AMD on mass market chiplet designs, but then we saw the m1 ultra from apple with very impressive performance scaling over a whole new fabric. I wouldn't count Nvidia out, but I'm certainly not expert on the matter, just an armchair critic.

    • @aravindpallippara1577
      @aravindpallippara1577 1 year ago +4

      @@b130610 While Apple's M1 Ultra is very impressive, it has less bandwidth per unit of silicon, and the interposer itself is extremely expensive tech compared to AMD's Infinity Fabric based inter-die communication.
      AMD might go patent troll on other companies going forward; not a fan of that happening.

  • @tqrules01
    @tqrules01 1 year ago +48

    I don't think it will be an issue for AMD. They are using 3D caching, and the 5800X3D is still a beast. Oh, nvm, you already mentioned it. I think in the future they will be able to keep stacking with faster and faster interconnects, i.e. next-gen Infinity Fabric.

    • @Yuriel1981
      @Yuriel1981 1 year ago +7

      Was going to say pretty much the same thing. 3D cache will increase the amount of SRAM a chip can hold. It doesn't fix the scaling problem, but it does solve some of the size issues, which is why AMD's chiplet tech is most likely the next step.

    • @kotekzot
      @kotekzot 1 year ago +1

      Pretty sure Infinity Fabric is slower than the vias used in 3D V-cache.

    • @daxconnell7661
      @daxconnell7661 1 year ago +1

      Even when early computers were being developed, some discovered you could double the amount of memory in a computer by stacking RAM: the 4464 RAM chip of the Commodore 64 / Apple era.

    • @spamcheck9431
      @spamcheck9431 1 year ago +2

      THIS right here.
      I think AMD and Nvidia are going to separate here in terms of utility.
      Nvidia is gonna have to focus on CUDA cores, while AMD focuses on parallel processing.
      The only thing that might save Intel is if they somehow went along with Apple's chip methodology, where they target specific use cases, such as having a portion of the CPU hard-wired for specific tasks instead of relying on generic transistor logic.

    • @kotekzot
      @kotekzot 1 year ago +1

      @@spamcheck9431 would you explain what hardware features Apple integrates that Intel doesn't? AFAIK Intel and AMD include a lot of extra instruction sets and some accelerators (e.g. for encryption).

  • @kiri101
    @kiri101 1 year ago +20

    I already knew about the topic but this was such a well organised video it was still worth watching. Your pacing, delivery of speech and the information density in the video are very well balanced. Thank you.

  • @bananaboy482
    @bananaboy482 1 year ago +54

    The amount of attention this video has is criminal. Best video I've watched all day! Entertaining, informative in an easy to understand way, and well made!

  • @towb0at
    @towb0at 1 year ago +10

    Super interesting topic. Seems like whoever comes up with the best successor to SRAM will take the cake, once chiplet scaling is fully utilized.

  • @mnomadvfx
    @mnomadvfx 1 year ago +14

    This has been known for a while, and ARM has been looking at using some variant of MRAM to replace SRAM for CPU caches.
    While this is difficult in a monolithic die, it becomes easier with chiplet stacking, as AMD has already demonstrated with X3D.
    Not only will MRAM offer non-volatility/persistence for potentially higher power efficiency, it will also offer dramatically better area scaling than SRAM for larger caches.

  • @dascandy
    @dascandy 1 year ago +4

    This finally explains why the CPU core is made on a smaller process than the memory chips, when it used to be that memory chips were the first to shrink (because of their much simpler design).

    • @alwanexus
      @alwanexus 11 months ago

      You may be thinking of DRAM, which requires different process features.

    • @JoeLion55
      @JoeLion55 6 months ago

      DRAM has always been on an older process than logic, because 1) DRAM cost control is much more critical than for logic, so it can't afford bleeding-edge fab processes, and 2) the DRAM array has features (like wordlines and bitlines) that use entirely different fab processes and can't scale at the same rate as logic transistor processes.
      But historically, SRAM was used as the test vehicle for new processes, because SRAM uses (or can use) "normal" logic transistors.

  • @anepicotter4595
    @anepicotter4595 1 year ago +6

    Fortunately we can get a lot more SRAM with AMD's 3D cache method, and it'll definitely work well in chiplet designs even as the core chiplets continue to scale down.

  • @HazzyDevil
    @HazzyDevil 1 year ago

    Love the way you present these videos, about time I subscribed :)

  • @runeoveras3966
    @runeoveras3966 1 year ago

    Great video! Thank you. Hope you enjoy the holidays.

  • @youcrew
    @youcrew 1 year ago +10

    I think this is why chiplet/tile designs are essential. We will start seeing SoC packaging get larger.

    • @BruceCarbonLakeriver
      @BruceCarbonLakeriver 1 year ago

      It is only a matter of time until the whole von Neumann architecture fits within a chiplet design. The motherboard will just hold RAM and the peripherals connected to the SoC.

  • @omegaprime223
    @omegaprime223 1 year ago +11

    My only thought is: "Oh no, application developers will have to learn how to optimize again... the horror."
    Companies have been offloading optimization work because technology could just brute-force things for so long; now that we're starting to see limitations that might stick around for more than one chip generation, corporations will have to optimize existing features if they want to cram even more features in.

    • @zthemythz
      @zthemythz 1 year ago

      We're probably just going to see stagnation.

  • @scaryhobbit211
    @scaryhobbit211 1 year ago +17

    Eh... they'll find a way around the SRAM bottleneck, like they always do.
    There's the Chiplet designs like you mentioned, but I'm also interested to see what IBM's light-based CPU leads to.

    • @soylentgreenb
      @soylentgreenb 1 year ago +9

      Single-core scaling ended when Dennard scaling died. Multicore scaling isn't working that well either, as real-time consumer applications like games cannot take good advantage of it without increasing latency (hence why 144 FPS today doesn't feel better than 72 FPS did in the 90s; engines are more pipelined). Moore's law isn't holding up that well either; it is about cost per transistor, and wafer price increases are almost cancelling out density scaling.
      Light is a piss-poor medium for density of storage and density of logic. Light is very large: a blue photon is 350 nm big, and when you approach that sort of scale you get weird effects like surface plasmon resonance and quantum tunneling. So you either embrace the weirdness and do something with plasmons, or you build a bus with micron-sized waveguides, a lithography size that hasn't been in vogue since the 1980s.

    • @amineabdz
      @amineabdz 1 year ago +1

      @@soylentgreenb So the absolute best photonics can do is non-ionizing radiation, i.e. the very near ultraviolet range? Either that, or find some way to mitigate material degradation from using an ionizing wavelength (which AFAIK is impossible, or else shielding at nuclear power plants would no longer be a concern).

    • @davidmckean955
      @davidmckean955 1 year ago +2

      Considering we're quickly reaching the physical limits of what's possible for scaling all parts of the CPU, we have much bigger problems to worry about medium term.

    • @amentco8445
      @amentco8445 1 year ago

      @@soylentgreenb And what would be the big issue in utilizing UV for this?

  • @davidgunther8428
    @davidgunther8428 1 year ago +7

    I think 2.5D chiplets will stay at the L3 cache level, not the L2 level. There's so much data transfer, and the latency needs to be so low, that L2 on a chiplet would need to be closer/stacked to perform well.

  • @zonemyparkour
    @zonemyparkour 1 year ago +1

    When your channel becomes famous, I want to leave this here as proof I was here from the beginning.
    Great content. Loved your graphic explanations.

  • @samghost13
    @samghost13 1 year ago

    There was a Big Light switching ON in my Head. Thank you very much Sir!

  • @TheDoomerBlox
    @TheDoomerBlox 1 year ago +4

    7:14 - Probably worth noting that '6nm', despite being "adjacent" to '5nm' in name, is actually a refined version of the older N7 TSMC node seen on Zen 2 chiplets.

    • @HighYield
      @HighYield 1 year ago +1

      You are correct, 6nm is based on 7nm, just like 4nm is based on 5nm.

  • @SpencerHHO
    @SpencerHHO 1 year ago +11

    I thought scaling had pretty much died around the 28nm nodes. It seems AMD has already solved this issue with chiplets and 3D V-Cache: all the RDNA3 variants released so far have the L3 cache and memory controllers (which also don't scale much anymore) on separate chiplets, on a cheaper, older node than the main compute die. We will see larger packages from AMD and costs will continue to rise, but their chiplet designs give them a huge advantage, and Intel is already trying to implement their own version. A lot of the tech AMD uses is co-developed with TSMC and isn't that different from what Apple is using in its M2 chips. I suspect this will only accelerate the transition to multi-die SoCs and 3D stacking. Cache is a lot less energy-hungry than logic, so it makes sense that it's the first thing to get 3D-stacked silicon.

  • @paulsim7589
    @paulsim7589 1 year ago

    I knew this from other hardware videos, but I watched anyway, as it's quite relaxing and easy to listen to. Your format for explanation is very good. Thank you.

  • @rahcxyoutube
    @rahcxyoutube 1 year ago +1

    I absolutely love your videos, keep it up!

  • @joehorecny7835
    @joehorecny7835 1 year ago +9

    Amazing content and analysis! Hopefully they are working on the bandwidth of the chiplets; sounds like that might be the next bottleneck.

  • @frankg7786
    @frankg7786 1 year ago

    This was very interesting and well explained, thank you!!

  • @shyamdevadas6099
    @shyamdevadas6099 1 year ago

    Very fascinating video. Well done!

  • @Themisterdee
    @Themisterdee 1 year ago +2

    Very interesting, thank you.
    Dumb thought, I know, but...
    Won't that mean rectangular chips are soon to be obsolete? There must be a finite limit to SRAM gates/wires per nm along an edge.
    As in, if the shrunken dies get smaller, higher logic cell density would mean more ports per nm, and thus more 'wires' to the SRAM edges.
    I'm assuming you would quickly run out of room.

  • @SupraSav
    @SupraSav 1 year ago

    Solid video. Hope your channel blows up brotha

  • @horusfalcon
    @horusfalcon 1 year ago +2

    An interesting presentation! I wondered when something like this would happen. Now whoever develops a more scalable SRAM will wind up being the performance leader, unless other techniques prove much more cost-effective.

  • @RM-el3gw
    @RM-el3gw 1 year ago +4

    Very informative as always. I believe there are multiple physical aspects of semiconductor tech being pushed to their limits right now. Cheers.

  • @ytviewer267
    @ytviewer267 1 year ago +2

    Apple already has a CPU using chiplet tech: the M1 Ultra, introduced back in March, stitches together two M1 Max chips into a single package. They aren't currently using it to split off SRAM, but the M1 Max is an extremely large die comparatively.

    • @HighYield
      @HighYield 1 year ago +1

      That's true, but since it's "just" two of the same M1 Max fused together, I separate it from chiplet designs like AMD's, which use chiplets of different sizes.

  • @IgoByaGo
    @IgoByaGo 1 year ago

    I have no idea why I have never seen your channel, but I totally subscribed. Great content.

  • @7rich79
    @7rich79 1 year ago +2

    One of the typically advertised advantages of a process node shrink is increased performance, increased power efficiency, or a combination of both. Does this mean that if you cannot continue to shrink the process, SRAM performance will be the bottleneck for newer architectures? What are the alternatives to SRAM?

  • @sharktooh76
    @sharktooh76 1 year ago +3

    The Nvidia 4000 series is made on 5nm, not on 4nm.
    4N is Nvidia's customized node based on TSMC's N5 5nm node.
    TSMC's N4 is 4nm.
    4N is *NOT* N4.

  • @Eskoxo
    @Eskoxo 1 year ago +3

    I think this could have many possible solutions. How the IBM Telum CPU handles different caches across a cluster of CPUs comes to mind, or perhaps a separate chip with slightly slower L4 cache, etc.

  • @mihaicraciun8678
    @mihaicraciun8678 1 year ago +1

    Love your channel, learned something new today! By the way, how long do you think transistors will be able to scale? Or is an atom's width the limit, if we can focus our lasers that small?

    • @Jabjabs
      @Jabjabs 1 year ago

      The issue is not so much how small we can physically make the transistors; it is how much tolerance to errors and inaccuracy can be managed given electron leakage. It's something that is becoming a real, major issue in chip design. We can make small transistors, but they are so small that electrons can simply tunnel through these switches (more precisely, have the energy to overcome the electric field barrier) and thus negate the binary nature of the transistor. Little of this can be tolerated. This is called Boltzmann's tyranny, a real-world example of the Boltzmann distribution in action. en.wikipedia.org/wiki/Boltzmann_distribution
      This is complicated, but I will try to make it as easy to understand as possible.
      Transistors still have analog properties; what determines something as being an on or off state is not as absolute as we would like to think. It is a case of tolerances. If enough electron flow gets through, the gate is considered on. The switch in a transistor isn't actually a physical switch but an electric field that, when active, prevents the majority of the electrical flow from getting through. There will always be electrons that get through; it is a case of how many do, and the tolerance of the output to this.
      Now we have transistors with only a few silicon atoms separating them, making electron tunneling much more likely. Because of this fuzzy nature, it is getting more likely that we will have transistors that, while physically small enough to be functional, are made useless by quantum tunneling.
      There are two ways to combat this. Either we pump more energy into the transistor to increase the switching barrier, which increases the heat output. This is not a great solution, as we have already been pushing the upper limits of thermal capacity for a good 20 years now. The other is to lower the tolerance on accuracy, meaning we could make smaller chips, but with a greater risk of them operating in odd ways. Could we build these things? Yes. Will they work as we want? No. The base physics of the universe will have the last laugh here. We are toying with the fundamental laws of physics itself, and it will not bend to us.
      To answer your main question, I think we will make it to the 1nm mark, but it will be a long, slow push up to it. I suspect that over the next decade we are going to see a few major features in future processors.
      Further ASIC design: things like we are seeing today with the Apple M-series chips, where there are a lot of highly specific cores to accelerate functions where possible. We are going back to the design principles of the Amiga!
      More chiplet design to complement this. In order to keep chip binning to a minimum, the number of chiplets bundled is going to go through the roof!
      Increased electrical power demands, as these chiplet designs allow more compute power to be packed into a machine. But this will be the last phase of desperately trying to get performance out of silicon computing systems.
      These are the actions of desperation to get a few more percent out of computers as we finally peak in the early 2030s.
      After that, I feel the real push to optimize software will be the main way to get further performance. How do you sell hardware that is no better than the previous year's systems? The way Microsoft has with Windows 11: have processor-specific security requirements. ;) This is why I envision that in the next decade the amount of vendor lock-in is only going to get worse. I hope I am wrong.
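The Boltzmann-tyranny argument above can be made concrete with a tiny back-of-envelope sketch. This is a deliberately crude exp(-qV/kT) model with illustrative barrier voltages of my own choosing; real devices also leak via quantum tunneling, which this ignores.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q   = 1.602176634e-19  # elementary charge, C

def boltzmann_factor(barrier_volts: float, temp_kelvin: float = 300.0) -> float:
    """Fraction of carriers with enough thermal energy to cross a barrier.

    Simple exp(-qV / kT) Boltzmann model, illustrative only: it shows why
    lowering the supply voltage makes the "off" state exponentially leakier.
    """
    return math.exp(-Q * barrier_volts / (K_B * temp_kelvin))

# Shrinking nodes push toward lower voltages, so the off-state leaks more:
for v in (0.7, 0.5, 0.3):
    print(f"{v:.1f} V barrier -> leakage fraction ~ {boltzmann_factor(v):.2e}")
```

The exponential dependence is the point: each step down in barrier voltage costs orders of magnitude in off-state leakage, which is why simply scaling voltage down stopped being free.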

    • @mihaicraciun8678
      @mihaicraciun8678 1 year ago

      @@Jabjabs Wow, thanks for this, very interesting and informative! Kinda sad, since it really feels like we're just scrambling for workarounds to a physically impossible problem. I wonder how things will evolve beyond the next 10-20 years. Will we move on from silicon chips to fundamentally different designs? I'm thinking that since we're currently using electrons to interact with the transistors, perhaps using particles like photons (I don't think these guys even have a size, since they're massless) could be a solution; they carry energy, so maybe that could be used as a toggle.

    • @Jabjabs
      @Jabjabs 1 year ago +1

      @@mihaicraciun8678 Photonic computation has been proposed for a few decades now. I don't see any major theoretical problems with it in terms of physics, but it is clearly a major engineering problem, as it has yet to materialize. Great in theory, terrible in practice - for now.
      This is why I feel software optimization will be the last big step. A lot of the software we use nowadays is astoundingly sluggish considering what is possible. Modern hardware is just so amazingly fast, it plows right through the inefficiencies. I remember building a sorting algorithm in assembly on a 33MHz 486 back in the mid '90s. It could sort a data set of about a million variables in about 5 seconds. Excel running on my 4.2GHz i7 will do the same in about the same time... despite there being 6 cores, each running at 100 times the clock rate. So yeah, there is some room to maneuver here.
      Yes, we are scrambling for workaround technologies nowadays. Every new transistor design and technique we come up with buys us less time until we need an all-new design again. I remember Intel in the mid '90s saying we had about 25-30 years until we would hit the end of the silicon road. They were not far off the mark. The problem is we don't even have anything viable on the horizon. This is a similar issue to things like batteries and power generation: we have come a really long way, but the next major step is a long way off and we don't know where it will come from. But we will try anything, and just maybe one of these experiments will hit the jackpot.
      And don't even get me started on quantum computers. They are neat physics experiments, but they will not do work in a fashion anything like what we use today. They also have fundamental limitations where additional complexity can decrease their reliability.

    • @HighYield
      @HighYield 1 year ago +1

      I think logic transistors will continue to scale for a couple of years, but with diminishing returns. There are things like graphene as a material to succeed silicon, there are photonic chips and then we have EUV-lithography improvements like High-NA.

  • @Nahrix
    @Nahrix 1 year ago +2

    Use SRAM as a physical buffer between cores and build vertically. The relative physical size difference would mean a larger distance between each core, distancing the hottest parts and allowing better thermals.

  • @gstormcz
    @gstormcz 1 year ago

    Skull is lovely. Content is great. Narration no waste of time.
    Merry Christmas.

  • @ChiquitaSpeaks
    @ChiquitaSpeaks 1 year ago +1

    I’d like to know if there’s a difference in the implications of cache/SRAM importance in an SoC, but I guess Apple’s decision-making offers some insight into that?

  • @jazzochannel
    @jazzochannel 1 year ago

    5:40 "isn't there anything that can be done? great question, so glad you asked" smoothest transition of the year.

  • @kotekzot
    @kotekzot 1 year ago +1

    I wonder if Zen 5 is going to have any L2/3 cache on the die or are they going to stack it all on top of the die.

  • @vinylSummer
    @vinylSummer 1 year ago

    Awesome video! Subbed, going to watch your other videos

  • @johnsavard7583
    @johnsavard7583 1 year ago +9

    At about 5:55 in your video, you finally mentioned chiplet design - if you can't scale static RAM, just put it off the chip. Of course, that involves some additional delays, so you still need L1 cache on the die with the logic, but it helps a lot.

    • @xeridea
      @xeridea 1 year ago +1

      Yeah, L1 and L2 are probably still best on the same chip since latency is critical, but L3 is a great candidate.

    • @Tigerfox_
      @Tigerfox_ 1 year ago +3

      I feel like we're back in days of Pentium II and III.

    • @BenjaminCronce
      @BenjaminCronce 1 year ago

      @@Tigerfox_ Except that for many workloads, the P2/P3 with smaller on-chip L2 cache was faster than the one with the larger off-chip cache. The Celeron with 128KiB of on-chip L2 cache was faster than the Pentium with 512KiB of off-chip cache. In that case, I think the off-chip cache ran at half frequency: much faster than DRAM, but a few factors lower bandwidth and higher latency than on-chip. Going off of memory from two decades ago, so take it with a grain of salt.

    • @Tigerfox_
      @Tigerfox_ 1 year ago

      @@BenjaminCronce I know all that, but I don't understand what you're trying to say. Of course, for some workloads more cache is better than faster cache, and for some it's the other way around. I haven't seen an in-depth analysis of which applications profit more from Raptor Lake's increased cache yet, but I know that, for example, only some games profit greatly from the 5800X3D's 3D V-Cache, just as some games run faster on the Broadwell i7-5775C with its eDRAM L4 cache than on an i7-7700K.
      They'll have to find a compromise. AMD reduced the size of the Infinity Cache slightly on RDNA3, but vastly increased its speed.

  • @Raven-lg7td
    @Raven-lg7td 1 year ago +1

    OMG, I never heard about this before, and I'm subbed to MLID, AdoredTV, Coreteks... you're a real hidden gem, please keep it up! This is so interesting.

  • @growthmonger4341
    @growthmonger4341 1 year ago

    Great information and no BS, will definitely drop by again.

  • @TheEVEInspiration
    @TheEVEInspiration 1 year ago +5

    I think some caches will become nearly obsolete to make room for the more essential caches.
    Think of the separate cache for code that is indirectly fed from a data cache.
    They could be changed to store just pre-decoded metadata (like instruction boundaries on x64, or other decoding hints), fetching the actual code from the data cache instead when needed.
    There are more such tradeoffs to make for sure, like cache complexity versus cache size.
    If cache size is under pressure from this scaling development, expect more complex/smarter caching systems that until now did not make economic sense.

    • @stevetodd7383
      @stevetodd7383 1 year ago +2

      There’s a very good reason for split I and D caches - they allow simultaneous fetching of instructions and data. A pure Von-Neumann design (shared instruction and data memory) can only execute one instruction every other clock cycle (one instruction fetch followed by a data access relating to that instruction). Modern cores are all modified Harvard designs, that allow simultaneous fetching of instructions and data access via the two different caches. They are also quite small compared to later caches in the scheme, so unifying them will save little space.
      The better solution to the problem is 3D stacking and using simpler/cheaper process nodes to create cache layers. This actually gets the cache closer to the point of use while letting you increase sizes.
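The "one instruction every other clock cycle" point above can be sketched with a toy cycle-count model. This is my own illustrative simplification (a single shared memory port versus two independent ports), not a real pipeline model, which would overlap far more than this.

```python
def cycles_von_neumann(n_instructions: int, data_accesses: int) -> int:
    """One shared memory port: instruction fetches and data accesses serialize."""
    return n_instructions + data_accesses

def cycles_harvard(n_instructions: int, data_accesses: int) -> int:
    """Split I/D caches: a data access can overlap the next instruction fetch."""
    return max(n_instructions, data_accesses)

# 1000 instructions, ~40% of them touching memory:
print(cycles_von_neumann(1000, 400))  # 1400 cycles
print(cycles_harvard(1000, 400))      # 1000 cycles
```

Even in this crude form, the split-cache design recovers the cycles lost to fighting over a single memory port, which is why modern cores are modified Harvard designs at the L1 level.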

    • @TheEVEInspiration
      @TheEVEInspiration 1 year ago

      @@stevetodd7383 I understand those points, and I think it's an argument that has been losing validity for some time now.
      Ever since the introduction of the level-0 uOp cache, the effect of large level-1 instruction caches has been going down. And those level-0 caches are getting bigger every generation!
      There is a saving to be had there for sure, by making them smaller but smarter, for example by increasing set associativity or, as I suggested, by storing only meta-info/tagging the relevant cache lines in L2 as being used for code.
      As both L1 caches are fed from L2, there already is concurrent fetch capability at that level. The L1 instruction cache is virtually all about lowering latency for non-decoded instructions! A smaller cache that speeds up decoding would give the same benefit as today's caches. Putting some of that cache area towards a bigger uOp cache will see more benefit, I think (at least that is the trend right now).
      As for die stacking, that is all about level-2, not level-1 caches. This also speaks in favor of a smaller L1 instruction cache, as the code will be in that extra-large L2 anyway.
      The level-1 instruction cache is simply between a rock and a hard place (the much faster, already-decoded uOp cache and the much larger, extendable L2 I+D cache).
      And there is another trend looming: sharing massive level-2 caches between cores! That can be a huge transistor-count-saving architectural feature.

    • @stevetodd7383
      @stevetodd7383 1 year ago

      @@TheEVEInspiration a cache only accesses the next higher level in the case of a miss. At this point there is typically a burst of activity while a cache line is written or read. Because of this I and D caches don’t typically access the L2 concurrently. Each level of cache has a progressively higher miss cost, and then adding multi-port access adds more. The I and D caches are deliberately small and fast. L2 is larger and slower, L3 larger and slower again. The job of the I cache is to keep the instruction decode pipe fed as much as possible. That pipe results in L0 uOps, but there’s a higher penalty if L0 misses and you have to go all the way to L2. The job of the D cache is to keep the data needs of the uOps fed as much as possible while avoiding the need to go to L2 again.
      There’s a reason that we don’t just have a single layer of cache. Big and complicated caches are slow. Cache models are a trade-off between the need to maximise hits and the time to return cached data.
      Oh, and to add to that, L0 cache is in the form of VLIW instructions that are far from compact. You’ll not get efficient use of space if you try for a large boost to the L0 to make up for no I cache.

  • @yujaeha
    @yujaeha 1 year ago

    Amazing info. Thanks 🙏

  • @MarianRambo1
    @MarianRambo1 1 year ago +2

    4:10 You forgot to mention the Ryzen 7 5800X3D, which has 96 MB of L3 cache.

  • @tjtjmich16p
    @tjtjmich16p 1 year ago

    Dude, your channel will explode with subscribers and viewers. It's already happening: your channel is what YouTube's algorithm is showing me and many more tech nerds out there, so expect huge growth. You will reach 100 thousand subs before you know it.
    And awesome content, by the way.
    Really well edited and well-thought-out videos.
    And I really like your accent, it makes you sound like a tech company owner.

    • @HighYield
      @HighYield 1 year ago

      It’s a bit overwhelming right now to be honest, but I’ll manage. Thanks for the kind words!

  • @NTeKLullaby
    @NTeKLullaby 1 year ago

    Great and concise video. Thanks.

  • @BlenderRookie
    @BlenderRookie 1 year ago +1

    Bigger dies are inevitable, along with wider memory buses. Transistors and D latches (or whatever they are called these days) can only get so small, and transistors can only switch so fast. The eventual step is wider word processing and wider memory word access. But hey, I am old, and when I was into the nitty-gritty of this stuff, CPUs were running typical TTL voltages of about 5 volts. So yeah, I'm expired.

  • @electronash
    @electronash 1 year ago +1

    This is weird. I just bought a Ryzen 9 5900X to upgrade a 3200G in my second PC.
    When I was comparing it to chips like the 5800X3D, I noticed the difference in L3 cache sizes and wondered how much area the cache must take up on the chip.
    I figured that a BIG part of the cost of the chip is the cache, since even 32MB takes up quite a large area of the silicon.
    I didn't realize there was a problem with SRAM cell size on the smaller nodes, though. Interesting vid.
    If only SRAM were somehow smaller and simpler to produce, we would likely never have needed DRAM at all.
    I've often wondered how fast a PC would be if its main RAM could be SRAM instead of DRAM.
    (Modern DDR SDRAM is FAST, but the latency is still high compared to what I would think SRAM could do.)
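The latency intuition here is usually quantified with the standard average memory access time (AMAT) formula. The numbers below are illustrative placeholders of my own, not measured SRAM/DRAM figures.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected miss cost."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Illustrative: a 1 ns SRAM cache hit, a 5% miss rate,
# and a 60 ns penalty for going out to DRAM on a miss.
print(f"AMAT = {amat(1.0, 0.05, 60.0):.1f} ns")  # AMAT = 4.0 ns
```

This is why cache size matters so much: halving the miss rate cuts the DRAM-penalty term in half, which can dwarf any gain from making the hit itself slightly faster.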

  • @BiggySeth
    @BiggySeth 1 year ago

    Need to start attaching NVMe drives directly to the GPU to help compensate?

  • @NootNoot.
    @NootNoot. 1 year ago +3

    As for chiplets, and specifically future RDNA designs, I wonder if moving from an N6 to an N5/N3E MCD would even be worth it? And although it seems TSMC has hit a dead end with SRAM scaling, I wonder how well other foundries are doing. For example, as you say, Intel is using some TSMC manufacturing for Meteor Lake, and I wonder if Intel has more efficient SRAM scaling.
    This also raises questions for Nvidia's Blackwell. They've benefited a lot from moving from Samsung's 8nm node to a custom TSMC N4 node. While I don't doubt Nvidia will take the performance crown again, I feel like the 4000 series has benefited a lot from the silicon. Will they also move to a disaggregated design, or will they pull some black magic with further increased power draw?
    Btw, I think the thumbnail is great lol

    • @dra6o0n
      @dra6o0n 1 year ago +3

      Nvidia hasn't got much CPU experience to do proper chiplet designs like AMD or Intel do, and Apple just brute-forces its engineering with lots and lots of R&D money to poach talent for that.
      Otherwise Nvidia would have pushed for chiplets sooner, instead of showing a proof of concept once and then forgetting about it.

  • @zxuiji
    @zxuiji 1 year ago

    Well, they can potentially create "ERAM": using electrowetting and light rays, it is possible to create a fast read/write byte with minimal power usage. Using just the position of the light caught, one can determine 0 or 1; one could also try storing an entire unsigned integer/float via the strength of the light caught.

  • @lockmuertos
    @lockmuertos 1 year ago

    Is it possible to stack LAYERS of SRAM cells to create doubling and tripling setups?

  • @sumeetwadile5590
    @sumeetwadile5590 1 year ago

    Very well explained!

  • @mjdevlog
    @mjdevlog 1 year ago +3

    Great video! I really appreciated the thorough analysis of the potential problems with next-gen CPUs and GPUs. It's important to consider these issues and have a critical eye towards new technology. Keep up the excellent work!

  • @V3RM1LI0N
    @V3RM1LI0N 1 year ago

    Hey man, do you work in a fab?

  • @confidential303
    @confidential303 1 year ago

    Can you do a short on this? What is your advice? Because I am planning to upgrade my GPU and maybe the whole computer...

    • @HighYield
      @HighYield 1 year ago

      As a consumer you don’t really need to care, it’s more something companies like AMD, Intel or Nvidia have to think about.

  • @HablaConOwens
    @HablaConOwens 1 year ago

    What do you think about the idea of large tech companies starting a project to build new operating systems based on RISC-V, with less code and neural engines in mind? We could see gains from less work.

  • @rayraycthree5784
    @rayraycthree5784 1 year ago

    Why can't the same transistors used in the ALUs, LUTs and controller be used to build flip-flop cache memory?

  • @46three
    @46three 1 year ago +2

    Gamers Nexus has an interview with one of AMD's lead engineers, Sam Naffziger, who explains this exact issue as one of the key concerns that chiplet design (and 3d V-cache) aims to mitigate. Interesting chat for sure.

    • @46three
      @46three 1 year ago

      ua-cam.com/video/8XBFpjM6EIY/v-deo.html

  • @johntupper1369
    @johntupper1369 8 months ago

    Loving your content

  • @Kevin-jb2pv
    @Kevin-jb2pv 1 year ago +2

    Unless we have some sort of new paradigm shift in computer hardware, these limitations are why I think we're probably going to head into an era of off-loading CPU functions to dedicated co-processors. We already did it with GPU's, and bitmining did prove that certain functions are better handled by dedicated hardware and can be done cost-effectively. Plus, NVidia has been selling dedicated, specialized GPU hardware for AI for years, now. I think we're going to start seeing more processing handled by specialized units as demands grow. Exactly which functions? I can't say. For gaming, physics is the first thing that comes to mind, but PhysX was already a thing that failed and then got absorbed back into GPU hardware. Perhaps we'll see a return of discrete physics units? We also have dedicated AI chips out there, and I believe one of the things they get used extensively for is in processing image data in some phones. It was heavily marketed a few years ago by several major players, but I don't know if that's a thing that's still being done on current-gen phones.
    Point is, manufacturers have already done it and are at least trying to find other applications to offload to dedicated silicon. So far, the physical limits of semiconductors have not, yet, hit that brick wall that we've been getting warned about for years. It's slowed, but so far manufacturers have been able to use other tricks to get generational improvements in computing power, so the wider industry and enthusiast community hasn't had to feel the pain quite yet. Who knows, maybe manufacturers will be able to keep squeezing more cycles out of what we have right now for many years just because they will actually have to start doing real work on architecture re-working now that they can't just fall back on shrinking their transistors (and this, for the most part, is what we have been seeing, it's just a matter of how long they can keep doing that).
    But I think that when backs are really pushed up against the wall, we'll start seeing more radical solutions start being brought to market. I think that the fact that Moore's law is just about done with will likely mean that we're about to see a _boom_ in innovative and creative new solutions because the "safe" path is no longer a viable one and corporate leadership will start being forced to try new things to stay competitive.

  • @ItsAkile
    @ItsAkile 1 year ago

    Great video brother

  • @chibby0ne
    @chibby0ne 1 month ago

    This answers why chiplet design is becoming so popular lately. Thanks a lot for the well conveyed and duly researched video.

  • @Razor2048
    @Razor2048 1 year ago +1

    What are your thoughts on CPU makers moving to add HBM to the CPUs, where it effectively becomes a massive level 4 type cache?

    • @tyaty
      @tyaty 1 year ago

      Intel is already planning to launch them in the near future (Xeon Max).

  • @NoneofyourBusiness-ii1ps
    @NoneofyourBusiness-ii1ps 1 year ago

    Well, there is also a physical limit to how densely you can store information, which happens to be the number of bits obtained by counting the Planck squares on the surface of a black hole. Basically, if you pack too much information into a given space, it will collapse into a black hole, literally...
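For reference, the bound this comment alludes to is the Bekenstein-Hawking entropy, which scales with a black hole's surface area measured in Planck areas (standard textbook form):

```latex
S_{\mathrm{BH}} = \frac{k_B\, A}{4\, \ell_P^{2}},
\qquad
\ell_P = \sqrt{\frac{\hbar G}{c^{3}}}
```

That is, the maximum entropy (and hence information) of a region grows with its bounding area, not its volume, which is the surprising part of the holographic bound.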

  • @Buciasda33
    @Buciasda33 1 year ago

    Pretty good explanation, keep it up.

  • @TellScape
    @TellScape 1 year ago

    Sweet work my guy!

  • @635574
    @635574 1 year ago +1

    Maybe even more impressive are neuromorphic chips, where the compute and the memory are in the same place on the chip, and they process asynchronously.

  • @kenohara4574
    @kenohara4574 1 year ago

    This channel has 5.16k subscribers on Dec 27. I am writing this so there will be proof of how good and informative this channel is and how fast it will grow. This channel will hit 1 million in a year, mark my words :)

  • @Enkaptaton
    @Enkaptaton 4 months ago

    So we should be satisfied with what we already have?
    Do programmers finally have to make their code efficient?

  • @louisfriend9323
    @louisfriend9323 Рік тому

    What do you mean when you say SRAM cells are "delicate"? Isn't it just that their design cannot use stacked transistors, which is what the new nodes are offering more of?

  • @loganwolv3393
    @loganwolv3393 1 year ago

    I wonder: over a long period of time, is it possible that RAM will just disappear, at least for gaming PCs? Or at least that you won't need it to run games? Theoretically, if you have enough CPU cache and enough GPU cache and VRAM, you won't need larger centralized RAM to play games.

  • @fredrikl5152
    @fredrikl5152 1 year ago

    I wonder how long performance increases will be possible with silicon; maybe we will see some other medium used out of necessity in the next decade or two?

  • @lou7139
    @lou7139 1 year ago

    Chiplet future makes sense. The tiled and stacked package design on Meteor Lake is interesting but looks complicated to manufacture. Gone are the days of the simple-to-build and test monolithic die...so nostalgic.

  • @jabezhane
    @jabezhane 1 year ago

    I remember, back in the mid '90s, "the issue with going lower than XXnm", and then in the early 2000s "the near-impossible task of going past XXnm"... and so on. We keep going somehow.

  • @tomtomkowski7653
    @tomtomkowski7653 1 year ago +9

    Let's wait and see how well this 1nm non-silicon process TSMC and MIT are working on will perform.
    And yes, chiplets are the way to go; the question is how well different companies will develop this idea with their different approaches.

  • @intetx
    @intetx 1 year ago

    3D stacking might never combine two different nodes. The problem is that two different nodes bend differently, which I could imagine causing issues with bonding them directly.

  • @jtjames79
    @jtjames79 1 year ago +1

    Good. Necessity is the mother of invention.
    It's actually a problem that substrates only change when you absolutely have to.

  • @josuad6890
    @josuad6890 1 year ago

    So, has SRAM just stopped scaling for now, or is it forever? I mean, some parts just can't be shrunk any further, like some analog stuff, because shrinking actually hurts their performance. But for SRAM, denser is always better. Sure, TSMC's N3 is approaching FinFET's extreme limits, but GAAFET is around the corner. Do you think GAAFET can be a remedy for this issue?

  • @CanIHasThisName
    @CanIHasThisName 1 year ago

    Great explanation.

  • @gab882
    @gab882 1 year ago

    Just curious: I would imagine that in the next 10 years, after chiplet designs, the next breakthrough would be graphene chips with photonic gateways/paths between the chips in a chiplet. And then, maybe another 30 years after this graphene revolution, we would hopefully get quantum computing into the mainstream.
    Am I right to say all this? Or are there other techniques/technologies that I am not aware of?

  • @cyber_robot889
    @cyber_robot889 1 year ago +1

    Wow, I've been into PC hardware since about 2003 and never ever heard about SRAM. Thank you for the new and interesting information. A real reveal right under my nose, lmao.

  • @rembaron1318
    @rembaron1318 1 year ago

    Amazing Content!

  • @RayanMADAO
    @RayanMADAO 1 year ago

    How much slower is having the cache on a different die than having it all on one die?

    • @HighYield
      @HighYield 1 year ago +4

      In the case of Zen 3D it isn't really any slower at all. If it's done right, especially with 3D stacking, there is no performance loss.

  • @kkgt6591
    @kkgt6591 1 year ago

    Cool video! Can you do more technical videos?