I love how Wendell from Level1Techs is so popular now he can go by his first name, like Cher or Madonna. I remember when he used to hide his face behind a set of monitors and nobody knew who he was, or at least what he looked like.
Tek Syndicate was a good era before they went all crazy. Except Wendell, love Wendell.
I once heard he cloned himself. But we don't talk about that.
@@phthano2580 I'll admit I do miss the dynamic they had. Sad about the falling out it's never a good thing really. But happy for Wendell
@@2muchjpop what happened? I just lost interested in their rambling about accoustics
@@teekanne15 The tl;dr I remember is there was some misappropriation of funds and covering up or lying about the day-to-day.
The unfortunate issues/drama that can arise in some business partnerships.
Less binning, moooore winning 😅
Floating point co-processors were the OG PC chiplets...
im glad someone said it
Yeah... Nah.
By that logic, the OG would be the individual transistors days before ICs were invented.
I'd point to the Pentium Pro as the closest OG to the chiplet concept. Two dies (cpu and cache) on one ceramic package.
@@myne00 So then by YOUR logic, DRAM in DIP packages were the original chiplets, which is inane.
A separate processor connected by a high speed bus = chiplets.
But if you want to play that way, the Amiga with its audio and graphics co-processors was the OG chiplet design.
@@Mike80528 High speed bus? When your processor is as slow as the one in the Amiga OCS, that's just the bus. What now, will you declare the NES mapper chips on the cartridge a chiplet of the main processor, simply because they share the CPU bus?
Maybe we will be buying NPUs in the future.
The first half of your discussion is exactly why I want to go back to school for my masters. I don't think UCIe will be of much use for x86 based systems, but ARM powered and coupled with ASICs and ML, yes. I'd go further and say that there'll be extra fab capacity in the future, which will lead to new processes and cheaper manufacturing costs.
I'm really hoping RISC-V takes off and someone decides to use it in a consumer product. ARM's dominance, based on what shouldn't even be patentable/copyrightable, is insane.
@@arthurmoore9488 RISC-V is already used in plenty of embedded and server products. Most recent MCUs embed RISC-V cores, and a lot of FPGA solutions use RISC-V as well. The thing is, it gains popularity in the overlooked parts of computing, and as customers we often don't care about the hardware and software of those chips/machines.
BRICS countries in particular are pushing RISC-V to get around US embargoes. More and more SBCs are using RISC-V, and the software stack (mainly Linux) is getting stronger every year. It's just a matter of time before RISC-V gets promoted to the "application core" of a smartphone/tablet/Chromebook.
@@arthurmoore9488 Lots of consumer products use RISC-V, it's just not customer facing (like ARM before it): embedded devices, and chips embedded inside devices like Western Digital storage drives, run RISC-V.
@@arthurmoore9488 RISC-V is already in many consumer-level devices, maybe not the main part in smartphones & laptops but in a little corner of an SoC.
I think der8auer has a video with one of those IBM chiplets.
We saw this same idea with the Pentium II chip that moved L2 cache off die and connected it via a "backside bus." That was way back in 1997.
Was the APU inside the Intel NUC featuring Vega graphics an early attempt to horizontally integrate a chiplet design?
There's also Intel Clarkdale from even earlier, which had a dual core CPU on one 32nm die, and an integrated northbridge on another 45nm die.
I think backside power delivery chips will see good leaps in speed. Even prototypes on older chips see a 6-10% frequency bump at the same power. Add gate-all-around on top of that and you could see 10%+ gains just from process and packaging changes.
Ian starting off strong with a LotR reference. 🤓
Really cool to hear where we are with chiplets, and where we're going. Thanks, guys!
I'm no engineer or anything remotely close to that, just a typical PC enthusiast. And I really like the idea of chiplets. That's why my own personal rig is full of them, considering it's currently got a Ryzen 7950X3D + an RX 7900 XTX Red Devil Limited Edition. The CPU has 3 chiplets while the GPU has 7 chiplets, so a total of 10 chiplets on two of the main components in my rig. Pretty cool, I think 😁
Your chiplets can't compile shaders without chugging, what a waste of silicon.
@@elgoblinospark2949 you're a charmer for sure.
@@jeffjohnstone4362 Being a "charmer" Is moral Instead of being a shill that sells people snake oil.
@@elgoblinospark2949 There you go again!
Me too, kind of. I only use it for gaming, so a delidded, direct-die-cooled 7800X3D and a 7900 XTX Liquid Devil. Honestly I never understood the 7950X3D: it's worse at gaming without disabling half the cores, and worse at multitasking and rendering than the non-X3D chip.
I love Chiclets. It's an old minty candy. Really. There is no pun in this comment.
I'm glad Moore's law is still alive. His family must be missing him.
Joking aside, processing units will go back to monolithic designs once one of those companies begins producing crystal, light-based processing units. That's when CPUs will run at hundreds of gigahertz.
It's awesome to see NotThatWillSmith again! It's been a long time!
Could there be standardization of chiplet heights/thicknesses (like, they could only be divisible by 0.200mm) to make them easier to fit together, and present a flat surface for either the heat spreader or for chiplets that connect to more than one from the top?
No. Making a transistor is a 3D thing, and transistors are about to get a lot TALLER because of the very thing Ian mentioned at the end: backside power delivery. You have to research that one to get the full implication of it, and it's also why Ian said Moore's law isn't dead, because that one thing plus High-NA lithography is the next 10 years right there. High-NA lithography gives finer etching, and backside power delivery means you don't need as fine a level of detail for most of the transistor, since power and data are coming from opposite sides. You have to see an example of how a transistor is laid out using the frontside for both data and power versus using the frontside for only data and the backside for power.
If you try to set standards on ICs, you kill innovation and make it much harder for companies to do new things, which is ALSO a point Ian was getting at when he had to verbally edit what Will said about open or closed IP: "is it open and everyone can use it, or is it closed and I can make money?" If you kill the ability for real innovation by having an open standard, you're also killing the ability for a company to do something new that can't fit that standard but can make them a lot of money because the product is much better. So, do you absolutely have to have a standard for Z height? There are only 2 or 3 companies that can even package an SoC with different chiplets where the chiplets are made on a "3" process node (Intel 3, TSMC N3, Samsung 3N). There's only one that does vertical stacking in a consumer product right now: TSMC.
Can you have a standard for dimensions other than height for an IC? NO WAY IN HELL. That's basically telling the hundreds of chipmakers around the world that they have to limit the number of transistors that can be on a die. The chipmakers would all have a good laugh at that one.
And as a side note, backside power delivery could allow for another 1GHz of clock speed added to CPUs, maybe even more, so this isn't a trivial thing. It's a REAL change in how transistors are laid out on the silicon, with different benefits, and it's why both TSMC and Intel are moving to making die with that method as fast as they can. However it really wasn't needed so much before now, as they want to go down to 20A or 17A.
@@johndoh5182 Moore's law has been revived before: previously it was revived by FinFET at 22nm, and now by going into the vertical dimension.
There is so much more potential and the companies are just starting, from vertical transistors in the future (i.e. current flows upward) to compute-in-memory, and this is all in silicon. Don't get me started on the potential of CNTs and graphene.
That's my 2 cents. Moore's law has been truly dead financially since 28nm, but if you ignore money there is so much more potential.
The market you desire for chiplets already exists for semiconductor IP blocks but once a company pays for a license, I believe they can simulate the whole chip rather than a black box model.
Once the power and signal interfaces are standard, a market for commodity chiplets might emerge.
The real reason for chiplets is to reduce the cost of defects and to assemble a "chip" that can exceed the reticle limit of a fab. The way this works is that every time you fab a wafer, almost every reticle step will have defects. If you are making chips at the reticle-limit size, every defect creates a defective part unless you can selectively disable portions and bin it. If you are making 50 chiplets in every reticle step, a wafer sort will probably yield 48+ working chiplets, and then you can assemble a "chip" that uses several chiplets to compete against monolithic designs with much lower yield.
So it's a balance of yield against interposer and assembly cost.
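To put rough numbers on that yield argument, here's a minimal sketch using the classic Poisson defect-yield model Y = exp(-D·A). The defect density and die areas below are made-up illustrative values, not real fab figures.

```python
import math

def poisson_yield(defect_density, area_cm2):
    """Fraction of dies with zero defects: Y = exp(-D * A)."""
    return math.exp(-defect_density * area_cm2)

D = 0.1                              # defects per cm^2 (illustrative, not a real fab number)
monolithic_area = 6.0                # cm^2, a die near the reticle limit
chiplet_area = monolithic_area / 8   # the same logic split into 8 chiplets

y_mono = poisson_yield(D, monolithic_area)    # ~0.55
y_chiplet = poisson_yield(D, chiplet_area)    # ~0.93

print(f"monolithic die yield: {y_mono:.1%}")
print(f"single chiplet yield: {y_chiplet:.1%}")

# Chiplets are wafer-sorted before assembly, so bad dies are discarded
# individually: the silicon wasted is roughly (1 - y_chiplet) of a small die
# rather than (1 - y_mono) of a huge one, at the cost of interposer/packaging.
```

Even with these toy numbers the small dies yield far better individually, which is exactly the trade described above against the added interposer and assembly cost.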
Great video. Thanks.
PS: Love Ian's t-shirt (of course, I have the same )
Dude, I was wondering where Will went for the longest time.
Chiplets are the way of the future! Even monolithic dies had their cons, especially the cost as they got bigger.
sup with the closed captions?
Chiplets are good for yields, because chiplets are small: if one has a defect, you only throw away that small chiplet, instead of throwing away a massive monolithic chip when it has a defect, which is a big waste. A bit of a downside is that you need to make connections between all those chiplets to build a CPU.
Looking at the kaleidoscope of definitions, at this point it's "Moore's general suggestion"
Didn't I hear AMD is doing this with some of its GPU cores on some RISC-V boards?
Moore's law, when I learned it back in the early 2000s in college, meant the transistor count would double every year. The count has stopped doubling each year since around 2010. It's not like transistor count has stopped increasing altogether; it just takes longer to double. Now that process nodes are approaching one-atom thickness, there is nothing smaller than atom thickness. This was Jensen's argument. This is why he switched from high performance graphics processors to high performance AI processors: to accelerate computing by moving away from high quality results through raw performance, toward lower quality results that are upsampled by AI and machine learning, which uses a lot less raw power.
This is a very interesting conversation. Thank you !
A quick question though, it is off-topic. When will we see ARM processors as the mainstream processors for desktop? It is clear that ARM is superior to the currently dominating x86_64 architecture; the only obstacle I see is the software developers. Am I seeing it correctly? Will Intel and AMD adopt and change their profile? If yes, which will do it first?
Probably never.
Firstly, it's not clear that ARM is superior to x86. Apple made a lot of changes to the ARM spec to get their perf. They also made a lot of software refinements, like a console manufacturer.
Secondly, the large software collection on x86 creates severe inertia for PC users against switching to an incompatible system. Again, Apple did a lot of work for the few chips they provide.
Thirdly, ARM is pretty much an embedded design with a RISC core, whereas x86 is more flexible by nature. It's way less convenient to have a boot ROM rather than a full BIOS (even though UEFI is getting more popular on ARM), and the famous micro-ops on x86 allow much more performance uniformity when running a program on various hardware than on ARM, where it's often required to recompile to get the most out of each chip design. There is also no microcode on ARM, which means defects and vulnerabilities have to be fixed at the OS level, which is an issue when Windows itself is barely capable of doing that even on x86.
Finally, ARM itself is slowly dying. They are stuck between the hammer (RISC-V, which eats their market share from the low end, and it will get worse and worse) and the anvil (the good old x86, which keeps evolving). Only Apple and its closed ecosystem can successfully bring ARM to the laptop market.
There is a LOT of existing software.
So it very much depends on how widespread good x86 translation gets; the one from Apple is probably the best/highest performance, but they have that and nobody else does right now.
This question, and others like it ("when will PC go RISC"), has been hovering for 25 years or longer. The consensus that emerged 20-ish years ago is that a high performance CPU is fundamentally so big and complex that the added complexity of x86 CISC parsing really isn't adding that much overhead, and CPUs have been getting bigger and more complex since, while the die area and power requirements of cache also keep increasing as memory becomes relatively slower. In turn, the instruction set extensions are causing ARM trouble as well. The success of Apple's M series processors is likely a whole-system engineering win, since they could go full custom instead of building around an off-the-shelf product, rather than a win "because ARM".
Jensen has a vested interest in saying Moore's law is dead. If you could get a better product for cheaper just by waiting a year or two, you'd be less likely to buy what he's currently selling, right? It's his job to convince you now's as good a time as any to spend your money.
Not enabling even auto subtitles is a big detractor from your videos.
Moore's law, look as long as no one opens the box, we can just keep making up transistor nicknames.
Moore's Law is absolutely slowing down. No Dennard scaling. No SRAM scaling with TSMC N3, etc.
But we (royal we, the industry or human race) are still delivering a lot more bang for the buck every generation and the generations aren't taking much longer than before.
Moore's law simply states that the number of transistors that can be placed on a chip will double every 2 years. It doesn't say that those transistors will be cheaper or faster. The law is (more or less) on target to hold for another 10 years or so, but how those transistors are used will need to change.
@@stevetodd7383 The biggest bottleneck will be power draw (apart from the cost), as each generation is not seeing the same power reduction per transistor. They could get away with it for a few generations by just cranking up how much power the chip needs.
That time is also coming to an end. I give it 10 years; by then they will have pretty much exhausted all reasonable options.
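As an aside on the doubling rule stated a couple of comments up, the usual formulation can be written as a simple compound-growth formula (a paraphrase of the popular version, not Moore's original 1965 wording):

N(t) = N_0 \cdot 2^{\,t/2}

where N_0 is the transistor count in a reference year and t is the number of years since then; a 10-year span then gives 2^{10/2} = 32 times as many transistors per chip, which is roughly the pace the comments above are debating.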
Moore's law died about 10 years ago, when Intel started playing PR games by adding +++ to their 14nm process. Now they've all just colluded to use the same naming scheme (e.g. Intel 3, TSMC N3, Samsung 3N), which implies by its name that it's 3nm, but actually isn't.
This man loves his Coca
ine?
How does Backside Power Delivery integrate with chiplets?
Sounds like it would allow more freedom in how to design the connections on both sides of the die, and involves one or two of the same steps that need to be done for stacking.
There's a channel called High Yield. Research his videos. You're welcome.
Dude, backside power delivery is INSIDE the IC, so it has nothing to do with integrating chiplets.
It was brought up when talking about Moore's Law being dead. The reason it came up there by Ian is because he understands the implications of it.
Go watch the video done recently by High Yield and you'll figure it out. There are images of how transistors could be made using a traditional single-sided die versus using the backside of the die to bring power to the transistors.
@@whyjay9959 It has nothing in common with stacking. It simply means the power is coming into the transistor from the top while the data is coming in from the bottom, and it makes the transistor WAY different, to where it's much taller.
Making the transistors this way requires the die to get flipped when building the layers for the backside (or top side, really).
The benefits are many. EVERY company that's continuing on the path of advancing to smaller nodes will use it. Moving to High-NA lithography combined with backside delivery is going to allow for the next 10 years of advancement to 17A, maybe as far down to 14A.
And as I said to the OP, go watch the video made recently by the channel High Yield. It gives a really good explanation of this.
@@johndoh5182 That did seem like a great video and I'm basing much of my attempted understanding on it.
At least one method of stacking involves grinding the inert backside of the chip to expose connections (TSVs) made through the top side. BPD also involves grinding the backside, followed by building new connections on it. The implementation discussed in that video apparently placed fine internal connections on one side and coarse external connections (both power and communication) on the other, but I assume the same method could be used to give both sides a more traditional mix of them (still with fewer constraints, since you have more space to work with) and create more useful external connections in both directions.
@@whyjay9959 So I think what you said is more relevant only if you want to stack the die, because you would have to build in the connection points for the data, which interposers would come in contact with, and that would come in on the side the power is on.
So, I didn't pay much attention to that part because of the newer interconnect AMD will move to with Zen 6. I don't think AMD will need to stack cache once we get to Zen 6. I don't know what AMD will actually do, but they won't need to stack cache.
TLDR:
An L3 chiplet that sits beside the core chiplets is a FAR better solution than stacking.
The full explanation:
You should understand that what's happening when stacking cache is you're making a DIRECT connection between logic on the two die. I think you need transmission gates still, which use larger current, maybe a little more voltage to get the data across that interconnect and then also receive data on the other side. So, you are sending data across a parallel data path just like what happens internally, inside a die with the digital logic of a CPU core. The speed isn't quite as fast as the clock speed of the CPU but it's fast enough for a secondary bulk amount of L3.
Zen 6 should move to DIRECT connections between the different die, and this is why I'm waiting for Zen 6 to upgrade from Zen 3, because it's going to be far superior. High Yield also has a video about Zen 6 and the new possible interconnects. You should watch that one too. This has been rumored about, so it's already pretty well known that Zen 6 is a new MCM architecture.
The point is, it doesn't matter if the direct connection is on a die that's stacked on another die, or you have two die sitting side by side with direct connections (no signal is touching a PCB). Either way, you can clock this faster, and the logic of one die is physically connected to the logic of the other die.
There are multiple problems AMD has with stacking cache. It's not really ideal because you stack one die on another die that you want to clock as fast as possible, which generates more heat. For both Zen 3 and 4 you have to clock the CPU slower, or at least the core chiplets with the stacked die, because of that built-up heat. Next, you get a higher failure rate. I've built 8 systems using the 5800X3D (I build for people) and 2 of the 8 CPUs throttled because the die got too hot, so they failed to regulate themselves properly, and I had to RMA 25% of the 5800X3D CPUs I initially bought. That's a TERRIBLE failure rate. I wasn't surprised to see 5600X3D parts after the 5800X3D had been out for over a year. I'm sure AMD had lots of stacked chiplets that couldn't meet the specs of a 5800X3D and they needed to do something with all of that.
The last problem for stacking is what happens when you have a two core chiplet part. AMD chose to stack cache on one core chiplet but not the other core chiplet. It's not a great solution, but it allows one of those chiplets to run faster so it's better for workloads. However it creates a problem for the Windows scheduler, mostly for games.
The solution for all this should be easy to see, which would be an L3 chiplet sitting beside the core chiplets and have direct connects to any of the core chiplets. This would allow a part with two core chiplets to share the same pool of L3 and this is a MUCH better solution.
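To put rough numbers on the wide-parallel vs. narrow-serial argument in the comment above, here's a back-of-the-envelope sketch. The bus widths and clocks are made-up illustrative values, not actual Infinity Fabric or V-Cache figures; the point is only that a wide bus clocked near die speed dwarfs a narrow serialized link, before even counting the latency added by serializing and deserializing the data.

```python
def link_bandwidth_gb_s(width_bits, clock_ghz, transfers_per_clock=1):
    """Raw bandwidth in GB/s = bus width x clock x transfers per clock / 8."""
    return width_bits * clock_ghz * transfers_per_clock / 8

# Illustrative, made-up numbers -- not actual AMD specs.
wide_parallel = link_bandwidth_gb_s(width_bits=1024, clock_ghz=2.0)  # die-to-die bus clocked near core speed
narrow_serial = link_bandwidth_gb_s(width_bits=32, clock_ghz=3.0)    # narrow SerDes-style link at fabric clock

print(f"wide parallel link: {wide_parallel:.0f} GB/s")  # 256 GB/s
print(f"narrow serial link: {narrow_serial:.0f} GB/s")  # 12 GB/s
```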
I only have 100k for my new computer, 250k seems like a lot.
These interfaces are proprietary. It is unlikely that they will be willing to give up the proprietary information.
Someone reinventing IBM Micro Miniature concept from like 1960s
Sounds like MCM (Multi-chip module).
Comments section under Bot Attack!
Wait, so you're saying Ian, has been the Google translate edition the whole time?
AI that isn't deep learning generally does not require acceleration. Just say deep learning. Please.
Nvidia server GPUs will have 200 billion transistors.
Of course Jensen says Moore's Law is dead. He wants to sell you stuff. Meaning if he says that, then he can also say "but don't worry, we're still able to bring you 2 times more performance, it just costs more now, so YOU STUPID PPL PAY, thanks". That's all there is to it.
The other day he said Moore's Law is going 2 times faster. He just is a (very) very good salesman with actual vision too, I give him that. But that's it. He is not some god-like human.
Moore's law is absolutely on its last legs: cost per transistor is now going up, power savings per generation are going down, thermal envelopes are getting thrashed.
Chip readers, insert chip, chip cards, now chiplets.... I'm not getting any royalties from this!
State of the art: 6502
Deactivate the subtitles
Hi, it's Will Smith and I'm about to slap the potato
😂😂😂
Moore's law, the way it was originally defined, IS dead.
Stop trying to twist it to mean other things, stop sticking to this old rule that was made simply as a guide, people act as if it's inscribed on stone tablets and we shall never disrespect it.
Moore's law served its purpose for decades but due to the increased complexity of modern manufacturing is no longer viable, get over it.
The new B650 boards are trash and problematic, how do I deal with that?
I'm pretty sure probabilistic machines like AI will be built on entirely new architectures in the future, in the form of phased analog storage arrays, where weights will be stored in the numeric structure of the analog cell. Existing binary systems will eventually be phased out for these systems. Analog systems are terrible for deterministic solutions, but probabilistic solutions can use analog circuits.
mb
Guys, I compressed a rock and made a CPU....its not rocket science
Moving from monolithic to chiplet would be like downgrading from Sandy Bridge to Core 2 Duo. I want more architectures like Alder Lake, fewer like Meteor Lake.
I'm not a fan of chiplets because of the unpredictable NUMA-like latency introduced when data is non-local and has to traverse the slow interconnect fabric.
I know what NUMA is but now I have Dragostea Din Tei playing in my head can't help it.
Bot intrusion in the comments ...
red line
Problems with chiplets on upcoming modern hardware?
The next chiplet problem: temperature and bending.
In the chiplet era, Moore's law is dead.
Moore's law will never die; maybe at a slower pace, maybe at exponential costs, but never dead.
It's been proven time and time again.
AMD's chiplet GPU architecture simply didn't deliver... The 7900 series has horrible power consumption on low-load tasks like video playback, office work & web browsing etc.
* 65W is NOT acceptable on YouTube / Netflix / VLC etc. 🤔
In theory Zen 1/Zen+ didn't deliver either (the way they used chiplets for those processors), but it was fixed with Zen 2 and the same concept is still being used today (and most likely in Zen 5 too).
Rumor is that Zen 6 will be a complete redesign of the chiplets.
You don't know shit. I am running a 7900 XT right now, with a 170Hz 1440p display and a 1080p 60Hz display both running while playing this very video at 4K 60Hz. The card is currently pulling 40 watts. If I lower the refresh rate to 165Hz, the memory idles and it falls to 10W. Use FreeSync, that's what it was made for: lowering power draw.
Why do these channels always have cringe openings?
Backside power delivery is going to easily get TSMC and Intel down to 17A using ASML High NA EUV lithography. There you go, that's the next 10 years right there.
Geez I thought there would be better questions for Ian on this topic.
Too much focus on getting to some kind of happy place, but in a field where change is always happening there is no such thing.
I thought it was a bit silly to ask about AMD or Intel opening up their chips, which they GUARD as MUCH AS POSSIBLE since that IP is their lifeblood. It will be either AMD, Intel, or Nvidia for now doing this kind of packaging on an SoC or a custom circuit like those large graphics/AI processors. I mean, there are other companies trying to do the same thing, and you think Intel or AMD is going to hand over a CPU chiplet for someone ELSE to make money off their parts? Once again, CPUs are the lifeblood of those companies, and THEY'LL be the ones making custom packages like they already are. Anyone else wanting to do some of the same things will have to use ARM.
Do the engineers whisper quietly to the chips to achieve 17A?
@@skilletpan5674 Do you have a good understanding of backside power delivery?
Ian said engineers have planned out through about 2045 AND know how they're going to achieve the higher densities. He's an engineer, ask him.
@@johndoh5182 and VFET won't start until after 2030; there is so much more potential to squeeze out of silicon
Hahahaha... what a submissively bribed video, trying to put doubt into illiterate consumers about AMD CPUs... while the fact is that AMD is winning on performance, efficiency and security so clearly that Shintel is practically dead in the water.
Op's avatar checks out.
Intel and AMD have been going back and forth in innovation for years... Intel is down now, so they will innovate to get back up.
@@GetOffMyyLawn In reality, AMD was the only true innovator most of the time... Intel was only good at stealing proprietary tech and bribing the review outlets.
@@Jonathan-Roden If I could post a "MWA" emoji, I would buttercup :).
Thank you for NOT having Adam do this one so I can watch it, because he says some of the most idiotic things.
You can now buy AMD chiplet-based CPUs for as low as $120, which is the price I've seen a 5600 going for at Micro Center in the US. However, that's using AMD's IF and a serial interface, which isn't great.
AMD wants to move away from that interface to direct connects, which are far superior and are what they're doing with RDNA 3 graphics for any GPU that's the 7700 XT or better; in fact that interconnect may be what AMD moves to for Zen 6.
Adam is supposed to be the 'everyman'.
Also yeah AMD chiplets are getting mature now so the cost is rather nice. The real draw for AMD though is they can do much more effective binning of the silicon. This really reduces the costs. It's also why they have lots of X3D chips now.
Any gains AMD can make with Infinity Fabric definitely show. The older Ryzen CPUs are noticeably slower than the newer generations because of clock speed and latency improvements. I really want to see 3D stacking get better, but the heat issue is a thing.
@@skilletpan5674 Adam is a stupid "everyman". I can't watch PCWorld since he's become the main voice because he says so many idiotic things.
"Any gains AMD can make with Infiniry Fabric definately show"
Actually the IF is already affecting the performance of Zen 4 in a negative way, it will affect Zen 5 even more, and for Zen 6 AMD moves to direct connects.
AMD can't make more gains using their Infinity Fabric unless they want to become like Intel and make 300W CPUs. Do you understand how the IF works? Do you understand WHY direct connects are FAR superior to it?
A direct connect is a way of connecting two chiplets using the typical transmission methods used INSIDE an IC, meaning parallel transfers at the full clock speed of the die, which beats the CRAP out of serial transfers. It also means that there is nothing in between the logic on the two die other than the transmission transistors (amplifiers). Now, transmitting data using parallel transfers from one die to another can't be done at the full speed these newer CPUs run at, as in 5+ GHz, but these newer interconnects allow for MUCH faster transfers than what was possible in the past.
For the Infinity Fabric, you have serial transmission between all the die. That's a data conversion already. There's ALSO a dedicated circuit on the central die, which in this case is the IOD, and that dedicated circuit receives and transmits data to/from the correct places. That circuit acts like a very complex multiplexer. If I have a two-core-chiplet CPU, say a 16-core 5950X, and core 1 on core chiplet 0 has to get data from core 2 on core chiplet 1, the data has to pass from core chiplet 1, get converted to serial data, get transmitted to the IOD, pass through the Infinity Fabric multiplexer, get transmitted to core chiplet 0, get converted back to parallel, and then get passed to core 1.
You want data from memory? Data comes off DRAM to the IOD, goes to the IF multiplexer and then gets sent to the die with the core that needs the data.
Everything passes through the IF multiplexer before it can get to a core chiplet or before data from a core chiplet can go anywhere else.
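To make that path concrete, here's a quick Python sketch of the hop sequence described above; the nanosecond figures are made-up placeholders just to show the structure of the trip, not measured AMD numbers.
# Toy model of the core-chiplet-to-core-chiplet path (placeholder latencies).
hops = [
    ("parallel-to-serial on core chiplet 1", 3),
    ("link from core chiplet 1 to the IOD", 8),
    ("IF multiplexer on the IOD", 6),
    ("link from the IOD to core chiplet 0", 8),
    ("serial-to-parallel on core chiplet 0", 3),
]
one_way = sum(ns for _, ns in hops)
for name, ns in hops:
    print(f"{name:40s} {ns:3d} ns")
print(f"{'one-way total':40s} {one_way:3d} ns (round trip ~{2 * one_way} ns)")
Every request pays for each of those hops, which is the NUMA-like latency complaint earlier in the thread.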
The REALLY hard problem to solve is doing serial data transfers at ever faster speeds, and the fact is you can't without starting to eat up a lot more power AND running into signal-degradation issues.
Serial data transfers are fine for transmitting data from inside the CPU to something OUTSIDE the CPU. But speeds inside a CPU are so fast that serial transfers are a limiting factor.
"I really want to see 3d stacking get better but the heat issue is a thing."
Zen 5 will still use 3D stacking. Zen 6 might not, because once they move to direct connects they could move data just as fast to a chiplet that sits beside the core chiplet. Stacking is a heat issue, as was talked about. It's also a cost issue. For AMD to make a CPU with two core chiplets AND have stacked cache on both, it adds cost to both chiplets AND both chiplets have to run slower. AMD can get rid of both these problems by moving away from stacking and having an L3 cache chiplet that two different core chiplets connect to directly, so NOW you would have a shared L3 cache between ALL cores of a two-core-chiplet CPU, and that's a far superior solution.
Last point. AMD's IF was fine for Zen 2 and Zen 3, which had slower clock speeds than Zen 4, and for Zen 2 the IPC wasn't nearly as good as Zen 4, so Zen 4 is a much faster CPU than Zen 2, which is where AMD first used this MCM design. Zen 4 has something like a 27 - 30% IPC uplift over Zen 2, and the clock speeds are about 1 GHz faster across the different parts. Zen 3 was still fine with the IF, but Zen 4, being much faster, suffers from the latency, and it also takes a hit from the memory limitation: even if the memory controller can allow for DDR5-7200, you can't get the data to the cores at that speed, because about the best you get on the IF side is a 3200 MHz clock, which is 1:1 with a 6400 kit, and that's if you won the die lottery. After that, the speed of the IF limits how fast the data gets across that IF logic on the IOD and then makes the serial transfer to the core chiplet that needs it.
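The 1:1 point is simple arithmetic; here is a tiny Python sketch of it (the 6400 kit is just the example used above).
# DDR transfers data twice per memory clock, so a DDR5-6400 kit runs its
# memory clock at 3200 MHz; 1:1 mode means the controller clock matches it.
ddr_rate_mts = 6400
memclk_mhz = ddr_rate_mts / 2
print(memclk_mhz)  # 3200.0 -> 1:1 when the controller clock is also 3200 MHz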
@@johndoh5182 "Blah blah wall of text. Do you understand why parallel is better than serial". Yes, I do. You sound like a bit of a pretentious guy. IF is what's being used right now so any gains they can make with it are fantastic as we've seen over the last several generations. It's a general BUS so multiple ICs can talk to each other. The benefits of serial vs parallel obviously worked for them. There's a reason why SCSI died and SAS took over. There's also a reason why centronics cables stopped being used in favor of USB. There are more reasons for these things generally than most people know about. I'm not an engineer but I know some things. I also know that the guys and gals at AMD designed IF for a reason. That reason has obviously been for things like cost and ease.
If IF was so bad then EPYC would be using something else. Either way the future looks much brighter because whatever they design and make in the future will benefit from the IF designs they have done in the past. Either way I'll be happy with whatever low cost solutions they come up with. Also hopefully intel will get their act together as well.
@@skilletpan5674 You're talking about buses OUTSIDE of a chip. There's a large difference between transferring data from a CPU, through a socket, across foil runs, through ANOTHER socket and then into another component like DRAM, a GPU, an NVMe drive, etc., and transferring data INSIDE a chip.
Parallel transfers at full clock speed, which is what happens inside a die, are WAY faster, especially when these buses get doubled to 128-bit vs. 64-bit. Or you move to AVX-512.
You can do the math, because I don't want to sound pretentious: 128 bits x clock speed. How many bits per second when the bus is a double-word bus, a word being 64 bits? However AMD couldn't move to true AVX-512 because they said the transmission was too problematic, and yes, you're trying to clock 512 logic gates at exactly the same time and run the CPU at near 6 GHz, so they stuck with doubling up 256 bits, which means they're clocking 256 bits on a parallel transfer at near 6 GHz. Once again, you can do the math on that one. I promise you it beats the CRAP out of anything you mentioned.
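Spelling that math out as a rough Python sketch: the clock and link figures are assumptions chosen for illustration, not AMD specifications.
# On-die parallel bus: width times clock.
bus_width_bits = 256     # the doubled-up 256-bit case described above
core_clock_hz = 5.7e9    # assumed ~5.7 GHz clock
print(f"on-die parallel: ~{bus_width_bits * core_clock_hz / 8 / 1e9:.0f} GB/s")  # ~182 GB/s
# Fabric-style link for comparison: assume 32 bytes per clock at a ~2 GHz fabric clock.
fabric_bytes_per_clk = 32
fclk_hz = 2.0e9
print(f"fabric-style link: ~{fabric_bytes_per_clk * fclk_hz / 1e9:.0f} GB/s")  # ~64 GB/s
Under those assumptions the wide on-die bus ends up roughly an order of magnitude ahead, which is the point being argued.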
The current interconnect AMD uses means data has to pass from a transmission circuit, out of the IC, across wires on a PCB (the small PCB that everything for the CPU fits on), then into the IC for the IOD, then into a receiver, sending data to the IF multiplexer. Yes they use serial transfers for this.
On the other hand, stacked cache uses parallel transfers. If it couldn't, you couldn't have stacked cache at all because it wouldn't be fast enough. For the technology AMD will move to, the PHYSICAL connection between the two die will be wires running from one IC directly into the other. Direct connect. The signal doesn't touch a PCB. That's similar to how stacked cache works: direct connects between the logic on one die and the logic on the other die. This can be done with parallel transfers.
You don't care about this much, or you wouldn't have made the point you did, because you'd already understand the HUGE difference between signals going across a PCB and signals moving inside an IC, and without that kind of understanding you will always make false assumptions. Serial is great for data going across cables, or a PCB. It's not so good inside a die, UNLESS that die is clocked slower. All this adds up to 7 years of working with the current MCM architecture, and AMD can't clock the IF faster than about 3 GHz. On the other hand, INSIDE the die, using parallel data transfers, AMD can clock a bus close to 6 GHz. A MUCH wider bus.
And the FULL problem of a parallel transfer becomes harder the wider the bus gets. You are, at the same time, enabling logic gates to put data onto a bus and then almost instantaneously enabling destination logic gates to sample the data on that bus, and this is all managed with clock signals. If you have a 512-bit-wide bus for AVX-512, for instance, you have to enable 1024 gates at the same time, as fast as the CPU will clock. And the faster you try to do that, the more distorted the leading edge of these signals gets, which then delays when you can sample the data on that bus. But it's STILL much faster than serial transfers, and much easier to create the logic. Some day try a little reading.
@@johndoh5182 one word. Cost.
Moore's law is dead in terms of performance
It was never about the performance
@@GFClocked It was
and *WHO* cares how many transistors your machine has?
@@sanji663 that's incorrect. You need to do some research