AMD's 3D V-Cache Problem

  • Published Jun 10, 2024
  • In this video we will take a closer look at AMD's 2nd gen 3D V-Cache and discover why AMD had to implement a lot of changes over its 1st generation 3D stacking technology to make it fit.
    Become a supporter on Patreon: www.patreon.com/user?u=46978634
    Follow me on Twitter: / highyieldyt
    0:00 Intro
    0:24 2nd Gen 3D V-Cache Process Node
    0:57 1st vs 2nd Gen Chiplets / TSVs
    3:16 The Size Problem
    5:10 Zen 4 CCD vs CCX
    6:23 L3 & L2 Cache
    7:10 Problems & Solutions
  • Science & Technology

COMMENTS • 137

  • @ProceuTech 1 year ago +64

    My dad’s room mate in college back in the late 90’s did some of the fundamental research on TSVs and how to mass manufacture them. According to him, it’s really cool seeing all this stuff come to the consumer market now; imagine what will be available in 20-30 years!

    • @RM-el3gw 1 year ago +10

      "hahaha remember when our electro brain implants used non-quantum chips? hahaha what a joke!"

    • @RobBCactive 1 year ago +2

      Interesting :) When I was designing and implementing CAD software for chip design, adding extra metal layers and linking the planes with vias was an issue, but that was in production .. about a decade before your Dad's roommate, so the fundamental research would be a decade or two further back. :)

    • @ChinchillaBONK 1 year ago

      We can all be cyborgs with NANO MACHINES BABY

    • @christophermullins7163 1 year ago

      Certainly we will have super-efficient stacks, many layers high, of logic and cache transistors. Moore's law is coming to an end, but this doesn't mean performance won't continue to improve.
      Some people think Moore's law is still in action, but I will remind you that the cost of manufacturing is also involved in the equation. If a node is 50% better but 50% more expensive, it's a null gain (see the quick cost-per-transistor sketch after this thread). The future is all about 3D.

    • @RobBCactive 1 year ago

      @@christophermullins7163 but Moore's Law is continuing because of stacking and other tech; it's about transistor counts in ICs. Yes, the cost of process shrinks has increased, but it's Dennard scaling that ceased: shrinks used to bring faster, denser and more efficient transistors all at once, while now leakage is highly significant and thermal density is a real constraint.
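
      To make the cost point above concrete, here is a minimal back-of-the-envelope sketch in Python. The 1.5x factors come from the comment (reading "50% better" as 50% denser); the baseline density and cost figures are purely illustrative assumptions.

# Toy cost-per-transistor check for the "50% better but 50% more expensive" case above.
# The 1.5x factors come from the comment; the baseline numbers are made up.

baseline_transistors_per_mm2 = 100e6   # hypothetical old-node density
baseline_cost_per_mm2 = 1.00           # hypothetical old-node wafer cost, $ per mm^2

density_gain = 1.5    # "50% better", read as 50% denser
cost_increase = 1.5   # "50% more expensive" wafer

old_cost = baseline_cost_per_mm2 / baseline_transistors_per_mm2
new_cost = (baseline_cost_per_mm2 * cost_increase) / (baseline_transistors_per_mm2 * density_gain)

print(f"old node: ${old_cost:.2e} per transistor")
print(f"new node: ${new_cost:.2e} per transistor")
# Both lines print the same value: the shrink buys density but not cheaper transistors,
# which is the "null gain" the comment describes.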

  • @johnclondike3896 1 year ago +30

    One “positive” that you left out is that the cache on the main level of the chip continues to take up a larger and larger percentage of the overall chip. So this means more and more of the chip will be cache going forward, meaning more and more of the chip will be suitable for putting vcache on top. In the end, the way to “solve” this is to simply design the main-level L3 cache size so that it is big enough to fit whatever amount of vcache you desire. If you don’t have enough room for the vcache by 5%… just add 5% more L3 die size. Because memory isn’t scaling anymore, the chip sizes won’t decrease much anyway, so the problem isn’t that big of a deal IMO.

    • @RobBCactive 1 year ago +3

      Zen4c aka Siena might object to the "cache bloat" being inevitable; for hyperscalers they're better off cutting L3 cache and giving a larger share of area to lower-clocked core logic, which is laid out more densely too, because it doesn't have to reach high boost clocks.

    • @mathyoooo2 1 year ago

      @@RobBCactive True but I doubt it'd make sense to put a cache chiplet on zen4c

  • @maynardburger 1 year ago +5

    Stacking the cache chip underneath the core chip is THE solution to this. It doesn't matter that you have to route power through it since you can make it much less dense and have a ton of freedom in layout. You could also cut down the cache on the core chiplet. The downside is that you basically have to make Vcache 'standard', and AMD seems reluctant to do that, at least through Zen 5, though they are seemingly going this route with CDNA3 so we know it works. But they can gouge consumers more the way they're doing it now.

  • @adamw.7242 1 year ago +3

    Great analysis, glad to see your channel growing

  • @procedupixel213 1 year ago +5

    Chances are that VCache dies are designed with a bit of redundancy, i.e. with physically more SRAM blocks than nominal cache capacity. It might not even be necessary to have a dedicated mechanism for mapping out bad blocks. The cache controller can be tricked into never hitting a bad block simply by setting tags appropriately.
    If that hypothesis were correct, then one could further assume that the amount of redundancy could have been reduced from 1st to 2nd generation (of VCache), as the 7nm process has matured and yield has improved. This would be another way to save die space.
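
    A minimal sketch of the tag trick described above, assuming a simple set-associative cache model. The way counts, the bad-way set and the mapping logic are all hypothetical illustrations, not AMD's actual mechanism.

# Toy model: a cache set with more physical SRAM ways than its nominal associativity.
# Ways found bad at wafer test are simply never allocated, so the controller's ordinary
# tag-match logic never hits them. All names and numbers here are hypothetical.

NOMINAL_WAYS = 8      # associativity the cache advertises
PHYSICAL_WAYS = 10    # physical ways built in, i.e. 2 spare ways of redundancy

class CacheSet:
    def __init__(self, bad_ways=frozenset()):
        usable = [w for w in range(PHYSICAL_WAYS) if w not in bad_ways]
        assert len(usable) >= NOMINAL_WAYS, "too many defects: bin down or scrap the die"
        self.ways = usable[:NOMINAL_WAYS]          # only good ways are ever used
        self.tags = {w: None for w in self.ways}   # tag storage for usable ways only

    def lookup(self, tag):
        # Ordinary tag compare; defective ways are simply absent from self.ways.
        return any(self.tags[w] == tag for w in self.ways)

    def fill(self, tag):
        # Naive replacement: overwrite the first usable way (real hardware would use LRU etc.).
        self.tags[self.ways[0]] = tag

# Example: way 3 failed at test, so it is never allocated and never hit.
s = CacheSet(bad_ways=frozenset({3}))
s.fill(tag=0xABC)
print(s.lookup(0xABC), s.lookup(0xDEF))   # True False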

  • @6SoulHunter9 1 year ago

    I was traveling and I missed this video. Great analysis, great info. Those are some things I had wondered about. I like this channel a lot because it talks about aspects of engineering that we mortals can understand.
    Also, I think that you have greatly improved your pronunciation. Not that it was bad before, but now you have less of an accent, which is easier on the ears, and lots of viewers put great emphasis on that.
    Keep improving!

  • @kirby0louise 1 year ago +10

    On-die water cooling with microscopic pipes sounds promising, but I have no idea how realistic it is. I will say that shortly after AMD showed off the new IHS for Zen 4 and said the unusual design had an engineering reason behind it, I wondered if there were going to be inlets and outlets for such on-die cooling. That was not the case, but I certainly thought it was an interesting idea.

    • @dan2800 1 year ago +3

      Then erosion will be a huge problem, because you only have a few microns, or at best maybe a millimeter, of space to run water through, and with decent pressure that could be gone kinda quickly.

    • @oMega-sm1eg 1 year ago +1

      That's still quite far away from us. The nature of microscopic pipes through the silicon means it must use ultra-clean closed-loop coolant, otherwise it will be clogged in days. This means it would require a two-stage loop. A more realistic solution is to use a vapor chamber to replace the solid metal IHS, so it would have direct-die contact. I would expect a similar level of performance from it as direct-die liquid cooling with liquid metal TIM.

    • @musaran2 1 year ago

      I am expecting it too at some point.
      IMO liquid transport with nanoparticle phase change is the way.

  • @Vaxovillion 1 year ago

    Learning so much thank you!

  • @TheBackyardChemist 1 year ago +7

    I think the most likely way out is the stack inversion you have mentioned.

    • @pedro.alcatra 1 year ago

      It's quite hard to pass 1500+ connections through the cache die, but not impossible.
      Let's see how they solve this.

  • @solenskinerable 1 year ago +4

    i believe that heterogeneous crystals can be part of the solution. for example silicon carbide has a heat conductivity about 3 times higher than silicon. synthetic diamond has about 60 times higher heat conductivity. silicon carbide is already used in HEMT transistors. i can imagine growing flat crystal heat spreaders on top of the logic, between the layers, and crystal "through silicon vias" to transfer heat up through the stack.

    • @needles_balloon 6 months ago +2

      Likely the biggest problem when choosing materials for this idea is differing thermal expansion rates. If the thermal expansion rates are too different, it could cause the dies to pull apart from each other when the CPU gets hot or cold.

  • @koni_ey 1 year ago

    Just also wanted to drop a comment. Great channel and thanks for the motivation to study for my computer architecture exam tomorrow ;)

  • @dan2800 1 year ago +4

    They could slap some HBM memory onto the IO die too, to act like a big L4 cache.
    I think that flipping it could have potential: make a big main L3 die, put the cores on top of it, and stack more L3 on that L3, to use a minimal amount of structural silicon.

  • @crazyelf1 1 year ago +1

    I think that both solutions may have to be used.
    So, in the future, the CPU will be above the cache chiplets and in turn, multiple layers of cache chiplets will be used. Having the CPU on top should address the heat issues, although there will be engineering challenges as you've noted with routing the data and power through the cache.
    So in the future for higher end CPUs:
    - CPU on top, with cache underneath in many layers
    - This then links to an active interposer with IO die
    - Then there is an HBM L4 for higher end SKUs
    Which is the best compromise.

  • @awdrifter3394 1 year ago +2

    AMD has said that SRAM scaling has pretty much stopped. So there's no point in using a smaller node for it.

    • @alexmills1329 1 year ago

      It's not just AMD saying it, it's TSMC saying they aren't getting benefits from node shrinks for certain applications like memory, and even logic is less than perfect.

  • @BaBaNaNaBa 1 year ago +2

    if it's so cheap, why didn't AMD make the 7950x3d dual 3d vcache CCD...

  • @pf100andahalf 1 year ago

    Excellent video.

  • @justinmacneil623 1 year ago +1

    Thanks for an interesting review. I suspect that future iterations might end up with the whole L3 cache separated out in a chiplet rather than the current mixed situation with 1/3rd in the CCD and 2/3rds on a separate cache chiplet. Presumably with a larger L2 to compensate for slightly increased L3 latency.

    • @GustavoNoronha 1 year ago

      Or maybe they'll reduce the size of the v-cache and add another level of cache, L4. I remember when chips did not have even L2, maybe it's time for a new level.

    • @NoToeLong 1 year ago

      @@GustavoNoronha - Intel had L4 cache on some of their Broadwell CPUs back in the day, with an extra 128MB die separate from the main die.

    • @greebj 1 year ago +1

      Each hop to the next level of cache adds latency, as does increasing the structure to support a larger cache size. The Broadwell L4 eDRAM latency was only about half that of a trip out to DRAM (DDR3 at the time), which is pretty slow for a cache. It's always a tradeoff.
      Navi31 would have had dual stacked V-Cache on the MCDs for 192MB of Infinity Cache, but AMD decided the minimal performance gains weren't worth the added cost.

    • @MaxIronsThird 1 year ago

      It will be like the 7900XTX GPU: CCD in the center and MCDs surrounding it (separate chips), and there will only be 3D cache on top of the L3 dies.

  • @VideogamesAsArt 1 year ago

    Just found your channel, very interesting analysis. Intel meteor lake will have cache at the bottom so we will see how that compares!

  • @Maxxilopez92 1 year ago +6

    You are the new AdoredTV for me. Keep going!

    • @HighYield 1 year ago +4

      Interestingly, AdoredTV also touched on this in his last video :)

    • @gameoverman9610 1 year ago

      @@HighYield It is also the delivery, a bit more grounded. When I am more in a relaxed mood I can follow your subjects, but for high-energy delivery I feel AdoredTV. Both have their place.

    • @markvietti 1 year ago

      Jim shows more of his feelings towards the product manufacturers.

    • @maynardburger 1 year ago

      Way better than AdoredTV. Honest, not manipulative, not defensive, not trying to pretend he's some major insider with advanced knowledge of products.

  • @skywalker1991 1 year ago +1

    Can AMD stack L3 cache, like 4 layers or more? A CPU with 1 GB of L3 cache could change how game engines are designed, to take advantage of the large cache.

  • @IncapableLP 7 months ago

    0:36 - No, this shows that memory cells scale really badly with newer process nodes, which is a huge issue at the moment.

  • @mirkomeschini80 1 year ago

    Why not put the L3 on the IO die, and 3D V-Cache on top of it, instead of on the compute chiplets? Another solution could be stacking them all on top of the IO die, so IO die (with L3) on the bottom, V-Cache and CCDs on top. Is that possible?

  • @jack504 1 year ago

    How about moving the cache chip next to the core chip, i.e. no stacking, similar to the 7900XT(X) design? It might result in a slower connection, L4 cache maybe? It would avoid the heat problem but need a more complex interposer.

    • @hansolo8225 7 months ago +1

      That would increase the latency, reducing the performance.

  • @NootNoot. 1 year ago

    Hey, hopefully you get better soon! As I think we've discussed before, when figuring out which node 2nd Gen V-Cache was on, whether it was a 5nm-on-5nm or 6nm-on-5nm process, I was quite surprised they were still using the same node. Although, this is on par with AMD's design/manufacturing philosophy. Once again, AMD has decided to cleverly think of new ways to scale the cache with TSVs as you mentioned, and I think Zen 3's and Zen 4's similar designs have greatly contributed to that fact. Their 'min/max' strategy is very strategic and has benefited them greatly thus far.
    Now onto the future with 3rd Gen 3D cache: I'd like to think once again AMD will take a conservative approach, whether that be another clever design change, or something else entirely. With Zen 5 (and correct me if I'm wrong), the design and architecture will be different from all iterations of Zen thus far. They have probably streamlined cache stacking, accounting for TSV layout and other things. With this change to the architecture and thus the layout (CCDs/CCXs), engineering changes to the 3D$ itself won't be radical (or at least won't be expensive for a new design/manufacture), maybe a change to another process node (7nm on 3nm just sounds too impressive to pull off). These are just tin foil hat thoughts and may not even work as I mentioned lol, but I think AMD may surprise us once again with another clever design that will, first and foremost, maximize efficiency and profits.

  • @dazza1970 1 year ago +4

    I'm no tech designer or expert.. but if the 3D cache becomes bigger than the CCD chiplet, then why not just increase the CCD size by making it a single 16-core as opposed to the current 8-core, and let smaller nodes make it smaller.. doubling the cores will give you a larger overall space to put the cache, and I know we don't need a 16-core chiplet yet.. but it would delete the infinity fabric latency times.. unless AMD go for broke and do a 32-core CPU for desktop.. which would be mad.. but amazing too..

    • @Centrioless 1 year ago

      Latency is a problem

    • @wawaweewa9159 1 year ago

      ​@@Centrioless less than what it is now

    • @hansolo8225 7 months ago

      Increasing the chip area drastically reduces the wafer yield. Translation: smaller chips are much cheaper to produce than larger chips.

  • @Psychx_ 1 year ago +1

    If shit hits the fan, the CPU and cache chiplets can always be put onto an interposer and the cache used as L4 instead of L3 in order to reflect the added latency. As long as it provides sufficiently lower latency than DDR5, there'll still be performance benefits.

  • @RM-el3gw 1 year ago

    thanks for the insight

  • @greebj 1 year ago +1

    I think we'll just end up with logic dies more separate from cache. Data will be moved over links between a tiny logic die on cutting edge node, and vcache dies on cheap nodes stacked stories high. It's all about cost and profit and datacenter wants wide and efficient and doesn't care about hot single thread throughput, so heat transfer through stacked silicon will always be an afterthought with a patch job to allow consumer parts to clock a bit higher.
    I think that's more likely than pie in the sky concepts like through chip watercooling. The maths on moving enough water at sane pressures through the heat dense core logic just doesn't check out at all, and I can't imagine the feat of design and engineering it will be to route all those tiny water channels so close to electrical wires with perfect reliability.

  • @wawaweewa9159 1 year ago +1

    Even though 5nm vcache would provide little perf benefits, wouldn't it allow the vcache to be made thinner and thus less thermally constraining?

    • @kognak6640 1 year ago

      There's very little heat produced in the L3 cache in the first place, so it doesn't matter how much the V-Cache chip reduces thermal conductivity. The bulk of the heat is produced in the cores, and there are just blank silicon pieces on top of them. Unless the material of the blanks is changed, there's not much AMD can do.

  • @craighutchinson1087 1 year ago

    Great video

  • @alb.1911 1 year ago +1

    Thank you. 🙏

  • @klaudialustig3259 1 year ago

    Get well soon!

  • @ChristopherBurtraw 1 year ago +1

    I'm so confused now. Is the L3 on the base die, or on the chiplet? Or is it on both, essentially doubling it? I can't find the info clearly online, it's either too general overview of 3D V cache, or too detailed for someone with my limited expertise on the subject. Also, why is it called "V" Cache?

    • @HighYield 1 year ago +3

      The base die has the CPU cores with 1MB L2$ each and all eight cores share 32MB L3$.
      Then the 3D V-Cache chiplet adds another 64MB of L3$ on top. So in total it's 32MB (base die) + 64MB (cache chiplet) = 96MB L3$.

    • @ChristopherBurtraw 1 year ago +1

      @@HighYield thank you! I recall learning about it from a previous video, I must have just forgotten! Do you know why it is called "V" cache?

    • @kirby0louise 1 year ago +3

      @@ChristopherBurtraw V is short for Vertical, because the V-Cache is literally directly above the conventional 2D cache. They are building vertically instead of horizontally

    • @ChristopherBurtraw 1 year ago

      @@kirby0louise thank you so much, it all makes way more sense

  • @Tainted-Soul 1 year ago

    If delidding drops the temp by 20 deg C, why not sell the chips without a lid and get EK to make a mounted watercooling block that they pre-fit to the chip? They could still have AIO stuff, as in quick connectors and a pump that's either inline or on the rad, but without the thick lid.
    Or they could build in micro heat pipes that run to the lid. Also thought about putting the cache on the bottom and the CPU chips on top.
    the future looks good

  • @BaBaNaNaBa 1 year ago

    Also why not stack CPU CCX above 3D VCache?

  • @asmongoldsmouth9839 1 year ago +3

    I just did my research on the 3D v-cache for the 7000 series procs. There is no problem with the v-cache. It performs as well as everyone expected it to.

  • @lunamiya1689 1 year ago

    it will be interesting if intel releases a consumer grade stacked cache using EMIB.

  • @chriskaradimos9394 1 year ago

    awesome video

  • @theminer49erz 1 year ago

    AH!!! HOW DID I MISS THIS?!! UA-cam has been recommending almost exclusively crap I would never watch or stuff that I have already watched multiple times. Fantastic algorithm you got there, Google! I had to check when I thought "why haven't I seen any videos from [you] in a while, I hope he is ok!" and searched for you. It didn't even come up in the auto fill. That's lame!! Great video!
    I don't know enough about such engineering to suggest a way to do it. However, perhaps down the road, maybe not even that long with AI helping out, they could find a way to use photons/optical data transmission to link chips/chiplets horizontally adjacent. I could see that at least being faster or at least as fast as physical connections. Although I will admit I wouldn't be surprised if the process to do so would slow it down and I am just describing a type of Quantum Computer.
    I am a little bummed that RDNA3 didn't have as much of a jump as I was hoping for, but that is on me. I set those expectations. I am going to get one still though. I have come to terms with the reality of the situation. In fact I think it is more important to support them now than ever. It's the first chiplet based GPU. There was absolutely no real world data on how it would perform because of that, and plenty of potential for unforeseen issues to arise. If I (we) want to have a card perform well the way we use them, then we need to get one and use it that way so that they can get the data needed to improve the next one. If I recall correctly, there was at least one physical hardware issue with the launch cards too. Maybe if we are lucky, we will see a refresh like the 6x50 series with fresh chips that are redesigned to fix that problem. There also seemed to be some performance left on the table according to the specs, so maybe some driver updates after they have enough real world usage data will be available to help use more of the potential? Idk, I am optimistic though that AMD will prevail. Not vs Intel/Nvidia, I couldn't care less if they "win", I just like them as a company and like to see them create awesome stuff! I even really like their "failed" products like the old AM2+ APUs and especially their FX-9590!! The 9590 catches a lot of crap but I'm sure 90% of that is based off of standard benchmarks or YT reviewer regurgitation. I admit it is a touchy chip, but with the right configuration and use case, it is actually a really nice chip! I have my old one running my "offline home assistant"/server and it is fantastic! I can run AI for my video surveillance, automated greenhouse environmental controls/hydroponics systems, and chicken coop!! Granted it is leveraging my two 8GB MSI RX480s to do a lot of that, but even when I'm running a game server, streaming Plex to multiple devices, and downloading a file, there is absolutely no sign of lag or any other issues. I'm sure many can do that, but I appreciate its ability to do that after like 6 years of gaming and general use.
    Sorry, I digress as per usual. I'm glad you didn't disappear and hope your annoying cough goes away asap! Looking forward to your next video, and YT better let me know this time!!! Yours is one of the very few channels I watch as soon as possible after I see it uploaded! I'm glad others are picking up on the quality of the content as well! Be well! And good job on the Leopards, Deutschland!!!

  • @VoldoronGaming 1 year ago +1

    Seems the fix for this is more cores in the CCX, so the SRAM doesn't outgrow the CCX chiplet.

    • @maynardburger 1 year ago

      You don't really need more cores, but you can keep up the die size by making each core much wider/more powerful (at a given process node).

  • @MaxIronsThird 1 year ago

    Inverting the CPU configuration is a really good idea.

    • @winebartender6653 1 year ago

      The largest issues I see with that are noise, capacitance issues and voltage stability.
      AMD already bins the CCDs quite heavily for the V-Cache chips to be able to maintain decent clock speeds at lower voltage thresholds. Flipping the stack would make this even more important, or they'd lose more clock speed/core performance.

  • @Gindi4711 1 year ago

    If SRAM does not scale anymore with N3/N2 etc., the 32MB of L3 on the CCD does not get smaller any more, so the 64MB of V-Cache on top does not need to get smaller either.
    If AMD decides they need more L3 (for example to support 16 cores per CCD) they will probably need to increase L3 on both their V-Cache and non-V-Cache lineup.
    With 48MB L3 on the CCD and 96MB on top it will still work.
    But as the price gap between leading edge and N7 increases further, AMD will want to move to a design with no L3 on the CCD and everything stacked on top, and this is where things get complicated.
    What I am wondering about is the long term: I see AMD using a Meteor Lake-like approach:
    .) Having an active base tile (N7) for fast and energy-efficient die2die communication and putting all the L3 there.
    .) Compensating for the additional L3 latency by increasing L2 per core.
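
    A toy area model of the "does the stacked die still fit" question being discussed here, assuming logic area keeps shrinking while SRAM area per MB roughly does not. All areas and scaling factors below are made-up, purely illustrative assumptions.

# Toy model: the CCD footprint shrinks toward its SRAM floor on newer nodes, while a
# V-Cache die built on an older node stays the same size, so it can start to overhang.
# Every number here is a made-up example, not a real die measurement.

ccd_logic_mm2 = 35.0    # assumed logic area of a current CCD
ccd_sram_mm2 = 15.0     # assumed L2+L3 SRAM area of a current CCD (barely scales)
vcache_die_mm2 = 36.0   # assumed V-Cache chiplet area (also barely scales)

for node, logic_scale in [("current", 1.00), ("next node", 0.70), ("node after", 0.50)]:
    ccd_area = ccd_logic_mm2 * logic_scale + ccd_sram_mm2
    overhang = max(0.0, vcache_die_mm2 - ccd_area)
    print(f"{node:>10}: CCD ~{ccd_area:4.1f} mm^2, V-Cache {vcache_die_mm2:.1f} mm^2, "
          f"overhang ~{overhang:.1f} mm^2")

# The overhang appears once logic has shrunk enough, which is why keeping (or growing)
# some L3 on the CCD keeps the two footprints matched.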

  • @winebartender6653 1 year ago

    I do not think we will see compute die size itself shrinking all that much regardless of node shrinkages. The CCD is already quite small in relative terms. With the current unified l3 design, I imagine they could/would add more l2 and l3 on the compute die itself.
    Then there is the option to add more cores per ccd, add more complex units (Instruction Accelerators as an example) or widen other areas of the chip (larger infinity fabric interconnect for increased bandwidth as an example).
    There is also the option of moving towards an HBM/MCD or Intel-style interposer/chiplet type package.
    There are a vast amount of options out there. Sticking strictly to your points based on CCD size, I don't think it holds much water. Take a look over the past two decades of consumer die sizes and node shrinks and you'll find that things just become more complex and utilize more die area, rather than just shrinking the die more and more.

  • @AdrianMuslim 1 year ago +1

    Will the 7800X3D age poorly or not be future-proof because of its lower clock speed/single-core performance? (For gaming)

    • @maynardburger 1 year ago

      No. Loss of 400Mhz or so isn't nothing, but it's a very limited reduction in performance that will remain relatively constant no matter what going forward. Older processors usually become outdated through lack of features, instructions, general IPC deficits and raw core/thread scaling, not through a minor disadvantage in clock speeds like this.

    • @AdrianMuslim 1 year ago

      @@maynardburger X3D or Intel, which will be more future-proof in gaming and which will be faster in the long run?

  • @junofirst01 9 months ago

    Just put some dumb silicon beside the future, smaller cores, which the first-floor cache can sit on. This will increase the distance to the cache but should solve heat dissipation.

  • @davidgunther8428 1 year ago

    They could put the cache chiplet under the logic chiplet, then the cores would be closest to the heatsink.

  • @MasterBot98 1 year ago

    What do you think about a hypothetical 7600x3d?

    • @HighYield 1 year ago +1

      It would be a great gaming CPU, but as you can see from the 7900X3D, it doesn't quite match the 8-core parts.
      Since AMD already has 6C X3D chiplets for the 7900X3D, I think the only reason they don't offer a 7600X3D is market segmentation.

    • @MasterBot98 1 year ago

      @@HighYield I'd love a 7600X3D if it had a higher clock speed than the 7800X3D.

  • @lovelessclips433 1 year ago

    I read the title "is it too big?" out loud. My girl replied from the other room: "Not even close."

  • @DanielLopez-cl8sv 1 year ago

    I was wondering why they don't stack the 3D V-Cache underneath the chiplet.

  • @mrlk665 1 year ago

    I think if they can make a 3D silicon base it can solve the problem, or make the 3D cache on a separate die, like the IO die.

  • @RobBCactive 1 year ago

    Cool! I couldn't see why they would use 5nm in that channel poll/answers, but that they're STILL using 7nm for the cache is hilarious! Imagine if you're involved in the "fastest game GPU" and you're overtaken by someone using such a mature and unfashionable process, even the IOD has moved to 6nm!
    Overall I'm pretty relaxed about this, already in Zen4 the heat density of the cores is causing enthusiast comments, "the chiplets are close together, bad for thermals", despite the IOD often being the truly hottest part of the package under many workloads. Then there's der8auer's claim that the thicker IHS is "a mistake" and that if you use his delidding tool and direct die cooling your thermals improve. Yet OTOH review journalists tried out using the Wraith box coolers with Zen4 and were amazed at how little the performance impact was, though the chips were running hotter than spec. Recently der8auer had an Intel engineer on who actually works on the sensor placement and power management for boosting, who explained why thermal targets have become a thing: because if you're not aggressively boosting, you're leaving performance on the table.
    Now for me, the future involves power constraints; I just don't see an i9 13900K as superior even if it's slightly faster than a Ryzen 9 5900X3D using half the power.

  • @builtofire1 1 year ago +1

    I don't care about a 0.5GHz boost; I care more that the L3 cache is per CCD. If the thread scheduler sends a thread to another CCD, it will have a huge performance impact. So all these heat dissipation problems are not that important compared to threads constantly changing CCDs.

  • @coladict 1 year ago +1

    On-die water cooling sounds like a disaster in the making.

  • @shanent5793 1 year ago

    3D V-Cache on the compute die is a dead end. It's more appropriate to put it on the IO die, or altogether separate. AMD had per-die SDRAM controllers on Zen "Zeppelin," causing extreme NUMA effects. "Rome" rolled this back into the IO die, reducing the NUMA levels. The current architecture can't scale synchronization-bottlenecked applications past 8 cores, especially when inter-die latency is slower than DRAM access. This has become apparent even on their client products like Ryzen 9.
    Physical proximity only indirectly constrains latency, at 8 inches (20cm) per nanosecond, distance is a minor contribution to latencies in the range of 40-200ns. Most of the delay takes place in the encoding, decoding and clock recovery at each end of the link, while distance also impacts the energy required to charge and discharge parasitics. Photonics is one way to remove the distance dependent energy cost, and it is already ubiquitous in networking. Take a look at what Xilinx is doing with separate dies for 28/56G transceivers, or Intel Si photonics, to see where this is all headed.
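
    A quick back-of-the-envelope check of the distance point above. The ~20 cm/ns propagation speed and the 40-200 ns latency range are taken from the comment; the 5 mm die-to-die hop distance is an assumed, purely illustrative figure.

# Rough estimate of how little raw distance contributes to die-to-die access latency.
# The 20 cm/ns and 40-200 ns figures come from the comment above; the 5 mm hop is assumed.

propagation_cm_per_ns = 20.0   # roughly 2/3 of c, typical for on-package signalling
hop_distance_cm = 0.5          # assumed 5 mm hop between neighbouring dies
flight_time_ns = hop_distance_cm / propagation_cm_per_ns   # one-way time of flight

for total_latency_ns in (40, 200):
    share = flight_time_ns / total_latency_ns
    print(f"time of flight {flight_time_ns:.3f} ns is {share:.3%} of a {total_latency_ns} ns access")

# ~0.025 ns of flight time, well under 0.1% of a 40-200 ns access: the budget is dominated
# by encoding, decoding and clock recovery at each end of the link, as the comment says.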

  • @heinzbongwasser2715 1 year ago

  • @SevenMilliFrog 1 year ago

    great content. would be nicer if u add english subs

  • @oskansavli 1 year ago

    Why are they putting the extra cache on top in the first place? This is a desktop CPU so why not just put them side by side and make it slightly bigger?

    • @HighYield 1 year ago +1

      Latency, if you put it to the side it wouldn't retain the L3 cache speed.

  • @Stopinvadingmyhardware 1 year ago

    GaN is going to be awesome

  • @anonymouscommentator 1 year ago

    while i am very impressed by the gains that 3d vcache has made, i have to say i find it to be a rather "lazy" approach. amd's "just slap more cache on it" has a similar vibe to intel's "just turn up the power". in fact both are running into their thermal limit, though i have to admit that even though it is not easy, the chiplet approach is way more sustainable in the long run and will most likely be the standard going forward.
    personally, i see vcache much more as a built-in accelerator, as it only helps in games and even regresses the multicore performance a bit (while drastically reducing the power needed). it doesn't really feel the same as ipc/clockspeed improvements through architectural changes to the core design.

    • @maynardburger 1 year ago

      The idea that it only helps in games is patently false. In fact, a large majority of the V-Cache stacks getting made will be going to Epyc processors, not Ryzen. There are plenty of different workloads that benefit from more L3, it's just not *quite* as universal as something like higher clocks. I'd even go as far as to say that AMD is being stingy by NOT going with V-Cache as standard. The number of workloads that benefit more from an extra 400MHz than from the L3 is not actually that big, and certainly the performance disadvantage is pretty small even in such situations. But AMD is desperate to retain every drop of performance potential possible in order to keep up against Intel.

    • @anonymouscommentator 1 year ago

      @@maynardburger we don't have to kid ourselves. finding a productivity benchmark where the 7950x3d is notably faster than the 7950x is very, very rare. generally, the x3d variant is a few percent slower than the normal one. sure there are exceptions but it's faaar from the norm.
      as to why amd is putting them on epyc: 1. large servers often run programs custom built for them. this means that such a program could very well use the vcache even though i cannot. 2. vcache drastically reduces the power needed, to the point where it close to halves it. this is huge for datacenters. 3. maybe on a $10,000+ cpu amd can afford to put in more cache, which only costs them a couple of bucks as explained in this video.

  • @BAJF93 1 year ago

    Silicon on both sides of the PCB? That's one more unreasonable workaround.

  • @TGC1775 1 year ago

    I struggle to keep my 5800X3D cool. I can’t imagine an 8800X3D if they don’t fix the heat.

    • @stebo5562 1 year ago

      Got mine on a basic hyper 212, no problems with cooling. What cooler are you using?

    • @hansolo8225 7 months ago

      Undervolt your cpu and use a liquid cooler, mine never goes above 65 degrees under full load.

  • @dr.python 1 year ago +2

    The trend will be more cache taking up die space in the future, but 3D cache isn't the solution as it has inherent heat dissipation flaws; best is 2.5D, like FinFET.
    We need to find a better and more sustainable way to design chips, shifting to RISC architecture like ARM would be a good first step.

  • @arthurgentz2 1 year ago

    Excellent video. Though, would you consider removing the very distracting 'rap beat' background music in the next video please?

    • @maynardburger 1 year ago

      Nah it's fine. A bit of background ambience helps a lot. Or if your issue is that it's a 'rap beat' specifically, then well, we all know what you really want to say.

    • @arthurgentz2 1 year ago

      @@maynardburger Debased thinking on your part, and I believe you can do better than that considering we're here on this channel. Anyway, I am not so young as to keep up to date with the current rap 'scene', thus have no idea what one would call that 'rap beat', hence my referring to it as 'rap beat'. The way it rattles in the ear is not conducive to 'ambience', as you've put it. There is plenty of better background noise out there to choose from; I merely suggest that he consider it next time.

  • @ejkk9513 1 year ago

    The problem is that the x86 instruction set is at the limit of what we can do with it. AMD tripled the L3 cache of the Zen 4 chips, and all we got was at best 15% better performance. Look at what Apple did with their M1 ARM chips. It's incredible how efficient and powerful the RISC-based instruction set is. I hate Apple... but the M1 is brilliant. Imagine that scaled up to PC! They're still making regular, large performance increases while retaining the amazing efficiency it is known for. I know Intel and AMD are aware and nervous about an ARM future. The problem will be compatibility. All this x86 code will have to be rewritten, or they will have to use a compatibility layer like Rosetta. The obvious problem with that is the performance degradation rendering the performance increases inert. If they can introduce a compatibility layer that won't degrade performance... Intel and AMD will be in big trouble. x86 is bloated and far too inefficient for use going forward. It's a dead end. Intel and AMD know that. All they're doing is blasting the power draw and keeping it on life support.

  • @aeropb 1 year ago

    nice video but ccd is core complex die and ccx is core complex

  • @christophermullins7163 1 year ago

    Seems like the average person sleeps on AMD. Their CPU team is very clever and has been hitting it out of the park regularly for a while now.

  • @bernddasbrot-offiziell7040 1 year ago

    Hi, I'm also from Germany

  • @mrdali67 1 year ago

    Definitely think they will NEED on-die water cooling if they are going to go further in stacking cache and CCDs on top of each other, to solve the heat dissipation problems. Stacking is probably still the solution for the foreseeable future to make CPUs and GPUs more powerful. They have been talking about bio chips for about 4 decades and nothing makes me believe they are anywhere close to achieving a breakthrough in this department. It has been about 40 years since I read an article in a science magazine in the early 80's, during my teens, about bio chips that ran on sugar water and were based on a bio neural network, and it still seems more fiction than science 40 years later. I don't really get why a company like Intel still refuses to see they are banging their head into a brick wall to keep up the brute force tactics of forcing higher clocks onto a monolithic die, using huge amounts of power and creating more problems for themselves each chip generation trying to compete.

  • @asdf_asdf948 1 year ago

    This is completely incorrect. SRAM cells do not shrink the same amount as overall logic going from 7nm to 5nm. Therefore the L2/L3 areas of the compute chiplet will not shrink beyond the 3d cache chiplet.

    • @HighYield 1 year ago +1

      I don't know what you are talking about, but the fact that SRAM scales worse than logic is literally the entire basis for this video.
      There is still a lot of logic on the CPU chiplet, which continues to scale down in size and as a result, the entire CCD will get smaller, while the L3D will stay a similar size.
      Especially for future versions, where you want more cache on the L3D chiplet.

    • @asdf_asdf948 1 year ago

      @@HighYield your main contention in the video is that the L2/L3 area of the compute chiplet will shrink... hence the 3d chiplet of the x3d is too big to fit over it. That is completely incorrect as you yourself acknowledge that SRAM does not shrink along with logic

    • @longdang2681 1 year ago +2

      @@asdf_asdf948 Currently the 64MB of V-cache is put over the 32MB L3 cache + other logic. While 32MB (of the 64MB) of V-cache will still sit nicely over the 32MB of L3 cache underneath, the remaining 32MB of V-cache will likely have an overhang problem in future models, as the logic underneath it will shrink faster in area.
      I don't think it will be a big problem as AMD can simply use the area for additional logic, or more L2/L3 cache.

    • @maynardburger 1 year ago

      @@asdf_asdf948 There's still room to shrink the current on-die SRAM cells for the compute chiplets. These V-Cache chiplets are testament to that with their custom, higher-density memory design, which is more likely at the limits of what can be achieved (on 7nm). It is definitely becoming much harder, though.

  • @ChadKenova 1 year ago +1

    Love my 7950X3D so far. I was on a 5950X + 3080 Ti, then got a 4090, so I picked up a 5800X3D when it hit $320 and now picked up a 7950X3D when it launched at Microcenter. Running 32GB of 6200MHz CL30 RAM and a B650E Aorus Master; I see no point for 95% of people to get an X670 or X670E this time.

  • @wolfiexii 1 year ago

    Wake me up when I can get the cache on both CCDs - it's a waste right now because it turns off the second CCD to ensure games run on the high-cache die. They need to fix this silly nonsense.

    • @maynardburger 1 year ago

      I agree they should have just put Vcache on both and taken the clock hits. Yes, there would be a few workloads that might have like 5% less performance than the normal Zen 4 products, but so what? People clearly are buying the Vcache option because they think the benefits will be much bigger in higher priority workloads(for themselves).

    • @Centrioless 1 year ago

      ​@@maynardburger that will make the product a zero sum game (with higher price tag), since 3d cache also only gives you ~5% performance increase

  • @BatteryAz1z 1 year ago +1

    5:23 CCD = core complex die.

    • @HighYield 1 year ago +1

      AFAIK, CCX = "core complex" & CCD = "CPU Compute Die", at least according to AMD's own ISSCC slides.

  • @5poolcatrush 1 year ago +1

    I wonder why they've made Frankenstein's monsters of the 7900~ and 7950~ X3Ds, putting cache on only one die. I believe guys who aim for such top-end processors expect to get the full package, not some crippled thing. And gamers would be perfectly happy with a 5800X3D anyways. Being an AMD fan myself, I absolutely hate such Intel-like behavior from AMD.

    • @GustavoNoronha 1 year ago +2

      Some would argue that this is actually uncrippling it, as you get one set of 8 cores that can clock higher, so you get the best of both worlds in such a system if you have the proper scheduling (which is early days, but will evolve). I think the use cases that would benefit more from having the cache on all CCDs lean more towards workstation, so we will likely see a ThreadRipper3D at some point.

    • @5poolcatrush 1 year ago

      @@GustavoNoronha thing is, "proper scheduling" is a crutch and a workaround needed for those crippled things to work. Why not just deliver proper hardware that can handle the tasks on its own? I suppose that's nothing more than cursed modern marketing trends that affect products in weird ways, rather than selling reasonable products made as they were meant to be from a purely technical standpoint.

    • @GustavoNoronha 1 year ago +2

      @@5poolcatrush it's not a crutch if you can really benefit from both high cache and high frequency. If you think like that you could say that SMT is a crutch as well for not being able to add more proper cores. But it's in fact a design decision, weighing a lot of different trade-offs.

    • @blkspade23 1 year ago

      @@5poolcatrush Most of the things you'd genuinely need more than 8 cores for would benefit from the higher frequencies and not the additional cache. Not even all games will use the additional cache. It makes sense to also care about games while having many other uses for a 16 core. You could already see from how the 5800X3D is worse in so many common applications compared to the 5800X, that the hit from dual v-cache wouldn't make sense. Having a die that can clock higher is a good thing.

  • @jelipebands1700 1 year ago

    Come on AMD, put this in a laptop

  • @markvietti 1 year ago

    I hate these New chips ... they run way too hot...even on water.

    • @agrikantus9422 1 year ago

      You mean the 7950x3d and 7900x3d?

    • @greebj 1 year ago

      Because heat killing CPUs is a thing? (It isn't; the MTBF of CPUs at TJmax is still orders of magnitude longer than the period for which they offer relevant performance at stock voltages. Intel and AMD know this, which is why they have exploited the headroom that was once used for an easy overclock as "turbo".)

  • @pinotarallino 1 year ago +1

    Hey stepCPU, your V-cache is soo big...