You won't believe how hard it is to find videos that explain these things in such depth.
Thanks for the work.
Programming is going to be the limiting factor for HBM for the near future. Taking advantage of that bandwidth without causing a latency issue in the rest of the system is going to be a fun challenge. They spoke briefly about it with the analogy of distance to the digital components. Great explanation of the difference.
As a software guy, I agree with this. Memory has become cheaply available to the programmer, and as a result we've seen a glut of programming practices that favor the programmer's time over the end user or product. While not everything needs to be as clever as a fox, it's nonetheless painful to look at code that could be made 1000x faster with a small dose of knowledge and a few hours, if not just minutes, of careful thought put into its design.
Like he said, it's application-specific. Some workloads can benefit from the wide memories; others can't.
Wonderful explanatory diagram. HBM2: ultra-wide bus access, a 2.5D structure with a silicon interposer, and a stacked-die approach. There is the issue of crosstalk and of routing a 1024-bit bus. On the upside, the dynamic power of HBM2 memory is much lower because the memory read/write clock is slower: P = kCV^2f.
GDDR6: 32-bit-wide buses, no need for a silicon interposer. Bus capacitance is lower, so the driving power is more limited. To match the bandwidth of a 1024-bit bus, a 32-bit GDDR6 bus must run 1024/32 = 32 times faster, so its power is higher due to the higher operating frequency. Signal routing is simpler due to the 32-bit-wide buses.
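The 32x figure follows from treating bandwidth as bus width times per-pin data rate. Here is a minimal Python sketch under that assumption. The function names and the 2.0 Gbit/s per-pin rate are hypothetical, picked only to show that the ratio matters, not the absolute rate:

```python
def required_rate_multiplier(wide_bus_bits: int, narrow_bus_bits: int) -> float:
    """Per-pin rate multiplier a narrow bus needs to match a wide bus's bandwidth.

    Assumes bandwidth = bus width (bits) x per-pin data rate and nothing else
    (no protocol overhead, no difference in efficiency between the two buses).
    """
    return wide_bus_bits / narrow_bus_bits


def bandwidth_gbps(bus_bits: int, per_pin_gbps: float) -> float:
    """Aggregate bandwidth in Gbit/s for a bus of the given width."""
    return bus_bits * per_pin_gbps


# Hypothetical per-pin rate for the wide bus; only the ratio matters here.
wide_rate = 2.0
multiplier = required_rate_multiplier(1024, 32)
print(multiplier)  # 32.0
print(bandwidth_gbps(1024, wide_rate) == bandwidth_gbps(32, wide_rate * multiplier))  # True
```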
For FPGA-based packet-processing systems, e.g. Achronix, the GDDR6 interface comes built in. Certain FPGA-based systems have an HBM2 interface built in.
I will recirculate this video on LinkedIn.
HBM3 I/O voltage on the 1024-bit-wide bus has dropped from 1.2 V to 0.4 V. That 3x drop in voltage, with the V term squared, means 9x less dynamic power.
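The 9x figure is the voltage ratio squared. That assumes k, C and f stay fixed, so only the V^2 term of P = kCV^2f changes. A minimal sketch (the helper name is hypothetical):

```python
import math


def dynamic_power_ratio(v_old: float, v_new: float) -> float:
    """Ratio P_old / P_new when only V changes in P = k*C*V^2*f."""
    return (v_old / v_new) ** 2


ratio = dynamic_power_ratio(1.2, 0.4)
# Floating point gives roughly 8.999999999999998, so compare approximately.
print(round(ratio, 6))  # 9.0
print(math.isclose(ratio, 9.0))  # True
```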
On frequency: if a bus operates at 32x higher frequency but is 1/32nd as wide, the power stays the same, right? All else being equal.
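Under the "all else being equal" assumption (same k, C and V per lane), the answer is yes. Summing P = kCV^2f over every lane gives lanes x kCV^2f, so cutting the lane count by 32 and raising f by 32 cancels out. This minimal sketch uses hypothetical per-lane values. Real GDDR6 and HBM lanes differ in C and V, so it only checks the arithmetic:

```python
import math


def bus_dynamic_power(lanes: int, k: float, c_per_lane: float, v: float, f_hz: float) -> float:
    """Aggregate dynamic power of a bus: lanes * k * C * V^2 * f.

    Assumes every lane has the same activity factor k, capacitance C and
    voltage V, which is the 'all else being equal' condition from the comment.
    """
    return lanes * k * c_per_lane * v ** 2 * f_hz


# Hypothetical per-lane values, chosen only for illustration.
K, C, V, F = 0.5, 1e-12, 1.2, 1e9

wide = bus_dynamic_power(1024, K, C, V, F)       # wide and slow
narrow = bus_dynamic_power(32, K, C, V, 32 * F)  # 1/32 as wide, 32x the frequency
print(math.isclose(wide, narrow))  # True
```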
In the end, HBM will probably last longer. You will reach a point where you can't raise the speed much, and with HBM having so many more pins, it will take longer for that to become a problem. The interposer problem is already being addressed by using the interposer only as a bridge between the SoC and the HBM chip, which reduces the interposer cost significantly (Intel's EMIB; AMD has also patented the same approach). But it will take time before HBM's cost goes down, as it's being used in far fewer products.
What about the 100 million video game consoles from 2020 to 2028? Microsoft has done some work on this with AMD for Q and others since 2009.
Good interview, good questions, very informative.
Very informative. Nice channel.
Where can I get an HBM PHY for TSMC 28nm?
I have no formal training in tech, just knowledge randomly acquired over time. But I used to work at a notable graphics card manufacturer. I knew, generally, what this dude was saying, since I researched it when the AMD Fury graphics card came out with HBM. The difference between him and me, though, is that my explanation wouldn't make much sense to you, and he probably makes 200k more a year than I ever did.
The other part is that using a 1024-bit bus implies fast parallel access, so the DRAM access controller needs changes to decompose the 1K-wide word, in some endian format, into 64-bit words that the processor can use directly.
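One way to picture that decomposition is to treat the 1024-bit word as an integer and slice it into sixteen 64-bit lanes. This sketch assumes little-endian lane order, so lane 0 holds the least significant bits. The function names are hypothetical, and a real controller does this in hardware, not software:

```python
def split_wide_word(word: int, wide_bits: int = 1024, lane_bits: int = 64) -> list[int]:
    """Split a wide word into lane_bits-sized words, least significant lane first.

    Little-endian lane order is an assumption; a real controller could use
    either order.
    """
    if word < 0 or word >= 1 << wide_bits:
        raise ValueError("word does not fit in wide_bits")
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(wide_bits // lane_bits)]


def join_words(words: list[int], lane_bits: int = 64) -> int:
    """Inverse of split_wide_word: reassemble lanes (least significant first)."""
    word = 0
    for i, w in enumerate(words):
        word |= w << (i * lane_bits)
    return word


# Build a 1024-bit word whose 64-bit lanes hold 0, 1, ..., 15, then split it back.
wide = join_words(list(range(16)))
lanes = split_wide_word(wide)
print(len(lanes), lanes[:4])  # 16 [0, 1, 2, 3]
```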
Hoping consoles adopting HBM2 2.5D routing will provide the volume and economies of scale to make it mainstream.
GREAT explanation, thank you!!
A GPU with HBM for working memory and GDDR6 as a second tier of working memory and background memory. A graphics card like this would be so fast.
Hey... what happened to XDR2???
Great video. May I ask what is "best" overall: HBM2 (does HBM3 exist now?) versus GDDR6? Did GDDR6 catch up? I know HBM2 is much rarer, and I think AMD was using it to try to "catch up", at HUGE COST to them, when they used it on Vega (Vega 56 and Vega 64). Nowadays, in 2022, Vega is so much faster: it was released as a competitor to the 1070, or in between the 1070 and 1070 Ti, and now I have a workstation with a later Vega card, the Radeon Pro WX 8200. It is essentially a Vega 56 but NOW competes with the 1080 and whatever the 1080 compares with in the 20 and 30 series. AMAZING! Shame that we may never see HBM2 or even HBM3 on GPUs anymore now that AMD's RDNA architecture has caught up. Is there any system using some sort of HBM as its own SYSTEM MEMORY on Windows, or at least Linux or even a QNX-like OS? HBM really made Vega turn out to be GOOD in 2022. As long as they keep up with the drivers as PROMISED, the "Fine Wine" will be something AMD promised and a PROMISE delivered!
Would a hybrid of these be possible? (i.e., an L1-L3 cache-like setup)
Theoretically it is possible, e.g. a total memory of 16GB consisting of 8GB GDDR6 + 8GB HBM2, but on the other hand this would be a total waste, since these two types differ greatly in terms of system design, use cases, and markets.
Great video.
Ummm, so putting the high cost aside, I would like to see how a high-end phone or tablet using HBM fares. That kind of device would fit the niche market Asus and Razer have created.
Great video
What's the difference in latency? I'm assuming GDDR6 has lower latency due to the higher clock speed.
Why didn't AMD release the new Ryzen 3 series with HBM compatibility? It's 2019 and we are still using DDR and DUAL CHANNEL!!! The roadmaps show DDR5 in 2020...
HBM is too expensive ...
Hitman Blood Money?
Let me edit this video for you... just the audio. It's killing me to watch it and hear such awful audio.