Great talk. He covered the power and bandwidth issues and tried to improve things from a density point of view. Sometimes we divide things up and focus on a limited area, but forget the big picture.
I realize it’s a computer architecture talk, meaning there’s possibly a good question about another network architecture shift, like transformers were, which would be another paradigm shift.
A lot of useful information, but he is focusing on inference compute density, while the actual bottleneck is DRAM bandwidth. You will hardly get 10% inference compute utilization on the best hardware, even when maxing out the practical batch size (a rough roofline sketch after this thread makes the numbers concrete). The headline FLOPS number is eye-catching, but they have to be more honest about real usage.
Wrong. He's talking about a serving setting, where you'll have N users querying your service at any one time. If N is large enough (I'm talking 10^3), the problem becomes compute-bound again!
@BlockDesignz True, but in practice no one has a large enough batch size to be compute-bound. My critique is that they grew compute way more than they grew memory bandwidth.
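To put the thread above on a common footing, here is a minimal roofline sketch, assuming approximate H100 SXM spec-sheet numbers and a weights-only traffic model for decode; the constants and function names are illustrative, not from the talk.

```python
# Back-of-the-envelope roofline for transformer decode.
# Assumptions: approximate H100 SXM spec-sheet numbers, BF16 weights,
# and weight traffic only (KV-cache reads, which grow with batch size
# and context length, would push the crossover batch even higher).

PEAK_FLOPS = 989e12   # dense BF16 tensor throughput, FLOP/s (approx.)
HBM_BW = 3.35e12      # HBM3 bandwidth, bytes/s (approx.)

def decode_utilization(batch_size: int, bytes_per_weight: float = 2.0) -> float:
    """Fraction of peak FLOPs reachable when decode streams all weights once per step.

    Each decode step performs ~2 FLOPs per weight per sequence in the batch,
    so arithmetic intensity grows linearly with batch size.
    """
    intensity = 2.0 * batch_size / bytes_per_weight   # FLOPs per byte moved
    machine_balance = PEAK_FLOPS / HBM_BW             # ~295 FLOPs/byte
    return min(1.0, intensity / machine_balance)

for b in (1, 8, 32, 128, 512):
    print(f"batch {b:4d}: ~{decode_utilization(b):6.1%} of peak compute")
```

Under these assumptions batch 1 sits near 0.3% of peak and batch 32 near 11%, matching the sub-10% utilization complaint, while the bandwidth/compute crossover lands near batch ~300, roughly the N ≈ 10^3 regime the reply points to. Both comments can be right; they just assume different batch sizes.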
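On the last point about compute outgrowing bandwidth, a quick comparison of public spec-sheet numbers (approximate dense half-precision tensor throughput vs. HBM bandwidth) shows the machine balance drifting upward across generations:

```python
# Approximate public spec-sheet numbers: dense FP16/BF16 tensor TFLOP/s
# and HBM bandwidth in TB/s for three NVIDIA datacenter generations.
gpus = {
    "V100 (2017)": (125, 0.9),
    "A100 80GB (2020)": (312, 2.0),
    "H100 SXM (2022)": (989, 3.35),
}

for name, (tflops, tb_per_s) in gpus.items():
    # Machine balance: FLOPs the chip can do per byte it can move.
    print(f"{name:17s} balance ~ {tflops / tb_per_s:5.0f} FLOPs/byte")
```

From V100 to H100, compute grew roughly 8x while bandwidth grew roughly 3.7x, so the arithmetic intensity a workload must hit to stay compute-bound more than doubled, which is exactly the imbalance being criticized.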
Great hosting, talk, and questions. Thanks for uploading it.
This man deserves a Presidential Medal of Freedom.