Bill Dally | Directions in Deep Learning Hardware

  • Published 25 Dec 2024

COMMENTS •

  • @katechen9458 6 months ago +1

    Great talk. He covered the power and bandwidth issues and approached improvements from a density point of view. Sometimes we divide things up and focus on a limited area, but forget the big picture.

  • @radicalrodriguez5912 7 months ago +2

    Great hosting, talk, and questions. Thanks for uploading it.

  • @Wobbothe3rd 8 months ago +3

    This man deserves a Congressional Medal of Freedom award.

  • @spartaleonidas540 9 days ago

    Realize it’s a computer architecture talk, which means there’s possibly a good question about the next neural-network architecture shift, like the one transformers brought, which would be another paradigm shift.

  • @gesitsinggih 7 months ago +6

    A lot of useful information, but he is focusing on inference compute density, while the actual bottleneck is DRAM bandwidth. You will hardly get 10% inference compute utilization on the best hardware, even when maxing out the practical batch size. The headline FLOPS number is eye-catching, but they have to be more honest about real usage.

    • @BlockDesignz 7 months ago +2

      Wrong.
      He's talking about a serving setting, where you'll have N users querying your service at any one time.
      If N is large enough (I'm talking 10^3), the problem becomes compute-bound again!

    • @gesitsinggih 7 months ago +3

      @BlockDesignz True, but in practice no one has a large enough batch size to be compute-bound. My critique is that they grew compute way more than they grew memory bandwidth.
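
A minimal roofline sketch of the batch-size argument in this thread (the function name and hardware numbers below are illustrative assumptions, not vendor specs): per decode step, a batch of B sequences reuses every weight, doing roughly 2 * params * B FLOPs while streaming params * bytes_per_param bytes from DRAM, so decode flips from memory-bound to compute-bound once the arithmetic intensity 2 * B / bytes_per_param exceeds the machine balance peak_flops / mem_bw.

    # Roofline sketch: at what batch size does weight-streaming decode
    # stop being DRAM-bandwidth-bound? Hardware numbers are illustrative.

    def critical_batch_size(peak_flops, mem_bw, bytes_per_param):
        # Per decode step with batch size B:
        #   FLOPs ~= 2 * params * B            (one multiply-add per weight)
        #   bytes ~= params * bytes_per_param  (weights streamed once, shared by the batch)
        # Arithmetic intensity = 2 * B / bytes_per_param FLOPs per byte.
        # Compute-bound once it exceeds machine balance = peak_flops / mem_bw.
        machine_balance = peak_flops / mem_bw  # FLOPs per byte
        return machine_balance * bytes_per_param / 2.0

    # Assumed accelerator: ~1e15 FLOP/s dense FP16, ~3.35e12 B/s of HBM bandwidth.
    b = critical_batch_size(peak_flops=1.0e15, mem_bw=3.35e12, bytes_per_param=2.0)
    print(f"decode turns compute-bound around batch ~{b:.0f}")  # ~299

This simple model ignores KV-cache reads, which grow with batch size and sequence length and push the real crossover batch even higher, which is consistent with both replies above: a large enough batch does become compute-bound, but real deployments rarely sustain one.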