#77 - VITALIY CHILEY (Cerebras)

  • Published 20 May 2024
  • Patreon: / mlst
    Discord: / discord
    Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how DL workloads, including sparse workloads, can run faster on Cerebras hardware.
    Pod: anchor.fm/machinelearningstre...
    [00:00:00] Housekeeping
    [00:01:08] Preamble
    [00:01:50] Vitaliy Chiley Introduction
    [00:03:11] Cerebras architecture
    [00:08:12] Memory management and FLOP utilisation
    [00:18:01] Centralised vs decentralised compute architecture
    [00:21:12] Sparsity
    [00:22:35] Does Sparse NN imply Heterogeneous compute?
    [00:28:09] Cost of distributed memory stores?
    [00:29:48] Activation vs weight sparsity
    [00:36:40] What constitutes a dead weight to be pruned?
    [00:36:40] Is it still a saving if we have to choose between weight and activation sparsity?
    [00:39:50] Cerebras is a cool place to work
    [00:42:53] What is sparsity? Why do we need to start dense?
    [00:45:24] Evolutionary algorithms on Cerebras?
    [00:46:44] How can we start sparse? Google RIGL
    [00:50:32] Inductive priors, why do we need them if we can start sparse?
    [00:54:50] Why anthropomorphise inductive priors?
    [01:01:01] Could Cerebras run a cyclic computational graph?
    [01:02:04] Are NNs locality sensitive hashing tables?
    References:
    Rigging the Lottery: Making All Tickets Winners [RIGL]
    arxiv.org/pdf/1911.11134.pdf
    [D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet
    / d_dannet_the_cuda_cnn_...
    A Spline Theory of Deep Learning [Balestriero]
    proceedings.mlr.press/v80/bal...

COMMENTS • 17

  • @Mutual_Information · 1 year ago · +18

    It’s good to hear there will be more “Netflix pedagogical” type of content. I know from experience it’s much harder to produce but it’s what sticks around for the long term. Looking forward to it.

    • @PremierSullivan · 1 year ago · +2

      I respectfully disagree. Unplugged is already much higher quality than any other content (podcast etc) available anywhere on ML/AI. In MLST's case, I'd prefer more regular content rather than (even) higher quality, less regular content. Just my two cents.

  • @oncedidactic · 1 year ago · +7

    “That was easy *shrug* “ 😂
    Really interesting hearing from someone who is simultaneously very grounded and practical, playing with new unstudied things, and comfortable with big statements!

  • @lenyabloko · 1 year ago · +9

    You have consistently been the highest quality channel of information for people consistently interested in AI. Admittedly this is a very small audience on average over the long term. This separates you from the packs of people following trends and bait-and-switching. My respect for you guys is so much greater because I don't understand the economics of doing something better with more people, which is the opposite of what virtually everyone else is doing. Frankly, I expected you to dwindle in the current economic environment. Instead you seem to be flourishing without any visible compromise. I only hope that you are ahead of our time in more than technology and that your magic continues and multiplies like a new form of life. If anything is approaching a singularity in AI, you represent it for me. Good luck and many thanks.

    • @MachineLearningStreetTalk · 1 year ago · +3

      Thanks so much Len, this means so much to us!

    • @nomenec · 1 year ago · +1

      Thank you so much for noticing and providing your appreciation! The path we've chosen is not without difficulty and frustration. Comments like yours help keep the internal fires burning.

    • @lookwatchadone2684 · 1 year ago

      😈

  • @Artula55 · 1 year ago

    Thank you! Can't wait for the next one, I'm very curious :)

  • @billykotsos4642 · 1 year ago · +2

    Cerebras! Wow!

  • @tinyentropy · 1 year ago

    Which episode were you referring to at the end, when speaking about the hashing table equivalent?

  • @dr.mikeybee · 1 year ago

    It seems inefficient to always go core to core. Would a partial hub and spoke architecture make sense? Some bus lines could route to far away "hubs" in one clock cycle. You could call them wormholes. 😀 The rest could route to adjacent cores. Anyway, I'm glad you're covering this. When I mentioned Nvidia's large systems, Yann told me to look at this company, and I haven't had time.

  • @dr.mikeybee · 1 year ago

    For sparsity, is it the case that if nothing enters a core, nothing leaves it?

  • @CandidDate · 1 year ago · +4

    I had déjà vu that the ML space is exploding at the moment with Google's "chatbot" LaMDA, and you guys are at the forefront of the next wave of the future. Not to mention that this channel is underrated, but what history will remember is not what goes on in the background.

  • @DeadtomGCthe2nd · 1 year ago

    What happened to the Noam episode?

    • @MachineLearningStreetTalk · 1 year ago · +2

      We have been waiting all day for YouTube to trim out a copyright ID flagged segment. I'm so angry about it. I'm close to just reuploading.

  • @dr.mikeybee · 1 year ago

    Is Cerebras doing out-of-order execution?

  • @EinsteinNewtonify · 1 year ago · +2

    Third!