Quantization in Deep Learning (LLMs)

  • Published 12 Nov 2024

COMMENTS • 19

  • @MojtabaJafaritadi · 6 months ago +3

    Thanks for this clear and easy explanation of Quantization in NNs.

    • @AIBites · 6 months ago

      Glad you liked it 😊

  • @Ram-oj4gn · 1 month ago

    Does the quantisation (changing the number format) apply only to the outputs of the activation functions, or also to the individual weights? Where in the NN do we apply this quantisation?
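
The question above asks whether the weights, the activations, or both get quantized. As a rough sketch (assuming a common PyTorch post-training static quantization workflow, not something taken from the video), both can be: observers calibrate the activation ranges from sample data, and the layer weights are converted to int8 at the same time.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # activations are quantized from here on
        self.fc = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # back to fp32 at the output
    def forward(self, x):
        return self.dequant(self.relu(self.fc(self.quant(x))))

model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)   # attach observers that record activation ranges
model(torch.randn(32, 128))                       # calibration pass on (here random) sample data
torch.quantization.convert(model, inplace=True)   # int8 weights + activation scale/zero-point
print(model)
```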

  • @mashood3624 · 1 year ago +2

    A good, comprehensive video... good work. I liked the related links you added in the description. A tiny recommendation: please increase the speaking speed, as there were breaks of many seconds between topics. Thank you, and looking forward to more content.

    • @AIBites · 1 year ago

      Thank you so much 😊

  • @mdnghbrs1283 · 10 months ago +1

    I just have a question about quantisation in TensorFlow. For a project of mine I used the QKeras library for QAT, and the weights I got in the end were long, high-precision numbers (speaking more about the number of digits, for example 0.235215266523415e-2). In the quantization config I used int8, and such a number is not representable in int8 format.
    Does the training still happen in fp32, with the quantisation treated as noise?
    Also, what do I do to get the weights to be representable in int8 format?
    How do I test the accuracy of the weight-quantised model?

    • @AIBites · 9 months ago

      Hey, I almost always use PyTorch, so I will try my best to help you with TF.
      It's normal for the weights to be long numbers like you said if they are stored in fp32. If a number cannot be quantized into int8, training could collapse, as the value could be rounded off to an extreme like 0. If too many numbers get rounded off like this, the gradients will collapse and training will hit a wall.
      To test, you can run inference the same way as you do on your eval or test set. What is stopping you from doing that?
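
To make the "run inference on your eval or test set" suggestion concrete, here is a minimal PyTorch sketch (the model and loader names are hypothetical placeholders); the same loop works unchanged for the fp32 and the int8 model:

```python
import torch

@torch.no_grad()
def accuracy(model, loader):
    """Top-1 accuracy of `model` over a DataLoader yielding (inputs, labels) pairs."""
    model.eval()
    correct, total = 0, 0
    for inputs, labels in loader:
        preds = model(inputs).argmax(dim=1)        # predicted class per example
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# Hypothetical usage: compare the fp32 and quantized checkpoints on the same test set.
# print("fp32 acc:", accuracy(fp32_model, test_loader))
# print("int8 acc:", accuracy(quantized_model, test_loader))
```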

  • @abuali5513 · 9 months ago +2

    Thank you for the informative content. Is it possible to combine pruning and quantization while maintaining accuracy?

    • @AIBites · 9 months ago +1

      Fantastic question. I feel it should be possible, as they do two different things to the weights, though I have never tried both.
      First prune and get rid of the unnecessary params, then quantize what is left :)
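
A rough sketch of this prune-then-quantize order in PyTorch, assuming magnitude (L1) pruning followed by dynamic int8 quantization; the 50% pruning ratio is arbitrary, and in practice you would check accuracy (and possibly fine-tune) between the steps:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# 1) Prune: zero out the 50% smallest-magnitude weights of each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")   # bake the zeros into the weight tensor

# 2) Quantize what is left: convert the (pruned) Linear weights to int8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```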

  • @SreeramAjay · 1 year ago +1

    Thank you, it was a really clean and clear explanation, and in a short time too. 👏

    • @AIBites · 1 year ago +1

      Oh, thanks for the encouraging words. It helps me keep going :)

  • @vishalchovatiya1361 · 3 months ago +1

    Very well explained.

    • @AIBites · 2 months ago

      Thank you, Vishal! :)

  • @Techiiot · 1 year ago

    Very good explanation. Please make a video on how to calibrate the data and compute the scaling factor and zero point by analysing the weight distribution of each layer for int8 quantization in TensorFlow/TensorRT, and also on the role of fake quantizers during backpropagation.

    • @AIBites · 1 year ago

      Thanks for the nice suggestion. There are whole courses on quantization these days, so I wasn't able to cover everything or deep dive into the specifics :)
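
For the scale and zero-point part of the request above, here is a minimal NumPy sketch of per-tensor asymmetric int8 parameters computed from a tensor's min/max. Real calibration pipelines (e.g. TensorRT) analyse activation statistics over a calibration set and may use symmetric or per-channel schemes instead; this only shows the basic formula:

```python
import numpy as np

def int8_quant_params(x, qmin=-128, qmax=127):
    """Per-tensor asymmetric quantization parameters from a tensor's min/max."""
    x_min = min(float(x.min()), 0.0)   # keep 0 exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize_int8(x, scale, zero_point, qmin=-128, qmax=127):
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)

w = np.random.randn(64, 128).astype(np.float32)   # stand-in for one layer's weights
scale, zp = int8_quant_params(w)
w_q = quantize_int8(w, scale, zp)
w_dq = (w_q.astype(np.float32) - zp) * scale       # dequantize to inspect the rounding error
print(scale, zp, np.abs(w - w_dq).max())
```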

  • @davidcmoffatt · 3 months ago +1

    Just started watching, but... a signed byte is -128..127, not -127..127. Google 2's complement to see why.

    • @AIBites · 2 months ago

      Sorry, that's an embarrassing erratum, and a good spot. Thanks a lot! Will keep it in mind for next time.
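
A quick check of the signed 8-bit range mentioned in the comment (two's complement has one more negative value than positive, because zero takes one of the non-negative bit patterns):

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)            # -128 127

# Two's complement with n bits covers -2**(n-1) .. 2**(n-1) - 1.
n = 8
print(-2**(n - 1), 2**(n - 1) - 1)   # -128 127
```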

  • @nazeem35 · 11 months ago

    Thanks!

    • @AIBites · 10 months ago

      Thanks so much for the monetary reward! Very encouraging 🙂