Deep Dive on PyTorch Quantization - Chris Gottbrath

  • Published 15 Nov 2024

COMMENTS • 28

  • @leixun
    @leixun 4 years ago +42

    My takeaways:
    0. Outline of this talk 0:51
    1. Motivation 1:42
    - DNNs are very computationally intensive
    - Datacenter power consumption is doubling every year
    - The number of edge devices is growing fast, and many of these devices are resource-constrained
    2. Quantization basics 5:27
    3. PyTorch quantization 10:54
    3.1 Workflows 17:21
    3.2 Post-training dynamic quantization 21:31
    - Quantize weights at design time
    - Quantize activations (and choose their scaling factor) at runtime
    - No extra data required
    - Suitable for LSTMs/transformers, and MLPs with small batch sizes
    - 2x faster compute, 4x less memory
    - Easy to do with a 1-line API (see the sketch after this list)
    3.3 Post-training static quantization 23:57
    - Quantize both weights and activations at design time
    - Extra data needed for calibration (i.e. to find the scaling factors)
    - Suitable for CNNs
    - 1.5-2x faster compute, 4x less memory
    - Steps: 1. Modify model 25:55 2. Prepare and calibrate 27:45 3. Convert 31:34 4. Deploy 32:59
    3.4 Quantization-aware training 34:00
    - Make the weights "more quantizable" through training and fine-tuning
    - Steps: 1. Modify model 36:43 2. Prepare and train 37:28
    3.5 Example models 39:26
    4. New in PyTorch 1.6
    4.1 Graph mode quantization 45:14
    4.2 Numeric suite 48:17: tools to help debug accuracy drops due to quantization at the layer-by-layer level
    5. Framework support, CPU (x86, Arm) backend support 49:46
    6. Resources to learn more 50:52
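
    As a rough companion to workflows 3.2-3.4 above, here is a minimal eager-mode sketch using the torch.quantization APIs (quantize_dynamic, fuse_modules, prepare/convert, prepare_qat). The toy models, layer sizes, and random calibration data are placeholder assumptions, not taken from the talk.

    import torch
    import torch.nn as nn

    class ToyModel(nn.Module):
        def __init__(self):
            super().__init__()
            # Stubs mark where tensors enter/leave the quantized region (static/QAT only)
            self.quant = torch.quantization.QuantStub()
            self.conv = nn.Conv2d(3, 8, 3)
            self.relu = nn.ReLU()
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            return self.dequant(self.relu(self.conv(self.quant(x))))

    # 3.2 Post-training dynamic quantization: the "1-line API"
    fp32_mlp = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
    int8_mlp = torch.quantization.quantize_dynamic(
        fp32_mlp, {nn.Linear}, dtype=torch.qint8)

    # 3.3 Post-training static quantization: modify, prepare/calibrate, convert
    m = ToyModel().eval()
    m = torch.quantization.fuse_modules(m, [["conv", "relu"]])     # 1. modify (fuse)
    m.qconfig = torch.quantization.get_default_qconfig("fbgemm")   #    x86 backend
    m = torch.quantization.prepare(m)                              # 2. prepare, then
    for _ in range(8):                                             #    calibrate with
        m(torch.randn(1, 3, 32, 32))                               #    representative data
    int8_cnn = torch.quantization.convert(m)                       # 3. convert

    # 3.4 Quantization-aware training: prepare_qat, fine-tune, convert
    qat_m = ToyModel().train()
    qat_m.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
    qat_m = torch.quantization.prepare_qat(qat_m)
    # ... fine-tune qat_m here with the usual training loop (fake-quant is inserted) ...
    int8_qat = torch.quantization.convert(qat_m.eval())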

    • @lorenzodemarinis2603
      @lorenzodemarinis2603 3 years ago +3

      This is gold, thank you!

    • @leixun
      @leixun 3 years ago +3

      @@lorenzodemarinis2603 You are welcome!

    • @harshr1831
      @harshr1831 3 years ago +2

      Thank you very much!

    • @leixun
      @leixun 3 years ago +2

      @@harshr1831 You are welcome!

    • @prudvi01
      @prudvi01 3 years ago +1

      MVP!

  • @rednas195
    @rednas195 5 months ago +1

    In the accuracy results, how come there is a difference in inference speed-up between QAT and PTQ? Is this because of the different models used? I would expect no difference in speed-up if the same model were used.

  • @aayushsingh9622
    @aayushsingh9622 3 years ago +1

    How do I test the model after quantization?
    I am using post-training static quantization.
    How do I prepare the input to feed into this model?

  • @MrGHJK1
    @MrGHJK1 4 years ago +3

    Awesome talk, thanks!
    Too much to ask, but it would be nice if PyTorch had a tool to convert quantized tensor parameters to TensorRT calibration tables.

  • @ankitkumar-kg5ue
    @ankitkumar-kg5ue 1 year ago

    What if I want to fuse multiple conv and relu layers?
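
    This isn't answered in the thread, but for reference: torch.quantization.fuse_modules takes a list of name groups, so several conv/relu pairs can be fused in one call. A small sketch; the model and its module names are hypothetical.

    import torch
    import torch.nn as nn

    class TwoBlockNet(nn.Module):  # hypothetical model, only to show the call
        def __init__(self):
            super().__init__()
            self.conv1, self.relu1 = nn.Conv2d(3, 8, 3), nn.ReLU()
            self.conv2, self.relu2 = nn.Conv2d(8, 8, 3), nn.ReLU()

        def forward(self, x):
            return self.relu2(self.conv2(self.relu1(self.conv1(x))))

    m = TwoBlockNet().eval()
    # Each inner list names one group of adjacent modules to fuse.
    fused = torch.quantization.fuse_modules(m, [["conv1", "relu1"], ["conv2", "relu2"]])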

  • @ajwaus
    @ajwaus 2 days ago

    Thank you

  • @吴漫-y8c
    @吴漫-y8c 4 years ago

    sorry, can you share the example code? Thank you

    • @raghuramank1
      @raghuramank1 4 years ago +1

      Please take a look at the PyTorch tutorials page for example code: pytorch.org/tutorials/advanced/static_quantization_tutorial.html

  • @parcfelixer
    @parcfelixer 3 years ago +1

    Awesome talk, thank you so much.

  • @jetjodh
    @jetjodh 4 years ago

    Why not go lower than 8-bit int for quantization? Won't that be much speedier?

    • @raghuramank1
      @raghuramank1 4 years ago +2

      Currently, kernels on processors do not provide any speedup for lower bit precisions.

    • @吴漫-y8c
      @吴漫-y8c 4 years ago +1

      Trade-off between accuracy and speed

  • @dsagman
    @dsagman 9 months ago

    Great info, but please buy a pop filter.

  • @briancase6180
    @briancase6180 3 years ago

    OMG. We already have a term of art for "zero point": it's called bias. We have a term, please use it. Otherwise, thanks for the great talk.

    • @ashermai2962
      @ashermai2962 2 years ago +1

      The reason it's called a zero_point is that when the pre-quantization weights bring the output to zero (for a ReLU activation), you want a zero_point so that the quantized weights also bring the output to exactly zero. Also, the names scale and zero_point distinguish these quantization parameters from each module's weights and bias, which are different concepts.
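
      To make the distinction concrete, here is a small numeric sketch of the affine mapping q = round(x / scale) + zero_point; the scale and zero_point values are made up, not from the talk. A real 0.0 maps exactly to the integer zero_point and round-trips to 0.0.

      import torch

      x = torch.tensor([-1.0, 0.0, 0.5, 2.0])
      q = torch.quantize_per_tensor(x, scale=0.02, zero_point=30, dtype=torch.quint8)
      print(q.int_repr())    # tensor([  0,  30,  55, 130], dtype=torch.uint8)
      print(q.dequantize())  # tensor([-0.6000, 0.0000, 0.5000, 2.0000]) -- 0.0 is exact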

  • @jonathansum9084
    @jonathansum9084 4 years ago

    Then I am the second.

  • @吴漫-y8c
    @吴漫-y8c 4 years ago

    And then I am the third

  • @ramanabotta6285
    @ramanabotta6285 4 years ago +1

    First view