*My takeaways:*
*0. Outline of this talk **0:51*
*1. Motivation **1:42*
- DNNs are very computationally intensive
- Datacenter power consumption is doubling every year
- Number of edge devices is growing fast, and lots of these devices are resource-constrained
*2. Quantization basics **5:27*
*3. PyTorch quantization **10:54*
*3.1 Workflows **17:21*
*3.2 Post training dynamic quantization **21:31*
- Quantize weights at design time
- Quantize activations (and choose their scaling factor) at runtime
- No extra data is required
- Suitable for LSTMs/transformers, and MLPs with small batch size
- 2x faster computing, 4x less memory
- Easy to do, it's a one-line API (see the sketch below)
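A minimal sketch of that one-line API (torch.quantization.quantize_dynamic); the model and layer sizes below are just placeholders, not from the talk:
```python
import torch
import torch.nn as nn

# A small float model; layer sizes are purely illustrative.
model_fp32 = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# The one-line API: weights are quantized ahead of time, activations are
# quantized (and their scaling factor chosen) on the fly at runtime.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

out = model_int8(torch.randn(1, 128))
```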
*3.3 Post training static quantization **23:57*
- Quantize both weights and activations at design time
- Extra data is needed for calibration (i.e., to find the scaling factors)
- Suitable for CNNs
- 1.5-2x faster computing, 4x less memory
- Steps: 1. Modify model 25:55 2. Prepare and calibrate 27:45 3. Convert 31:34 4. Deploy 32:59 (see the sketch below)
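A minimal eager-mode sketch of those four steps; the toy CNN, layer names, shapes, and random calibration data are all placeholders:
```python
import torch
import torch.nn as nn

# Toy CNN with quant/dequant stubs at the boundaries.
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 at the input
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model_fp32 = M().eval()

# 1. Modify model: choose a backend qconfig and fuse Conv+ReLU into one op
model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')  # x86 backend
model_fused = torch.quantization.fuse_modules(model_fp32, [['conv', 'relu']])

# 2. Prepare and calibrate: insert observers, then run representative data
model_prepared = torch.quantization.prepare(model_fused)
for _ in range(10):                                  # stand-in calibration loop
    model_prepared(torch.randn(1, 3, 32, 32))

# 3. Convert: replace modules with quantized versions
model_int8 = torch.quantization.convert(model_prepared)

# 4. Deploy: run inference with an ordinary float input
out = model_int8(torch.randn(1, 3, 32, 32))
```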
*3.4 Quantization aware training **34:00*
- Make the weights "more quantizable" through training and fine-tuning
- Steps: 1. Modify model 36:43 2. Prepare and train 37:28 (see the sketch below)
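A minimal QAT sketch along the same lines; the model, loss, and fine-tuning loop are placeholders, not from the talk:
```python
import torch
import torch.nn as nn

# Toy model with quant/dequant stubs; names and shapes are illustrative only.
class QatModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = QatModel().train()

# 1. Modify model: attach a QAT qconfig (fake-quantization for weights and activations)
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

# 2. Prepare and train: insert fake-quant modules, then fine-tune as usual
model_prepared = torch.quantization.prepare_qat(model)
optimizer = torch.optim.SGD(model_prepared.parameters(), lr=1e-3)
for _ in range(10):                                  # stand-in fine-tuning loop
    out = model_prepared(torch.randn(4, 3, 32, 32))
    loss = out.mean()                                # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After fine-tuning, convert to a real int8 model for inference
model_int8 = torch.quantization.convert(model_prepared.eval())
```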
*3.5 Example models **39:26*
*4. New in PyTorch 1.6*
4.1 Graph mode quantization 45:14
4.2 Numeric Suite 48:17: tools to help debug accuracy drops due to quantization at a layer-by-layer level
*5. Framework support, CPU (x86, Arm) backend support **49:46*
*6. Resources to learn more **50:52*
This is gold, thank you!
@lorenzodemarinis2603 You are welcome!
Thank you very much!
@harshr1831 You are welcome!
MVP!
In the accuracy results, how come there is a difference in inference speedup between QAT and PTQ? Is this because different models were used? I would expect no difference in speedup if the same model were used.
How do I test the model after quantization?
I am using post-training static quantization.
How should I prepare the input to feed into this model?
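Not the presenter, but with eager-mode static quantization the converted model still takes ordinary float tensors, since the QuantStub quantizes them internally; test it the same way as the float model, with the same preprocessing. A minimal sketch where the model, shapes, and random "test data" are placeholders:
```python
import torch
import torch.nn as nn

# Minimal statically quantized model, just to show what inference looks like after convert().
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

m = M().eval()
m.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(m)
prepared(torch.randn(1, 3, 32, 32))                  # calibration pass
model_int8 = torch.quantization.convert(prepared)

# Testing: feed ordinary float tensors, preprocessed exactly as for the float model.
# The QuantStub quantizes the input internally; the output comes back as float.
with torch.no_grad():
    for _ in range(3):                               # stand-in for a real test loader
        x = torch.randn(1, 3, 32, 32)
        out = model_int8(x)
print(out.shape)
```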
Awesome talk, thanks!
Too much to ask, but it would be nice if PyTorch had a tool to convert quantized tensor parameters to TensorRT calibration tables
What if I want to fuse multiple Conv and ReLU pairs? (see the sketch at the end of this thread)
Thank you
Sorry, can you share the example code? Thank you
Please take a look at the pytorch tutorials page for example code: pytorch.org/tutorials/advanced/static_quantization_tutorial.html
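In case it helps, a minimal sketch of fusing several Conv+ReLU pairs in a single torch.quantization.fuse_modules call; the model and module names are made up:
```python
import torch
import torch.nn as nn

# Toy model with two Conv+ReLU pairs; module names are just for illustration.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(8, 16, 3)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        return self.relu2(self.conv2(self.relu1(self.conv1(x))))

model = Net().eval()

# fuse_modules takes a list of lists, one inner list per group of modules to fuse,
# so multiple Conv+ReLU (or Conv+BN+ReLU) groups can be fused in one call.
fused = torch.quantization.fuse_modules(
    model, [['conv1', 'relu1'], ['conv2', 'relu2']]
)
print(fused)
```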
Awesome talk, thank you so much.
Why not go lower than 8-bit ints for quantization? Won't that be much faster?
Currently, kernels on processors do not provide any speedup for lower bit precisions.
It's a trade-off between accuracy and speed.
Great info, but please buy a pop filter.
OMG. We already have a term of art for "zero point." It's called bias. We have a term, please use it. Otherwise, thanks for the great talk.
The reason it's called a zero_point is that when the pre-quantized weights bring the output to zero (for a ReLU activation), you want to add a zero_point so that the quantized weights also bring the output to zero. Also, the names scale and zero_point distinguish them from each module's weight and bias, which are different concepts.
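To illustrate the affine mapping q = round(x / scale) + zero_point (and x ≈ (q − zero_point) · scale); the scale and zero_point values here are made up:
```python
import torch

# Affine quantization: q = round(x / scale) + zero_point, x ≈ (q - zero_point) * scale.
x = torch.tensor([-0.5, 0.0, 0.5, 1.0])
xq = torch.quantize_per_tensor(x, scale=0.01, zero_point=64, dtype=torch.quint8)

print(xq.int_repr())    # integer representation; the float 0.0 maps exactly to 64
print(xq.dequantize())  # approximate reconstruction of the original floats
```
With zero_point = 64, the float 0.0 maps exactly to the integer 64, which is why it plays a different role than a layer's bias.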
Then I am the second.
And then I am the third
First view
Lol