Thanks for this clear and easy explanation of Quantization in NNs.
glad you liked it 😊
Does the quantisation (the change of number format) apply only to the outputs of the activation functions, or also to the individual weights? Where in the NN do we apply this quantisation?
A good, comprehensive video... good work. I liked the related links you added in the description. A tiny recommendation: please increase your speaking speed, as there were many seconds of silence between topics. Thank you, and looking forward to more content.
Thank you so much 😊
I just have a question about quantisation in TensorFlow. For a project of mine I used the QKeras library for QAT, and the weights I got in the end were long numbers with many digits (for example 0.235215266523415e-2). In the quantization config I used int8, and such a number is not representable in int8 format.
Does the training still happen in fp32, with the quantisation treated as noise?
Also, what do I do to get the weights to be representable in int8 format?
How do I test the accuracy of the weight-quantised model?
Hey, I almost always use PyTorch, so I'll try my best to help you with TF.
It's normal for the weights to be long numbers like you said if they are stored in fp32. If a number cannot be quantized into int8, training could collapse, as the number could be rounded off to an extreme like 0. If too many numbers get rounded off like this, the gradients will collapse and training will hit a wall.
To test, you can run inference the same way as you do on your eval or test set. What is stopping you from doing that?
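If it helps, here is a minimal PyTorch sketch (I mostly use PyTorch, so treat the TF translation as an exercise). It assumes you already have a trained `model` and a `test_loader`; both names are placeholders. Dynamic quantization stores the Linear weights as int8, and you then score the quantized model exactly like the fp32 one:

import torch
import torch.nn as nn

def evaluate(model, loader):
    # Standard eval loop: top-1 accuracy on a classification test set.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# Weights stored as int8; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print("fp32 accuracy:", evaluate(model, test_loader))
print("int8 accuracy:", evaluate(quantized, test_loader))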
Thank you for the informative content. Is it possible to combine pruning and quantization while maintaining accuracy?
Fantastic question. I feel it should be possible, as they do two different things to the weights, though I have never tried both together.
First prune and get rid of the unnecessary params, then quantize what is left :)
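Something like this rough PyTorch sketch of that order (the toy model and the 50% pruning amount are just placeholders, not a tuned recipe):

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# 1) Prune: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# 2) Quantize what is left: store the surviving weights as int8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)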
Thank you, it was a really clean and clear explanation, and in such a short time too. 👏
oh thanks for the encouraging words. Helps me keep going :)
very well explained.
thank you Vishal! :)
Very good explanation. Please make a video on how to calibrate the data and compute the scaling factor and zero point by analysing the weight distribution of each layer for int8 quantization in TensorFlow/TensorRT, and also on the role of fake quantizers during backpropagation.
Thanks for the nice suggestion. There are full courses on quantization these days, so I wasn't able to cover everything or deep dive into the specifics :)
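The core idea is small enough to sketch, though. During calibration you observe each layer's value range and derive a scale and zero point from it; here is a framework-agnostic numpy illustration using plain min/max (real TF/TensorRT calibration works per layer and often uses smarter range estimation, so treat this as a toy under those assumptions):

import numpy as np

def int8_qparams(x, qmin=-128, qmax=127):
    # Asymmetric quantization: map [xmin, xmax] onto [qmin, qmax].
    xmin, xmax = float(x.min()), float(x.max())
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # the range must contain 0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = int(round(qmin - xmin / scale))
    return scale, zero_point

weights = np.random.randn(256, 128).astype(np.float32)  # stand-in for one layer
scale, zp = int8_qparams(weights)
q = np.clip(np.round(weights / scale) + zp, -128, 127).astype(np.int8)
deq = (q.astype(np.float32) - zp) * scale  # dequantize to inspect the error
print("scale:", scale, "zero point:", zp, "max abs error:", np.abs(weights - deq).max())

As for fake quantizers: during QAT the forward pass simulates this round/clip step, while the backward pass typically lets gradients flow straight through (the straight-through estimator), so the network learns weights that survive quantization.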
Just started watching, but... a signed byte is -128..127, not -127..127. Google 2's complement to see why.
Sorry, that's an embarrassing error, and good spot. Thanks a lot! Will keep it in mind for next time.
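For anyone double-checking: in two's complement an n-bit signed integer covers [-2^(n-1), 2^(n-1) - 1], so int8 is indeed -128..127.

import numpy as np
info = np.iinfo(np.int8)  # two's complement signed 8-bit
print(info.min, info.max)  # -128 127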
Thanks!
Thanks so much for the monetary reward! Very encouraging 🙂