Great job. You started off very well by dissecting the numbers in the tables. I wish I could have seen more of the tables in the paper analyzed. I also liked the final conclusion at the end pointing out that there is still a CNN teacher being trained.
Thanks for the feedback, Amir. I will try to cover more results tables in the future. As a general trend, viewership drops quite a lot after the concept is explained, hence the lighter emphasis on the results and conclusion.
Great explanation
How is the distillation token initialized, and what is its purpose? Looking at the code, it doesn't seem to take inputs from the output of a CNN.
I agree with you; that part looks like a mistake, but I'm not sure.
I also don't see the use of the distillation token in the formula, although the paper states that the best performance was obtained with the class + distillation tokens together.
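To clarify the question above: in DeiT the distillation token does not take any input from the CNN teacher. Like the class token, it is a learnable embedding, randomly initialized and appended to the patch-embedding sequence; the teacher only supplies targets in the loss. A toy pure-Python sketch (the dimensions and variable names here are hypothetical, for illustration only):

```python
# Sketch of how DeiT builds its input sequence: class and distillation
# tokens are learned parameters (randomly initialized), concatenated with
# the patch embeddings. The CNN teacher never feeds into this token --
# it only provides targets for the distillation loss.
import random

embed_dim = 8    # toy embedding size (real DeiT uses e.g. 768)
num_patches = 4  # toy patch count

# learnable tokens, initialized with small random values
cls_token = [random.gauss(0, 0.02) for _ in range(embed_dim)]
dist_token = [random.gauss(0, 0.02) for _ in range(embed_dim)]

# stand-in patch embeddings for one image
patches = [[random.gauss(0, 1) for _ in range(embed_dim)]
           for _ in range(num_patches)]

# transformer input sequence: [CLS, patch_1 .. patch_N, DIST]
sequence = [cls_token] + patches + [dist_token]
```

In the real model both tokens are updated by backpropagation, and the transformer output at the distillation token's position is what gets matched against the teacher's prediction.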
Very helpful, would love to see more such content.
Thank you, Yefet! Quite encouraging, it keeps me going :)
Very clear explanation, thank you!
Very Interesting
Waiting patiently to see its use cases in industry and its value proposition over current methods that use various CNNs.
@AI Bites Keep up the good work
Thank you 🙂
Very clear and concise! Great video!
Glad you enjoyed it!
Thank you for your great work. I would like to ask a question: is the distillation loss computed between the softmax of the teacher and the softmax of the student model, or between the softmax of the teacher and the ground truth? In the paper it seemed to be the former, but in your video presentation it appeared to be the latter. Could you please confirm? If I misunderstood, I apologize.
Hey Zach, great question. The papers are always right, as they are the source of information for my videos. When we do reading groups at university, it's quite normal for people to disagree on something, since not everything is 100% clear from a paper. So I feel you are right on this occasion :)
@AIBites Thanks for your kind reply.
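To settle the question in this thread: in the paper, the soft-distillation term is computed between the teacher's and the student's temperature-softened softmax outputs (not between the teacher and the ground truth; the ground truth only appears in the separate cross-entropy term). A minimal pure-Python sketch of that soft-distillation term, with function names of my own choosing:

```python
# Sketch: soft distillation loss computed between the TEACHER's and the
# STUDENT's softened softmax distributions, as KL(teacher || student)
# scaled by T^2. Illustrative only, not the authors' implementation.
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_distillation_loss(teacher_logits, student_logits, temperature=3.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl  # conventional T^2 scaling
```

When the student's logits match the teacher's exactly, the loss is zero; any disagreement makes it positive, which is what drives the student toward the teacher's output distribution.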
I think you've got the temperature effect wrong. At higher temperatures the softmax produces a softer, more evenly spread distribution.
Okay, maybe I realized it in hindsight! :)
@AIBites What does that mean?
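To unpack the temperature point above: dividing the logits by a larger temperature before the softmax spreads probability mass more evenly, i.e. "softer" targets; a smaller temperature sharpens the distribution. A quick sketch (toy logit values chosen for illustration):

```python
# Demonstrates the effect of temperature on a softmax distribution:
# higher temperature -> probabilities closer to uniform (softer).
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling (max-subtracted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax(logits, temperature=0.5)  # low T: peaked distribution
soft = softmax(logits, temperature=5.0)   # high T: near-uniform
```

The logits themselves don't change; what softens at high temperature is the resulting probability distribution, which is why soft targets carry more information about the teacher's relative preferences among classes.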
This is incredible, thank you so much! subscribed!
Amazing, thank you very much
Where do you draw these kinds of charts? Could you tell me? It'll be helpful.
I think it's Keynote, if I recall correctly; it's been a long time since I made this video.
@AIBites Thanks for sharing. Any others, Android-specific?
Love your videos 🥰
Thank you! Encouraging comments like yours keep me going :)
good explanation
Thank you Mustafa!