You have been teaching me the fundamentals of SVMs better than my expensive professor at my university. Thank you, man.
same
+1
Hey, i love that everything we learn in the video is already written on the board. It's so clean and compact, yet so much information. Just great man
Thanks so much !
I started learning SVMs by looking for material that would give an intuitive understanding of how the model works. By now I have covered all the mathematics behind it in depth, and I have spent almost a month on it. That sounds like an eternity, but I can't feel confident until I have considered everything in detail. In my opinion, basic intuition is the most important thing when exploring a model, and you did this extremely well. Thank you for your time and work. For those who are new to this channel, I highly recommend subscribing. This guy makes awesome content!
You are blowing my mind sir, thank you for this amazing explanation! No one else has been able to teach the subject of SVM this well.
By far the best explanation of kernels that I've seen/read. Fantastic job!
Amazing explanation!! Finally kernels are much clearer to me than they have been in the past.
Was stuck for 3 days on kernels looking at numerous lectures online. You just made it clear. Thank you so much!
I'd love to see a video on Gaussian Process Regression, or just Gaussian Processes in general! Thanks for this video - very helpful
When someone has tried a lot to understand something, he can explain it much better than others. Thanks a lot.
The two paths diagram explains everything so clearly! Thank you!!
You're very welcome!
As someone who's searched everywhere for an explanation about this topic, this is the only good one out there. Thanks so much!
Note: for the inner product after the transformation to be equivalent to (1 + x_i * x_j)^2, the transformation needs some constants. Specifically, the transformation should be [x1, x2] --> [1, sqrt(2)*x1, sqrt(2)*x2, x1^2, x2^2, sqrt(2)*x1*x2]
Alternatively, ignore the coefficients: for example, the expansion will have a term 2 xi^(1) xj^(1), so consider only xi^(1) xj^(1) and drop the 2, and you will get the match.
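The claim about the sqrt(2) constants can be checked numerically. The following is a minimal sketch, not the video's own code: the function names `phi`, `dot`, and `poly_kernel` and the two sample points are made up for illustration.

```python
import math

def phi(x):
    """Explicit degree-2 feature map, including the sqrt(2) constants."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel (1 + x . y)^2, computed in the original 2D space."""
    return (1.0 + dot(x, y)) ** 2

# Arbitrary example points (an assumption, not taken from the video).
xi = (1.0, 2.0)
xj = (3.0, -1.0)

explicit = dot(phi(xi), phi(xj))   # inner product in the 6D space
via_kernel = poly_kernel(xi, xj)   # same quantity without ever building phi

print(explicit, via_kernel)
```

Both values agree up to floating-point rounding. Without the sqrt(2) factors the six terms would still be the same, only with different weights.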
Dude, you are a legend. Finally I understood the power of Kernel functions. Thanks!
Great Man. After months of stumbling over the convex optimization theories and KKT and whatnot, this video made everything clear . Highly appreciated.👏👏
Dude, like the other commenters say, you are so good at just laying stuff out in plain English. Just for this and the prior video I'm going to hit subscribe...you deserve it!
Wow, thanks!
Huge, big thank you, for your hard work and spreading the knowledge. Nice, brave explanation.
My pleasure!
This is the clearest explanation of this topic I've seen so far. Thank you
Bro - Thanks much!!
The way that you are teaching and your understanding is crazy!
Happy to help!
Might be one of the best videos I have seen on SVM. Crazy
You are amazing! Thank you so much for explaining the math and the intuition behind all of this. Fantastic teaching skills.
Good stuff
Thanks for the visit!
I found you by chance and it was a damn miracle; will constantly check for new videos
This explanation cleared up everything for me! Amazing work, I can’t thank you enough!
You summed up all the needed knowledge about svm, and the discussion in this episode is more philosophical, thank you very much for the course.
Been struggling to grasp this even after watching a bunch of YouTube videos. Finally understand! Must be the magic of the white board!
Amazing, Amazing, you are my true guru while I prepare for the university exam. You are far far above my college professors whom I barely understand. Hope you get your true due some how. Subscribed already. 🙏
Hey man,
Just wanted to admire you for your beautiful work on bringing some of the key complex fundamentals such as this to ease. :D.
Very clear and well-organized explanation. Thank you!
Glad it was helpful!
This is the best video I've seen on this topic. Thank you, sir.
Can't thank you enough for explaining it so simply.
Much better than other youtubers explaining the same concept.
Absolutely great !
Marketer studying Data Science here. Amazing content!
Glad you enjoy it!
You spittin knowledge, GD! This needs to go viral
Such a great explanation. First time I get it after many attempts
Can’t get any better explanation than this 👌🏼
dude I like before even watching the vids because I know I won't be disappointed
Your data science concepts video series is one of a kind
Clearly explained! Thank you!
Awesome explanation. Thank you!
Glad it was helpful!
You have done a very good job here - Thank You! How about a list of youtube videos you have done? ( I just subscribed)
That's a great video. Thank you for making this.
That's the best video I have seen on kernels on YT! Great content
very well explained, thank you!
Very underrated video
Smart AND fit - these videos are like candy for my eyes and brain 🧠 😂
I finally understood what a kernel does! Thanks!
This is an incredible explanation. It helped me a lot. Thank you so much.
Hi Ritvik, this is an excellent explanation of the kernel trick concept. I have a doubt though. When we apply the degree-2 polynomial kernel to the dot product of the two vectors, we apply the (a+b+c)**2 formula. Doing this introduces a factor of 2 for a few terms. Is it ignored since it will just scale the dot product?
Ignore the coefficients: for example, the expansion will have a term 2 xi^(1) xj^(1), so consider only xi^(1) xj^(1) and drop the 2, and you will get the match.
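For reference, here is the expansion being discussed, written out for two-dimensional inputs. This is a sketch in the commenters' notation, where x_i^{(1)} and x_i^{(2)} are taken to mean the two components of x_i:

```latex
\begin{aligned}
\left(1 + x_i^{(1)} x_j^{(1)} + x_i^{(2)} x_j^{(2)}\right)^2
&= 1 + 2\,x_i^{(1)} x_j^{(1)} + 2\,x_i^{(2)} x_j^{(2)} \\
&\quad + \left(x_i^{(1)}\right)^2 \left(x_j^{(1)}\right)^2
 + \left(x_i^{(2)}\right)^2 \left(x_j^{(2)}\right)^2
 + 2\,x_i^{(1)} x_i^{(2)}\, x_j^{(1)} x_j^{(2)}
\end{aligned}
```

Each factor of 2 can be split as sqrt(2) * sqrt(2), one for each of the two transformed vectors, which is where the sqrt(2) constants in the exact feature map come from. Dropping the 2s does not rescale the whole dot product uniformly; it reweights three of the six terms, so the result has the same terms but is strictly speaking a slightly different kernel.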
We get inner products of the high-dimensional data without even converting the data into the high dimension. That's the conclusion I drew; correct me if I'm wrong.
Yup, that's exactly the main point !
Really explained well. If you want to get the theoretical concepts one could try doing the MIT micromasters. It’s rigorous and demands 10 to 15 hours a week.
this is the first time i get it! thank you
Masha'Allah man, like really Masha'Allah. This is just beautiful and truly a piece of gold. Thank you for this
This is a good explanation, but I'm a bit confused about the terms in the bottom right corner. Did we reach this by squaring the parentheses and then taking? That's going to result in a sum of terms, so what did we do next: take each term independently and set it as a term?
Amazing explanation! Thanks for making this series of videos on SVM. One question: can the kernel trick also be applied to other models, like logistic regression? I saw some online posts saying kernels can be applied to logistic regression, but it seems to be very unpopular. I wonder if that's because logistic regression and other models don't naturally produce the dot-product term, which makes the computation expensive, or whether there are other reasons. Thanks!
A little late, but still: it can be applied to any ML algorithm that can be written in terms of inner products between data points, for example (kernelized) linear regression, to include higher-dimensional polynomial features instead of only linear attributes.
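To illustrate a kernel used outside of SVMs, here is a minimal sketch of a kernel perceptron. Note that this swaps in the perceptron rather than the kernelized linear regression mentioned above, because it needs no linear solver. The toy XOR-style data set, the function names, and the epoch count are all assumptions made for this example.

```python
def poly_kernel(x, y):
    """Degree-2 polynomial kernel (1 + x . y)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2

def train_kernel_perceptron(X, y, kernel, epochs=10):
    """Learn one mistake counter per training point; no feature map is built."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            # The score depends on the data only through kernel evaluations.
            score = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * score <= 0:   # misclassified (or undecided): count a mistake
                alpha[i] += 1
    return alpha

def predict(x, X, y, alpha, kernel):
    """Classify a new point using kernel evaluations against the training set."""
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if score > 0 else -1

# XOR-like toy data: not linearly separable in the original 2D space.
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
y = [1, 1, -1, -1]

alpha = train_kernel_perceptron(X, y, poly_kernel)
predictions = [predict(x, X, y, alpha, poly_kernel) for x in X]
print(predictions)
```

A plain linear perceptron cannot separate this data, but the kernelized version can, because the implicit feature space contains the x1*x2 term.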
Nice explanation
Very insightful thanks a lot
This is amazing
Simply amazing 🤩
Great video man thanks a lot!
Hi Ritvik, in the end you have to sum the values in the 6-tuple to get the equivalent to the kernel output, right? (in order to get a proper scalar from the scalar product)
Your videos are of exquisite quality.
Amazing explanation!
this video is goated
Beautiful
Thank you! Cheers!
You are GREAT!
amazing teacher
Glad you think so!
Thank you. I just imagined what a hard time I would have if I tried to grind through all of this math on my own. It is not a good idea for a beginner)
Ritvik for president!
haha!
Thanks, this was just what I wanted 😙
dude I love you
what do you mean by Inner products of original data?
Thank you very much ❤, you save us a lot of time and effort, hope I can work with you someday
Your videos rank pretty high on the 'binge-ability' matrix...
Thanks!
Why do we calculate the inner products? I understand the data points need to be transformed into higher dimensions so that they become linearly separable. But why do we use the 6-dimensional space for that? Say we have a 2D space (the original feature space); we could transform it to a 3D space to get things done.
That's correct: applying a homogeneous quadratic polynomial kernel (no +1 term), for example, will map it to 3 dimensions, but an RBF kernel can map it to infinitely many dimensions.
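As a rough illustration of how the kernel choice fixes the dimension of the implicit space, the sketch below compares the quadratic kernel without the +1 term, (x . y)^2, against an explicit 3D feature map. The names `phi3`, `homogeneous_quadratic_kernel`, and `rbf_kernel`, the `gamma` default, and the sample points are assumptions made for this example.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi3(x):
    """Explicit 3D feature map matching the kernel (x . y)^2 for 2D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def homogeneous_quadratic_kernel(x, y):
    """Quadratic kernel without the +1 term."""
    return dot(x, y) ** 2

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel; its feature map is infinite-dimensional, so only the kernel form is usable."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x = (1.0, 2.0)
y = (3.0, -1.0)

explicit_3d = dot(phi3(x), phi3(y))
kernel_3d = homogeneous_quadratic_kernel(x, y)
print(explicit_3d, kernel_3d, rbf_kernel(x, y))
```

Adding the +1 inside the square brings in the constant and linear terms as well, which is why the video's kernel corresponds to a 6-dimensional space rather than a 3-dimensional one.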
If we plugged the kernel function output (the similarity of our points in the higher-dimensional space) into the primal version of the cost function, i.e. used the similarity instead of the inputs themselves, would it be equivalent to solving the dual? Just a lot more inefficient?
What is not clear for me is that, is the output of the kernel function a scalar?
what is xj exactly? am i understanding it right if i can consider it as the triangle data point and xi are the x data points...? so xj is like feature variables within our data...?
Wow, thank you so much!
quality👏
I am still confused about how you developed the kernels in the first place. I know what they do but don't know how to obtain them without using the transformed space.
Great Job! Thank you soo much!!
I love you, man. I am a vt student. I wish I had known this a month ago :(
What is the purpose of finding the relationship between two separate vectors? Why can't you just take the polynomial of a vector with respect to itself (xi_1^T xi_1+c)^2? Wouldn't your number of terms just blow up when you have to find K(xa,xb) for every a and b in X?
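One way to see why pairs of points are needed: the dual SVM problem only touches the data through the inner products of pairs (x_a, x_b), and these can be collected once into an n-by-n matrix of scalars. The sketch below builds that matrix; the name `gram_matrix` and the three sample points are assumptions made for illustration.

```python
def poly_kernel(x, y):
    """Degree-2 polynomial kernel (1 + x . y)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2

def gram_matrix(X, kernel):
    """All pairwise kernel values: n * n scalars for n points."""
    return [[kernel(a, b) for b in X] for a in X]

# Three arbitrary 2D points (an assumption, not data from the video).
X = [(0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
K = gram_matrix(X, poly_kernel)

for row in K:
    print(row)
```

Taking the kernel of each vector only with itself would give just the diagonal of this matrix, which says nothing about how the points relate to each other. The work grows with the number of pairs, but each entry is a single scalar computed in the original space, so no expanded list of terms is ever stored.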
magic
Thank you!
Sorry, may I ask: what if we have 4 or 5 classes? How do we describe or use it then?
Can someone elaborate on how a kernel does that exactly? At the end of the day, we still need the higher-dimensional data, no? I'm confused.
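One way to see that the higher-dimensional data is never needed is to look at the standard dual form of the SVM decision function. The symbols alpha_i (learned weights), y_i (labels) and b (intercept) are the usual textbook notation and are an assumption here, not something taken from the video:

```latex
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad K(x_i, x) = \left(1 + x_i^{\top} x\right)^2
```

Both training and prediction use the data only through K, and K is evaluated directly on the original low-dimensional vectors, so the transformed vectors never have to be constructed.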
Thanks!
What does the 1 mean in the transformed matrix?
1 is just for the "intercept". It's like the "b" term in the linear equation "y=mx+b"
Phi is often impractical, and for some kernels impossible, to compute directly.
If you don't mind, I can give you a simple kernel PCA example to help viewers,
because this concept is hard to understand if you are new to these topics.
sure! any resources are always welcome
why do we add 1 term to the dot product in Kernel?
He did not derive the kernel. He showed that if you use (1 + <x_i, x_j>)^2 as a kernel and work it out, you get exactly the same terms as when you explicitly compute the inner product of the transformed vectors (except for a few factors of 2). If you took the kernel (<x_i, x_j>)^2 instead, you would not get the same terms. Probably some clever person invented the kernel (1 + <x_i, x_j>)^2, but it is not explained here how he/she found it. Note that there are also other kernel functions that work well for SVMs, but with different basis functions.
let him cook
Are you from India, bro?
Hey, I know your videos are according to the current theme, but would be great to have a projector matrix/subspace video at some point in the future! Keep up the great content
That's well explained. Thank you
Glad it was helpful!