Part 25 - Support Vector Machines, the Kernel Trick

  • Published 6 Oct 2024

COMMENTS • 30

  • @SwanBaby31 · 2 years ago +9

    Hi, just want to say thank you so much for the crash course! It's so intuitive and understandable. I have watched the SVM series till part 26 and it's so much clearer compared to my prof's lessons :"). This is so underrated and deserves more views! Would be great to include deep learning concepts in future videos!

    • @pedramjahangiry · 2 years ago +5

      Thanks for your feedback! I am working on the Deep Learning videos!

    • @Savedbygrace952 · 1 year ago

      @@pedramjahangiry God bless you!

  • @vijaysista3894 · 1 year ago

    Your videos are very clear and lucid. Thank you so much for the clear explanation.

  • @diabl2master · 8 months ago

    Interestingly, although you could describe it as "going from 2 dimensions to 3", what you're really doing is going from 2 dimensions to 1: we are going from (x, y) to r = sqrt(x^2 + y^2).

    • @pedramjahangiry · 8 months ago

      The kernel trick allows you to compute relationships in a higher-dimensional space without explicitly doing the high-dimensional mapping, but it doesn't typically reduce dimensions.
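A minimal numerical sketch of the point above (my own illustration, not from the video): the degree-2 polynomial kernel K(x, z) = (x·z)^2 gives exactly the inner product of an explicit 3-dimensional feature map φ(x) = (x1^2, √2·x1·x2, x2^2), so the higher-dimensional relationship is computed without ever building φ.

```python
import numpy as np

# Two points in the original 2-D input space
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Explicit degree-2 feature map: R^2 -> R^3
def phi(v):
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

explicit = phi(x) @ phi(z)   # inner product computed in the 3-D feature space
implicit = (x @ z) ** 2      # degree-2 polynomial kernel, computed entirely in 2-D

print(explicit, implicit)    # both are (numerically) 121.0: same relationship, no explicit mapping
```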

  • @Ash-bc8vw · 1 year ago +1

    Thank you so much for this clear explanation! Can you please point out books/resources we can refer to for SVR in Python?

    • @pedramjahangiry · 1 year ago +1

      The main textbook for the course is "Introduction to Statistical Learning"; you can find a free copy here: www.statlearning.com/
      For all the other charts and graphs in the video, please find the info below each graph. The slides are available on my GitHub account.
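For the SVR-in-Python part of the question, a minimal scikit-learn sketch (my own illustration, not course material; the data and parameter values are arbitrary) could look like this:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy 1-D regression problem: a noisy sine wave
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

# RBF-kernel SVR; scaling matters because the RBF kernel is distance-based
model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=1.0, gamma="scale", epsilon=0.1))
model.fit(X, y)

print(model.predict([[2.5]]))   # should be close to sin(2.5) ≈ 0.60
```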

    • @Ash-bc8vw · 1 year ago

      @@pedramjahangiry thank you so much!

  • @clapathy · 1 year ago

    Thanks for such a great explanation!

  • @konigludwig5.099 · 1 year ago +1

    Very nicely explained, thank you! However, I don't understand how to get the landmarks. Also via hyperparameter tuning? And how does the kernel function work with multiple landmarks? I would get multiple function values for the same x then.

    • @pedramjahangiry · 1 year ago +1

      Hi Konig! I'm glad you found the explanation helpful! I'll be happy to address your concerns regarding landmarks and the kernel function.
      Landmarks: In the context of SVM and the RBF kernel, landmarks are typically the training data points themselves. This means that for each data point in the training set, there is a corresponding landmark. By using the training data points as landmarks, you're able to transform the input space into a higher-dimensional feature space, which can make it easier to find the optimal decision boundary.
      Hyperparameter tuning: Hyperparameter tuning isn't directly related to selecting landmarks, as the landmarks are simply the training data points. However, hyperparameter tuning is important for selecting the best values for the parameters in the RBF kernel, such as the regularization parameter 'C' and the kernel parameter 'gamma'. Tuning these parameters can help you find the best model with the right balance between bias and variance.
      Kernel function with multiple landmarks: Indeed, when you have multiple landmarks, the RBF kernel function will compute multiple values for each input 'x'. For instance, if you have 'm' landmarks, you will have 'm' RBF kernel values for each input 'x'. These kernel values are essentially the transformed features in the higher-dimensional space. The idea is to compute the similarity between 'x' and each landmark using the RBF kernel, which will result in 'm' similarity values. Then, these values will be used as features for your SVM classifier.
      I hope this clears up your concerns. Feel free to ask if you have any more questions!
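A minimal sketch of the idea described in the reply above (my own illustration; the data, gamma, and classifier choice are arbitrary): use the training points themselves as landmarks, turn each input into a vector of RBF similarities, and fit a linear classifier in that transformed space.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC

# Toy data: two concentric rings, not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# Landmarks = the training points themselves, so each point gets m = 200 RBF features
landmarks = X
features = rbf_kernel(X, landmarks, gamma=1.0)   # shape (200, 200): similarity of each x to each landmark

# A linear classifier in the transformed space can now separate the rings
clf = LinearSVC(C=1.0, max_iter=10000).fit(features, y)
print("training accuracy:", clf.score(features, y))
```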

    • @konigludwig5.099 · 1 year ago +1

      @@pedramjahangiry This cleared things up for me, thanks for the detailed answer!

    • @morpheus1586 · 1 year ago +1

      ​@@pedramjahangiry thank you for the explanation. I was wondering the same thing. I have watched a lot of SVM videos and you're the first one to mention landmarks, which makes more sense. I am working on a binary classification project using SVM with 3 coordinates (i.e., x1, x2, x3). Are you saying that for each observation (i, j, k...z) I would have to compute the Euclidean distance between each and every point?...
      Sorry, I'm still not 100% sure how to determine these landmarks. Say I have 1,000 observation points; that would be 1,000,000 (1000 x 1000) distances calculated if each data point is also a landmark. I thought that would defeat the purpose of the kernel trick, since we'd be explicitly defining the feature space. Also, it would be computationally expensive, no? Maybe I'm missing something.

    • @pedramjahangiry · 1 year ago +1

      @@morpheus1586 I'm glad to hear that you found the explanation about landmarks in SVM useful! In regards to your question, the selection of landmarks is indeed a challenging aspect of SVM with Gaussian RBF kernels.
      You're correct in your understanding that if you choose every data point as a landmark, it can lead to a high computational cost, especially with a large number of observations. This is one of the trade-offs with this approach.
      However, the 'landmarks' in the context of SVM with Gaussian RBF kernel are not always the actual data points. Instead, these landmarks are points in the feature space to which the similarity of any data point is measured. Often, the original inputs themselves (i.e., each data point) are used as landmarks, but that is not a requirement.
      It's also important to understand that the purpose of the kernel trick is to transform data into a higher-dimensional space where it becomes linearly separable, not necessarily to reduce computational cost. That said, computational efficiency is indeed a practical concern, so it's important to balance these considerations when choosing your approach.
      One possible strategy to mitigate this is to use clustering methods to select a smaller number of representative landmarks, rather than using every data point. This can help in reducing the computational burden while still capturing the essential structure of the data.
      I hope this clarifies your question! Let me know if you have more.
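A sketch of the clustering strategy mentioned above (my own illustration; 50 clusters and gamma = 1.0 are arbitrary choices): with 1,000 observations and 3 coordinates, using k-means centroids as landmarks gives a 1,000 x 50 feature matrix instead of 1,000 x 1,000.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC

# 1,000 observations with 3 coordinates (x1, x2, x3), as in the question above
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Use 50 k-means centroids as landmarks instead of all 1,000 data points
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)
landmarks = kmeans.cluster_centers_

# RBF similarity to each landmark: a 1000 x 50 feature matrix
features = rbf_kernel(X, landmarks, gamma=1.0)
clf = LinearSVC(max_iter=10000).fit(features, y)
print(features.shape, "training accuracy:", clf.score(features, y))
```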

    • @morpheus1586 · 1 year ago +1

      @pedramjahangiry That's a great explanation. That makes sense. What clustering method would you recommend? Also, there is another thing I was wondering. When mapping into the feature space, say I've got a data point that is 1 and another that is -1. When mapping into the feature space using the RBF, how do I determine how to categorize the new data point in the feature space? So, for argument's sake, say I do a clustering where most of the points closest to each other were in separate classes in the input space (1 and -1). How do I label these data points? Or could I just map data points of the same class to keep it homogeneous?
      Also, I'm better with neural networks, and they're much easier to understand and build from scratch. What are your thoughts on using neural networks or deep neural networks for classification? I have heard it's better.

  • @rahilnecefov2018 · 2 months ago +1

    Is there any chance to get the presentations? I can't find these materials in your GitHub account.

    • @pedramjahangiry · 2 months ago

      I just added the slides here: github.com/PJalgotrader/Machine_Learning-USU/tree/main/Lectures%20and%20codes/miscellaneous

  • @alo633 · 1 year ago +1

    Nice video! I have a question: when I want to transform the data from 2D to 3D with the RBF kernel, what is the equation for the z coordinate in the 3D transformation? Thank you.

    • @pedramjahangiry · 1 year ago

      Great question, alo! The process of transforming data from 2D to 3D using an RBF (Radial Basis Function) kernel, such as in kernel PCA (Principal Component Analysis) or SVM (Support Vector Machines), doesn't provide a straightforward equation for computing a Z-coordinate. This is because the RBF kernel projects the data into a higher-dimensional space (not necessarily 3D) where the dimensionality can be infinite.
      While you cannot directly compute a Z-coordinate from this, you can visualize the effect of the RBF kernel in 3D using a simple 1D example. If you have a single feature x and you apply the RBF kernel, you could create a 3D plot where the x-axis is the original feature, the y-axis is the feature again, and the z-axis is the result of the RBF kernel applied to every pair of points in the dataset.
      However, when dealing with higher dimensions (e.g., 2D to 3D, or more), the transformation becomes less straightforward and it is not typically possible to simply compute a Z-coordinate.
      In practice, the transformation done by the RBF kernel is used implicitly, i.e., we compute similarities between images of data points in the high-dimensional space without actually computing these images. This is known as the "kernel trick".
      I hope this helps! If you have further questions, feel free to ask.
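A small sketch of the 1-D visualization described above (my own illustration; gamma = 1.0 is arbitrary): plot the RBF kernel value k(x, x') = exp(-gamma·(x - x')^2) on the z-axis over a grid of (x, x') pairs.

```python
import numpy as np
import matplotlib.pyplot as plt

gamma = 1.0
x = np.linspace(-3, 3, 100)            # the single original feature
X1, X2 = np.meshgrid(x, x)             # every pair (x, x')
Z = np.exp(-gamma * (X1 - X2) ** 2)    # RBF kernel value for each pair

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X1, X2, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("x'")
ax.set_zlabel("k(x, x')")
plt.show()
```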

  • @azzahrafatima-l1m · 8 months ago

    Sir, what is the function of the 1/2 in the quadratic programming formula (the optimization problem)? Is it to divide in half, or what?

    • @pedramjahangiry · 8 months ago

      Great question! It is mostly for mathematical convenience. It simplifies the calculus involved, especially when taking derivatives of the quadratic terms in optimization problems. This practice makes the mathematical computations and formulations more straightforward.
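As a concrete illustration of that convenience (my own addition, using the standard hard-margin SVM objective): the 1/2 simply cancels the 2 that appears when differentiating the quadratic term, and scaling the objective by a constant does not change where the minimum is.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\qquad\Longrightarrow\qquad
\frac{\partial}{\partial w}\left(\tfrac{1}{2}\, w^{\top} w\right) = w
\quad\text{(without the } \tfrac{1}{2}\text{, the gradient would be } 2w\text{).}
```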