Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

Umar Jamil

Published 30 May 2024
In this video, I will be explaining Kolmogorov-Arnold Networks, a new type of network that was presented in the paper "KAN: Kolmogorov-Arnold Networks" by Liu et al.
I will start the video by reviewing Multilayer Perceptrons, to show how the typical Linear layer works in a neural network. I will then introduce the concept of data fitting, which is necessary to understand Bézier Curves and then B-Splines.
Before introducing Kolmogorov-Arnold Networks, I will also explain the Universal Approximation Theorem for neural networks and its counterpart for Kolmogorov-Arnold Networks, the Kolmogorov-Arnold Representation Theorem.
In the final part of the video, I will derive the structure of this new type of network step by step from the formula of the Kolmogorov-Arnold Representation Theorem, comparing it with Multilayer Perceptrons along the way.
We will also explore some properties of this type of network, for example the easy interpretability and the possibility to perform continual learning.
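For reference, the Kolmogorov-Arnold Representation Theorem mentioned above is usually stated as follows for a continuous function f on the unit hypercube. The symbols (n, \Phi_q, \phi_{q,p}) follow the paper's notation and do not appear in this description.

```latex
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad \phi_{q,p} : [0,1] \to \mathbb{R}, \quad \Phi_q : \mathbb{R} \to \mathbb{R}
```

Read as a network, this is a two-layer structure with n inputs, 2n+1 hidden nodes and one output, where every edge carries a univariate function instead of a scalar weight.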
Paper: arxiv.org/abs/2404.19756
Slides PDF: github.com/hkproj/kan-notes
Chapters
00:00:00 - Introduction
00:01:10 - Multilayer Perceptron
00:11:08 - Introduction to data fitting
00:15:36 - Bézier Curves
00:28:12 - B-Splines
00:40:42 - Universal Approximation Theorem
00:45:10 - Kolmogorov-Arnold Representation Theorem
00:46:17 - Kolmogorov-Arnold Networks
00:51:55 - MLP vs KAN
00:55:20 - Learnable functions
00:58:06 - Parameters count
01:00:44 - Grid extension
01:03:37 - Interpretability
01:10:42 - Continual learning
Science and Technology

COMMENTS • 96

@josephamess1713 20 days ago ⁺⁶⁴
The fact this video is free is incredible
@umarjamilai 20 days ago ⁺⁷
You're welcome 🤗
@AdmMusicc 3 days ago ⁺²
You're on a mission to make the best and friendliest content for learning deep learning algorithms, and I am all in for it.
@xl0xl0xl0 10 days ago ⁺⁴
Wow this was a super clear and on-point explanation. Thank you, Umar.
@edsonjr6972 20 days ago ⁺¹⁰
Your videos are literally the only ones with 1hr+ I would ever watch on YouTube. Keep going mate, extremely high quality content 👏🏽👏🏽
@nokts3823 19 days ago ⁺⁶
Thanks a lot for making this accessible for people outside the field, for which reading and understanding these papers is quite tough. Thanks to you I'm able to stay slightly more up to date with the crazy quick developments in ML!
@mohamedalansary2542 20 days ago ⁺¹⁵
Clearly explained and very valuable content as always Umar. Thank you!
@franciscote-lortie8680 14 days ago ⁺³
Incredibly clear explanations, the flow of the video is also really smooth. It’s almost like you’re telling a story. Please keep making content!!
@goldentime11 11 days ago ⁺²
Thanks Umar for such a wonderful tutorial! I've been eyeing this paper for a while!
@MrNathanShow 20 days ago ⁺³
The intro with basic linked-up linear layers was so well done and really makes this introduction friendly!
@andreanegreanu8750 1 day ago ⁺¹
Very clear, well explained, top notch!
@manumaminta6131 19 days ago ⁺²
Your videos help me (a grad student) really understand difficult, often abstract concepts. Thank you so much... I'll always support your stuff!
@AlpcanAras 18 days ago ⁺²
This is life changing, in my opinion. Thank you for the efforts on the videos!
@jeunjetta 18 days ago ⁺²
I think KAN will be the catalyst of a significant tipping point in science.
I want to apply this to power system grids and replace existing dynamic models with ones built from PMU data using KAN.
@MuhammadrizoMarufjonov-os5fv 20 days ago ⁺⁶
Thanks for including prerequisites
@anirudh514 20 days ago ⁺⁴
Thanks for the crystal clear explanation!!
@odysy5179 8 days ago ⁺²
Fantastic explanation!
@stacks_7060 19 days ago ⁺¹
One of the best math videos I’ve watched on YouTube
@JONK4635 19 days ago ⁺¹
Extremely clear explanation and content here! Very helpful. I am happy that you came from PoliMI as well :) keep it up!
@ozgunsungar9370 7 days ago ⁺¹
Awesome, easy to follow even for a person who doesn't know anything :)
@luigigiordanoorsini5980 13 days ago ⁺¹
I just read the short bio on your channel; I hope it isn't offensive to say that now I understand why your excellent English still sounded so familiar to me.
In any case, thank you enormously for your contribution: you explained all the theory in a way that is, in my opinion, extremely clear and above all engaging.
Please keep it up. Once again, a huge thank you and congratulations on your contribution to science.
@umarjamilai 13 days ago ⁺¹
Thank you for visiting my channel! I hope to publish more often, even though making quality content takes weeks of study and preparation. In any case, I hope to see you again soon! Have a good weekend.
@luigigiordanoorsini5980 13 days ago
@umarjamilai You had already gained a subscriber; now you've gained a fan.
Hahahaha
@MuhammadMuzzamil-ki4he 18 days ago ⁺¹
Thank you for such a great and detailed explanation.
@zaevi6855 20 days ago
Crazy that it took me an hour-long video to understand that it's the control points of the spline that are being trained, versus the weights in MLPs and CNNs. Thank you!
@user-pu4oc9ek9u 20 days ago ⁺¹
Hello Umar, this video is the best birthday gift I have ever received, thanks a lot :)
@ScottzPlaylists 19 days ago ⁺²
High quality explanations. Thanks.
@johanvandermerwe7687 20 days ago ⁺¹
I saw this paper on Papers with Code and thought to myself, I wonder if Umar Jamil will cover this.
Thanks for your effort and videos!
@user-il1hu5xp2x 20 days ago ⁺¹
What's funny is that I predicted your next video would be on KAN after I saw you on GitHub.
I WILL WATCH THIS VIDEO, AS I FEEL THIS WILL BE THE FUTURE OF NEURAL NETWORKS. THANK YOU FOR YOUR WORK AND CONTENT ❤
@lethnis9307 18 days ago ⁺¹
Your explanations are the best, thank you so much😘🤗
@howardmeng256 15 days ago ⁺²
Amazing video! Thanks a lot!
@arupsankarroy8722 17 days ago ⁺²
Sir, you are great..💙💙
@ansonlau7040 13 days ago ⁺¹
Thank you Jamil, what a cool video
@enricovompa1876 20 days ago ⁺²
Thank you for making this video!
@GUANGYUANPIAO 7 days ago ⁺¹
awesome explanation
@anmolmittal9 12 days ago ⁺¹
This is really great! Power to you!!🚀
@artaasadi9497 10 days ago ⁺¹
that is very useful, informative and interesting! Thanks a lot!
@bankayxy00 15 days ago ⁺¹
Thank you so so much for this amazing content.
@kmalhotra3096 13 days ago ⁺¹
Hats off, what an awesome video!!!
@ezl100 15 days ago
Thanks Umar. Very nice explanation. Just 2 questions:
1 - Does it mean we can specify different knots per edge?
2 - I don't understand how backpropagation will work. Let's say we calculate the gradient from h1. It will update phi 1,1 and phi 1,2, but how will the learning process move the knots to the desired values?
@coolkaran1234 20 days ago ⁺²
You are a savior; without you, mortals like me would be lost in the darkness!!!
@sergiorego6321 19 days ago ⁺¹
Phenomenal! Thank you :)
@samadeepsengupta 20 days ago ⁺²
Great Content !!
@hajaani6417 20 days ago ⁺¹
You’re fantastic, mate.
@JuliusSmith 12 days ago
Excellent video, thanks! At the end, I _really_ wanted to see an illustration of the relatively "non-local" adaptation of MLP weights. Can that be found somewhere?
@RiteshBhalerao-wn9eo 5 days ago ⁺¹
Amazing explanation!
@user-wy1xm4gl1c 9 days ago ⁺¹
This is awesome!
@pabloe1802 15 days ago
An implementation video would be awesome
@prathamshah2058 18 days ago ⁺¹
Thank you so much for explaining the paper, it is so easy to understand now. By the way, can you also make a hands-on video with the kan package developed by MIT, which is based on PyTorch?
@dhackmt 12 days ago ⁺¹
I loved it, sir.
@subhamkundu5043 19 days ago
Hey @Umar, great content as always. Looking forward to a KAN implementation video from scratch. Also, I think there is a minor slip at 31:01: it should say quadratic B-spline curve rather than quadratic Bézier curve.
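To make the distinction in the comment above concrete, here is a minimal sketch of a quadratic Bézier curve in its Bernstein form. The function name `quadratic_bezier` and the control points are illustrative assumptions, not taken from the video or the slides.

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bezier curve at parameters t in [0, 1]."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    t = np.asarray(t, dtype=float)[:, None]  # column vector for broadcasting
    # Bernstein form: the weights (1-t)^2, 2(1-t)t, t^2 sum to 1 for every t.
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

# Hypothetical control points; the curve starts at p0 and ends at p2.
points = quadratic_bezier([0, 0], [1, 2], [2, 0], np.linspace(0, 1, 5))
print(points)
```

A quadratic Bézier curve always has exactly three control points, because the degree of a Bézier curve is tied to the number of control points. A quadratic B-spline keeps degree 2 for any number of control points, which is why the two terms are not interchangeable.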
@satviknaren9681 18 days ago ⁺¹
Please do post more! Please do more videos!
@danielegiunchi9741 18 days ago ⁺¹
brilliant video!
@seelowst 18 days ago
Having such a good teacher is wonderful; I wish I could be one of your students.
@umarjamilai 18 days ago ⁺¹
You flatter me; thank you for the kind words!
@seelowst 18 days ago
@umarjamilai Awesome, you also speak Chinese 👍
@umarjamilai 18 days ago ⁺¹
@seelowst I have just come back from China; I lived there for 4 years and am now back in Europe.
@seelowst 18 days ago
@umarjamilai I have never left my city; I hope to be like you 👍
@shubhamrandive7684 18 days ago
Great explanation. What app do you use to create slides?
@umarjamilai 18 days ago
PowerPoint + a lot a lot a lot a lot a lot of patience.
@vaadewoyin 17 days ago ⁺¹
Can't wait to watch this, saved! Will comment again when I actually watch it..😅
@fouziaanjums6475 20 days ago ⁺¹
Hi, can you please make a video on multimodal LLMs and fine-tuning them on a custom dataset?
@p4ros960 10 days ago ⁺¹
bruh so good. Keep it up!
@fatemeshams9758 4 days ago ⁺¹
awesome👍
@willpattie581 19 days ago
One thing I didn’t catch: how are the functions tuned? If each function consists of points in space and we move around the points to move the B-spline, how do we decide to move the points? Doesn’t seem like backprop would work in the same way.
@umarjamilai 19 days ago ⁺¹
The same way we move weights for MLPs: we calculate the gradient of the loss function w.r.t. the parameters of these learnable functions and change them in the opposite direction of the gradient. This is how you reduce the loss.
We are still doing backpropagation, so nothing changed on that front compared to MLPs.
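The reply above can be illustrated with a small NumPy sketch: one learnable spline activation, parameterized by B-spline coefficients (control points), is fitted to a toy target by plain gradient descent. Everything here is an assumption made for illustration (the helper `bspline_basis`, the grid size, the learning rate, the sine target); it is not the paper's or the video's implementation, and the gradient is written by hand because the model is linear in the coefficients.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at each point in x (Cox-de Boor)."""
    x = np.asarray(x, dtype=float)[:, None]
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    basis = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for k in range(1, degree + 1):
        left = (x - knots[:-(k + 1)]) / (knots[k:-1] - knots[:-(k + 1)])
        right = (knots[k + 1:] - x) / (knots[k + 1:] - knots[1:-k])
        basis = left * basis[:, :-1] + right * basis[:, 1:]
    return basis  # shape: (len(x), len(knots) - degree - 1)

# Hypothetical setup: cubic spline on [-1, 1] with 8 grid intervals.
degree, grid_size = 3, 8
step = 2.0 / grid_size
# Extend the uniform grid by `degree` knots on each side so that the basis
# functions sum to 1 everywhere on [-1, 1].
knots = np.linspace(-1 - degree * step, 1 + degree * step, grid_size + 2 * degree + 1)

xs = np.linspace(-1.0, 1.0, 200)
targets = np.sin(np.pi * xs)           # toy function to fit (assumption)
design = bspline_basis(xs, knots, degree)

coeffs = np.zeros(design.shape[1])     # the learnable parameters
learning_rate = 0.5
losses = []
for _ in range(5000):
    residual = design @ coeffs - targets
    losses.append(np.mean(residual ** 2))
    # Gradient of the mean squared error with respect to the coefficients.
    grad = 2.0 * design.T @ residual / len(xs)
    # Step against the gradient, exactly as for MLP weights.
    coeffs -= learning_rate * grad

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Note that in this sketch the knots stay fixed and only the coefficients move. In a full network the same gradients would come from automatic differentiation through all layers rather than from a closed-form expression.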
@ai__76 20 days ago ⁺¹
amazing
@faiqkhan7545 19 days ago ⁺¹
Umar bhai, you are great
@user-jb3ht1wq5l 14 days ago ⁺¹
THANK YOU
@akramsalim9706 20 days ago ⁺¹
awesome bro.
@Kishan31468 20 days ago ⁺¹
Thanks man. Next xLSTM please.
@user-sy6xn7nq7s 9 days ago
There are continuous but non-differentiable points in the spline, right? What do you do about them?
@routerfordium 17 days ago
Thank you for the great video! Can you (or anyone) help me understand why you need to introduce the basis functions b(x) in the residual activation functions?
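As context for the question above, the residual activation function can be written roughly as below. This follows the form in the first version of the paper as best recalled; the placement of the scaling weight differs between paper versions, so treat the exact weighting as an assumption.

```latex
\phi(x) = w \,\bigl( b(x) + \mathrm{spline}(x) \bigr),
\qquad b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}},
\qquad \mathrm{spline}(x) = \sum_i c_i \, B_i(x)
```

Here the c_i are the trainable spline coefficients and the B_i are the B-spline basis functions. The paper likens the fixed term b(x) to a residual connection: the spline only has to learn a correction on top of a smooth default activation.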
@plutophy1242 11 days ago ⁺¹
this video is so amazing!!!!!!!
@bzzzzz1736 18 days ago ⁺¹
thank you
@user-hd7xp1qg3j 20 days ago ⁺²
Could you please explain multimodal LLMs next, with techniques like LLaVA, LLaVA-Plus, LLaVA-NeXT?
@Patrick-wn6uj 20 days ago ⁺¹
I'm waiting for that day too
@kiffeeify 11 days ago
Thanks!
@MrAloha 20 days ago ⁺²
Wow! 🙏
@daleanfer7449 20 days ago ⁺¹
I was just hoping for this!
@umarjamilai 20 days ago
Looking forward to your feedback 😇
@daleanfer7449 20 days ago ⁺¹
❤ Great content. Have you considered covering inverse RL? ❤
@baba42kachari 15 days ago
Thanks
@Engrbilal143 20 days ago
Time to implement it
@DiegoSilva-dv9uf 19 days ago
Thanks!
@jeremykothe2847 18 days ago
fwiw I took an MLP solution for MNIST, substituted KAN for the MLP layers and no matter what I did (adding dimensions etc) it couldn't solve it. My intuition is that KANs only work well for approximating linear-ish functions, not irregular, highly discontinuous ones like image classification would need. But perhaps I just screwed it up :D
@rohitjindal124 20 days ago
Sir, I have been a huge fan of your videos and have watched all of them. I am currently in the second year of my BTech and really passionate about learning ML. Sir, if possible, could I work under you? I don't want any certificate or anything; I just want to observe and learn.
@ScottzPlaylists 19 days ago ⁺¹
Please explain DSPy
@suman14san 20 days ago ⁺¹
Please add a payment option
@umarjamilai 20 days ago ⁺⁸
Your love and support are enough! Have a great weekend!
@Patrick-wn6uj 20 days ago ⁺¹
@umarjamilai Just wow
@pratishdewangan132 14 days ago ⁺¹
In search of gold, I found a diamond
@einsteinsapples2909 19 days ago
Your explanations are great. I think, though, that you should maybe take breaks to blow your nose, because you were sniffing a lot. It would make your videos more enjoyable.
@ln_exp1 18 days ago
Interesting
@emiyake 19 days ago
Thanks!
@alfredmanto5487 18 days ago
Thanks
