Interesting take-home message. I would never think that just the non-linearity itself is so important.
Anyone else expect to hear like lecture hall applause at the end of the video lmaooo, that was really good
Great educational video! I expected this to have a lot more views; keep it up and you'll grow quickly!
As long as the activation is not linear, the network can approximate arbitrary functions as you increase the number of parameters. Linear activations do not work because the whole network then collapses into a single linear transformation, so it can only represent linear functions.
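A quick way to see the collapse argument in PyTorch (just a sketch, the layer sizes are arbitrary):

import torch

# Two stacked linear layers with no activation in between...
torch.manual_seed(0)
f1 = torch.nn.Linear(4, 8)
f2 = torch.nn.Linear(8, 3)

# ...are exactly equivalent to a single linear layer with merged weights.
W = f2.weight @ f1.weight
b = f2.weight @ f1.bias + f2.bias

x = torch.randn(5, 4)
print(torch.allclose(f2(f1(x)), x @ W.T + b, atol=1e-6))  # True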
I made a short about this! ua-cam.com/video/eXdVAAFCkHU/v-deo.html
Even a linear activation can work on digital systems, thanks to floating-point inaccuracies: the computed function isn't truly linear. It won't be the most effective, but it does work to some extent! (see the video "GradIEEEnt half decent" on UA-cam)
God I love that video so much @@FunctionallyLiteratePerson
@@FunctionallyLiteratePerson All of Suckerpinch's videos come highly recommended!
Can't believe I watched this video for free.
Stellar video! As another UA-camr who recently started, I wish you all the best :)
I know now how much effort it takes to make these videos. Great use of manim, too.
@ProgrammingWithJulius Thank you! Your videos also sound fun, subbed. :)
Yeah getting started with manim was a pain at first, but after two or three videos you're really picking up speed.
keep it up. i watch a lot of these sorts of videos, and normally they don't pull me in. this one did
This is a great video! Worth the effort. I would love to see more on different activation functions and their performance if that is the direction you would like to go
Honestly the most surprising result was the performance of sine & square
bro this is such a good video.. Nice voice, nice animations and overall style... I wish you all the best. Keep it up!
This is really good. If this is really an AI voice, it's so natural lol
Didn't even notice it was an AI voice, great video
Can you please give me the source of the lecture from this timestamp? 0:32
Prof. Thomas Garrity?
How does this only have 700 views?!
Just watched the video, and I am shocked this does not have thousands of views
Did the minecraft activation have zero gradient everywhere? (b/c cubes lol)
@@quadmasterXLII Interesting question! I used PyTorch for the implementation, and if you don't explicitly define the gradient, it falls back to its autograd feature. The applicable rules are in the docs, but in short, it estimates/interpolates a reasonable continuous gradient from the sampled values.
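As a toy illustration of autograd handling an activation you never wrote a backward for (the sawtooth here is just an example, not the activation from the video):

import torch

# Any activation built from PyTorch ops gets a backward pass automatically.
def sawtooth(x):
    return x - torch.floor(x)  # floor's gradient is taken as 0, so d/dx = 1 almost everywhere

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
sawtooth(x).sum().backward()
print(x.grad)  # all ones: autograd's convention at the non-differentiable points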
Neat.. More please
Hang on, don't we need non-polynomial activation functions for the Universal Approximation Theorem? You gave x^2 as an example activation function...
I was wondering the same..
I'll wait for the reply here
How did you define the derivative of the Minecraft activation function to use in backprop?
Maybe numerical differentiation? Literally taking a neighboring height and subtracting
UQuark0 is correct, I just let PyTorch autograd do its thing.
AI Generated voices teaching people how to do machine learning, what a time to be alive
What a time to be alive!📄📄
This is not a generated voice
@@4thpdespanolo it 100% is, and the uploader confirmed this in the comments section of the video preceding this one in their library.
Someone asked, and they replied something like 'Yes, it is synthesized (Elevenlabs)'
I wonder if you could use this to evolve a good activation function
Subbed!! I think I like this channel, hope it grows
What if you used a neural network to approximate the optimal activation function for another neural network?
No, the random inputs I use are based on the count of alpha particles
🤔
Makes me wonder how the performance would be if there were some sort of gating mechanism for choosing the most appropriate activation function for any given situation.
So, max pooling with no further activation function would probably work just as well?
I'd also be interested in a half-formal proof of the universal approximation theorem instead of just empirical results. Nice video though!
I thought about including it. Sadly, it's very technical and often limited in its "direct" applicability. E.g. in the theorem itself it is more important that you have enough neurons, not which activation function is used. In practice then you mainly experiment with the number of layers and see what sticks, instead of a theoretical derivation.
underrated
How did you train the minecraft network? Doesn't it have the same issue as the step function with a derivative of 0 everywhere?
Pretty sure he used each block's height as a single datapoint, connected linearly.
Almost, I implemented it as a step function, but did not explicitly define the backwards routine. So PyTorch autograd takes over with subgradients and continuous interpolation (see their docs for the rules).
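If someone did want to write the backward routine by hand, the usual workaround for a zero-gradient forward pass is a straight-through estimator; a minimal sketch (not what I did here, just the common alternative):

import torch

class StepSTE(torch.autograd.Function):
    # Hard step in the forward pass, identity ("straight-through") gradient in the backward pass.
    @staticmethod
    def forward(ctx, x):
        return torch.floor(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(4, requires_grad=True)
StepSTE.apply(x).sum().backward()
print(x.grad)  # all ones: the gradient passes straight through the step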
i thought it was a minecraft 100 days challenge video. i'm too brain rotted
high quality educational vid. 🎉 Subscribed, thanks for it
I wonder how the minecraft+max pooling would perform
In my experiment, it worked about as well as minecraft+avg pooling (a few percent better).
ah great video! new sub
Took me a while to realize the voice was AI
how do you only have so few subs 😶
Nvidia 6090 rushing to implement this as dlss 6 instead of adding 2 more gigabytes of vram:
this is great! :0
obviously not just any function will work, the functions have to form a unital point-separating subalgebra
Which is a very lax restriction.
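(For anyone curious, I think that's referring to the Stone-Weierstrass theorem: if a set A of continuous real-valued functions on a compact set K is an algebra, contains the constant functions, and separates points, then A is dense in C(K), i.e. every continuous function on K can be uniformly approximated by elements of A.)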
It feels a little pedantic to be so clear.
@@jacobwilson8275 i think it's important to be clear when explaining these things to new people as they might get misconceptions otherwise. Maybe you don't need to be as precise as this, but just saying "nice enough functions" might get the idea across.
@@Galinaceo0 agreed
Approximation*
subbed :)
great video but AI voice :(
Your videos are really great but I'd really rather listen to your real voice, the AI one is just too jarring