Tiny Volt
Taiwan
Joined Dec 31, 2020
Linear Algebra for Programmers - a sneak peek (link in description)
These are some clips recorded from the e-book of visual essays that I recently finished: Linear Algebra for Programmers (www.linearalgebraforprogrammers.com/)
92 views
Videos
5. RealNVP for 2D data and images
3K views · 3 years ago
Code for 2D data: github.com/TinyVolt/normalizing-flows/tree/main/realnvp_2d Code for images: github.com/TinyVolt/normalizing-flows/tree/main/realnvp_images Link to the pickle file mentioned in the video: github.com/TinyVolt/normalizing-flows/blob/main/realnvp_images/celeb.pkl
4. Normalizing flows for images
888 views · 3 years ago
Code: github.com/TinyVolt/normalizing-flows/tree/main/multi_dim_shared_params
3. Normalizing Flows for 2D data
1.2K views · 3 years ago
Code: github.com/TinyVolt/normalizing-flows/tree/main/2d_autoregressive The content is largely taken from the excellent course "Deep Unsupervised Learning" by UCB. The original repo is here: github.com/rll/deepul/tree/master/deepul
2. Composing multiple normalizing flows
1.6K views · 3 years ago
Code: github.com/TinyVolt/normalizing-flows/tree/main/1d_composing_flows The content is largely taken from the excellent course "Deep Unsupervised Learning" by UCB. The original repo is here: github.com/rll/deepul/tree/master/deepul Background music: "Bleeping Demo" Kevin MacLeod (incompetech.com) Licensed under Creative Commons: By Attribution 4.0 License creativecommons.org/licenses/by/4.0/
1. Normalizing flows - theory and implementation - 1D flows
7K views · 3 years ago
This is an introduction to the theory behind normalizing flows and how to implement them for a simple 1D case. The code is available here: github.com/TinyVolt/normalizing-flows/tree/main/1d The content is largely taken from the excellent course "Deep Unsupervised Learning" by UCB. The original repo is here: github.com/rll/deepul/tree/master/deepul All I did was rewrite the code in a cleaner and (sli...
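For readers skimming without watching, here is a minimal sketch of the kind of 1D flow described above: a mixture-of-Gaussians CDF that maps x to z in (0, 1) and is trained with the change-of-variables log-likelihood. The class and variable names are illustrative assumptions, not code from the TinyVolt or deepul repos.

```python
# Hedged sketch of a 1D flow: learn a monotone map x -> z = F(x) in (0, 1)
# and maximize log p(x) = log p(z) + log|dz/dx|, with p(z) = Uniform(0, 1).
# Names and hyperparameters are illustrative, not copied from the linked repos.
import torch
import torch.nn as nn

class MixtureCDFFlow(nn.Module):
    def __init__(self, n_components=5):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_components))
        self.log_stds = nn.Parameter(torch.zeros(n_components))
        self.weight_logits = nn.Parameter(torch.zeros(n_components))

    def forward(self, x):
        weights = torch.softmax(self.weight_logits, dim=0)
        comps = torch.distributions.Normal(self.means, self.log_stds.exp())
        x = x.unsqueeze(-1)                             # (batch, n_components)
        z = (comps.cdf(x) * weights).sum(dim=-1)        # mixture CDF = flow output
        # dz/dx is the mixture density, so log|dz/dx| is its log.
        log_dz_dx = torch.log((comps.log_prob(x).exp() * weights).sum(dim=-1) + 1e-12)
        return z, log_dz_dx

flow = MixtureCDFFlow()
x = torch.randn(128)
z, log_det = flow(x)
loss = -log_det.mean()  # log p(z) = 0 for Uniform(0, 1), so NLL reduces to -log|dz/dx|
```

Sampling goes the other way: draw z from Uniform(0, 1) and numerically invert the monotone CDF (for example by bisection), which is also what several comments below ask about.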
Amazing man! Awesome!
Once the model is trained, how do I go from z to x?
Awesome video. The example you showed seems to match very closely with the types of problems that you would solve with a KDE. I think a lot of what people use normalizing flows for right now is learning an invertible mapping from a source distribution to a sink distribution. Could you shed some light on how you could use this normalizing flow model you built to generate new samples of data?
This guy is a genius.
I just want to ask: which paper did you implement this algorithm from? NICE, Glow, or RealNVP?
Hey man, the entire lecture series on flow-based models is amazing! Much clearer and with more patience and detail than the original CS294, which is great for anyone having trouble with the basics. Much much appreciated!
Why is the affine transform defined this way? Is alpha x + (1-alpha)x also feasible?
I think that is because at x = 0, your affine transform tends to -inf, which could break DL stuff. I might be wrong though.
Thank you for the lecture.
Thank you, Volt, for this class.
But how do you invert the neural network? Like, how do you get x from an output y using a combination of multiple CDFs?
Some questions related to 4:53 in the video: why should x1 be any function? Can it be neglected?
You want to learn the distribution of x1 by learning how to map it to a uniform distribution.
Thank you so much
Classic
Hey man, love you for your explanation. It is a serious game changer for understanding NF. Got here after watching many videos on this topic, and this is by far the greatest.
Great video, thanks a lot!
Good description, but the background sound is really distracting and annoying.
Thank you for your great video, but I have a question. In the AffineTransform2D class, you apply "log_scale = log_scale.tanh() * self.scale_scale + self.shift_scale" to modulate the log_scale. But why do we need it, given that we already applied an MLP to estimate beta and gamma from x? Is there any reason? Sorry if I misunderstood this.
My guess is that it is easier to train this way. If log_scale is used directly, the gradients are more likely to vanish or explode. E.g. if the scale is small, log_scale will be a large negative number. By using tanh and a simple shift and scale, you can tame the gradients.
@@tinyvolt It makes sense. Thank you very much for your explanation.
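A minimal sketch of the tanh modulation discussed in this thread (hypothetical module and parameter names; the real AffineTransform2D lives in the linked repo):

```python
# Hypothetical sketch: bound log_scale with tanh plus learned scale/shift so that
# exp(log_scale) can neither explode nor collapse early in training.
import torch
import torch.nn as nn

class AffineCouplingSketch(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 2))
        self.scale_scale = nn.Parameter(torch.zeros(1))  # learned range of log_scale
        self.shift_scale = nn.Parameter(torch.zeros(1))  # learned center of log_scale

    def forward(self, x1, x2):
        log_scale, shift = self.net(x1).chunk(2, dim=-1)
        # Raw MLP output is unbounded; tanh keeps it in (-1, 1) before rescaling.
        log_scale = log_scale.tanh() * self.scale_scale + self.shift_scale
        z2 = x2 * log_scale.exp() + shift
        return z2, log_scale  # log_scale is also the per-element log-det term
```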
This should have 100k views. Amazing work
Thank you sooo much 🙂. All 5 classes were great!
Your video is super amazing!! I was having a lot of trouble understanding NF. Thanks for the kind explanation of normalizing flows.
2:20 Where is the uniform distribution?
Please read out the equations and explain what they mean in plain English. And remove the distracting music.
Well done.
Hello, I would like to thank you for your videos, which are really good. So far I have only seen these two, but they were very nice. So yeah, thank you :)
Thank you so much
That is a great tutorial. Do you have the code uploaded anywhere?
I also wonder how to generate samples x from the uniform z.
So let me see if this is making sense now... Our goal might be something like this: we have, say, 20 features and 1 label we're trying to predict. The idea is that, using a normalizing flows model, we can use a DNN to parameterize a desired distribution that we define based on the input features, and the model learns the "normalizing flows" to get from the combination of our model inputs and desired distribution to the actual distribution of the real thing we want to forecast?
I have a potentially stupid question, but I'm just a dropout who's learning on their own trying to make it, lol. Anyways, would it be reasonable to think of a normalizing flow model as similar to a stochastic optimal control problem, such as using the Hamilton-Jacobi-Bellman equation? For example: defining a dynamic strategy in a financial market, defining its behavior relative to the distributions the market creates as it plays out, and essentially learning the optimal policy so that as the distribution of the market evolves, our relative execution evolves similarly, and the end result of our execution resembles the p(z) we defined to be desirable, reached by transforming the market's distributions p(x). Does that sound anywhere close to the ballpark? Or would it actually be more like setting up the model as a normalizing flows model and using a Hamilton-Jacobi-Bellman setup to optimize and train said model?
So I went back to the beginning and I'm going to try this again. Basically, here is the idea: we have some desired distribution that we want to use, and that is p(z). Then, using the math and our data, we can run our data through the math and find the CDFs needed to transform the data, based on our desired distribution, into some other distribution p(x). This does two things: 1) it gives us a more "true" distribution for the actual process we're trying to model, and 2) it gives us a translated way to interpret that distribution, as being related to the p(z) we defined through the CDF transformations. But now I have another rabbit hole to go down, because it seems like another method for a copula or something, lol.
I have to believe that a super complex version of this in 8 dimensions using spinor fields is what James Simons figured out way, way back.
Awesome explanation! In the Jacobian matrix at 7:41 you have values v1, v2, v3, v4... So z2 is dependent on x1, and that is why the derivative dz2/dx1 is non-zero?
Yes that's true
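For reference, a sketch of the 2D autoregressive Jacobian being discussed (the concrete values v1..v4 from the video are not reproduced here):

```latex
% Autoregressive flow in 2D: z1 depends only on x1, z2 depends on x1 and x2.
\frac{\partial z}{\partial x} =
\begin{pmatrix}
  \dfrac{\partial z_1}{\partial x_1} & 0 \\[4pt]
  \dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2}
\end{pmatrix},
\qquad
\det \frac{\partial z}{\partial x}
  = \frac{\partial z_1}{\partial x_1}\,\frac{\partial z_2}{\partial x_2}
```

So dz2/dx1 is generally non-zero, but because the matrix is lower triangular only the diagonal entries enter the determinant.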
I wish YouTube had some way to promote these kinds of hidden gems!
Wow, what an explanation and visualization... simply outstanding!
What a wonderful video!
Do you recommend PyTorch for probabilistic programming?
I tried Pyro a long time ago and found it pretty easy to use. Also starting to like JAX more and more recently.
TensorFlow Probability is pretty dope too, tbh.
Excellent
What a nice explanation! But is it okay to use ReLU as the activation function? Because it is not invertible.
Yes. Note that you are only using a neural network to find `beta` and `gamma` to scale and shift the input values. You only need to invert the scaling and shifting of input values; you don't need to invert the neural network itself. During the inversion, we need to calculate `beta` and `gamma` again. But this time we shift by `-beta` and scale by `1/gamma`. To calculate `beta` and `gamma` we use the same network.
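To make that point concrete, here is a hedged sketch (invented names, not the video's exact code) of why the conditioner network can use ReLU: it is re-run, never inverted, in both directions.

```python
# Sketch of an affine coupling step: the ReLU conditioner produces beta and gamma from
# the untouched input x1, so inversion only undoes the scale/shift, never the network.
import torch
import torch.nn as nn

class CouplingStep(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.conditioner = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x1, x2):
        beta, log_gamma = self.conditioner(x1).chunk(2, dim=-1)
        z2 = x2 * log_gamma.exp() + beta            # scale by gamma, shift by beta
        return x1, z2                               # x1 passes through unchanged

    def inverse(self, z1, z2):
        beta, log_gamma = self.conditioner(z1).chunk(2, dim=-1)  # same net, same input (z1 == x1)
        x2 = (z2 - beta) * (-log_gamma).exp()       # shift by -beta, scale by 1/gamma
        return z1, x2
```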
Thank you
Thank you!!