Flow Matching for Generative Modeling (Paper Explained)

  • Published 22 Nov 2024

COMMENTS • 82

  • @zerotwo7319 · 7 months ago +96

    A Jedi himself is teaching us about generative AI. I couldn't be more grateful.

    • @amedeobiolatti216 · 7 months ago +5

      These are not the papers you are looking for 🖐

    • @MilesBellas · 7 months ago +3

      Hu-Po = Yoda ?😅

  • @blacksages · 7 months ago +30

    Man, I have a presentation to do on this paper in a few days, but I've been stuck on it, and you just made it so much clearer, thank you!
    All the step-by-step work and reminders you put in your video are so helpful. I've been through Y. Lipman's presentation and he just glosses over these things because they're too obvious to him, but you don't, and I'm so grateful!

  • @ArkyonVeil · 7 months ago +3

    Thank you for the in-depth analysis. I personally have only a passing interest in the content of these videos, but I find listening to them a relaxing experience. And, as a bonus, I learn something useful every now and then. Cheers

  • @diga4696 · 7 months ago +6

    Best birthday gift ever!

  • @jonatan01i · 7 months ago +45

    "that's a dog in a hat. I'm very very sorry"

  • @guillaumevermeillesanchezm2427 · 7 months ago +15

    I see a video from Yannic on a Monday, I click like.

  • @Kram1032 · 7 months ago +5

    Very cool stuff.
    Interesting how, in the optimal transport version, the shape (in their examples) does indeed get matched sooner, but initially it looks kind of small, and only then reaches its full size.
    I guess that amounts to them hitting the shape sooner than the distribution is even able to spread out in full, whereas in the original diffusion process you'd first do the spreading out, and only *then* hone in on the result.

  • @sebastianp4023 · 7 months ago +2

    Question:
    Do you have a video/opinion on gMLPs from the paper "Pay Attention to MLPs" Liu et al. 2021?

  • @xplained6486 · 7 months ago +3

    Insane video yannic, your explanation was superb! Keep up the great work

  • @JTMoustache · 7 months ago +6

    Damn.. that data probability path formalism is awesome.

  • @kev2582 · 7 months ago

    Great walkthrough as always. This paper shines with its abstraction/generalization and mathematical rigor. What is missing is a qualitative comparison between the diffusion probability paths and the OT approach. Since this paper has aged a bit, it would be interesting to look up where the authors are now. My hunch is that straight-line path finding will be qualitatively worse for image generation compared to diffusion models.

  • @novantha1 · 7 months ago +14

    Wait, so the essence of this paper is that we can define a source "Gaussian distribution" and translate that into a target Gaussian distribution based on a learned vector field which indicates a direction of flow, essentially.
    Notably, this is maybe not a deterministic process, but it is certainly a finite one, in contrast to traditional diffusion denoising.
    But how do we encode images in our dataset as a Gaussian distribution? How do we get the source distribution? Is it just noise "tokenized" as a Gaussian distribution? Is it a constant? Is it conditional on the prompt, like a latent LLM embedding? (This last one would be wild; I would imagine it would be more effective for the LLM embedding to condition the target distribution, but I digress.)
    I feel like I do understand the process here, but I have no idea how I'd go about implementing this.

    • @drdca8263 · 7 months ago +8

      I believe the images are points in R^d, not distributions. For each point in the training set (each image in the training set), I think they associate a probability distribution which is a Gaussian with that image as its mean and a very small standard deviation.
      So the distribution associated with a particular training image is "this starting image, plus a very small amount of noise".
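A minimal NumPy sketch of that conditional distribution, not the paper's code; `sigma_min` and the image size are made-up illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma_min = 1e-2           # small terminal std (hypothetical value)
x1 = rng.normal(size=784)  # a "training image" flattened to a vector (hypothetical)

# The conditional target distribution q(x | x1) = N(x1, sigma_min^2 I):
# "this training image, plus a very small amount of noise".
sample = x1 + sigma_min * rng.normal(size=x1.shape)

# The sample stays very close to the original image.
print(np.abs(sample - x1).mean())
```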

    • @u2b83 · 7 months ago +6

      @@drdca8263 Karpathy made an offhand remark a few years ago that for high-dimensional points (R^d) you can effectively recover the exact point just by knowing the distribution.
      The "concentration of measure" phenomenon suggests that in high-dimensional spaces, points tend to lie closer to the surface of a hypersphere than to its center. This implies that for a given distribution, many points will have similar distances from its mean, making the space effectively "smaller" in some intuitive sense than one might expect. This phenomenon can sometimes allow data points to be predicted or reconstructed from less information than would be necessary in lower dimensions.
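That concentration effect is easy to check numerically. In this sketch (dimensions and sample counts are arbitrary choices), the relative spread of distances from the origin collapses as the dimension grows, i.e. standard-Gaussian points pile up on a thin shell of radius about sqrt(d):

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw standard-Gaussian points in low and high dimension and compare
# how tightly their norms cluster around the shell radius sqrt(d).
for d in (2, 10_000):
    pts = rng.normal(size=(1000, d))
    norms = np.linalg.norm(pts, axis=1)
    spread = norms.std() / norms.mean()  # relative spread of distances
    print(d, round(spread, 3))
```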

    • @CalebCranney · 6 months ago +1

      Here's a video that I thought did an excellent job explaining the concept of normalizing flow from the coding perspective: ua-cam.com/video/yxVcnuRrKqQ/v-deo.html. Then this one has some code that matches the diagrams in the Hans video: ua-cam.com/video/bu9WZ0RFG0U/v-deo.html. I just spent a number of hours trying to grasp the concept of flow, and these were what made it start to click for me.

  • @mikaellindhe · 7 months ago +1

    "Hey why don't you just go toward the target" seems like a reasonable optimization

  • @chia-yuhung6598 · 3 months ago +2

    I love this video! Would you do a video on rectified flow as well?

  • @sergiomanuel2206 · 7 months ago +1

    Thank you so much Yannic!! Amazing explanation for such a complicated topic!!!!

  • @TheRohr · 7 months ago

    Thanks for the video! Two open questions: (1) We still need lots of data to get a good estimate of the probability distribution, right? How much should we expect, and what should the dataset look like? Which is related to (2) What is actually meant by a data point or sample here? I understand that for diffusion we have an image that becomes noisy. But what would the 2-D Gaussian be for an RGB image? Or is a sample here something different from an image?

  • @timothy-ul9wp · 7 months ago +2

    I wonder how “straight” the flow matching path during inference actually is, since the model doesn’t actually have information from previous steps.
    I assume the path will always point to the mean over all choices of x_0? (in Eq. 23)

  • @herp_derpingson · 5 months ago

    20:44 If I am understanding it correctly, u_t can just be stored during the noising process, assuming we are using a flow-based noising algorithm. The paper doesn't seem to do that, but it could be done quite easily.
    .
    28:29 p_t(x) after marginalization should be thought of as: if we threw a bunch of points at the screen and let them flow, where would they settle? v_t(x) after marginalization can be thought of as the net flow at a particular point on the screen? It is hard to intuit what they are doing.
    .
    36:16 Is psi_t(x) the model's prediction of phi_t(x)?
    .
    39:31 Whenever I see a wall of equations like this, my BS sensors start tingling.
    .
    I was unable to build an intuition for what is happening in this paper and how it helps generate images. In normal diffusion, the pixel color changes appear out of nowhere. In this, they are supposed to "flow" and move into their correct places? That's all I understood.

  • @OperationDarkside · 7 months ago +1

    Would a desert-dune-and-wind analogy work to visualize the probability density flow and the vector field?
    The grains of sand are the probability at one point, the dunes are the distribution density in 2D, and the wind is the vector field.

  • @nulenmaths8654 · 25 days ago

    One thing I don’t understand: during the training process, let's suppose we have random noise variables x1 and x2, and we want to obtain y1 and y2 such that y1 is approximately equal to y2. For instance, y1 and y2 are two images of the same dog. How can we ensure that x1 and x2 remain close in the latent distribution?

  • @loukasa · 7 months ago +1

    Great explanation Yannic

  • @simonstrandgaard5503 · 7 months ago +1

    Great explanations.

  • @ljh0412 · 7 months ago

    I was waiting for this. Thank you Yannic. Hopefully you'll also cover the Bespoke Solvers paper, which was implemented to speed up flow matching in Audiobox from Meta.

  • @fireinthehole2272 · 7 months ago +1

    Hi Kilcher, could you do "ReFT: Representation Finetuning for Language Models" it's really interesting.

  • @LouisChiaki · 7 months ago

    Hmm... What is their choice of sigma_min? Is the end conclusion simply that we should scale down the noise by (1 - sigma_min)?
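For reference, the optimal transport conditional flow from the paper interpolates the noise coefficient linearly from 1 down to sigma_min, which is where the (1 - sigma_min) factor comes from; at t = 1 the Gaussian sample x is scaled all the way down to sigma_min:

```latex
\psi_t(x) = \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x + t\,x_1,
\qquad
\psi_1(x) = \sigma_{\min}\,x + x_1
```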

  • @광광이-i9t · 7 months ago +2

    awesome so interesting !! it is really helpful :)) Thanks !!

  • @PRAKASHFEB · 5 months ago

    Thanks for the simple explanation

  • @vangos154 · 7 months ago +1

    One of the disadvantages of flow-based models is they require reversible layers, and thus they limit the DNN architectures that can be used. Isn't that a problem anymore?

    • @xandermasotto7541 · 6 months ago

      Continuous normalizing flows are always invertible; it's just integrating an ODE forwards vs. backwards.
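A toy illustration of this point, with a hand-picked contracting field v(x, t) = -x rather than a learned one: integrating the ODE forwards and then backwards with small Euler steps recovers the starting point (up to discretization error).

```python
import numpy as np

def v(x, t):
    # A simple contracting vector field standing in for the learned one.
    return -x

def integrate(x, t0, t1, steps=10_000):
    # Explicit Euler integration of dx/dt = v(x, t) from t0 to t1.
    # Swapping t0 and t1 makes dt negative, which runs the flow backwards.
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = x + dt * v(x, t)
        t += dt
    return x

x0 = np.array([1.5, -0.7])
x1 = integrate(x0, 0.0, 1.0)       # forward pass of the flow
x0_rec = integrate(x1, 1.0, 0.0)   # backward pass inverts it
print(np.allclose(x0, x0_rec, atol=1e-3))
```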

  • @punkdigerati · 7 months ago +1

    Like Atz and Jewel Kilcher?

  • @向明义 · 4 months ago

    Thanks a lot! Nice work!

  • @DeepThinker193 · 7 months ago +13

    Omg he's wearing a hoodie. Is he hacking?

  • @tiagotiagot · 7 months ago

    Are they basically using the butterfly effect to disturb a standardized gaussian distribution into the desired result?

  • @wenzhengli6716 · 3 months ago +1

    Okay, but for image generation, is the space being traversed the color space of each individual pixel?

    • @counterfeit25 · 2 months ago

      I've been wondering about that too, my guess is the same as yours, that the space being traversed is the color space of an individual pixel. So for an RGB image those points and vectors and traversals would be in 3D space, for a grayscale image it would be a 1D space.
      Edit: The above would be if the model (the UNet or the DiT) is making predictions directly in pixel space. If the model is making predictions in latent space then all those points/vectors/etc would not be in "color" space.

  • @tornikeonoprishvili6361 · 7 months ago +3

    Damn the paper is math-dense. Watching this I feel like I'm being dragged along by a professional sprinter that I just can't keep up with.

  • @eriglac · 6 months ago

    I’d like to join the Saturday discussions. Where do I find that info?

  • @IsraelMendoza-OOOOOOO · 7 months ago +2

    God Bless You brother ❤

  • @SofieSimp · 7 months ago

    Do you have a record for your Stable Diffusion 3 presentation?

  • @wolpumba4099 · 7 months ago +9

    *Abstract*
    This video delves into the technical aspects of flow matching for generative models, contrasting it with traditional diffusion models. It explores the concept of morphing probability distributions from a source to a target, emphasizing the significance of conditional flows and the role of vector fields in guiding this transformation. The video then works through the mathematical underpinnings of flow matching, introducing key objects such as probability density paths and time-dependent vector fields. It demonstrates how these concepts are operationalized through the conditional flow matching objective, allowing for the training of neural networks to predict vector fields for data points. Finally, the video explores specific instances of flow matching, including its relationship to diffusion models and the advantages of the optimal transport path for efficient and robust sampling.
    *Summary*
    *Introduction to Flow Matching*
    * 0:00 - Introduction to flow matching for generative models and its application in image generation, specifically text-to-image tasks.
    * 1:06 - Comparison of flow matching with traditional diffusion-based models used in image generation.
    * 2:29 - Explanation of the diffusion process as a multi-step process of image generation involving the gradual denoising of random noise to produce a target image.
    * 5:46 - Introduction to flow matching as a generalization of the diffusion process, where the focus shifts from defining a fixed noising process to directly learning the morphing of a source distribution into a target distribution.
    *Mathematical Framework*
    * 6:04 - Illustration of morphing a simple Gaussian distribution into a data distribution, highlighting the challenge of the unknown target distribution and the use of Gaussian mixture models as an approximation.
    * 10:52 - Introduction of the concept of a probability density path as a time-dependent function that defines the probability density at a given point in data space and time.
    * 13:41 - Explanation of the time-dependent vector field, denoted as V, which determines the direction and speed of movement for each point in the data space to achieve the desired distribution transformation.
    * 17:54 - Demonstration of how the flow, representing the path of each point along the vector field over time, is determined by the vector field and the initial starting point.
    *Learning the Flow*
    * 19:26 - Explanation of how the vector field is set to generate the probability density path by ensuring its flow satisfies a specific equation.
    * 20:31 - Introduction of the concept of regressing the flow, which involves training a neural network to predict the vector field for each given position and time.
    * 21:56 - Highlighting the ability to define probability density paths and vector fields in terms of individual samples, enabling the construction of conditional probability paths based on specific data points.
    * 26:16 - Demonstration of how marginalizing over conditional vector fields, weighted appropriately, can yield a total vector field that guides the transformation of the entire source distribution to the target distribution.
    *Conditional Flow Matching*
    * 29:40 - Acknowledging the intractability of directly computing the marginal probability path and vector field, leading to the introduction of the conditional flow matching objective.
    * 30:48 - Explanation of conditional flow matching, where flow matching is performed on individual samples by sampling a target data point and a corresponding source data point, and then regressing on the vector field associated with that specific sample path.
    * 33:30 - Introduction of the choice to construct probability paths as a series of normal distributions, with time-dependent mean and standard deviation functions, allowing for interpolation between the source and target distributions.
    *Optimal Transport and Diffusion Paths*
    * 38:43 - Exploration of special instances of Gaussian conditional probability paths, including the recovery of the diffusion objective by selecting specific mean and standard deviation functions.
    * 41:21 - Introduction of the optimal transport path, which involves a straight-line movement between the source and target samples, contrasting it with the curvy paths characteristic of diffusion models.
    * 44:08 - Visual comparison of the vector fields and sampling trajectories for diffusion and optimal transport paths, highlighting the efficiency and robustness of the optimal transport approach.
    *Conclusion*
    * 46:48 - Recap of the key differences between flow matching and diffusion models, emphasizing the flexibility and efficiency of flow matching in learning probability distribution transformations.
    * 47:56 - Reiteration of the process of using a learned vector field to move samples from the source distribution to the target distribution, achieving the desired transformation.
    * 53:37 - Explanation of how the knowledge about the data set is incorporated into the vector field predictor, enabling it to guide the flow of the entire source distribution to the target distribution.
    I used Gemini 1.5 Pro.
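As a companion to the summary above, here is a minimal sketch of one evaluation of the conditional flow matching loss with the straight-line optimal transport path (plain NumPy, with a dummy predictor standing in for the neural network; `sigma_min`, the batch shape, and the toy "data" distribution are illustrative choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_min = 1e-2   # small terminal std (hypothetical value)
batch, dim = 64, 2

x1 = rng.normal(loc=3.0, size=(batch, dim))  # "data" samples (toy stand-in)
x0 = rng.normal(size=(batch, dim))           # source Gaussian samples
t = rng.uniform(size=(batch, 1))             # uniform time in [0, 1]

# Optimal transport conditional path:
#   psi_t(x0 | x1) = (1 - (1 - sigma_min) t) x0 + t x1
xt = (1 - (1 - sigma_min) * t) * x0 + t * x1

# Its conditional vector field is the time derivative of the path,
# constant in t along the straight line:
#   u_t(x | x1) = x1 - (1 - sigma_min) x0
target = x1 - (1 - sigma_min) * x0

def model(x, t):
    # Dummy predictor standing in for the learned vector field v_theta(x, t).
    return np.zeros_like(x)

# The conditional flow matching regression loss for this batch.
loss = np.mean((model(xt, t) - target) ** 2)
print(loss > 0)
```

In a real implementation, `model` would be a neural network and this loss would be minimized over many sampled batches of (x0, x1, t).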

  • @nevokrien95 · 7 months ago +2

    Israel mentioned

  • @SouravMazumdar-ki7vv · 7 months ago

    Can someone say which approach is being discussed here at 5:20?

  • @jabowery · 7 months ago +4

    UNCLE TED!!!

  • @andylo8149 · 7 months ago +1

    Given that flow matching is completely deterministic, I don't see how it is a generalisation of diffusion models. Sure, the deterministic probability flow induced by a diffusion model is a special kind of flow matching, but the training objective of a diffusion model is inherently stochastic.
    I think diffusion models and flow matching are different classes of models.

    • @gooblepls3985 · 6 months ago

      The stochasticity lives in the p(x0) of the expectation used as the loss: x0 is, in the general case, a randomly drawn sample from a tractable prior such as a Gaussian, just as in the diffusion literature (though the diffusion literature likes to call the data point x0, so the terminology is reversed there).

  • @kaikapioka9711 · 7 months ago

    Thx bud!

  • @abhimanyu30hans · 7 months ago

    For some reason I get "Unable to accept invite" from your discord invite link.

  • @ScottzPlaylists · 7 months ago +10

    @YannicKilcher
    What hardware/software are you using❓
    It seems to be a tablet and pen, but the details would be interesting.
    Would "How to Yannic a paper" make a good video❓ 😄 I'd watch it.
    Keep up the quality content❗

    • @Python_Scott · 7 months ago +6

      👍 I wondered the same... Make the video please. Or just answer here.

    • @AGIBreakout · 7 months ago +6

      👍I'd watch that, and Thumbs it UP 👍 an odd number of times ❗

    • @NWONewsGod · 7 months ago +5

      Me Too!!!!!!!

    • @NWONewsGod · 7 months ago +4

      @@AGIBreakout Ha, Ha.... "odd number of times" would work too..!!

    • @NWONewsGod · 7 months ago +3

      @@Python_Scott Something different and useful ! Yes, count me in. ☺

  • @JohnViguerie · 7 months ago +3

    Very hand-wavy

  • @LeetOneGames · 7 months ago +1

    Sorry but too many formulas in that paper ;P
    Anyway, I kind of lost track in the beginning what was going on, it started out nice with images and suddenly all was about points flowing.
    All going through my mind was ”What points are you talking about? Pixels?”
    Haha, guess I will have to watch this again when the state of my mind is more up for it :D

  • @mullachv · 7 months ago

    Can't be over prepared for the solar eclipse

  • @Blooper1980 · 7 months ago

    I wish I could understand this.

  • @ttul · 7 months ago

    This one is going to take me several passes…

  • @MrNightLifeLover · 7 months ago

    Published in 2022? Looks like I missed something :/

  • @BooleanDisorder · 7 months ago

    Obvious Labrador Retriever! 01:33

  • @robmacl7 · 7 months ago +2

    1: Probability path go Woom!
    2: Waifus
    3: profit

    • @drdca8263 · 7 months ago

      Ugh, I wish “generating images of attractive women” wasn’t such a large fraction of the use of such models.
      I don’t think it is good for the person doing the viewing.
      Beetles and broken beer bottles, and all that.

  • @AndrewRafas · 7 months ago

    At 20:53, what you say and what you mark in the paper don't match. v() is the vector field, not the other way around.

    • @YannicKilcher · 7 months ago +1

      u() is the actual vector field; v() is the vector field learned by the neural network.

  • @drdca8263 · 7 months ago

    It seems to me like this kind of procedure should have many applications outside of images!... but I don’t know what?
    So, specifically, this should be applicable for when we want to learn a way to sample from a particular (but unknown) probability distribution. So, “generative AI” type stuff, I guess.
    Maybe quantizing like in language models might make this not as applicable to language models? Idk.
    What about world-model stuff? Or like, learning a policy?
    Hm, while that does involve selecting actions at random, those are often more discrete?
    Though, I guess not always. If one is doing a continuous control task thing, then I guess sampling from a continuous family of possible actions, may be the thing to do.
    Uh.
    Hm, so, if you started with a uniform distribution over a continuous family of actions, and wanted to evolve it towards a good distribution given the current scenario?
    Hm, no, I guess this probably isn’t especially applicable to that, because like, how do you obtain the samples from the target distribution?
    There must be *something* other than image generation, that this applies straightforwardly to...

  • @eitanporat9892 · 7 months ago +3

    I feel like this paper is a very convoluted and long-winded way of saying "move in straight lines". The mathematical part is obvious and not very interesting. Your explanation was great; I just dislike it when people write math for the sake of writing math in ML papers.

  • @not_a_human_being · 7 months ago

    Another attempt to sprinkle some "statistics and theory" on machine learning. This will fail.