Flow Matching for Generative Modeling (Paper Explained)

  • Published 7 Apr 2024
  • Flow matching is a more general method than diffusion and serves as the basis for models like Stable Diffusion 3.
    Paper: arxiv.org/abs/2210.02747
    Abstract:
    We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
    Authors: Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology
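  • A minimal sketch of the conditional Flow Matching loss from the abstract above, assuming the optimal-transport probability path; this is written from the paper's equations rather than from any released code, v_theta stands for an arbitrary neural network taking (x, t), and sigma_min is a small constant hyperparameter:
      import torch

      sigma_min = 1e-4  # assumed small constant; the paper leaves it as a hyperparameter

      def cfm_loss(v_theta, x1):
          """Conditional Flow Matching loss for a batch of data samples x1."""
          x0 = torch.randn_like(x1)  # source sample from the standard Gaussian
          t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)  # t ~ U[0, 1]
          # OT displacement interpolation: psi_t(x0) = (1 - (1 - sigma_min) t) x0 + t x1
          xt = (1 - (1 - sigma_min) * t) * x0 + t * x1
          # regression target: d/dt psi_t(x0) = x1 - (1 - sigma_min) x0
          target = x1 - (1 - sigma_min) * x0
          return ((v_theta(xt, t) - target) ** 2).mean()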

COMMENTS • 75

  • @zerotwo7319 · 1 month ago +82

    A Jedi himself is teaching us about generative AI. I couldn't be more grateful.

  • @jonatan01i · 1 month ago +41

    "that's a dog in a hat. I'm very very sorry"

  • @blacksages · 1 month ago +23

    Man, I have a presentation to do on this paper in a few days and I've been stuck on it; you just made it so much clearer, thank you!
    All the step-by-step explanations and reminders you put in your video are so helpful. I've been through Y. Lipman's presentation and he just glosses over these things because they're too obvious to him, but you don't, and I'm so grateful!

  • @guillaumevermeillesanchezm2427 · 1 month ago +15

    I see a video from Yannic on a Monday, I click like.

  • @ArkyonVeil · 1 month ago +3

    Thank you for the in-depth analysis. I personally have only a passing interest in the content of these videos, but I find listening to them a relaxing experience. And as a bonus, I learn something useful every now and then. Cheers

  • @xplained6486 · 1 month ago +3

    Insane video, Yannic, your explanation was superb! Keep up the great work

  • @diga4696 · 1 month ago +4

    Best birthday gift ever!

  • @sergiomanuel2206 · 1 month ago

    Thank you so much Yannic!! Amazing explanation for such a complicated topic!!!!

  • @Kram1032 · 1 month ago +5

    Very cool stuff.
    Interesting how, in the optimal transport version, the shape (in their examples) does indeed get matched sooner, but initially it looks kinda small and only then reaches its full size.
    I guess that amounts to hitting the shape sooner than the distribution is even able to spread out in full, whereas in the original diffusion process you'd first do the spreading out and only *then* hone in on the result.

  • @loukasa · 1 month ago +1

    Great explanation Yannic

  • @simonstrandgaard5503 · 1 month ago +1

    Great explanations.

  • @JTMoustache · 1 month ago +6

    Damn.. that data probability path formalism is awesome.

  • @user-xk6rg7nh8y · 1 month ago +2

    Awesome, so interesting!! It is really helpful :)) Thanks!!

  • @ljh0412 · 1 month ago

    I was waiting for this. Thank you, Yannic. Hopefully you'll also check out the Bespoke Solvers paper, which is used to speed up flow matching in Audiobox from Meta.

  • @sebastianp4023 · 1 month ago +2

    Question:
    Do you have a video/opinion on gMLPs from the paper "Pay Attention to MLPs" Liu et al. 2021?

  • @kev2582 · 1 month ago

    Great walkthrough as always. This paper shines with its abstraction/generalization and mathematical rigor. What is missing is the qualitative difference between the diffusion-path and OT approaches. Since this paper has aged a bit, it would be interesting to look up where the authors are now. My hunch is that straight-line paths will be qualitatively worse for image generation compared to diffusion models.

  • @DeepThinker193 · 1 month ago +13

    Omg he's wearing a hoodie. Is he hacking?

  • @IsraelMendoza-OOOOOO · 1 month ago +2

    God Bless You brother ❤

  • @OperationDarkside · 1 month ago +1

    Would a desert sand dune and wind analogy work to visualize the probability density flow and the vector field?
    The grains of sand are the probability at one point, the dunes are the density of the distribution in 2D, and the wind is the vector field.

  • @novantha1 · 1 month ago +12

    Wait, so the essence of this paper is that we can define a source "Gaussian distribution" and translate that into a target Gaussian distribution based on a learned vector field which indicates a direction of flow, essentially.
    Notably, this is...Maybe not a deterministic process, but certainly is a finite one, in contrast to traditional diffusion denoising.
    But...How do we...Encode images in our dataset as a Gaussian distribution? How do we get the source distribution? Is it just noise "tokenized" as a Gaussian distribution? Is it a constant? Is it conditional on the prompt, like a latent LLM embedding (this last one would be wild, I would imagine it would be more effective for the LLM embedding to condition the target distribution but I digress).
    I feel like I do understand the process here, but I have no idea how I'd go about implementing this.

    • @drdca8263 · 1 month ago +6

      I believe the images are the points in R^d, not the distributions. For each point in the training set (each image in the training set), I think they associate a probability distribution which is a Gaussian with that image as its mean and a very small standard deviation.
      So, like, the distribution associated with a particular training image is “this starting image, plus a very small amount of noise”.
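
      For reference, this matches the Gaussian conditional probability paths in the paper (a sketch in the paper's notation):

          p_t(x \mid x_1) = \mathcal{N}\big(x \mid \mu_t(x_1), \sigma_t(x_1)^2 I\big),
          \mu_0(x_1) = 0,\quad \sigma_0(x_1) = 1               (standard noise at t = 0)
          \mu_1(x_1) = x_1,\quad \sigma_1(x_1) = \sigma_{\min}  (a tight Gaussian around the training image at t = 1)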

    • @u2b83 · 1 month ago +5

      @@drdca8263 Karpathy made an offhand remark a few years ago that for high-dimensional points (in R^d) you can effectively recover the exact point just by knowing the distribution.
      The "concentration of measure" phenomenon suggests that in high-dimensional spaces, points tend to be closer to the surface of a hypersphere than to its center. This implies that for a given distribution, many points will have similar distances from the mean, making the space effectively "smaller" in some intuitive sense than one might expect. This phenomenon can sometimes allow for predictions or reconstructions of data points based on less information than would be necessary in lower dimensions.

    • @CalebCranney · 16 days ago +1

      Here's a video that I thought did an excellent job explaining the concept of normalizing flow from the coding perspective: ua-cam.com/video/yxVcnuRrKqQ/v-deo.html. Then this one has some code that matches the diagrams in the Hans video: ua-cam.com/video/bu9WZ0RFG0U/v-deo.html. I just spent a number of hours trying to grasp the concept of flow, and these were what made it start to click for me.

  • @mikaellindhe · 1 month ago

    "Hey why don't you just go toward the target" seems like a reasonable optimization

  • @kaikapioka9711 · 1 month ago

    Thx bud!

  • @timothy-ul9wp · 1 month ago +1

    I wonder how “straight” the flow matching path during inference actually is, as the model doesn’t actually have information from previous steps.
    I assume the path will always point to the mean over all choices of x_0? (in Eq. 23)
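
    For what it's worth, the field the network regresses onto is the marginal one, a posterior-weighted average of the per-sample fields (the paper's marginalization, in its notation):

        u_t(x) = \int u_t(x \mid x_1) \, \frac{p_t(x \mid x_1)\, q(x_1)}{p_t(x)} \, dx_1

    so even though each conditional path is a straight line, the trajectory followed at inference can bend as those weights shift toward the data points that are most plausible given the current x.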

  • @TheRohr · 1 month ago

    Thanks for the video! Two open questions: (1) We still need lots of data to get a good estimate of the probability distribution, right? How much should we expect, and what should the dataset look like? This is related to (2): what is actually meant by a data point or sample here? I understand that for diffusion we have an image that becomes noisy. But what would be the 2D Gaussian for an RGB image? Or is a sample here something different from an image?

  • @jabowery · 1 month ago +4

    UNCLE TED!!!

  • @fireinthehole2272 · 1 month ago +1

    Hi Kilcher, could you do "ReFT: Representation Finetuning for Language Models"? It's really interesting.

  • @LouisChiaki · 1 month ago

    Hmm... What is their choice of sigma_min? Is the end conclusion simply that we should scale down the noise by (1 - sigma_min)?
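
    For context, the paper treats sigma_min as a small constant that keeps the endpoint Gaussian from collapsing to a point mass, and the (1 - sigma_min) factor comes out of the OT conditional path and its vector field (a sketch in the paper's notation):

        \mu_t(x_1) = t\,x_1, \qquad \sigma_t(x_1) = 1 - (1 - \sigma_{\min})\,t
        \psi_t(x_0) = \big(1 - (1 - \sigma_{\min})\,t\big)\,x_0 + t\,x_1
        u_t(x \mid x_1) = \frac{x_1 - (1 - \sigma_{\min})\,x}{1 - (1 - \sigma_{\min})\,t}

    so in the regression target x_1 - (1 - \sigma_{\min}) x_0 the noise term is indeed scaled by (1 - sigma_min).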

  • @eriglac · 9 days ago

    I’d like to join the Saturday discussions. Where do I find that info?

  • @punkdigerati · 1 month ago +1

    Like Atz and Jewel Kilcher?

  • @TiagoTiagoT · 1 month ago

    Are they basically using the butterfly effect to disturb a standard Gaussian distribution into the desired result?

  • @SofieSimp · 1 month ago

    Do you have a recording of your Stable Diffusion 3 presentation?

  • @vangos154 · 1 month ago +1

    One of the disadvantages of flow-based models is that they require reversible layers, which limits the DNN architectures that can be used. Is that no longer a problem?

    • @xandermasotto7541 · 24 days ago

      Continuous normalizing flows are always invertible. It's just integrating an ODE forward vs. backward.
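
      A tiny sanity check of that point (a toy field and a plain Euler integrator, unrelated to the paper's code): integrating the same vector field forward and then backward in time recovers the starting point up to discretization error.

          import numpy as np

          def v(x, t):
              # arbitrary smooth, time-dependent vector field, purely for illustration
              return np.sin(3.0 * x) + t * x

          def integrate(x, t0, t1, steps=1000):
              # explicit Euler integration of dx/dt = v(x, t) from t0 to t1
              dt = (t1 - t0) / steps
              t = t0
              for _ in range(steps):
                  x = x + dt * v(x, t)
                  t += dt
              return x

          x0 = np.array([0.3, -1.2])
          x1 = integrate(x0, 0.0, 1.0)       # forward flow
          x0_back = integrate(x1, 1.0, 0.0)  # backward flow (negative dt)
          print(np.abs(x0 - x0_back).max())  # close to 0, up to Euler error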

  • @SouravMazumdar-ki7vv · 1 month ago

    Can someone say which approach is being discussed here at 5:20?

  • @mullachv · 1 month ago

    Can't be over-prepared for the solar eclipse

  • @abhimanyu30hans · 1 month ago

    For some reason I get "Unable to accept invite" from your Discord invite link.

  • @Blooper1980 · 1 month ago

    I wish I could understand this.

  • @ScottzPlaylists · 1 month ago +10

    @YannicKilcher
    What hardware / software are you using❓
    It seems to be a tablet and pen, but the details would be interesting.
    Would a video on "How to Yannic a paper" be a good one❓ 😄 I'd watch it.
    Keep up the quality content❗

    • @Python_Scott · 1 month ago +6

      👍 I wondered the same... Make the video please. Or just answer here.

    • @AGIBreakout · 1 month ago +6

      👍I'd watch that, and Thumbs it UP 👍 an odd number of times ❗

    • @NWONewsGod · 1 month ago +5

      Me Too!!!!!!!

    • @NWONewsGod · 1 month ago +4

      @@AGIBreakout Ha, Ha.... "odd number of times" would work too..!!

    • @NWONewsGod · 1 month ago +3

      @@Python_Scott Something different and useful ! Yes, count me in. ☺

  • @andylo8149 · 1 month ago +1

    Given that flow matching is completely deterministic, I don't see how it is a generalisation of diffusion models. Sure, the (deterministic) probability flow induced by a diffusion model is a special kind of flow matching, but the training objective of a diffusion model is inherently stochastic.
    I think diffusion models and flow matching are different classes of models.

    • @gooblepls3985 · 2 days ago

      The stochasticity lives in the p(x0) of the expectation used as the loss: x0 is, in the general case, a randomly drawn sample from a tractable prior such as a Gaussian, just as in the diffusion literature (though the diffusion literature likes to call the data point x0, so the terminology is reversed there).
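
      Concretely, the objective in question is the conditional flow matching loss, which in the paper's notation (data x_1, noise x_0) reads

          \mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_1 \sim q(x_1),\; x_0 \sim p(x_0)} \left\| v_\theta\big(\psi_t(x_0), t\big) - \tfrac{d}{dt}\psi_t(x_0) \right\|^2

      with q the data distribution and p(x_0) the standard Gaussian prior; the expectation over x_0 and t is exactly where the stochasticity of training lives.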

  • @nevokrien95 · 1 month ago +1

    Israel mentioned

  • @EsotericAI · 1 month ago +1

    Sorry, but too many formulas in that paper ;P
    Anyway, I kind of lost track at the beginning of what was going on; it started out nicely with images, and suddenly it was all about points flowing.
    All that was going through my mind was ”What points are you talking about? Pixels?”
    Haha, guess I will have to watch this again when my mind is more up for it :D

  • @tornikeonoprishvili6361 · 1 month ago +1

    Damn, the paper is math-dense. Watching this, I feel like I'm being dragged along by a professional sprinter I just can't keep up with.

  • @ttul · 1 month ago

    This one is going to take me several passes…

  • @BooleanDisorder · 1 month ago

    Obvious Labrador Retriever! 01:33

  • @JohnViguerie · 1 month ago +2

    Very hand-wavy

  • @MrNightLifeLover · 1 month ago

    Published in 2022? Looks like I missed something :/

  • @wolpumba4099 · 1 month ago +7

    *Abstract*
    This video delves into the technical aspects of flow matching for generative models, contrasting it with traditional diffusion models. It explores the concept of morphing probability distributions from a source to a target, emphasizing the significance of conditional flows and the role of vector fields in guiding this transformation. It then covers the mathematical underpinnings of flow matching, introducing key objects such as probability density paths and time-dependent vector fields, and demonstrates how these concepts are operationalized through the conditional flow matching objective, allowing neural networks to be trained to predict vector fields for data points. Finally, the video explores specific instances of flow matching, including its relationship to diffusion models and the advantages of the optimal transport path for efficient and robust sampling.
    *Summary*
    *Introduction to Flow Matching*
    * 0:00 - Introduction to flow matching for generative models and its application in image generation, specifically text-to-image tasks.
    * 1:06 - Comparison of flow matching with traditional diffusion-based models used in image generation.
    * 2:29 - Explanation of the diffusion process as a multi-step process of image generation involving the gradual denoising of random noise to produce a target image.
    * 5:46 - Introduction to flow matching as a generalization of the diffusion process, where the focus shifts from defining a fixed noising process to directly learning the morphing of a source distribution into a target distribution.
    *Mathematical Framework*
    * 6:04 - Illustration of morphing a simple Gaussian distribution into a data distribution, highlighting the challenge of the unknown target distribution and the use of Gaussian mixture models as an approximation.
    * 10:52 - Introduction of the concept of a probability density path as a time-dependent function that defines the probability density at a given point in data space and time.
    * 13:41 - Explanation of the time-dependent vector field, denoted as V, which determines the direction and speed of movement for each point in the data space to achieve the desired distribution transformation.
    * 17:54 - Demonstration of how the flow, representing the path of each point along the vector field over time, is determined by the vector field and the initial starting point.
    *Learning the Flow*
    * 19:26 - Explanation of how the vector field is set to generate the probability density path by ensuring its flow satisfies a specific equation.
    * 20:31 - Introduction of the concept of regressing the flow, which involves training a neural network to predict the vector field for each given position and time.
    * 21:56 - Highlighting the ability to define probability density paths and vector fields in terms of individual samples, enabling the construction of conditional probability paths based on specific data points.
    * 26:16 - Demonstration of how marginalizing over conditional vector fields, weighted appropriately, can yield a total vector field that guides the transformation of the entire source distribution to the target distribution.
    *Conditional Flow Matching*
    * 29:40 - Acknowledging the intractability of directly computing the marginal probability path and vector field, leading to the introduction of the conditional flow matching objective.
    * 30:48 - Explanation of conditional flow matching, where flow matching is performed on individual samples by sampling a target data point and a corresponding source data point, and then regressing on the vector field associated with that specific sample path.
    * 33:30 - Introduction of the choice to construct probability paths as a series of normal distributions, with time-dependent mean and standard deviation functions, allowing for interpolation between the source and target distributions.
    *Optimal Transport and Diffusion Paths*
    * 38:43 - Exploration of special instances of Gaussian conditional probability paths, including the recovery of the diffusion objective by selecting specific mean and standard deviation functions.
    * 41:21 - Introduction of the optimal transport path, which involves a straight-line movement between the source and target samples, contrasting it with the curvy paths characteristic of diffusion models.
    * 44:08 - Visual comparison of the vector fields and sampling trajectories for diffusion and optimal transport paths, highlighting the efficiency and robustness of the optimal transport approach.
    *Conclusion*
    * 46:48 - Recap of the key differences between flow matching and diffusion models, emphasizing the flexibility and efficiency of flow matching in learning probability distribution transformations.
    * 47:56 - Reiteration of the process of using a learned vector field to move samples from the source distribution to the target distribution, achieving the desired transformation.
    * 53:37 - Explanation of how the knowledge about the data set is incorporated into the vector field predictor, enabling it to guide the flow of the entire source distribution to the target distribution.
    I used Gemini 1.5 Pro (token count: 12,628 / 1,048,576).
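
    To complement the sampling part of the summary (47:56 onward), a minimal generation sketch: once a vector field v_theta has been trained (as in the loss sketch near the top of the page), samples are produced by integrating it from t = 0 to t = 1; the fixed-step Euler loop below is a stand-in for the off-the-shelf ODE solvers the abstract mentions.

        import torch

        @torch.no_grad()
        def sample(v_theta, shape, steps=100):
            """Push source-Gaussian samples toward the data distribution by
            integrating dx/dt = v_theta(x, t) from t = 0 to t = 1 (Euler)."""
            x = torch.randn(shape)  # x_0 ~ N(0, I), the source distribution
            dt = 1.0 / steps
            for i in range(steps):
                t = torch.full((shape[0],) + (1,) * (len(shape) - 1), i * dt)
                x = x + dt * v_theta(x, t)  # follow the learned flow
            return x  # approximate samples from the data distribution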

  • @drdca8263 · 1 month ago

    It seems to me like this kind of procedure should have many applications outside of images!... but I don’t know what?
    So, specifically, this should be applicable for when we want to learn a way to sample from a particular (but unknown) probability distribution. So, “generative AI” type stuff, I guess.
    Maybe quantizing like in language models might make this not as applicable to language models? Idk.
    What about world-model stuff? Or like, learning a policy?
    Hm, while that does involve selecting actions at random, those are often more discrete?
    Though, I guess not always. If one is doing a continuous control task thing, then I guess sampling from a continuous family of possible actions, may be the thing to do.
    Uh.
    Hm, so, if you started with a uniform distribution over a continuous family of actions, and wanted to evolve it towards a good distribution given the current scenario?
    Hm, no, I guess this probably isn’t especially applicable to that, because like, how do you obtain the samples from the target distribution?
    There must be *something* other than image generation, that this applies straightforwardly to...

  • @AndrewRafas · 1 month ago

    At 20:53, what you say and what you mark in the paper do not match. v() is the vector field, and not the other way around.

    • @YannicKilcher · 1 month ago

      u() is the actual vector field, v() is the neural network learned vector field
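
      In the paper's notation, the (intractable) flow matching objective makes that split explicit:

          \mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t,\, p_t(x)} \left\| v_t(x; \theta) - u_t(x) \right\|^2

      i.e. u_t is the target vector field that generates the probability path and v_t is the neural network regressed onto it.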

  • @eitanporat9892 · 1 month ago +2

    I feel like this paper is a very convoluted and long-winded way of saying “move in straight lines”; the mathematical part is obvious and not very interesting. Your explanation was great - I just dislike when people write math for the sake of writing math in ML papers.

  • @robmacl7 · 1 month ago +2

    1: Probability path go Woom!
    2: Waifus
    3: profit

    • @drdca8263 · 1 month ago

      Ugh, I wish “generating images of attractive women” wasn’t such a large fraction of the use of such models.
      I don’t think it is good for the person doing the viewing.
      Beetles and broken beer bottles, and all that.

  • @not_a_human_being · 1 month ago

    Another attempt to sprinkle some "statistics and theory" on machine learning. This will fail.