Deep dive: model merging (part 1)

  • Published Dec 23, 2024

COMMENTS • 26

  • @ramsever5087 · 27 days ago

    Thank you for sharing the video!
    I believe an important point that may have been overlooked in the context of model merging is the necessity for the models to remain within the same optimization basin.
    For instance, if the models are fine-tuned for too long and diverge significantly, weight averaging could result in a collapse in performance instead of an improvement.
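
    As a rough illustration of why that matters, here is a minimal sketch of plain weight averaging between two fine-tunes of the same base checkpoint (PyTorch state dicts; the file names are made up):

        import torch

        # Two fine-tunes of the SAME base checkpoint (hypothetical files).
        state_a = torch.load("finetune_a.pt")
        state_b = torch.load("finetune_b.pt")

        alpha = 0.5  # 0.5 = plain average of the two models
        merged = {}
        for name, w_a in state_a.items():
            w_b = state_b[name]
            if torch.is_floating_point(w_a):
                # Linear interpolation is only safe if both fine-tunes stayed in
                # the same loss basin; if they diverged, the midpoint can land in
                # a high-loss region and the merged model collapses.
                merged[name] = alpha * w_a + (1.0 - alpha) * w_b
            else:
                merged[name] = w_a.clone()  # integer buffers etc. are copied as-is

        torch.save(merged, "merged.pt")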

  • @arnabsinha2339 · 1 month ago

    Thanks Julien, another good video explaining model merging strategies. It just blew my mind when I heard Maxime Labonne talk about it at a conference. I am guessing the hyperscalers and NVDA are not hyping up this technique as there is no need for accelerated compute. :) Is this still research? Have you seen a practical implementation of this? Why are SLMs more hyped than merging LLMs?
    Thank you for responding.

    • @juliensimonfr · 28 days ago

      Merging is still an active research field, but great production models are built with it, like Google Gemma 2 and of course the Arcee models. Merging and SLMs are a great fit because we have so many models to choose from. LLMs are much, much more expensive to build...

  • @melikanobakhtian6018 · 5 months ago +2

    That was great and it helped me so much! Is there any possibility of getting the presentation slides?

    • @juliensimonfr · 4 months ago +1

      Hi, you can find the slides on Slideshare at fr.slideshare.net/slideshow/julien-simon-deep-dive-model-merging/270921708

  • @uygarkurtai · 8 months ago +1

    Great video, thank you! What I didn't quite grasp is this: let's say I'm merging 2 models, one trained on maths and the other on coding. Do we expect the merged model to perform at a high level on both tasks?

    • @juliensimonfr · 8 months ago +1

      Yes, that's the expectation :)
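
      One way to picture that combination is task-vector addition: each fine-tune's delta from the shared base is added back onto the base. A minimal sketch, assuming both fine-tunes share the same base checkpoint (file names and scaling factors are illustrative):

          import torch

          base = torch.load("base.pt")            # shared base checkpoint
          math = torch.load("finetune_math.pt")   # fine-tuned on maths
          code = torch.load("finetune_code.pt")   # fine-tuned on coding

          lam_math, lam_code = 0.7, 0.7           # per-skill scaling factors

          merged = {}
          for name, w_base in base.items():
              if not torch.is_floating_point(w_base):
                  merged[name] = w_base.clone()
                  continue
              # Task vectors: what each fine-tune changed relative to the base.
              delta_math = math[name] - w_base
              delta_code = code[name] - w_base
              # Add both skills back onto the base, scaled independently.
              merged[name] = w_base + lam_math * delta_math + lam_code * delta_code

          torch.save(merged, "merged_math_code.pt")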

  • @abse-mj8pw · 7 months ago +1

    I can't help wondering if there has been an experiment that really explores these techniques fully, like applying them to all kinds of models or combining different methods together?

    • @juliensimonfr · 7 months ago

      Check out arcee.ai, their platform is definitely going that way.

    • @abse-mj8pw · 7 months ago

      @juliensimonfr Thanks for your answer!! I've found some interesting blogs about it!

  • @ushiferreyra · 3 months ago

    It wasn't clear to me if these methodologies first do some kind of sorting by weight and connectivity similarity across layers. I can imagine that when merging models that were fine-tuned from the same base checkpoint, we can proceed without sorting. But if we trained two models from different random initializations, we would need to sort them by similarity first.
    In any case, has there been any research into this?

    • @juliensimonfr · 3 months ago

      Not sure what you mean by sorting. Most methods require that merged models share the same base architecture. Frankenmerging is different and you need to pick which layers come from which model.
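
      For the frankenmerging case, a rough sketch of what "picking which layers come from which model" can look like, assuming two same-architecture checkpoints whose transformer blocks are named "model.layers.N." (the split point and the names are illustrative):

          import torch

          state_a = torch.load("model_a.pt")
          state_b = torch.load("model_b.pt")

          SPLIT = 16  # layers 0..15 from model A, 16 onwards from model B

          merged = {}
          for name, tensor in state_a.items():
              if name.startswith("model.layers."):
                  layer_idx = int(name.split(".")[2])
                  source = state_a if layer_idx < SPLIT else state_b
                  merged[name] = source[name].clone()
              else:
                  # Embeddings, final norm, LM head: taken from model A here.
                  merged[name] = tensor.clone()

          torch.save(merged, "frankenmerge.pt")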

    • @ushiferreyra · 3 months ago

      @juliensimonfr Sorry, I meant that not only should they share the same architecture, but they should also share the same initial pretraining and weights.
      Two models with the same architecture but trained from scratch with different initial randomized weights would not merge very well.
      Unless, that is, some analysis is done on the layers of the two models to find similar weight and connectivity distributions scattered across different parts of each (assuming the patterns they learned are the same or very similar), and the layers are then somehow reordered by similarity before merging.
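
      That kind of alignment can be sketched for a single linear layer: match each neuron (row) of model B to its most similar neuron in model A, permute, then average. A toy sketch with made-up shapes (a full network would also need the matching permutation applied to the next layer's input dimension):

          import torch
          from scipy.optimize import linear_sum_assignment

          # Same linear layer in two separately trained models (toy shapes).
          W_a = torch.randn(256, 128)
          W_b = torch.randn(256, 128)

          # Cosine similarity between every neuron (row) of A and of B.
          a = torch.nn.functional.normalize(W_a, dim=1)
          b = torch.nn.functional.normalize(W_b, dim=1)
          sim = a @ b.T                                    # (256, 256)

          # Hungarian matching: the permutation of B's neurons that best lines
          # them up with A's neurons (negated because the solver minimizes).
          _, col = linear_sum_assignment(-sim.numpy())
          W_b_aligned = W_b[col]

          # Only after alignment does plain averaging make sense here.
          W_merged = 0.5 * (W_a + W_b_aligned)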

    • @ushiferreyra · 3 months ago

      @juliensimonfr The question came up because I'm currently training two models from scratch, with different initial randomized weights, on the same X data but different Y data for each, and I'm curious about any research on merging these two into a single model with both Ys as a multi-headed output.

  • @kenchang3456 · 9 months ago +5

    Thank you for this video. I gotta give this a try 🙂

    • @juliensimonfr · 9 months ago

      You're welcome, and yes, you should :)

  • @SrikanthIyer · 9 months ago +1

    Thanks for the fantastic video. Loved how you simplified almost all the methods to merge the models!

  • @gnibu42 · 9 months ago

    Super interesting Julien, thanks a lot for sharing.

  • @subhamkundu5043 · 9 months ago

    Hey @Julien, great video. I have a question regarding the scale factor in the TIES method. How do we determine the scale factor?

    • @juliensimonfr · 9 months ago

      Thank you. It's up to you, depending on how much you want to "influence" the base model. mergekit has a parameter called 'density': fraction of weights in differences from the base model to retain. Example at github.com/arcee-ai/mergekit/blob/edd3817e4a470c7a959ef4c505f52a650a46ff07/examples/ties.yml
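
      A rough sketch (not mergekit's actual code) of the TIES trim / sign-elect / merge steps, with 'density' as the fraction of each task vector kept and a scaling factor applied at the end (checkpoint names and values are illustrative):

          import torch

          base = torch.load("base.pt")
          experts = [torch.load("finetune_math.pt"), torch.load("finetune_code.pt")]

          density = 0.5   # fraction of each task vector to keep (largest magnitudes)
          scale = 1.0     # how strongly the merged task vector influences the base

          merged = {}
          for name, w_base in base.items():
              if not torch.is_floating_point(w_base):
                  merged[name] = w_base.clone()
                  continue
              deltas = []
              for expert in experts:
                  delta = expert[name] - w_base
                  # Trim: keep only the top-'density' fraction of entries by magnitude.
                  k = max(1, int(density * delta.numel()))
                  threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
                  deltas.append(torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta)))
              stacked = torch.stack(deltas)
              # Elect signs: keep the majority sign per parameter, drop disagreeing updates.
              sign = torch.sign(stacked.sum(dim=0))
              agree = torch.where(torch.sign(stacked) == sign, stacked, torch.zeros_like(stacked))
              # Disjoint mean over the surviving (agreeing, non-zero) values.
              count = (agree != 0).sum(dim=0).clamp(min=1)
              task_vector = agree.sum(dim=0) / count
              merged[name] = w_base + scale * task_vector

          torch.save(merged, "ties_merged.pt")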

  • @AbdennacerAyeb · 9 months ago +12

    This is a random comment to boost your channel. Thank you.

  • @philtoa334 · 3 months ago

    Nice.