Thank you for sharing the video!
I believe an important point that may have been overlooked in the context of model merging is the necessity for the models to remain within the same optimization basin.
For instance, if the models are fine-tuned for too long and diverge significantly, weight averaging could result in a collapse in performance instead of an improvement.
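To make the point concrete, here is a minimal sketch of uniform weight averaging between two fine-tunes of the same base checkpoint. The function name and checkpoint paths are illustrative, and the caveat above applies: if the two models have drifted into different loss basins, the averaged model can underperform both.

```python
# Minimal sketch of uniform weight averaging ("model soup") between two
# fine-tunes of the same base model. Assumes both checkpoints share the
# same architecture and parameter names.
import torch

def average_state_dicts(state_dict_a, state_dict_b, alpha=0.5):
    """Return a new state dict that interpolates between two checkpoints."""
    merged = {}
    for name, tensor_a in state_dict_a.items():
        tensor_b = state_dict_b[name]
        if tensor_a.dtype.is_floating_point:
            merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:
            # Integer buffers (e.g. position ids) are copied, not averaged.
            merged[name] = tensor_a.clone()
    return merged

# Usage (hypothetical checkpoint paths):
# sd_a = torch.load("finetune_math.pt", map_location="cpu")
# sd_b = torch.load("finetune_code.pt", map_location="cpu")
# model.load_state_dict(average_state_dicts(sd_a, sd_b, alpha=0.5))
```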
Thanks Julien, another good video explaining model merging strategies. It just blew my mind when I heard Maxime Labonne talk about it at a conference. I am guessing the hyperscalers and NVDA are not hyping up this technique since there is no need for accelerated compute. :) Is this still research? Have you seen a practical implementation of it? Why are SLMs more hyped than merging LLMs?
Thank you for responding.
Merging is still an active research field, but great production models are built with it, such as Google Gemma 2 and of course the Arcee models. Merging and SLMs are a great fit because we have so many models to choose from. LLMs are much, much more expensive to build...
That was great and it helped me so much! Is there any possibility of getting the presentation slides?
Hi, you can find the slides on Slideshare at fr.slideshare.net/slideshow/julien-simon-deep-dive-model-merging/270921708
Great video, thank you! What I didn't quite grasp is this: let's say I'm merging two models, one trained on maths and the other trained on coding. Do we expect the merged model to perform at a high level on both tasks?
Yes, that's the expectation :)
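As an illustration of how two specializations can be combined, here is a hedged sketch of task arithmetic: each fine-tune's delta from the shared base checkpoint (its "task vector") is added back onto the base. The scaling factor `lam` is a hypothetical knob you would tune on held-out data; this is not the mergekit API, just the idea in plain PyTorch.

```python
# Illustrative sketch of task arithmetic for two fine-tunes (e.g. math
# and code) that share the same base checkpoint.
import torch

def task_arithmetic_merge(base_sd, math_sd, code_sd, lam=0.5):
    merged = {}
    for name, base_w in base_sd.items():
        if base_w.dtype.is_floating_point:
            delta_math = math_sd[name] - base_w   # math task vector
            delta_code = code_sd[name] - base_w   # code task vector
            merged[name] = base_w + lam * (delta_math + delta_code)
        else:
            merged[name] = base_w.clone()
    return merged
```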
I can't help wondering if there is an experiment that really explores these techniques fully, like applying them to all kinds of models or combining different methods together.
Check out arcee.ai, their platform is definitely going that way.
@juliensimonfr Thanks for your answer!! I've found some interesting blogs about it!
It wasn't clear to me whether these methods first do some kind of sorting by weight and connectivity similarity across layers. I can imagine that when merging models that were fine-tuned from the same base checkpoint, we can proceed without sorting. But if we trained two models from different random initializations, we would need to sort them by similarity first.
In any case, has there been any research into this?
Not sure what you mean by sorting. Most methods require that merged models share the same base architecture. Frankenmerging is different and you need to pick which layers come from which model.
@juliensimonfr Sorry, I meant that not only should they share the same architecture, but they should also share the same initial pretraining and weights.
Two models with the same architecture but trained from scratch with different initial randomized weights would not merge very well.
Unless, that is, some analysis is done on the layers of the two models to find similar weight and connectivity distributions scattered across different parts of each (assuming the patterns found are the same or very similar), and they are then somehow ordered by similarity before merging.
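For readers curious about this "sorting" idea, below is a toy sketch in the spirit of permutation-matching approaches such as Git Re-Basin, applied to a single hidden layer: units of model B are matched to the most similar units of model A with the Hungarian algorithm, permuted, and then averaged. All names and shapes are illustrative, and real models require the permutation to be propagated consistently through every layer.

```python
# Toy sketch of neuron permutation matching before averaging, for one
# hidden layer of an MLP with weights w1 (into the layer) and w2 (out of it).
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_average_hidden_layer(w1_a, w1_b, w2_a, w2_b):
    # w1_*: (hidden, in) weights into the hidden layer
    # w2_*: (out, hidden) weights out of the hidden layer
    similarity = w1_a @ w1_b.T                              # (hidden_a, hidden_b)
    _, perm = linear_sum_assignment(-similarity)            # maximize similarity
    w1_b_aligned = w1_b[perm, :]                            # reorder B's hidden units
    w2_b_aligned = w2_b[:, perm]                            # apply same permutation downstream
    w1_merged = 0.5 * (w1_a + w1_b_aligned)
    w2_merged = 0.5 * (w2_a + w2_b_aligned)
    return w1_merged, w2_merged
```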
@juliensimonfr The question came up because I'm currently training two models from scratch, with different initial randomized weights, on the same X data but different Y data each, and I'm curious about any research done on merging these two into a single model with both Ys as multi-headed outputs.
Thank you for this video. I gotta give this a try 🙂
You're welcome, and yes, you should :)
Thanks for the fantastic video. Loved how you simplified almost all the methods to merge the models!
Glad it was helpful!
Super interesting, Julien, thanks a lot for sharing.
Glad you enjoyed it
Hey @Julien, great video. I have a question regarding the scale factor in the TIES method. How do we determine the scale factor?
Thank you. It's up to you, depending on how much you want to "influence" the base model. mergekit has a parameter called 'density': the fraction of weights in differences from the base model to retain. Example at github.com/arcee-ai/mergekit/blob/edd3817e4a470c7a959ef4c505f52a650a46ff07/examples/ties.yml
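To show where `density` and the scale factor fit, here is a simplified, illustrative sketch of a TIES-style merge for a single tensor: trim each task vector to its largest-magnitude `density` fraction, elect a majority sign per parameter, average only the agreeing deltas, and scale the result. This is a sketch of the idea under those assumptions, not mergekit's actual implementation.

```python
# Simplified TIES-style merge of one weight tensor across several fine-tunes.
import torch

def trim(delta, density):
    """Zero out all but the top `density` fraction of entries by magnitude."""
    k = max(1, int(density * delta.numel()))
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    return torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))

def ties_merge_tensor(base_w, task_ws, density=0.5, scale=1.0):
    deltas = [trim(w - base_w, density) for w in task_ws]
    stacked = torch.stack(deltas)                   # (num_models, ...)
    elected_sign = torch.sign(stacked.sum(dim=0))   # majority sign per parameter
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    total = (stacked * agree).sum(dim=0)            # sum of agreeing deltas only
    count = agree.sum(dim=0).clamp(min=1)
    return base_w + scale * total / count           # scaled disjoint mean
```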
This is a random comment to boost your channel. Thank you.
LOL, thank you.
Nice.
Thanks!