What is Low-Rank Adaptation (LoRA) | explained by the inventor

  • Published 16 Jun 2024
  • Low-rank adaptation, or LoRA, is one of the most popular methods for customizing large AI models. Why was it invented? What is it? Why should I consider using it? Find out the answers in this video from the inventor of LoRA.
    *Like, subscribe, and share if you find this video valuable!*
    Paper: arxiv.org/abs/2106.09685
    Repo: github.com/microsoft/lora
    0:00 - Intro
    0:34 - How we came up with LoRA
    1:33 - What is LoRA?
    3:14 - How to choose the rank r?
    3:57 - Does LoRA work for my model architecture?
    4:48 - Benefits of using LoRA
    6:03 - Engineering ideas enabled by LoRA
    Train or serve multiple LoRA modules in a single batch with the following community implementations (a rough sketch of the per-example adapter idea follows the links):
    github.com/S-LoRA/S-LoRA
    github.com/sabetAI/BLoRA
    github.com/TUDB-Labs/multi-lo...
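    For intuition only, a minimal sketch of applying a different LoRA adapter to each example in one batch; the dimensions, adapter count, and scaling below are made up for illustration, and real serving systems such as S-LoRA rely on custom batching kernels and memory management rather than this naive version.

        import torch

        d_in, d_out, r, n_adapters, batch = 64, 64, 8, 3, 5
        alpha = 16
        scaling = alpha / r

        W = torch.randn(d_out, d_in)                 # frozen base weight
        A = torch.randn(n_adapters, r, d_in) * 0.01  # LoRA "A" matrices, one per adapter
        B = torch.zeros(n_adapters, d_out, r)        # LoRA "B" matrices (zero-init, so the update starts at 0)

        x = torch.randn(batch, d_in)                 # one mixed batch of requests
        idx = torch.tensor([0, 2, 1, 0, 2])          # which adapter serves each request

        base = x @ W.T                               # shared base projection, (batch, d_out)
        z = torch.bmm(A[idx], x.unsqueeze(-1))       # per-request A_i @ x, (batch, r, 1)
        delta = torch.bmm(B[idx], z).squeeze(-1)     # per-request B_i @ (A_i @ x), (batch, d_out)
        y = base + scaling * delta                   # equals base here because B is zero-initialized
        print(y.shape)                               # torch.Size([5, 64])
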
    Follow me on Twitter:
    twitter.com/edwardjhu
    🙏Gratitude:
    Super grateful to my coauthors on the LoRA paper: Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
  • Science & Technology

COMMENTS • 46

  • @jett_royce · 1 month ago +1

    LoRA is such an unlock for resource-constrained creators looking to leverage models for specific domains. Thank you for this amazing work!

  • @redthunder6183 · 4 days ago

    Thank you so much for explaining this clearly. Everything I watch on YouTube is made by people who have no idea how the tech works, or don't even know how to code beyond copy/paste/change the inputs, but pretend like they do.
    On top of that, there are so many useless libraries around LLMs that people claim are the next big thing, but in reality they create code bloat, introduce more unknowns, make the code harder to work with since you now have to learn the library, and don't work as well as just writing everything yourself.

  • @seblund · 5 months ago +7

    Your videos are really of the highest quality, Edward! Thanks for posting these quick overviews.

  • @vuluu4942 · 5 months ago

    Thanks for the explanation, Edward, very informative!

  • @unclecode · 4 months ago +2

    First, thank you for bringing LoRA to life, and secondly, thank you for the humble explanation. I am working on a new startup that only makes sense thanks to LoRA, especially the hierarchical structure you just explained. Thanks again. I subscribed to the channel and am following on X.

  • @Ejun0 · 5 months ago +2

    Thank you for expanding on your paper! Would love to see your thoughts on QLoRA as well!

  • @eoghanf · 5 months ago +1

    Excellent talk. Thank you.

  • @stephanembatchou5300 · 5 months ago +1

    This is excellent. Thanks!

  • @thegrumpydeveloper · 4 months ago +6

    Awesome, thanks for explaining. Can't imagine what would have happened if this technique hadn't been created: full training, huge models for just one concept, not being able to use multiple styles together. Saved time, saved GPU training, and even saved energy. Such a big breakthrough, and I appreciate the explanation.

  • @giyaseddinbayrak5828 · 5 months ago

    You're a king. Keep up these videos 👍

  • @ph10m · 1 month ago

    This was a great intuitive explanation of it. I wish more people took the adaptability of LoRA seriously, though: everyone (and their dog) uploads full models after doing small fine-tunes *with* LoRA, instead of just the adapters. Not only would sharing adapters help experimentation, it would save time too, since we have to download redundant base models over and over...

  • @kaaloo · 5 months ago +1

    Thank you so much for your very clear and to-the-point presentation 🎉❤ (And of course for all the hard work to develop this technique) 🙏

  • @hushhmanish · 5 months ago

    A+, Edward. Thank you for the good work and content.

  • @jannijp7 · 4 months ago +4

    Thank you and your team for the research. It saved our ML project at university, because fine-tuning the SAM model with its billion parameters was just not possible on consumer GPUs... but with LoRA, no problem (based on the MeLo repo).
    I just have a bit of trouble understanding the exact impact of the rank. Example: we only want to segment one specific object when we fine-tune SAM. With rank 2, we get better results than with rank 512. But why exactly? Is it because a lower rank trains the model toward a more specific task (but also overfits faster)?
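    As a rough, hedged illustration of one part of the rank question (the weight shape below is made up, not SAM's): the number of trainable LoRA parameters grows linearly with r, so a very high rank approaches the capacity of full fine-tuning, which a small single-object dataset may simply overfit.

        # For a d_out x d_in weight, a rank-r LoRA update trains r * (d_in + d_out) parameters.
        def lora_params(d_out, d_in, r):
            return r * (d_in + d_out)

        d_out = d_in = 1024                      # illustrative square weight
        for r in (2, 16, 512):
            print(r, lora_params(d_out, d_in, r), "vs full:", d_out * d_in)
        # rank 2   ->     4,096 trainable parameters
        # rank 16  ->    32,768
        # rank 512 -> 1,048,576 (as many as the full 1024 x 1024 weight itself)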

  • @user-pc1vi4pq4d · 4 months ago

    Thanks Edward

  • @TIENTI0000 · 5 months ago

    thanks

  • @usercurious · 3 months ago

    Respect

  • @aMclub11 · 5 months ago +1

    Edward, can you tell me which side of AI engineering is better: fine-tuning a model (fine-tuning techniques), or building a model from scratch with a large codebase?

  • @todamont · 11 hours ago

    Very cool. I know some of these words.

  • @eapresents6061 · 5 months ago

    Awesome video. Highly appreciated. Just a side note: maybe a slightly better microphone might make things sound a little better :)

    • @edwardjhu · 5 months ago

      Noted! I have a RODE VideoMic but can definitely clean better

  • @ickorling7328 · 4 months ago +1

    Hi, just a curious thought. I've done some reading on MoE-Mamba and Vision Mamba; of particular note is how MoE-Mamba interleaves an MoE layer with a Mamba layer, and how Vision Mamba demonstrates spatial awareness of correlations across gaps of contextually irrelevant data, thanks to its nature as a selective state space model (SSM). An RNN is a type of SSM, but a better implementation of the idea is S6 (Mamba), which replaces convolution with a selection algorithm to create an efficient selective attention mechanism suitable for LLMs. I've heard it shows a lot of base similarity with the transformer architecture.
    I'm wondering... what if the LoRA had an offset, like an added row, that we filled with noise, and we prompted an embedded Mamba layer to look at selected topics in the parameters of the new data and compare them with the base model on the GPU? Mamba would edit the noise layer within that offset, an added dimension, so it becomes a bridge of weights. The added layer would act like a context-aware translation that finds and relates sparse clusters or motifs in the parameterized dataset, but only if slightly un-convoluted, so the offset injection layer optimized by Mamba becomes a kind of Rosetta stone between knowledge A and knowledge B. And it would be a layer (or set of layers) in the deep neural network created by a non-neural-network AI that is actually neural-assembly based. Aha, at least to me!
    The deep layers are so dense in dimensions that editing a single one is like managing one neuron in a feed-forward network. Mamba is more like an assembly, given its bi-directional contextual awareness, which has been likened to Hebbian plasticity: neurons that fire together wire together.
    So perhaps, to make the alignment happen, we have the S6 layer adjust the noise in the offset injected layer in response to the alignment or misalignment of network responses to stimuli that both models should definitely know, then progress to more topics to imprint on the translator and encourage room to grow.
    Then we should be able to iterate forward, skip a layer or two, then offset and inject again, so this time it has exposure through a more advanced lens: unpacking and repacking the LoRA so it affects the base layers, with pauses in the feed-forward pass that re-route through these offset injected LoRA layers, letting the LoRA show the base model how to interface with its concepts better.
    Mamba should be able to perceive the clusters of real information as patterns its algorithm picks up from the tensors representing tokens. If it only ever thinks in tensors, and Mamba is a selective state space model akin to an RNN or convolution, then selectively attending to sparsely connected examples in tensors should come naturally to it. So it should help with fine-tuning LoRAs in this way. I hope! 🎉
    Thank you, dude, and your team, and the open-source community for uplifting the entire collective. I hope my big-thinker imaginative approach can contribute to something useful in minds as productive and innovative as yours. From one futurist to many others, God bless!

    • @ickorling7328 · 3 months ago

      Hi, this is a month later. I suppose I had something there, not sure. I think I misunderstood some things before. Technically close enough.
      The Mamba S6 thinks with many tokens at once, and there's even a token-free Mamba that looks at machine-code forms of data yet is still capable of speech.
      What I suggested before was this (shorter, for the tl;dr):
      Between each group of steps considered a cycle of inference ticks, we add at least one layer that injects Mamba shaping over space-time-mapped data, where selected regions and concepts can be held in memory to enhance inference quality, while its super-efficient parallel structure puts it in god-tier compute costs compared to all other models over long contexts. Mamba has a linear compute cost; all the others have not exponential, but quadratic, costs.
      So, the Mamba makes image generation smarter by adding little suggestion imprints, like watermarks or tags, which invite one of thousands of LoRA models to help in just one area of data within any n dimensions of space-time relationship data.
      This could be monumental, actually... imagine if we computed it all at the machine-code level with pre-compiled real code, plus token-free Mamba layers interleaved with any other model. A stack of layers in a cycle, even.
      All of the AI's thought could have sound and video and speech, to represent total immersion for the AI, or an immersive user experience in a 3D + time = 4D spatially mapped voxel world where voxels are tagged by Mamba, and tags load and call LoRAs into action from the SSD into memory.
      I'm talking holodeck, bro. Like, forget video, that's inherent; we can use this Mamba stuff live to organize world models from any dataset that contains a world model or otherwise a playing field.
      If we can interleave the MoE layer for Mamba-MoE, then we're also effectively making Mamba an interleaved layer, so why not just stack it anywhere? Stack it with transformers, stack it without, use Mamba to learn how to tag a database for a voxel game engine, make tags call LoRAs that affect frame generation when that tag is present, and mix with another LoRA according to the tags in that pixel's overlap/transparency showing on top. The LoRA only affects that pixel's 'tag-hue' group.
      So, real-time 360° image generation where every hue group is prompted in a certain procedural way with LoRAs to adapt, lowering inference cost and raising quality; then it's otherwise like real-time inference for image-to-image painting AI with frame upscaling. Then you could probably call on MP4 codec technologies to interpolate frames with commonly available video-encoding hardware. And you could probably speed things up with FSR by AMD, again on commonly available, highly parallel hardware (GPUs, or iGPUs for the power savers).
      The whole 360° view responds to a VR headset's location in a video game made with a voxel engine. The native graphics *could* still exist, but only as far as the 'tag-hue' generates (potato-mode VFX). Each frame is a 360° potential, so extra frames should be generated for smoothing when fast head movements occur. It only has to actually generate the user's vision, and maybe just outside it. Being spatially similar, the AI approach will likely let us borrow a ton from the current frame to persistently maintain awareness of the frames most likely to be generated, based on trajectories and event-simulation cues, and if a slight deviation occurs it's only a slight adaptation of the progress that was pre-computed. Since it's not technically just frame generation but 4D voxels informing an AI to understand the area based on tags, if the tags move it's a simple translation relative to the others in a statistically simulated scenario, not simply video generated in one batch or even one step (idk how Sora does it; something similar).
      The holodeck, it's coming.

  • @raminziaei6411 · 5 months ago +1

    Great video! So your last point is that we can fine-tune a previously fine-tuned model with LoRA? What about catastrophic forgetting? Isn't that an issue?

    • @raminziaei6411 · 5 months ago

      Another question would be: can we fine-tune (with LoRA) a model that has already been fully fine-tuned (without LoRA, for example Llama2-chat)?

  • @user-wp8yx · 2 months ago

    Trying to teach a Mistral 7B model Sanskrit. It already has Sanskrit characters as tokens and is the best-performing Llama-based 7B model I can find.
    You seem like a knowledgeable person in this area. Do you have any advice for LoRA? Rank, alpha? How about targeting q, k, v? Other strategies?
    I have about 3 GB of datasets that range from translations and corpora to data tables. I wonder if I should use different strategies for different data types?
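    Not advice from the video, but as a hedged starting point for an experiment like the one above, assuming the Hugging Face PEFT library and Mistral's q_proj/k_proj/v_proj module names; the specific rank, alpha, and dropout values are common defaults to sweep from, not recommendations.

        from transformers import AutoModelForCausalLM
        from peft import LoraConfig, get_peft_model

        model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

        config = LoraConfig(
            r=16,                   # try a small sweep, e.g. 4 / 16 / 64
            lora_alpha=32,          # alpha/r scales the update; alpha around 2*r is a common heuristic
            lora_dropout=0.05,
            target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections only
            task_type="CAUSAL_LM",
        )

        model = get_peft_model(model, config)   # freezes the base weights, adds A/B matrices
        model.print_trainable_parameters()      # only the LoRA parameters are trainable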

  • @user-de8ro3tk5f · 4 months ago +1

    I am still a bit concerned about the continual-learning effects LoRA may cause in terms of catastrophic forgetting. Since you are playing with the model's internal feature representations by adding new information, the changes added at each layer might flow forward as noise that accumulates, altering the information in every layer until it reaches the output. Did you test for such effects when developing the technique? How far were the output vectors of the adapted model from the original? (Maybe fixing it with knowledge distillation could be an option, if an expensive one.) It is very difficult to find a good paper on this.

    • @definty · 3 months ago

      Surely the point of training is to turn that noise into a function that represents the domain of the data; also, you only train one layer of the feed-forward network and/or the self-attention mechanism.
      But I think that's also why he mentions you can remove the new LoRA additions to get the base model back.
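      To make that reversibility point concrete, a tiny hedged sketch (arbitrary shapes, nothing model-specific): merging the low-rank product into the weight and later subtracting the same product recovers the base weight up to floating-point error.

          import numpy as np

          d_out, d_in, r, alpha = 32, 48, 4, 8
          rng = np.random.default_rng(0)

          W = rng.standard_normal((d_out, d_in))   # frozen base weight
          A = rng.standard_normal((r, d_in))
          B = rng.standard_normal((d_out, r))
          scaling = alpha / r

          W_merged = W + scaling * (B @ A)         # merge for deployment: no extra inference latency
          W_recovered = W_merged - scaling * (B @ A)

          print(np.allclose(W, W_recovered))       # True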

  • @MrZidanegenius · 5 months ago

    Can the same be done for teaching robots specific tasks? For example, if a robot learns how to pick up a ball, would picking up an apple require further fine-tuning of the model?

    • @edwardjhu · 5 months ago

      In principle, yes! There are foundation models trained on robotics data.

    • @MrZidanegenius · 5 months ago

      Can you recommend any foundation models?

  • @user-im2nt5dg1t · 3 months ago

    The 1000th subscriber!

  • @scottthornton4220 · 4 months ago

    I have a simple question: is it feasible or beneficial to use low-rank approximations of the base Q, K, and V matrices themselves? I'm guessing not, since no one appears to be doing it.

    • @edwardjhu · 4 months ago +1

      It is feasible, and we tried it. It wasn't beneficial in our case because of rapid model degradation as the rank decreases and the decreased parallelism.

    • @scottthornton4220 · 4 months ago

      @edwardjhu Thanks for the reply. For the reduced parallelism (I assume having to do two GEMVs in series, A*B*v), you could always store the reduced weights on disk and then compute the simulated full-rank matrix before inference. Love the video, BTW; it adds context to the paper.

    • @edwardjhu · 4 months ago

      @scottthornton4220 If we are talking about compressing the base model, the goal would be to reduce inference cost, not storage cost, because there's only one copy to store.
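      As a hedged illustration of the degradation trade-off discussed in this thread (using a random matrix, not a real attention weight, so the actual spectrum will differ): truncated SVD gives the best rank-r approximation of an existing matrix, and the reconstruction error grows as r shrinks.

          import numpy as np

          rng = np.random.default_rng(0)
          W = rng.standard_normal((512, 512))      # stand-in for a base Q/K/V weight

          U, S, Vt = np.linalg.svd(W, full_matrices=False)
          for r in (512, 128, 32, 8):
              W_r = (U[:, :r] * S[:r]) @ Vt[:r, :]                      # rank-r reconstruction
              rel_err = np.linalg.norm(W - W_r) / np.linalg.norm(W)     # relative Frobenius error
              print(r, round(rel_err, 3))          # error is ~0 at full rank and grows as r drops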

  • @guillermojmontenegro95 · 4 months ago +1

    Is it possible to merge LoRA with MoE (an MoE of LoRAs)?
    I think you could have a bunch of LoRA experts and switch between them. It would require less memory and give faster inference.
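    A rough sketch of what an "MoE of LoRAs" forward pass could look like; the gate design, initialization, and sizes here are hypothetical, not an existing implementation.

        import torch
        import torch.nn.functional as F

        d_in, d_out, r, n_experts, batch = 64, 64, 8, 4, 3

        W = torch.randn(d_out, d_in)                  # frozen base weight, shared by all experts
        A = torch.randn(n_experts, r, d_in) * 0.01    # expert LoRA "A" matrices
        B = torch.randn(n_experts, d_out, r) * 0.01   # expert LoRA "B" matrices
        W_gate = torch.randn(d_in, n_experts) * 0.01  # simple learned router

        x = torch.randn(batch, d_in)
        gate = F.softmax(x @ W_gate, dim=-1)          # (batch, n_experts) mixing weights

        z = torch.einsum("bi,eri->ber", x, A)         # each expert's A_e @ x, (batch, n_experts, r)
        delta = torch.einsum("ber,eor->beo", z, B)    # each expert's B_e @ (A_e @ x), (batch, n_experts, d_out)

        y = x @ W.T + torch.einsum("be,beo->bo", gate, delta)   # gate-weighted sum of expert updates
        print(y.shape)                                # torch.Size([3, 64])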

  • @BruceChar007 · 1 month ago

    Can you keep fine-tuning on top of an already LoRA-fine-tuned model, and how well does that work?

  • @renanmonteirobarbosa8129 · 4 months ago

    Applying quasi-orthogonal dimensions to large models is an idea that appeared around the same time as LoRA but unfortunately did not gain popularity.

  • @muralikrishna-ux6mz · 5 months ago

    I don't want to study at my college anymore; I want to do research in this field, I'm addicted to it. But getting out of college is really tough, so Edward, if you have any options, please help me. I just want to immerse myself in AI, and I will learn anything quickly, so please consider me. My college is really bad: it takes up my time and doesn't explain anything, and I can learn on my own; I want to work with people like you. So please help me, I'm asking. I don't want any money, I just want to learn and work on AI. I regret this every day. Time is more valuable than anything else in this world, so please make use of it, guys.

    • @edwardjhu · 5 months ago +5

      There is a lot of great content online for self-studying AI. I started with Andrew Ng's deep learning course on Coursera almost a decade ago!

    • @FailingProject185 · 5 months ago +1

      ​​​​@@edwardjhu Appreciate your patience in replying to all these questions.
      If possible, make a roadmap for reaching your level. Assume someone is just out of high school and wants to know everything you know. Make an hour-or-longer roadmap; it'll help millions of us get to an advanced level and actually understand your videos.
      There's another guy called Umar Jamil; he's also a GOAT like you. Builds stuff from scratch, has in-depth knowledge. I was asking him, and now I'm asking you.
      Oh, you started a decade ago. You'd better make a 10-hour-plus video for your roadmap lol

    • @scottthornton4220 · 4 months ago +2

      Linear Algebra -- basis vectors, linear transformations, eigendecomposition, row space, column space, singular value decomposition (applicable to this video).