Would love to see more videos on preparing training data for specific domains.
Classification, summarization, and sentiment analysis all seem clear enough, but more complex tasks are a bit of a mystery.
For example, training question answering and other instruction-following for a specific knowledge base like a code library, book, etc. How can the original content from the source be used in the response? Is fine-tuning a use case here, or can we only use embeddings for domain-specific knowledge?
I'm searching 🔍 for the same thing.
Yes!! I'd love to see a similar thing as well. @robxmccarthy, have you been able to find anything in the meantime?
I'm trying to fine-tune a model for summarization tasks. I'm not quite sure how to use the question-answer pairs to fine-tune the model. Is there any documentation on how to do it?
I’m working on something similar
Very informative as always.
Thank you Tarun
Thanks for the summarised video😊
Little correction: the target_modules in the LoraConfig should include the dense layers as well, as per the paper.
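For anyone wondering what that looks like in practice, here's a rough sketch of a LoraConfig that adapts the dense/MLP layers alongside the attention projection (the QLoRA paper adapts all linear layers). The module names below match GPT-NeoX-style models like the one in the video; other architectures use different names, and the r / alpha / dropout values here are just illustrative defaults, not anything from the video:

```python
# Sketch: LoRA config targeting attention *and* dense layers, per the QLoRA paper.
# Module names are for GPT-NeoX-style models; check yours with print(model).
from peft import LoraConfig

config = LoraConfig(
    r=16,                    # illustrative rank, not from the video
    lora_alpha=32,
    target_modules=[
        "query_key_value",   # fused attention projection
        "dense",             # attention output projection
        "dense_h_to_4h",     # MLP up-projection
        "dense_4h_to_h",     # MLP down-projection
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```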
Oooh. Thanks for the information
I have been following your channel for a while now. All your videos are informative, with no clickbait material. I am a YouTuber myself (food vlogger, not tech vlogger LoL) and I know the state of Indian YouTube nowadays. Compared to that, your videos are gems in themselves. About this video: can you tell us approximately how much time it takes to complete the fine-tuning process? Will it reduce if I subscribe to Colab Pro? Are there any other free or relatively cheaper alternatives to Colab?
Does this all need to happen in the cloud? I understand the training part, but what if I want to use it locally? Can I load this fine-tuned model into gpt4all, for example? And what constitutes a "dataset"? Can I just give it a bunch of C source files, train it, and then ask it questions about the code? Sorry if my questions are ignorant; my background is in EE and nothing to do with AI.
Is it possible to use QLoRA to fine-tune LLMs on labelled data? My downstream task is supervised text classification
Sounds like overkill; supervised text classification shouldn't require such a big model, no?
Hey, what is the best server site to host your LLM? I mean for massive use, not locally.
Brilliant. Thank you.
You're welcome
I have a question: the dataset has the features ['quote', 'author', 'tags', 'input_ids', 'attention_mask']. How does the trainer know which features to select for training (fine-tuning)?
I'm also wondering the same thing! Have you managed to find an answer?
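Not from the video, but as far as I know this is what the HF Trainer does by default (remove_unused_columns=True): it inspects the model's forward() signature and keeps only the matching columns, so 'quote', 'author', and 'tags' get dropped and only the tokenized features reach the model. A toy stand-alone sketch of that filtering logic:

```python
# Sketch of the Trainer's default column filtering (remove_unused_columns=True):
# keep only dataset columns that the model's forward() accepts.
import inspect

def filter_columns(example: dict, forward_fn) -> dict:
    accepted = set(inspect.signature(forward_fn).parameters)
    return {k: v for k, v in example.items() if k in accepted}

# Stand-in forward() with the usual causal-LM signature:
def forward(input_ids=None, attention_mask=None, labels=None):
    ...

row = {"quote": "...", "author": "...", "tags": [],
       "input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
print(sorted(filter_columns(row, forward)))
# -> ['attention_mask', 'input_ids']
```

For causal-LM fine-tuning the labels are then built from input_ids by the data collator, which is why no explicit label column is needed.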
Please fine-tune using a custom dataset, like question answering, and make a video. That would be helpful, thanks!
Interested as well.
What is the max parameter size that a 16 GB GPU can train with QLoRA?
The Last Ben repository is not working on free Colab anymore; can you create a video on alternatives?
Let me check
How do you fix the "CUDA out of memory" error in the free version of Google Colab? I get that when fine-tuning an LLM.
Quantize the model to 4 bits.
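Concretely, that usually means loading the base model with a 4-bit quantization config, which is the QLoRA approach. A rough sketch (the model name is just an example; swap in whatever you're fine-tuning):

```python
# Sketch: load the base model 4-bit quantized (QLoRA-style) to avoid CUDA OOM.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,       # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",            # example model, use your own
    quantization_config=bnb_config,
    device_map="auto",
)
```

Lowering the batch size and enabling gradient checkpointing also help if it still OOMs.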
Why would I bother fine-tuning a giant 65B model that I wouldn't be able to run anyway? So that I can quantize it or something? And that makes me wonder: should you fine-tune then quantize, or quantize then fine-tune?
Yes, the point is that you quantize the 65B model down to 4-bit so that it fits in ~16 GB of a 24 GB GPU, and you have some space left over for uptraining or batch evaluation.
@@JurekOK Honestly, should I fine-tune my models? It seems like a lot of work when people have probably already made really good fine-tunes, especially for things like coding.
@@nattyzaddy6555 Sure enough, if you feel that you are not going to gain anything by your actions, then it's smarter not to do it. I know that I have a couple of applications for tuned models.
Do you have any technique for understanding which target modules I should select for a custom model?
As you said, in this case it was target_modules=["query_key_value"], but if it were another model...
You can simply execute print(model) after the base model is downloaded.
You will find the names of the sets of parameters in the printed model.
In the model shown in the video, you will find "query_key_value" along with other names, say "fnn".
These other names can be put in the list along with "query_key_value" and given to the target_modules argument.
That way, the config will have the information about the trainable parameters.
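Besides print(model), you can also walk named_modules() and collect the Linear layer names programmatically. A toy example (the tiny model here is just for illustration; on a real model the same loop surfaces names like "query_key_value" and "dense_h_to_4h"):

```python
# Sketch: discover candidate target_modules by listing the Linear layers.
# Toy model for illustration; run the same loop on your real model.
import torch.nn as nn

model = nn.ModuleDict({
    "query_key_value": nn.Linear(8, 24),
    "dense": nn.Linear(8, 8),
})

linear_names = sorted(
    name.split(".")[-1]                 # keep only the leaf module name
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
)
print(linear_names)  # -> ['dense', 'query_key_value']
```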
Could you explain how I can retrain the new model? Also, how can I use the newly trained model in further scripts?
Hi, how can we find the original implementation behind any library, like QLoRA or LoRA? I know it's a basic question, but how do we check? Can you suggest, please?
You are the best, thank you bro!
Can you show how to train using a CSV file or something from Google Drive?
Please let me know more about target_modules and how we can check this for any pretrained model.
Is it the same for the GODAL model? And can you make a video explicitly on fine-tuning a transformer model?
I am getting an error that model.to() doesn't support 4-bit or 8-bit models and to use the model as it is. Any help? I'm getting this in the second cell, where we define the model.
Bro, when I execute trainer.evaluate() it starts 1125 epochs, but when it reaches 200 the GPU runs out of memory. My aim is to find accuracy and F1. What should I do? Please guide me, sir.
This is beautiful ❤
Thanks
Hello sir. I am a complete newbie. Please make videos on how we can run AIs locally (a noob guide). Also tell us the computing resource requirements for each corresponding AI. Thanks.
Why is the total amount of parameters 10B when the model name says 20B?
Hey, can I connect or talk with you? I want to fine-tune Falcon 40B. What GPUs are required, any idea?
I knew what LoRA is. Now I've learnt what QLoRA is.
🚀
0.08%, not 8%. This is given directly as a percentage; it does not sum to 1, but to 100. But a great video!
Does this work for other languages? If I want, for example, to fine-tune it for Arabic-language tasks like summarization or classification, would this give me good results?
Does it support Llama 2 as well? Please put out a video on the same.
I'd like to know that as well
Can you share the Colab link again? I could not find it in your comments.
can you recommend a primer to understand LoRA?
Can I apply what you did with Falcon 40B?
Did you try it with Falcon using the same code?
I think it's actually .08%. Not 8% as you said
Oh. I need to check again.
0.08 is 8%
Yes, but in the video it says 8,650,752 trainable parameters out of 10,597,552,128 parameters overall. That gives a ratio of about 0.000816, or about 0.08%, which is even more impressive. But yeah, it's an easy one to confuse. Or correct me if I am wrong, but I don't think so.
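The arithmetic checks out; you can verify it in two lines using the numbers from the video:

```python
# Trainable-parameter ratio from the numbers printed in the video.
trainable = 8_650_752
total = 10_597_552_128
pct = 100 * trainable / total
print(f"{pct:.4f}%")  # -> 0.0816%
```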
Doesn't the model have 20B parameters? Why does the log report 10B?
Does that mean it can be trained on Colab?
Depends on the size of the model. But my demonstration is on free Colab, fine-tuning a 20B parameter model.
We can fine tune a 20B param model on 16GB VRAM 😅😰😰😰😰😰
We are actually training 0.08% of the params.
It would be better if you used a different dataset - something instruction based. This dataset is kind of useless. Explanation on using your own dataset would make this video more useful.
Great suggestion!
@@1littlecoder Great work. Is there an update available for input-output finetuning?
Yes can you do that please
@@1littlecoder can you do this please
@@areefa6268 Already done it ua-cam.com/video/2PlPqSc3jM0/v-deo.html
what are your thoughts about getting a PhD?
It's useless 😂
Your colab doesn't work.
Could you make a tutorial on generating a dataset in CSV or JSON form using LLMs with a Python script?
Interested as well in a tutorial with json input-output CausalLM dataset.
what is the difference between system ram and GPU ram?
System RAM communicates with the CPU. GPU RAM (VRAM) is built onto the graphics card and is essentially separate from the rest of the system. Some newer systems, like the Xbox Series and PS5 consoles, and Apple Silicon (M1 & M2) machines, have a shared memory pool which can be used for either function.
There is a strong likelihood that if you aren't already aware of your GPU's VRAM, your system uses an integrated GPU built into the CPU, which won't be sufficient for running a local LLM. However, if your CPU has 6 or 8+ cores, you may be able to use koboldcpp to run GGML models on your CPU, at a reduced response time.
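If you have PyTorch installed, a quick way to check whether you have a usable CUDA GPU and how much VRAM it has (just a convenience snippet, nothing from the video):

```python
# Quick check: do we have a dedicated CUDA GPU, and how much VRAM (GPU RAM)?
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
else:
    print("No CUDA GPU detected -- likely an integrated GPU or CPU-only machine.")
```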
@1littlecoder It's giving you those results, and it is supposed to, precisely because it's a sentence-completion model. So if you input "Elon Musk", it tries to complete the sentence. Thank you :)
Can QLoRA be applied to other non-LLM models such as Swin Transformer?