Would love to see more videos on preparing training data for specific domains.
Classification, summarization, and sentiment analysis all seem clear enough, but more complex tasks are a bit of a mystery.
For example, training question answering and other instruction-following for a specific knowledge base like a code library, book, etc. How can the original content from the source be used in the response? Is fine-tuning a use case here, or can we only use embeddings for domain-specific knowledge?
I'm searching 🔍 for the same thing.
Yes!! I'd love to see a similar thing as well. @robxmccarthy, have you been able to find anything in the meantime?
I'm trying to fine-tune a model for summarization tasks. I'm not quite sure how to use the question-answer pairs to fine-tune the model. Is there any documentation on how to do it?
I’m working on something similar
Very informative as always.
Thank you Tarun
Thanks for the summarised video😊
Little correction: the target_modules in the LoraConfig should include the dense layers as well, as per the paper.
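For anyone wondering what that looks like in practice, here's a rough sketch of a LoraConfig that adapts the dense/MLP layers alongside the attention projection (the QLoRA paper adapts all linear layers). The module names below match GPT-NeoX-style models like the one in the video; other architectures use different names, and the r / alpha / dropout values here are just illustrative defaults, not anything from the video:

```python
# Sketch: LoRA config targeting attention *and* dense layers, per the QLoRA paper.
# Module names are for GPT-NeoX-style models; check yours with print(model).
from peft import LoraConfig

config = LoraConfig(
    r=16,                    # illustrative rank, not from the video
    lora_alpha=32,
    target_modules=[
        "query_key_value",   # fused attention projection
        "dense",             # attention output projection
        "dense_h_to_4h",     # MLP up-projection
        "dense_4h_to_h",     # MLP down-projection
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```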
Oooh. Thanks for the information
I have been following your channel for a while now. All your videos are informative, with no clickbait material. I am a YouTuber myself (food vlogger, not tech vlogger LoL) and I know the state of Indian YouTube nowadays. Compared to that, your videos are gems in themselves. About this video: can you tell us approximately how much time it takes to complete the fine-tuning process? Will it reduce if I subscribe to Colab Pro? Are there any other free or relatively cheaper alternatives to Colab?
Does this all need to happen in the cloud? I understand the training part, but what if I want to use it locally? Can I load this fine-tuned model into gpt4all, for example? And what constitutes a "dataset"? Can I just give it a bunch of C source files, train it, and then ask it questions about the code? Sorry if my questions are ignorant; my background is in EE and nothing to do with AI.
Is it possible to use QLoRA to fine-tune LLMs on labelled data? My downstream task is supervised text classification
Sounds like overkill; supervised text classification shouldn't require such a big model, no?
Hey, what is the best server site to host your LLM? I mean for massive use, not locally.
Brilliant. Thank you.
You're welcome
I have a question: the dataset has the features ['quote', 'author', 'tags', 'input_ids', 'attention_mask']. How does the trainer know which features to select for training (fine-tuning)?
I'm also wondering the same thing! Have you managed to find an answer?
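Not from the video, but as far as I know this is what the HF Trainer does by default (remove_unused_columns=True): it inspects the model's forward() signature and keeps only the matching columns, so 'quote', 'author', and 'tags' get dropped and only the tokenized features reach the model. A toy stand-alone sketch of that filtering logic:

```python
# Sketch of the Trainer's default column filtering (remove_unused_columns=True):
# keep only dataset columns that the model's forward() accepts.
import inspect

def filter_columns(example: dict, forward_fn) -> dict:
    accepted = set(inspect.signature(forward_fn).parameters)
    return {k: v for k, v in example.items() if k in accepted}

# Stand-in forward() with the usual causal-LM signature:
def forward(input_ids=None, attention_mask=None, labels=None):
    ...

row = {"quote": "...", "author": "...", "tags": [],
       "input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
print(sorted(filter_columns(row, forward)))
# -> ['attention_mask', 'input_ids']
```

For causal-LM fine-tuning the labels are then built from input_ids by the data collator, which is why no explicit label column is needed.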
Please fine-tune using a custom dataset, like question answering, and make a video. That would be helpful, thanks!
Interested as well.
What is the max parameter size that a 16 GB GPU can train with QLoRA?
The Last Ben repository is not working on free Colab anymore; can you create a video on alternatives?
Let me check
How do you fix the "CUDA out of memory" error in the free version of Google Colab? I get that when fine-tuning an LLM.
Quantize the model to 4 bits.
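Concretely, that usually means loading the base model with a 4-bit quantization config, which is the QLoRA approach. A rough sketch (the model name is just an example; swap in whatever you're fine-tuning):

```python
# Sketch: load the base model 4-bit quantized (QLoRA-style) to avoid CUDA OOM.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,       # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",            # example model, use your own
    quantization_config=bnb_config,
    device_map="auto",
)
```

Lowering the batch size and enabling gradient checkpointing also help if it still OOMs.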
Why would I bother fine-tuning a giant 65B model that I wouldn't be able to run anyway? So that I can quantize it or something? And that makes me wonder: should you fine-tune then quantize, or quantize then fine-tune?
Yes, the point is that you quantize the 65B model down to 4-bit so that it fits in ~16 GB of a 24 GB GPU, and you have some space left over for uptraining or batch evaluation.
@@JurekOK Honestly, should I fine-tune my models? It seems like a lot of work when people have probably already made really good fine-tunes, especially for things like coding.
@@nattyzaddy6555 Sure enough, if you feel that you are not going to gain anything by your actions, then it's smarter not to do it. I know that I have a couple of applications for tuned models.
Do you have any technique for understanding which target modules I should select for a custom model?
As you said, in this case it was target_modules=["query_key_value"], but if it were another model...
You can simply execute print(model) after the base model is downloaded.
You will find the names of the sets of parameters in the printed model.
In the model shown in the video, you will find "query_key_value" along with other names, say "fnn".
These other names can be put in the list along with "query_key_value" and given to the target_modules argument.
That way, the config will have the information about the trainable parameters.
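Besides print(model), you can also walk named_modules() and collect the Linear layer names programmatically. A toy example (the tiny model here is just for illustration; on a real model the same loop surfaces names like "query_key_value" and "dense_h_to_4h"):

```python
# Sketch: discover candidate target_modules by listing the Linear layers.
# Toy model for illustration; run the same loop on your real model.
import torch.nn as nn

model = nn.ModuleDict({
    "query_key_value": nn.Linear(8, 24),
    "dense": nn.Linear(8, 8),
})

linear_names = sorted(
    name.split(".")[-1]                 # keep only the leaf module name
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear)
)
print(linear_names)  # -> ['dense', 'query_key_value']
```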
Could you explain how I can retrain the new model? Also, how can I use the newly trained model in further scripts?
Hi, how can we find the original implementation behind any library, like QLoRA or LoRA? I know it's a basic question, but how do we check? Can you suggest, please?
You are the best, thank you bro!
Can you show how to train using a CSV file or something from Google Drive?
Please let me know more about target_modules and how we can check this for any pretrained model.
Is it the same for the GODAL model? And can you make a video explicitly on fine-tuning a transformer model?
I am getting an error that model.to() doesn't support 4-bit or 8-bit models and to use the model as it is. Any help? I'm getting this in the second cell, where we define the model.
Bro, when I execute trainer.evaluate() it starts 1125 epochs, but when it reaches 200 the GPU runs out of memory. My aim is to find accuracy and F1. What should I do? Please guide me, sir.
This is beautiful ❤
Thanks
Hello sir. I am a complete newbie. Please make videos on how we can run AIs locally (a noob guide). Also tell us the computing resource requirements for each corresponding AI. Thanks.
Why is the total amount of parameters 10B when the model name says 20B?
Hey, can I connect or talk with you? I want to fine-tune Falcon 40B. What GPUs are required, any idea?
I knew what LoRA is. Now I've learnt what QLoRA is.
🚀
0.08%, not 8%. This is given directly as a percentage; it does not sum to 1, but to 100. But a great video!
Does this work for other languages? If I want, for example, to fine-tune it for Arabic-language tasks like summarization or classification, would this give me good results?
Does it support Llama 2 as well? Please put out a video on the same.
I'd like to know that as well
Can you share the Colab link again? I could not find it in your comments.
can you recommend a primer to understand LoRA?
Can I apply what you did with Falcon 40B?
Did you try it with Falcon using the same code?
I think it's actually .08%. Not 8% as you said
Oh. I need to check again.
0.08 is 8%
Yes, but in the video it says 8,650,752 trainable parameters out of 10,597,552,128 parameters overall. That gives a ratio of about 0.000816, or about 0.08%, which is even more impressive. But yeah, it's an easy one to confuse. Or correct me if I am wrong, but I don't think so.
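The arithmetic checks out; you can verify it in two lines using the numbers from the video:

```python
# Trainable-parameter ratio from the numbers printed in the video.
trainable = 8_650_752
total = 10_597_552_128
pct = 100 * trainable / total
print(f"{pct:.4f}%")  # -> 0.0816%
```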
Doesn't the model have 20B parameters? Why does the log report 10B?
Does that mean it can be trained on Colab?
Depends on the size of the model. But my demonstration is on free Colab, fine-tuning a 20B parameter model.
We can fine tune a 20B param model on 16GB VRAM 😅😰😰😰😰😰
We are actually training 0.08% of the params.
It would be better if you used a different dataset - something instruction based. This dataset is kind of useless. Explanation on using your own dataset would make this video more useful.
Great suggestion!
@@1littlecoder Great work. Is there an update available for input-output finetuning?
Yes can you do that please
@@1littlecoder can you do this please
@@areefa6268 Already done it ua-cam.com/video/2PlPqSc3jM0/v-deo.html
what are your thoughts about getting a PhD?
It's useless 😂
Your colab doesn't work.
Could you make a tutorial on generating a dataset in CSV or JSON form using LLMs with a Python script?
Interested as well in a tutorial with json input-output CausalLM dataset.
what is the difference between system ram and GPU ram?
System RAM communicates with the CPU. GPU RAM (VRAM) is built onto the graphics card and is essentially separate from the rest of the system. Some newer systems, like the Xbox Series and PS5 consoles, and Apple Silicon (M1 & M2) machines, have a shared memory pool which can be used for either function.
There is a strong likelihood that if you aren't already aware of your GPU's VRAM, your system uses an integrated GPU built into the CPU, which won't be sufficient for running a local LLM. However, if your CPU has 6 or 8+ cores, you may be able to use koboldcpp to run GGML models on your CPU, at a reduced response time.
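If you have PyTorch installed, a quick way to check whether you have a usable CUDA GPU and how much VRAM it has (just a convenience snippet, nothing from the video):

```python
# Quick check: do we have a dedicated CUDA GPU, and how much VRAM (GPU RAM)?
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
else:
    print("No CUDA GPU detected -- likely an integrated GPU or CPU-only machine.")
```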
@1littlecoder It's giving you those results, and it is supposed to, precisely because it's a sentence-completion model. So if you input "Elon Musk", it tries to complete the sentence. Thank you :)
Can QLoRA be applied to other non-LLM models such as Swin Transformer?