Fine-tuning LLM with QLoRA on Single GPU: Training Falcon-7b on ChatBot Support FAQ Dataset

Venelin Valkov

Published 2 Oct 2024

COMMENTS • 122

@venelin_valkov 1 year ago ⁺¹¹
Full text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engineering/fine-tuning-llm-on-custom-dataset-with-qlora
@ko-Daegu 1 year ago ⁺¹
Is this approach only for fine-tuning Falcon, or does it work for any open-source model? Also, is it possible to fine-tune a model to pick up a new language? For example, it was never trained on French, and now it can answer questions in French?
@mariocuezzo8027 1 year ago ⁺¹
@@ko-Daegu I want to know this too!
@shivamkapoor7634 1 year ago ⁺¹
I pushed my model to Hugging Face. Can you please tell me how I can deploy that model?
@LifeTravelerAmmu 1 year ago ⁺¹⁸
Hello Venelin, can you please provide the Colab notebook (falcon-qlora-fine-tuning.ipynb), if possible?
@dataflex4440 1 year ago ⁺⁵
Please make a video on how to increase inference speed; that is the major problem everyone is facing.
@maidacundo3471 1 year ago ⁺³
When adding new special tokens, shouldn't you add those tokens to the tokenizer, resize the embedding layer of the model, and fine-tune it? I think this should help the model during training, but it would also increase the number of trainable parameters.
@quachhengtony7651 1 year ago ⁺²
Is the model multilingual? Can I fine-tune it in another language?
@ggximenez 1 year ago ⁺²
Does anyone know how to fine-tune a QLoRA on top of another LoRA for a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of the Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA on top of that.
@Jeong5499 1 year ago ⁺¹
My model generates multiple redundant answers, e.g. : xxxx : xxxx : xxxx : xxxx. How can I solve this?
@ChuanMeng-q4f 1 year ago ⁺⁴
For the tokenizer, I think we should set padding_side="left", because it is a causal LLM. What do you think?
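On the padding question above: a decoder-only model appends new tokens at the right end of each row, so with right padding the continuation of a short prompt would start after pad tokens rather than after the prompt itself. The sketch below only illustrates that layout in plain Python; the token ids and `pad_id=0` are made up, and `pad_batch` is a hypothetical helper, not a transformers function. Whether left padding also matters during training depends on how labels and attention masks are built, so treat this as an illustration for batched generation.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad token-id lists to equal length and build matching attention masks."""
    width = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        gap = width - len(seq)
        if side == "left":
            # Pads go first, so the prompt ends at the last column
            padded.append([pad_id] * gap + list(seq))
            masks.append([0] * gap + [1] * len(seq))
        else:
            # Pads go last, so short prompts end before the last column
            padded.append(list(seq) + [pad_id] * gap)
            masks.append([1] * len(seq) + [0] * gap)
    return padded, masks


# Hypothetical token ids for two prompts of different length
batch = [[11, 12, 13], [21, 22]]
left_ids, left_mask = pad_batch(batch, pad_id=0, side="left")
right_ids, right_mask = pad_batch(batch, pad_id=0, side="right")

print(left_ids)   # every row ends in a real prompt token
print(right_ids)  # the short row ends in a pad token
```

With left padding, the token generated next always directly follows the last real prompt token in every row of the batch.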
@bolarinwarahmonismail8248 1 year ago ⁺²
Does it work without high RAM? I'm using the free version.
@meenalpatidar9405 1 year ago ⁺²
Can someone please share the code used in this tutorial?
@gokhanersoz5239 1 year ago ⁺²
"IndexError: Invalid key: 78 is out of bounds for size 0": do you get this error? I have tried everything but cannot solve it. @venelin_valkov
@henkhbit5748 1 year ago ⁺³
Great video, and very interesting if you want to fine-tune with your own dataset 👍 A pity that the response took a long time… any idea how to make it faster?
@LinPure 1 year ago ⁺¹
I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this code block:
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
Does anyone have any idea how I could solve this? I'm not sure whether the problem is caused by my using 'prepare_model_for_int8_training' instead of 'prepare_model_for_kbit_training', since I got the error "cannot import name 'prepare_model_for_kbit_training' from 'peft'" even on the latest version of the peft library.
@ikjb8561 1 year ago ⁺³
Great video. Would the response times be faster with a better GPU?
@Mohith7548 1 year ago ⁺²
I get this error. Any idea how to resolve it?
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 63 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
@TheKizoch 1 year ago
I get the same error. Were you able to resolve it?
@yusufkemaldemir9393 1 year ago ⁺²
Some of the recently released models do not work on M2 macOS. Any idea whether you could make this feasible on an M2 Max Mac? Thanks
@venelin_valkov 1 year ago
No idea at the moment, there is still no paper with details on the model. You might try the "quickstart" with the transformers library here: huggingface.co/tiiuae/falcon-7b-instruct
@shivamkapoor7634 1 year ago ⁺¹
How do I deploy this chatbot model after pushing it to Hugging Face? I'm talking about the QLoRA fine-tuned model.
@venelin_valkov 1 year ago ⁺¹
I made a video on this topic: ua-cam.com/video/HI3cYN0c9ZU/v-deo.html
Thank you for watching!
@pvlr1788 1 year ago ⁺²
I don't get why inference is so slow.
It should be at least as fast as training. It's true that each "generate" call means the model runs inference multiple times, does beam search, etc., but the same thing happens when you train the model. What am I missing?
@Timotheeee1 1 year ago
When you train the model, it is trained on every token in the text batch at once (it outputs logits at every position).
@pvlr1788 1 year ago
@@Timotheeee1 OK, I see. You mean that during training the model DOES NOT do beam search. Am I right?
It just tries to minimize the cross-entropy loss on the next token. I guess beam search is not even differentiable...
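The difference discussed in this thread can be made concrete by counting forward passes. The sketch below uses a hypothetical `CountingModel` stand-in (not a real LLM, and not any library's API): a training step scores every next-token position in a single pass, while generation needs one pass per new token because each token depends on the one before it.

```python
class CountingModel:
    """Hypothetical stand-in for an LLM that counts its forward passes."""

    def __init__(self):
        self.forward_calls = 0

    def forward(self, token_ids):
        # One call returns a next-token prediction for every input position
        self.forward_calls += 1
        return [(t + 1) % 100 for t in token_ids]


def training_step(model, token_ids):
    """Teacher forcing: a single pass covers all positions of the sequence."""
    predictions = model.forward(token_ids[:-1])
    targets = token_ids[1:]
    # Toy mismatch count standing in for the cross-entropy loss
    return sum(p != t for p, t in zip(predictions, targets))


def generate(model, prompt_ids, max_new_tokens):
    """Greedy decoding: one pass per generated token, strictly in sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = model.forward(ids)[-1]
        ids.append(next_id)
    return ids


train_model = CountingModel()
training_step(train_model, list(range(50)))

gen_model = CountingModel()
generated = generate(gen_model, [1, 2, 3], max_new_tokens=20)

print(train_model.forward_calls, gen_model.forward_calls)
```

Real implementations cache attention keys and values so each generation pass is cheaper, but the passes still run one after another, which is a large part of why generating a long answer takes longer than one training step over text of the same length.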
@chanderbalaji3539 1 year ago ⁺¹
I followed the code above and got the following output:
return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 1
Kindly help a newbie. The only change I made was commenting out device_map="auto" when loading the base model, as I have dual GPUs and it was throwing an error with 8-bit.
@sandesh-n3r 10 months ago
Can we train the model with context (Question: " ", Context: " ", Answer: " "), so that the model answers from the context, like RAG?
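One way to picture the question above is as a change to the prompt template only: each training example becomes a single string that contains the context, the question, and the answer, and at inference time the same template is used with the answer left off. The helper below is a hypothetical sketch; the field labels and their order are assumptions, not the format used in the video.

```python
def build_prompt(question, context, answer=None):
    """Format one example; omit the answer to get an inference-time prompt."""
    prompt = f"Context: {context.strip()}\nQuestion: {question.strip()}\nAnswer:"
    if answer is not None:
        # Training text: the answer is appended so the model learns to produce it
        prompt += f" {answer.strip()}"
    return prompt


# Made-up example row
train_text = build_prompt(
    question="How long do refunds take?",
    context="Refunds are processed within 5 business days.",
    answer="Within 5 business days.",
)
inference_text = build_prompt(
    question="How long do refunds take?",
    context="Refunds are processed within 5 business days.",
)
print(train_text)
```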
@thisurawz 8 months ago
Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and text, for relation extraction or another specific task? Can you do it using an open-source multimodal LLM and multimodal datasets, such as Video-LLaMA, so that anyone can extend their experiments with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?
@cryptojointer 1 year ago
What does bnb_4bit_use_double_quant=True do? I tried searching for answers and came up with nothing! lol
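As described in the QLoRA paper, double quantization quantizes the quantization constants themselves. Block-wise 4-bit quantization stores one scaling constant per block of weights; with double quantization those constants are stored in 8 bits, with a second, much smaller set of 32-bit constants on top. The arithmetic below uses the block sizes reported in the paper (64 weights per block, 256 constants per second-level block); treat them as assumptions about what bitsandbytes does internally for any given version.

```python
def constant_overhead_bits(block_size=64, double_quant=False, second_block_size=256):
    """Extra bits per parameter spent on quantization constants."""
    if not double_quant:
        # One 32-bit constant per block of weights
        return 32 / block_size
    # 8-bit constants, plus one 32-bit constant per group of first-level constants
    return 8 / block_size + 32 / (block_size * second_block_size)


plain = constant_overhead_bits()
double = constant_overhead_bits(double_quant=True)
saved_bits = plain - double

# Rough saving for a model with 7 billion parameters, in gigabytes
saved_gb_7b = saved_bits * 7e9 / 8 / 1e9

print(f"overhead without double quantization: {plain:.3f} bits/param")
print(f"overhead with double quantization:    {double:.3f} bits/param")
print(f"approximate saving for 7B parameters: {saved_gb_7b:.2f} GB")
```

Under these assumptions the option saves roughly 0.37 bits per parameter, which is a few hundred megabytes for a 7B model.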
@minhducha8574 10 months ago
How do we compute metrics for this model? When I added compute_metric to the trainer, it raised an error. Can you please add compute_metric?
@priyabnsl 1 year ago ⁺³
Please share the notebook
@weystrom 1 year ago ⁺³
How much VRAM did you end up using?
@venelin_valkov 1 year ago ⁺¹
Google Colab showed 6.9GB VRAM and 4.6GB RAM during training (with the parameters shown in the video). Not sure how accurate that is, though.
@ElearningMode 1 year ago ⁺¹
Thanks for the great video. Can we merge the adapter.bin back into its original model? Can you make a video on it?
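On merging: a LoRA adapter stores two small matrices per adapted layer, and merging folds their product back into the base weight as W + (alpha / r) * B A. The plain-Python sketch below shows that arithmetic on tiny made-up matrices; `merge_lora` is a hypothetical helper. In the peft library, `merge_and_unload()` on a loaded adapter model performs this kind of merge; whether it can be applied directly to a 4-bit quantized base depends on the library version, so loading the base model in half precision for merging is a common workaround.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Made-up 2x2 base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape: rank x in_features
B = [[0.5], [1.0]]    # shape: out_features x rank

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, since the merged weight already contains their contribution.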
@joaoalmeida4380 1 year ago ⁺¹
Hi, thank you for the video! Suppose I want a small model like Falcon 7B, or another model like T5, to build bots for QA or FAQ, but I need to use and tune it for my own language, e.g. Portuguese or Spanish. What's your suggestion? I don't think I need a large multilingual model for this 😅
@sithlordi5170 1 year ago ⁺¹
Wow, finally a working guide on how to fine-tune LLMs. Thank you very much 🙏
@biodata-i1e 1 year ago ⁺¹
I tried the example and got stuck on the training part with the error IndexError: Invalid key: 78 is out of bounds for size 0. Has anyone faced something similar?
@gokhanersoz5239 1 year ago
Were you able to solve that?
@riyajatar6859 10 months ago
If it's an assistant model, shouldn't it respond only when the human asks it a question?
Here it generates the questions and answers on its own.
@AnimeOtakuArt 1 year ago ⁺¹
Can you do QLoRA for a text-summarization task on Falcon-7B? That would be very helpful. Cheers 🍻🍻
@SAVONASOTTERRANEASEGRETA 1 year ago ⁺¹
Hello, since you are very good, can you explain two simple things to me? 1) Why do assistants find less than half of what is in the file? Example: a search for Julius Caesar (it appears 1000 times, but they only find it 10-20 times). 2) Are there any GGML templates specialized in history? Thanks, Claudiio
@ashioyajotham 1 year ago ⁺¹
Thank you so much! Just curious, can it run on free Colab?
@georgetarida5653 1 year ago ⁺²
Does the custom dataset need to be in English, or can it be in any language?
@venelin_valkov 1 year ago ⁺³
The Common Crawl dataset (used for this model) contains 40+ languages, so you should be able to use different languages. I haven't tried it myself, though. More info here: commoncrawl.org/
That being said, their dataset "RefinedWeb" contains primarily English: huggingface.co/datasets/tiiuae/falcon-refinedweb
@subhamchoudhary4091 1 year ago
I loaded the trained model and it downloaded the whole model again. When I tried generating text for my use case with the trained weights, it didn't give the correct result.
@thevitorialima 1 year ago ⁺¹
I just subscribed!! Your tutorials are straightforward and to the point. Love your content. Keep up the amazing work! 🙌 ✨✨✨
@aimaven 1 year ago ⁺¹
How do we add our own data? Just change the link in the Jupyter notebook?
@kaihaoliu7869 1 year ago
Can you share the link to your notebook?
@sumitmamoria 1 year ago ⁺¹
Why is inference consistently slower? Do we know how to speed it up?
@amnasherafal 1 year ago ⁺¹
Nice video, Venelin Valkov. I wanted to ask: if I have an input size of 4k+ tokens, can I train it on a single GPU?
@AIwithParissan 10 months ago
Many thanks. Could we have a Colab link or file?
@mariocuezzo8027 1 year ago ⁺¹
Excellent video! I need to configure and train a local GPT to chat with an SQL database. Which is the better option for fine-tuning with a single GPU for that?
@TailorJohnson-l5y 1 year ago ⁺⁴
I watch all of your videos, they are wonderful. This one is BY FAR my fav. I know it must have taken a lot of time, but THANK YOU so much for doing it! It is so thorough. Can we do the same thing with MPT-7B?
@venelin_valkov 1 year ago ⁺¹
I would guess the training process can be similar for MPT-7B, but can't be sure. Try it and let me know.
Thank you for watching!
@TailorJohnson-l5y 1 year ago ⁺²
@@venelin_valkov I will try and let you know!
@tadificilaxalogin 1 year ago ⁺¹
@@TailorJohnson-l5y Did it work? :D
@TailorJohnson-l5y 1 year ago ⁺¹
@@tadificilaxalogin Idk what I'm doing wrong here, but I have tried to reply to this 4 times and after a day or so it gets removed... It does not work with MPT-7B.
@tadificilaxalogin 1 year ago
@@TailorJohnson-l5y Thanks!! I have made progress with Falcon 40B and RedPajama. Unfortunately, it seems to be difficult to use this algorithm with more than one GPU. Have you set your prompt style for training? I am running these tests now.
@Shachu-e3o 1 year ago
Too bad you don't make videos in Russian...
@MattJonesYT 1 year ago ⁺²
With CUDA you can launch many threads at the same time for a single kernel to solve a problem. Is there a way to do something similar with GPT models? I asked ChatGPT and it basically said the limiting factor would probably be memory: each thread might take up about 0.5 GB. So, for instance, if you have 4 GB of free GPU RAM after loading the model, you should in theory be able to run 8 queries through the GPU at a time. How would that be done with a local GPT?
@pvlr1788 1 year ago ⁺²
As far as I know, if you have free GPU memory, you simply do batched inference; I guess some kind of CUDA multithreading takes place there. You can see that the training batch size is 1. I guess a bigger batch would cause a GPU OOM error.
@MattJonesYT 1 year ago
@@pvlr1788 Thank you!!! "Batched inference" is exactly the term I was looking for. I see there are scripts for getting that working on various GPT models, so it is correct.
@pvlr1788 1 year ago
@@MattJonesYT It should work for every model, as long as you have enough CUDA memory. In the case of a 7B model, you probably need a top-tier GPU to run inference on a batch bigger than 1.
@Ryan-yj4sd 1 year ago
Deploying this model as an API endpoint on Hugging Face currently fails. Do you know how to fix it?
RuntimeError(f\"weight {tensor_name} does not exist\")
RuntimeError: weight transformer.word_embeddings.weight does not exist
"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
@ShaileshPatel-s1p 1 year ago
I have two sample datasets like the ones below:
1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...]
2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ]
How can I fine-tune on the above datasets using the Falcon LLM?
Please help me
@IchSan-jx5eg 1 year ago ⁺¹
Hello, great video. Let me ask some questions here:
1. What should I do if my training loss does not decrease consistently (sometimes up, sometimes down)?
2. How do I use multiple GPUs? I always get OOM with Falcon-40B, so I rented 2 GPUs from a cloud provider. Unfortunately, it ran on just 1 GPU.
@yashjain6372 1 year ago
Read about the DeepSpeed package
@amparoconsuelo9451 1 year ago
Can a subsequent SFT and RLHF with different, additional, or less content change the character of, improve, or degrade a GPT model? Can you modify a GPT model?
@josephtsangko3558 10 months ago
Really nice! Thanks for the clear explanation! I wonder, what is the loss function's input here? What is being compared? Is this self-supervised? So opaque!
@sathvikreddy4807 1 year ago
Hey there,
how do I create a generative AI chatbot with my own data?
Let us say I have data about a company and I want to create a "ChatGPT"-like thing that can answer the questions I have about that data.
I searched the internet today and found:
1) Data collection
2) Data preprocessing
3) Selecting a pre-trained model (because it is easier than creating one)
4) Fine-tuning the model
5) Iteration
This is my understanding as of now.
So basically, how do I preprocess the data?
Do I have to learn NLP for that?
@nourghaliaabassi931 1 year ago ⁺¹
Is the notebook available?
@sharathgilla4412 1 year ago
Can someone help me out?
I am trying to fine-tune Dolly V2 using the above method, but I'm getting the kind of output the model gave before fine-tuning in the video; I'm not getting a single response as output.
If anyone has faced this issue and fixed it, please let me know. Do I need to change any config or the model?
Suggestions are welcome!
Thanks
@ghezalahmad 1 year ago ⁺¹
Thank you so much
@lifeofcode 1 year ago ⁺¹
I was getting an error from the trainer, "paged_adamw_8bit is not a valid optimizer names", even though I used the same git URLs with short commit hashes as shown in the video for the pip install command. I ended up having to clone and install transformers from source to get a transformers library with the "paged_adamw_8bit" option.
@venelin_valkov 1 year ago ⁺¹
Strange, I just reran the notebook (without changes) and training started as usual.
@lifeofcode 1 year ago
I must have messed up my pip install commands somehow, though I'm not sure how, since I was able to find the commit hash in the GitHub logs. Still, pip gave a "did not find branch or tag 'e03a9cc' assuming revision or ref" error. Luckily I was able to get past it, and everything worked beautifully. Thank you!
@safihaider6715 1 year ago
I am getting an error while executing trainer.run(): "can't copy out of meta tensor, no data!"
@sherryhp10 1 year ago ⁺¹
wow wow wow man
@d_b_ 1 year ago ⁺¹
Fantastic tutorial.
Does the training data need to be in question/answer format? Would this work if the data were instead a single large block of text, without that structure?
Do the models need to be on the Hugging Face servers for inference?
@enggm.alimirzashortclipswh6010 1 year ago
Never fine-tune your model on raw data; however, you can do pre-training on raw text.
@d_b_ 1 year ago
@@enggm.alimirzashortclipswh6010 So there's no concept of something like "unsupervised fine-tuning"? If I wanted to adapt an LLM on emails I've sent so that it sounds more like me, I would not want to train from scratch, would I?
@yashjain6372 1 year ago
@d_b
@enggm.alimirzashortclipswh6010 How do I fine-tune if the data looks like this?
Review (col1)
Nice cell phone, big screen, plenty of storage. Stylus pen works well.
Analysis (col2)
[{"segment": "Nice cell phone", "Aspect": "Cell phone", "Aspect Category": "Overall satisfaction", "sentiment": "positive"}, {"segment": "big screen", "Aspect": "Screen", "Aspect Category": "Design", "sentiment": "positive"}, {"segment": "plenty of storage", "Aspect": "Storage", "Aspect Category": "Features", "sentiment": "positive"}, {"segment": "Stylus pen works well", "Aspect": "Stylus pen", "Aspect Category": "Features", "sentiment": "positive"}]
@brijeshkaran5369 1 year ago ⁺¹
You're the best 💯 Thanks a lot for the video! Can you please upload a video implementing this tutorial using the LangChain framework? 🥺
@venelin_valkov 1 year ago ⁺²
You mean use the trained model with LangChain?
Thank you for watching!
@brijeshkaran5369 1 year ago
@@venelin_valkov Yes, so it'll be a useful "end to end" implementation for the community 🙂
@AI_ML_DL_LLM 1 year ago ⁺¹
Thanks for the video. The masked language modelling (MLM) option is set to "False", so how is the model fine-tuned?
@venelin_valkov 1 year ago ⁺²
Using "just" language modelling (predict the next token). More info here: paperswithcode.com/task/language-modelling
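To spell out what "predict the next token" means for the loss: the targets are simply the input tokens shifted by one position, and the loss is the cross-entropy between the model's predicted distribution at each position and the token that actually comes next. No separate labels are needed, which is why this is often called self-supervised. The sketch below uses made-up token ids and a made-up uniform "model output"; it is an illustration of the objective, not the Trainer's actual implementation.

```python
import math


def next_token_pairs(token_ids):
    """Pair each token with the token that follows it."""
    return list(zip(token_ids[:-1], token_ids[1:]))


def cross_entropy(probs_per_position, targets):
    """Mean negative log-probability assigned to the true next tokens."""
    losses = [-math.log(probs[t]) for probs, t in zip(probs_per_position, targets)]
    return sum(losses) / len(losses)


# Made-up token ids for one training sequence
tokens = [5, 7, 9, 2]
pairs = next_token_pairs(tokens)
targets = [target for _, target in pairs]

# Hypothetical model output: a uniform distribution over four tokens per position
uniform = [{5: 0.25, 7: 0.25, 9: 0.25, 2: 0.25} for _ in targets]
loss = cross_entropy(uniform, targets)

print(pairs)
print(round(loss, 4))
```

A model that guesses uniformly over four tokens gets a loss of ln 4; training pushes the probability of the true next token up and the loss down.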
@PavPetukhov 1 year ago ⁺¹
Wow, thanks a lot for the video!
@RoyRajjyoti 1 year ago ⁺¹
Great video, Venelin. I tried to implement QLoRA using your code, but I am getting this error: "RuntimeError: unscale_() has already been called on this optimizer since the last update()."
@LifeTravelerAmmu 1 year ago
Where can you get the code? Are you typing it manually?
@kaihaoliu7869 1 year ago
I have that too. How did you solve it?
@IchSan-jx5eg 1 year ago
@@kaihaoliu7869 I had to install transformers==4.30.1 instead of the newest dev version of transformers to get rid of the error.
@flaviovitoriano2429 1 year ago
Can anyone help me, please? I get the following error in the training part: IndexError: Invalid key: 78 is out of bounds for size 0
@flaviovitoriano2429 1 year ago
The error occurs on the following line:
trainer.train()
@alyssonmach 1 year ago
A question a little outside the context of the video... are deep learning models used as widely as classical machine learning models on tabular data?
@odev6764 1 year ago ⁺¹
I followed your video, but I'm struggling with repeated answers. The only modification I made was not pushing the model to Hugging Face after training, and it keeps repeating the ending text. I tried changing the dataset to a larger one I have in Portuguese and set max_steps=5000, but I got the same issue. Could you give me a tip to avoid this repetition, like the one you showed in inference before training?
@zorbat5 1 year ago
You should fine-tune it, so use less data. It is pretrained on a huge amount of data.
@zorbat5 1 year ago
Other than that, it's a matter of playing around with different parameters. Try to learn how the parameters affect the behaviour. If it doesn't give you the desired result, go back to the plain downloaded model and train it again.
You'll discover a lot of funny behaviour of the AI with different settings. Also, the parameters are sensitive, so keep that in mind. Don't change too much; take it slow.
@pawancreation2311 1 year ago
Hi, I've been struggling with the same issue for 2 days. I used the sharded version of Falcon and fine-tuned it on a custom QA dataset of 2000 examples that I built. The answer I get is this:
: How JP Morgan help me?
: JP Morgan helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available.
Can you please suggest what to do? As you can clearly see, the text is repeating. Please help me 🙏
@pawancreation2311 1 year ago
@@zorbat5 Please help me, what can I do? 😭
@zorbat5 1 year ago
@@pawancreation2311 Play around, learn what everything does, and get a feel for how the AI reacts to certain parameters or fine-tunes. Oh, and read some books about machine learning to get a better understanding.
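For the repetition problem in this thread, two things are commonly tried. On the generation side, settings such as `max_new_tokens`, `repetition_penalty`, and `eos_token_id` in the transformers generation config limit how far the model can run on. Independently of the model, the decoded text can be cut at the first point where it starts a new turn. The sketch below shows only that second step; the `"\nUser:"` and `"\nBot:"` markers are placeholders for whatever turn markers the training prompts actually used, and `trim_at_stop` is a hypothetical helper.

```python
def trim_at_stop(text, stop_sequences):
    """Cut generated text at the earliest stop sequence, if any appears."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()


# Made-up raw output in which the model keeps going after its answer
raw = "You can reset it from the login page.\nUser: thanks\nBot: You can reset it"
answer = trim_at_stop(raw, ["\nUser:", "\nBot:"])
print(answer)
```

Trimming hides the run-on text but does not make generation faster; stopping at an end-of-sequence token during generation addresses both.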
@tptodorov123 1 year ago ⁺¹
Bravo, Venelin!
@prakaashsukhwal1984 1 year ago ⁺¹
Great video, Venelin. Thanks for sharing! Will you be sharing any similar training video with dialogue datasets for contextual conversations?
@venelin_valkov 1 year ago ⁺¹
Do you have a dataset in mind?
Thanks for watching!
@prakaashsukhwal1984 1 year ago
@@venelin_valkov Somehow I am unable to paste the URLs of the datasets (tried multiple times :( ). I have shared a suggested list in this Google doc. Thanks again for the wonderful set of videos.
docs.google.com/document/d/1wqCKudZnx0XMsJ8J2n1wfOpG68M9chP_8-zeaU7s53g/edit?usp=sharing
@prakaashsukhwal1984 1 year ago
@@venelin_valkov Do you think any of the above datasets are useful? :)
@gokhanersoz5239 1 year ago ⁺¹
Is a T4 enough for training?
@venelin_valkov 1 year ago ⁺¹
The QLoRA adapter is trained using a T4, yes!
@oncelscu8089 1 year ago
Bro, I'm new to this LLM stuff. Could you help me out if I ask a few questions?
@gokhanersoz5239 1 year ago
@@oncelscu8089 Of course
@Purulence-bw7nt 1 year ago ⁺¹
Hi bro. Amazing tutorial. I am getting this error:
"ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True'
'truncation=True' to have batched tensors with the same length. Perhaps your features (`question` in this case)
have excessive nesting (inputs type `list` where type `int` is expected)."
I tried the suggested fixes from Hugging Face and GitHub but can't solve the issue. Any idea how to fix it?
@Purulence-bw7nt 1 year ago
@Pranjal Yadav Thanks for replying. I am following the code line by line. I have tried it on the same dataset he is using. Still getting the same error. Any idea?
@gokhanersoz5239 1 year ago
@@Purulence-bw7nt Did you solve the problem?
@Purulence-bw7nt 1 year ago
@@gokhanersoz5239 No, I couldn't solve it. Have you solved it?
@gokhanersoz5239 1 year ago
@@Purulence-bw7nt No, I couldn't solve it. I used the 8-bit version for opt, without the 4-bit method. However, with the latest updates there have been changes, and different errors occur. opt does not work in the code I write.
@Purulence-bw7nt 1 year ago
@@gokhanersoz5239 I tried other optimizers. That fixes the optimizer issue, but I'm not sure about the performance, since I am not able to start the training process and keep getting the ValueError no matter what I do.
