is this way to finetune Falcon only or any OS model? also, is it possbile to finetune a model to pickup a new langugae ? like it never trained on french now it can answer french questions ?
when adding new special token like and shouldnt you add that tokens to the tokenizer, resize the embedding layer of the model and finetune it? I think this should help the model during the training but also increase the number of trainable paramenters.
Does anyone knnow how to fine tune a QLoRA over another LoRA on a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA over that.
Great video, and very interesting if you want to find tune with your own dataset 👍 a pity that the response took a long time… any idea how to get it faster?
I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this codeblock with torch.inference_mode(): outputs = model.generate( input_ids=encoding.input_ids, attention_mask=encoding.attention_mask, generation_config=generation_config, ) Does anyone have any ideas how I could solve this? Not sure if the problem was caused because I'm using 'prepare_model_for_int8_training' instead of 'prepare_model_for_kbit_training" since I got an error of 'cannot import name 'prepare_model_for_kbit_training' from 'peft'' even on the latest version of peft library
I get this error: Any idea on how to resolve this: RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations. Parameter at index 63 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
No idea at the moment, there is still no paper with details on the model. You might try the "quickstart" with the transformers library here: huggingface.co/tiiuae/falcon-7b-instruct
I don't get why inference is so slow. It should be at least as fast as the training. It's true that each "generate" means the model does inference multiple times, does beam search etc... but the same thing happens when you train the model. What am I missing?
@@Timotheeee1 ok, I see. You mean that during the training the model DOES NOT beam search. Am I right? It Just tries to minimize cross entropy loss on next token. I guess beam search is not even differentiable...
I followed the code above and got following output return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin) RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 1 kindly help a newbie, only change I made was removing #device_map="auto" when loading the base model as I have dual gpu and it was throwing error with 8 bit
Can you do a video on finetuning a multimodal LLM (Video-LlaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and texts for relation extraction or a specific task? Can you do it using open-source multimodal LLM and multimodal datasets like video-llama or else so anyone can further their experiments with the help of your tutorial. Can you also talk about how we can boost the performance of the fine-tuned modal using prompt tuning in the same video?
is this way to finetune Falcon only or any OS model? also, is it possbile to finetune a model to pickup a new langugae ? like it never trained on french now it can answer french questions ?
Hi, thank you for the video! If I want a small model like falcon 7b or other model like t5, to make bots for QA or FAQ, but I need to use and tune for my own language, ex. Portuguese or Spanish. What’s your suggestion? Because I don’t need a large multi language model for this, I think 😅
Hello, since you are very good can you explain two simple things to me? 1- why do Assistants find less than half of what they have in the file? Example: search for Julius Caesar (it is stored 1000 times, but they only find it 10/20 times) question 2 are there any ggml templates specialized in history? Thanks Claudiio
The Common Crawl dataset (used for this model) contains 40+ languages, so you should be able to use different languages. I haven't tried it myself, though. More info here: commoncrawl.org/ That being said their dataset "RefinedWeb" contains primarily English: huggingface.co/datasets/tiiuae/falcon-refinedweb
I loaded the trained model and it downloaded the whole model again. When I tried generating text according to my use-case with the trained weights, it didn't provide the correct result.
Excelente video! I need to configure and train a local gpt for chat with SQL database, which one is the better option for fine tunning with single GPU for that?
I watch all of your videos, they are wonderful. This one is BY FAR my fav. I know it must have taken a lot of time but THANK YOU so much for doing it! It is so thorough, can we do same thing with MTP-7B?
@@tadificilaxalogin Idk what Im doing wrong here but I have tried to reply to this 4 times and after a day or so it gets removed... It does not work with mtp-7b
@@TailorJohnson-l5y Thanks !! I have had progress with falcon 40b and redpajama. Unfortunately, it seems to be difficult to use this algorithm with more than one GPU with. Have you set your prompt style for training? I am doing these tests now.
With CUDA you can launch many threads at the same time for a single kernel to solve a problem. Is there a way to do something similar with GPT models? I asked chatgpt and it basically said the limiting factor would probably be the memory needed for each thread might take up about .5 gb. So for instance, if you have 4 gb free GPU RAM after loading the model you should in theory be able to run 8 queries through the gpu at a time. How would that be done with a local gpt?
As far as I know, if you have free GPU memory, you simply do batched inference, I guess some kind of cuda multi threading takes place there. You can see that training batch size is 1. I guess that bigger batch would cause GPU OOM error.
@@pvlr1788 Thank you!!! "Batched inference" is exactly the term I was looking for. I see there are scripts for getting that working on various GPT models so it is correct.
@@MattJonesYT it should work for every model, as long as you have enough cuda memory. In case of 7B model, you probably need some top-tier GPU to inference a batch bigger than 1.
Deploying this model as an API endpoint on hugging face currently fails. Do you know how to fix it? RuntimeError(f\"weight {tensor_name} does not exist\") RuntimeError: weight transformer.word_embeddings.weight does not exist "},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
I have two sample dataset like bello 1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...] 2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ] so how can i fine tune above dataset using falcon llm model Please help me
Hello, Great video so far. Let me ask some questions here: 1. What should I do if my training loss is not decrease consistently (sometimes up, sometimes down) ? 2. How to use multiple GPU? I always get OOM if I use Falcon-40B, so I rented 2 GPUs in cloud provider. Unfortunatelly, it ran just for 1 GPU.
Can a subsequent SFT and RTHF with different, additional or lesser contents change the character, improve, or degrade a GPT model? Can you modify a GPT model?
Really nice! Thanks for the clearance of the explanation! I wonder, what is the loss function's input here? What is there being compared? Is this self-supervised? So opaque!
hey there, how do I create a generative AI chatbox with my own data? let us say I have data regarding a company and I want to create a "chatgpt" kinda thingy which can answer the questions which I have related to that data I have juggled through the internet today and found 1) Data collection 2) Data preprocessing 3) Selecting a pre trained model(cause it is easy than creating one) 4) Fine tuning the model 5) Iteration This is my understanding as of now so basically how do I have preprocess the data? do I have to learn NLP for that?
can someone help me out! my issue is I am trying to fine tune dolly V2 using above method but im getting output which it was giving before fine tuning in the video, Im not getting single response as output If anyone faced this issue and fixed it please let me know, do i need to change any config or model ? suggestions are welcome! thanks
I was getting an error from the trainer "paged_adamw_8bit is not a valid optimizer names" though I used the same git urls with commit short hashes as shown in the video for pip install command. I ended up having to clone and install transformers from source to get the proper transformers library with the "paged_adamw_8bit" option.
I must of messed up my pip install commands somehow though I'm not sure how since I was able to find the commit hash in the GitHub logs. Still pip gave "did not find branch or tag 'e03a9cc' assuming revision or ref" error. Luckily I was able to get past it and everything worked beautifully thank you!
Fantastic tutorial. Does the training data need to be in Question/Answer format? Would this work if instead this data was a single large block of text and not as structured? Do the models need to be on the Hugging Face servers for inference?
@@enggm.alimirzashortclipswh6010 So there's no concept of something like "unsupervised fine tuning"? If I wanted to adapt a LLM on emails I've sent to sound more like me, I would not want to train from scratch would I?
great video Venelin. I tried to implement qlora using your code but I am getting this error "RuntimeError: unscale_() has already been called on this optimizer since the last update(). "
I followed your video but I'm struggling with repeated answer. Only modification I did was not send model to huggingface after trained, and it is repeating end text after . I tried to change dataset to a larger one I have in portuguese, and set it to max_steps=5000 but same issue. could you give me a tip to avoid this repeation like you showed in inference before training?
Other than that, it's playing around with different parameters. Try to learn how the parameters affect the behaviour. If it doesn't give you the desired result, go to the plain downloaded model en train it again. You'll discover a lot of funny behaviour of the AI with different settings. Also, the parameters are sensitive so keep that in mind. Don't change too much, take it slow.
Hi, I'm struggling with the same issue from 2 days, I have used falcon sharded version and fine tunned it with 2000 custom QA dataset, developed by me. Answer coming is this : How JP Morgan help me? : JP Morgan helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. Can you please suggest what to do as you can see clearly that text is repeating. Please help me 🙏
@@pawancreation2311 Play around, learn what everything does and feel how the AI reacts to certain parameters or finetunes. Oh, read some books abount machine learning to get a better understanding.
@@venelin_valkov somehow i am unable to paste the URLs of the datasets (tried multiple times :( ).. i have shared a suggestive list in in this google doc and thanks again for the wonderful set of videos. docs.google.com/document/d/1wqCKudZnx0XMsJ8J2n1wfOpG68M9chP_8-zeaU7s53g/edit?usp=sharing
Hi bro. Amazing tutorial. I am getting this error: "ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`question` in this case) have excessive nesting (inputs type `list` where type `int` is expected)." I tried suggested fixed from huggingface and github but can't solve the issue. Any idea how to fix it?
@Pranjal Yadav Thanks for replying.I am following the code line by line. I have tried it on the same dataset he is using. Still getting the same error. Any idea?
@@Purulence-bw7nt No, I couldn’t solve it, I did the 8-bit version for opt without including the same method 4 bits. However, with the newly received updates, there have been changes and different errors occur. opt does not work in the codes I write.
@@gokhanersoz5239 I tried with other optimizers, it fixes the optimizers issue but not sure about the performance since I am not able to start the training process and keep getting the ValueError no matter what I do..
Full text turorial (requires MLExpert Pro): www.mlexpert.io/prompt-engineering/fine-tuning-llm-on-custom-dataset-with-qlora
is this way to finetune Falcon only or any OS model? also, is it possbile to finetune a model to pickup a new langugae ? like it never trained on french now it can answer french questions ?
@@ko-Daegu i wanna know this too!
I pushed my model to hugging face can you please tell me how can i deploy that model
Hello Veneline can you please provide the colab notebook (falcon-qlora-fine-tuning.ipynb)…..if possible
Please make a video on how to increase the inference speeds that is the major problem every one is facing
when adding new special token like and shouldnt you add that tokens to the tokenizer, resize the embedding layer of the model and finetune it? I think this should help the model during the training but also increase the number of trainable paramenters.
Is the model multilingual? Can I fine tune it in another language?
Does anyone knnow how to fine tune a QLoRA over another LoRA on a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA over that.
My model generates multiple redundant answers e.g. : xxxx : xxxx : xxxx : xxxx. How to solve it?
For the tokenizer, I think we should set padding_side="left", because it is a causal llm. What do you think of it?
Does it work without the high RAM, I'm using a free version
Can someone please share the code that has been used in this tutorial
I pushed my model to hugging face can you please tell me how can i deploy that model
"IndexError: Invalid key: 78 is out of bounds for size 0" do you have this error ? I try everything but not solving @venelin_valkov
I pushed my model to hugging face can you please tell me how can i deploy that model
I pushed my model to hugging face can you please tell me how can i deploy that model please!
Great video, and very interesting if you want to find tune with your own dataset 👍 a pity that the response took a long time… any idea how to get it faster?
I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this codeblock
with torch.inference_mode():
outputs = model.generate(
input_ids=encoding.input_ids,
attention_mask=encoding.attention_mask,
generation_config=generation_config,
)
Does anyone have any ideas how I could solve this? Not sure if the problem was caused because I'm using 'prepare_model_for_int8_training' instead of 'prepare_model_for_kbit_training" since I got an error of 'cannot import name 'prepare_model_for_kbit_training' from 'peft'' even on the latest version of peft library
Great video. Would the response times be faster with a better GPU?
I get this error: Any idea on how to resolve this:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 63 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
I get this same error. could you resolve it?
Some of the models recently published/released are not working on M2 MacOS. Any idea if you could make it feasible for M2 Max MacOS? Thanks
No idea at the moment, there is still no paper with details on the model. You might try the "quickstart" with the transformers library here: huggingface.co/tiiuae/falcon-7b-instruct
how to deploy this chat bot model after pushing it to hugging face? i'm talking about qlora fine tuned model
I made a video on this topic: ua-cam.com/video/HI3cYN0c9ZU/v-deo.html
Thank you for watching!
I don't get why inference is so slow.
It should be at least as fast as the training. It's true that each "generate" means the model does inference multiple times, does beam search etc... but the same thing happens when you train the model. What am I missing?
when you train the model, it gets trained on every token in the text batch at once (it outputs logits at every step)
@@Timotheeee1 ok, I see. You mean that during the training the model DOES NOT beam search. Am I right?
It Just tries to minimize cross entropy loss on next token. I guess beam search is not even differentiable...
I followed the code above and got following output
return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 1
kindly help a newbie, only change I made was removing #device_map="auto" when loading the base model as I have dual gpu and it was throwing error with 8 bit
Can we Train model with context (Question: " ", Context: " ", Answer:" " ) . So model will answer from context, Like a RAG ???
Can you do a video on finetuning a multimodal LLM (Video-LlaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and texts for relation extraction or a specific task? Can you do it using open-source multimodal LLM and multimodal datasets like video-llama or else so anyone can further their experiments with the help of your tutorial. Can you also talk about how we can boost the performance of the fine-tuned modal using prompt tuning in the same video?
what does bnb_4bit_use_double_quant=True do? tried searching for answers, coming up with nothing! lol
is this way to finetune Falcon only or any OS model? also, is it possbile to finetune a model to pickup a new langugae ? like it never trained on french now it can answer french questions ?
How do we compute metrics of this model? When I add compute_metric into trainer and it was error. Can you please add the compute_metric?
Please share the notebook
How much VRAM did you end up using?
The Google Colab showed 6.9GB VRAM and 4.6GB RAM, during the training (with parameters shown in the video). Not sure how accurate it is, though.
Thanks for the great video, can we merge back the adapter.bin to it's original model ? can you make a video onit ?
Hi, thank you for the video! If I want a small model like falcon 7b or other model like t5, to make bots for QA or FAQ, but I need to use and tune for my own language, ex. Portuguese or Spanish. What’s your suggestion? Because I don’t need a large multi language model for this, I think 😅
Wow, finally a working guide on how to finetune LLM's. Thank you very much 🙏
Try example, stuck on training part, having error IndexError: Invalid key: 78 is out of bounds for size 0. Does anyone faced with similar?
Can you solve that ?
if its assistant model , doesn't it should respond only when human asks the questions to him?
here it generate the question and answers on its own.
Can you make a QLoRA for text-summarization task on Falcon7B. That would be very much helpful. Cheers 🍻🍻
Hello, since you are very good can you explain two simple things to me? 1- why do Assistants find less than half of what they have in the file? Example: search for Julius Caesar (it is stored 1000 times, but they only find it 10/20 times) question 2 are there any ggml templates specialized in history? Thanks Claudiio
Thank you so much ! Just curious, can it run on a free colab?
Does the custom dataset needs to be in english or It could be in any language?
The Common Crawl dataset (used for this model) contains 40+ languages, so you should be able to use different languages. I haven't tried it myself, though. More info here: commoncrawl.org/
That being said their dataset "RefinedWeb" contains primarily English: huggingface.co/datasets/tiiuae/falcon-refinedweb
I loaded the trained model and it downloaded the whole model again. When I tried generating text according to my use-case with the trained weights, it didn't provide the correct result.
I just subscribed!! Your tutorials are straightforward and to the point. Love your content. Keep up with the amazing content! 🙌 ✨✨✨
How do we add our own data? Just change the link in the jupyter notebook?
can you share the link to your notebook?
why is the inference consistently slower? Do we know how to speed it up ?
Nice video Venelin Valkov, I wanted to ask if I have an input size of 4k+ tokens can I train it on a single GPU?
many thanks , shall we have colab link or file?
Excelente video! I need to configure and train a local gpt for chat with SQL database, which one is the better option for fine tunning with single GPU for that?
I watch all of your videos, they are wonderful. This one is BY FAR my fav. I know it must have taken a lot of time but THANK YOU so much for doing it! It is so thorough, can we do same thing with MTP-7B?
I would guess the training process can be similar for MTP-7B, but can't be sure. Try it and let me know.
Thank you for watching!
@@venelin_valkov I will try and let you know!
@@TailorJohnson-l5y Did it work? :D
@@tadificilaxalogin Idk what Im doing wrong here but I have tried to reply to this 4 times and after a day or so it gets removed... It does not work with mtp-7b
@@TailorJohnson-l5y Thanks !! I have had progress with falcon 40b and redpajama. Unfortunately, it seems to be difficult to use this algorithm with more than one GPU with. Have you set your prompt style for training? I am doing these tests now.
Жаль на русском не делаешь видео...
With CUDA you can launch many threads at the same time for a single kernel to solve a problem. Is there a way to do something similar with GPT models? I asked chatgpt and it basically said the limiting factor would probably be the memory needed for each thread might take up about .5 gb. So for instance, if you have 4 gb free GPU RAM after loading the model you should in theory be able to run 8 queries through the gpu at a time. How would that be done with a local gpt?
As far as I know, if you have free GPU memory, you simply do batched inference, I guess some kind of cuda multi threading takes place there. You can see that training batch size is 1. I guess that bigger batch would cause GPU OOM error.
@@pvlr1788 Thank you!!! "Batched inference" is exactly the term I was looking for. I see there are scripts for getting that working on various GPT models so it is correct.
@@MattJonesYT it should work for every model, as long as you have enough cuda memory. In case of 7B model, you probably need some top-tier GPU to inference a batch bigger than 1.
Deploying this model as an API endpoint on hugging face currently fails. Do you know how to fix it?
RuntimeError(f\"weight {tensor_name} does not exist\")
RuntimeError: weight transformer.word_embeddings.weight does not exist
"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
I have two sample dataset like bello
1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...]
2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ]
so how can i fine tune above dataset using falcon llm model
Please help me
Hello, Great video so far. Let me ask some questions here:
1. What should I do if my training loss is not decrease consistently (sometimes up, sometimes down) ?
2. How to use multiple GPU? I always get OOM if I use Falcon-40B, so I rented 2 GPUs in cloud provider. Unfortunatelly, it ran just for 1 GPU.
Read about deepspeed packaage
Can a subsequent SFT and RTHF with different, additional or lesser contents change the character, improve, or degrade a GPT model? Can you modify a GPT model?
Really nice! Thanks for the clearance of the explanation! I wonder, what is the loss function's input here? What is there being compared? Is this self-supervised? So opaque!
hey there,
how do I create a generative AI chatbox with my own data?
let us say I have data regarding a company and I want to create a "chatgpt" kinda thingy which can answer the questions which I have related to that data
I have juggled through the internet today and found
1) Data collection
2) Data preprocessing
3) Selecting a pre trained model(cause it is easy than creating one)
4) Fine tuning the model
5) Iteration
This is my understanding as of now
so basically how do I have preprocess the data?
do I have to learn NLP for that?
is the notebook available ?
can someone help me out!
my issue is I am trying to fine tune dolly V2 using above method but im getting output which it was giving before fine tuning in the video, Im not getting single response as output
If anyone faced this issue and fixed it please let me know, do i need to change any config or model ?
suggestions are welcome!
thanks
Thank you so much
I was getting an error from the trainer "paged_adamw_8bit is not a valid optimizer names" though I used the same git urls with commit short hashes as shown in the video for pip install command. I ended up having to clone and install transformers from source to get the proper transformers library with the "paged_adamw_8bit" option.
Strange, just reran the notebook (without changes) and training started as usual.
I must of messed up my pip install commands somehow though I'm not sure how since I was able to find the commit hash in the GitHub logs. Still pip gave "did not find branch or tag 'e03a9cc' assuming revision or ref" error. Luckily I was able to get past it and everything worked beautifully thank you!
I am getting error while executing trainer.run() saying: "can't copy out of meta tensor, no data!"
wow wow wow man
Fantastic tutorial.
Does the training data need to be in Question/Answer format? Would this work if instead this data was a single large block of text and not as structured?
Do the models need to be on the Hugging Face servers for inference?
never finetune your model on raw data, however, you can do pre-training on raw text.
@@enggm.alimirzashortclipswh6010 So there's no concept of something like "unsupervised fine tuning"? If I wanted to adapt a LLM on emails I've sent to sound more like me, I would not want to train from scratch would I?
@d_b
@enggm.alimirzashortclipswh6010 How to fine tune if data look like this?
Review(col1)
Nice cell phone, big screen, plenty of storage. Stylus pen works well.
Analysis(col2)
[{“segment”: “Nice cell phone”,“Aspect”: “Cell phone”,“Aspect Category”: “Overall satisfaction”,“sentiment”: “positive”},{“segment”: “big screen”,“Aspect”: “Screen”,“Aspect Category”: “Design”,“sentiment”: “positive”},{“segment”: “plenty of storage”,“Aspect”: “Storage”,“Aspect Category”: “Features”,“sentiment”: “positive”},{“segment”: “Stylus pen works well”,“Aspect”: “Stylus pen”,“Aspect Category”: “Features”,“sentiment”: “positive”}]
You're the Best 💯thanks a lot for the video! Can you please upload a video implementing this tutorial using langchain framework.🥺
You mean use the trained model with LangChain?
Thank you for watching!
@@venelin_valkov yes so it'll be useful for the community "end to end" implementation 🙂
Thanks for the video, the masked language model MLM is set to be "False", then how the model is fine-tuned?
Using "just" language modelling (predict next token). More info here: paperswithcode.com/task/language-modelling
Wow, thanks a lot for the video!
great video Venelin. I tried to implement qlora using your code but I am getting this error "RuntimeError: unscale_() has already been called on this optimizer since the last update(). "
where you can get the code ? ..... are you typing manually ??
I have that too, how did you solve it
@@kaihaoliu7869 I have to install transformers==4.30.1 instead of newest dev transformers to get rid the error.
Can anyone help me please? i get the following error on the Training Part: IndexError: Invalid key: 78 is out of bounds for size 0
The error occur in the following line:
trainer.train()
A doubt a little out of the context of the video... are Deep Learning models as used as machine learning models in tabular data?
I followed your video but I'm struggling with repeated answer. Only modification I did was not send model to huggingface after trained, and it is repeating end text after . I tried to change dataset to a larger one I have in portuguese, and set it to max_steps=5000 but same issue. could you give me a tip to avoid this repeation like you showed in inference before training?
You should fine tune it, so less data. It is pretrained with a huge amount of data.
Other than that, it's playing around with different parameters. Try to learn how the parameters affect the behaviour. If it doesn't give you the desired result, go to the plain downloaded model en train it again.
You'll discover a lot of funny behaviour of the AI with different settings. Also, the parameters are sensitive so keep that in mind. Don't change too much, take it slow.
Hi, I'm struggling with the same issue from 2 days, I have used falcon sharded version and fine tunned it with 2000 custom QA dataset, developed by me. Answer coming is this
: How JP Morgan help me?
: JP Morgan helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available.
Can you please suggest what to do as you can see clearly that text is repeating. Please help me 🙏
@@zorbat5please help me what can I do please 😭
@@pawancreation2311 Play around, learn what everything does and feel how the AI reacts to certain parameters or finetunes. Oh, read some books abount machine learning to get a better understanding.
Браво, Венелине!
great video Venelin..thanks for sharing! will you be sharing any such training video with dialogue datasets for contextual conversations?
Do you have a dataset in mind?
Thanks for watching!
@@venelin_valkov somehow i am unable to paste the URLs of the datasets (tried multiple times :( ).. i have shared a suggestive list in in this google doc and thanks again for the wonderful set of videos.
docs.google.com/document/d/1wqCKudZnx0XMsJ8J2n1wfOpG68M9chP_8-zeaU7s53g/edit?usp=sharing
@@venelin_valkov do you think any of the above datasets are useful ? :)
T4 enough for tranining ?
The QLoRA adapter is trained using T4, yes!
abi ben bu LLM islerine yeni girdim de bana yardimci olabilir misin birkac soru sorsam
@@oncelscu8089 elbette
Hi bro. Amazing tutorial. I am getting this error:
"ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True'
'truncation=True' to have batched tensors with the same length. Perhaps your features (`question` in this case)
have excessive nesting (inputs type `list` where type `int` is expected)."
I tried suggested fixed from huggingface and github but can't solve the issue. Any idea how to fix it?
@Pranjal Yadav Thanks for replying.I am following the code line by line. I have tried it on the same dataset he is using. Still getting the same error. Any idea?
@@Purulence-bw7nt solve problem ?
@@gokhanersoz5239 No, I couldn't solve it. Have you solved it?
@@Purulence-bw7nt No, I couldn’t solve it, I did the 8-bit version for opt without including the same method 4 bits. However, with the newly received updates, there have been changes and different errors occur. opt does not work in the codes I write.
@@gokhanersoz5239 I tried with other optimizers, it fixes the optimizers issue but not sure about the performance since I am not able to start the training process and keep getting the ValueError no matter what I do..