Nice video
Thanks
Thank you sir, this video helped me understand this model in the very first video.
Glad it helped
Good video. How can I test the model that was pushed to Hugging Face? Could you please share an example?
Thanks. You can use AutoModelForVision2Seq to load your model. You need to pass your model path and your Hugging Face access token.
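A minimal sketch of what that could look like (the repo id, token, and generation settings are placeholders, not the exact values from the video):

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Placeholder repo id -- replace with the model path you pushed to Hugging Face
model_id = "your-username/your-finetuned-llama-3.2-vision"
hf_token = "hf_..."  # your Hugging Face access token

model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    token=hf_token,
)
processor = AutoProcessor.from_pretrained(model_id, token=hf_token)

# Build a chat-style prompt with one image
image = Image.open("test_image.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```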
I have a question: when we fine-tune the model, we don't train the whole model, right? So if that's the case, what should I do?
Right, we only train the last few layers (classification head, projection layers) or task-specific layers. In our case, as I explained in the video, the target modules are q_proj and v_proj (the query and value projections). As for what you should do, I can't tell from your question. Explain your problem statement and then I can assist you.
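For example, with PEFT a LoRA config like the one below trains only small adapter matrices on those two modules and freezes everything else (the rank, alpha, and dropout values here are illustrative, not necessarily the video's exact settings):

```python
from peft import LoraConfig, get_peft_model

# LoRA adapters applied only to the query and value projections
lora_config = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=16,                        # illustrative scaling
    target_modules=["q_proj", "v_proj"],  # the modules discussed above
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# `model` is the already-loaded base model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```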
How long will it take for the whole process?
@@tamilselvan3525 I haven't trained it completely because of GPU limitations, so I won't be able to answer that. I just wanted to show that it's possible to train and how to do it. Training time depends on the dataset, the hardware (GPU configuration), and the number of epochs you train for.
@@nextGenAIGuy490 Okay, thanks.
Thank you for the demonstration. Do you think we can fine-tune this model on video data?
@@soulaimanebahi741 No, we can't.
Absolutely wrong! If you don't know, say "I don't know". Don't mislead him; fine-tuning on video is possible.
@@babusd Relax, bro. Rather than saying "show the proof", do one thing: read the model architecture. During training of the Llama 3.2 Vision model they used image and text pairs. And if you know otherwise, show me where they have written that we can fine-tune the vision model on videos.
We are fine-tuning the Llama 3.2 Vision model, but the collate function was utilising Qwen2.
Is it fine to use the Qwen model in the collate function while fine-tuning Llama 3.2?
By customizing the collate_fn, we are able to control how the data is prepared. We are using it for batch processing and padding, bringing the data into the right format to train the model. It's fine to use it.
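For reference, here's a sketch of such a collate_fn built around the model's own processor (the dataset field names "image", "question", and "answer" are assumptions, not the video's exact schema, and `processor` is assumed to be the Llama 3.2 Vision AutoProcessor):

```python
def collate_fn(examples):
    # Build chat-formatted prompts and batch-process them with the
    # model's own processor.
    texts, images = [], []
    for ex in examples:
        messages = [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": ex["question"]},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": ex["answer"]},
            ]},
        ]
        texts.append(processor.apply_chat_template(messages))
        images.append(ex["image"])

    # Pad to the longest sequence in the batch and return tensors
    batch = processor(images, texts, return_tensors="pt", padding=True)

    # Use input_ids as labels, masking pad tokens out of the loss
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100
    batch["labels"] = labels
    return batch
```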
I highly doubt that this could work. Different models have different chat templates and processing.