Your tutorial is very helpful for us. Please make a video on chatbots with reinforcement learning from human feedback (RLHF).
Thanks for the comment. Sure, I will, as soon as possible.
Great video
Glad you enjoyed it
I think this is RLAIF instead of RLHF, because the feedback that forms the reward model is generated by a BERT model instead of a human.
You are partly right and partly wrong. In most setups we still need to train the reward model on human-labelled data so that it can give feedback, and in that case it is RLHF. Really happy that you pointed out something so interesting. ❤️
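To make the distinction concrete, here is a minimal sketch of the reward-model step, assuming a pairwise preference loss trained on human-labelled comparisons; the model name, example data, and hyperparameters are illustrative placeholders, not the exact setup from the video:

```python
# Minimal, illustrative sketch of the RLHF reward-model step:
# the reward model is trained on human preference labels (chosen vs. rejected),
# so the feedback ultimately comes from humans even though a model scores responses.
# Backbone, data, and hyperparameters below are placeholders (assumptions).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

backbone = "bert-base-uncased"  # assumption: any small encoder can serve as the reward backbone
tokenizer = AutoTokenizer.from_pretrained(backbone)
reward_model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=1)
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

# One human-labelled comparison: annotators preferred "chosen" over "rejected".
prompt = "How do I sort a list in Python?"
chosen = prompt + " Use sorted(my_list) or my_list.sort()."
rejected = prompt + " I have no idea."

chosen_batch = tokenizer(chosen, return_tensors="pt", truncation=True)
rejected_batch = tokenizer(rejected, return_tensors="pt", truncation=True)

# Scalar reward for each response.
r_chosen = reward_model(**chosen_batch).logits
r_rejected = reward_model(**rejected_batch).logits

# Pairwise (Bradley-Terry) loss: push the chosen reward above the rejected one.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(f"pairwise loss: {loss.item():.4f}")
```

The trained reward model then supplies the scalar reward used to fine-tune the policy model (e.g. with PPO) in the final RLHF step.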
Please make a video covering all 3 steps from scratch with a smaller-parameter LLM, pleaseeee!
Sure, but I have already created videos where we fine-tuned tinystarcoder, which is a 164M-parameter model. You can check them here:
1. ua-cam.com/video/G3RZoxPIpXw/v-deo.html
2. ua-cam.com/video/R2paulc3P2M/v-deo.html
@@WhisperingAI I am getting errors while implementing that notebook.
Sorry for that; let me revisit the notebook and make the necessary changes. I will update it in a couple of hours.
@@WhisperingAI yes please 🥺
@@WhisperingAI Please also clarify the paths of the models and tokenizers for the SFT, reward, and policy models.