- 13 videos
- 185,094 views
Entry Point AI
United States
Joined Mar 22, 2023
The modern platform for fine-tuning large language models.
RLHF & DPO Explained (In Simple Terms!)
Learn how Reinforcement Learning from Human Feedback (RLHF) actually works and why Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) are changing the game.
This video doesn't go deep on math. Instead, I provide a high-level overview of each technique to help you make practical decisions about where to focus your time and energy.
0:52 The Idea of Reinforcement Learning
1:55 Reinforcement Learning from Human Feedback (RLHF)
4:21 RLHF in a Nutshell
5:06 RLHF Variations
6:11 Challenges with RLHF
7:02 Direct Preference Optimization (DPO)
7:47 Preferences Dataset Example
8:29 DPO in a Nutshell
9:25 DPO Advantages over RLHF
10:32 Challenges with DPO
10:50 Kahneman-Tversky Optimization (KTO)
11:39 Prospect Theory
13:35 Sigmoid vs Value Function
13:49 KTO Dataset
15:28 KTO in a Nutshell
15:54 Advantages of KTO
18:03 KTO Hyperparameters
These are the three papers referenced in the video:
1. Deep reinforcement learning from human preferences (arxiv.org/abs/1706.03741)
2. Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arxiv.org/abs/2305.18290)
3. KTO: Model Alignment as Prospect Theoretic Optimization (arxiv.org/abs/2402.01306)
The Hugging Face TRL library offers implementations of PPO, DPO, and KTO:
huggingface.co/docs/trl/main/en/kto_trainer
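If you want to try this hands-on, here is a minimal sketch of what a KTO run with TRL can look like. The base model, dataset rows, and output directory are illustrative assumptions rather than settings from the video, and the tokenizer argument is named `processing_class` in recent TRL releases (older releases call it `tokenizer`):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# Illustrative small base model; any causal LM from the Hub works the same way.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Unlike DPO, KTO does not need paired preferences: each row is just a prompt,
# a completion, and a binary label marking it desirable (True) or undesirable (False).
train_dataset = Dataset.from_dict({
    "prompt": ["What is RLHF?", "What is RLHF?"],
    "completion": [
        "Reinforcement Learning from Human Feedback aligns a model with human preferences.",
        "idk, some training thing",
    ],
    "label": [True, False],
})

args = KTOConfig(output_dir="kto-demo", beta=0.1)  # beta controls divergence from the reference model
trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL versions
)
trainer.train()
```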
Want to prototype with prompts and supervised fine-tuning? Try Entry Point AI:
www.entrypointai.com/
How about connecting? I'm on LinkedIn:
www.linkedin.com/in/markhennings/
Views: 3,616
Videos
Ask GPT-4 in Google Sheets
1.6K views · 7 months ago
Learn an amazing trick to use OpenAI models directly inside Google Sheets. Watch as I transform whole columns of data using an AI prompt in seconds! After this video, you'll be able to do awesome things like write creative copy, standardize your data, or extract specific details from unstructured text. Topics covered: 0:27 Demo of LLM calls directly in Google Sheets 5:33 The custom function in ...
Fine-tuning Datasets with Synthetic Inputs
4K views · 9 months ago
👉 Start building your dataset at www.entrypointai.com There are virtually unlimited ways to fine-tune LLMs to improve performance at specific tasks... but where do you get the data from? In this video, I demonstrate one way that you can fine-tune without much data to start with - and use what little data you have to reverse-engineer the inputs required! I show step-by-step how to take a small s...
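As a rough illustration of the idea, here is one way the reverse-engineering step could look in Python. The model name, prompt wording, and example outputs are assumptions for the sketch, not the exact workflow from the video:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Suppose you only have polished outputs (e.g. finished product descriptions)
# and need matching inputs to build prompt/completion pairs for fine-tuning.
outputs = [
    "Sleek stainless-steel water bottle that keeps drinks cold for 24 hours.",
    "Cozy merino wool beanie, one size fits all, machine washable.",
]

def synthesize_input(output_text: str) -> str:
    """Ask a strong model to invent the rough brief that could have produced this output."""
    prompt = (
        "Write the short, messy bullet-point brief a merchant might have submitted "
        f"that would lead a copywriter to produce this description:\n\n{output_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Each pair becomes one fine-tuning example: synthetic input -> real output.
dataset = [{"prompt": synthesize_input(o), "completion": o} for o in outputs]
```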
How Large Language Models (LLMs) Actually Work
2.1K views · 11 months ago
In this video, I explain how language models generate text, why most of the process is actually deterministic (not random), and how you can shape the probability when selecting a next token from LLMs using parameters like temperature and top p. I cover temperature in-depth and demonstrate with a spreadsheet how different values change the probabilities. Topics: 00:10 Tokens & Why They Matter 03...
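If you would rather poke at the same idea in code than a spreadsheet, here is a small, self-contained sketch of temperature and top-p applied to toy logits; the four-token vocabulary and values are made up for illustration:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0):
    """Turn raw logits into a probability distribution and sample one token id."""
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    # Subtracting the max keeps the exponentials numerically stable.
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()

    # Top-p (nucleus) sampling: keep only the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize over that set.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    mask /= mask.sum()

    return np.random.choice(len(probs), p=mask)

# Toy example: a 4-token vocabulary.
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```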
LoRA & QLoRA Fine-tuning Explained In-Depth
54K views · 1 year ago
👉 Start fine-tuning at www.entrypointai.com In this video, I dive into how LoRA works vs full-parameter fine-tuning, explain why QLoRA is a step up, and provide an in-depth look at the LoRA-specific hyperparameters: Rank, Alpha, and Dropout. 0:26 - Why We Need Parameter-efficient Fine-tuning 1:32 - Full-parameter Fine-tuning 2:19 - LoRA Explanation 6:29 - What should Rank be? 8:04 - QLoRA and R...
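For reference, this is roughly what those hyperparameters look like when configuring an adapter with the PEFT library. The base model, target modules, and values shown are illustrative assumptions, not the exact settings from the video:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model; for QLoRA you would load it quantized first,
# e.g. with quantization_config=BitsAndBytesConfig(load_in_4bit=True).
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=8,                          # scaling factor; effective scale is alpha / r
    lora_dropout=0.05,                     # dropout applied inside the adapter layers
    target_modules=["q_proj", "v_proj"],   # which weight matrices receive adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```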
Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use
106K views · 1 year ago
Explore the difference between Prompt Engineering, Retrieval-augmented Generation (RAG), and Fine-tuning in this detailed overview. 01:14 Prompt Engineering RAG 02:50 How Retrieval Augmented Generation Works - Step-by-step 06:23 What is fine-tuning? 08:25 Fine-tuning misconceptions debunked 09:53 Fine-tuning strategies 13:25 Putting it all together 13:44 Final review and comparison of technique...
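To make the RAG steps concrete, here is a minimal retrieval-then-generate loop. The embedding model, chat model, and toy documents are assumptions for the sketch; a production system would chunk documents and use a vector store:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# 1. Index the documents ahead of time.
documents = [
    "Our return window is 30 days from delivery.",
    "Standard shipping takes 3-5 business days.",
]
doc_vectors = [embed(d) for d in documents]

def answer(question: str, k: int = 1) -> str:
    # 2. Embed the question and 3. rank documents by cosine similarity.
    q = embed(question)
    scores = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in doc_vectors]
    top = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:k]
    # 4. Stuff the retrieved context into the prompt and 5. generate a grounded answer.
    context = "\n".join(documents[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content

print(answer("How long do I have to return an item?"))
```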
Fine-tuning 101 | Prompt Engineering Conference
7K views · 1 year ago
👉 Start fine-tuning at www.entrypointai.com Intro to fine-tuning LLMs (large language models) from the Prompt Engineering Conference (2023) Presented by Mark Hennings, founder of Entry Point AI. 00:13 - Part 1: Background Info -How a foundation model is born -Instruct tuning and safety tuning -Unpredictability of raw LLM behavior -Showing LLMs how to apply knowledge -Characteristics of fine-tun...
"I just fine-tuned GPT-3.5 Turbo…" - Here's how
1.1K views · 1 year ago
🎁 Join our Skool community: www.skool.com/entry-point-ai In this video, I'm diving into the power and potential of the newly released GPT-3.5's fine-tuning option. After fine-tuning some of my models, the enhancement in quality is undeniably remarkable. Join me as I: - Demonstrate the model I fine-tuned: Watch as the AI suggests additional items for an e-commerce shopping cart and the rationale...
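For the curious, kicking off a GPT-3.5 Turbo fine-tune via the OpenAI Python SDK boils down to two calls; the file name and example row below are hypothetical:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# training.jsonl holds one chat-formatted example per line, e.g.:
# {"messages": [{"role": "user", "content": "Cart: yoga mat"},
#               {"role": "assistant", "content": "Suggest: yoga blocks, carrying strap"}]}
upload = client.files.create(file=open("training.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until the job finishes
```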
No-code AI fine-tuning (with Entry Point!)
1.3K views · 1 year ago
👉 Sign up for Entry Point here: www.entrypointai.com Entry Point is a platform for no-code AI fine-tuning, with support for Large Language Models (LLMs) from multiple platforms: OpenAI, AI21, and more. In this video I'll demonstrate the core fine-tuning principles while creating an "eCommerce product recommendation" engine in three steps: 1. First I write ~20 examples by hand 2. Then I expand t...
28 April Update: Playground and Synthesis
83 views · 1 year ago
28 April Update: Playground and Synthesis
How to Fine-tune GPT-3 in less than 3 minutes.
4.4K views · 1 year ago
🎁 Join our Skool community: www.skool.com/entry-point-ai Learn how to fine-tune GPT-3 (and other) AI models without writing a single line of python code. In this video I'll show you how to create your own custom AI models out of GPT-3 for specialized use cases that can work better than ChatGPT. For demonstration, I'll be working on a classifier AI model for categorizing keywords from Google Ads...
Can GPT-4 Actually Lead a D&D Campaign? 🤯
390 views · 1 year ago
If you want to create / fine-tune your own AI models, check out www.entrypointai.com/
Entry Point Demo 1.0 - Keyword Classifier (AI Model)
350 views · 1 year ago
Watch over my shoulder as I fine-tune a model for classifying keywords from Google Ads.
Just perfect explanation!
Thank you
Great video
Very useful, thank you!
Such an incredibly clear presentation! Thank you
Excellent explanation, thanks a lot!
The fact that you gave concrete examples really helped me get through this! Thank you for the great video
Great insights on AI techniques! With Web3NS, you can fine-tune AI agents on YourName.Web3, combining RAG and prompts to create smarter, decentralized responses tailored to your needs. 🚀
Thanks for this clear explanation about the topic! Your way of relating back to research papers is very interesting and helpful!
I think there is a mistake in your explanation. It uses higher precision while fine-tuning, but it does **not** recover "back to normal". It stays in the quantized data format.
Nice
Such a simple way to explain. Thank you.
Can you share the presentation document?
Loved the explanation. Despite knowing these terms well, I was curious to see how it would be explained and I am glad that I watched this video
Incredibly high quality content, thank you.
Thanks a ton!
hello can you help me
bradley cooper in limitless tf
This is the best detailed video and nicest explanation on YouTube right now. I do think your channel will grow because you are doing an EXCELLENT job. Thank you man.
This is amazing! How do you deal with request limits? I'm set to 3RPM for all models, so I'm unable to drag and drop this into a sheet as it will only do 3 at a time.
Good question. I don't have a good answer right now except to find yourself a higher limit :) 3rpm is really really low and you should be able to find even a free service or wrapper that offers more. Check out Groq or Together.ai
damn bro thanks, I'm new to fine-tuning, needed to migrate my rag model to the ragtag model, and your video was very clear and helped me a lot 😊
Sick, glad to hear it!
so good!!
What's the name of the paper you referenced in the video?
Here's LoRA: arxiv.org/abs/2106.09685 and QLoRA: arxiv.org/abs/2305.14314
It may be a stupid question: is it possible to download the fine-tuned model at Entry Point? Thank you. Great video! I came trying to see if it is possible to download it, but I understood more about the website and it's great.
It depends on what service you use to fine-tune, the proprietary ones like OpenAI don't let you download it. Others like Replicate do.
Perfect differentiation. Thank you
blud why your eyes like that
Best video I've seen on the topic. Thank you!
Very useful! Marvelous clear explanation with the right amount of detail about a subject that’s worth understanding
I've been binge watching LLM-related videos and most have been regurgitation of docs or (probably) GPT/AI generated fluff pieces. This video clearly explained several concepts that I was trying to wrap my head around. Great job!
Similar comments. He actually explained it properly.
Amazing for struggling students. Love from Korea😂
Thank you!
very well explained, thanks :)
Awesome. Thanks
Thanks, an amazing explanation! +1 sub.
"Great video with an amazing explanation! Thank you for sharing."
Thanks. Is it possible to make a webscraping from a given url in a column?
This is a task better suited for another tool. I'd check out apify.com/
QLoRA lets me train on a 4070 Ti with only 12 GB VRAM, though I can't go over a 7B model.
This saved me. Thank you. Keep doing this :)
Great explanation, best that I've seen
Gojo ?
I'm pulling for another vid on alpha. Oobabooga suggests twice your rank. The Chinese Alpaca-LoRA people use rank 8 with alpha 32 and I guess it worked. I've tried high alphas that make the model kinda crazy. Need guidance.
When in doubt, set alpha = rank for the effective scale factor to be 1. There are better ways to have a larger impact on training than bluntly multiplying the change in weights, like improving your dataset or dialing in the learning rate.
@EntryPointAI This does make sense the way you put it. Thanks so much for your reply!
I used RAG while creating a life-coach YouTuber AI character in Telegram. I prepared a lot of AI-processed data about him, with fine-grained segments. The prompt itself is also crafted according to the latest Claude prompt discoveries. The most amazing AI experience for me yet - JulienHimselfBot
Eyes!
Watching the first 10 minutes of this video explains how AI works. This should be shown in schools.
Nice logo my guy
Great explanation...
Hello, thank you for a great video. I am trying to integrate it with Apps Script, but I keep getting an error on the var Response line saying "invalid user content". Is there any alternative to that?
Super... 👍 it worked for me... Thank you very much. Greetings from Chile... 🇨🇱