LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p
- Published 13 Jun 2024
- In this video, we explore how the temperature, top-k and top-p techniques influence the text generation of large language models (LLMs).
References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why We Don't Use the Mean Squared Error (MSE) Loss in Classification: • Why We Don't Use the M...
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why Language Models Hallucinate: • Why Language Models Ha...
Grounding DINO, Open-Set Object Detection: • Object Detection Part ...
Detection Transformers (DETR), Object Queries: • Object Detection Part ...
Wav2vec2 A Framework for Self-Supervised Learning of Speech Representations - Paper Explained: • Wav2vec2 A Framework f...
Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • How to Fine-tune Large...
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained: • Multi-Head Attention (...
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:37 - Greedy Decoding
01:05 - Random Sampling
01:50 - Temperature
03:55 - Top-k Sampling
04:27 - Top-p Sampling
05:10 - Pros and Cons
07:30 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic
📸 Instagram: @datamlistic
📱 TikTok: @datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: / datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#llm #largelanguagemodels #chatgpt #textgeneration #promptengineering
Wondering how you can fine-tune LLMs? Take a look here to see how this is done with LoRA, a popular fine-tuning method: ua-cam.com/video/CNmsM6JGJz0/v-deo.html
Video mistakes:
- At 2:30 the sum should be over j, not over i (see the corrected formula below). Thanks @mriz for noticing this!
- The probability distribution after selecting the top-3 words at 4:10 is not accurate; it should be sunny - 0.46, rainy - 0.38, the - 0.15. Thanks @koiRitwikHai for noticing this!
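For reference, the standard form of the temperature-scaled softmax discussed in that chapter (with z_i the logits and T the temperature) is p_i = exp(z_i / T) / Σ_j exp(z_j / T). The normalizing sum in the denominator runs over j, i.e. over every token in the vocabulary, not over i.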
Thanks! Top-p and top-k were easy to understand.
You're welcome! I'm glad to hear that those concepts were clear and easy to understand. If you have any more questions or need further clarification on this topic, feel free to ask! :)
This is a really clear explanation of this concept. Loved it. Thanks!
Thanks! Happy to hear that you liked the explanation! :)
Great introduction with a clear and simple explanation/illustration. Thanks!
Thanks! Glad you found it helpful! :)
Thank you for the video and for explaining the differences between the three types of sampling for LLMs. When sampling, are you using or enabling all three methods (temperature, top-k and top-p) at the same time?
For example, if I choose to do top-k sampling for controlled diversity and reduced nonsense, does that mean I should choose a low temperature as well?
Glad it was helpful! Yes, you can combine multiple sampling methods at the same time. :)
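To make the combination concrete, below is a minimal Python/NumPy sketch of how one decoding step can chain all three. The function name and defaults are made up for illustration; real libraries differ in the order and details of the filters.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Hypothetical sketch: temperature scaling, then top-k, then top-p, then sampling."""
    rng = rng or np.random.default_rng()
    # Temperature: T < 1 sharpens the distribution, T > 1 flattens it.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # token ids, most probable first
    p = probs[order]
    # Top-k: zero out everything beyond the k most probable tokens.
    if top_k is not None:
        p[top_k:] = 0.0
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        cutoff = int(np.searchsorted(np.cumsum(p), top_p)) + 1
        p[cutoff:] = 0.0
    p /= p.sum()                           # renormalize the surviving tokens
    return rng.choice(order, p=p)          # weighted random draw, as in the video
```

So with, say, temperature=0.8, top_k=50, top_p=0.9, a token has to survive both cutoffs, and the final draw is still random over whatever remains.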
Hello, in top-p, which of the 4 words will be chosen? Is it chosen randomly between "sunny", "rainy", "the" and "good"?
Yes, it's random according to their distribution.
@datamlistic So they are randomly selected, but more probable words have a higher chance of being selected?
@Annaonawave Exactly! :)
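In code, the exchange above looks like this. The four words are from the video's example, but the probabilities here are made-up illustration values:

```python
import numpy as np

words = ["sunny", "rainy", "the", "good"]   # tokens that survived the top-p cutoff
probs = [0.42, 0.34, 0.14, 0.10]            # hypothetical renormalized probabilities

# One word is drawn at random, weighted by its probability: "sunny" wins most
# often, but "good" still comes up roughly 10% of the time.
rng = np.random.default_rng()
print(rng.choice(words, p=probs))
```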
Let's say I use top_k=4. Does the model sample 1 word out of the 4 most probable words randomly? If not, what happens?
That's exactly what happens! The model samples 1 word out of the 4 most probable, according to their distribution (i.e. the higher the probability of a word, the more likely it is to be sampled).
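As a small sketch of that top_k=4 behaviour (the vocabulary and logits below are made up for the example):

```python
import numpy as np

vocab  = np.array(["sunny", "rainy", "the", "good", "cat", "blue"])
logits = np.array([2.1, 1.9, 1.0, 0.7, -0.5, -1.2])

k = 4
top = np.argsort(logits)[::-1][:k]   # indices of the k most probable words
probs = np.exp(logits[top])
probs /= probs.sum()                 # same as the full softmax renormalized over the window

# Weighted random draw from the 4 survivors: the most probable word is the
# likeliest pick, but any of the 4 can be sampled.
rng = np.random.default_rng()
print(rng.choice(vocab[top], p=probs))
```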
The probability distribution you get after selecting the top-3 words at 4:10 is not accurate. The probabilities, after normalizing over the 3-word window, should be sunny - 0.46, rainy - 0.38, and the - 0.15.
Yep, that's correct. Thanks for the feedback! I created/recorded the video over a longer period of time, and it seems I used two versions of the numbers in doing that (and forgot to update them). I'm sorry if this has caused any confusion. I will add a correction about this issue in the description/pinned comment.
P.S. Maybe it would be a good idea to round up one of the probabilities you enumerated, so that they sum to 1.
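For anyone double-checking the arithmetic, here is the renormalization step in code. The raw pre-cutoff values below are assumptions chosen so that the result matches the corrected numbers up to rounding; the video's exact values may differ.

```python
# Top-k renormalization: divide each surviving probability by the window's sum.
raw = {"sunny": 0.30, "rainy": 0.25, "the": 0.10}   # assumed pre-cutoff values
total = sum(raw.values())                            # 0.65
normalized = {word: p / total for word, p in raw.items()}
print(normalized)  # {'sunny': ~0.4615, 'rainy': ~0.3846, 'the': ~0.1538}
```

Rounded to two decimals these give 0.46, 0.38 and 0.15, which is also why they sum to 0.99 rather than exactly 1, as the P.S. above points out.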
2:30 Bro, you're wrong, the sum is not over the input i, but over j.
Yep, that's correct. Thanks for the feedback and sorry if this confused you! I will add a note about this mistake in the pinned comment. :)