LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p
- Published 13 Jun 2024
- In this video, we explore how the temperature, top-k and top-p techniques influence the text generation of large language models (LLMs).
References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why We Don't Use the Mean Squared Error (MSE) Loss in Classification: • Why We Don't Use the M...
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why Language Models Hallucinate: • Why Language Models Ha...
Grounding DINO, Open-Set Object Detection: • Object Detection Part ...
Detection Transformers (DETR), Object Queries: • Object Detection Part ...
Wav2vec2 A Framework for Self-Supervised Learning of Speech Representations - Paper Explained: • Wav2vec2 A Framework f...
Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • How to Fine-tune Large...
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained: • Multi-Head Attention (...
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:37 - Greedy Decoding
01:05 - Random Sampling
01:50 - Temperature
03:55 - Top-k Sampling
04:27 - Top-p Sampling
05:10 - Pros and Cons
07:30 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic
📸 Instagram: @datamlistic
📱 TikTok: @datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: / datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#llm #largelanguagemodels #chatgpt #textgeneration #promptengineering
Wondering how you can fine-tune LLMs? Take a look here to see how this is done with LoRA, a popular fine-tuning method: ua-cam.com/video/CNmsM6JGJz0/v-deo.html
Video mistakes:
- At 2:30 the sum should be over j, not over i (see the corrected formula below). Thanks @mriz for noticing this!
- The probability distribution after selecting the top-3 words at 4:10 is not accurate; it should be sunny - 0.46, rainy - 0.38, the - 0.15. Thanks @koiRitwikHai for noticing this!
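For reference, the standard form of the temperature-scaled softmax discussed in that chapter (with z_i the logits and T the temperature) is p_i = exp(z_i / T) / Σ_j exp(z_j / T). The normalizing sum in the denominator runs over j, i.e. over every token in the vocabulary, not over i.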
Thanks! Top-p and top-k were easy to understand.
You're welcome! I'm glad to hear that those concepts were clear and easy to understand. If you have any more questions or need further clarification on this topic, feel free to ask! :)
This is a really clear explanation of this concept. Loved it. Thanks!
Thanks! Happy to hear that you liked the explanation! :)
Great introduction with a clear and simple explanation/illustration. Thanks!
Thanks! Glad you found it helpful! :)
Thank you for the video and for explaining the differences between the three types of sampling for LLMs. When sampling, are you using or enabling all three methods (temperature, top-k and top-p) at the same time?
For example, if I choose to do top-k sampling for controlled diversity and reduced nonsense, does that mean I should choose a low temperature as well?
Glad it was helpful! Yes, you can combine multiple sampling methods at the same time. :)
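To make the combination concrete, below is a minimal Python/NumPy sketch of how one decoding step can chain all three. The function name and defaults are made up for illustration; real libraries differ in the order and details of the filters.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Hypothetical sketch: temperature scaling, then top-k, then top-p, then sampling."""
    rng = rng or np.random.default_rng()
    # Temperature: T < 1 sharpens the distribution, T > 1 flattens it.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # token ids, most probable first
    p = probs[order]
    # Top-k: zero out everything beyond the k most probable tokens.
    if top_k is not None:
        p[top_k:] = 0.0
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        cutoff = int(np.searchsorted(np.cumsum(p), top_p)) + 1
        p[cutoff:] = 0.0
    p /= p.sum()                           # renormalize the surviving tokens
    return rng.choice(order, p=p)          # weighted random draw, as in the video
```

So with, say, temperature=0.8, top_k=50, top_p=0.9, a token has to survive both cutoffs, and the final draw is still random over whatever remains.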
Hello, in top-p, which of the 4 words will be chosen? Is it chosen randomly between "sunny", "rainy", "the" and "good"?
Yes, it's random according to their distribution.
@datamlistic So they are randomly selected, but more probable words have a higher chance of being selected?
@Annaonawave Exactly! :)
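In code, the exchange above looks like this. The four words are from the video's example, but the probabilities here are made-up illustration values:

```python
import numpy as np

words = ["sunny", "rainy", "the", "good"]   # tokens that survived the top-p cutoff
probs = [0.42, 0.34, 0.14, 0.10]            # hypothetical renormalized probabilities

# One word is drawn at random, weighted by its probability: "sunny" wins most
# often, but "good" still comes up roughly 10% of the time.
rng = np.random.default_rng()
print(rng.choice(words, p=probs))
```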
Let's say I use top_k=4. Does the model sample 1 word out of the 4 most probable words randomly? If not, what happens?
That's exactly what happens! The model samples 1 word out of the 4 most probable, according to their distribution (i.e. the higher the probability of a word, the more likely it is to be sampled).
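As a small sketch of that top_k=4 behaviour (the vocabulary and logits below are made up for the example):

```python
import numpy as np

vocab  = np.array(["sunny", "rainy", "the", "good", "cat", "blue"])
logits = np.array([2.1, 1.9, 1.0, 0.7, -0.5, -1.2])

k = 4
top = np.argsort(logits)[::-1][:k]   # indices of the k most probable words
probs = np.exp(logits[top])
probs /= probs.sum()                 # same as the full softmax renormalized over the window

# Weighted random draw from the 4 survivors: the most probable word is the
# likeliest pick, but any of the 4 can be sampled.
rng = np.random.default_rng()
print(rng.choice(vocab[top], p=probs))
```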
The probability distribution you get after selecting the top-3 words at 4:10 is not accurate. The probabilities, after normalizing over the 3-word window, should be sunny - 0.46, rainy - 0.38, and the - 0.15.
Yep, that's correct. Thanks for the feedback! I created/recorded the video over a longer period of time, and it seems I used two versions of the numbers in doing that (and forgot to update them). I'm sorry if this has caused any confusion. I will add a correction about this issue in the description/pinned comment.
P.S. Maybe it would be a good idea to round up one of the probabilities you enumerated, so that they sum to 1.
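For anyone double-checking the arithmetic, here is the renormalization step in code. The raw pre-cutoff values below are assumptions chosen so that the result matches the corrected numbers up to rounding; the video's exact values may differ.

```python
# Top-k renormalization: divide each surviving probability by the window's sum.
raw = {"sunny": 0.30, "rainy": 0.25, "the": 0.10}   # assumed pre-cutoff values
total = sum(raw.values())                            # 0.65
normalized = {word: p / total for word, p in raw.items()}
print(normalized)  # {'sunny': ~0.4615, 'rainy': ~0.3846, 'the': ~0.1538}
```

Rounded to two decimals these give 0.46, 0.38 and 0.15, which is also why they sum to 0.99 rather than exactly 1, as the P.S. above points out.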
2:30 Bro, you're wrong, the sum is not over the input i, but over j.
Yep, that's correct. Thanks for the feedback and sorry if this confused you! I will add a note about this mistake in the pinned comment. :)