Developing an LLM: Building, Training, Finetuning

Sebastian Raschka

Published Feb 4, 2025

COMMENTS

@Izumichan-nw1zo 5 days ago ⁺²
The only request I have is: please do not stop these videos. Coming from a modest background in a corner of India, I am able to access top-notch DL content from an American university, taught by such a good teacher. This is nothing less than God's work. Please continue, and THANK YOU, SIR, FOR EVERYTHING.
@SebastianRaschka 5 days ago
Thanks for the kind words, that's very motivating to hear!
@Izumichan-nw1zo 4 days ago ⁺¹
@SebastianRaschka You really do not know how helpful you are to students like us. Thanks a lot; we have huge respect for you.
@la-dev 1 day ago
@SebastianRaschka And I can confirm that this is not an isolated case. There are plenty of people who have no access to paid courses and can't learn otherwise. You are definitely enlightening us with your knowledge. Mostly we get fluff, not an actual course that teaches how to fish. Thanks a lot.
@nintishia 2 months ago ⁺⁴
The hallmark of a great teacher is to teach the essentials with depth. Sebastian succeeds fabulously. Thanks for the video.
@SebastianRaschka 1 month ago
Thanks for the kind compliment, I am glad to hear that you found these videos helpful!
@ephremtadesse3195 1 day ago
Thank you so much. It really guides me on where to put my effort to learn more about LLMs and practice with them.
@tusharganguli 8 months ago ⁺¹⁹
Your articles and videos have been extremely helpful in understanding how LLMs are built. Building LLM from Scratch and Q and AI are resources that I am presently reading and they provide a hands-on discourse on the conceptual understanding of LLMs. You, Andrej Karpathy and Jay Alammar are shining examples of how learning should be enabled. Thank you!
@SebastianRaschka 8 months ago ⁺¹
Thanks for the kind comment!
@box-mt3xv 8 months ago ⁺³¹
The hero of open source
@SebastianRaschka 8 months ago ⁺²
Haha, thanks! I've learned so much thanks to all the amazing people in open source, and I'm very flattered by your comment to potentially be counted as one of them :)
@kyokushinfighter78 5 months ago ⁺²
One of the best 60 minutes of my time. Really thankful for this.
@SebastianRaschka 5 months ago
Thanks for the kind words!
@adityasamalla3251 5 months ago ⁺⁵
You are the best! Thanks a lot for sharing your knowledge with the world.
@pe6649 5 months ago ⁺¹
Thanks!
@guis487 6 months ago ⁺¹
I am your fan, I have most of your books, thanks for this excellent video! Another evaluation metric that I found interesting on another channel was to have the LLMs play chess against each other 10 times.
@SebastianRaschka 6 months ago
Hah nice, that's a fun one. How do you evaluate who's the winner, do you use a third LLM for that?
@haribhauhud8881 5 months ago ⁺¹
Thank you, Sir. Your lessons are beneficial for the community. Appreciate your hard work! 😊
@chineduezeofor2481 7 months ago ⁺²
Thank you Sebastian for your awesome contributions. You're a big inspiration.
@tomhense6866 8 months ago ⁺¹
Very nice video, I liked it so much that I preordered your new book directly after watching it (to be fair, I have read your blog for some time now).
@SebastianRaschka 8 months ago
Thanks! I hope you are going to like the book, too!
@admercs 5 months ago
You are a true educator. Honored to be a contributor to one of your libraries.
@ZavierBanerjea 6 months ago ⁺¹
What wonderful tech minds {Sebastian Raschka, Yann LeCun, Andrej Karpathy, ...} who share their work and beautiful ideas with mere mortals like me... Sebastian's teachings are so, so fundamental that they take the fear off my clogged mind... 🙏
Although I am struggling to build LLMs for specific, niche areas, I am confident of cracking them with great resources like Build a Large Language Model (From Scratch)!
@nithinma8697 5 months ago ⁺⁴
00:02 Three common ways of using large language models
02:39 Developing an LLM involves building, pre-training, and fine-tuning.
07:11 LLM predicts the next token in the text
09:30 Training LLM involves sliding fixed size inputs over text data to create batches.
14:22 Byte pair encoding and SentencePiece variations allow LLMs to handle unknown words
16:42 Training sets are increasing in size
21:09 Developing an LLM involves architecture, pre-training, model evaluation, and fine-tuning.
23:14 The Transformer block is repeated multiple times in the architecture.
27:22 Pre-training creates the Foundation model for fine-tuning
29:28 Training LLMs is typically done for one to two epochs
33:44 Pre-training is not usually necessary for adapting LLM for a certain task
35:51 Replace the output layer for efficient classification.
39:54 Classification fine-tuning is key for practical business tasks.
42:01 LLM instruction data set and preference tuning
45:58 Evaluating LLMs is crucial, with MMLU being a popular metric.
48:07 Multiple choice questions are not sufficient to measure an LLM's performance
52:34 Comparing LLM models for performance evaluation
54:32 Continued pre-training is effective for instilling new knowledge in LLMs
58:28 Access slides on the website for more details
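
The 09:30 entry in the list above (sliding fixed-size inputs over text data to create batches) can be sketched in plain Python. This is a minimal illustration and not the code from the talk: the function name `sliding_window_pairs`, the parameters `context_length` and `stride`, and the integer stand-ins for token IDs are assumptions made for this example.

```python
def sliding_window_pairs(token_ids, context_length, stride):
    """Return (input, target) chunks; each target is its input shifted by one token."""
    pairs = []
    # Stop early enough that the shifted-by-one target still fits in the data.
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start : start + context_length]
        targets = token_ids[start + 1 : start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Stand-in for a tokenized text (assumption: real IDs would come from a tokenizer).
token_ids = list(range(10))

pairs = sliding_window_pairs(token_ids, context_length=4, stride=4)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With a stride equal to the context length, the input chunks do not overlap; a smaller stride would produce overlapping chunks and therefore more training examples from the same text.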
@bjugdbjk 4 months ago
u r a LEGEND, luv ur work, thnx a ton for sharing!
@DataChiller 8 months ago ⁺⁶
the greatest Liverpool fan ever! ⚽
@SebastianRaschka 8 months ago ⁺⁵
Haha nice, at least one person watched it until that part :D
@rachadlakis1 8 months ago ⁺³
Thanks for the great knowledge you are sharing
@haqiufreedeal 7 months ago ⁺³
Oh, my lord, my favourite machine learning author is a Liverpool fan.😎
@SebastianRaschka 7 months ago ⁺¹
Haha, nice that people make it that far into the video 😊
@ananthvankipuram4012 7 months ago
@SebastianRaschka You'll never walk alone 🙂
@katelee872 2 months ago
Super, thanks so much, you are our hero.
@tilkesh 2 days ago
Thank you for sharing your entire work on GitHub. I need your suggestions regarding older versions of TensorFlow and PyTorch, which are not easily compatible with the GPU and Python versions. Installing each version causes conflicts with the versions of other packages (NumPy etc.) and with the GPU version. Do you have any suggestions on how to tackle this problem?
@Xnaarkhoo 5 months ago ⁺¹
@16:37 When you say Llama was trained on 1T tokens, do you still mean there was 32K unique token? Because in your blog post you have "They also have a surprisingly large 151,642 token vocabulary (for reference, Llama 2 uses a 32k vocabulary, and Llama 3.1 uses a 128k token vocabulary); as a rule of thumb, increasing the vocab size by 2x reduces the number of input tokens by 2x so the LLM can fit more tokens into the same input. Also it especially helps with multilingual data and coding to cover words outside the standard English vocabulary."
@SebastianRaschka 5 months ago
Thanks for the comment! So in the talk these are the dataset sizes using the respective tokenizer that was used during model training. The vocabulary sizes that the models used are 32k for Llama 2 and 128k for Llama 3.1. So, regarding "do you still mean there was 32K unique token", the vocabulary was 32k unique tokens (but there could be more unique tokens in the dataset). I hope this helps. Otherwise, please let me know, happy to explain more!
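
The distinction in the reply above, dataset size (total tokens in the training data) versus vocabulary size (unique tokens the tokenizer knows), can be illustrated with a toy sketch. The whitespace "tokenizer" here is an assumption made purely for brevity; the models discussed use subword tokenizers such as BPE.

```python
text = "the cat sat on the mat the end"

# Toy whitespace tokenizer (assumption; real LLMs use subword tokenizers).
tokens = text.split()

# Vocabulary: one integer ID per unique token.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]

dataset_size = len(token_ids)  # total tokens, analogous to "trained on 1T tokens"
vocab_size = len(vocab)        # unique tokens, analogous to "32k vocabulary"

print("dataset size:", dataset_size)
print("vocabulary size:", vocab_size)
```

The dataset size grows with the amount of text, while the vocabulary size is fixed once the tokenizer is built.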
@moshoodolawale3591 5 months ago ⁺¹
Thanks for the detailed videos and articles. I want to ask if it's possible to create a customized tokenizer as an extension to existing ones for a custom dataset? Also, how do decoder-only models handle other tasks like summarization and classification after fine-tuning without forgetting their pre-trained causal next-token task?
@SebastianRaschka 5 months ago
Good question. Yes, you can do that; tiktoken, for example, allows you to extend the vocabulary with additional tokens. However, you have to keep in mind that you'll always have to update the embedding layer and output layer with these tokens in case you want to use the updated tokenizer with an existing LLM. Regarding your second question, you could do that, but that would not be ideal because only the last token contains information about all other tokens. If you use other tokens, you'll have more information loss.
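
The point in the reply above about updating the embedding layer and output layer after extending the vocabulary can be sketched as follows. This is a plain-Python illustration under stated assumptions, not tiktoken or PyTorch code: matrices are lists of rows, and the helper names `init_matrix` and `extend_rows` as well as the token strings are hypothetical.

```python
import random


def init_matrix(rows, cols, rng):
    """Create a rows x cols matrix with small random values."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def extend_rows(matrix, num_new, rng):
    """Append freshly initialized rows for new token IDs; trained rows stay unchanged."""
    cols = len(matrix[0])
    return matrix + init_matrix(num_new, cols, rng)


rng = random.Random(123)
vocab_size, emb_dim = 5, 3

embedding = init_matrix(vocab_size, emb_dim, rng)    # token ID -> input vector
output_head = init_matrix(vocab_size, emb_dim, rng)  # one weight row per vocabulary entry

new_tokens = ["<custom_1>", "<custom_2>"]  # hypothetical tokens added to the tokenizer

# Both layers need one extra row per new token ID.
new_embedding = extend_rows(embedding, len(new_tokens), rng)
new_output_head = extend_rows(output_head, len(new_tokens), rng)

print(len(embedding), "->", len(new_embedding))
print(len(output_head), "->", len(new_output_head))
```

The new rows start out untrained, so the model would typically need some finetuning before the added tokens become useful.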
@sahilsharma3267 8 months ago ⁺⁴
When is your whole book coming out? Eagerly waiting 😅
@SebastianRaschka 8 months ago ⁺²
Thanks for your interest in this! It's already available for preorder (both on the publisher's website and Amazon) and if the production stage goes smoothly, it should be out by the end of August.
@muthukamalan.m6316 8 months ago ⁺¹
great content! love it ❤
@superfreiheit1 1 month ago
I would like to create a chatbot for arXiv publications. I do not understand how to create the instruction or QA dataset for it. Can you make a tutorial?
@KumR 7 months ago ⁺¹
Great video. Now that LLMs are so powerful, will regular machine learning and deep learning slowly vanish?
@SebastianRaschka 7 months ago ⁺¹
Great question. I do think that special purpose ML solutions still have and will continue to have their place. The same way ML didn't make certain more traditional statistics based models obsolete. Regarding deep learning ... I'd say LLM is a deep learning model itself. But yeah, almost everything in deep learning is nowadays either a diffusion model, transformer-based model (vision transformer and most LLMs), or state space model
@RobinSunCruiser 7 months ago ⁺¹
Hi, nice videos! One question for my understanding. When talking about embedding dimensions such as 1280 in "gpt2-large", do you mean the size of the number vector encoding the context of a single token or the number of input tokens? When comparing gpt2-large and Llama 2, the number is the same for the ".. embeddings with 1280 tokens".
@SebastianRaschka 7 months ago
Good question, the term is often used very broadly and may refer to the input embeddings or the hidden layer sizes in the MLP layer. Here, I meant the size of the tokens that are embedded.
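
To make the answer above concrete: the embedding dimension is the length of the vector each token is mapped to, which is separate from the number of input tokens. The sketch below uses a tiny lookup table with made-up values; the sizes are assumptions chosen for readability (the question mentions 1280 for gpt2-large).

```python
emb_dim = 4      # vector length per token (tiny stand-in for a value like 1280)
vocab_size = 10  # number of rows in the lookup table

# Embedding table with made-up values: one row of length emb_dim per token ID.
embedding_table = [
    [float(token_id * emb_dim + k) for k in range(emb_dim)]
    for token_id in range(vocab_size)
]

token_ids = [3, 7, 1]  # an input sequence of three tokens

# Embedding lookup: replace every token ID with its row from the table.
embedded = [embedding_table[token_id] for token_id in token_ids]

shape = (len(embedded), len(embedded[0]))
print("(num_tokens, emb_dim) =", shape)
```

Changing the number of input tokens changes only the first entry of the shape; the embedding dimension stays fixed by the model.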
@timothywcrane 8 months ago ⁺¹
I'm interested in SLM RAG with Knowledge graph traversal/search for RAG dataset collection and vector-JIT semantic match for hybrid search. Any repos you think I would be interested in?
@timothywcrane 8 months ago
bookmarked, clear and concise.
@SebastianRaschka 8 months ago
Unfortunately I don't have a good recommendation here. I have only implemented standard RAGs without knowledge graph traversal.
@tashfeenahmed3526 7 months ago
That's great, Dr. Hope you are doing well.
I wish I could download your recently published deep learning book. If there is any open-source link to download it, please mention it in the comments.
Thanks and regards,
Researcher at Texas
@bashamsk1288 8 months ago ⁺¹
In instruction fine-tuning, do we propagate the loss only on the output text tokens, or on all tokens from start to EOS?
@SebastianRaschka 8 months ago ⁺¹
That's a good question. You can do both. By default all tokens, but more commonly you'd mask the tokens. In my book, I include the token masking as a reader exercise (it's super easy to do). There was also a new research paper a few weeks ago that I discussed in my monthly research write-ups here: magazine.sebastianraschka.com/p/llm-research-insights-instruction
@bashamsk1288 8 months ago
@SebastianRaschka
Thanks for the reply
I just have a general question: do we use masking? For example, was masking used during the instruction fine-tuning of Llama 3, Mistral, or any open-source LLMs? Also, does your book include any chapters on the parallelization of training large language models?
@SebastianRaschka 8 months ago
@bashamsk1288 Masking is commonly used, yes. We implement it as the default strategy in LitGPT. In my book we do both. I can't speak about Llama 3 and Mistral regarding masking, because while these are open-weight models they are not open source. So there's no training code we can look at. My book explains DDP training in the PyTorch appendix, but it's not used in the main chapters because as a requirement all chapters should also work on a laptop to make them accessible to most readers.
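
The masking discussed in this thread (computing the loss only on the response tokens rather than on all tokens) can be sketched in plain Python. This is a simplified illustration, not the implementation from the book or LitGPT: the helper names are hypothetical, and it assumes the first `prompt_length` target positions belong to the instruction. The value -100 matches the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import math

IGNORE_INDEX = -100  # same value as the default ignore_index in PyTorch's cross_entropy


def mask_prompt_targets(targets, prompt_length):
    """Replace the targets that belong to the instruction with IGNORE_INDEX."""
    return [IGNORE_INDEX] * prompt_length + targets[prompt_length:]


def mean_cross_entropy(logits, targets):
    """Average cross-entropy over all positions whose target is not IGNORE_INDEX."""
    losses = []
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue  # masked position: contributes nothing to the loss
        log_norm = math.log(sum(math.exp(value) for value in row))
        losses.append(log_norm - row[target])
    return sum(losses) / len(losses)


# Made-up logits for 4 positions over a vocabulary of 4 tokens.
logits = [
    [0.1, 0.2, 3.0, 0.0],
    [2.5, 0.3, 0.1, 0.0],
    [0.0, 1.5, 0.2, 0.1],
    [0.3, 0.1, 0.0, 2.0],
]
targets = [2, 0, 1, 3]

masked_targets = mask_prompt_targets(targets, prompt_length=2)

print("loss on all tokens:     ", mean_cross_entropy(logits, targets))
print("loss on response tokens:", mean_cross_entropy(logits, masked_targets))
```

Both variants use the same model outputs; only the set of positions that contribute to the loss differs.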
@alihajikaram8004 7 months ago
Would you make videos about time series and transformers?
@ArbaazBeg 7 months ago
Should we give a prompt to the LLM when fine-tuning for classification with last-layer modification, or pass the input directly to the LLM, as with DeBERTa?
@SebastianRaschka 7 months ago ⁺¹
Thanks for the comment, could you explain a bit more what you mean by passing the input directly?
@ArbaazBeg 7 months ago ⁺¹
@SebastianRaschka Hey, sorry for the bad language. I meant: should chat formats like Alpaca etc. be applied, or do we give the text as it is to the LLM for classification?
@SebastianRaschka 7 months ago ⁺¹
@ArbaazBeg Oh I see now. And yes, you can. I have been meaning to add an example and performance comparison for that to the GitHub repo (github.com/rasbt/LLMs-from-scratch) at some point. For that, I first wanted to instruction-finetune the model on a few more spam classification instructions and examples though.
@alokranjansrivastava623 5 months ago ⁺¹
Nice video.
Does LLM mean only auto-regressive models (not BERT)?
@SebastianRaschka 5 months ago
Yes, here LLM is basically synonymous with decoder-style autoregressive models like Llama, GPT, Gemma, etc.
@alokranjansrivastava623 5 months ago
@SebastianRaschka BERT has a stack of encoder transformers, but it is not an LLM. Am I correct here?
@SebastianRaschka 5 months ago
@alokranjansrivastava623 Architecture-wise, it's kind of the same thing though, except it doesn't have the causal mask, and the pretraining task is not next-token prediction but predicting masked tokens (plus next-sentence prediction).
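
The difference mentioned in the reply above (decoder-style models use a causal mask, BERT-style encoders do not) can be shown with a small sketch. The function name `attention_mask` is hypothetical, and the mask is a plain list of booleans rather than a tensor.

```python
def attention_mask(num_tokens, causal):
    """mask[i][j] is True if position i is allowed to attend to position j."""
    return [
        [(j <= i) if causal else True for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


decoder_mask = attention_mask(4, causal=True)   # GPT-style: no looking at future tokens
encoder_mask = attention_mask(4, causal=False)  # BERT-style: every token sees every token

print("causal (decoder-style):")
for row in decoder_mask:
    print(["x" if allowed else "." for allowed in row])

print("bidirectional (encoder-style):")
for row in encoder_mask:
    print(["x" if allowed else "." for allowed in row])
```

In the causal case, only the last row is fully filled, meaning the last position is the only one that can attend to all tokens in the sequence.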
@alokranjansrivastava623 5 months ago
@SebastianRaschka Just one question. How do we define an LLM? When can we say that a particular language model belongs to the LLM category?
@andreyc.3600 5 months ago ⁺¹
I strongly assume that you speak German :). Where can one find your book in Kindle (mobi or f2b) format? Thanks & kind regards.
@SebastianRaschka 5 months ago
Thank you very much for your interest in my book. As far as I have heard from the publisher, it is being sent to the printer this week, and then it should hopefully also become available as a Kindle version on Amazon.com/.de in a few weeks.
@andreyc.3600 5 months ago
@SebastianRaschka Great, thank you very much.
@kartiksaini5847 8 months ago ⁺¹
Big fan ❤
@joisco4394 8 months ago
I've heard about instruct learning, and it sounds similar to how you define preference learning. I have also heard about transfer learning. How would you compare/define those?
@SebastianRaschka 8 months ago ⁺¹
Transfer learning is basically involved in everything you do when you start out with a pretrained model. We don't really name or call it out explicitly anymore because it's so common. The main difference between instruction finetuning and preference tuning is the loss function. Instruction finetuning trains the model to answer queries, and preference finetuning is more about the nuance of how these get answered. All preference tuning methods that are used today (DPO, RLHF+PPO, KTO, etc.) expect you to have done instruction finetuning on your model before you preference finetune.
@joisco4394 7 months ago ⁺¹
@SebastianRaschka Thanks for explaining it. I need to do a lot more research :p
@SebastianRaschka 6 months ago ⁺¹
@joisco4394 Btw I recently coded the alignment (using direct preference optimization) here, which might help clarify this step: github.com/rasbt/LLMs-from-scratch/blob/main/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
@joisco4394 6 months ago
@SebastianRaschka Much appreciated
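
As a rough illustration of the preference-tuning loss discussed in this thread, the direct preference optimization (DPO) objective for a single preference pair can be written in a few lines. This is a sketch of the standard DPO formula and not the code from the linked notebook; the function name, the argument names, and the example log-probabilities are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair, given sequence log-probabilities."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Policy identical to the reference model: the margin is zero.
loss_at_start = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy that now prefers the chosen response over the rejected one.
loss_after_update = dpo_loss(-8.0, -12.0, -10.0, -10.0)

print("loss at start:    ", loss_at_start)
print("loss after update:", loss_after_update)
```

The loss decreases as the policy assigns relatively more probability to the chosen response than the reference model does. The reference model here would typically be the instruction-finetuned model, consistent with the comment above that instruction finetuning comes first.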
@mushinart 7 months ago ⁺¹
I'm sold, I'm buying your book. Would love to chat with you sometime if possible.
@SebastianRaschka 7 months ago ⁺¹
Thanks, hope you are liking it! Are you going to SciPy in July by chance, or maybe NeurIPS at the end of the year?
@mushinart 7 months ago
@SebastianRaschka Unfortunately not, but I'd like to have a Zoom/Google Meet chat with you if possible.
@zamot155 1 month ago
32:50 watched this today, 12.12 and that was kind of scary 😅
@ramprasadchauhan7 7 months ago
Hello sir, please also make one with JavaScript
@MadnessAI8X 8 months ago ⁺¹
What we are seeking not only fuzzing code
@SebastianRaschka 8 months ago ⁺¹
Glad that's useful
@TheCuriousCurator-Hindi 3 months ago
I have been in this field for a while (a decade+) but not in touch with LLMs, and it is useless for the uninformed and even more useless for the informed. I don't know which category I am in, but I didn't learn anything. I read about transformers when the paper came out, then I assumed RLHF is just the REINFORCE algorithm, which is probably correct to assume. Anyway, highly repellent video.
@krum.00 8 months ago ⁺¹
🤌
@SebastianRaschka 8 months ago
I take that as a compliment!? 😅😊
@krum.00 7 months ago ⁺¹
@SebastianRaschka Yes yes! It was supposed to be a compliment only. You are doing great work with our teaching materials :).
@redthunder6183 8 months ago
Easier said than done unless u got a GPU super computer lying around lol
@SebastianRaschka 8 months ago ⁺¹
Ha, I should mention that all chapters in my book run on laptops, too. It was a personal goal for me that everything should work even without a GPU. The instruction finetuning takes ~30 min on a CPU to get reasonable results (granted, the same code takes 1.24 min on an A100).
@Quester2023-xp7rb 10 days ago
You don't sound professional while speaking, and beginners can understand very little from this video... I was thinking of buying your "Build LLM from Scratch" book, but given the way you sound in this video, I doubt I will read your book.
@SebastianRaschka 8 days ago
As an opportunity for me to learn, would you mind sharing exactly what it is? Is it more about the audio quality? (Since I don't make YouTube videos very often, I currently use a relatively inexpensive microphone, which I may replace someday.) Or is it something else like the language choice? By the way, making videos always feels quite challenging for me since I’m not a native speaker. That’s why I prefer technical writing; it somehow comes much more naturally to me than creating videos.
@Quester2023-xp7rb 8 days ago
@SebastianRaschka Oh, I understand that you are a non-native speaker; it's OK. Yes, also some of the audio quality, but it's OK... I have already finished reading a few pages and creating the new tokenizer encoder and decoder things, like string to int and int to string, and I continue reading your book... Don't mind me, I am just a high school graduate.
@SebastianRaschka 5 days ago
@Quester2023-xp7rb Thanks for the comment, and I hope you are finding the book useful!
