I'm a student for life... approaching 40... never had the privilege of attending a university like Stanford. To get access to these quality lectures is amazing. Thank you
This is a quality lecture?
@@Fracasse-0x13 For people who don't have access to education, yes, it is a quality lecture.
i am living my dreams
@@Fracasse-0x13 Why is this not a quality lecture?
They're all the same
Everything is on the web
you don't need certification to tell the world you know it
Build the best
It is one thing to be a great research institution but to be a great research institution that is full of talented and kind lecturers is extremely impressive. I've been impressed by every single Stanford course and lecture I have participated in through SCPD and UA-cam and this lecturer is no exception.
Thank you for sharing your positive experiences with our courses and lectures!
Wow, big words. Thank you for the comment, your words encouraged me to watch the whole thing and I don't regret it at all. Best decision!
00:10 Building Large Language Models overview
02:21 Focus on data evaluation and systems in industry over architecture
06:25 Autoregressive language models predict the next word in a sentence.
08:26 Tokenizing text is crucial for language models
12:38 Training a large language model involves using a large corpus of text.
14:49 Tokenization process considerations
18:40 Tokenization improvements in GPT-4 for code understanding
20:31 Perplexity measures model hesitation between tokens
24:18 Comparing outputs and model prompting
26:15 Evaluation of language models can yield different results
30:15 Preprocessing web data for large language models
32:06 Deduplication and filtering of low-quality documents at scale
35:57 Collecting real-world data is crucial for large language models
37:53 Challenges in pre-training large language models
41:38 Scaling laws predict performance improvement with more data and larger models
43:33 Relationship between data, parameters, and compute
47:21 Importance of scaling laws in model performance
49:12 Quality of data matters more than architecture and losses in scaling laws
52:54 Inference for large language models is very expensive
54:54 Training large language models is costly
59:12 Post-training aligns language models for AI assistant use
1:01:05 Supervised fine-tuning for large language models
1:04:50 Leveraging large language models for data generation and synthesis
1:06:49 Balancing data generation and human input for effective learning
1:10:23 Limitations of human abilities in generating large language models
1:12:12 Training language models to maximize human preference instead of cloning human behaviors.
1:16:06 Training reward model using softmax logits for human preferences.
1:18:02 PPO optimization and challenges in reinforcement learning for LLMs
1:21:49 Reinforcement learning models and potential benefits
1:23:44 Challenges with using humans for data annotation
1:27:21 LLMs are cost-effective and have better agreement with humans than humans themselves
1:29:12 Perplexity is not calibrated for large language models
1:33:00 Variance in performance of GPT-4 based on prompt specificity
1:34:51 Pre-training data plays a vital role in model initialization
1:38:32 Utilize GPUs efficiently with matrix multiplication
1:40:21 Utilizing 16 bits for faster training in deep learning
1:44:08 Building Large Language Models from scratch
Crafted by Merlin AI.
If my teachers in school looked this good, I wouldn't miss a single class. He's handsome af.
🤣🤣🤣😂
No because fr
Came for the speaker; stayed for the knowledge.
… strange
I'm a straight dude and even I'm like "DAMN!"
We live in a tremendous moment in time. Free access to the best lectures on the most relevant topic from the best university
Thanks for your comment, we love to hear this feedback!
Insights By "YouSum Live"
00:00:05 Building large language models (LLMs)
00:00:59 Overview of LLM components
00:01:21 Importance of data in LLM training
00:02:59 Pre-training models on internet data
00:04:48 Language models predict word sequences
00:06:02 Auto-regressive models generate text
00:10:48 Tokenization is crucial for LLMs
00:19:12 Evaluation using perplexity
00:22:07 Challenges in evaluating LLMs
00:29:00 Data collection is a significant challenge
00:41:08 Scaling laws improve model performance
01:00:01 Post-training aligns models with user intent
01:02:26 Supervised fine-tuning enhances model responses
01:10:00 Reinforcement learning from human feedback
01:19:01 DPO simplifies reinforcement learning process
01:28:01 Evaluation of post-training models
01:37:20 System optimization for LLM training
01:39:05 Low precision improves GPU efficiency
01:41:38 Operator fusion enhances computational speed
01:44:23 Future considerations for LLM development
Insights By "YouSum Live"
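Nice summary. For the systems entries at 1:39:05 (low precision) and 1:41:38 (operator fusion), here is a minimal sketch of what mixed-precision training looks like in PyTorch; model, optimizer, batch, and loss_fn are placeholders, not anything from the lecture:

```python
import torch

# Mixed precision: run matmuls in fp16/bf16 for speed and memory,
# keep master weights and the loss scale in fp32 for stability.
scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, batch, targets, loss_fn):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(batch), targets)   # forward pass in low precision
    scaler.scale(loss).backward()               # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                      # unscale grads, then update weights
    scaler.update()
    return loss.item()
```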
This is really a great lecture, super dense but still digestible. It's not even been 2 years since ChatGPT was released to the public, and seeing the rapid pace of research around LLMs and how quickly they keep improving is really interesting. Thank you so much, now I have some papers to read to further my understanding.
Suddenly I am interested in LLMs
I might not know what you are saying but I have the same feeling as you lol.
😂😂😂
🤣🤣
Why the picture of Zé Pequeno?
😂
Slides: drive.google.com/file/d/1B46VFrqFAPAEj3kaCrBAtQqeh2_Ztawl/view?usp=sharing
Thank you, sir... I heartily appreciate it 😊... the lecture was awesome 🤌
Thank you so much. I really appreciate it.
The lecture was perfect. Is there a playlist for the whole CS229 class from the same semester as this video? All I have found is from before 2022, which left me wondering.
@@helloadventureworld no, the rest of CS229 has not been released and I don't know if it will. This is only the guest lecture.
@@yanndubois3914 Thanks for the response and information you have shared :)
Damn. That lecturer is fineeee. 😍
Finally someone said Machine Learning instead of slapping AI on everything!
I feel that whenever someone talks about AI a lot it means that they know nothing about it
Right? And a lot of people believe in Yuval Harari because of it
This course has so many insights and gives a quick summary view of LLMs. I have also gone through a paid Coursera course. This one is equally good and free. Thanks for the video.
This is very well done. It's super easy to understand. I think your students should learn a lot. It's a great skill to be able to present complex material in a simple fashion. It means you really understand both the material and your audience.
People should first learn about basic language models like unigrams and bigrams. These were the first language models, and Stanford really has good lectures on them.
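Agreed. For anyone starting there, a count-based bigram model fits in a few lines; here is a toy sketch (the corpus is made up):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count bigram transitions, then normalize into conditional probabilities.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def bigram_prob(prev, nxt):
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(bigram_prob("the", "cat"))  # 2/3: "the" -> "cat" twice, "the" -> "mat" once
```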
00:10 Overview of building large language models
02:21 Focus on data, evaluation, and systems in practice
06:25 Autoregressive language models predict the next word
08:26 Text tokenization and vocabulary size are crucial for language models.
12:38 Tokenization and training tokenizers
14:49 Optimizing the tokenization process and decisions on merging tokens
18:40 GPT-4 improved tokenization for better code understanding
20:31 Perplexity measures the model's hesitation between words.
24:18 Evaluating open-ended questions is a difficult task.
26:15 Different ways of evaluating large language models
30:15 Steps for preprocessing web data for large language models
32:06 Problems with handling duplicates and filtering low-quality documents at scale.
35:57 Collecting data about the world is crucial for practical large language models.
37:53 Challenges in pre-training large language models
41:38 Scaling laws predict performance improvements with more data and larger models.
43:33 Compute is determined by data and parameters.
47:21 Understanding the importance of scaling laws when building large language models
49:12 Good data is crucial for better scaling.
52:54 Inference for large language models is expensive.
54:54 Training large language models has high compute costs.
59:12 Large language models (LLMs) require post-training alignment to become AI assistants.
1:01:05 Building large language models (LLMs) involves fine-tuning pre-trained models on the desired data.
1:04:50 Pre-trained language models are optimized for specific user types during fine-tuning.
1:06:49 Balancing synthetic data generation with human input is crucial for effective learning.
1:10:23 Challenges in generating content that exceeds human abilities
1:12:12 Generating ideal answers using preference maximization
1:16:06 Training a reward model using logits for continuous preferences
1:18:02 Training large language models with PPO and challenges in reinforcement learning
1:21:49 Discussion of reinforcement learning methods and their benefits when using reward models.
1:23:44 Problems with using humans as data annotators
1:27:21 LLMs are more cost-effective and offer better agreement than humans.
1:29:12 Problems with perplexity and calibration in language models
1:33:00 Variability in GPT-4's performance depending on the prompts
1:34:51 The importance of pre-training in large language models
1:38:32 Using GPUs for matrix multiplication can be 10x faster, but communication and memory play a key role.
1:40:21 Reduced precision for faster matrix multiplication
1:44:08 Building large language models (LLMs)
Crafted by Merlin AI.
Thank you for the video! I am glad that we live in this time and can witness the development of AI technologies.
He is doing his part to encourage women in STEM.
Women have always been in STEM. We all know about Grace Hopper. Please let this go.
Look up Ruth David. She worked at the CIA and redid all of their tech infrastructure, and she's still alive!
haha absolutely
So you want a STEM husband? No way.
😮@@astrolillo
What a wonderful lecture... these 1.75 hours were some of the most valuable of my life
Best explanation.. I'm watching at 3 am. Thanks
Great! Thanks for sharing! One thing I would suggest is to transcribe or add subtitles for the questions asked by the students. That way we could better understand the answers given by the lecturer.
Really incredible delivery of complicated information. ❤
I had the privilege of attending an insightful 90-minute lecture by Stanford faculty, which greatly boosted my confidence in completing my thesis. The approach they shared aligns closely with my own research methodology, reinforcing the direction of my work. Grateful for this inspiring experience!
Phenomenal explanation. Love for Stanford, its professors, and their methodologies is a never-ending tale!!
He is an alien, such a brilliant and young human being. Impressed.
This is an amazing high-level breakdown of LLMs. Every aspect of an LLM was mentioned. Thank you for this amazing video. I'll come back here often.
Very informative, up to date, and crisp. Keep them coming... don't stop now!
Great talk. Loved the level of detail, the insights, the pacing.
I love the way you answered the questions, very clear and precise.
What an awesome video. Data quality is a real issue, and even more interestingly, LLMs learn a lot like humans: introduce the simpler concepts first (training data prompts) and then introduce more complex subjects, and the LLMs learn more, just like humans.
Fabulous lecture! Goes into all important concepts and also highlights the interesting details that are commonly glossed over, thanks for recording!
fantastic, wonderful, significant, magnificent, outstanding, class of titans, world-class🎉
Thanks a lot for sharing this. I would like to point out a correction at 20:28: consider the case where prob(true_token) = 1/vocab_size for every token; the perplexity then equals the vocabulary size.
Yes that's correct, it's the baseline performance of a very bad language model.
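To make that baseline concrete, here is a minimal sketch of perplexity computed from per-token probabilities; the numbers are illustrative, not from the lecture:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned
    to the true tokens; lower is better."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

vocab_size = 50_257  # e.g. the GPT-2/GPT-3 BPE vocabulary size
uniform = [1 / vocab_size] * 100   # a model that is maximally unsure
print(perplexity(uniform))         # == vocab_size, the worst "sane" baseline
certain = [1.0] * 100              # always certain and always right
print(perplexity(certain))         # == 1.0, the best possible perplexity
```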
Amazing lecture. Great job
Thank you for the gem, Stanford Online. Great starter - time to read more papers on LLMs
One good point from the discussion of the difference between PPO and DPO: a reward model can reduce the dependency on labeled preference data.
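For reference, a minimal sketch of the DPO objective in PyTorch-style code; the argument names are placeholders, and the inputs are assumed to be the summed log-probabilities of each full response under the policy and a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO: increase the policy's relative preference for the chosen
    response over the rejected one, measured against a frozen
    reference model, with no explicit reward model or RL loop."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # beta scales the implicit KL penalty toward the reference model.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```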
Thanks for sharing this. It is a great introduction to the LLM system.
This was genuinely interesting and easy to follow, thanks!
Wow! Such a wonderful presentation! Thanks so much!
My sincere thanks for sharing it.
Great & Comprehensive Presentation 🎉
Scaling behavior of LLM fine-tuning, emphasizes the importance of model size, task-specific considerations, and the trade-offs between different fine-tuning approaches. It highlights the need for practitioners to make informed decisions based on their specific needs and resources. As the field of LLMs continues to evolve, further research is needed to fully understand the complex interplay between model architecture, data, and fine-tuning strategies, especially at even larger scales. My research significantly contributes to the ongoing effort to develop more efficient and effective methods for adapting powerful LLMs to a wide range of downstream tasks.
Great presentation and very helpful. Thanks for sharing this
Please give this dude 15 more minutes, for tiling, FlashAttention, and data and model parallelization!!
If you know all of that, you don't need 15 more minutes.
great lecture, wish the speaker had more time to go over the full presentation
What an amazing lecture, now want a part 2 about the topics that haven’t been touched upon 🤩
Dayum he’s fine
Most amazing video ever
The Chinchilla paper demonstrated that for a fixed FLOPs budget, smaller models trained on more data perform better than larger models trained on less data.
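A back-of-the-envelope sketch of that trade-off, using the common C ≈ 6·N·D approximation for training FLOPs and Chinchilla's roughly 20-tokens-per-parameter rule of thumb (both are approximations, not exact constants):

```python
def chinchilla_optimal(flops_budget, tokens_per_param=20):
    """Split a fixed FLOPs budget C between parameters N and training
    tokens D using C ~= 6*N*D and the D ~= 20*N rule of thumb."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = (flops_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(5.76e23)  # a budget in the ballpark of Chinchilla's
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
# Prints ~69B parameters, ~1.4T tokens: close to Chinchilla's actual 70B / 1.4T.
```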
@5:55 There is an approximation; it lies in the axioms. First, probabilities should sum to 1. Second, the distribution comes only from the given corpora, and the given corpora are an approximation of the total population, which we all know has its own biases.
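A minimal sketch of that autoregressive factorization, assuming a model that maps a sequence of token ids to next-token logits; the log_softmax is exactly what enforces the sum-to-one axiom:

```python
import torch
import torch.nn.functional as F

def sequence_logprob(model, token_ids):
    """Chain rule: log p(x_1..x_T) = sum_t log p(x_t | x_<t).
    log_softmax normalizes the logits so each conditional
    distribution sums to 1 over the vocabulary."""
    logits = model(token_ids[:-1])            # (T-1, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)
    targets = token_ids[1:]                   # next-token targets
    return log_probs[torch.arange(len(targets)), targets].sum()
```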
Great lecture
This is a gold mine
The best one, we want more
How do people know that "adding more data" is not just increasing likelihood of training on something from the benchmarks, while "adding more parameters" is not just increasing the recall abilities (parametric memory capacity) of the model to retrieve benchmark stuff during evaluation? Really curious about that point.
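Good question. One partial mitigation labs report is n-gram overlap decontamination between the training set and the benchmarks; a minimal sketch, with the 13-gram threshold borrowed from the GPT-3 paper's procedure:

```python
def ngrams(text, n=13):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(train_doc, benchmark_examples, n=13):
    """Flag a training document that shares any long n-gram with a
    benchmark example; overlapping documents get filtered or at least
    reported, so gains can't be attributed purely to memorization."""
    doc_grams = ngrams(train_doc, n)
    return any(doc_grams & ngrams(ex, n) for ex in benchmark_examples)
```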
Could you please share the link to the lecture on Transformers that you were referring to in the video?
Ignore this comment
Day 1 19:05
Day 2 28:38
Day 3 41:05
Day 4 1:00:00
man this is amazing!
It's never too late to get started with learning
Can we please have access to the previous lecture about Transformers?
You can build my ❤️
Just Amazing!
When will the other lectures be updated? This was so good!
This is amazing, can you guys make a playlist for beginners? Thank you!
thank you! great lecture.
Great content, thanks!
would love to see the other recordings of cs25!
I don’t know what the guy is talking about but imma watch HIM
So Amazing!
What is that paper from last year mentioned at 1:27:25, the one that is 50x cheaper and has better agreement with humans?
Thank you for this
The components of an LLM chatbot:
Architecture (neural networks)
Training algorithm
Data
Evaluation
Systems
Looking forward to doing a PostDoc at SU
I like his teaching style and that laughter in between 😂😁🤙. The last one, be careful, it's a heavy one.
Thank you! 🚀
Please share more machine learning lectures
The reason Stanford graduates rule the world
Suddenly I'm interested in LLMs 😗😗😗
I'm just trying to get started in ML. Good god, do a YouTube channel already. Really good. Or at least do some blog updates.
As a gay guy who studied EE and CS at Stanford, I can confirm I had a crush on him
Where can we find the rest of the videos for CS229 summer 2024?
From Brazil 🇧🇷
Anyone here take the class in which this lecture was held (CS229 Summer 2024)?
I have a doubt about scalable data for SFT: won't the model be biased, since it's using its own knowledge to generate the dataset and is then further trained on that same data?
is there a way to add sections so we can return to specific parts later?
Whoever records these videos needs to leave the slides up longer for the viewers to read as the speaker explains the concepts.
Which playlist does this belong to?
This interests me, but I have no coding experience. Any tips on where to start? Surely Stanford lectures? Coding 101, I guess. Anything helps :)
The training algorithm is actually the key... It is because of RLHF that we have GPT-4
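For context, that reward model behind RLHF is typically trained with a pairwise (Bradley-Terry) objective on human preference data, which is what the lecture covers around 1:16:06; a minimal sketch with placeholder tensor names:

```python
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    """Pairwise (Bradley-Terry) loss: maximize the probability that
    the human-preferred response scores higher than the rejected one.
    sigmoid(r_chosen - r_rejected) is the modeled preference probability."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```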
Yann, if you ever get to read this, you are a truly handsome man.
Cringe
Let's call him Captain LLM, he looks a bit like Chris Evans
agree
This guy is not just a pretty face; who knows what people desire to know behind the scenes.
Impressive
Yes!
Does anyone have the PDF or PPT for this lecture? If so, please reply to this comment. Thanks!
The biggest novelty of ChatGPT is the UI lol
Steve Rogers talking about AI ❤
What class is this part of?
Handsome Modeling 225
Is there any playlist for this video??
Just one
Anybody know of any resources for learning about LLMs?
The lecture is good, but the thing I dislike is the frequent switching between the slide screen and the lecturer camera. The video should keep the slide screen up the full time, with a mini-player of the lecturer camera in the bottom corner. The constant switching irritated me throughout the lecture and made my focus fluctuate.
Thank You