I'm a student for life....approaching 40.....never had the privilege of attending a university like Stanford. To get access to these quality lectures is amazing. Thank you
This is a quality lecture?
@@Fracasse-0x13 For people who don't have access to education, yes, it is a quality lecture.
i am living my dreams
@@Fracasse-0x13 Why is this not a quality lecture?
They're all the same
Everything is on the web
you don't need certification to tell the world you know it
Build the best
It is one thing to be a great research institution but to be a great research institution that is full of talented and kind lecturers is extremely impressive. I've been impressed by every single Stanford course and lecture I have participated in through SCPD and UA-cam and this lecturer is no exception.
Thank you for sharing your positive experiences with our courses and lectures!
If my teachers in school had looked this good, I wouldn't have missed a single class. He's handsome af.
🤣🤣🤣😂
No because fr
Came for the speaker; stayed for the knowledge.
… strange
I'm a straight dude and even I'm like "DAMN!"
Slides: drive.google.com/file/d/1B46VFrqFAPAEj3kaCrBAtQqeh2_Ztawl/view?usp=sharing
Thank you, sir... I heartily appreciate it 😊... the lecture was awesome 🤌
Thank you so much. I really appreciate it.
The lecture was perfect. Is there a playlist for the whole CS229 class from the same semester as this video? All I have found is from before 2022, which left me wondering.
@@helloadventureworld No, the rest of CS229 has not been released, and I don't know if it will be. This is only the guest lecture.
@@yanndubois3914 Thanks for the response and information you have shared :)
This is really a great lecture, super dense but still digestible. It's not even been 2 years since ChatGPT was released to the public, and seeing the rapid pace of research around LLMs and how quickly they keep improving is really interesting. Thank you so much; now I have some papers to read to further my understanding.
We live in a tremendous moment in time. Free access to the best lectures on the most relevant topic from the best university
Insights By "YouSum Live"
00:00:05 Building large language models (LLMs)
00:00:59 Overview of LLM components
00:01:21 Importance of data in LLM training
00:02:59 Pre-training models on internet data
00:04:48 Language models predict word sequences
00:06:02 Auto-regressive models generate text
00:10:48 Tokenization is crucial for LLMs
00:19:12 Evaluation using perplexity
00:22:07 Challenges in evaluating LLMs
00:29:00 Data collection is a significant challenge
00:41:08 Scaling laws improve model performance (see the compute sketch after this list)
01:00:01 Post-training aligns models with user intent
01:02:26 Supervised fine-tuning enhances model responses
01:10:00 Reinforcement learning from human feedback
01:19:01 DPO simplifies reinforcement learning process
01:28:01 Evaluation of post-training models
01:37:20 System optimization for LLM training
01:39:05 Low precision improves GPU efficiency
01:41:38 Operator fusion enhances computational speed
01:44:23 Future considerations for LLM development
Insights By "YouSum Live"
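On the 00:41:08 scaling-laws point in the outline above: one commonly cited rule of thumb (my own illustration, not a figure quoted from this lecture) is that pre-training a dense transformer costs roughly 6 FLOPs per parameter per training token, which is what ties data, parameters, and compute together. A minimal sketch:

```python
# Rule-of-thumb training cost for a dense transformer:
# total FLOPs ≈ 6 * parameters * training tokens
# (~2 FLOPs per parameter per token for the forward pass, ~4 for the backward pass).
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Hypothetical example: a 70B-parameter model trained on 1.4T tokens.
print(f"{approx_training_flops(70e9, 1.4e12):.2e} FLOPs")  # ≈ 5.9e+23
```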
Thank you for the video! I am glad that we live in this time and can witness the development of AI technologies.
Suddenly I am interested in LLMs
Finally, someone said Machine Learning instead of slapping AI on everything!
I feel that whenever someone talks about AI a lot it means that they know nothing about it
This is very well done. It's super easy to understand. I think your students should learn a lot. It's a great skill to be able to present complex material in a simple fashion. It means you really understand both the material and your audience.
00:10 Building Large Language Models overview
02:21 Focus on data evaluation and systems in industry over architecture
06:25 Auto regressive language models predict the next word in a sentence.
08:26 Tokenizing text is crucial for language models
12:38 Training a large language model involves using a large corpus of text.
14:49 Tokenization process considerations
18:40 Tokenization improvement in GPT 4 for code understanding
20:31 Perplexity measures model hesitation between tokens
24:18 Comparing outputs and model prompting
26:15 Evaluation of language models can yield different results
30:15 Challenges in training large language models
32:06 Challenges in building large language models
35:57 Collecting real-world data is crucial for large language models
37:53 Challenges in building large language models
41:38 Scaling laws predict performance improvement with more data and larger models
43:33 Relationship between data, parameters, and compute
47:21 Importance of scaling laws in model performance
49:12 Quality of data matters more than architecture and losses in scaling laws
52:54 Inference for large language models is very expensive
54:54 Training large language models is costly
59:12 Post training aligns language models for AI assistant use
1:01:05 Supervised fine-tuning for large language models
1:04:50 Leveraging large language models for data generation and synthesis
1:06:49 Balancing data generation and human input for effective learning
1:10:23 Limitations of human abilities in generating large language models
1:12:12 Training language models to maximize human preference instead of cloning human behaviors.
1:16:06 Training reward model using softmax logits for human preferences.
1:18:02 Modeling optimization and challenges in large language models (LLMs)
1:21:49 Reinforcement learning models and potential benefits
1:23:44 Challenges with using humans for data annotation
1:27:21 LLMs are cost-effective and have better agreement with humans than humans themselves
1:29:12 Perplexity is not calibrated for large language models
1:33:00 Variance in performance of GPT-4 based on prompt specificity
1:34:51 Pre-training data plays a vital role in model initialization
1:38:32 Utilize GPUs efficiently with matrix multiplication
1:40:21 Utilizing 16 bits for faster training in deep learning (see the mixed-precision sketch after this list)
1:44:08 Building Large Language Models from scratch
Crafted by Merlin AI.
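On the 1:40:21 item about 16-bit training: below is a minimal PyTorch automatic-mixed-precision sketch (standard torch.cuda.amp usage, assuming a CUDA GPU; not code shown in the lecture). Matrix multiplications run in 16-bit while the optimizer keeps full-precision state, which is where most of the speed-up comes from.

```python
import torch

model = torch.nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 gradients don't underflow

x = torch.randn(8, 4096, device="cuda")
target = torch.randn(8, 4096, device="cuda")

with torch.cuda.amp.autocast(dtype=torch.float16):  # matmuls execute in 16-bit
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()  # weights and optimizer state remain in fp32
scaler.step(optimizer)
scaler.update()
```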
Great! Thanks for sharing! One thing I would suggest is to transcribe or add subtitles for the questions asked by the students. That way we could better understand the answers given by the lecturer.
One good point in the discussion of the difference between PPO and DPO is that a reward model can reduce the dependency on labeled preference data.
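To make the PPO/DPO contrast concrete, here is a minimal sketch of the standard DPO objective (my own illustration, not code from the lecture). It assumes you already have the summed log-probabilities of a chosen and a rejected response under the trainable policy and a frozen reference model; DPO optimizes the preference pairs directly, whereas PPO-style RLHF first fits a reward model that can then be reused beyond the labeled pairs.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one batch of (chosen, rejected) response pairs.

    Each argument is a tensor of per-response log-probabilities summed over tokens.
    No separately trained reward model is involved: the implicit reward is
    beta * (log pi(y|x) - log pi_ref(y|x)).
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Maximize the log-odds that the chosen response beats the rejected one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```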
Really incredible delivery of complicated information. ❤
fantastic, wonderful, significant, magnificent, outstanding, class of titans, world-class🎉
Wow! Such a wonderful presentation! Thanks so much!
People should first learn about basic language models like bigrams and unigrams. These were the first language models, and Stanford really has good lectures on them.
What an awesome video. Data quality is a real issue, and even more interestingly, LLMs learn a lot like humans: introduce the simpler concepts first (training data prompts) and then introduce more complex subjects, and the LLMs learn more, just like humans.
Great talk. Loved the level of detail, the insights, the pacing.
This is an amazing high-level breakdown of LLMs. Every aspect of an LLM was mentioned. Thank you for this amazing video. I'll come back here often.
Thanks a lot for sharing this. I would like to point out a correction -
time 20:28 -
Consider the case prob(true_token)
Yes that's correct, it's the baseline performance of a very bad language model.
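For context on that baseline: a small sketch (my own, not from the lecture) showing that a model that spreads probability uniformly over its vocabulary has perplexity equal to the vocabulary size, which is the natural reference point for a "very bad language model".

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_probs: the probabilities the model assigned to the true tokens.
    """
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

vocab_size = 50_000
uniform_probs = [1.0 / vocab_size] * 10  # any sequence length gives the same value
print(perplexity(uniform_probs))  # ≈ 50000: the random-guessing baseline
```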
Very informative, up to date, and crisp ~ keep them coming... don't stop now!
00:10 Overview of building large language models
02:21 Focus on data, evaluation, and systems in practice
06:25 Autoregressive language models predict the next word
08:26 Text tokenization and vocabulary size are crucial for language models.
12:38 Tokenization and training tokenizers
14:49 Optimizing the tokenization process and decisions about merging tokens
18:40 GPT-4 improved tokenization for better code understanding
20:31 Perplexity measures the model's hesitation between words.
24:18 Evaluating open-ended questions is challenging.
26:15 Different ways to evaluate large language models
30:15 Steps for preprocessing web data for large language models
32:06 Challenges of handling duplicates and filtering low-quality documents at scale.
35:57 Collecting data about the world is crucial for practical large language models.
37:53 Challenges in pre-training large language models
41:38 Scaling laws predict performance improvements with more data and larger models.
43:33 Compute is determined by data and parameters.
47:21 Understanding the importance of scaling laws when building large language models
49:12 Good data is crucial for better scaling.
52:54 Inference for large language models is expensive.
54:54 Training large language models requires high compute costs.
59:12 Large language models (LLMs) require post-training alignment to become AI assistants.
1:01:05 Building large language models (LLMs) involves fine-tuning pre-trained models on the desired data.
1:04:50 Pre-trained language models are optimized for specific user types during fine-tuning.
1:06:49 Balancing synthetic data generation with human input is crucial for effective training.
1:10:23 Challenges in generating content that exceeds human abilities
1:12:12 Generating ideal answers using preference maximization
1:16:06 Training a reward model using logits for continuous preferences
1:18:02 Training large language models with PPO, and challenges in reinforcement learning
1:21:49 Discussion of reinforcement learning methods and their benefits when using reward models.
1:23:44 Challenges of using humans as data annotators
1:27:21 LLMs are more cost-effective and offer better agreement than humans.
1:29:12 Issues with perplexity and calibration in language models
1:33:00 Variability in GPT-4 performance depending on the prompts
1:34:51 The importance of pre-training in large language models
1:38:32 Using GPUs for matrix multiplication can be 10x faster, but communication and memory are key.
1:40:21 Reduced precision for faster matrix multiplication
1:44:08 Building large language models (LLMs)
Crafted by Merlin AI.
Fabulous lecture! Goes into all important concepts and also highlights the interesting details that are commonly glossed over, thanks for recording!
great lecture, wish the speaker had more time to go over the full presentation
Great presentation and very helpful. Thanks for sharing this
How do people know that "adding more data" is not just increasing likelihood of training on something from the benchmarks, while "adding more parameters" is not just increasing the recall abilities (parametric memory capacity) of the model to retrieve benchmark stuff during evaluation? Really curious about that point.
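One partial answer, as far as I understand common practice (this is not something the lecture goes into in detail): labs run decontamination checks on the training set, e.g. flagging documents that share long n-gram overlaps with benchmark items, and they also sanity-check on freshly collected or held-out evaluations. A toy sketch of the n-gram heuristic, with illustrative names:

```python
def word_ngrams(text: str, n: int = 13) -> set:
    """Set of word n-grams; ~13-gram overlap is a commonly used contamination signal."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(train_doc: str, benchmark_item: str, n: int = 13) -> bool:
    """Flag a training document that shares any long n-gram with a benchmark item."""
    return bool(word_ngrams(train_doc, n) & word_ngrams(benchmark_item, n))
```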
What an amazing lecture, now want a part 2 about the topics that haven’t been touched upon 🤩
My sincere thanks for sharing it.
Suddenly I'm interested in LLMs 😗😗😗
The training algorithm is actually the key... It is because of RLHF that we have GPT-4
This is a gold mine
You can build my ❤️
Please give this dude 15 more minutes, for tiling, Flash Attention, and data and model parallelization!!
If you know all of that, you don't need 15 more minutes.
Dayum he’s fine
The reason Stanford graduates rule the world
Great lecture
Looking forward to doing a postdoc at SU
@5:55 there is an approximation. It lies in the assumptions: first, the axiom that probabilities should sum to 1; second, the approximation that the distribution comes only from the given corpora. The given corpora are themselves an approximation of the total population, which we all know has its own biases.
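To illustrate that point with a toy sketch (my own example, not from the lecture): the chain-rule factorization p(x1, ..., xT) = Π p(x_t | x_<t) is exact; the approximation enters when those conditionals are estimated from a finite, biased corpus, as in this tiny bigram model.

```python
import math

# Toy corpus-based bigram estimate of p(next_word | previous_word).
corpus = "the cat sat on the mat the cat ran".split()

bigram_counts, prev_counts = {}, {}
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[(prev, nxt)] = bigram_counts.get((prev, nxt), 0) + 1
    prev_counts[prev] = prev_counts.get(prev, 0) + 1

def cond_prob(nxt, prev):
    return bigram_counts.get((prev, nxt), 0) / prev_counts.get(prev, 1)

# Chain rule (bigram approximation, ignoring p(w1) for brevity):
# log p(w1..wT) ≈ sum_t log p(w_t | w_{t-1}), estimated from the corpus,
# so the estimate inherits whatever biases the corpus has.
sentence = "the cat sat".split()
logp = sum(math.log(cond_prob(n, p)) for p, n in zip(sentence, sentence[1:]))
print(logp)  # ≈ -1.10
```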
would love to see the other recordings of cs25!
Please share more machine learning lectures.
When will the other lectures be updated? This was so good!
This is amazing. Can you guys make a playlist for beginners? Thank you!
thank you! great lecture.
The best one we want more
Thanks for this great lecture. Is the lecture on transformers also available somewhere?
You might be interested in the lectures in this playlist: ua-cam.com/play/PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM.html&si=KmCNuzfcc_E0cxDg
Most amazing video ever
man this is amazing!
Can we please have access to the previous lecture about Transformers?
This interests me, but I have no coding experience. Any tips on where to start, surely Stanford lectures? Coding 101, I guess. Anything helps :)
So Amazing!
Thank you for this
Thank you! 🚀
Great content, thanks!
Just Amazing!
LLM chatbots - key components:
Architecture (neural networks)
Training algorithm
Data
Evaluation
Systems
Where can we find the rest of the videos for CS229 summer 2024?
Interesting and a good chest btw. Clark Kent?
This genius saying "2K return tickets from JFK to LDN are not significant" (in terms of environmental impact) and that "next models will be +10X FLOPS" just makes me conclude that these guys are not only throwing money at the problem (i.e., gen AI) but also don't have a thoughtful solution for how to train AI with its environmental and economic aspects in mind.
The lecture is good, but the thing I dislike is the frequent switching between the slide screen and the lecturer camera. The video should keep the slides on screen the whole time, with a mini-player of the lecturer camera in the bottom corner. The constant switching irritates me throughout the lecture and makes my focus fluctuate.
Could you please share the link to the lecture on Transformers that you were referring to in the video?
Steve Rogers talking about AI ❤
A 2024 lecture
🇰🇪 well Represented.
Impressive
Yann, if you ever get to read this, you are a truly handsome man. I
Yes!
Ignore this comment
Day 1 19:05
Day 2 28:38
the biggest novelty of chatgpt is the UI lol
Thank u
Anyone here take the class this lecture was given in (CS229, summer 2024)?
suddenly, i'm a software engineer.
Not fair, was here to learn, got distracted by charm
thanks ❤️🤍
It feels like learning LLMs from Clark Kent (Superman) 😂😅
Does anyone have the pdf or ppt for this lecture, if so please reply to this comment. Thanks!
I have a doubt about scalable data for SFT: wouldn't the model be biased, since it is using its own knowledge to generate the dataset and is then further trained on the same data?
Anybody know of any resources for learning about LLMs?
"She likely prefers Stanford"
He's hot.
I'm majoring in Finance, but he is so hot, so I'm here.
I appreciate the sharing, but I find this view of LLMs too simplistic and approximate (when not outright missing some pieces) for those in the field, and probably too complicated/misleading for those who aren't. Also, I don't see due attention to mechanistic interpretability, emergent properties, and the debate on model reasoning, with appropriate citations of recent papers.
I can literally watch this whole lecture because he's so hot
We're all thinking it
Upload next video
are there slides?
aye where can I find cs 336???
How do you know which Stanford course to look for? Which website do you usually use?
❤
Knock Knock!
Banana
Who’s there!
@@JoseMonteverde Banana
We're no strangers to love, you know the rules and so do I
@@HB-kl5ik banana
sir I didn't do my homework... please punish me
🇧🇩❤
Good morning i guess?
Which window did God even close on him?
tldr?
he is cute
why he kinda...
he kinda......