Stanford CS229 | Machine Learning | Building Large Language Models (LLMs)

  • Published Jan 7, 2025

COMMENTS • 275

  • @CG-hj1cu
    @CG-hj1cu 3 months ago +674

    I'm a student for life... approaching 40... never had the privilege of attending a university like Stanford. To get access to these quality lectures is amazing. Thank you

    • @Fracasse-0x13
      @Fracasse-0x13 3 months ago +3

      This is a quality lecture?

    • @KevinLanahan0
      @KevinLanahan0 3 months ago +25

      @@Fracasse-0x13 for people who don't have access to education, yes, it is a quality lecture.

    • @darrondavis5848
      @darrondavis5848 3 months ago +4

      I am living my dreams

    • @shaohongchen1063
      @shaohongchen1063 3 months ago +13

      @@Fracasse-0x13 why is this not a quality lecture?

    • @MyLordaizen
      @MyLordaizen 3 months ago +3

      They're all the same.
      Everything is on the web;
      you don't need certification to tell the world you know it.
      Build the best.

  • @nothing12392
    @nothing12392 4 months ago +442

    It is one thing to be a great research institution, but to be a great research institution that is full of talented and kind lecturers is extremely impressive. I've been impressed by every single Stanford course and lecture I have participated in through SCPD and YouTube, and this lecturer is no exception.

    • @stanfordonline
      @stanfordonline  4 months ago +28

      Thank you for sharing your positive experiences with our courses and lectures!

    • @a2ashraf
      @a2ashraf 19 days ago

      Wow, big words. Thank you for the comment, your words encouraged me to watch the whole thing and I don't regret it at all. Best decision!

  • @devanshmishra-ez1tn
    @devanshmishra-ez1tn 2 months ago +64

    00:10 Building Large Language Models overview
    02:21 Focus on data evaluation and systems in industry over architecture
    06:25 Auto regressive language models predict the next word in a sentence.
    08:26 Tokenizing text is crucial for language models
    12:38 Training a large language model involves using a large corpus of text.
    14:49 Tokenization process considerations
    18:40 Tokenization improvement in GPT-4 for code understanding
    20:31 Perplexity measures model hesitation between tokens
    24:18 Comparing outputs and model prompting
    26:15 Evaluation of language models can yield different results
    30:15 Challenges in training large language models
    32:06 Challenges in building large language models
    35:57 Collecting real-world data is crucial for large language models
    37:53 Challenges in building large language models
    41:38 Scaling laws predict performance improvement with more data and larger models
    43:33 Relationship between data, parameters, and compute
    47:21 Importance of scaling laws in model performance
    49:12 Quality of data matters more than architecture and losses in scaling laws
    52:54 Inference for large language models is very expensive
    54:54 Training large language models is costly
    59:12 Post training aligns language models for AI assistant use
    1:01:05 Supervised fine-tuning for large language models
    1:04:50 Leveraging large language models for data generation and synthesis
    1:06:49 Balancing data generation and human input for effective learning
    1:10:23 Limitations of human abilities in generating large language models
    1:12:12 Training language models to maximize human preference instead of cloning human behaviors.
    1:16:06 Training reward model using softmax logits for human preferences.
    1:18:02 Modeling optimization and challenges in large language models (LLMs)
    1:21:49 Reinforcement learning models and potential benefits
    1:23:44 Challenges with using humans for data annotation
    1:27:21 LLMs are cost-effective and have better agreement with humans than humans themselves
    1:29:12 Perplexity is not calibrated for large language models
    1:33:00 Variance in performance of GPT-4 based on prompt specificity
    1:34:51 Pre-training data plays a vital role in model initialization
    1:38:32 Utilize GPUs efficiently with matrix multiplication
    1:40:21 Utilizing 16 bits for faster training in deep learning
    1:44:08 Building Large Language Models from scratch
    Crafted by Merlin AI.
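
The tokenization segment in the timestamps above (08:26–18:40) can be made concrete with a toy byte-pair-encoding merge loop. This is a hedged illustration only — the corpus, merge count, and function name are invented for this sketch, not taken from the lecture:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy byte-pair encoding: repeatedly merge the most frequent adjacent pair."""
    # Each word starts as a tuple of single-character symbols.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused into one symbol.
        merged_corpus = Counter()
        for word, freq in corpus.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_corpus[tuple(out)] += freq
        corpus = merged_corpus
    return merges

print(bpe_merges(["low", "low", "lower", "newest", "newest"], 3))
```

On this tiny corpus the first two merges fuse `l`+`o` and then `lo`+`w`, so frequent words collapse into single tokens — the property the lecture credits for GPT-4's better handling of code (e.g. runs of whitespace).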

  • @bp3016
    @bp3016 3 months ago +502

    If my teachers in school looked this good, I wouldn't miss a single class. He's handsome af.

  • @EduardoLima
    @EduardoLima 2 months ago +42

    We live in a tremendous moment in time. Free access to the best lectures on the most relevant topic from the best university

    • @stanfordonline
      @stanfordonline  2 months ago +3

      Thanks for your comment, we love to hear this feedback!

  • @ReflectionOcean
    @ReflectionOcean 4 months ago +60

    Insights By "YouSum Live"
    00:00:05 Building large language models (LLMs)
    00:00:59 Overview of LLM components
    00:01:21 Importance of data in LLM training
    00:02:59 Pre-training models on internet data
    00:04:48 Language models predict word sequences
    00:06:02 Auto-regressive models generate text
    00:10:48 Tokenization is crucial for LLMs
    00:19:12 Evaluation using perplexity
    00:22:07 Challenges in evaluating LLMs
    00:29:00 Data collection is a significant challenge
    00:41:08 Scaling laws improve model performance
    01:00:01 Post-training aligns models with user intent
    01:02:26 Supervised fine-tuning enhances model responses
    01:10:00 Reinforcement learning from human feedback
    01:19:01 DPO simplifies reinforcement learning process
    01:28:01 Evaluation of post-training models
    01:37:20 System optimization for LLM training
    01:39:05 Low precision improves GPU efficiency
    01:41:38 Operator fusion enhances computational speed
    01:44:23 Future considerations for LLM development

  • @SudipBishwakarma
    @SudipBishwakarma 4 months ago +42

    This is really a great lecture, super dense but still digestible. It's not even been two years since ChatGPT was released to the public, and seeing the rapid pace of research around LLMs, and how quickly they keep getting better, is really interesting. Thank you so much; now I have some papers to read to further my understanding.

  • @thedelicatehand
    @thedelicatehand 3 months ago +277

    Suddenly I am interested in LLMs

  • @yanndubois3914
    @yanndubois3914 4 months ago +476

    Slides: drive.google.com/file/d/1B46VFrqFAPAEj3kaCrBAtQqeh2_Ztawl/view?usp=sharing

    • @Imperfectly_perfect_007
      @Imperfectly_perfect_007 3 months ago +9

      Thank you sir... I heartily appreciate it 😊... the lecture was awesome 🤌

    • @junnishere00
      @junnishere00 3 months ago +6

      Thank you so much. I really appreciate it.

    • @helloadventureworld
      @helloadventureworld 3 months ago +7

      The lecture was perfect. Is there a playlist for the whole CS229 class from the same semester as this video? All I have found is from before 2022, which left me wondering.

    • @yanndubois3914
      @yanndubois3914 3 months ago +7

      @@helloadventureworld no, the rest of CS229 has not been released and I don't know if it will. This is only the guest lecture.

    • @helloadventureworld
      @helloadventureworld 3 months ago +3

      @@yanndubois3914 Thanks for the response and information you have shared :)

  • @wop130
    @wop130 1 month ago +19

    Damn. That lecturer is fineeee. 😍

  • @anshdeshraj
    @anshdeshraj 3 months ago +51

    Finally, someone said Machine Learning instead of slapping AI on everything!

    • @duartesilva7907
      @duartesilva7907 2 months ago +3

      I feel that whenever someone talks about AI a lot it means that they know nothing about it

    • @paolacastillootoya8904
      @paolacastillootoya8904 1 month ago

      Right? And a lot of people believe in Yuval Harari because of it

  • @megharajpoot9930
    @megharajpoot9930 2 months ago +5

    This course has so many insights and gives a quick summary view of LLMs. I have also gone through a paid Coursera course; this one is equally good and free. Thanks for the video.

  • @dr.mikeybee
    @dr.mikeybee 4 months ago +27

    This is very well done. It's super easy to understand. I think your students should learn a lot. It's a great skill to be able to present complex material in a simple fashion. It means you really understand both the material and your audience.

  • @majidmehmood3780
    @majidmehmood3780 2 months ago +14

    People should first learn about basic language models like unigrams and bigrams. These were the first language models, and Stanford really has good lectures on them.

  • @namazbekbekzhan
    @namazbekbekzhan 2 months ago +6

    00:10 Overview of building large language models
    02:21 Focus on data evaluation and systems in practice
    06:25 Autoregressive language models predict the next word
    08:26 Text tokenization and vocabulary size are crucial for language models
    12:38 Tokenization and training tokenizers
    14:49 Optimizing the tokenization process and token-merge decisions
    18:40 GPT-4 improved tokenization for better code understanding
    20:31 Perplexity measures the model's hesitation between words
    24:18 Evaluating open-ended questions is challenging
    26:15 Different ways of evaluating large language models
    30:15 Steps for preprocessing web data for large language models
    32:06 Challenges of handling duplicates and filtering low-quality documents at scale
    35:57 Collecting real-world data is crucial for practical large language models
    37:53 Challenges in pre-training large language models
    41:38 Scaling laws predict performance improvements with more data and larger models
    43:33 Compute is determined by data and parameters
    47:21 Understanding the importance of scaling laws in building large language models
    49:12 Good data is crucial for better scaling
    52:54 Inference for large language models is expensive
    54:54 Training large language models requires high computational cost
    59:12 Large language models (LLMs) require alignment fine-tuning to become AI assistants
    1:01:05 Building large language models (LLMs) involves fine-tuning pre-trained models on the desired data
    1:04:50 Pre-trained language models are optimized for specific user types during fine-tuning
    1:06:49 Balancing synthetic data generation with human input is crucial for effective training
    1:10:23 Challenges in generating content that exceeds human abilities
    1:12:12 Generating ideal answers using preference maximization
    1:16:06 Training a reward model using logits for continuous preferences
    1:18:02 Training large language models with PPO and challenges in reinforcement learning
    1:21:49 Discussion of reinforcement learning methods and the benefits of using reward models
    1:23:44 Challenges of using humans as data annotators
    1:27:21 LLMs are more cost-effective and offer better agreement than humans
    1:29:12 Problems with perplexity and calibration in language models
    1:33:00 Variability in GPT-4 performance depending on prompts
    1:34:51 The importance of pre-training in large language models
    1:38:32 Using GPUs for matrix multiplication can be 10x faster, but communication and memory are key
    1:40:21 Reduced precision for faster matrix multiplication
    1:44:08 Building large language models (LLMs)
    Crafted by Merlin AI.

  • @SerhiiFedorov-v1l
    @SerhiiFedorov-v1l 2 months ago +4

    Thank you for the video! I am glad that we live in this time and can witness the development of AI technologies.

  • @paolacastillootoya8904
    @paolacastillootoya8904 1 month ago +144

    He is doing his part to encourage women in STEM.

    • @ProgrammingWIthRiley
      @ProgrammingWIthRiley 20 days ago +5

      Women have always been in STEM. We all know about Grace Hopper. Please let this go.

    • @ProgrammingWIthRiley
      @ProgrammingWIthRiley 20 days ago

      Look up Ruth David. She worked at the CIA, redid all of their tech infrastructure, and she's still alive!

    • @fan82209
      @fan82209 17 days ago +2

      haha absolutely

    • @astrolillo
      @astrolillo 17 days ago

      You just want a STEM husband

    • @Originalimoc
      @Originalimoc 9 days ago

      😮​@@astrolillo

  • @김진혁-l4l
    @김진혁-l4l 1 month ago +1

    What a wonderful lecture... these 1.75 hours were some of the most valuable of my life

  • @RaushanKumar-qb3de
    @RaushanKumar-qb3de 2 months ago +4

    Best explanation... I'm watching at 3 AM. Thanks

  • @mukammedalimbet2351
    @mukammedalimbet2351 2 months ago +5

    Great! Thanks for sharing! One thing I would suggest is to transcribe or add subtitles for the questions being asked by the students. That way we could better understand the answers given by the lecturer.

  • @BMoRideNGrind
    @BMoRideNGrind 3 months ago +9

    Really incredible delivery of complicated information. ❤

  • @NeerajSharma-yf4ih
    @NeerajSharma-yf4ih 2 months ago +1

    I had the privilege of attending an insightful 90-minute lecture by Stanford faculty, which greatly boosted my confidence in completing my thesis. The approach they shared aligns closely with my own research methodology, reinforcing the direction of my work. Grateful for this inspiring experience!

  • @user-rw6iw8jg2t
    @user-rw6iw8jg2t 1 month ago

    Phenomenal explanation. Love for Stanford, its professors, and their methodologies is a never-ending tale!!

  • @Joeystumbo
    @Joeystumbo 1 day ago

    He is an alien, such a brilliant and young human being. Impressed.

  • @for-ever-22
    @for-ever-22 4 months ago +8

    This is an amazing high-level breakdown of LLMs. Every aspect of an LLM was mentioned. Thank you for this amazing video. I'll come back here often

  • @PratikBhavsar1
    @PratikBhavsar1 4 months ago +11

    Very informative, up to date, and crisp. Keep them coming... don't stop now!

  • @KelvinMeeks
    @KelvinMeeks 2 months ago +2

    Great talk. Loved the level of detail, the insights, the pacing.

  • @pkprasadtube
    @pkprasadtube 1 month ago

    I love the way you answered the questions, very clear and precise.

  • @Nightsd01
    @Nightsd01 2 months ago +1

    What an awesome video. Data quality is a real issue, and even more interestingly, LLMs learn a lot like humans. Introduce the simpler concepts first (training data prompts) and then introduce more complex subjects, and the LLMs learn more, just like humans

  • @sucim
    @sucim 4 months ago +13

    Fabulous lecture! Goes into all important concepts and also highlights the interesting details that are commonly glossed over, thanks for recording!

  • @minhatvo82
    @minhatvo82 3 months ago +10

    fantastic, wonderful, significant, magnificent, outstanding, class of titans, world-class🎉

  • @sonudixit-h3w
    @sonudixit-h3w 4 months ago +4

    Thanks a lot for sharing this. I would like to point out a correction at
    time 20:28 -
    Consider the case prob(true_token)

    • @yanndubois3914
      @yanndubois3914 4 months ago

      Yes that's correct, it's the baseline performance of a very bad language model.
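
The "very bad language model" baseline discussed in this thread can be checked numerically: perplexity is the exponential of the average negative log-likelihood of the true tokens, so a model that spreads probability uniformly over the vocabulary has perplexity equal to the vocabulary size. A minimal sketch (the vocabulary size and probabilities are made-up numbers for illustration):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood of the true tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Uniform baseline: every true token gets p = 1/|vocab|,
# so perplexity collapses to the vocabulary size itself.
vocab_size = 50_000
uniform = [1.0 / vocab_size] * 10
print(round(perplexity(uniform)))  # equals vocab_size

# A model that is "hesitating between 2 tokens" (p = 0.5 each step)
# has perplexity 2, matching the intuition at 20:31.
print(perplexity([0.5] * 4))
```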

  • @ProgrammingWIthRiley
    @ProgrammingWIthRiley 20 days ago +1

    Amazing lecture. Great job

  • @samratsakya
    @samratsakya 2 months ago

    Thank you for the gem, Stanford Online. A great starter; time to read more papers on LLMs

  • @Qxxliu
    @Qxxliu 3 months ago +3

    One good point from their discussion of the difference between PPO and DPO: a reward model can reduce the dependency on labeled preference data
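
For readers comparing the two methods mentioned in the comment above: DPO skips the explicit reward model and trains directly on preference pairs with a simple loss. A hedged sketch of that loss for a single pair — the log-probabilities below are hypothetical numbers, not values from the lecture:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Pushes the policy to widen the log-prob margin of the chosen response
    over the rejected one, measured relative to a frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical summed log-probs of two responses under policy and reference:
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)
print(round(loss, 4))
```

When the margin is zero the loss is -log(0.5) ≈ 0.693; it shrinks as the policy prefers the chosen response more strongly than the reference does.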

  • @goldentime11
    @goldentime11 1 month ago

    Thanks for sharing this. It is a great introduction to LLM systems.

  • @brindaswayamprakasham2102
    @brindaswayamprakasham2102 1 month ago +1

    This was genuinely interesting and easy to follow, thanks!

  • @thunderbirdk
    @thunderbirdk 2 months ago +1

    Wow! Such a wonderful presentation! Thanks so much!

  • @boeingpameesha9550
    @boeingpameesha9550 3 months ago +6

    My sincere thanks for sharing it.

  • @AnupSingh-kt5yn
    @AnupSingh-kt5yn 2 months ago +1

    Great & Comprehensive Presentation 🎉

  • @bhoicebychoice5435
    @bhoicebychoice5435 12 days ago

    The scaling behavior of LLM fine-tuning emphasizes the importance of model size, task-specific considerations, and the trade-offs between different fine-tuning approaches. It highlights the need for practitioners to make informed decisions based on their specific needs and resources. As the field of LLMs continues to evolve, further research is needed to fully understand the complex interplay between model architecture, data, and fine-tuning strategies, especially at even larger scales. My research significantly contributes to the ongoing effort to develop more efficient and effective methods for adapting powerful LLMs to a wide range of downstream tasks.

  • @carvalhoribeiro
    @carvalhoribeiro 3 months ago +4

    Great presentation and very helpful. Thanks for sharing this

  • @cui_1152
    @cui_1152 2 months ago +1

    Please give this dude 15 more minutes for tiling, Flash Attention, and data and model parallelism!!

  • @mohammedosman4902
    @mohammedosman4902 4 months ago +17

    Great lecture; wish the speaker had more time to go over the full presentation

  • @maximshaposhnikov7970
    @maximshaposhnikov7970 3 months ago +2

    What an amazing lecture, now want a part 2 about the topics that haven’t been touched upon 🤩

  • @squidwardswift
    @squidwardswift 2 months ago +49

    Dayum he’s fine

  • @sahejagarwal801
    @sahejagarwal801 4 months ago +4

    Most amazing video ever

  • @cristovaoiglesias523
    @cristovaoiglesias523 6 days ago +1

    The Chinchilla paper demonstrated that for a fixed FLOPs budget, smaller models trained on more data perform better than larger models trained on less data.
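
The trade-off this comment describes is often summarized with the approximation C ≈ 6·N·D (compute ≈ 6 × parameters × training tokens) and Chinchilla's rule of thumb of roughly 20 tokens per parameter. A hedged back-of-the-envelope sketch — the function name and the 20:1 ratio as a fixed constant are simplifying assumptions, not the paper's full fitted scaling law:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Split a fixed FLOPs budget C ~ 6*N*D between parameters N and tokens D,
    using the ~20-tokens-per-parameter rule of thumb from Chinchilla."""
    # With D = r*N and C = 6*N*D = 6*r*N^2  =>  N = sqrt(C / (6*r))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Chinchilla's own budget (~5.9e23 FLOPs) recovers
# its headline numbers: ~70B parameters, ~1.4T tokens.
n, d = chinchilla_optimal(5.9e23)
print(f"params ~{n:.2e}, tokens ~{d:.2e}")
```

Holding C fixed, raising `tokens_per_param` shrinks N and grows D — exactly the "smaller model, more data" direction the comment describes.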

  • @imalive404
    @imalive404 3 months ago +3

    @5:55 there is an approximation. It lies in the axioms, the axiom being that probabilities should sum to 1. Second, the approximation is that the distribution only comes out of the given corpora. The given corpora are an approximation of the total population, which we all know has its own biases.
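
The autoregressive factorization behind the point above — p(x₁…x_T) = ∏ p(x_t | x_<t), estimated only from the given corpus — can be shown with a toy bigram model. The tiny "corpus" below is an invented example, and the bigram context (one previous token) is a deliberate simplification of the full left-context the lecture's models use:

```python
from collections import Counter, defaultdict

# Estimate bigram counts from a tiny corpus (an assumption for illustration).
corpus = "the cat sat on the mat the cat ran".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def prob_next(prev, nxt):
    """Maximum-likelihood estimate p(nxt | prev) from the corpus counts."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def sequence_prob(tokens):
    """Chain-rule factorization: product of p(x_t | x_{t-1}) over the sequence."""
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= prob_next(prev, nxt)
    return p

print(sequence_prob(["the", "cat", "sat"]))  # p(cat|the) * p(sat|cat) = 2/3 * 1/2
```

The biases the comment mentions are visible directly: any continuation never seen in the corpus gets probability exactly zero, no matter how plausible it is in the wider population.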

  • @luxbran532
    @luxbran532 3 months ago +4

    Great lecture

  • @danieleneh3193
    @danieleneh3193 2 months ago +1

    This is a gold mine

  • @meer.sohrab
    @meer.sohrab 4 months ago +4

    The best one, we want more

  • @nomi6761
    @nomi6761 4 months ago +8

    How do people know that "adding more data" is not just increasing likelihood of training on something from the benchmarks, while "adding more parameters" is not just increasing the recall abilities (parametric memory capacity) of the model to retrieve benchmark stuff during evaluation? Really curious about that point.

  • @sanjayg1728
    @sanjayg1728 2 months ago +4

    Could you please share the link to the lecture on Transformers that you were referring to in the video?

  • @balajinadar1503
    @balajinadar1503 3 months ago +4

    Ignore this comment
    Day 1 19:05
    Day 2 28:38
    Day 3 41:05
    Day 4 1:00:00

  • @hamzadata
    @hamzadata 4 months ago +4

    man this is amazing!

  • @futurecharacteristics
    @futurecharacteristics 1 month ago

    It's never too late to start learning

  • @nataliatenoriomaia1635
    @nataliatenoriomaia1635 2 months ago +5

    Can we please have access to the previous lecture about Transformers?

  • @AlphaVisionPro
    @AlphaVisionPro 3 months ago +10

    You can build my ❤️

  • @kartikeychhipa3813
    @kartikeychhipa3813 4 months ago +4

    Just Amazing!

  • @SuperLano98
    @SuperLano98 3 months ago +3

    When will the other lectures be uploaded? This was so good!

  • @zeep14dabs
    @zeep14dabs 3 months ago +2

    This is amazing. Can you guys make a playlist for beginners? Thank you!

  • @keshmesh123
    @keshmesh123 3 months ago +3

    thank you! great lecture.

  • @Neilblaze
    @Neilblaze 4 months ago +3

    Great content, thanks!

  • @xiaoxiandong7382
    @xiaoxiandong7382 3 months ago +2

    Would love to see the other recordings of CS25!

  • @MitatEfeÜnal-e3b
    @MitatEfeÜnal-e3b 1 month ago +14

    I don’t know what the guy is talking about but imma watch HIM

  • @F3lp1s
    @F3lp1s 4 months ago +3

    So Amazing!

  • @sokhibtukhtaev9693
    @sokhibtukhtaev9693 1 month ago +1

    What is that paper from last year that he mentions at 1:27:25, which is 50x cheaper and has better agreement with humans?

  • @enzoluispenagallegos5440
    @enzoluispenagallegos5440 4 months ago +4

    Thank you for this

  • @njabulonzimande2893
    @njabulonzimande2893 3 months ago +2

    LLM - chatbots
    Architecture (Neural networks)
    Training algorithm
    Data
    Evaluation
    System

  • @beansforbrain
    @beansforbrain 2 months ago +2

    Looking forward to doing a postdoc at SU

  • @RaushanKumar-qb3de
    @RaushanKumar-qb3de 2 months ago +1

    I like his teaching style and that laughter in between 😂😁🤙. The last one: be careful, heavy one

  • @web3global
    @web3global 3 months ago +1

    Thank you! 🚀

  • @esamyakIndore
    @esamyakIndore 2 months ago +2

    Please share more machine learning lectures

  • @SyedShayanAliShah
    @SyedShayanAliShah 2 months ago +6

    The reason Stanford graduates rule the world

  • @doomed5206
    @doomed5206 2 months ago +14

    Suddenly I'm interested in LLMs 😗😗😗

  • @perrystalsis1818
    @perrystalsis1818 23 days ago

    I'm just trying to get started in ML. Good god. Do a YouTube channel already. Really good. Or at least do some blog updates.

  • @martinaltenburg1247
    @martinaltenburg1247 1 month ago +8

    As a gay guy who studied EE and CS at Stanford, I can confirm I had a crush on him

  • @Zoronoa01
    @Zoronoa01 2 months ago +2

    Where can we find the rest of the videos for CS229 summer 2024?

  • @DonTiagoDonato
    @DonTiagoDonato 13 days ago

    From Brazil 🇧🇷

  • @chrisj2841
    @chrisj2841 4 months ago +3

    Did anyone here take the class in which this lecture was held (CS229, summer 2024)?

  • @T3NS0R
    @T3NS0R 4 months ago +5

    I have a doubt about scalable data for SFT: wouldn't the model be biased, as it's using its own knowledge to generate the dataset and is then further trained on the same?

  • @aminekhelifkhelif7306
    @aminekhelifkhelif7306 2 months ago

    Is there a way to add sections so we can return to specific parts later?

  • @jdk997
    @jdk997 2 months ago +1

    Whoever records these videos needs to leave the slides up longer, so viewers can read them as the speaker explains the concepts.

  • @swarajgupta2795
    @swarajgupta2795 2 months ago +1

    Which playlist does this belong to?

  • @alexmoonrock
    @alexmoonrock 2 months ago

    This interests me, but I have no coding experience. Any tips on where to start? Surely Stanford lectures? Coding 101, I guess. Anything helps :)

  • @Pl15604
    @Pl15604 4 months ago +4

    The training algorithm is actually the key... It is because of RLHF that we have GPT-4

  • @losdewill
    @losdewill 2 months ago +8

    Yann, if you ever get to read this, you are a truly handsome man.

  • @rojaishere
    @rojaishere 2 months ago +8

    Let's call him Captain LLM; he looks a bit like Chris Evans

  • @joysclass652
    @joysclass652 1 month ago

    This guy is not just a pretty face; who knows what people desire to know behind the scenes.

  • @shoaibyehya3600
    @shoaibyehya3600 4 months ago +6

    Impressive

  • @SettimiTommaso
    @SettimiTommaso 4 months ago +5

    Yes!

  • @Sohammhatre10
    @Sohammhatre10 4 months ago +3

    Does anyone have the PDF or PPT for this lecture? If so, please reply to this comment. Thanks!

  • @weskerrongkaima1173
    @weskerrongkaima1173 2 months ago +1

    the biggest novelty of chatgpt is the UI lol

  • @watchitpunk5616
    @watchitpunk5616 3 months ago +2

    Steve Rogers talking about AI ❤

  • @FrantisekNovak55
    @FrantisekNovak55 2 months ago +2

    What class is this part of?

  • @dvoir-f1k
    @dvoir-f1k 3 days ago

    Is there any playlist for this video??

  • @ganodiya001
    @ganodiya001 2 months ago +1

    Anybody know of any resources for learning LLMs?

  • @mudassiria
    @mudassiria 2 months ago +1

    The lecture is good, but the thing I dislike is the frequent switching between the slide screen and the tutor camera. The video should keep the slides on screen the whole time, with a mini-player of the tutor camera in the bottom corner. The constant switching irritated me throughout the lecture, making my focus fluctuate.

  • @jsherdiana
    @jsherdiana 17 hours ago

    Thank You