Encoder-Only Transformers (like BERT) for RAG, Clearly Explained!!!

  • Published 9 Feb 2025

COMMENTS • 96

  • @statquest
    @statquest  2 months ago +3

    Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @Sangeekarikalan
      @Sangeekarikalan 20 days ago

      Is there a video lecture to help understand the book better?

    • @statquest
      @statquest  19 days ago

      @@Sangeekarikalan Which book are you asking about and what do you mean by "understand better"? Are you asking to learn more about the book? Or is there something inside the book that you have a question about?

  • @RawConceptAI
    @RawConceptAI 2 months ago +6

    Counting this new one, I may have roughly watched all of StatQuest's videos already. Having been deeply invested in the channel for the last few months, I feel much more confident in my quest to get my first AI-related job. Massive thanks, Josh, for relentlessly bringing the right intuition to the rest of us!!

    • @statquest
      @statquest  2 months ago +1

      Good luck with that first job!

  • @wege8409
    @wege8409 9 days ago +1

    Hi Josh, I just bought a hardcover copy of your book, "The StatQuest Illustrated Guide to Neural Networks and AI", and I can't wait to look it over in a few days. I've learned a lot from your channel, and I appreciate your bottom-up "bam" approach. Sometimes, when you're deep in the weeds of terminology and want to explain a mathematical concept to someone who doesn't know that terminology, you forget to re-simplify the information. It's important to take a step back every once in a while, so thanks for the perspective.

    • @statquest
      @statquest  8 days ago

      TRIPLE BAM!!! Thank you very much!

  • @PradeepKumar-hi8mr
    @PradeepKumar-hi8mr 2 months ago +7

    Wowww!
    Glad to have you back, Sir.
    Awesome videos 🎉

  • @boredgamesph4872
    @boredgamesph4872 11 days ago +1

    Hi Josh! I hope you can discuss reliability vs. validity. You are the best stats mentor on YouTube so far.

    • @statquest
      @statquest  10 days ago

      I'll keep that in mind.

  • @NottoriousGG
    @NottoriousGG 2 months ago +2

    Such a cleverly disguised master of the craft. 🙇

  • @free_thinker4958
    @free_thinker4958 2 months ago +3

    You're the man ❤️💯👏 Thanks for everything you do here to spread that precious knowledge 🌹 We hope you could possibly dedicate a future video to multimodal models (text to speech, speech to speech, etc.) ✨

    • @statquest
      @statquest  2 months ago +2

      I'll keep that in mind!

  • @Kimgeem
    @Kimgeem 2 months ago +1

    So excited to watch this later 🤩✨

  • @nossonweissman
    @nossonweissman 2 months ago +1

    Yay!!! ❤❤
    I'm starting it now and saving it so I remember to finish it later.
    Also, I'm requesting a video on Sparse AutoEncoders (used in Anthropic's recent research). They seem super cool, and I have a basic idea of how they work, but I'd love to see a "simply explained" version of them.

    • @statquest
      @statquest  2 months ago +1

      Thanks Nosson! I'll keep that topic in mind.

  • @mbeugelisall
    @mbeugelisall 2 months ago +2

    Just the thing I’m learning about right now!

  • @PunmasterSTP
    @PunmasterSTP 10 days ago +1

    These videos are amazing. Word!

  • @davidlu1003
    @davidlu1003 2 months ago +1

    I love you, and I will keep going and learn your other courses if they stay free. Please keep them free; I will always be your fan.😁😁😁

    • @statquest
      @statquest  2 months ago

      Thank you, I will!

  • @davidlu1003
    @davidlu1003 2 months ago +1

    And thx for the courses. They are great!!!!😁😁😁

    • @statquest
      @statquest  2 months ago

      Glad you like them!

  • @kamal9294
    @kamal9294 2 months ago +3

    Nice explanation. If the next topic is about RAG or reinforcement learning, I will be happier (or even object detection or object tracking).

    • @statquest
      @statquest  2 months ago +3

      I guess you didn't get to 16:19 where I explain how RAG works...

    • @kamal9294
      @kamal9294 2 months ago

      @statquest But on LinkedIn I saw many RAG types and some retrieval techniques using advanced DSA (like HNSW). That's why I asked.

    • @statquest
      @statquest  2 months ago +4

      @@kamal9294 Those are just optimizations, which will change every month. However, the fundamental concepts will stay the same and are described in this video (see the retrieval sketch at the end of this thread).

    • @kamal9294
      @kamal9294 2 months ago +3

      @@statquest Now I am clear, thank you!

    • @zxcvbnm-pb1we
      @zxcvbnm-pb1we 21 days ago

      @@kamal9294 He is a professor; give him some respect. Just my opinion, but I see you referring to him as bro, and I don't feel good about that.
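
      A minimal sketch in Python of the retrieval idea from this thread: embed
      the documents, embed the query, and return the nearest neighbor. The
      embed() function here is a hypothetical stand-in for an encoder-only
      transformer like BERT, and structures like HNSW only speed up the lookup:

          import numpy as np

          # Toy document chunks; a real RAG system would use your own data.
          documents = [
              "Pizza is great.",
              "Encoder-only transformers turn text into context-aware embeddings.",
              "StatQuest explains statistics and machine learning.",
          ]

          def embed(text, dim=64):
              # Hypothetical stand-in for an encoder-only transformer:
              # a deterministic bag-of-words embedding, only to show retrieval.
              vec = np.zeros(dim)
              for word in text.lower().split():
                  rng = np.random.default_rng(abs(hash(word)) % (2**32))
                  vec += rng.standard_normal(dim)
              return vec / (np.linalg.norm(vec) + 1e-9)

          # Index every document up front; HNSW etc. just make this lookup faster.
          index = np.stack([embed(d) for d in documents])

          query = "How do transformers create embeddings?"
          scores = index @ embed(query)  # cosine similarity (unit-length vectors)
          best = int(np.argmax(scores))  # brute-force nearest neighbor
          print("Retrieved:", documents[best])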

  • @swarupdas8043
    @swarupdas8043 1 month ago +1

    What could be better for learning ML than having a teacher like you? Thanks for all the effort you have put in. I would buy any Udemy courses you have covering ML. Please let me know.

    • @statquest
      @statquest  1 month ago

      I have a book coming out in the next few weeks covering all these neural network videos, with PyTorch tutorials.

  • @tcsi_
    @tcsi_ 2 months ago +11

    100th Machine Learning Video 🎉🎉🎉

    • @statquest
      @statquest  2 months ago +1

      Yes! :)

    • @THEMATT222
      @THEMATT222 2 months ago +1

      Noice 👍 Doice 👍Ice 👍

  • @patzci
    @patzci 13 days ago +1

    Do you plan a video on HDBSCAN? Your vids are really great!! :)

    • @statquest
      @statquest  12 days ago +1

      I'll keep that topic in mind.

    • @patzci
      @patzci 11 days ago +1

      @statquest horrraaaay!!! Thank you bääm out❤️

  • @iamumairjaffer
    @iamumairjaffer 2 months ago +1

    Well explained ❤❤❤

  • @etgaming6063
    @etgaming6063 2 months ago +2

    This video came just in time; I'm trying to make my own RoBERTa model and have been struggling to understand how they work under the hood. Not anymore!

  • @thegimel
    @thegimel 2 months ago

    Great instructional video, as always, StatQuest!
    You mentioned in the video that the training task for these networks is next-word prediction; however, models like BERT only have self-attention layers, so they have "bidirectional awareness". They are usually trained on masked language modeling and next-sentence prediction, if I recall correctly?

    • @statquest
      @statquest  2 months ago

      I cover how a very basic word embedding model might be trained in order to illustrate its limitations - that it doesn't take position into account. However, the video does not discuss how an encoder-only transformer is trained. That said, you are correct, an encoder-only transformer uses masked language modeling.
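
      A minimal sketch of that masked-language-modeling idea, using a toy
      sentence and BERT's roughly 15% mask rate (both just illustrative):

          import random

          random.seed(1)  # seed only so the example is reproducible
          tokens = ["la", "pizza", "es", "magnífica"]

          masked, targets = [], {}
          for i, tok in enumerate(tokens):
              if random.random() < 0.15:   # BERT masks roughly 15% of tokens
                  masked.append("[MASK]")  # the model must predict the original
                  targets[i] = tok         # token from context on BOTH sides
              else:
                  masked.append(tok)

          print(masked, targets)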

  • @draziraphale
    @draziraphale 2 months ago +1

    Great explanation

  • @alecollins01
    @alecollins01 1 month ago +1

    THANK YOU

  • @fingerscrossed1307
    @fingerscrossed1307 3 days ago +1

    Josh... I know you're doing more of the shiny new stuff, but can you do one on Monte Carlo simulation if you have the time? Love from 🇧🇷

    • @statquest
      @statquest  3 days ago +1

      I'll definitely keep that in mind. It is my sincere hope to finish up a bunch of videos on reinforcement learning and then pivot back to more traditional statistics topics.

  • @GiornoGiovanna-yq7jr
    @GiornoGiovanna-yq7jr 16 days ago

    Hi, great video. The only question left is: where are the feed-forward layers, like in the encoder part of the classic Transformer? Or are they not needed for the task in this video?

    • @statquest
      @statquest  15 days ago +1

      The feed-forward layers aren't needed, and they're not really part of the "essence" of what a transformer is.

  • @barackobama7757
    @barackobama7757 2 months ago

    Hello StatQuest. I was hoping you could make a video on PSO (Particle Swarm Optimisation). It would really help! Thank you, amazing videos as always!

    • @statquest
      @statquest  2 months ago

      I'll keep that in mind.

  • @rishidixit7939
    @rishidixit7939 1 month ago

    Very beautifully explained, as always. It takes a great amount of intuitive understanding and talent to explain a relatively tough topic in such an easy way.
    I just had some doubts -
    1. For context-aware embeddings of a sentence or a document, are the individual token embeddings averaged? Does this have something to do with the CLS token?
    2. Just as a Variational Autoencoder learns the intricate patterns of images and creates its own latent space, can BERT (or a similar model) do that for vision tasks (or are they only suitable for NLP tasks)?
    3. Are knowledge graphs made using BERT?
    Any help on these will be appreciated. Thank you again for the awesome explanation.

    • @statquest
      @statquest  1 month ago +1

      1. The CLS token is specifically used for classification problems and I talk about how it works in my upcoming book. That said, if you embed a whole sentence, then you can average the output values.
      2. Transformers work great with images and image classification.
      3. I don't know.
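
      A small sketch of point 1, with random numbers standing in for the
      per-token output vectors an encoder-only transformer would produce:

          import torch

          torch.manual_seed(0)
          seq_len, d_model = 5, 8                        # 5 tokens, 8-dim vectors
          token_outputs = torch.randn(seq_len, d_model)  # stand-in for BERT output

          sentence_embedding = token_outputs.mean(dim=0)  # mean pooling over tokens
          cls_embedding = token_outputs[0]                # or use the [CLS] position
          print(sentence_embedding.shape)                 # torch.Size([8])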

  • @aryasunil9041
    @aryasunil9041 2 months ago

    Great video! When is the neural networks book coming out?
    Very eager for it.

    • @statquest
      @statquest  2 months ago

      Early January. Bam! :)

  • @tonym4926
    @tonym4926 2 months ago +1

    Are you planning to add this video to the neural network/deep learning playlist?

  • @gsestream
    @gsestream 3 days ago

    Try binary neural networks instead of floating-point neural networks; they are just NOR-gate compute, fully Turing complete. XOR can be used as the big weight gate, for inversion. Or just use evolutionary reinforcement to swap in different logic gates.
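
    A quick sketch of this comment's claim that NOR alone is functionally
    complete (NOT, OR, and AND, and hence any Boolean circuit, from NOR):

        def nor(a, b):
            return 1 - (a | b)

        def not_(a):
            return nor(a, a)

        def or_(a, b):
            return not_(nor(a, b))

        def and_(a, b):
            return nor(not_(a), not_(b))

        # Truth table check: matches Python's own & and | operators.
        for a in (0, 1):
            for b in (0, 1):
                print(a, b, "AND:", and_(a, b), "OR:", or_(a, b))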

  • @vgolf3185
    @vgolf3185 11 days ago

    Are you gonna cover DeepSeek?

    • @statquest
      @statquest  11 days ago

      My next few videos are on Reinforcement Learning (RL), and the big thing with DeepSeek is RL.

  • @hakeemthomas5769
    @hakeemthomas5769 14 days ago

    Can I offer a Patreon suggestion or reach out to you directly? Also, is there a tutorial on your channel discussing data wrangling and cleaning? My understanding so far is that having your data properly set up before feeding it to your model is the most important step when dealing with ML models.

    • @statquest
      @statquest  14 days ago

      That's very true. I have a few tutorials where we go through fixing up the data. For example: ua-cam.com/video/GrJP9FLV3FE/v-deo.html

  • @benjaminlucas9080
    @benjaminlucas9080 2 months ago

    Have you done anything on vision transformers? Or could you?

    • @statquest
      @statquest  2 months ago +1

      I'll keep that in mind. They are not as fancy as you might guess.

  • @noadsensehere9195
    @noadsensehere9195 2 months ago +1

    good

  • @SuperRobieboy
    @SuperRobieboy 2 months ago +2

    Great video! Encoders are very interesting in applications like vector search or downstream prediction tasks (my thesis!).
    I'd love to see a quest on positional encoding, but perhaps generalised not just to word positions in sentences but also to pixel positions in an image or graph connectivity. Image and graph transformers are very cool, and positional encoding is too often discussed only for the text modality. It would be a great addition to educational ML content on YouTube ❤

    • @statquest
      @statquest  2 months ago +1

      Thanks! I'll keep that in mind.

  • @dshefman
    @dshefman 26 days ago

    Are you sure encoder-only transformers are the same as embedding models? I think they have different architectures.

    • @statquest
      @statquest  25 days ago

      There are lots of ways to create embeddings - and this video describes those ways. However, BERT is probably the most commonly used way to make embeddings with an LLM.

  • @WayOfLife-Habits
    @WayOfLife-Habits 1 month ago

    PIZZA GREAT!❤

  • @aihsdiaushfiuhidnva
    @aihsdiaushfiuhidnva 2 months ago +1

    Not many people outside the know seem to know about BERT.

  • @nathannguyen2041
    @nathannguyen2041 2 months ago

    Did math always come easy to you?
    Also, how did you study? Do math topics stay in your mind, e.g., fancy integral tricks in probability theory, dominated convergence, etc.?

    • @statquest
      @statquest  2 months ago +2

      Math was never easy for me and it's still hard. I just try to break big equations down into small bits that I can plug numbers into and see what happens to them. And I quickly forget most math topics unless I can come up with a little song that will help me remember.

  • @kegklaus5069
    @kegklaus5069 19 days ago

    Indian YouTubers: Hello, guys, today we are talking about Transformers.
    American YouTubers: Ohh yeah yeah 🎶🎹🎹🎵Transformers are the best. yeah yeah🎼

  • @epberdugoc
    @epberdugoc 2 months ago +1

    Actually, it's LA PIZZA ES MAGNÍFICA!! Ha ha

  • @ActmartzConsulting
    @ActmartzConsulting 1 day ago

    PLEASE make an ENCODER-ONLY TRANSFORMERS notebook IN LIGHTNING AI

    • @statquest
      @statquest  1 day ago

      I've got a video that shows how to code an Encoder-Only Transformer coming out on Wednesday, as part of a short course on DeepLearning.AI.

  • @Apeiron242
    @Apeiron242 2 months ago

    Thumbs down for using the robot voice.

  • @ChargedPulsar
    @ChargedPulsar 2 months ago

    Another bad video; it promises simplicity but dives right into graphs with no background or explanation.

    • @statquest
      @statquest  2 months ago

      Noted

    • @Austinlorenzmccoy
      @Austinlorenzmccoy 2 months ago +1

      @@ChargedPulsar The video is great; the visualization helps people capture the context better.
      Maybe that's because I have read about it before, but it sure explains things well.
      But if you feel you can do better, create the content and share it so we can dive in too.