Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
Is there a video lecture to help understand the book better?
@@Sangeekarikalan Which book are you asking about and what do you mean by "understand better"? Are you asking to learn more about the book? Or is there something inside the book that you have a question about?
Once I complete this new one, I will have watched roughly all of the StatQuest videos. Having been deeply invested in the channel for the last few months, I feel much more confident in my quest to get my first AI-related job. Massive thanks, Josh, for relentlessly bringing the right intuition to so many of us!!
Good luck with that first job!
Hi Josh, just bought a hardcover copy of your book, "The StatQuest Illustrated Guide to Neural Networks and AI", and can't wait to look it over in a few days. I've learned a lot from your channel, and I appreciate your bottom-up "bam" approach. Sometimes, when you're deep in the weeds with terminology and you want to explain a mathematical concept to someone who doesn't know that terminology, you forget to re-simplify the information. It's important to take a step back every once in a while, so thanks for the perspective.
TRIPLE BAM!!! Thank you very much!
Wowww!
Glad to have you back, Sir.
Awesome videos 🎉
Thank you!
Hi Josh! Hope you can discuss reliability vs validity. You are the best stats mentor on YouTube so far.
I'll keep that in mind.
Such a cleverly disguised master of the craft. 🙇
bam! :)
You're the man ❤️💯👏 Thanks for everything you do here to spread that precious knowledge 🌹 We hope you could possibly dedicate a future video to multimodal models (text to speech, speech to speech, etc...) ✨
I'll keep that in mind!
So excited to watch this later 🤩✨
future bam! :)
Yay!!! ❤❤
I'm starting it now and saving it so I remember to finish it later.
Also, I'm requesting a video on Sparse AutoEncoders (used in Anthropic's recent research). They seem super cool and I have a basic idea of how they work, but I'd love to see a "simply explained" version of them.
Thanks Nosson! I'll keep that topic in mind.
Just the thing I’m learning about right now!
bam! :)
These videos are amazing. Word!
:)
I love you. I will keep going and learn your other courses if they stay free. Please keep them free; I will always be your fan. 😁😁😁
Thank you, I will!
And thx for the courses. They are great!!!!😁😁😁
Glad you like them!
Nice explanation! If the next topic is about RAG or reinforcement learning, I will be even happier (or even object detection or object tracking).
I guess you didn't get to 16:19 where I explain how RAG works...
@statquest But on LinkedIn I saw many RAG types and some retrieval techniques using advanced data structures (like HNSW). That's why I asked.
@@kamal9294 Those are just optimizations, which will change every month. However, the fundamental concepts will stay the same and are described in this video.
@@statquest Now it's clear, thank you!
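For anyone who wants to see the core retrieval idea concretely, here is a minimal, hypothetical sketch of the retrieval step in RAG: embed some documents and a question, pick the closest document by cosine similarity, and paste it into the prompt for a generative LLM. The sentence-transformers checkpoint, the example documents, and the prompt format are all illustrative assumptions, not details from the video.

```python
# Toy RAG retrieval sketch: embed documents and a question, retrieve the best match.
# "all-MiniLM-L6-v2" is just a small public embedding checkpoint used as an example.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = ["Pizza is great.",
             "Transformers use attention.",
             "StatQuest has over 100 machine learning videos."]
question = "How many ML videos does StatQuest have?"

doc_vecs = model.encode(documents)        # one embedding per document
q_vec = model.encode([question])[0]       # one embedding for the question

# Cosine similarity between the question and every document.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
best = documents[int(np.argmax(scores))]

prompt = f"Context: {best}\n\nQuestion: {question}"  # this would be fed to an LLM
print(prompt)
```

Fancier retrieval techniques (like HNSW indexes) only speed up the nearest-neighbor search in the middle step; the fundamental idea stays the same.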
@@kamal9294 He is a professor, so show him some respect. Just my opinion, but I see you referring to him as "bro," and that doesn't feel right to me.
What could be better for learning ML than having a teacher like you? Thanks for all the effort you have put in. I would buy any Udemy courses you have covering ML. Please let me know.
I have a book coming out in the next few weeks about all of these neural network videos, with PyTorch tutorials.
100th Machine Learning Video 🎉🎉🎉
Yes! :)
Noice 👍 Doice 👍Ice 👍
Do you plan a video on HDBSCAN? Your vids are really great!! :)
I'll keep that topic in mind.
@statquest horrraaaay!!! Thank you bääm out❤️
Well explained ❤❤❤
Thanks!
This video came just in time; I've been trying to make my own RoBERTa model and have been struggling to understand how these models work under the hood. Not anymore!
BAM!
Great instructional video, as always, StatQuest!
You mentioned in the video that the training task for these networks is next-word prediction; however, models like BERT have only self-attention layers, so they have "bidirectional awareness". They are usually trained on masked language modeling and next sentence prediction, if I recall correctly?
I cover how a very basic word embedding model might be trained in order to illustrate its limitations - that it doesn't take position into account. However, the video does not discuss how an encoder-only transformer is trained. That said, you are correct, an encoder-only transformer uses masked language modeling.
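As a rough illustration of that last point, here is a minimal sketch of how masked language modeling sets up its training examples. The token IDs, vocabulary size, and 15% masking rate below are just assumptions for the example, not details from the video.

```python
# Minimal masked language modeling (MLM) sketch: hide ~15% of tokens, then train an
# encoder-only transformer (which attends in both directions) to predict them.
import torch

vocab_size, mask_id = 1000, 999          # hypothetical vocabulary and [MASK] id
tokens = torch.randint(0, 998, (1, 10))  # one toy "sentence" of 10 token ids

mask = torch.rand(tokens.shape) < 0.15   # randomly choose positions to hide
inputs = tokens.clone()
inputs[mask] = mask_id                   # replace chosen tokens with [MASK]

labels = tokens.clone()
labels[~mask] = -100                     # ignore unmasked positions in the loss
# The model reads `inputs` and is scored only on the masked positions, e.g.:
# loss = torch.nn.functional.cross_entropy(logits.view(-1, vocab_size),
#                                          labels.view(-1), ignore_index=-100)
```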
Great explanation
Thanks!
THANK YOU
double bam! :)
Josh...I know you're doing more of the shiny new stuff, but can you do one on Monte Carlo simulation if you have the time? Love from 🇧🇷
I'll definitely keep that in mind. It is my sincere hope to finish up a bunch of videos on reinforcement learning and then pivot back to more traditional statistics topics.
Hi, great video. The only question left is: where are the feed-forward layers, like in the encoder part of the classic transformer? Or are they not needed for the task in this video?
The feed-forward layers aren't needed and aren't really part of the "essence" of what a transformer is.
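For reference, here is a rough sketch of where the position-wise feed-forward layers sit in a classic transformer encoder block. The dimensions and layer-norm placement are illustrative assumptions, not the model from the video.

```python
# One classic transformer encoder block: self-attention sub-layer followed by a
# position-wise feed-forward sub-layer, each with a residual connection and norm.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(                 # the feed-forward sub-layer
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)                # self-attention mixes in context
        x = self.norm1(x + a)                    # residual connection + norm
        return self.norm2(x + self.ff(x))        # feed-forward + residual + norm

out = EncoderBlock()(torch.randn(1, 5, 64))      # 1 sentence, 5 tokens, 64 dims
```

The feed-forward sub-layer just reprocesses each token's vector independently after self-attention has already mixed in the context, which is why it can be left out of a bare-bones illustration.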
Hello StatQuest. I was hoping you could make a video on PSO (Particle Swarm Optimisation). It would really help! Thank you, amazing videos as always!
I'll keep that in mind.
Very beautifully explained, as always. It takes a great amount of intuitive understanding and talent to explain a relatively tough topic in such an easy way.
I just had some doubts:
1. For context-aware embeddings of a sentence or a document, are the individual embeddings of the tokens averaged? Does this have something to do with the CLS token?
2. Just as a Variational Autoencoder helps in understanding the intricate patterns of images and then creates its own latent space, can BERT (or any similar model) do that for vision tasks (or are they only suitable for NLP tasks)?
3. Are knowledge graphs made using BERT?
Any help on these will be appreciated. Thank you again for the awesome explanation!
1. The CLS token is specifically used for classification problems and I talk about how it works in my upcoming book. That said, if you embed a whole sentence, then you can average the output values.
2. Transformers work great with images and image classification.
3. I don't know.
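Here is a minimal sketch of the "average the output values" idea, using the Hugging Face transformers library and the public bert-base-uncased checkpoint purely as an example:

```python
# Run a sentence through a BERT-style encoder, then either mean-pool the per-token
# embeddings into one sentence vector or take the [CLS] token's vector.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("StatQuest is awesome", return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (1, n_tokens, 768)

sentence_embedding = token_embeddings.mean(dim=1)          # average over all tokens
cls_embedding = token_embeddings[:, 0]                      # or just the [CLS] token
```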
Great video! When is the Neural Networks book coming out?
Very eager for it
Early January. Bam! :)
Are you planning to add this video to the neural network / deep learning playlist?
yes! Just did.
Try binary neural networks instead of floating-point neural networks; they are just NOR-gate compute, fully Turing complete. XOR can be used as the big weight gate, for inversion. Or just use evolutionary reinforcement to swap in different logic gates.
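As a toy illustration of the binarized-network idea (using the common XNOR-and-popcount trick rather than literal NOR gates), here is a small sketch; all of the numbers are made up:

```python
# Binarized dot product: weights and activations restricted to +1/-1 so the dot
# product becomes an XNOR (do the signs agree?) plus a popcount.
import numpy as np

rng = np.random.default_rng(0)
x_bits = rng.integers(0, 2, 16)     # activations as bits (1 -> +1, 0 -> -1)
w_bits = rng.integers(0, 2, 16)     # binarized weights, same encoding

agree = ~(x_bits ^ w_bits) & 1      # XNOR: 1 wherever the signs agree
dot = 2 * agree.sum() - len(x_bits) # popcount trick recovers the +1/-1 dot product

# Same result with ordinary floating-point arithmetic, as a sanity check.
assert dot == ((2 * x_bits - 1) * (2 * w_bits - 1)).sum()
```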
Are you gonna cover DeepSeek?
My next few videos are on Reinforcement Learning (RL), and the big thing with DeepSeek is RL.
Can I offer a Patreon suggestion or reach out to you directly? Also, is there a tutorial on your channel discussing data wrangling and cleaning? My understanding so far is that having your data properly set up before feeding it to your model is the most important step when dealing with ML models.
That's very true. I have a few tutorials where we go through fixing up the data. For example: ua-cam.com/video/GrJP9FLV3FE/v-deo.html
Have you done anything on vision transformers? Or could you?
I'll keep that in mind. They are not as fancy as you might guess.
good
Thanks!
Great video! Encoders are very interesting in applications like vector search or downstream prediction tasks (my thesis!).
I'd love to see a quest on positional encoding, but perhaps generalised to not just word positions in sentences but also pixel positions in an image or graph connectivity. Image and graph transformers are very cool, and positional encoding is too often only discussed for the text modality. It would be a great addition to educational ML content on YouTube ❤
Thanks! I'll keep that in mind.
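For anyone curious, here is a small sketch of the classic sinusoidal positional encoding, plus one common (assumed, illustrative) way to extend it to pixel positions by encoding row and column indices separately and concatenating them:

```python
# Sinusoidal positional encoding for sequences, extended to a 2D pixel grid by
# giving each pixel the concatenation of its row encoding and column encoding.
import numpy as np

def sinusoidal_encoding(n_positions, d_model):
    positions = np.arange(n_positions)[:, None]                  # (n_positions, 1)
    dims = np.arange(d_model)[None, :]                           # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((n_positions, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])                  # even dims: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dims: cosine
    return encoding

word_pe = sinusoidal_encoding(20, 64)                            # 20 token positions

rows, cols, d = 8, 8, 64
row_pe = sinusoidal_encoding(rows, d // 2)                       # per-row encodings
col_pe = sinusoidal_encoding(cols, d // 2)                       # per-column encodings
# Each pixel (i, j) gets [row_pe[i], col_pe[j]] -> shape (rows, cols, d).
pixel_pe = np.concatenate([np.repeat(row_pe[:, None, :], cols, axis=1),
                           np.repeat(col_pe[None, :, :], rows, axis=0)], axis=-1)
```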
Are you sure encoder-only transformers are the same as embedding models? I think they have different architectures.
There are lots of ways to create embeddings - and this video describes those ways. However, BERT is probably the most commonly used way to make embeddings with an LLM.
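To make the "lots of ways" point concrete, here is a tiny sketch contrasting a plain lookup-table word embedding with the context-aware embeddings an encoder-only transformer produces. The vocabulary and dimensions are made up for illustration.

```python
# A static embedding table gives each word the same vector in every sentence; an
# encoder-only transformer then turns those into context-aware vectors.
import torch
import torch.nn as nn

vocab = {"squash": 0, "the": 1, "pizza": 2, "is": 3, "great": 4}
static = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

ids = torch.tensor([[vocab["pizza"], vocab["is"], vocab["great"]]])
static_vectors = static(ids)        # "pizza" gets the same vector in every sentence

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True), num_layers=1)
contextual_vectors = encoder(static_vectors)  # now each vector depends on its neighbors
```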
PIZZA GREAT!❤
:)
It seems not many people outside the know know about BERT.
yep.
Did math always come easy to you?
Also how did you study? Do math topics stay in your mind e.g., fancy integral tricks in probability theory, or dominated convergence, etc?
Math was never easy for me and it's still hard. I just try to break big equations down into small bits that I can plug numbers into and see what happens to them. And I quickly forget most math topics unless I can come up with a little song that will help me remember.
Indian YouTubers: Hello, guys, today we are talking about Transformers.
American YouTubers: Ohh yeah yeah 🎶🎹🎹🎵Transformers are the best. yeah yeah🎼
:)
Actually it's, LA PIZZA ES MAGNÍFICA ("the pizza is magnificent")!! Ha ha
:)
PLEASE do an ENCODER-ONLY TRANSFORMERS notebook in Lightning AI!
I've got a video that shows how to code an Encoder-Only Transformer coming out on Wednesday as part of a short course on DeepLearning.AI.
Thumbs down for using the robot voice.
Noted
Another bad video; it promises simplicity but dives right into graphs with no background or explanation.
Noted
@@ChargedPulsar The video is great; the visualization helps people capture the context better.
Maybe it's because I have read about it before, but it sure explains it better.
But if you feel you can do better, create the content and share it so we can dive in too.