For details and code on building a translator using a transformer neural network, check out my playlist "Transformers from scratch": ua-cam.com/video/QCJQG4DuHT0/v-deo.html
I have studied AI, NLP, and neural networks for several years. But the way you explained this was lovely, friendly, and very simple, which is why I am pretty sure you are BERT
Just watched a video on Transformers, and now this. Am astounded at how you explained such complex notions with such ease!
Hugely underrated channel!
Thanks a lot! Glad you liked it 😊
I read many blogs on BERT, but they were more focused on how to use BERT rather than what BERT actually is. This video helped me clear all my doubts regarding how BERT is trained. Clear and concise explanation.
I'm honestly amazed at how you managed to turn a complex algorithm into a simple 10-minute video. Many thanks to you, my final thesis appreciates you.
Hahaha anytime ):
Amazing Explanation :)
big fan sir
hear hear! so agree!
To teach us, you study and explore... really grateful for your efforts, Krish.
Wherever there is Krish sir, I will come there... Sir, I found you here also...
We are learning together 😇
Nice to meet you sir
Don't hesitate, this is the best BERT explanation video for sure!
I love the "pass1" "pass2" concept of how you explain things. It's great.
Phenomenal the way you condense such a complicated concept into a few minutes, clearly explained.
Thanks so much for the compliments:)
I wish I had come across this channel earlier. You have a wonderful skill in explaining complicated concepts. I love your "3 pass" approach!!
The BEST explanation on BERT. Simply outstanding!
OMG!!!! This vid is a life-saver! It just elucidated so many aspects of NLP to me (a 3-month beginner who still understands nothing)
This is one of the best resources explaining BERT available online.
Best explainer on YouTube: you have a good mix of simplifying so it can be understood, but not over-simplifying, so we learn deeply enough. The three passes going deeper were a great idea as well.
Dude this video is incredible. I cannot express how good you are at explaining
Thanks for watching! Super glad it is useful
Good try though!
Extremely underrated channel; I didn't find any other good explanation on YouTube/Medium/Google
Wow.. just switched from another BERT explained video to this.. stark difference.. excellent explanation indeed.. thanks..
Every time I watch this video I gain a better understanding of the procedure. Thanks a lot for the great content!!!
Anytime! Look forward to more
Was struggling to understand the basics of BERT after going through Transformer model. This video was indeed helpful.
Best ever explanation of BERT! Finally understood how it works :)
God Level Explanation. Thanks Man !
I am very impressed by the clarity and core focus of your explanations to describe such complex processes. Thank you.
You are very welcome. Thanks for watching and commenting :)
this must be one of the best explanation videos on the internet, thank you!
You are very welcome :)
After reading lots of blogs and videos, I thought it was such a difficult network. But after going through this, I find it so easy to understand BERT (and its variants available in the transformers library)
Always come back to your explanation whenever I want to refresh BERT concepts. Thanks for the effort.
Thanks for the super kind words :)
Very very friendly, clear and masterful explanation. This is exactly what I was after. Thank you!
Excellent explanation. The main thing to note is the finer point around the loss functions that BERT uses, as not many other videos on the same topic cover this. Too good
Thanks so much :)
The NSP task does not directly involve bidirectional modeling in the same way as masked language modeling (MLM) does. Instead, it serves as a supplementary objective during BERT's pre-training phase. The purpose of the NSP task is to help the model understand relationships between sentences and capture broader context beyond adjacent words.
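To make the NSP objective concrete, here is a minimal sketch (my own illustration, not the original BERT pre-processing code) of how the sentence pairs and their binary is-next labels are usually assembled:

import random

def make_nsp_example(corpus, i):
    # corpus: list of sentences in document order; i indexes sentence A
    # (edge cases, e.g. i pointing at the last sentence, are ignored in this sketch)
    sent_a = corpus[i]
    if random.random() < 0.5:
        sent_b, is_next = corpus[i + 1], 1           # the real next sentence -> label 1 ("IsNext")
    else:
        sent_b, is_next = random.choice(corpus), 0   # a random sentence -> label 0 ("NotNext")
    # Both sentences are packed into one input: [CLS] sentence A [SEP] sentence B [SEP]
    return f"[CLS] {sent_a} [SEP] {sent_b} [SEP]", is_next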
Wow, the best explanation of BERT on YouTube, and free at that. Thanks man, you made NLP easy. 🍺
Glad! Thanks a ton for watching
I'm a teacher with a comp-sci master's and am just diving into AI. This was absolutely great! I studied natural language in college about 20 years ago, and this video really helped form a mental bridge to the new technologies. Language chatbots have been around forever! And the ones I studied used Markov chains trained on small corpora. So of course students would dump porn novels, the Bible and whatnot into it and then start it talking. We never laughed so hard!
The 3-pass explanation is a really good approach to explaining this complex concept. Best video on BERT
Great video! But what is pass 1, pass 2 and pass 3?
Amazing stuff. For visualization purposes, when you get into a deeper pass, I would recommend always adding the zooming effect for intuitive understanding. I am not sure about others, but when you do that, I instantly know "OK, now we are within this 'box' "
Good thought. I'll try to make this apparent in the future. Thanks!
Thank you for the explanation. You really have a knack for explaining NLP concepts clearly without losing much fidelity. Please keep posting!
One of the best videos on BERT.
Great work!
Wishing you loads of success!
No one explains DL models better than this guy.
Thank You. Love from India.
The multiple passes of explanation is an absolutely brilliant way to explain! Thanks man.
Best explanation I have seen so far on BERT.
I thank you kindly ;)
Dude, you are amazing. You explained the state-of-the-art NLP model in such a well-explained and concise video. Thanks a ton for this video!
You are super welcome. Thanks so much for commenting this!
sentence: "The cat sat on the mat."
BERT reads sentences bidirectionally, which means it looks at the words both before and after each word to understand its meaning in the context of the whole sentence.
For example, let's say you want to know what the word "mat" means in the sentence: "The cat sat on the mat." BERT understands "mat" not just by itself, but by considering the words around it. It knows that a cat can sit on something, and "mat" here means a flat piece of fabric on the floor.
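If it helps to see this in code, here is a small sketch (assuming the Hugging Face transformers library, which is not part of the video) showing that the vector BERT produces for "mat" depends on the whole sentence, not just the word itself:

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def mat_vector(sentence):
    # Encode the sentence and grab the hidden state at the position of "mat"
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (seq_len, 768)
    mat_id = tokenizer.convert_tokens_to_ids("mat")
    idx = inputs.input_ids[0].tolist().index(mat_id)
    return hidden[idx]

v1 = mat_vector("The cat sat on the mat.")                   # "mat" as a floor covering
v2 = mat_vector("We will mat the photo before framing it.")  # "mat" as a verb
print(torch.cosine_similarity(v1, v2, dim=0))                # noticeably below 1.0: context changed the vector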
Nice job, man! Especially the multi-phase approach of explaining things, top to bottom.
Super happy you liked the approach. Thanks for commenting
I've read 4 articles before coming here. Couldn't connect the dots. This single video showed me the way.. Thanks a lottt
Super glad it helped :)
Such an underrated channel. Keep it up man
The one I had been looking for for the past 6 months! Thanks a lot for making this.
I'm late. But. Here now!
That's it. The best explanation I came across. Receive my upvote and subscription 😁
Many thanks. Join the discord too :3
Excellent explanation! Will never miss a video of yours from now on!
Good touch to put the references on the description instead of on the slides
Wow, thanks!! I tried watching many videos and couldn't understand a single thing. But yours was truly concise and informative.
Best Explanation by far!
Thanks for the great explanation of Transformers and the architecture of BERT.
My pleasure and thank you for the super thanks :)
BEST EXPLANATION EVEERRRR
First half is exactly how much I need to understand right now, thank you :)
Awesome! You are very welcome!
Probably the best (easiest to understand in one go) video on BERT. Thanks ❤️
Excellent Explanation! Thank you!
Very nice high level understanding of Transformers...
Great explanation, I really like the three-pass idea; it breaks down a lot of complications into simple concepts.
Simple and clear explanations (which shows you know what you're talking about). And cool graphics. Will be back for more videos :)
Thanks! Keep tuning in!
Excellent introduction, visualizations and step by step approach to explain this. Thanks a ton.
You are oh-so welcome. Thank you for watching and commenting:)
Wow, AMAZING EXPLANATION. Thank you very much
Beautiful explanation!
Thank you so much for the clear explanation, I've got a grip on BERT now!
Do you realize yours is the only good description of how exactly fine-tuning works that I have found, and I've been researching for months. Thank you!!!
You are too kind. Thank you for the donation. You didn't have to, but it is appreciated. Also super glad this content was useful! More of this to come
thank you so very much! one video was enough to get the basics clear
Glad! Welcome!
No more reading after this. Loved IT. 😊
Many thanks 😊
loved your explanation bro, earned yourself a sub
Well explained. Short and to the point
Omg your videos are so good! So happy I found your channel, I'm binge watching everything :D
Glad you found my channel too! Thank you! Hope you enjoy them!
Firstly, thanks for the really cool explanation. I would like to point out: please remove the text animation, as it is a huge distraction for some people. I had to watch this with multiple breaks because my head was aching due to the text animation.
Loved how you explained BERT really well. Great job!
I studied 3 hours from a book, but you covered it in 10 minutes 😭. Super explanation, thanksssssssssss
I am a speedy explainer :)
@CodeEmporium It's understandable
Amazing Explanation! I am speechless :)
So simple and easy to understand, thanks a lot
Super glad this was useful :)
Excellent and concise explanation. Loved it. Thanks for this fantastic video.
Fantastic explanation, covered each and every point of BERT.
Looking forward to more videos on NLP.
At 7:55, the position embeddings, as said in the video, encode the position of a word in a sentence. But in the slide, the sequence of position embeddings is E0..E5, E6, E7..E10 instead of E0..E5, E0, E1..E5 (implying the position embedding of a word depends on how the 2 sentences are arranged)
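For what it's worth, that slide matches the released model: the position index runs over the whole packed input ([CLS] sentence A [SEP] sentence B [SEP]), and it is the separate segment (token type) embedding that tells the two sentences apart. A small sketch with the Hugging Face tokenizer (my own example, not the video's code):

from transformers import BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("My dog is cute", "He likes playing", return_tensors="pt")

print(tokenizer.convert_ids_to_tokens(enc.input_ids[0].tolist()))  # [CLS] ... [SEP] ... [SEP], one packed sequence
print(enc.token_type_ids[0])                                       # 0s for sentence A, 1s for sentence B
print(torch.arange(enc.input_ids.shape[1]))                        # default position ids: 0, 1, 2, ... across BOTH sentences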
Your Videos are very well explained !
Thanks for wonderful explanation for bert architecture 🍀🌹
7:10
If my question was "What is the color of apple?" and the expected answer is something like "The color of the apple is red".
According to you at 7:10, BERT will only output the "start" and "end" words of the answer, which in this case would be "The" and "red". Just from these two words, how will I get the answer back to my question "What is the color of apple?"
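In case it helps: the start and end are positions in the passage, so the answer is the whole token span between them (inclusive), not just those two words. A rough sketch, assuming a publicly available SQuAD-fine-tuned checkpoint (the exact predicted span depends on the model):

from transformers import BertTokenizer, BertForQuestionAnswering
import torch

name = "bert-large-uncased-whole-word-masking-finetuned-squad"   # one public SQuAD-fine-tuned checkpoint
tokenizer = BertTokenizer.from_pretrained(name)
model = BertForQuestionAnswering.from_pretrained(name)

question = "What is the color of the apple?"
passage = "The color of the apple is red."
inputs = tokenizer(question, passage, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)
start = int(out.start_logits.argmax())        # predicted START index into the packed tokens
end = int(out.end_logits.argmax())            # predicted END index
answer_ids = inputs.input_ids[0][start:end + 1]
print(tokenizer.decode(answer_ids))           # the whole span between start and end, not just two words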
This is an excellent summary. Very clear and super well organized. Thanks very much
Thank you so much for watching ! And for the wonderful comment :$
awesome introduction to a very challenging topic
Thank you. Uploading a related video on this soon too :)
thank you for sharing the video! very clear and helpful!!
Not bad! Loved the video. Please add a little bit more explanation in upcoming vids if possible.
@CodeEmporium I have a question for you. Imagine the sentence :
"My dad went to the dentist yesterday and he told him that he needs to floss more."
Can BERT understand that in this context "he" is probably the dentist, and "him" my dad?
Great question. It's actually a studied problem in NLP called "cataphoric resolution" and "anaphoric resolution". I have seen examples of it online, so I think we should be able to do this to some degree.
It's actually even more complex than you mentioned, since the first "he" is the dentist but the second "he" is your dad! It would be fascinating to see if this can actually be done.
Well explained! I have been looking for something like this for quite long!
In the video, you mentioned that during training C is a binary output, but in the paper it is mentioned that it is a vector.
Excellent articulation of the concept. Thank you.
This is amazing! Crystal clear explanation, thanks a lot.
Great video, I like the 3-pass method you used to explain the concepts
I think there is a small problem: you said it understood language. But if I say something like "turn off the screen" or "it's dark here", will it ever be able to really understand this and actually turn off my monitor or turn on the lights? It has only the sequence of words, but it doesn't know how to use it in real-world problems. It needs to be combined with other neural networks and specialized hardware to give answers and perform actions in the real world. It needs to understand the situation using things like cameras and other sensors to generate the action that is requested.
Great Video! So easy to follow!
Excellent explanations. One question: For transformer-based translation, you feed one word at a time to get the next word. I assume you mean you concatenate the last output to the input to get the next word in the sentence. Is this the same for BERT-based translation? Without this, I would assume that the nth word of the output would be the superposition of all possible outputs.
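Yes, that is the usual picture for the encoder-decoder translator: at inference the decoder is run repeatedly and each predicted token is appended to its input, which is what keeps the nth output from being a "superposition" of everything. A minimal greedy-decoding sketch (illustrative only; model.encode / model.decode, BOS_ID and EOS_ID are placeholder names, not a real API):

import torch

def greedy_translate(model, src_ids, BOS_ID, EOS_ID, max_len=50):
    memory = model.encode(src_ids)                           # run the encoder once on the source sentence
    out = [BOS_ID]                                           # decoder input starts with just <bos>
    for _ in range(max_len):
        logits = model.decode(torch.tensor([out]), memory)   # (1, len(out), vocab_size)
        next_id = int(logits[0, -1].argmax())                # prediction at the LAST position only...
        out.append(next_id)                                  # ...is appended and fed back in on the next step
        if next_id == EOS_ID:
            break
    return out

(Plain BERT is encoder-only, so it isn't normally used as the generator in translation; it would need a decoder on top.)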
Correct me if I'm wrong, but I don't think you can say that BERT is a language model, since you're taking the probabilities of the masked words alone, right? Also, don't the WordPiece embeddings make it output stuff that wouldn't make sense anymore for a language model?
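On the WordPiece point, here is a tiny sketch (assuming the Hugging Face transformers library) of how words get split into sub-word pieces, which is what the masked-LM head actually assigns probabilities to:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("unaffable"))    # split into sub-word pieces (exact split depends on the vocab)
print(tokenizer.tokenize("playing"))      # common words usually stay whole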
I always watch your videos and appreciate the effort you put into making complicated topics so easy and clear. Thank you for all your work. I really like the way you explain in 3 passes... great work
explained the concepts clearly
Wow I've just discovered your channel, it's full of resources, very nice!
Very good explanation, easy to understand! Come on!
Hey, your explanation and presentation of complicated concepts made TF and BERT clear to me.
I look forward to you uploading more exciting videos.
very underrated channel!
Thanks so much!
Your explanation is amazing, man. Now I'm starting to get the hang of transformers 😅✌️
Thanks a lot for the super clear explanation! Are your slides available by any chance (for reuse with attribution)?
great explanation, I understood everything ! thanks a lot
Thanks so much for watching and commenting :)