1:39 Bidirectional Language Modeling
2:45 Masking Strategy
3:38 BERT input
4:55 The Illustrated Transformer
5:50 Tensor Dimensions in BERT
7:20 BERT Model Architecture
7:58 BERT Base vs. Large
9:13 Datasets for Training BERT
9:40 Transfer Learning with BERT
10:03 SQuAD and BERT
12:00 Ablations
Thank you! A quick clarification question:
Is the dimension of the Query matrix the same as the input, L x De?
How does it factorize the input into the Q, K, V matrices? I don't think it's a simple SVD.
Is the dimension of K Dk x De, so that K^T is De x Dk and can be multiplied with Q? Is this correct?
Is the dimension of V Dv x De, with Dv = Dk, so that the final output Z can be L x De? Is this understanding correct?
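In case it helps: in the standard transformer there is no factorization of the input (no SVD). Q, K, and V are each produced by their own learned linear projection of the input, so Q and K are L x Dk and V is L x Dv, and a final output projection maps back to L x De. A minimal NumPy sketch of single-head scaled dot-product attention (the dimension values are illustrative assumptions, not from the video):

import numpy as np

L, De, Dk, Dv = 10, 768, 64, 64       # sequence length, model dim, query/key dim, value dim
X = np.random.randn(L, De)            # input token embeddings, shape (L, De)

# Three separately learned projection matrices (random here just for illustration)
W_Q = np.random.randn(De, Dk)
W_K = np.random.randn(De, Dk)
W_V = np.random.randn(De, Dv)

Q = X @ W_Q                           # (L, Dk)
K = X @ W_K                           # (L, Dk)
V = X @ W_V                           # (L, Dv)

scores = (Q @ K.T) / np.sqrt(Dk)      # (L, L) attention scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax

Z = weights @ V                       # (L, Dv)

So Q is L x Dk rather than L x De, Dk = Dv is a common convention but not a requirement, and it is the output projection (Dv x De), not Z itself, that restores the L x De shape.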
Thanks! But please slow down :)
0.75 speed :)
Loved the video, Henry! Your fast-paced style works great for gaining a general understanding of the model and how it fits into a use case. Each slide also serves as a good index for further learning. Surprised at all the negative comments... although you might've done better calling it 'BERT Overview'.
My understanding of transformers somehow went down after watching this video.
Well explained, but yes, slow down a bit! 👍👍
Hi, great work! Can you make a video about the first Transformer paper, "Attention Is All You Need"?
I haven't caught up on those things, and I think others will appreciate it too.
Thank you for the suggestion! I recommend watching Yannic Kilcher's "Attention Is All You Need" video on YouTube in the meantime! That video and the blog post "The Illustrated Transformer" helped a lot with my understanding of it!
Yes! I would like to suggest the same thing! I watched the Yannic Kilcher one before, but I would really like to see a video focused on attention per se. Thank you!
Just breathe while speaking!
When a rapper starts learning NLP and Machine Learning
😂😂
😂😂😂
can't stop laughing😂😂😂
Neat explanation. After going through the paper, this video is great for a quick run-through.
Thanks for the timestamps. Nice explanation overall.
Nicely explained
if you play this video at double speed you can smell your brain cooking a little
I don't know why people are complaining. I am not a native speaker and for me your rate of speaking is just fine.
Hi. Nice work, but you are talking waaaaay too fast. Slow down
Problems with your video: you speak too fast relative to the changing slides and the text on them, which is ineffective when creating tutorials. You assume the viewers already know too much, so you throw around words like "auto-regressive" without bothering to explain what they mean. Perhaps you should make videos about a focused sub-topic, because otherwise this type of video isn't of much utility to people.
agree
Can a student apply BERT to their project work?
Why the rush?
Liked the video a lot... I have subscribed to your channel. Please upload more videos!
Good work. Please slow down next time!
Who is chasing you? Super fast!
I have a question, if anyone can help: if I input into BERT (or any transformer) a paragraph that contains the name of a disease or a gene, for example, how can it detect that this is a disease? And does it replace it with a tag, for example?
Second question: is there a way to add those identified tags into a matrix, for example, so I could focus on them while applying attention?
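Not from the video, but for context: detecting disease or gene names is usually framed as named-entity recognition (NER), i.e., BERT fine-tuned with a token-classification head on labeled biomedical text (BioBERT-style models). The model labels entity spans; replacing them with tags is a post-processing step you do yourself. A minimal sketch with the Hugging Face transformers pipeline; the checkpoint name is an assumption, any biomedical token-classification model would do:

from transformers import pipeline

# Assumed biomedical NER checkpoint; swap in any token-classification model.
ner = pipeline("token-classification",
               model="d4data/biomedical-ner-all",
               aggregation_strategy="simple")   # merges word pieces into whole entities

text = "Mutations in the BRCA1 gene increase the risk of breast cancer."
for entity in ner(text):
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))

For the second question, one possible approach is to use the predicted entity spans to build an extra attention mask or feature matrix so that downstream layers attend more strongly to the tagged positions, though that requires modifying the model rather than using it off the shelf.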
Could I extract word embeddings from BERT and use them for unsupervised learning, e.g. topic modeling? :)
I have seen a few approaches where they run BERT and LDA separately, concatenate the vector representations (BERT + LDA), and finally train an autoencoder to learn a lower-dimensional latent-space representation.
blog.insightdatascience.com/contextual-topic-identification-4291d256a032
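To make the embedding-extraction step concrete, here is a minimal sketch with Hugging Face transformers that mean-pools BERT's last hidden states into one vector per document (the pooling strategy and model name are my assumptions, not from the post above):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

docs = ["Topic modeling with BERT embeddings.",
        "LDA is a classic probabilistic topic model."]

inputs = tokenizer(docs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (batch, seq_len, 768)

# Mean-pool over real tokens only, using the attention mask to ignore padding.
mask = inputs["attention_mask"].unsqueeze(-1)     # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (batch, 768)

print(embeddings.shape)   # these vectors can be concatenated with LDA vectors or clustered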
@Henry AI Labs I have a question... Is BERT good enough for malware detection?
Amazing video, thanks!
Are you Brandon Butch?
Turn the speed to 2x; it's really easy to rock.
Thank you!
Btw you're not talking too fast. If you were slower it'd become boring. There are captions and slow-downs for people who can't follow.
Sorry, but it's too fast; I can't follow along.
I'm just now learning text mining and NLP. Holy shit, I don't understand anything.
Hello BERT!!!!!!!!! Hi!!!!!!!!!!!!!!!!!!!!!
I'm watching at 1.5 speed and can understand it perfectly fine.
Too fast but great.
153 dislikes, whoa.
Slow down please