I honestly can't believe how many people here are praising the video for how clearly and well it explains things... I learned little new from it: only that transformers use some form of recursion and that the words in the data are sequentially marked. And while these are apparently very important to the concept of transformers, they were not explained.
Dr. Ashish Vaswani is a pioneer and nobody is talking about him. He is a scientist from Google Brain and the first author of the paper that introduced Transformers, which are the backbone of all the other recent models.
Thank you so much for your help. With the assistance of GPT-4, I have been able to transition from a seasonal programmer to a full-time programmer. I am truly grateful for your support!
Very impressive video. Thanks for the way you shared the information. Regarding the 05:05 mark in your video, could you please share how you created it?
Thanks for the video. You mentioned that GPT-3 was trained on 45 terabytes of text. I have seen much smaller numbers, like 570 GB. Can you give me a reference for the training data size? I am working on a project and I would like to cite the correct number. Thanks
GPT-3 was trained on a dataset of 45 terabytes of text data. However, after pre-processing and filtering, the effective size of the dataset used for training is about 570 gigabytes.
So, question: given the goal of understanding meaning within language regardless of language, could a sophisticated enough set of weights derived from a sufficiently large dataset represent essentially the human genome of language?
At one point you said "It's [attention] something that's learned over time from data." I'd be interested to know how this "learning" takes place. Thanks.
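One way to picture that "learning" is the toy sketch below. It is not how the models in the video are trained, and every name and number in it is made up for illustration. There are three input positions, and the correct output is always the value at position 1. The attention scores start out equal, and plain gradient descent on the squared error gradually moves the weight onto the useful position. In a real transformer the scores are not free parameters like this; they are computed from learned projections of the input, but those projections are adjusted by the same kind of gradient step.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical training data: three input values per example,
# and the "right answer" is always the value at position 1.
examples = [
    [1.0, 5.0, 2.0],
    [3.0, -1.0, 0.0],
    [0.0, 2.0, 4.0],
    [-2.0, 1.0, 3.0],
]

scores = [0.0, 0.0, 0.0]  # learnable attention scores, no preference at the start
learning_rate = 0.05

for step in range(2000):
    weights = softmax(scores)
    grads = [0.0, 0.0, 0.0]
    for values in examples:
        # The model's output is an attention-weighted average of the inputs.
        prediction = sum(w * v for w, v in zip(weights, values))
        error = prediction - values[1]
        # Gradient of the squared error with respect to each score (chain rule through softmax).
        for j in range(3):
            grads[j] += 2 * error * weights[j] * (values[j] - prediction)
    scores = [s - learning_rate * g for s, g in zip(scores, grads)]

weights = softmax(scores)
print([round(w, 3) for w in weights])  # most of the weight ends up on position 1
```

Nobody tells the model "look at position 1"; the weights drift there only because that reduces the error on the data.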
The ability to break down a complex topic is such an underrated superpower. Amazing job.
How did you condense so many pieces of information in such a short time? This video is on a next level, I loved it!
perhaps she used genAI for it? (just on a lighter note - she did a great job)
Transformers! More than meets the eye.
😂
Transformers! Robots in disguise!
Autobots wage their battle to fight the evil forces of the Decepticons!!!!!
Transformers! No money to buy…
Of course
This has to be the best explanation so far, and by a very large margin.
Thank you for watching! We appreciate the kind words. 🤗
@@googlecloudtech Hi, does a transformer use an algorithm, and can you tell me what mechanism and algorithm it uses? I thought self-attention was an algorithm, or is it both a mechanism and an algorithm? Please tell me.
@@googlecloudtech Please tell me; I don't know this and I have to answer it.
You have the gift of making things simple to understand. Keep up the good work 🙏
I have watched several videos trying to understand the topic. This was by far the best. Thank you.
Great explanation of the key concept of position encoding and self attention. Amazing you get the gist covered in less than 10 minutes.
@Dino Sauro tell me more...
@Dino Sauro thanks for the heads up
She has one of the wealthiest companies on earth providing her with resources: first-hand access to engineers, researchers, top-notch communicators, and marketing employees.
@@an-dr6eu True, but this young lady talks a mile a minute from memory. She knows it cold regardless of the resources at Google.
@@michaellavelle7354 Her explanation is absolutely useless. Have you ever programmed a Transformer model from scratch to verify what she has explained?
Takeaways:
A transformer is a type of neural network architecture that is used in natural language processing. Unlike recurrent neural networks (RNNs), which analyze language by processing words one at a time in sequential order, transformers use a combination of positional encodings, attention, and self-attention to efficiently process and analyze large sequences of text.
Key concepts: neural networks, convolutional neural networks (for image analysis), recurrent neural networks (RNNs), positional encodings, attention, self-attention.
Neural networks: A type of model used for analyzing complicated data, such as images, videos, audio, and text.
Convolutional neural networks: A type of neural network designed for image analysis.
Recurrent neural networks (RNNs): A type of neural network used for text analysis that processes words one at a time in sequential order.
Positional encodings: A method of storing information about word order in the data itself, rather than in the structure of the network.
Attention: A mechanism used in neural networks to selectively focus on parts of the input.
Self-attention: A type of attention in which the network relates each part of the input to the other parts of the same input, so every word is interpreted in the context of the words around it.
Neural networks are like a computerized version of a human brain that uses algorithms to analyze complex data.
Convolutional neural networks are used for tasks like identifying objects in photos, similar to how a human brain processes vision.
Recurrent neural networks are used for text analysis, and are like a machine trying to understand the meaning of a sentence in the same order as a human would.
Positional encodings are like adding a number to each word in a sentence to remember its order, like indexing a book.
Attention is like a spotlight that focuses on specific parts of the input, like a person paying attention to certain details in a conversation.
Self-attention is like every word in a sentence looking at all the other words in that sentence at the same time to work out what it means in context.
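For anyone who wants to see the last three takeaways as code, here is a minimal sketch in plain Python. It is a simplification, not the video's or any library's implementation: the word vectors are made-up numbers, the function names are hypothetical, and the learned query/key/value projections that a real transformer applies are left out. The positional encoding uses the sinusoidal formula from the original transformer paper.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding of one position as a vector of length dim."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Replace each vector with a weighted mix of all vectors in the sequence.

    Simplification: queries, keys, and values are the input vectors themselves.
    """
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        # Scaled dot product between this word and every word (itself included).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return mixed

# Toy 4-dimensional "embeddings" (made-up numbers, for illustration only).
embeddings = {
    "the": [0.1, 0.0, 0.2, 0.1],
    "cat": [0.9, 0.3, 0.0, 0.5],
    "sat": [0.2, 0.8, 0.4, 0.0],
}
sentence = ["the", "cat", "sat"]

# Word order is stored in the data itself: add each position's encoding to its word vector.
inputs = [
    [e + p for e, p in zip(embeddings[word], positional_encoding(pos, 4))]
    for pos, word in enumerate(sentence)
]
outputs = self_attention(inputs)
for word, vector in zip(sentence, outputs):
    print(word, [round(x, 3) for x in vector])
```

Note that nothing in `self_attention` loops over the words in order the way an RNN would; each output row depends on all input rows at once, which is why the position information has to be added to the vectors beforehand.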
Great, you learned how to copy paste
@@an-dr6eu first step to becoming a programmer
@@an-dr6eu your comment comes across as somewhat 'catty' 😢
Where is optimus prime?
He's on the thumbnail...
He will be in theaters on June 9... Transformers: Rise of breasts..
😂😂😂😂
Where are robotaxis?
We got lied to
This is a GREAT explanation! Please lower the background music next time; it could really help. Thanks again! Awesome video
Amazing video! 🎉 You explained the difficult concepts of Transformers so clearly and made them easy to understand. Thanks for all your hard work!🙌👍
Are you serious? The concepts were not really explained. Just a summary of what they do but not how they work behind the scenes.
No.
Love the content and thanks for the great video! (one thing that might help is lower the background music a bit, I found myself stopping the video because I thought another app was playing music)
This is such an informative video about transformers in machine learning! It's amazing how a type of neural network architecture can do so much, from translating text to generating computer code. I appreciate the clear explanations of the challenges with using recurrent neural networks for language analysis, and how transformers have overcome these limitations through innovations like positional encodings and self-attention. It's also fascinating to hear about BERT, a popular transformer-based model that has become a versatile tool for natural language processing in many different applications. The tips on where to find pretrained transformer models and the popular transformers Python library are super helpful for anyone looking to start using transformers in their own app. Thanks for sharing this video!
I have more respect for Google after watching this video. Not only did they provide their engineers with the funding to do research, but they also let other companies like OpenAI use said research. And they are opening up the knowledge to the general public with this video series.
Thanks, you did a great job. I have already spent some time looking at different videos to capture the high-level idea of what transformers are about, and yours is the clearest explanation. I actually do have an educational background in neural networks but don't go around remembering every detail or the state of the art today, so somebody removing all the unnecessary technical details like you did here is very useful.
I wish they didn't put music in the background; it makes it harder to follow the conversation.
Agree
Thank you for this high-level explanation. I now understand transformers more clearly
can you explain to me pls
Actually you didn't
This is a very well produced video. Credits to the presenter and those involved in production with the graphics
Very interesting, informative, this added perspective to a hyped-up landscape. I'll admit, I'm new to this, but when I hear "pretrained transformer" I didn't even think about BERT. I appreciate getting the view from 10,000 feet.
You have no idea how much time I potentially have saved just by reading your blog and watching this video to get me up to speed quickly on this. "Liked" this video. Thanks
The visuals are very helpful. Thanks.
You're very welcome!
Just Mind-blowing way to explain an LLM, just phenomenal.
Love how you simplified it. Thank you
It's so simplified that you can't understand anything
This is awesome. This has been one of the best overall breakdowns I've found. Thank you!!
Nice amount of info imparted in this video. Very clear info on what Transformers are and what made them so great.
Easiest-to-understand explanation I've heard so far
This is an excellent video introduction for transformers.
Great video for people who are curious but don’t really want to (or can’t) understand how transformers actually work.
thank you! I'm just starting to learn about gpt and this was quite helpful, though I will have to watch it again :)
I knew little about transformers before this video. I know little about transformers after this video. But I guess in order to know some, we'll need a 2-3 hour video.
OMG the BEST transformers video EVER!
Dale you are so good at explaining this tech, thank you!
so super helpful for my thesis, thank u
wow, what a great summary! thanks!!!
When I saw this title, I was hoping to better understand the mathematical workings of transformers such as matrices and the like. Maybe you could do a follow-up video explaining mathematically how transformers work.
thank you for your time
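For anyone else looking for the matrix view, here is a compact summary. It is the standard formulation from the original transformer paper ("Attention Is All You Need"), not something shown in the video. Here X is assumed to be the matrix with one row per token (embedding plus positional encoding), W^Q, W^K, and W^V are learned projection matrices, d_k is the width of the keys, and d_model is the width of the token vectors.

```latex
% X: n x d_model matrix, one row per token (embedding plus positional encoding)
% W^Q, W^K, W^V: learned projection matrices
\begin{aligned}
Q &= X W^{Q}, \quad K = X W^{K}, \quad V = X W^{V} \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
PE_{(pos,\,2i)} &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\end{aligned}
```

Row i of the softmax term holds the attention weights that token i places on every token in the sequence, so multiplying by V gives each token a weighted mix of all the value vectors.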
crazy how things have changed so much
The most excellent explanation I've ever seen; recommending this link to everyone
Charm, intelligence and clarity! Thanks!
😊
When I was a kid, I knew the trouble with translation was due to translating words literally, without contextual or sequential awareness. I knew it was important to distinguish between synonyms. I imagined a button that generates the translation output; then you could highlight the words that don't make sense or that you want improved, and regenerate the translation. This type of NLP probably existed before I programmed my first hello world (15+ years ago)!
Thank you so much. I really needed this video, other videos were just confusing
This is one of the best vids I've watched on this topic!
Excellent presentation and explanation of concepts
From 5:28, shouldn't it be the following:
"when the model outputs the word “économique,” it’s attending heavily to both the input words “European” and “Economic.” "?
For européenne, I see that it is attending only to European. Please let me know if I am missing something here. Thanks for the great video.
Very well explained. This video is a must-watch for anyone who wants to demystify the latest LLM technology. Wondering if this could be made into a more generic video with a quick high-level intro on neural networks for those who aren't in the field. I bet there are millions out there who want to get a basic understanding of how ChatGPT/Bard/Claude work without an in-depth technical deep dive.
I love how you simplified something so complex. Thank you so much, Dale, the explanation was perfect
how did you do that
@@LIMITLESS2774 This one? Just type ":" (colon) followed by "thanksdoc" and end it with another colon. I can add other emojis like 🤟too!
@@nahiyanalamgir7056 it needs desktop YouTube, I think
@@LIMITLESS2774 Apparently, it does. When will these apps be consistent across devices and platforms?
@@nahiyanalamgir7056 thanks though
Thanks Ma'am. You broke it down well.
Positional Encoding, Attention and Self Attention. That's it! Really well summarized.
Do transformers learn the internal representation one language at a time or all of them at the same time? I remember that Chomsky said that there's no underlying structure to language and that for every rule you try to make you'll always find an edge case that contradicts the rule.
The invention of transformers seems to have jump-started a revolutionary acceleration in machine learning! Between the models you mentioned here, plus the way transformers are combined with other network architectures in DALL-E 2, OpenAI Jukebox, PaLM, Chinchilla/Flamingo, Gato -- it seems like adding a transformer to any model produces bleeding-edge, state-of-the-art-or-better performance on basically any task.
Barring any major architecture innovations in the future, I wonder if transformers end up being the key we need to reach human levels of broad-range performance after all 🤔
@Dino Sauro They're certainly not dead, since they're still being incorporated into the bleeding edge AIs. But technology is always evolving, building upon one idea to create the next. If you're hoping for a "final architecture" that will be the best and never replaced by anything else, you're out of luck.
While I respect Professor Marcus, his ideas about the requirements for AGI strongly imply that intelligent design is required for true intelligence to emerge, and I think evolution contradicts that view.
@Dino Sauro Um... Okay, friend, whatever you say. Have a nice life.
I think you are right...we just saw its use in ChatGPT...and I think ChatGPT is just a glimpse of what the future holds and how it will affect the IT, EV, and industrial automation industries.
Am I right? You wanna add something to it?
@@tanweeralam1650 I agree. ChatGPT, though, is really just GPT-3 with a larger input layer, and human-guided reinforcement learning on top of it. Which is a step in the right direction for sure, but not as huge a development as a lot of people are touting it to be.
From what I can tell, there are three issues that need to be solved before transformer-based (or transformer-incorporating) AIs can reach truly human levels of intelligent behavior.
(1) They need to be bigger. If we think of the model parameter size as analogous to brain synapses, there are about a quadrillion synapses in a human brain, which is orders of magnitude more than the biggest current transformers. For instance, the largest single transformer model is 207 billion parameters, and the largest transformer-incorporating language model is 1.75 trillion parameters. On the other hand, such models don't need to allocate parameters for things like body maintenance, reproduction, etc., so it's not a 1-to-1 correspondence, but I think it's a good estimate for the order of magnitude we need to reach before we get to human levels of sapience. That said, models keep getting bigger, so I have no doubt we'll achieve this within the next decade at most.
(2) Multimodality is important. A lot of "common sense" understanding that AIs seem to lack can likely be attributed to their lack of variety in types of input they can learn from. If you only learn from text, it's a lot harder to learn what the described concepts actually *mean.* On the other hand, a model that can learn from text, images, video, audio, and other forms of data should be able to learn much more accurate representations of the world. And of course, there's a TON of research into multimodal learning right now, so we'll get there pretty soon, too, I think.
(3) The third obstacle I think is the hardest: continual learning. (From what I can tell, by the way, "continual learning" is synonymous with "incremental online learning". Let me know if there are any important differences between the two.) An AI without this can learn from a *ton* of data, but once it does, it stops learning and everything it knows is set in stone. In effect, this means every interaction with such an AI "resets" it, and so you might get inconsistent behaviors as slightly different initial conditions of an interaction can lead to very different outputs when previous similar interactions are not incorporated into the model's weights (which, in this context, can be thought of as its "long term memory"). This also means the AIs can't form consistent opinions, since any opinion they might espouse in one conversation is immediately forgotten for the next.
Continual learning techniques already exist for smaller networks, but they are not at all efficient enough to practically apply to these very large language models of many billions of parameters or more. Which is a shame, because I'd speculate that larger models would be less prone to retroactive interference -- "catastrophic forgetting" -- than smaller ones, if we could efficiently incrementally train them.
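To put numbers on the size gap in point (1), here is a quick back-of-the-envelope calculation. It uses only the figures quoted in the comment above (about a quadrillion synapses, 207 billion and 1.75 trillion parameters) and makes no claim that those figures are exact; the variable names are just for illustration.

```python
import math

synapses = 1e15  # "about a quadrillion" synapses, as estimated in the comment

# Parameter counts as quoted in the comment.
models = {
    "largest single transformer": 207e9,
    "largest transformer-incorporating model": 1.75e12,
}

ratios = {label: synapses / params for label, params in models.items()}

for label, ratio in ratios.items():
    print(f"{label}: about {ratio:,.0f}x fewer parameters than synapses "
          f"({math.log10(ratio):.1f} orders of magnitude)")
```

On those figures the gap is roughly three orders of magnitude, which is the sense in which the comment calls the brain "orders of magnitude" bigger.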
@@IceMetalPunk I understood your first 2 points and agree with them...but I want to differ slightly on your 3rd point.
I don't understand...Why would the AI stop learning?? Due to storage space, processing power exhaustion, or for what reason? What you said may be a POSSIBILITY...but the other side also exists...it may just continue learning more and more and make its system better.
To have human-like intelligence...I don't think it will achieve that in the next 30-40 yrs...far from that timeline...I can't say. And frankly there is NO NEED to have AIs so advanced. Up to a certain extent...AIs should develop and humans MUST BE able to control them. Always.
And can you say whether programs like ChatGPT (I mean its advanced form) will be able to replace search engines like Google in the future?? Also, how will AI/ML affect the IT industry as a whole, and also the EV and industrial automation industries (e.g., the industry where companies like Siemens and Honeywell operate)??
It was funny and instructive. Thanks 🙂
Great summary of the Transformers technology!
My only criticism: the background music got annoying after 3-4 minutes, but that might just be me.
gangsta until kinetic solutions inc get transformers technology
Wow, this is so well explained.
An excellent video. I wonder if you can comment on "living the life" of a transformers user. For example, in another video by another YouTuber I heard the sentiment that being an AI person in this era means constant - really constant - study. That may not be the lifestyle that everybody wants to adopt. I'm a retired neurologist and vice president of the faculty club at my state university. What interests me these days is how students "should" be educated in this era. And, at the end of the day, one of the critical aspects of that is matching individual human brains - with their individual proclivities - with the endless career opportunities of this era. So, I'm trying to gather perspectives (aka "data") on that topic. Maybe you could make some kind of video about it. Please do!
I think the most important thing is that students are simply encouraged to use these tools. It's pretty hard to get a realistic grasp of the capabilities without really pushing the systems. The idea about needing to do constant research is interesting, and I think it's something that a person CAN do (the rest of my life probably lmao) but I think simply adopting the tools is all that will effectively matter. It's too early to be much more specific sadly. When it comes to younger education then we definitely need to be putting more focus on skills and behaviors instead of knowledge.
woww, she's good at explaining things
Soo cool! Great work
Wow, I bet the average person watching this probably wouldn't have known what a protein-folding problem was, but luckily that graphic cleared things up. Great example that helps anyone understand the practical advances made by transformers.
Hi Google! First of all, thank you for this wonderful video. I'm working on a multiclass (single-label) supervised learning problem that uses BERT for transfer learning. I've got about 10 classes and a couple hundred thousand examples. Any tips on best practices (which BERT variants to use, what order of magnitude of dropout to use, if any)? I know I could do a hyperparameter search, but that'd probably cost more time and money than I'm comfortable with (for a prototype), so I'm looking to make the most out of my local Nvidia 3080.
Fantastic! Thanks for simplifying the concept
Let's start a trend where you trust us to learn things without constant music going. Add graphics and visual aids, remove music.
Such a simple yet revolutionary 💡idea
Good(Pro) Explanation.
I loved it. Very simple, clear explanation.
Positional encoding = time, attention = context, self attention = thumbprint (knowledge)... looks like a good start for AGI 😀
This is a really awesome video! Thank you so much for simplifying the concepts.
i really enjoyed the concepts you explained. simple to understand
Simplest Explanation ever
Thank you
As a software engineer, I was kinda hoping for a deeper dive. Will you be doing a video on a deeper dive into them?
I honestly can't believe how many people here are praising the video for how clearly and well it explains things... I learned little new from it: that transformers use some form of recursion and that the words in the data are sequentially marked. And while these are apparently very important to the concept of transformers, they were not explained.
That's a really good high-level explanation!
Please remove background music, it's really disturbing when you only listen to this otherwise great video
After reading your comment, I began noticing the music, and now I can't stop, ha ha
Amazing video! Nice explanation and examples 😄👍
I would like to see more videos like this, and practical ones
Great explanation.
Dr. Ashish Vaswani is a pioneer and nobody is talking about him. He is a scientist from Google Brain and the first author of the paper that introduced TRANSFORMERS, and that is the backbone of all other recent models.
Thanks! This is a great intro video!
So easy and clear to understand. Thanks
Thank you so much for your help. With the assistance of GPT-4, I have been able to transition from a seasonal programmer to a full-time programmer. I am truly grateful for your support!
Nice to hear that
The explanation is great, clear and I'm interested, but the background music makes it really hard to concentrate.
This was a really, really awesome breakdown 👏🏾
phenomenal video
NICE SUPERB PRESENTATION
This was a really skillful breakdown - I will use it to explain advanced A.I. in our psychiatric Journal Club :)
Transformers, more than meets the eye
Very impressive video. Thanks for the way you shared information via this video.
Regarding 05:05 in your video: how did you create that part, please?
Very well explained.. This really is a high level view of what Transformers are, but it's probably enough to just get your toes wet in the field!
Well done and informative video. Your music is too loud though. Hard to hear you over it.
Very good lecture, thanks!
Thanks for the video. You mentioned that GPT-3 was trained on 45 terabytes of text. I have seen much smaller numbers, like 570 GB. Can you give me a reference for the training data size? I am working on a project and I would like to cite the correct number. Thanks
Both numbers come from the GPT-3 paper and describe its Common Crawl data: about 45 terabytes of compressed plaintext before filtering, and about 570 gigabytes after filtering. So the 570 GB figure is the one that reflects what was actually kept for training from that source.
This is probably the first time after the 90's I have the same "internet wild west" kinda feeling. The genie is out of the bottle baby.
Well written script. Appreciated.
I wanna stay hip in Machine Learning!
Wowww….thanks for clarifying my confusion.
Very well explained. Thank you.
Thanks, that was very interesting
Stooooooppp with the backtracks!!!!!!!
Thanks for your hard work. This video is very helpful!!!
So, question: given the goal of understanding meaning within language regardless of language, could a sophisticated enough set of weights derived from a sufficiently large dataset represent essentially the human genome of language?
Great video.
JP Here,
Thank you :)
More than meets the eye, that's for sure.
❤
At one point you said "It's [attention] something that's learned over time from data." I'd be interested to know how this "learning" takes place. Thanks.