Thank you to Stanford and to Prof. Manning for making these lectures available to everyone.
🎯 Key Takeaways for quick navigation:
00:05 🎓 This lecture introduces Stanford's CS224N course on NLP with deep learning, covering topics like word vectors, word2vec algorithm, optimization, and system building.
01:32 🤯 The surprising discovery that word meanings can be well represented by large vectors of real numbers challenges centuries of linguistic tradition.
02:29 📚 The course aims to teach deep understanding of modern NLP methods, provide insights into human language complexity, and impart PyTorch-based skills for solving NLP problems.
07:15 🗓️ Human language's evolution is relatively recent (100,000 - 1 million years ago), but it has led to significant communication power and adaptability.
10:59 🧠 GPT-3 is a powerful language model capable of diverse tasks due to its ability to predict and generate text based on context and examples.
14:52 🧩 Distributional semantics uses context words to represent word meaning as dense vectors, enabling similarity and relationships between words to be captured.
18:37 🏛️ Traditional NLP represented words as discrete symbols, lacking a natural notion of similarity; distributional semantics overcomes this by capturing meaning through context.
25:19 🔍 Word embeddings, or distributed representations, place words in high-dimensional vector spaces; they group similar words, forming clusters that capture meaning relationships.
27:15 🧠 Word2Vec is an algorithm introduced by Tomas Mikolov and colleagues in 2013 for learning word vectors from a text corpus.
28:11 📚 Word2Vec creates vector representations for words by predicting words' context in a text corpus using distributional similarity.
29:07 🔄 Word vectors are adjusted to maximize the probability of context words occurring around center words in the training text.
31:02 🎯 Word2Vec aims to predict context words within a fixed window size given a center word, optimizing for predictive accuracy.
32:56 📈 The optimization process involves calculating gradients using calculus to adjust word vectors for better context word predictions.
36:33 💡 Word2Vec employs the softmax function to convert dot products of word vectors into probability distributions for context-word prediction (a toy sketch of this step follows after this list).
38:51 ⚙️ The optimization process aims to minimize the loss function, maximizing the accuracy of context word predictions.
45:53 📝 The derivative of the log probability of context words involves using the chain rule and results in a formula similar to the softmax probability formula.
49:28 🔢 The gradient calculation involves adjusting word vectors to minimize the difference between observed and expected context word probabilities.
53:34 🔀 The derivative of the log probability formula simplifies into a form where the observed context word probability is subtracted from the expected probability.
58:57 📊 Word vectors for "bread" and "croissant" show similarity in dimensions, indicating they are related.
59:26 🌐 Word vectors reveal similar words to "croissant" (e.g., brioche, baguette), and analogies like "USA" to "Canada" can be inferred.
59:55 ➗ Word vector arithmetic allows analogy tasks, like "king - male + female = queen," and similar analogies can be formed for various words.
01:00:22 🤖 The analogy task shows the ability to perform vector arithmetic and retrieve similar words based on relationships.
01:01:23 🤔 Negative similarity and positive similarity together enable analogies and meaningful relationships among words.
01:03:17 💬 The model's knowledge is limited to the time it was built (2014), but it can still perform various linguistic analogies.
01:04:39 🧠 Word vectors capture multiple meanings and contexts for a single word, like "star" having astronomical or fame-related connotations.
01:05:36 🔄 Different vectors are used for a word as the center and as part of the context, contributing to the overall representation.
01:07:02 🧐 Using separate vectors for center and context words simplifies derivatives calculations and results in similar word representations.
01:11:26 ⚖️ The model struggles with capturing antonyms and sentiment-related relationships due to common contexts.
01:12:44 🎙️ The class primarily focuses on text analysis, with a separate speech class covering speech recognition and dialogue systems.
01:18:06 🗣️ Function words like "so" and "not" pose challenges due to occurring in diverse contexts, but advanced models consider structural information.
01:20:25 🧠 Word2Vec offers different algorithms within the framework; optimization details like negative sampling can significantly improve efficiency.
01:23:18 🔁 The process of constructing word vectors involves iterative updates using gradients, moving towards minimizing the loss function.
Made with HARPA AI
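Since the 36:33 item above mentions the softmax step, here is a minimal NumPy illustration of how that probability can be computed, using made-up toy vectors; the matrix names U (for the "outside"/context vectors) and V (for the center vectors) are my own, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4                       # toy sizes, not the lecture's
U = rng.normal(size=(vocab_size, dim))       # "outside"/context vectors u_w
V = rng.normal(size=(vocab_size, dim))       # center vectors v_w

def p_o_given_c(o, c):
    """Softmax probability P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    scores = U @ V[c]                        # dot product of every u_w with v_c
    scores = scores - scores.max()           # numerical stability; softmax unchanged
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_o_given_c(o=2, c=0))                                  # a probability in (0, 1)
print(sum(p_o_given_c(o, c=0) for o in range(vocab_size)))    # ~1.0
```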
I am so grateful that Stanford has given us all this great gift. Thanks to their great machine learning and AI video series, I am able to build a solid foundation of knowledge and have started my PhD based on that.
So many full courses in great quality, great lecturers AND with normal subtitles... Can someone PLEASE give Stanford University some kind of international prize for knowledge sharing?
99% of courses are not online and cost money. I would like them to add more.
Yes! TRUE indeed. Thank you Stanford. ❤❤❤❤❤
do you find this course easy to understand?
Do you have a tech background, or are you just a newbie in tech?
Thanks for everything Stanford University. As an AI master's student I have to state that having these lectures for free enables me to compare and broaden my ideas for NLP, resulting in deeper intuitive understanding of the subject.
Hi Teo, thanks very much for your comment and feedback! Happy to hear these lectures were so helpful to your studies.
Prof. Manning looks so happy explaining all the questions. That is so encouraging and heartwarming!
Amazing lecture it was; thanks to Stanford for making these lectures public.
Absolutely loved what the professor said at 43:47.
What a great lecturer, he feels students, puts himself in our place and explains material very nicely. This is literally my first piece of material about NLP I have ever seen, and I understood most of it. Thanks a lot
Awesome feedback, thanks for your comment!
Hello Stanford Online, I started to self-study machine learning because my university program does not teach AI in depth. I felt I had not reached my full potential, so I have been teaching myself AI for the past six months, covering machine learning, deep learning, and reinforcement learning. Thank you for this free lecture, I really appreciate it.
Oh my days I love his positive vibes! Also clear explanation of multiple topics. I really appreciate you providing us with such great lectures online for free!
I got exhausted, yet your enthusiasm is what made me stay here. Amazing session.
Math is not magic, but is as beautiful as magic.
Moved from the Coursera NLP Specialization to here. Definitely amazing to receive such detailed math explanations of all these concepts.
Is it better here?
Which should I do first, the specialization or CS224N?
The result at 55:45 is just beautiful!
Really liked the energy and simplicity of the presentation !
For some reason I am reminded of Grant from 3Blue1Brown. The way he speaks and the way he's excited about the subject, it's so intoxicating.
Can't expect more from a lesson! Thank you all for sharing the class towards all the people🤩
It is great to watch this and not have to do the homework.
Hahahaha
It's not entirely clear to me why we change the index, except for separating the sums at the end. Does anyone know more? Thanks!!
At 55:28, how did we get from the first line to the second, please?
Thank you Stanford and Professor for the excellent lecture!
At 32:46 it's like computing the entropy, but why? If anyone knows, please feel free to comment.
51:39 How do we get this? I don't understand.
At 51:45 he says "we need to change the index to x from w, or else we'll get into trouble" while taking the inner derivative of the exponential term.
How can he change the index, when the resulting denominator term would be exactly the same as the derivative of the exponential term, so they should cancel each other?
Changing the index seems to change the fundamental definition of P(o|c).
Is there something I am missing here?
What we are expecting is a sigma, i.e. a sum over a range. Now we have to find a way of expressing that sum; if we choose the index w, we'll confuse it with the other sigma notation, even though they are completely different. Hence we use a different index. The different index doesn't change the original definition of p(o|c), because it's just an index, a way of expressing the sum.
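To spell that out, here is my own rederivation of the step under discussion (worth checking against the slides), starting from p(o|c) = exp(u_o^T v_c) / Σ_{w=1}^V exp(u_w^T v_c):

```latex
\frac{\partial}{\partial v_c}\log p(o\mid c)
  = \frac{\partial}{\partial v_c}\left( u_o^\top v_c - \log\sum_{w=1}^{V}\exp(u_w^\top v_c) \right)
  = u_o - \frac{\sum_{x=1}^{V}\exp(u_x^\top v_c)\,u_x}{\sum_{w=1}^{V}\exp(u_w^\top v_c)}
  = u_o - \sum_{x=1}^{V} p(x\mid c)\,u_x
```

The two sums don't cancel because the inner one carries an extra factor of u_x; the fresh index x just keeps it notationally separate from the denominator's sum over w.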
Hopefully I will be proud after its completion.
00:56:55 Gensim word vectors example
01:05:16 Student Q&A
1:10:50 Why would you average both vectors together, wouldn't it be useful to keep both of the vectors depending on the different tasks that need to be done?
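For anyone wondering what the averaging at 1:10:50 amounts to in practice, a minimal sketch under my own assumption that the trained parameters are two matrices U (context vectors) and V (center vectors) of shape (vocab_size, dim):

```python
import numpy as np

vocab_size, dim = 5, 4                       # toy sizes
rng = np.random.default_rng(0)
U = rng.normal(size=(vocab_size, dim))       # stand-in for trained context vectors u_w
V = rng.normal(size=(vocab_size, dim))       # stand-in for trained center vectors v_w

# One common post-training choice: average the two vectors per word.
# Keeping U and V separate is also possible if a downstream task wants both.
word_vectors = (U + V) / 2
print(word_vectors.shape)                    # (5, 4)
```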
This might be silly, but after 55:00, when we take u_x out of the derivative, why do we lose the transpose operator?
Same doubt, did you figure it out by any chance?
@Ad-qv7ij I guess there is a little error there. If you try to derive it on your own, you will reach the right expression.
Thank you so much for providing these lectures.
Isn't w_t the center word rather than w_j on slide 23 (30:52)?
w_t is the center word.
Yes, w_t is the center word; j ranges from -m to m.
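For reference, the objective being discussed, as I recall it from the lecture (worth double-checking against slide 23): w_t is the center word at position t and j ranges over the window offsets,

```latex
L(\theta) = \prod_{t=1}^{T}\ \prod_{\substack{-m \le j \le m \\ j \neq 0}} P(w_{t+j}\mid w_t;\theta),
\qquad
J(\theta) = -\frac{1}{T}\log L(\theta)
          = -\frac{1}{T}\sum_{t=1}^{T}\ \sum_{\substack{-m \le j \le m \\ j \neq 0}}\log P(w_{t+j}\mid w_t;\theta).
```

The 1/T in log space is the same thing as taking the T-th root of the double product.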
The best video on nlp
Never seen such a beautiful lecture before!
How are the initial probabilities of the context word vectors calculated? They are mentioned at 55:29 but not how they are determined.
Wow this vector idea is interesting. Have we tried getting models to emit nonsense text that nonetheless has a similar vectors to real words and seeing if human brains sort of subconsciously get that same meaning? Computers could be really good at writing poetry o.o
Like onomatopoeia and Lewis Carroll dialed up to 11
Calculus noob question: why don't the two sums (for w from 1 to V, over the exp(u_w^T v_c) terms) cancel out at 55:10?
Great content, excellent delivery.
When we take the chain-rule derivative, why do we lose the transpose operation? For example, at 53:06 there is just u_x, not u_x^T. Why?
We can treat that as a gradient. The dot product can be viewed as a multivariable function of the input (v_c1, ..., v_cd), so we can calculate its gradient with respect to each component of v_c. Since the gradient is the direction v_c should move in order to increase the value of the dot product, this gradient vector can be added to v_c, so they should have the same shape :)
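A quick numeric way to see the shape argument (toy vectors of my own, not from the lecture): the gradient of the scalar u_o^T v_c with respect to v_c is just the vector u_o, with the same shape as v_c, so no transpose appears.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
u_o, v_c = rng.normal(size=dim), rng.normal(size=dim)

f = lambda v: u_o @ v                        # the scalar dot product u_o . v

# Finite-difference gradient of f at v_c, one coordinate at a time.
eps = 1e-6
numeric_grad = np.array([(f(v_c + eps * e) - f(v_c - eps * e)) / (2 * eps)
                         for e in np.eye(dim)])

print(numeric_grad.shape)                    # (4,) -- same shape as v_c
print(np.allclose(numeric_grad, u_o))        # True: the gradient is u_o itself
```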
Do areas of sparsity in the high-dimensional word2vec space mean anything? For example, can you say that some word should exist here, but doesn't?
I wonder if words which don't have an equivalent in other languages fit here
The version of the SciPy library seems to be too new for the assignment to work properly; I can't import triu. If anyone knows a fix, please comment.
Why is the change of variable at 51:38 necessary? Does it not represent the same quantity whether we use u_w or u_x?
I want to know whether the course provides the homework answers.
It was a great lesson. Hope the sound quality will be better in the future.
Great again!
are these lecture slides available to us??
is there any way to get access to the notebooks shown through out the course? Thanks!
You could have explained the probability portion a little more, sir... The differentiation of the vectors is quite straightforward.
41:10 I don't understand the gradient. How do we get it? Can anyone reading the comments give me advice? 🤗🤗🤗
Check up on your knowledge of single-variable calculus (derivatives, differentiation, interpretation of a derivative, and applications of derivatives), and then just the basics of multivariable calculus (functions of several variables, partial derivatives). MIT 18.01SC and 18.02SC could be good (and free) resources for picking it up. That is, if you want to understand the math under the hood; I'd say that in parallel you could definitely practice with the higher-level applications, just like in this course.
Thanks @izumiasmr
This is so amazing. Thank you so much for the wonderful explanation
Which are his personal sentences ?
in every sub topic they share their learning experience...
This is amazing, thank you for uploading this online
Hi sir, is it possible to use neural networks to learn new dialects and translate new words that belong to unknown dialects of various languages?
I had a question about "observed - expected" around 55:48. Maybe I misunderstand, but isn't the summation of p(x|c)·u_x our prediction, therefore making it our observed?
Yes, it is our prediction, but because it's a prediction, that makes it the expected. The vector u_o (for the word we actually observe) is the observed; we subtract the sum of p(x|c)·u_x from it to obtain the margin of error. In a perfect case, they would subtract to 0, which he explains at 55:44.
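A small numeric check of the "observed minus expected" reading (toy numbers of my own, not the lecture's): the analytic gradient u_o − Σ_x p(x|c) u_x matches a finite-difference gradient of log p(o|c).

```python
import numpy as np

rng = np.random.default_rng(3)
vocab_size, dim, o = 6, 4, 2
U = rng.normal(size=(vocab_size, dim))     # context ("outside") vectors u_w
v_c = rng.normal(size=dim)                 # center vector

def log_p(v):
    scores = U @ v
    return scores[o] - np.log(np.exp(scores).sum())   # log softmax at index o

probs = np.exp(U @ v_c) / np.exp(U @ v_c).sum()
analytic = U[o] - probs @ U                # "observed" u_o minus "expected" sum_x p(x|c) u_x

eps = 1e-6
numeric = np.array([(log_p(v_c + eps * e) - log_p(v_c - eps * e)) / (2 * eps)
                    for e in np.eye(dim)])
print(np.allclose(analytic, numeric))      # True
```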
Good content, but explanation-wise it is missing intuition at some points, especially when the formulas for the word vectors are being derived.
I am wondering how the two vectors (u_w, v_w) are determined for each word?
First of all, thank you so much for this amazing course. I have learned a lot from your lectures. Can I ask when this course will be updated?
Hi Raphael, thanks for your feedback and question! Our team is looking into adding new lectures for this course in the future :)
@stanfordonline Sounds like it won't be soon :)
On slide 23, the likelihood is missing a T-th root of the double product.
how does Christopher d manning papa think?
Great Lecture, will finish the entire series
I really liked this guy
I have to learn to listen to the professors like editors to your previous self
Reminds me of Sheldon for some reason
The objective function seeks to maximise the likelihood of the context word given the center word.
However, should it not also try to minimise the probability of incorrect context words given the center word?
I got the answer: the way the probabilities are calculated ensures this happens, via the denominator.
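Spelling that out (my own restatement of the point): because the probabilities come from a softmax over the whole vocabulary, the denominator already handles the incorrect context words,

```latex
-\log P(o\mid c) = -\,u_o^\top v_c + \log\sum_{w=1}^{V}\exp(u_w^\top v_c),
```

so minimizing the loss pushes u_o^T v_c up and pushes every other u_w^T v_c down; and since the probabilities sum to 1, raising P(o|c) necessarily lowers the mass on the incorrect words.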
I don't get what theta (the parameters) is here?
How can I get solutions for the assignments of this course? I'm looking for solutions for the Winter 2021 version.
github
very fruitful!!
How can I get the slides?
Where can I find a Chinese version? Chinese subtitles would also work.
bilibili
I loved it!!!
link of textbook?
I have little to no knowledge about machine learning... Can I still start this course? Is it beginner friendly?
Hi there, great question! If you are just beginning to learn about machine learning we recommend starting with this course: www.coursera.org/specializations/machine-learning-introduction
It seems this course is theory-based; where can I learn to code these concepts and algorithms?
coursera
27:51 - word2vec
Sir absolutely loved your explanation. Thank you very much
Is this course suitable for beginners?
So NICE!
Every time he says something important, the video stops. Great.
Watching at 1.5x speed smooths out the stuttering and is still understandable for the most part.
Great lecture.
34:49 What are u_o and v_c?
I think v_c is the vector representation of the center word and u_o is the vector representation of a context word.
thx for sharing
It is an amazing lecture
Thank you, great lecture!
lezz go !!
56:55
Thank you.
Great
"um"
35:31
13:55
Great knowledge it seems, but give this to an Indian youtuber, and he will make a 3 video series out of a single lecture that is easier to understand. #opinion
LOL TRUE
Day 1 .
14:10
What am I going to get out of this video? Let's see.
what is stopping you here?
What are the qualifications of this professor?
Christopher d manning papa
w0o0ord