Just wanna say that your explanations are awesome. Really helped me understand NLP better than reading a book.
Thanks!! :D
Man, your video is great! Best explanation on the whole internet!
Awesome video!! I've just arrived here after reading the GloVe paper and your explanation is utterly perfect. I'll surely come back to your channel whenever I have doubts about Machine Learning or NLP. Good job!
Fantastic summary of the paper. I just read it and I am pleasantly surprised at how much of the paper's math you covered in detail in this video! Great work.
Perfect! Thanks, there are not many useful videos about this on YouTube.
Very Nicely Explained Buddy .... I was going through many articles but was not able to understand the Math behind it. Your video certainly helped. Keep up the Good Work.
Happy to help man!
This is the best explanation I have seen for Glove thank you a million time
❤️❤️
I was reading the paper and somewhat struggling on what certain parts of the derivation were or why we needed them but this video is great. Thanks so much
Thanks for the explanation. Feels like you explained better than the paper itself.
Thanks a lot man!!
Guy, thank you very much, it was a fucking masterpiece, it made my 22 minutes at the railway station really worthwhile :)
Very good explanation. Thank you :)
Thanks a lot!
Based on what algorithm or model is GloVe trained with this cost function? Linear regression?
This is excellent, but I wish you had also mentioned the training steps: what exactly are the input and output tensors, and what shape are they?
beautifully explained, thank you!
Happy to hear. Keep supporting :D
Very well explained. Keep it up! Thank you.
Thank you more videos are coming :)
@@NormalizedNerd looking forward to......
This was great
Bruh you explained well
Thanks man!!
Good video, thanks for your efforts. I wish it had less explanation of the GloVe cost function and more elaborate testing of word similarity using the GloVe model.
You can copy the code and test it more ;)
Fantastic video
Thanks!
10:48 No, we don't have a vector on one side of the equation; we have scalar values on both sides. Basic math.
Great explanation thanks a lot my friend :)
Glad that it helped :D...keep supporting!
Good introduction!
Glad it was helpful!
Is the embedding for a word fixed in GloVe, or is it generated each time depending on the dataset used to train the model?
Can you do a video on the BERT word embedding model? It is also important.
I still don't quite understand the part where ln(X_i) was absorbed by the biases. Please enlighten me.
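For anyone stuck on the same step, here is the absorption sketched from the paper's derivation (notation as in the video: \tilde{w}_k is the context vector, b_i and \tilde{b}_k are learned biases):

w_i^\top \tilde{w}_k = \ln P_{ik} = \ln X_{ik} - \ln X_i

Since \ln X_i does not depend on k, it can be folded into a bias b_i for word i, and \tilde{b}_k is added to keep the expression symmetric in i and k:

w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \ln X_{ik}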
Thanks for this well-explained video. I have one question: can you please explain why you take only the numerator portion F(w_i . w_k) and ignore the denominator?
You can take the denominator instead! We need just one of them.
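To spell that out a bit (sketched from the paper's derivation), the requirement is

F\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}} = \frac{F(w_i^\top \tilde{w}_k)}{F(w_j^\top \tilde{w}_k)}

so if you fit the numerator alone, F(w_i^\top \tilde{w}_k) = P_{ik}, the ratio is satisfied automatically; the equation is symmetric in i and j, so fitting the denominator instead would work just as well.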
Often X_ij is zero, and in those cases ln(X_ij) is undefined (it blows up to negative infinity). How do you treat this issue?
Good point. So, here's how they tackled the problem.
They defined the weighting function f like this:
f(X_ij) =
(X_ij/X_max)^alpha [if X_ij < X_max]
1 [otherwise]
So you see when X_ij = 0, f(X_ij) is 0. That means the whole cost term becomes 0. We don't even need to compute ln(X_ij) in this case.
They addressed two problems with f.
1) not giving too much importance to the word pairs that cooccur frequently.
2) avoiding ln(0)
I hope this makes sense. Please tell me if anything is not clear.
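A minimal sketch of this weighting function in Python (assuming NumPy; x_max and alpha are hyperparameters, with 100 and 0.75 the defaults suggested in the paper):

import numpy as np

def weighting(x, x_max=100.0, alpha=0.75):
    # 0 when x == 0, grows like (x / x_max)^alpha, capped at 1 for very frequent pairs
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

print(weighting([0.0, 10.0, 50.0, 500.0]))  # approx. [0.0, 0.178, 0.595, 1.0]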
@Normalized Nerd This is true only assuming that zero times infinity is zero! Just kidding, I just want to point out that in code, zero times infinity (rightly) gives an error (in NumPy), so I have to write this as an if condition.
Everything else is clear, thank you very much for your great work and for your answer!
@@NormalizedNerd Is X_max a hyperparameter?
Good explanation. Got too technical for me after the middle, but then the code and the graph clarified things. Just one thing: you keep calling the pipe | symbol as 'slash', "j slash i", "k slash ice" etc, which isn't accurate (I think you would know it if you have studied all this). It's better to use 'given', "j given i" as it's actually said, or just say 'pipe' after explaining the first time that this is what the symbol is called. 'slash' is used to mean division, and also to mean 'one or the other', neither of which is applicable here, and the symbol isn't slash anyway. This can cause confusion for some viewers.
Yes, pipe would be a better choice.
It's Bayes. Anyone exposed to stats understands w/o the verbiage.
You should read that as “probability of i GIVEN j”. The pipe symbol is read as ‘given’.
5:50
2+1+1=3?
he meant 4
Wonderful explanation! Just a question. Why do we calculate the ratio p(k|ice)/p(k|steam)?
The ratio is better at distinguishing relevant words from irrelevant ones than the raw probabilities, and it also discriminates between two relevant words. If we didn't take the ratio and worked with raw probabilities, the numbers would be too small.
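To see this concretely, here is a toy sketch with made-up co-occurrence counts (illustrative numbers, not the ones from the video or the paper):

import numpy as np

context = ["solid", "gas", "water", "fashion"]
# hypothetical co-occurrence counts X_ij for the target words "ice" and "steam"
X_ice   = np.array([190.0,  66.0, 3000.0, 17.0])
X_steam = np.array([ 22.0, 780.0, 2200.0, 18.0])

P_ice   = X_ice / X_ice.sum()      # P(k | ice)
P_steam = X_steam / X_steam.sum()  # P(k | steam)
ratio   = P_ice / P_steam

for k, p1, p2, r in zip(context, P_ice, P_steam, ratio):
    print(f"{k:8s} P(k|ice)={p1:.5f} P(k|steam)={p2:.5f} ratio={r:.2f}")
# ratio >> 1 for "solid" (related to ice), << 1 for "gas" (related to steam),
# and close to 1 for the neutral words "water" and "fashion".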
Thank you so much, but isn't X_{love} equal to 4, not 3?
@TRIỀU NGUYỄN HẢI
Thanks for pointing this out. Yes X_{love} = 4.
Your examples are not related: "I love NLP..." and P(k|ice), etc.
It would be useful to use the same sentences...
Nice explanation... which is better, GloVe or Word2vec?
That depends on the dataset. I recommend trying both.
Well, I think by corpus you mean document, but let me tell you, a corpus has repeated words as well; to form a corpus you just join all the documents.
Can you please make videos on ELMo, fasttext, and BERT also? It'll be helpful.
I'll try in the future :)
Nice work! Just subscribed (y). :) Just a quick question out of curiosity: are "GloVe" and "Poincare GloVe" the same model?
All the best for your channel.
Thank you, man!
No, they are different. Poincare GloVe is a more advanced approach. In normal GloVe, the words are embedded in Euclidean space. But in Poincare GloVe, the words are embedded in hyperbolic space! Although the latter one uses the basic concepts of the original GloVe.
@@NormalizedNerd It's totally worth subscribing to your channel. Looking forward to new videos from you on DS.
Btw, I am also from West Bengal, currently in Germany ;)
@@edwardrouth Oh great! Nice to meet you. More interesting videos are coming ❤️
Hello, thank you for your explanation. Can you please share the Google Colab link asap?
"Co-occurrence matrix"? I don't understand what that is.
I fail to see where the vectors come from... :-( I follow all the explanation without any problem, but... once you define J, where are the vectors coming from? Is there any neural network involved? Same problem when reading the article or any other explanations. They all try to explain where that J function comes from, and then, magically, we have vectors we can compare to each other :-(
Any help on that would be greatly appreciated. Thanks!
The authors introduced the word vectors very subtly.
Here's the deal: at 9:50, we assume that there exists a function F which takes the word vectors and produces a scalar quantity!
And no, we don't have neural networks here. Everything is based on the co-occurrence matrix.
@@NormalizedNerd Thanks for your answer. I found a publication that explains very well what to do after "discovering" that function: thesis.eur.nl/pub/47697/Verstegen.pdf
I was somehow sure that GloVe was based on neural networks (as word2vec is), but that is not the case. However, it is a bit like a neural network, since the way the vectors are created is similar to the way the weights of an NN are trained: stochastic gradient descent.
The vectors are actually the parameters that one is optimizing over. The objective function J should really have been written with the vector representations of the words as its arguments -- these are the optimization variables. For certain choices of the F function, e.g., softmax, the optimization becomes mathematically easy. And then it is just a multivariable optimization problem, and a natural algorithm to solve it is gradient descent (and variants).
Ref: ua-cam.com/video/ERibwqs9p38/v-deo.html [Stanford course on NLP]
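To make the "vectors are the parameters" point concrete, here is a very small sketch of fitting GloVe-style vectors by plain stochastic gradient descent on J. This is a toy illustration under my own assumptions (the Stanford implementation uses AdaGrad and streams the co-occurrence records from disk; all function and variable names below are mine):

import numpy as np

def f_weight(x, x_max=100.0, alpha=0.75):
    return (x / x_max) ** alpha if x < x_max else 1.0

def train_glove(X, dim=10, lr=0.05, epochs=100, seed=0):
    """X: dense co-occurrence matrix (V x V). Returns one vector per word."""
    rng = np.random.default_rng(seed)
    V = X.shape[0]
    W, Wc = rng.normal(0, 0.1, (V, dim)), rng.normal(0, 0.1, (V, dim))  # word / context vectors (parameters)
    b, bc = np.zeros(V), np.zeros(V)                                    # bias parameters
    pairs = [(i, j) for i in range(V) for j in range(V) if X[i, j] > 0]  # only non-zero entries contribute
    for _ in range(epochs):
        for i, j in pairs:
            # inner term of J: w_i . w~_j + b_i + b~_j - ln(X_ij)
            diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
            g = f_weight(X[i, j]) * diff          # the constant factor 2 is absorbed into lr
            dW, dWc = g * Wc[j], g * W[i]
            W[i] -= lr * dW
            Wc[j] -= lr * dWc
            b[i] -= lr * g
            bc[j] -= lr * g
    return W + Wc  # the paper sums word and context vectors at the end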
Nice explanation 👍. One quick question on your video: which software and hardware are you using for the digital board?
Thank you. I use Microsoft OneNote and a basic pen tablet. Keep supporting!
p(Love , I ) = 2/3 ?
@ 19:13. That is a weighting function because X_ij may be zero, and then log(X_ij) and the whole equation go crazy. More details at
towardsdatascience.com/light-on-math-ml-intuitive-guide-to-understanding-glove-embeddings-b13b4f19c010
The article says f(X_ij) prevents log(X_ij) from being NaN, which is not true.
f(X_ij) actually puts an upper limit on the weight given to frequent co-occurrences.
Where did e come from?
e^x follows our condition.
e^(a-b) = e^a/e^b
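A quick sketch of why e works (following the paper): F must satisfy

F(a - b) = \frac{F(a)}{F(b)}

for scalars a = w_i^\top \tilde{w}_k and b = w_j^\top \tilde{w}_k, and the exponential does exactly that, since e^{a-b} = e^a / e^b. Choosing F = \exp then gives w_i^\top \tilde{w}_k = \ln P_{ik}, which is where the logarithm in the cost function comes from.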
i laughed when you said 2+1+1=3 xD
LOL XD
i was looking for the comment ^^
same here lol
5:50 ..?
Good explanation, but please use a bigger cursor; a lot of YouTubers miss this.
thanks for the suggestion :D
G-Love 😂
Haha...Exactly what I thought when I learned the word for the first time!
Hello There,
First of all, thank you for adding such informative videos to help beginners in the DS field. I am trying to reproduce the code from GitHub for the Stanford GloVe model. Link ---> github.com/stanfordnlp/GloVe
The problem is, if I execute all the statements as mentioned in the README, I get the files it should provide: "cooccur.bin" and "vocab.txt". The latter does have the list of words with frequencies, but the former is empty, and no error is reported in the console. It's very weird and I don't understand what I am doing wrong. Could you please help me with this?
N.B.: I am new to ML and still learning!
Best Regards.
"cooccurrence.bin" should contain the word vectors. Make sure that the training actually started. You should see logs like...
vector size: 50
vocab size: 71290
x_max: 10.000000
alpha: 0.750000
05/08/20 - 06:02.16AM, iter: 001, cost: 0.071222
05/08/20 - 06:02.45AM, iter: 002, cost: 0.052683
05/08/20 - 06:03.14AM, iter: 003, cost: 0.046717
...
I'd suggest you try this on Google Colab once.
@@NormalizedNerd Hi, Thank you for your response.
I never tried Colab before. But what I noticed in Colab is that I have to upload notebook files, which I can't see in the GloVe project that I cloned. So I am using an online editor, "repl.it". First I ran the "make" command, which created the "build" folder, and then "./demo.sh". Running this script creates a "cooccurrence.bin" file, but as I mentioned earlier it's empty. Did I miss something here? I am sure I am missing something very small and important 😒 Below are the logs from the terminal..
make
mkdir -p build
gcc -c src/vocab_count.c -o build/vocab_count.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
gcc -c src/cooccur.c -o build/cooccur.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
src/cooccur.c: In function ‘merge_files’:
src/cooccur.c:180:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&new, sizeof(CREC), 1, fid[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/cooccur.c:190:5: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&new, sizeof(CREC), 1, fid[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/cooccur.c:203:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&new, sizeof(CREC), 1, fid[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -c src/shuffle.c -o build/shuffle.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
src/shuffle.c: In function ‘shuffle_merge’:
src/shuffle.c:96:17: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&array[i], sizeof(CREC), 1, fid[j]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/shuffle.c: In function ‘shuffle_by_chunks’:
src/shuffle.c:161:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&array[i], sizeof(CREC), 1, fin);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -c src/glove.c -o build/glove.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
src/glove.c: In function ‘load_init_file’:
src/glove.c:86:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&array[a], sizeof(real), 1, fin);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/glove.c: In function ‘glove_thread’:
src/glove.c:182:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
fread(&cr, sizeof(CREC), 1, fin);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -c src/common.c -o build/common.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
gcc build/vocab_count.o build/common.o -o build/vocab_count -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
gcc build/cooccur.o build/common.o -o build/cooccur -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
gcc build/shuffle.o build/common.o -o build/shuffle -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
gcc build/glove.o build/common.o -o build/glove -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
./demo.sh
mkdir -p build
--2020-05-08 17:04:13-- mattmahoney.net/dc/text8.zip
Resolving mattmahoney.net (mattmahoney.net)... 67.195.197.75
Connecting to mattmahoney.net (mattmahoney.net)|67.195.197.75|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31344016 (30M) [application/zip]
Saving to: ‘text8.zip’
text8.zip 100%[======>] 29.89M 1.97MB/s in 15s
2020-05-08 17:04:29 (1.95 MB/s) - ‘text8.zip’ saved [31344016/31344016]
Archive: text8.zip
inflating: text8
$ build/vocab_count -min-count 5 -verbose 2 < text8 > vocab.txt
BUILDING VOCABULARY
Processed 17005207 tokens.
Counted 253854 unique words.
Truncating vocabulary at min count 5.
Using vocabulary of size 71290.
$ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 15 < text8 > cooccurrence.bin
COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 71290 words.
Building lookup table...table contains 94990279 elements.
Processing token: 200000./demo.sh: line 43: 114 Killed $BUILDDIR/cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE
@Sakib Ahmed repl is probably not a good idea for DL stuff. Try to use Colab/Kaggle. You can directly clone the GitHub repo in Colab. I've created a Colab notebook. Run it yourself. It works perfectly!
colab.research.google.com/drive/1BA-GRHQOsXrYwmkalQyejsnVE8zmoyH2?usp=sharing
@@NormalizedNerd Thank you so much ! It really worked... 😊 (y)
@@sakibahmed2373 Do share this channel with your friends :D Enjoy machine learning.
"I love to make videos"
Sorry to say this, but is that correct English?
Not the best English. But the model doesn't care as it will learn whatever you (or the dataset) teach it. The author's English doesn't impact the explanation of the model's workings.
Reduce the number of ads. There's an ad like every minute. Google has made YouTube a money-sucking machine. So irritating.
Good video, but the wrong pronunciation of GloVe is killing me, man.
You mean the right ❤