Introduction to NLP | GloVe Model Explained

  • Published 20 Jan 2025

COMMENTS • 99

  • @sashimidimsums 3 years ago +8

    Just wanna say that your explanations are awesome. Really helped me understand NLP better than reading a book.

  • @alh7839 2 years ago +1

    man your video is great ! best explanation on the whole internet !

  • @javitxuaxtrimaxion4526 3 years ago +2

    Awesome video!! I've just arrived here after reading the GloVe paper and your explanation is utterly perfect. I'll surely come back to your channel whenever I have doubts about Machine Learning or NLP. Good job!

  • @addoul99 2 years ago +1

    Fantastic summary of the paper. I just read it and I am pleasantly surprised at how much of the paper's math you covered in detail in this video! Great

  • @revolutionarydefeatism 3 years ago

    Perfect! Thanks, there are not many useful videos on YouTube.

  • @sachinsarathe1143 3 years ago +1

    Very Nicely Explained Buddy .... I was going through many articles but was not able to understand the Math behind it. Your video certainly helped. Keep up the Good Work.

  • @sasna8800 4 years ago

    This is the best explanation I have seen for GloVe, thank you a million times

  • @riskygamiing 3 years ago +1

    I was reading the paper and somewhat struggling on what certain parts of the derivation were or why we needed them but this video is great. Thanks so much

  • @bhargavaiitb 4 years ago

    Thanks for the explanation. Feels like you explained better than the paper itself.

  • @vitalymegabyte 3 years ago

    Guy, thank you very much, it was fucking masterpiece, that did my 22 minutes on railway station really profitable :)

  • @arunimachakraborty1175 3 years ago

    Very good explanation. Thank you :)

  • @kavinvignesh2832 5 months ago

    What algorithm or model is used to train the GloVe model with this cost function? Linear regression?

  • @popamaji 1 year ago

    This is excellent, but I wish you had also covered the training steps: what exactly are the input and output tensors, and what shapes do they have?

  • @rumaisalateef784 4 years ago +1

    beautifully explained, thank you!

  • @ijeffking 5 years ago

    Very well explained. Keep it up! Thank you.

    • @NormalizedNerd 5 years ago

      Thank you, more videos are coming :)

    • @ijeffking 5 years ago

      @NormalizedNerd looking forward to them...

  • @ToniSkit 7 months ago

    This was great

  • @CodeAshing 4 years ago +1

    Bruh you explained well

  • @sarsoura716 3 years ago +1

    Good video, thanks for your efforts. I wish it had less explanation on the cost function of the GloVe model and elaborate testing of word similarity using GloVe model.

    • @NormalizedNerd 3 years ago +1

      You can copy the code and test it more ;)
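
A minimal sketch of the kind of word-similarity test asked about above, assuming the trained vectors are stored as plain text with one word per line followed by its components. The sample vectors and the helper names `load_vectors` and `most_similar` are hypothetical and only illustrate the idea; they are not the code from the video.

```python
import io
import math

# Hypothetical stand-in for a vectors file: "word v1 v2 ..." on each line.
SAMPLE = """ice 0.9 0.1 0.0
steam 0.1 0.9 0.0
solid 0.8 0.2 0.1
gas 0.2 0.8 0.1
"""

def load_vectors(handle):
    """Parse 'word v1 v2 ...' lines into a dict of unit-length vectors."""
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vec = [float(value) for value in parts[1:]]
        norm = math.sqrt(sum(value * value for value in vec))
        vectors[parts[0]] = [value / norm for value in vec]
    return vectors

def most_similar(word, vectors, top_n=3):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, sum(a * b for a, b in zip(query, vec)))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

vectors = load_vectors(io.StringIO(SAMPLE))
print(most_similar("ice", vectors))
```

Because the vectors are normalised when loaded, the dot product inside `most_similar` is already the cosine similarity. To try this on real embeddings, the `io.StringIO(SAMPLE)` handle would be swapped for an opened vectors file.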

  • @fezkhanna6900 4 years ago

    Fantastic video

  • @kindaeasy9797 5 months ago

    10:48 No, we don't have a vector on one side of the equation; we have scalar values on both sides. Basic math.

  • @TheR4Z0R996 4 years ago

    Great explanation thanks a lot my friend :)

  • @psic-protosysintegratedcyb2422 4 years ago

    Good introduction!

  • @parukadli 1 year ago

    Is the embedding for a word fixed in GloVe, or is it generated anew each time depending on the dataset used to train the model?

  • @sujeevan9047 3 years ago

    Can you do a video on the BERT word embedding model? It is also important.

  • @momona4170 3 years ago

    I still don't quite understand the part where ln(X_i) was absorbed by biases, please enlighten me.

  • @khadidjatahri7428 3 years ago +1

    Thanks for this well-explained video. I have one question: can you please explain why you take only the numerator portion F(w_i.w_k) and ignore the denominator?

  • @robinshort6430 2 years ago

    Often X_ij is zero, and in these cases ln(X_ij) diverges to negative infinity. How do you treat this issue?

    • @NormalizedNerd 2 years ago +1

      Good point. So, here's how they tackled the problem.
      They defined the weighting function f like this:
      f(X_ij) =
      (X_ij/X_max)^alpha [if X_ij < X_max]
      1 [otherwise]
      So you see when X_ij = 0, f(X_ij) is 0. That means the whole cost term becomes 0. We don't even need to compute ln(X_ij) in this case.
      They addressed two problems with f.
      1) not giving too much importance to the word pairs that cooccur frequently.
      2) avoiding ln(0)
      I hope this makes sense. Please tell me if anything is not clear.
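
A minimal sketch of the weighting function described in this reply, together with the explicit zero check discussed in the follow-up comment. The function names are hypothetical, and the defaults `x_max=10.0` and `alpha=0.75` simply mirror the values printed in the training log quoted further down this thread.

```python
import math

def weight(x, x_max=10.0, alpha=0.75):
    """Weighting f(x): (x / x_max) ** alpha below x_max, 1 otherwise."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_cost(x_ij, prediction, x_max=10.0, alpha=0.75):
    """One term of the cost: f(X_ij) * (prediction - ln X_ij) ** 2.

    `prediction` stands for w_i . w_j + b_i + b_j.
    """
    if x_ij == 0:
        # f(0) = 0, so the whole term is zero; returning early means
        # math.log(0) is never evaluated.
        return 0.0
    return weight(x_ij, x_max, alpha) * (prediction - math.log(x_ij)) ** 2

print(weight(0), weight(5), weight(50))
print(pair_cost(0, 1.0), pair_cost(3, 1.0))
```

The early return is the "if condition" route: rather than multiplying zero by an undefined logarithm, pairs that never co-occur are skipped entirely.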

    • @robinshort6430 2 years ago

      @Normalized Nerd This is true only assuming that zero times infinity is zero! Just kidding, I just want to point out that programming zero times infinity gives (rightly) an error (on numpy), so I have to write this as an if condition.
      Everything else is clear, thank you very much for your great work and for your answer!

    • @robinshort6430 2 years ago

      @NormalizedNerd is X_max a hyperparameter?

  • @83vbond 4 years ago +3

    Good explanation. Got too technical for me after the middle, but then the code and the graph clarified things. Just one thing: you keep calling the pipe | symbol as 'slash', "j slash i", "k slash ice" etc, which isn't accurate (I think you would know it if you have studied all this). It's better to use 'given', "j given i" as it's actually said, or just say 'pipe' after explaining the first time that this is what the symbol is called. 'slash' is used to mean division, and also to mean 'one or the other', neither of which is applicable here, and the symbol isn't slash anyway. This can cause confusion for some viewers.

    • @NormalizedNerd 4 years ago

      Yes, pipe would be a better choice.

    • @jibbygonewrong2458 3 years ago

      It's Bayes. Anyone exposed to stats understands w/o the verbiage.

    • @TNTsundar 3 years ago +1

      You should read that as “probability of i GIVEN j”. The pipe symbol is read as ‘given’.

  • @dodoblasters 3 years ago +1

    5:50
    2+1+1=3?

  • @bhrzali 3 years ago

    Wonderful explanation! Just a question. Why do we calculate the ratio p(k|ice)/p(k|steam)?

    • @NormalizedNerd 3 years ago +1

      The ratio is better at distinguishing relevant words from irrelevant words than the probabilities. And it also discriminates between relevant words. If we didn't take the ratio and work with raw probabilities then the numbers would be too small.
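
A small sketch of why the ratio is informative, using made-up co-occurrence counts. The numbers in `counts` are hypothetical and are not taken from the video or from any real corpus; only the pattern matters.

```python
# Hypothetical co-occurrence counts X[target][context].
counts = {
    "ice":   {"solid": 80, "gas": 5,  "water": 300, "fashion": 2},
    "steam": {"solid": 4,  "gas": 90, "water": 280, "fashion": 2},
}

def conditional(target, context):
    """P(context | target) = X[target][context] / X[target]."""
    row = counts[target]
    return row[context] / sum(row.values())

def ratio(context):
    """P(context | ice) / P(context | steam)."""
    return conditional("ice", context) / conditional("steam", context)

for k in ("solid", "gas", "water", "fashion"):
    print(k, conditional("ice", k), conditional("steam", k), ratio(k))
```

With counts shaped like these, the ratio is well above 1 for a word tied to ice only, well below 1 for a word tied to steam only, and close to 1 for words related to both or to neither, which is the distinction the raw probabilities alone do not make obvious.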

  • @trieunguyenhai49 4 years ago +1

    Thank you so much, but shouldn't X_{love} be 4, not 3?

    • @NormalizedNerd 4 years ago

      @TRIỀU NGUYỄN HẢI
      Thanks for pointing this out. Yes X_{love} = 4.
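
The count can be checked with a short sketch. The corpus below is an assumption pieced together from the sentences quoted in this thread ("I love NLP" and "I love to make videos"), and the window of one word on each side is also assumed rather than confirmed.

```python
from collections import defaultdict

# Assumed toy corpus and context window; neither is confirmed by the thread.
corpus = ["I love NLP", "I love to make videos"]
WINDOW = 1  # one word to the left and one to the right

cooccur = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                cooccur[word][tokens[j]] += 1

# X_love is the sum of the row for "love".
x_love = sum(cooccur["love"].values())
print(dict(cooccur["love"]), x_love)
```

Under these assumptions the row for "love" is I: 2, NLP: 1, to: 1, so X_{love} = 2 + 1 + 1 = 4.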

  • @WahranRai 2 years ago +1

    Your examples are not related : I love NLP... and P(k/ice) etc
    It will be useful to have the same sentences ...

  • @parukadli 4 years ago

    Nice explanation... Which is better, GloVe or Word2vec?

    • @NormalizedNerd 4 years ago

      That depends on the dataset. I recommend trying both.

  • @kindaeasy9797 5 months ago

    well i think by corpus you mean document , but lemme tell you corpus has repeated words as well , to form corpus you just join all the documents

  • @md.tarekhasan2206 3 years ago

    Can you please make videos on ELMo, fasttext, and BERT also? It'll be helpful.

  • @edwardrouth 4 years ago +1

    Nice Work ! Just subscribed (y). :) Just a quick question out of curiosity "GloVe" and "Poincare GloVe" are same model ?
    All the best for your channel.

    • @NormalizedNerd 4 years ago

      Thank you, man!
      No, they are different. Poincare GloVe is a more advanced approach. In normal GloVe, the words are embedded in Euclidean space. But in Poincare GloVe, the words are embedded in hyperbolic space! Although the latter one uses the basic concepts of the original GloVe.

    • @edwardrouth 4 years ago

      @NormalizedNerd It's totally worth subscribing to your channel. Looking forward to new videos from you on DS.
      Btw, i am also from West Bengal currently in Germany ;)

    • @NormalizedNerd 4 years ago

      @edwardrouth Oh great! Nice to meet you. More interesting videos are coming ❤️

  • @Sarah-ik8tt 3 years ago

    Hello, thank you for your explanation. Can you please send me the Google Colab link asap?

  • @eljangoolak 2 years ago

    quackuarance metrics? I don't understand what that is

  • @Nextalia 3 years ago

    I fail to see where the vectors come from... :-( I follow all the explanation without any problem, but... once you define J, where are the vectors coming from? Is there any neural network involved? Same problem when reading the article or any other explanations. They all try to explain where that J function comes, and then, magically, we have vectors we can compare to each other :-(
    Any help on that would be greatly appreciated. Thanks!

    • @NormalizedNerd 3 years ago +1

      The authors introduced the word vectors very subtly.
      Here's the deal: 9:50, we assume that there exists a function F which takes the word vectors and produces a scalar quantity!
      And no, we don't have neural networks here. Everything is based on the co-occurrence matrix.

    • @Nextalia 3 years ago

      @NormalizedNerd Thanks for your answer. I found a publication that explains very well what to do after "discovering" that function: thesis.eur.nl/pub/47697/Verstegen.pdf
      I was somehow sure that GloVe was based on neural networks (as word2vec is), but that is not the case. However, it is a bit like a neural network, since the vectors are created in much the same way the weights of a NN are trained: stochastic gradient descent.

    • @SwapravaNath 2 years ago

      The vectors are actually the parameters that one is optimizing over. Actually, the objective function J should have been written with the arguments being the vector representations of the words -- which are the optimization variable. For certain choices of the F function, e.g., softmax, the optimization becomes mathematically easy. And then, it is just a multivariable optimization problem, and a natural algorithm to solve will be gradient-descent (and more).
      Ref: ua-cam.com/video/ERibwqs9p38/v-deo.html [Stanford course on NLP]
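
To make "the vectors are the parameters being optimized" concrete, here is a rough sketch that fits word and context vectors to a tiny co-occurrence table with plain stochastic gradient descent. Everything in it is an assumption made for illustration: the toy counts, the vector size, the learning rate, and the use of unadorned SGD rather than whatever optimizer a real implementation uses.

```python
import math
import random

def weight(x, x_max=10.0, alpha=0.75):
    """Weighting f(x) applied to each co-occurrence count."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def total_cost(pairs, w, w_ctx, b, b_ctx):
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - ln X_ij)^2."""
    cost = 0.0
    for i, j, x in pairs:
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        cost += weight(x) * (dot + b[i] + b_ctx[j] - math.log(x)) ** 2
    return cost

def train(pairs, vocab_size, dim=5, epochs=200, lr=0.05, seed=0):
    """Fit vectors and biases by SGD; returns them with the cost after each epoch."""
    rng = random.Random(seed)
    pairs = list(pairs)  # copy so the caller's list is not shuffled

    def init():
        return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

    w, w_ctx = init(), init()
    b, b_ctx = [0.0] * vocab_size, [0.0] * vocab_size
    history = []
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j, x in pairs:
            dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
            grad = 2.0 * weight(x) * (dot + b[i] + b_ctx[j] - math.log(x))
            for d in range(dim):
                wi, wj = w[i][d], w_ctx[j][d]
                w[i][d] -= lr * grad * wj      # dJ/dw_i = grad * w~_j
                w_ctx[j][d] -= lr * grad * wi  # dJ/dw~_j = grad * w_i
            b[i] -= lr * grad
            b_ctx[j] -= lr * grad
        history.append(total_cost(pairs, w, w_ctx, b, b_ctx))
    return w, w_ctx, history

# Hypothetical nonzero entries (i, j, X_ij) for a six-word vocabulary.
pairs = [(0, 1, 2), (1, 0, 2), (1, 2, 1), (2, 1, 1), (1, 3, 1),
         (3, 1, 1), (3, 4, 1), (4, 3, 1), (4, 5, 1), (5, 4, 1)]
w, w_ctx, history = train(pairs, vocab_size=6)
print(history[0], history[-1])
```

Only nonzero counts appear in `pairs`, so the logarithm is always defined. There is no network here: the rows of `w` and `w_ctx` start as random numbers and are nudged until their dot products plus biases approximate ln X_ij, and those rows are the word vectors.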

  • @maximuskumar502 4 years ago

    Nice explanation 👍, one quick question on your video, which software and hardware are you using for digital board?

    • @NormalizedNerd 4 years ago

      Thank you. I use Microsoft OneNote and a basic pen tablet. Keep supporting!

  • @ccuuttww 1 year ago

    p(Love , I ) = 2/3 ?

  • @SAINIVEDH 4 years ago

    @ 19:13. That is a weighting function because log(X_ij) may become zero and the equation goes crazy. More details at
    towardsdatascience.com/light-on-math-ml-intuitive-guide-to-understanding-glove-embeddings-b13b4f19c010

    • @NormalizedNerd 4 years ago

      The article says f(X_ij) prevents log(X_ij) from being NaN which is not true.
      f(X_ij) actually puts an upper limit on co-occurrence frequencies.

  • @kekecoo5681 4 years ago

    Where did e come from?

    • @NormalizedNerd 4 years ago

      e^x follows our condition.
      e^(a-b) = e^a/e^b
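
The step from that condition to the final model can be written out as follows. This is a sketch that assumes the notation of the GloVe paper, with \tilde{w}_k for context vectors and b_i, \tilde{b}_k for biases; it also covers the questions above about the numerator and about ln(X_i) being absorbed into the biases.

```latex
\begin{align*}
F\big((w_i - w_j)^\top \tilde{w}_k\big) &= \frac{P_{ik}}{P_{jk}} \\
F\big(w_i^\top \tilde{w}_k - w_j^\top \tilde{w}_k\big)
  &= \frac{F(w_i^\top \tilde{w}_k)}{F(w_j^\top \tilde{w}_k)}
  && \text{(required of } F \text{; satisfied by } F = \exp\text{)} \\
e^{w_i^\top \tilde{w}_k} &= P_{ik} = \frac{X_{ik}}{X_i}
  && \text{(match numerators; the denominator gives the same relation for } j\text{)} \\
w_i^\top \tilde{w}_k &= \ln X_{ik} - \ln X_i \\
w_i^\top \tilde{w}_k + b_i + \tilde{b}_k &= \ln X_{ik}
  && \text{(} \ln X_i \text{ does not depend on } k \text{, so it is folded into } b_i\text{)}
\end{align*}
```

The extra bias \tilde{b}_k is added so that the expression stays symmetric when the roles of word and context are swapped.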

  • @longhoang5137 3 years ago +1

    i laughed when you said 2+1+1=3 xD

  • @u_luana.j 3 years ago

    5:50 ..?

  • @bikideka7880 4 years ago

    good explanation but plz use a bigger cursor, a lot of youtubers miss this.

  • @atomic7680 4 years ago +2

    G-Love 😂

    • @NormalizedNerd 4 years ago

      Haha...Exactly what I thought when I learned the word for the first time!

  • @sakibahmed2373 4 years ago

    Hello There,
    First of all, thank you for adding such informative videos to help beginners in the DS field. I am trying to reproduce the code from GitHub for the "Stanford GloVe Model". Link ---> github.com/stanfordnlp/GloVe
    The problem is if i execute all the statements as mentioned in the "Readme" i get the respective files which it should provide me "cooccur.bin" & "vocab.txt". The latter does have the list of words with frequency but the former is empty and there is no such error reported in the console even. For me its very weird and i dont understand what i am doing wrong. Could you please help me on this ?
    N.B : I am new in ML and still learning !
    Best Regards.

    • @NormalizedNerd 4 years ago +1

      "cooccurrence.bin" should contain the co-occurrence counts, not the word vectors (those are written to a separate file once training finishes). Make sure that the training actually started. You should see logs like...
      vector size: 50
      vocab size: 71290
      x_max: 10.000000
      alpha: 0.750000
      05/08/20 - 06:02.16AM, iter: 001, cost: 0.071222
      05/08/20 - 06:02.45AM, iter: 002, cost: 0.052683
      05/08/20 - 06:03.14AM, iter: 003, cost: 0.046717
      ...
      I'd suggest you to try this on google colab once.

    • @sakibahmed2373 4 years ago

      @NormalizedNerd Hi, thank you for your response.
      I have never tried Colab before. What I noticed is that in Colab I have to upload notebook files, which I can't find in the GloVe project that I cloned. So I am using an online editor, "repl.it". First I ran the "make" command, which created the "build" folder, and then "./demo.sh". Running this script creates a "cooccurrence.bin" file, but as I mentioned earlier it is empty. Did I miss something here? I am sure I am missing something very small and important 😒 Below are the logs from the terminal..
       make
      mkdir -p build
      gcc -c src/vocab_count.c -o build/vocab_count.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc -c src/cooccur.c -o build/cooccur.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      src/cooccur.c: In function ‘merge_files’:
      src/cooccur.c:180:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&new, sizeof(CREC), 1, fid[i]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/cooccur.c:190:5: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&new, sizeof(CREC), 1, fid[i]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/cooccur.c:203:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&new, sizeof(CREC), 1, fid[i]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gcc -c src/shuffle.c -o build/shuffle.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      src/shuffle.c: In function ‘shuffle_merge’:
      src/shuffle.c:96:17: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&array[i], sizeof(CREC), 1, fid[j]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/shuffle.c: In function ‘shuffle_by_chunks’:
      src/shuffle.c:161:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&array[i], sizeof(CREC), 1, fin);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gcc -c src/glove.c -o build/glove.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      src/glove.c: In function ‘load_init_file’:
      src/glove.c:86:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&array[a], sizeof(real), 1, fin);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/glove.c: In function ‘glove_thread’:
      src/glove.c:182:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&cr, sizeof(CREC), 1, fin);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gcc -c src/common.c -o build/common.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/vocab_count.o build/common.o -o build/vocab_count -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/cooccur.o build/common.o -o build/cooccur -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/shuffle.o build/common.o -o build/shuffle -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/glove.o build/common.o -o build/glove -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
       ./demo.sh
      mkdir -p build
      --2020-05-08 17:04:13-- mattmahoney.net/dc/text8.zip
      Resolving mattmahoney.net (mattmahoney.net)... 67.195.197.75
      Connecting to mattmahoney.net (mattmahoney.net)|67.195.197.75|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 31344016 (30M) [application/zip]
      Saving to: ‘text8.zip’
      text8.zip 100%[======>] 29.89M 1.97MB/s in 15s
      2020-05-08 17:04:29 (1.95 MB/s) - ‘text8.zip’ saved [31344016/31344016]
      Archive: text8.zip
      inflating: text8
      $ build/vocab_count -min-count 5 -verbose 2 < text8 > vocab.txt
      BUILDING VOCABULARY
      Processed 17005207 tokens.
      Counted 253854 unique words.
      Truncating vocabulary at min count 5.
      Using vocabulary of size 71290.
      $ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 15 < text8 > cooccurrence.bin
      COUNTING COOCCURRENCES
      window size: 15
      context: symmetric
      max product: 13752509
      overflow length: 38028356
      Reading vocab from file "vocab.txt"...loaded 71290 words.
      Building lookup table...table contains 94990279 elements.
      Processing token: 200000./demo.sh: line 43: 114 Killed $BUILDDIR/cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE

    • @NormalizedNerd 4 years ago +1

      @Sakib Ahmed repl is probably not a good idea for DL work. Try Colab or Kaggle. You can clone the GitHub repo directly in Colab. I've created a Colab notebook. Run it yourself. It works perfectly!
      colab.research.google.com/drive/1BA-GRHQOsXrYwmkalQyejsnVE8zmoyH2?usp=sharing

    • @sakibahmed2373 4 years ago

      @NormalizedNerd Thank you so much! It really worked... 😊 (y)

    • @NormalizedNerd 4 years ago +1

      @sakibahmed2373 Do share this channel with your friends :D Enjoy machine learning.

  • @TheMurasaki1 4 years ago

    "I love to make videos"
    sorry to say this, but is it correct english?

    • @kaustavdatta4748 4 years ago +3

      Not the best English. But the model doesn't care as it will learn whatever you (or the dataset) teach it. The author's English doesn't impact the explanation of the model's workings.

  • @harshitatiwari8019 4 years ago

    Reduce the number of ads. There's an ad like every minute. Google has made YouTube a money-sucking machine. So irritating.

  • @BloggerMnpr 1 year ago

    .

  • @Schaelpy 1 year ago

    Good video but the wrong pronunciation of GloVe is killing me man

    • @ToniSkit 7 months ago

      You mean the right ❤