Ali Ghodsi, Lec [3,1]: Deep Learning, Word2vec

  • Published 31 Dec 2024

COMMENTS • 35

  • @fmdj · 3 years ago

    Very nice lecture, very clear, not too hasty, not too slow.

  • @arunredmi5338 · 8 years ago +6

    Best explanation, and a lot of patience. ...

  • @afshansalahuddin3020 · 6 years ago

    For discussion at 14:00 - my 2 cents -
    In LSI the context is provided in the numbers through a term-document matrix. In the PCA you proposed, the context is provided in the numbers through a covariance matrix. PCA can be used for any high-dimensional data. It's a more general class of analysis for finding a better feature space to represent your data. LSI, on the other hand, is very specific to text corpora, analyzing which terms are more similar and what the latent classes of the words in a corpus are.
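
    A minimal sketch of the distinction above, assuming a hypothetical toy term-document matrix and plain NumPy (not from the lecture): LSI applies a truncated SVD directly to the term-document counts, while PCA eigendecomposes the covariance matrix of the (centered) features.

      import numpy as np

      # Hypothetical toy term-document matrix: rows = terms, columns = documents.
      X = np.array([[2., 0., 1.],
                    [1., 3., 0.],
                    [0., 1., 4.],
                    [3., 0., 2.]])

      # LSI: truncated SVD of the raw term-document matrix.
      U, s, Vt = np.linalg.svd(X, full_matrices=False)
      k = 2
      term_vectors = U[:, :k] * s[:k]            # latent representation of each term
      doc_vectors = Vt[:k, :].T * s[:k]          # latent representation of each document

      # PCA: eigendecomposition of the covariance of centered features.
      Xc = X - X.mean(axis=0)                    # center each column (feature)
      cov = Xc.T @ Xc / (X.shape[0] - 1)
      eigvals, eigvecs = np.linalg.eigh(cov)
      pca_scores = Xc @ eigvecs[:, ::-1][:, :k]  # project onto the top-k components

      print(term_vectors)
      print(pca_scores)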

  • @tennisdukie1 · 8 years ago +2

    Love this lecture! So clear and well explained

  • @niteshroyal30 · 8 years ago +1

    Thanks, Professor, for such a wonderful lecture on word2vec

  • @Boom655 · 3 years ago

    Boss Professor!

  • @torseda · 5 years ago

    Clear explanation. Thanks Professor.

  • @sachigiable · 7 years ago

    Nice video! Gives a basic understanding of word2vec. Optimization is in the next lecture.

  • @shashankupadhyay4163 · 7 years ago

    Every second is worth it....awesome

  • @azadehesmaeili4402 · 6 years ago

    It was great, especially the notation of the softmax function.

  • @TheLizy0820 · 8 years ago +2

    I feel there is a mistake on the slide shown at 8:35. The diagonal values of \Sigma should be the square roots of the eigenvalues of XX^T or X^T X.
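
    A quick numerical check of that claim, as a sketch with NumPy and a random matrix (not tied to the slides): the singular values on the diagonal of \Sigma equal the square roots of the eigenvalues of XX^T (equivalently of X^T X).

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.standard_normal((4, 6))

      s = np.linalg.svd(X, compute_uv=False)     # singular values of X, descending
      eig = np.linalg.eigvalsh(X @ X.T)[::-1]    # eigenvalues of X X^T, descending

      print(np.allclose(s, np.sqrt(eig)))        # True: sigma_i = sqrt(lambda_i)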

  • @rajupowers · 8 years ago +1

    At some point, the prof should say that we will take questions at the end of the class :-) Great lecture

  • @zahraghorbanali98 · 3 years ago

    Such a good explanation. Thank you, Professor :)

  • @siosayange · 7 years ago

    the most detailed and clear explanation! thank you!

  • @destroyerbiel · 7 years ago

    Thank you. You saved my life

  • @mrfolk84 · 6 years ago +1

    Writing notes out of this lecture, thanks Professor :-)

  • @hoangtuyen100 · 7 years ago +1

    Please explain: why is there no activation function on the hidden-layer neurons?
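
    One way to see the question concretely (a sketch, not the professor's answer): with a one-hot input, W^T x simply selects one row of W, so the "hidden layer" is a table lookup; word2vec keeps it linear and leaves the only non-linearity to the softmax output.

      import numpy as np

      d, n = 5, 3                                        # vocabulary size, embedding dimension
      W = np.arange(d * n, dtype=float).reshape(d, n)    # hypothetical input embedding matrix

      x = np.zeros(d)
      x[2] = 1.0                                         # one-hot vector for word index 2
      h = W.T @ x                                        # word2vec "hidden layer": no activation

      print(np.allclose(h, W[2]))                        # True: h is just row 2 of W, a lookup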

  • @paolofreuli1686 · 7 years ago +1

    Awesome lesson

  • @samin2012 · 9 years ago +1

    Very clear and informative, thanks so much!

  • @saharsohangir4573 · 8 years ago

    Great, Thanks

  • @seemasuber8145 · 7 years ago

    Good lecture

  • @Claudemcd1 · 8 years ago

    I get confused between the vocabulary set, big "V", with cardinality d (about 1.5 M is very rich), and the reference corpus, big "T", which has a much higher cardinality (Wikipedia as a corpus would be billions of words).
    When we calculate p(w|c) - in the last 10 min of this video - I would think that the denominator of this softmax function is a sum computed over "V", and not "T".
    Am I correct?
    Thanks!
    PS: Notation is indeed a nightmare in this chapter!
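
    For reference, the skip-gram softmax as it is usually written (notation may differ from the slides; u_w and v_c denote output- and input-side word vectors) normalizes over the vocabulary V of size d, while the corpus length T only enters the outer sum of the training objective:

      p(w \mid c) = \frac{\exp(u_w^\top v_c)}{\sum_{w' \in V} \exp(u_{w'}^\top v_c)},
      \qquad
      J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-m \le j \le m,\ j \ne 0} \log p(w_{t+j} \mid w_t).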

  • @logicboard7746 · 6 years ago +1

    W2V starting @41:00

  • @ElhamMirshekari · 7 years ago

    Topic modeling at 14:10 >> non-negative matrix factorization

  • @vaibhavkotwal7582 · 7 years ago

    How did you calculate W?

  • @danielpit8693 · 8 years ago

    Why is the input assumed to be a one-hot vector? I know it is sparse and most entries are zeros, but shouldn't the actual condition be a k-hot vector (k > 0)?

    • @bopeng9504 · 8 years ago

      One-hot encoding assigns each word a unique vector to identify itself, e.g. [0 0 0 0 1]. So for "the cat jumped", it's [1 0 0], [0 1 0], [0 0 1], or, concatenated, [1 0 0 0 1 0 0 0 1] to represent the sentence. But the one-hot encoding for the word "the" by itself is [1 0 0].
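
      A tiny sketch of that encoding (hypothetical three-word vocabulary, NumPy):

        import numpy as np

        vocab = {"the": 0, "cat": 1, "jumped": 2}

        def one_hot(word, vocab):
            # Return the vector with a 1 in the word's slot and 0 elsewhere.
            v = np.zeros(len(vocab))
            v[vocab[word]] = 1.0
            return v

        print(one_hot("the", vocab))              # [1. 0. 0.]
        sentence = np.concatenate([one_hot(w, vocab) for w in ["the", "cat", "jumped"]])
        print(sentence)                           # [1. 0. 0. 0. 1. 0. 0. 0. 1.]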

  • @Trackman2007 · 5 months ago

    21:49

  • @arpitadutta6610 · 7 years ago

    Can you please upload the slides?

  • @techonlyguo9788 · 9 years ago

    I still cannot understand how SVD captures the relationships between words...

    • @linlinzhao9085 · 8 years ago

      As I understood it, the U matrix is basically the reduced-dimension representation of the columns, i.e. the articles, and the V matrix is for the words. If the representation is good, we should see word clusters by applying some visualization technique, e.g. t-SNE, to the V matrix. The rows of V are the bases of the word space; projecting them to a lower dimension in order to visualize is a kind of recombination and transformation. Just my 2 cents, not sure if it's 100% solid.
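
      A sketch of that reading of the factorization (toy matrix; whether the word vectors sit in U or V depends on whether terms index the rows or the columns of X):

        import numpy as np

        # Toy matrix with rows = documents and columns = words, so V describes the words.
        X = np.array([[1., 2., 0., 0.],
                      [0., 1., 3., 1.],
                      [2., 0., 1., 4.]])

        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        k = 2
        word_vectors = Vt[:k, :].T * s[:k]        # k-dimensional vector for each word

        def cosine(a, b):
            # Cosine similarity between two word vectors.
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

        print(cosine(word_vectors[2], word_vectors[3]))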

  • @gautamsharma3835 · 8 years ago

    I guess there is a mistake he made while writing the equation for the hidden layer of the neural net. He forgot to apply a non-linear function to W^T x. So h should be Phi(W^T x) and not simply W^T x (assume Phi is a non-linear function). Correct me if I'm wrong.

    • @danielpit8693 · 8 years ago

      I guess the professor doesn't really mean that this is an actual hidden layer; he just wants to illustrate how to convert the one-hot vector to the input vector, and in the last step to generate the output vector (introducing the matrices W and W'). In an actual neural network, there would be an activation function. I am just guessing...
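
      A sketch of the forward pass under that reading (hypothetical sizes, random W and W'): the hidden layer stays linear, and the only non-linearity is the softmax at the output.

        import numpy as np

        rng = np.random.default_rng(0)
        d, n = 6, 3                               # vocabulary size, embedding dimension
        W = rng.standard_normal((d, n))           # input-side embedding matrix
        W_prime = rng.standard_normal((n, d))     # output-side weight matrix (W')

        x = np.zeros(d)
        x[1] = 1.0                                # one-hot centre word
        h = W.T @ x                               # linear hidden layer, no activation
        scores = W_prime.T @ h                    # one score per vocabulary word
        p = np.exp(scores - scores.max())         # softmax over the vocabulary
        p /= p.sum()

        print(p.round(3), p.sum())                # p(w | c), probabilities summing to 1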

  • @jaanaan2847 · 7 years ago +1

    Nice sentence you used: "Silence is the language of God; all else is poor translation!"
