Word2Vec Easily Explained - Data Science

  • Published 10 Feb 2025
  • If you are looking for Career Transition Advice and a Real-Life Data Scientist Journey, please check the link below
    Springboard India YouTube url: / channel
    Please join my channel as a member to get additional benefits like Data Science materials, live streaming for members, and many more
    / @krishnaik06
    github url: github.com/kri...
    NLP playlist: • Natural Language Proce...
    Connect with me here:
    Twitter: / krishnaik06
    Facebook: / krishnaik06
    Instagram: / krishnaik06

COMMENTS • 127

  • @karthip23
    @karthip23 5 years ago +21

    I had been trying to understand word2vec for the past two years with many videos. You made it clear today with just this 20-minute video. You are simply amazing :)

  • @ajeetsinghshekhawat2696
    @ajeetsinghshekhawat2696 3 years ago +12

    Guys please use "words = model.wv.key_to_index" in place of "words = model.wv.vocab" in code line 60, as per gensim update. Thanks Krish sir for all the efforts you made for data science community.

  • @abhishekpratapsingh9283
    @abhishekpratapsingh9283 2 years ago +5

    At 3:05 you say that TF-IDF also does not store semantic information, but in the TF-IDF video you said it stores words semantically, unlike Bag of Words.

  • @linusthelab
    @linusthelab 3 years ago

    Thanks!

  • @shobhitsrivastava2123
    @shobhitsrivastava2123 5 years ago +40

    Sir, please upload practical videos on GloVe and BERT

  • @BalaguruGupta
    @BalaguruGupta 4 years ago +4

    Your tutorials are so good that watching only once is enough to understand the concept. Thank you sir

  • @AnalyticsMaster
    @AnalyticsMaster 5 years ago +66

    I am looking for some explanation of how the vectors were derived. Most of the other YouTube videos that I have seen did not explain this. I was expecting that in your videos, but here too only the Python implementation is explained; how the vectors are derived mathematically is missing. I would appreciate it if you could elaborate on that, since you have a special talent for explaining complex things in a simplified manner.

    • @Someonner
      @Someonner 5 years ago +9

      For that you can go to CS224n at Stanford.

    • @arushkharbanda
      @arushkharbanda 4 years ago +1

      I was also looking for the same.

    • @maulindusarkar4581
      @maulindusarkar4581 3 years ago +2

      ua-cam.com/video/UqRCEmrv1gQ/v-deo.html

    • @karaniitgn0908
      @karaniitgn0908 3 years ago

      That requires good knowledge of probability.

    • @shawkyahmad
      @shawkyahmad 2 years ago

      Hi, did you find the answer?

  • @siddharthsingh7717
    @siddharthsingh7717 2 years ago +2

    Requesting you to upload more videos on BERT, Transformers, LSTM, GRU, etc. in the NLP playlist. It would be of great help. Thanks Krish for making such amazing videos.

  • @medley5670
    @medley5670 3 years ago +3

    Following your NLP playlist... I must say you are very good at explaining each and every concept clearly. Thank you so much for the effort that you have put into creating this amazing playlist. I took a course to learn NLP, but your playlist is far better than the course. Thank you sir!

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Superb video once again, Krish. All my doubts about word2vec are now gone. Thanks!

  • @sahiltrivedi69
    @sahiltrivedi69 3 years ago

    Thank you very much for this video, super helpful 👍👍👍

  • @mohammedfaisal6714
    @mohammedfaisal6714 5 years ago

    Excellent Explanation
    Awesome, brother 😎

  • @amaljose116
    @amaljose116 3 years ago

    Love the conceptual videos, Have been searching everywhere.

  • @nareshkatturi9012
    @nareshkatturi9012 3 years ago

    Thank you krish 🙏

  • @akshayakki8969
    @akshayakki8969 5 years ago +1

    @Krish Naik Sir please make videos on data structures and algorithms...you are a great teacher 🙏🙏🙏🙏

  • @amruthasankar3453
    @amruthasankar3453 a year ago

    Thank you sir ❤️🔥

  • @sanandapodder5027
    @sanandapodder5027 4 years ago +1

    Great explanation. You made a complex topic very simple, sir. Thank you very much. One request: please upload all the PPTs you showed in this NLP series.

  • @BharathKumar-vs8fm
    @BharathKumar-vs8fm 5 years ago +3

    Krish, please make a video on Glove model and pickle model

  • @Maths_With_Rahul
    @Maths_With_Rahul 3 years ago +11

    Sir, one modification: in gensim's update from version 3.8.0 to 4.0.0, model.wv.vocab changed to model.wv.key_to_index (with the model initialized as in your video). Thank you

  • @venkatkrishnan9442
    @venkatkrishnan9442 3 years ago

    Nice explanation. But one thing I didn't understand is how the words you showed are similar; I can see they all have different meanings.

  • @akash_thing
    @akash_thing 4 years ago +2

    Hi Krish! Can you make a video on converting a whole data frame of words to vectors using Word2Vec? You have not completely explained it.

  • @keerthivasini
    @keerthivasini 4 years ago +3

    Sir, Please do a similar video about how to Implement GloVe to vectorize text documents using Python.

  • @MAhmadian
    @MAhmadian 3 years ago

    Thanks Krish. Why didn't you remove the punctuation from the input text? Are you expecting to get some useful information from it?

  • @manjunath.c2944
    @manjunath.c2944 5 years ago

    Superb job Krish. Kindly make a video on BERT; it would be very helpful.

  • @saramohammadinejad298
    @saramohammadinejad298 3 years ago

    Amazing tutorials!

  • @prasanthdevarapalli
    @prasanthdevarapalli 4 years ago +1

    Words like "not" and "haven't" should be excluded from stopword removal. These words are very useful when constructing bigrams, since Word2Vec captures semantic meaning. Correct me if I am wrong.

  • @debatradas9268
    @debatradas9268 3 years ago

    thank you so much

  • @Xnshau
    @Xnshau 2 years ago

    Great explanation. How do I evaluate the performance of two or more models trained on the same dataset?

  • @riteshpatil7230
    @riteshpatil7230 3 years ago +1

    Hello, can we use Word2Vec the same way as Bag of Words and TF-IDF for training a classification model? If yes, how do we do it? If no, then how exactly can we see whether Word2Vec overcomes the drawbacks of TF-IDF or not?

  • @ljtutorials2447
    @ljtutorials2447 2 years ago

    Hello sir, I'm very impressed by your video. I wanted to know whether we can use a Hindi or Punjabi corpus instead of English. Please reply

  • @DS_AIML
    @DS_AIML 4 years ago

    Good try Krish. Even though I got the concept of Word2Vec, it did not connect well with the code. Please create one Python example with the full implementation.

  • @ashimmaity64
    @ashimmaity64 5 years ago

    Awesome, all my doubts are clear now. Please make a video on TF-IDF word2vec.

  • @SudipPandey
    @SudipPandey 4 years ago +1

    Excellent explanation sir. I have 2 questions: 1) Is word2vec different from word embedding, or is it a form of word embedding? 2) Can we use word2vec in both machine learning and deep learning?

    • @shashireddy7371
      @shashireddy7371 4 years ago +1

      Hi Sudip,
      Word embedding is a general technique to represent a document or word in vector form,
      like one-hot encoding, dummification, etc.
      Some embedding techniques are:
      1) Bag of Words
      2) TF-IDF
      3) Word2Vec (it captures semantic information: word-sequence details).
      I hope this helps :)

    • @padmaparameshwaran4986
      @padmaparameshwaran4986 3 years ago

      Did you get an answer for this? I have the same question now

  • @pavankumarpotta4565
    @pavankumarpotta4565 3 years ago

    Krish sir, can you show how to create our own Word2Vec?

  • @lanceabhishek6727
    @lanceabhishek6727 2 years ago

    Can you make a video on how to deal with class imbalance in NLP, on active learning, and on when to use w2v vs. TF-IDF? PS: thanks for your content

  • @sumayyaafreen3499
    @sumayyaafreen3499 a year ago

    Hi... Thanks for making such wonderful videos!!! Small doubt: NLTK doesn't support the Urdu language. Then which library can be used for Urdu?

  • @cer_oz
    @cer_oz a year ago

    Hi Krish, why did you tokenize the text into sentences rather than words? Is there a special reason for that? Word tokenization would give almost the same result.

  • @GauravSharma-ui4yd
    @GauravSharma-ui4yd 5 years ago +2

    Please continue the ML model deployment series

  • @aminumyau1040
    @aminumyau1040 3 years ago

    Hello Mr. Krish.
    Please help me with a video tutorial on fake news detection using machine learning algorithms with word2vec as the feature extraction method.

  • @suvarnadp1806
    @suvarnadp1806 5 years ago +2

    Sir, please make a video on the Elasticsearch engine

  • @ranjan4495
    @ranjan4495 5 years ago +2

    Sir, I downloaded the nltk library, but word2vec_sample is not getting downloaded. It says outdated; how do I get it downloaded?

  • @bharathreddy4806
    @bharathreddy4806 5 years ago +1

    Sir, please extend this video by explaining the latest ELMo & BERT (including hands-on).

  • @souravghosh2450
    @souravghosh2450 9 months ago

    I clicked on the link "Career Transition Advice and Real Life Data Scientist Journey", but it gives the output "This channel do not exist". Please update this.
    Thanks

  • @arjunbali2079
    @arjunbali2079 4 years ago

    thank you sir

  • @jaydhanwant4072
    @jaydhanwant4072 4 years ago +1

    People are afraid of AI taking over humanity.
    Also AI: Vikram also looted Satish :D

  • @nehamanpreet1044
    @nehamanpreet1044 4 years ago

    Please make videos on GloVe and BERT

  • @arsiblack2404
    @arsiblack2404 3 years ago

    Please make a case study: sentiment analysis with SVM using word2vec for feature extraction

  • @mayanksinghal473
    @mayanksinghal473 4 years ago

    Hi Krish,
    Your videos really help us a lot.
    Could you please make a video on the skip-gram and CBOW models of word2vec?

  • @sowmyabhat1297
    @sowmyabhat1297 3 years ago

    Please explain Drain parser algorithm implemented for parsing log files.

  • @ouryly1541
    @ouryly1541 5 years ago

    Hi sir, I am your biggest fan. In this video, did you use a pre-trained word2vec from gensim to get the embedding vectors of the sentences, or did you train this word2vec on the sentences?

  • @kushalchakrabarti240
    @kushalchakrabarti240 4 years ago

    Why did we not use lemmatization or stemming here? Won't that make the system smoother?

  • @nithinmamidala
    @nithinmamidala 5 years ago +2

    What is semantic information? If you have any material related to that, please tell me.

  • @thelastone1643
    @thelastone1643 5 years ago +1

    Thank you very much. Can word2vec be used to predict the 10 most frequent words that come before a specific word and the 10 most frequent words that come after it? And how?

  • @NareshKumar-ir3ye
    @NareshKumar-ir3ye 5 years ago

    Excellent video. Can we build a text summarizer using word2vec?

  • @lamnguyentrong275
    @lamnguyentrong275 4 years ago

    Thank you, but if you added subtitles it would be easier for us from Vietnam

  • @sachin143ful
    @sachin143ful 4 years ago

    How about using N-grams with Bag of Words?
    Example: sent1: he is good boy.
    sent2: he is not good boy
    Using stop words, "not" will be removed..
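The pitfall this commenter describes can be shown in a few lines of plain Python: if "not" is treated as a stopword, the two opposite sentences collapse to the same tokens (the small stopword set below is an invented stand-in for NLTK's English list, which does include "not"):

```python
# Toy stopword set standing in for NLTK's English stopword list.
stopwords = {"he", "is", "not"}

sent1 = "he is good boy".split()
sent2 = "he is not good boy".split()

filtered1 = [w for w in sent1 if w not in stopwords]
filtered2 = [w for w in sent2 if w not in stopwords]

# Both sentences reduce to ['good', 'boy']: the negation is lost.
print(filtered1, filtered2)
```

This is why negation words are often kept when the downstream task (sentiment, bigrams) depends on them.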

  • @prashanthkolaneru3178
    @prashanthkolaneru3178 5 years ago +3

    Can we give word2vec vectors as input to machine learning models?

  • @reshmachikate5713
    @reshmachikate5713 2 years ago

    Do we not require either stemming or lemmatization while converting words into vectors here?

  • @utpalbandyopadhyay1633
    @utpalbandyopadhyay1633 4 years ago

    Sir please provide us with an easy-to-preprocess chatbot dataset....

  • @praveshbisaria5303
    @praveshbisaria5303 4 years ago

    Sir, upload videos for GloVe and BERT too.

  • @MukeshKumar-dk6mc
    @MukeshKumar-dk6mc 5 years ago

    You didn't teach me the types of NLP; please make a video about this....

  • @nikhilsharma6218
    @nikhilsharma6218 4 years ago

    How did the vector for "war" get 100 dimensions, and what does that indicate?
    And what is the logic of building the vocab in the algorithm; how does the algorithm do that?

  • @renuroy6096
    @renuroy6096 4 years ago

    Can you please add a video on topic modelling and text summarization?

  • @haziq7885
    @haziq7885 3 years ago

    Is there a need to lemmatize or stem before we do word2vec? Thanks!

  • @ravikiran1284
    @ravikiran1284 5 years ago

    Please do a video on GloVe

  • @barax9462
    @barax9462 3 years ago

    I'm tasked with implementing w2v multi-categorical classification from scratch, but I'm confused about what exactly the network's inputs x1, x2, and x3 are. I mean, is x1 the 1st word in a document, or is it the 1st element in a word's embedding vector? For instance, if cat = [0.1, 0.8, 0.7], is x1 = 0.1? I'm really confused about this in general.

  • @Skandawin78
    @Skandawin78 4 years ago

    How do I do information extraction to grab sentences for a particular context from multiple websites? Can you point me to the right approach or source?

  • @prachigopalani5399
    @prachigopalani5399 4 years ago

    Sir, please upload GloVe embeddings and the BERT model

  • @ashok9588
    @ashok9588 4 years ago

    best one

  • @omernaeem1388
    @omernaeem1388 4 years ago

    Sir, please also explain how text embedding is done.

  • @rafsunahmad4855
    @rafsunahmad4855 3 years ago

    Make a video on GloVe

  • @nik7867
    @nik7867 4 years ago

    Is it basically following a percentile system over the vectors to find similar words?

  • @joyeetamallik5063
    @joyeetamallik5063 5 years ago

    Can you please explain what this "join" is for? I wanted to join but am not sure what these options are and how this works. It would be really great if you could explain. :-) Thanks

  • @vaibhavikumari384
    @vaibhavikumari384 3 years ago

    Sir, I am actually getting an error while executing:
    # Training the Word2Vec model
    model = Word2Vec(sentences, min_count=1)
    words = model.wv.vocab  # the error occurs on this line
    Please help

  • @kothapallysharathkumar9743
    @kothapallysharathkumar9743 5 years ago

    Could you please make a video on careers in the NLP domain, and where to start, like a curriculum?

  • @gyaan3101
    @gyaan3101 4 years ago

    During the preparation of the dataset I did this, sir; later, while training the word2vec model, the words in the output are showing up as individual letters... could you please help me out with this?
    corpus=[]
    sentences=nltk.sent_tokenize(paragraph)
    for i in range(len(sentences)):
        review=re.sub('[^a-zA-Z]',' ',sentences[i])
        review=review.lower()
        review=review.split()
        review=[word for word in review if word not in set(stopwords.words('english'))]
        review=' '.join(review)
        corpus.append(review)
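A likely cause of the single-letter output (an assumption, since the training call isn't shown in the comment): Word2Vec was given `corpus`, a list of plain strings, and iterating over a string yields individual characters. Each "sentence" must instead be a list of word tokens:

```python
# corpus as built above: cleaned sentences re-joined into plain strings.
corpus = ["good boy", "good girl"]

# What Word2Vec sees when handed a raw string: single characters.
print(list(corpus[0]))  # ['g', 'o', 'o', 'd', ' ', 'b', 'o', 'y']

# Fix: skip the ' '.join(...) step (or split again), so that each
# sentence is a list of word tokens.
tokenized = [review.split() for review in corpus]
print(tokenized[0])  # ['good', 'boy']
```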

  • @datasciencegyan5145
    @datasciencegyan5145 5 years ago +1

    After applying w2v, can we proceed to sentiment analysis using the selected words, since a sentence can have a huge number of words?

  • @sougataghosh4101
    @sougataghosh4101 3 years ago

    Hi Krish, if my text data is Vietnamese or Hebrew, which process is best to convert the text data to vectors?

  • @BalaguruGupta
    @BalaguruGupta 4 years ago

    Can you please do a tutorial on GloVe?

  • @niteshchotaliya8849
    @niteshchotaliya8849 5 years ago

    Do you provide classes?

  • @ArathiK-s8u
    @ArathiK-s8u 10 months ago

    Can anyone tell me why, in the preprocessing part, whitespace is removed twice?

  • @omfuke3083
    @omfuke3083 5 years ago

    Why didn't you use stemming and lemmatization instead of regex?

  • @akashacharya1046
    @akashacharya1046 3 years ago

    Sir, why didn't you remove the punctuation?

  • @starsailor984
    @starsailor984 4 years ago

    What is the floating-point number beside most_similar? Is it cosine similarity?
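Yes: gensim's most_similar ranks words by cosine similarity, the dot product of two vectors divided by the product of their lengths. A minimal pure-Python version of that score:

```python
import math

def cosine_similarity(u, v):
    """dot(u, v) / (|u| * |v|): ~1.0 for same direction, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0 (parallel vectors)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because the score depends only on direction, not magnitude, two words used in similar contexts score near 1.0 regardless of how often they appear.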

  • @shivanireddy4701
    @shivanireddy4701 4 years ago

    Can you please explain POS tagging? Thanks

  • @mohitkaushik3660
    @mohitkaushik3660 3 years ago

    Sir, I am facing a problem while installing gensim; please help me with that

  • @maxwellochieng4924
    @maxwellochieng4924 3 years ago

    So there is no stemming or lemmatization in W2V?

  • @paneercheeseparatha
    @paneercheeseparatha a year ago +1

    It's nice that he refers to "woman" as "human", because most people don't consider them human.

  • @deenasiva2829
    @deenasiva2829 4 years ago

    Sir, how do we extract keywords using Word2vec?

  • @GamingEver07
    @GamingEver07 4 years ago

    can we use word2vec in sentiment analysis?

  • @srijitasaharoy2228
    @srijitasaharoy2228 4 years ago

    How can I construct sent2vec from Facebook's word2vec model?

  • @pratyushkanojia3650
    @pratyushkanojia3650 3 years ago

    Using the same command, I am unable to import the gensim library

  • @harikrishnam3473
    @harikrishnam3473 4 years ago

    Hi Krish, I tried the same steps but got KeyError: "word 'infosys' not in vocabulary".
    Could you please guide me?

    • @salvindsouza7053
      @salvindsouza7053 4 years ago

      That word is not present in the vocab; check the spelling.

  • @mandarkulkarni823
    @mandarkulkarni823 4 years ago

    Hi @krish naik, can I make a payment of 299/- through GPay to join as a member and access live videos? The GPay ID is the one you gave in a previous playlist description.

    • @krishnaik06
      @krishnaik06 4 years ago

      No Mandar, you have to go through the YouTube channel itself; it is handled by YouTube

    • @mandarkulkarni823
      @mandarkulkarni823 4 years ago

      @@krishnaik06 Sir, I am facing a problem joining the channel; maybe the bank servers are down. If there is any other mode of payment, please let me know @krish naik

  • @satishm8316
    @satishm8316 3 years ago

    What do you mean by semantic data?

  • @manavmanoj3870
    @manavmanoj3870 a year ago

    Can anybody explain why the vector has exactly 100 dimensions?

  • @shivakrishnareddy5855
    @shivakrishnareddy5855 5 years ago

    Sir, can the "Data Protection Act" affect jobs in data science?

  • @sadabratakonar4219
    @sadabratakonar4219 4 years ago

    How do I change the number of dimensions of a word vector in word2vec?

  • @trexmidnite
    @trexmidnite 3 years ago

    Which particular stuff?

  • @qaiserali6773
    @qaiserali6773 4 years ago

    Focus on teaching students, not on history, and stop quoting such controversial statements.