Natural Language Processing | Bag of Words

  • Published 9 Jan 2025

COMMENTS • 92

  • @oshtontsen5428
    @oshtontsen5428 4 years ago +27

    Krish, thank you for investing so much time and effort into making all of these videos . I really appreciate it. These videos have greatly helped me jump-start my career in machine learning. I am now a full-time machine learning engineer at a startup and just wanted to mention that you were a huge help in the start of that journey. Cheers.

    • @pranayp1950
      @pranayp1950 4 years ago

      Congrats mate. How did you apply to that startup?

  • @kunalkumar2717
    @kunalkumar2717 3 years ago +7

    This series has been so good. Sometimes, more than understanding concepts, we need things in sequence so our mind can comprehend them. This is in order. Thank you, Krish sir!

  • @nahidzeinali1991
    @nahidzeinali1991 5 months ago +1

    Krish, thank you for investing so much time and effort into making these videos. I really appreciate it. I love you!

  • @ashishbomble8547
    @ashishbomble8547 4 years ago

    Guruji, the way you explain things is excellent. I am grateful to you. May God bless you with a long life.

  • @nasreenbanu2245
    @nasreenbanu2245 2 years ago

    Sir, hats off for your efforts. This is the best NLP tutorial.

  • @amosavi4730
    @amosavi4730 2 years ago

    Thank you very much, dear Krish. Well-done videos; you explain complicated subjects simply.

  • @padhiyarkunalalk6342
    @padhiyarkunalalk6342 5 years ago +2

    Sir, you and your lectures both are great.
    Thanks for making videos for us.

  • @glenn8781
    @glenn8781 3 years ago +1

    Amazing. Love how you teach from the basic level

  • @saurabhbadave6447
    @saurabhbadave6447 5 years ago

    Really, sir, your way of teaching is nice and simple. Great work.

  • @nasaruddin36
    @nasaruddin36 5 years ago

    Really great tutorial. This was very helpful for me. Thank you very much. Please keep posting quality videos like this. Love from BD.

  • @gh504
    @gh504 2 years ago

    Amazing explanation of each and every line of code

  • @rahuldey6369
    @rahuldey6369 3 years ago +2

    7:46 If you notice carefully, only the nouns are getting lemmatized; the verbs are not. Won't that cause a generalization problem?
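
    A quick illustration of the behaviour raised above (not from the video): NLTK's WordNetLemmatizer assumes the noun part of speech by default, so verb forms pass through unchanged unless pos='v' is supplied. A minimal sketch, assuming nltk and its wordnet corpus are installed:

        from nltk.stem import WordNetLemmatizer

        lemmatizer = WordNetLemmatizer()

        # Default POS is 'n' (noun), so the verb form is left untouched
        print(lemmatizer.lemmatize("conquered"))            # -> conquered
        # Passing pos='v' tells WordNet to treat the token as a verb
        print(lemmatizer.lemmatize("conquered", pos="v"))   # -> conquer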

  • @akhilvarma5708
    @akhilvarma5708 5 years ago

    Awesome explanation, sir. I'm a fan of your explanations. Hats off.

  • @manuchowdary7848
    @manuchowdary7848 4 years ago

    Hi sir, your videos helped me a lot to understand NLP basics. Thank you, sir. Please create more useful videos like this.

  • @vikrantnag86
    @vikrantnag86 5 years ago +4

    Great work, Krish. Can you please make a video on text analytics using R? That would be a great help. Thanks.

  • @saratht8223
    @saratht8223 5 months ago

    Dear Krish, thanks for adding this wonderful tutorial. One doubt, though: what is the eventual outcome of preparing this BoW? Could you please add an extension to this tutorial showing how the BoW is put to productive use? For ease of understanding the practical application, can you add a real-world use case and explain how BoW solved a real-world problem or catered to a requirement?

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Superb video for practice. Thanks.

  • @RaviKumar-sw1wc
    @RaviKumar-sw1wc 5 years ago +2

    Hi Krish, as you explained at 11:00, how do we decide whether it is a positive or negative sentence?

    • @cristianovivk4935
      @cristianovivk4935 4 years ago

      Bro, he meant that once we have the bag of words we can train a model and then test it, and the model will tell us whether it's positive or negative.
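
    To make the reply above concrete, here is a hedged sketch of how a BoW matrix and sentiment labels could feed a simple classifier. The toy corpus and its labels are purely illustrative, not from the video:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        # Hypothetical toy data: 1 = positive, 0 = negative
        corpus = ["a good movie", "what a great film", "a terrible movie", "really bad acting"]
        y = [1, 1, 0, 0]

        cv = CountVectorizer()
        X = cv.fit_transform(corpus)        # bag-of-words matrix

        clf = MultinomialNB().fit(X, y)     # train on the BoW vectors
        print(clf.predict(cv.transform(["good acting"])))   # model outputs a +ve/-ve label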

  • @mathketeer
    @mathketeer 1 year ago

    Thank you. You are a great man.

  • @balive053
    @balive053 3 years ago

    Your videos are great! Thank you very much!

  • @aadilraf
    @aadilraf 3 years ago

    Thanks Krish! Super helpful!

  • @seyitahmetozturk721
    @seyitahmetozturk721 3 years ago

    perfect explanation. thanks for your effort :)

  • @iEntertainmentFunShorts
    @iEntertainmentFunShorts 4 years ago +2

    BoW may also suffer from the curse of dimensionality, isn't it? So what can we do about that? Is there any further improvement that mitigates the issue to some extent?

  • @MechiShaky
    @MechiShaky 5 years ago +2

    It's a great video, Krish, keep it going.
    But why don't you use spaCy for NLP? I feel it is faster than NLTK.

  • @siddharthamahendra4980
    @siddharthamahendra4980 2 years ago +1

    Hi Krish,
    Thanks for the informative video series.
    A quick question, though: if we do stopword removal and 'not' gets removed, doesn't it completely change the meaning?
    For example, 'We have not conquered anyone' becomes 'conquered anyone',
    which means something very different.
    How do we tackle negation words?
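
    One common workaround for the negation issue raised above is to keep words like 'not' out of the stopword list before filtering. A minimal sketch, assuming NLTK's English stopword list is downloaded; the set of negation words kept here is just an example:

        from nltk.corpus import stopwords

        # Start from NLTK's English stopwords, but keep negation words
        negations = {"not", "no", "nor"}
        stop_words = set(stopwords.words("english")) - negations

        sentence = "we have not conquered anyone"
        filtered = [w for w in sentence.split() if w not in stop_words]
        print(filtered)   # ['not', 'conquered', 'anyone'] - the negation survives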

  • @ganeshsubramanian6217
    @ganeshsubramanian6217 3 years ago

    This is really good. One question: if stemming always gives 'history' as 'histori' and other meaningless words, why do we even do it? Lemmatization does the job anyway... why can't we just do that directly?

    • @ArshdeepSingh..
      @ArshdeepSingh.. 2 years ago

      Because the meaning of the word doesn't make a difference in the implementation.
      'Historical' and 'history' will both be stemmed to 'histori' and counted as one entity while vectorizing.
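
    To illustrate the trade-off discussed above, here is a small side-by-side sketch of NLTK's PorterStemmer and WordNetLemmatizer (assuming the wordnet corpus is available): the stemmer is fast but produces non-words, while the lemmatizer returns dictionary forms.

        from nltk.stem import PorterStemmer, WordNetLemmatizer

        stemmer = PorterStemmer()
        lemmatizer = WordNetLemmatizer()

        for word in ["history", "historical", "finally"]:
            # Stemming chops suffixes; lemmatization maps to a dictionary form
            print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word))
        # history    -> histori | history
        # historical -> histor  | historical
        # finally    -> final   | finally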

  • @arijitdiganto4166
    @arijitdiganto4166 3 years ago +2

    I had a question:
    in the last table we saw the number of occurrences of words in a sentence,
    but how can we know which column represents which word?
    Will the columns be ordered by descending occurrence frequency?

    • @Ajay-ku5fn
      @Ajay-ku5fn 3 years ago +2

      The columns have nothing to do with frequency order. CountVectorizer creates a map of all the unique words in the corpus. Here in the example, 144 words are unique across the 31 sentences, so the matrix size is 31x144. The map is represented like {{1, word1}, {2, word2}, ..., {144, word144}}, and while creating the vector for a sentence it builds an array of size 144: if the word at an index is present it writes its frequency, otherwise it writes 0 at that position.

    • @RaviKumar-mu4ne
      @RaviKumar-mu4ne 1 year ago

      @@Ajay-ku5fn 114*
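
    A hedged sketch of how the column-to-word mapping described above can be inspected: CountVectorizer stores it in its vocabulary_ attribute (word -> column index), and get_feature_names_out() lists the words in column order, which is alphabetical rather than frequency-based. The tiny corpus here is illustrative only:

        from sklearn.feature_extraction.text import CountVectorizer

        corpus = ["the food was good", "the service was not good"]
        cv = CountVectorizer()
        X = cv.fit_transform(corpus)

        print(cv.vocabulary_)               # word -> column index, e.g. {'the': 4, ...}
        print(cv.get_feature_names_out())   # words in column order (get_feature_names() on older sklearn)
        print(X.toarray())                  # one row per sentence, one column per word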

  • @ammarahemadkhan8570
    @ammarahemadkhan8570 4 years ago +3

    CountVectorizer by default removes punctuation and lowercases the text. Then why are we doing it separately? Please respond.

    • @tejashshah5202
      @tejashshah5202 4 years ago

      I believe it was just for demonstration purposes: it is a good habit to lowercase and remove punctuation yourself, and it also demonstrates the functionality of CountVectorizer().
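
    For reference, a small sketch of the defaults mentioned above: CountVectorizer lowercases its input (lowercase=True), and its default token pattern keeps only word tokens of two or more characters, which drops punctuation. The explicit cleaning step in the video mainly makes those operations visible:

        from sklearn.feature_extraction.text import CountVectorizer

        cv = CountVectorizer()   # lowercase=True, token_pattern=r"(?u)\b\w\w+\b" by default
        cv.fit(["Good movie!!! GOOD, good."])
        print(cv.get_feature_names_out())   # ['good', 'movie'] - lowercased, punctuation gone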

  • @REDROSE-be3br
    @REDROSE-be3br 2 years ago

    Could you please make a video on Latent Dirichlet Allocation and how TF-IDF + LDA work together?

  • @harikrishnanm5109
    @harikrishnanm5109 4 years ago

    It was really helpful. Can you make videos on grammar correction using rule-based methods, language models & classifiers?

  • @sandrasandji6620
    @sandrasandji6620 4 years ago

    It is okay, I resolved my problem. Thanks.

  • @amruthasankar3453
    @amruthasankar3453 1 year ago

    Thank you, sir ❤️🔥

  • @ayushsingh-qn8sb
    @ayushsingh-qn8sb 4 years ago +1

    Can you please make a video on the regular expression library?

  • @datascience3008
    @datascience3008 3 years ago

    Thank you so much, Krish.

  • @suvarnadeore8810
    @suvarnadeore8810 3 years ago

    Thank you, Krish sir.

  • @chetanmundhe8619
    @chetanmundhe8619 4 years ago

    Very good explanation.

  • @shindepratibha31
    @shindepratibha31 4 years ago +1

    Very well explained. I still have a doubt: how do .lower() and .split() help to clean the text? Can anyone please explain?

    • @mosart03
      @mosart03 4 years ago +5

      As I see it, if you have words like "Good" and "good", it doesn't make sense to treat them as two different words. There is one more library under nltk, VADER, where you are not recommended to lowercase, because for VADER "GREAT" and "great" convey different levels of excitement in the sentence.

  • @lakshmisuvarchalasarva7942
    @lakshmisuvarchalasarva7942 3 years ago

    Hi sir,
    I tried to follow along; I'm new to the Spyder IDE and could not see the NumPy bag-of-words array after running the code. Can someone help? Thank you.

  • @jinks6887
    @jinks6887 3 years ago

    Thank you, Sir

  • @gauravsahani2499
    @gauravsahani2499 4 years ago +1

    Thank you so much, sir!

  • @rajeshwarsehdev2318
    @rajeshwarsehdev2318 5 years ago

    Well explained!!

  • @akhandpratap__
    @akhandpratap__ 4 years ago

    Lemmatization 6:40

  • @navaneethansuresh8680
    @navaneethansuresh8680 5 years ago

    Hi Krish, great video. Can you please explain why we used fit_transform instead of fit, and what the difference is between fit, transform and fit_transform?

    • @cristianovivk4935
      @cristianovivk4935 4 years ago

      It's actually very simple. I assume you know what fit and transform do separately... both actions are done at the same time in fit_transform.
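
    A minimal sketch of the distinction discussed above: fit() learns the vocabulary from the training corpus, transform() applies that learned vocabulary to any text, and fit_transform() does both in a single call on the same data. The toy corpus is illustrative:

        from sklearn.feature_extraction.text import CountVectorizer

        train = ["good food", "bad food"]
        test  = ["good service"]

        cv = CountVectorizer()
        X_train = cv.fit_transform(train)    # fit (learn the vocabulary) + transform in one step
        # Equivalent to: cv.fit(train); X_train = cv.transform(train)

        X_test = cv.transform(test)          # reuse the learned vocabulary; 'service' is unseen, so ignored
        print(cv.get_feature_names_out())    # ['bad', 'food', 'good']
        print(X_test.toarray())              # [[0 0 1]]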

  • @saicharanreddyy.p6873
    @saicharanreddyy.p6873 4 years ago

    For the if condition, we can write either
    if word not in set or
    if not word in set.
    How does it execute for both forms? Please explain.
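
    A small sketch answering the question above: in Python, "word not in s" and "not word in s" perform the same membership test, because "in" binds more tightly than "not"; the first spelling is the idiomatic one.

        stop_words = {"the", "is", "a"}
        word = "conquered"

        # Both expressions evaluate the same membership test
        print(word not in stop_words)     # True
        print(not word in stop_words)     # True - parsed as not (word in stop_words)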

  • @hariprasad1744
    @hariprasad1744 4 years ago

    Can you please number the videos in the playlist? That would be useful to us.

  • @monarchbaweja
    @monarchbaweja 3 years ago

    Please use a better mic; the audio quality has become quite poor. The content, meanwhile, is perfect. Keep going.

  • @kirankumar-sn4db
    @kirankumar-sn4db 3 years ago

    Hi Krish Naik, how do I find your data on GitHub?

  • @bhargavreddy588
    @bhargavreddy588 5 years ago

    Nice videos, Krish. Can you please make a video on how to get data from the web (Google Reviews etc.) using Python?

    • @krishnaik06
      @krishnaik06 5 years ago +1

      I guess you have to use web scraping.

  • @vijaysista3894
    @vijaysista3894 4 years ago

    Is Spyder a better IDE than Jupyter?

  • @sandrasandji6620
    @sandrasandji6620 4 years ago

    I have a problem with this section: when I write review = re.sub('[a-zA-Z]',' ',sentences[i]), all the following steps contain only stopwords. Please, I need an explanation or help. Thanks.

    • @praja110
      @praja110 4 years ago

      Use the negation symbol ^ inside the character class.
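
    To make the fix above concrete: the pattern in the question matches the letters themselves and wipes them out, whereas the tutorial's pattern negates the character class with ^ so that everything except letters is replaced. A minimal sketch:

        import re

        sentence = "We have not conquered anyone!!"

        # '[a-zA-Z]' matches the letters, so they are all replaced by spaces
        print(re.sub('[a-zA-Z]', ' ', sentence))    # only spaces and '!!' remain
        # '[^a-zA-Z]' matches everything EXCEPT letters, which is what we want
        print(re.sub('[^a-zA-Z]', ' ', sentence))   # 'We have not conquered anyone  '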

  • @rohandawar484
    @rohandawar484 4 years ago

    Hi Krish, I really enjoyed this playlist. Could you also help with the concepts of syntactic processing?
    Thanks in advance!

    • @akash-lz2dq
      @akash-lz2dq 4 years ago

      Sir, can you tell me why we used the toarray function? We already get a sparse matrix from vectorization, and toarray represents the same matrix. Is there any use for toarray here?
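
    A brief sketch of the point raised above: fit_transform() returns a SciPy sparse matrix that stores only the non-zero counts, and .toarray() converts it into a dense NumPy array holding the same values, which is easier to inspect in an editor like Spyder (at the cost of memory on large corpora). The two-sentence corpus is illustrative:

        from sklearn.feature_extraction.text import CountVectorizer

        cv = CountVectorizer()
        X = cv.fit_transform(["good food", "bad food"])

        print(type(X))             # SciPy sparse matrix: only non-zero entries are stored
        print(X.toarray())         # same counts as a dense NumPy array: [[0 1 1], [1 1 0]]
        print(X.toarray().shape)   # (2, 3): 2 sentences x 3 unique words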

  • @satishvavilapalli24
    @satishvavilapalli24 5 years ago

    Nice explanation, bro.

  • @talharauf3111
    @talharauf3111 3 years ago

    Thanks a lot, Sir.

  • @sathishkumar-kp4hk
    @sathishkumar-kp4hk 4 years ago

    Want more videos on NLP and deep learning.

  • @debatradas9268
    @debatradas9268 3 years ago

    Thank you so much.

  • @vinaymn3602
    @vinaymn3602 5 years ago

    Can you post a video on web scraping?

  • @datasciencegyan5145
    @datasciencegyan5145 5 years ago

    After creating X, why are 1, 0 and the other numbers shown in different colors?

    • @datasciencegyan5145
      @datasciencegyan5145 5 years ago

      Is it just for representation?

    • @RejoicingKrishna
      @RejoicingKrishna 5 years ago

      ​@@datasciencegyan5145 I think that it is automatically coloured that way in Spyder

    • @cristianovivk4935
      @cristianovivk4935 4 years ago

      It's just for representation... so that we can spot the 1's and 0's easily.

  • @mohakgangwani
    @mohakgangwani 4 years ago

    Sir, you forgot to explain max_features in CountVectorizer.
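
    Since max_features is mentioned above but not demonstrated in this section, here is a hedged sketch of what it does: it keeps only the N most frequent terms in the corpus, capping the width of the BoW matrix and mitigating the dimensionality concern raised in an earlier comment. The toy corpus is illustrative:

        from sklearn.feature_extraction.text import CountVectorizer

        corpus = ["good food good service", "bad food", "good ambience"]

        # Keep only the 2 most frequent words across the whole corpus
        cv = CountVectorizer(max_features=2)
        X = cv.fit_transform(corpus)

        print(cv.get_feature_names_out())   # the 2 highest-frequency terms: ['food', 'good']
        print(X.toarray().shape)            # (3, 2) instead of (3, number_of_unique_words)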

  • @ganeshrajv130
    @ganeshrajv130 4 years ago

    How is the dimension (31, 114)? Can you please explain?

    • @cristianovivk4935
      @cristianovivk4935 4 years ago +1

      31 is the total number of sentences and 114 is the number of unique words.

  • @chinmaya007
    @chinmaya007 5 years ago

    Nice explanation, sir... but when I implement your code it shows an error... can anyone help me, please!!

  • @bijaynayak6473
    @bijaynayak6473 5 years ago

    Hello Krish, if we convert words from upper to lower case, there will be situations where 'US' and 'us' end up meaning the same thing. How do we handle such situations?

    • @rajsinghmaan3095
      @rajsinghmaan3095 5 years ago

      My suggestion would be to create a list of words to be excluded from lowercasing.

    • @cristianovivk4935
      @cristianovivk4935 4 years ago

      As far as I know, it's better not to use short forms, so 'USA' instead of 'US' makes more sense. Also, the word 'us' is unlikely to make much impact... it will be removed by the stopwords step.

  • @naderbouchnag3
    @naderbouchnag3 2 years ago

    👏👏👏👏👏👏👏👏

  • @sandrasandji6620
    @sandrasandji6620 4 years ago

    I don't know if it is my NLTK version, but I don't see a difference between lemmatization and stemming; both return the same thing. Thanks.

  • @learn-with-lee
    @learn-with-lee 4 years ago

    Hello Krish!
    Thanks for uploading and explaining in great detail. I have a question: let's say we have around 500 text messages or paragraphs, how do we go about it? Is there any way? Please reply.

    • @08ae6013
      @08ae6013 4 years ago

      www.kaggle.com/parulpandey/getting-started-with-nlp-a-general-intro ... have a look at this link. I hope it helps you

  • @indirajithkv7793
    @indirajithkv7793 2 years ago

  • @darsh727
    @darsh727 4 years ago

    You sound like MSD

  • @padhiyarkunalalk6342
    @padhiyarkunalalk6342 5 years ago

    🤝🤝🤝🤝🤝👌👌👌👌👌