Krish, thank you for investing so much time and effort into making all of these videos. I really appreciate it. These videos have greatly helped me jump-start my career in machine learning. I am now a full-time machine learning engineer at a startup and just wanted to mention that you were a huge help at the start of that journey. Cheers.
Congrats, mate. How did you apply to that startup?
This series has been so good. Sometimes, more than concept understanding, we need things in sequence for our mind to comprehend, and this series is in order. Thank you, Krish sir!
Krish, thank you for investing so much time and effort into making these videos. I really appreciate it. I love you!
Guruji, the way you explain things is excellent. I am grateful to you. May God grant you a long life.
Sir, hats off for your efforts. This is the best NLP tutorial.
Thank you very much, dear Krish. Well-done videos that simply explain complicated subjects.
Sir, you and your lectures both are great.
Thanks for making videos for us.
Amazing. Love how you teach from the basic level.
Really, sir, your way of teaching is nice and simple. Great work.
Really great tutorial. This was very helpful for me. Thank you very much. Please keep posting quality videos like this. Love from BD.
Amazing explanation of each and every line of code
7:46 If you notice carefully, only the nouns are getting lemmatized; the verbs are not. Won't that cause a generalization problem?
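A minimal sketch of what this commenter observed: NLTK's WordNetLemmatizer defaults to pos='n' (noun), so verbs pass through unchanged unless you pass the part of speech explicitly (this assumes the wordnet data has already been downloaded).

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('conquered'))           # 'conquered' -- treated as a noun by default
print(lemmatizer.lemmatize('conquered', pos='v'))  # 'conquer'   -- lemmatized once tagged as a verb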
Awesome explanation, Sir; I'm a fan of your explanations. Hats off!
Hi sir, your videos helped me a lot in understanding NLP basics. Thank you, sir; please create more useful videos like this.
Great work, Krish. Can you please make a video on text analytics using R? That would be a great help. Thanks.
Dear Krish, thanks for adding this wonderful tutorial. One doubt, though: what is the eventual outcome of preparing this BoW? Could you please add an extension to this tutorial showing how the BoW is put to productive use? For ease of understanding the practical application, can you add a real-world use case and explain how BoW solved a real-world problem or catered to a requirement?
Superb video for practice. Thanks!
Hi Krish, as you explained at 11:00, how do we decide whether it is a +ve or -ve sentence?
Bro, he meant that after we have the bag of words we can train a model and then test it, and then the model will tell us whether it's +ve or -ve.
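For anyone wondering what that looks like in code, here is a minimal sketch; the tiny corpus and labels are made up for illustration and are not from the video.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

corpus = ['loved the movie', 'great acting', 'boring and terrible', 'worst movie ever']
labels = [1, 1, 0, 0]  # hypothetical labels: 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)    # bag-of-words matrix, one row per sentence
model = MultinomialNB().fit(X, labels)  # train a classifier on the BoW features
print(model.predict(vectorizer.transform(['terrible movie'])))  # likely [0]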
Thank you. You are a great man.
Your videos are great! Thank you very much!
Thanks Krish! Super helpful!
Perfect explanation. Thanks for your effort :)
BoW may also suffer from the curse of dimensionality, isn't it? So what can we do about that? Is there any further improvement that mitigates the issue to some extent?
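Yes, the vocabulary (and hence the column count) grows with the corpus. One common mitigation, as a rough sketch, is CountVectorizer's max_features parameter, which keeps only the most frequent words; TF-IDF weighting or word embeddings go further.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ['the cat sat on the mat', 'the dog sat on the log']  # toy corpus
print(CountVectorizer().fit_transform(corpus).shape)                # (2, 7) -- every unique word is a column
print(CountVectorizer(max_features=4).fit_transform(corpus).shape)  # (2, 4) -- only the 4 most frequent words kept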
It's a great video, Krish. Keep it going!
But why don't you use spaCy for NLP? I feel it is faster than NLTK.
Hi Krish,
Thanks for the informative video series.
A quick question, though: if we do stopword removal and 'not' gets removed, doesn't it completely change the meaning?
E.g. 'We have not conquered anyone' vs. 'conquered anyone'; these mean very different things.
How do we tackle negation words?
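One common workaround, as a minimal sketch: drop the negation words from the stopword set before filtering (this assumes the NLTK stopwords corpus is downloaded).

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english')) - {'not', 'no', 'nor'}  # keep negations
sentence = 'we have not conquered anyone'
print([w for w in sentence.split() if w not in stop_words])
# ['not', 'conquered', 'anyone'] -- the negation survives filtering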
This is really good. One question: if stemming always gives 'history' as 'histori' and other meaningless words, why do we even do it? Anyway, lemmatization does the job, so why can't we directly do that?
Because the meaning of that word doesn't make a difference in the implementation. 'Historical' and 'history' will both be stemmed to 'histori' and counted as one entity while vectorizing.
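A minimal sketch of the trade-off: stemming is a fast rule-based chop whose output need not be a real word, while lemmatization returns dictionary forms but is slower and needs the WordNet data.

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
for word in ['history', 'historical', 'finally']:
    print(word, '->', stemmer.stem(word), '|', lemmatizer.lemmatize(word))
# history    -> histori | history
# historical -> histor  | historical
# finally    -> final   | finally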
I had a question: in the last table we saw the number of occurrences of words in each sentence. How can we know which column represents which word? Will the columns be ordered by descending occurrence frequency?
The columns have nothing to do with frequency order. CountVectorizer creates a map of all the unique words in the corpus. Here in the example, 144 words are unique across the 31 sentences, so the matrix size is 31 x 144. The map is represented like {1: word1, 2: word2, ..., 144: word144}, and while creating the vector for a sentence it builds an array of size 144; if the word at an index is present it writes that word's frequency, and if not it writes 0.
@@Ajay-ku5fn 114*
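To make the column-to-word mapping concrete, here is a minimal sketch: CountVectorizer sorts the vocabulary alphabetically, and you can inspect the mapping directly.

from sklearn.feature_extraction.text import CountVectorizer

corpus = ['good movie', 'bad movie', 'good acting']  # toy corpus
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())  # ['acting' 'bad' 'good' 'movie'] -- alphabetical column order
print(vectorizer.vocabulary_)              # word -> column index, e.g. {'good': 2, ...}
print(X.toarray())                         # rows align with those columns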
CountVectorizer by default removes punctuation and lowercases the text. Then why are we doing it separately? Please respond.
I believe it was just for demonstration purposes: it is a good habit to lowercase and remove punctuation, and it also demonstrates the functionality of CountVectorizer().
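A minimal sketch confirming those defaults: lowercase=True and the default token pattern already strip punctuation, so raw text works out of the box.

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()  # lowercase=True by default
X = vectorizer.fit_transform(['Hello, World!', 'hello world'])
print(vectorizer.get_feature_names_out())  # ['hello' 'world'] -- punctuation and case are gone
print(X.toarray())                         # [[1 1] [1 1]] -- both rows identical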
Could you please make a video on Latent Dirichlet Allocation and how TF-IDF + LDA work together?
It was really helpful. Can you make videos on grammar correction using rule-based methods, language models, and classifiers?
It is okay, I resolved my problem. Thanks.
Thank you, sir ❤️🔥
Can you please make a video on the regular expressions library?
Thank you so much, Krish.
Thank you, Krish sir.
Very good explanation.
Very well explained. I still have a doubt: how are .lower() and .split() helping to clean the text? Can anyone please explain?
As I see it, if you have words like 'Good' and 'good', it makes no sense to treat them as two different words. There is one more library under NLTK, VADER, where you are not recommended to lowercase, because for VADER 'GREAT' and 'great' convey different levels of excitement in the sentence.
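A minimal sketch of what those two calls do in the cleaning loop: .lower() collapses case variants into one token, and .split() breaks the string into words so stopwords can be filtered one by one.

text = 'Good movie GOOD acting good story'
print(set(text.split()))          # {'Good', 'GOOD', 'good', ...} -- three variants of the same word
print(set(text.lower().split()))  # {'good', 'movie', 'acting', 'story'} -- collapsed into one token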
Hi sir,
I tried following along; I'm new to the Spyder IDE and could not see the NumPy bag-of-words matrix after running the code. Can someone help? Thank you.
Thank you, Sir.
Thank you so much, sir!
Well explained!!
Lemmatization 6:40
Hi Krish, great video. Can you please explain why we used fit_transform instead of fit, and what the difference is between fit, transform, and fit_transform?
It's actually very simple. I assume you know what fit and transform do separately; fit_transform just performs both actions at the same time.
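A minimal sketch of the difference: fit() learns the vocabulary, transform() applies it, and fit_transform() does both in one call; on unseen (test) data you call only transform() so the learned vocabulary is reused.

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(['good movie', 'bad movie'])  # fit + transform in one step
X_test = vectorizer.transform(['good acting'])                   # reuse the same vocabulary
print(X_test.toarray())  # [[0 1 0]] -- 'acting' is ignored, it was never in the training vocabulary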
For the if condition, both 'if word not in set' and 'if not word in set' work. How is it executing for both versions of the code? Please explain.
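A minimal sketch showing the two spellings are equivalent: Python parses 'not word in s' as 'not (word in s)', which is the same membership test as 'word not in s' ('not in' is just the idiomatic form).

stop_words = {'the', 'a', 'an'}
word = 'history'
print(word not in stop_words)  # True -- idiomatic form
print(not word in stop_words)  # True -- parsed as not (word in stop_words), same result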
Can you please add numbering to the videos in the playlist? That would be useful to us.
Please use a better mic; the audio quality has become quite poor. The content, meanwhile, is perfect. Keep going!
Hi Krish Naik, how do we find your data on GitHub?
Nice videos, Krish. Can you please make a video on how to get data from the web (Google Reviews etc.) using Python?
I guess you have to use web scraping.
Is Spyder a better IDE than Jupyter?
I have a problem with this section: when I write review = re.sub('[a-zA-Z]', ' ', sentences[i]), all the following steps contain only stopwords. Please, I need an explanation or some help. Thanks.
Use the exception symbol ^: the pattern should be '[^a-zA-Z]', so that everything except letters is replaced.
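A minimal sketch of the fix: '[a-zA-Z]' matches the letters themselves and replaces them, while '[^a-zA-Z]' (with the caret) inverts the class so only non-letters are replaced.

import re

sentence = 'We have not conquered anyone!'
print(repr(re.sub('[a-zA-Z]', ' ', sentence)))   # letters wiped out, only spaces and '!' remain
print(repr(re.sub('[^a-zA-Z]', ' ', sentence)))  # 'We have not conquered anyone ' -- letters kept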
Hi Krish, I really enjoyed this playlist. Could you also help with the concepts of syntactic processing?
Thanks in advance!
Sir, can you tell me why we used the toarray function? We already get the sparse matrix from vectorization, and toarray seems to represent the same matrix. Is there any use of toarray here?
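A minimal sketch of the difference: fit_transform returns a SciPy sparse matrix that stores only the non-zero entries, and .toarray() converts it to a dense NumPy array, which is easier to inspect.

from sklearn.feature_extraction.text import CountVectorizer

X = CountVectorizer().fit_transform(['good movie', 'bad movie'])
print(type(X))      # a scipy.sparse matrix -- only non-zero entries are stored
print(X)            # coordinate-style printout of the non-zero entries
print(X.toarray())  # [[0 1 1] [1 0 1]] -- the full dense matrix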
Nice explanation, bro.
Thanks a lot, Sir.
I want more videos on NLP and deep learning.
Thank you so much.
Can you post a video on web scraping?
After creating X, why are the 1s, 0s, and other numbers showing in different colors? Is it just for representation?
@@datasciencegyan5145 I think that it is automatically coloured that way in Spyder
It's just for representation, so that we can spot the 1s and 0s easily.
Sir, you forgot to explain max_features in CountVectorizer.
How is the dimension (31, 114)? Can you please explain?
31 is the total number of sentences and 114 is the number of unique words.
Nice explanation, sir, but when I implement your code it shows some error. Can anyone help me, please?
What error are you getting?
Hello Krish, if we convert words from upper case to lower case, there will be situations where 'US' and 'us' become the same. How do we handle such situations?
My suggestion would be to create a list of words to be excluded from lowercasing.
As far as I know, it's better not to use short forms, so 'USA' makes more sense than 'US'. Also, the word 'us' is less likely to make any impact; it will be removed by the stopword filter.
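A minimal sketch of the whitelist suggestion above (the KEEP_CASE set and the helper function are hypothetical, purely for illustration).

KEEP_CASE = {'US', 'IT', 'AI'}  # hypothetical whitelist of acronyms to preserve

def selective_lower(text):
    # lowercase every token except the whitelisted acronyms
    return ' '.join(w if w in KEEP_CASE else w.lower() for w in text.split())

print(selective_lower('The US economy surprised us'))  # 'the US economy surprised us'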
👏👏👏👏👏👏👏👏
I don't know if it is my NLTK version, but I don't see a difference between lemmatization and stemming; both return the same thing for me. Thanks.
Hello Krish!
Thanks for uploading and explaining in great detail. I have a question: let's say we have around 500 text messages or paragraphs; how do we go about processing them? Is there any way? Please reply.
Have a look at this link: www.kaggle.com/parulpandey/getting-started-with-nlp-a-general-intro. I hope it helps you.
❤
You sound like MSD
🤝🤝🤝🤝🤝👌👌👌👌👌