Text Representation | NLP Lecture 4 | Bag of Words | Tf-Idf | N-grams, Bi-grams and Uni-grams
- Published 2 Jul 2024
- In natural language processing, text representation plays a vital role in capturing the meaning and structure of textual data. This video explores three fundamental text representation techniques: Bag of Words, Tf-Idf (Term Frequency-Inverse Document Frequency), and N-grams (Uni-grams and Bi-grams). Each method has its unique approach to encoding and extracting information from text, making it essential for data scientists and NLP enthusiasts to grasp these concepts.
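For readers who want to try the three techniques described above, here is a minimal sketch using scikit-learn; the corpus is illustrative only, not taken from the video:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["people watch campusx",
          "campusx watch campusx",
          "people write comment"]

# Bag of Words: each document becomes a vector of raw term counts
bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)          # sparse matrix, shape (3, vocab_size)

# Bi-grams: count pairs of consecutive words instead of single words
bigram = CountVectorizer(ngram_range=(2, 2))
X_bi = bigram.fit_transform(corpus)

# TF-IDF: down-weights terms that appear in many documents
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)

print(sorted(bow.vocabulary_))     # unigram vocabulary
print(sorted(bigram.vocabulary_))  # bigram vocabulary
```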
Assignment - colab.research.google.com/dri...
============================
Do you want to learn from me?
Check my affordable mentorship program at : learnwith.campusx.in
============================
📱 Grow with us:
CampusX' LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
✨ Hashtags✨
#TextRepresentation #BagOfWords #TfIdf #NGrams #NLP #DataScience #machinelearning
⌚Time Stamps⌚
00:00 - Intro
01:10 - Plan of Attack
02:56 - Introduction
03:25 - What is feature extraction from text?
04:49 - Why do we need feature extraction?
07:30 - Why is this difficult to do?
11:00 - What is the core idea behind this?
12:12 - What are the Techniques?
14:24 - Common Terms
18:00 - One Hot Encoding
33:25 - Bag of Words
57:45 - N-grams/Bi-grams/Tri-grams
01:13:45 - Benefits of N Grams
01:14:25 - Disadvantages N Grams
01:16:34 - Tf-Idf
01:38:46 - Custom Features
01:41:45 - Assignment
This is one of the legendary videos I have seen. I'm into SEO and trying to wrap my head around semantic SEO. Some experts in the semantic SEO industry use technical jargon and fail to explain how semantic engines like Google work. But your series helped me understand every single bit of it. I don't know Python coding, but now I understand how Google's algorithms work to rank a document. I understand the kind of computation they do behind the scenes.
The video is pure gold! I mean it! This helps me as a search engine optimiser and makes me better understand machine and human interaction. Thank you so much 🙏 ☺️
Oh brother... unbelievable... 2 hours of content... genuinely, heartfelt thanks to you, sir.
This playlist is essential for NLP engineers, from basic to advanced. Please upload the complete series, sir. Your contribution is life-saving.
Please carry on this series; we would like to learn advanced NLP using deep learning/language models and SOTA techniques once basic NLP is done.
Huge respect sir, You deserve more than million followers. Love from Nepal ❤️❤️❤️❤️
I was confused about selecting a word embedding technique for my FYP project and found this video life-saving. Thank you so much!!!
I cannot find any reason not to like this video. It's amazing!
Thank you so much. You explain the concepts in a very simple way. Once again, thank you so much 🙂🙂🙂🙂
For me You are the best data science teacher. ❤❤❤❤❤
Thank you so much, sir, for a wonderful explanation. Hats off to you always!
I've watched so many tutorials, but you are the best, sir.
the best tutorial i found from you.
Respect for your hardwork
I am following your ML playlist, sir; you have great explanations. Sir, please complete XGBoost and the DBSCAN algorithm in this playlist, and please start a series on deep learning.
I'm following it as well
Will do it in January
@@campusx-official woaahhh! Thank you sooooo much! This made my day! 🥳🥳🥳
@@campusx-official which year january sir ?
@@campusx-official Hi sir, based on your example, using trigrams the vocabulary decreases to 5, so I don't follow the part where you said the vocabulary increases as the n in n-gram increases.
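The vocabulary-size behaviour can be checked directly. A minimal sketch with scikit-learn's CountVectorizer on a tiny illustrative corpus (an assumption here, similar in spirit to the lecture's example):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["people watch campusx",
          "campusx watch campusx",
          "people write comment"]

sizes = {}
for n in (1, 2, 3):
    cv = CountVectorizer(ngram_range=(n, n))
    cv.fit(corpus)
    sizes[n] = len(cv.vocabulary_)
    print(f"{n}-gram vocabulary size: {sizes[n]}")

# Each 3-word sentence yields 3 unigrams, 2 bigrams but only 1 trigram,
# so on a tiny corpus the vocabulary can shrink as n grows. On a large
# corpus, the number of distinct n-grams usually explodes instead, which
# is the situation the lecture's statement refers to.
```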
Teaching style awesome... Go ahead.
Almost 2 hours...Respect!
Superb explanations !
It was awesome .. love you sir... thanx for your efforts
Best series on NLP
Congratulations sir for 10K Subscribers.
This channel is highly underrated...
Thank you for another great session
Thanks a lot, sir. My PhD is on NLP, and your videos help me a lot in understanding the overall concepts. Your efforts are very sincere and dedicated 💯
I am getting started with NLP :) I am still doing my UG can you tell me your experience in the field?
@@forgotabhi It's an amazing field, and day by day you will come to know new models and architectures.
@@forgotabhi which college?
Your explanation is so great!! And that too in Hindi. Thanks a lot!!💙
Awesome content and explanation sir 👐
Congratulations on 100K subscribers in advance.
Incredible ❤️
Thank you very much Nitish sir!
Thank you, sir... really, for explaining it in such an easy way.
I love the way sir explains; I am now able to grasp the fundamental concepts, but I can't imagine coding for NLP without any guidance. Any suggestions on what other materials and sources I should follow?
Sir..thank you for the wonderful video
Congratulations on 36K subs, soon we gonna cross 100K IA :)
Very nice and a very good starting point. Can you please suggest which text representation algorithm is suited for log analysis?
Very good flow sir . Kindly upload next in NLP series
KUDOS TO YOU SIR ..............loving this series
awesome video
Appreciate your efforts
At 55:24, BoW doesn't consider the sequence of the sentence, but since we perform tokenization before this, we lose some words that would mess up the sequence anyway. Isn't that right?
Great Video Sir.
Hello sir, I had doubts on this topic about how the conversion takes place; I watched lots of videos and read lots of blogs, but no one could make me understand it like you did. Hats off to you, keep up the great work.
wow , after watching this video, I am confident on feature engineering
For TF-IDF, the term "campusx" appeared 4 times, but sir, you counted it only thrice; any reason for that? Also, in practice we get a +1 there. Could you please reply to this?
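On the "+1": scikit-learn's TfidfVectorizer smooths the IDF by default, which is why its values differ from the blackboard formula. A small sketch on an illustrative corpus (the corpus and document frequencies here are assumptions, not the exact ones from the video):

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["people watch campusx",
          "campusx watch campusx",
          "people write comment"]

# smooth_idf=True is sklearn's default:
#     idf(t) = ln((1 + n) / (1 + df(t))) + 1
# so even a term present in every document gets idf = 1, never 0.
tfidf = TfidfVectorizer(smooth_idf=True)
tfidf.fit(corpus)

n = len(corpus)
df = 2  # "campusx" appears in 2 of the 3 documents
expected = math.log((1 + n) / (1 + df)) + 1
print(tfidf.idf_[tfidf.vocabulary_['campusx']], expected)
```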
Hi Nitish sir, where can we get these notes in PDF format? It would be helpful while doing revision.
THANKYOU SIR
Many Many Congratulations to you Sir for 10k Subs🥳🥳🥳
😁😁😁😁 182 k
I have one doubt: can we normalize the engineered feature vectors? I think the normalized vector will still contain the information that was previously there, just at a lower scale, which reduces computation. Let me know if this is the correct approach.
Amazing video
2 hours of pure diamond mine.
Can you add RNN, LSTMs, and modern NLP using transformers!?
Loved the content. Huge respect.
Ps banjara market ka lamp! XD
well done buddy #nlp #nlptuts #nlpeasytuts
Please make more videos on NLP !!!
You are a 💎.
Sir, the NLP series is really amazing. Please recommend the best book for NLP, because in a few days I have an interview that will be totally on NLP.
Sir, suppose the lengths of the sentences are unequal; in that case, is there no option other than padding for TF-IDF or n-grams etc.?
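One point worth checking here: BoW and TF-IDF already produce fixed-length vectors (one dimension per vocabulary term), so padding isn't needed for them at all; padding matters for sequence models. A minimal sketch with illustrative sentences:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["short text",
          "a much longer sentence with many more words in it"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)

# Every row has the same width (the vocabulary size), however long or
# short the sentence is, so unequal sentence lengths are not a problem.
print(X.shape)
```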
Thanks a lot sir .... Can you also upload a video on RNN & LSTM.
Thank you🌹
Thank you so much sir😊🙏
Sir, I don't understand the idea of TF-IDF at 1:20:09.
You said a word that appears a lot in a document but rarely in the corpus.
I'm confused about how that's possible, since the count in the corpus will always be greater than or equal to the count in the document. Kindly clarify, sir.
Thank u sir .
Liked and shared your video.
Subscribed your channel.
What else can I do for you. You are doing a great job.
Sir please continue the playlist!!!!!
very nice explanation sir
thank you sir ji
cv.fit_transform(df['eng'])
How can we apply fit_transform on text? I think I do not understand this part
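For context, `fit_transform` works on any iterable of strings, including a DataFrame column. A minimal sketch, where the `df['eng']` column is a hypothetical stand-in for the one in the comment:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# hypothetical stand-in for the df['eng'] text column
df = pd.DataFrame({'eng': ["people watch campusx",
                           "people write comment"]})

cv = CountVectorizer()
# fit: scan the strings and build the vocabulary {word: column index};
# transform: turn each string into a row of word counts.
X = cv.fit_transform(df['eng'])

print(cv.vocabulary_)
print(X.toarray())  # dense view of the count matrix
```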
Many many congratulations for 10k sub. 🎊🎊🎊
Thanks
Thanks
Sir ji, removing stopwords took 3 hours 26 minutes; now I'm getting nervous about doing tokenization.
Sir will you make a video for nlp project....something good for resume..?
It was really fun, sir, I swear.
Hi Nitish sir, I'm facing a problem with my spell-checker function:
from textblob import TextBlob
def spell_correct(text):
    return TextBlob(text).correct().string
It is taking very long on the assignment dataframe; is there a faster approach to check and correct spellings, ideally in O(log n) time?
Run the file in Google Colab; because of the GPU, it runs faster.
The example you gave, that bigrams are better than unigrams at separating the two sentences in vector space, doesn't really make sense to me. Suppose instead of "not" I used a synonym of "very", like "extremely"; then the two sentences should be similar in vector space, but the bigram model will say they're different. So it's not actually handling the word "not"; it's just handling any unknown word differently.
Why don't you keep the resources in the description, like the code link?
Feature extraction from text / text representation / text vectorization: converting text into numbers so that the model can understand it.
Bag of words -
awesome
Hi Buddy, Great content! This video cleared all my doubts regarding BoW and TF IDF🙌
Are you going to take any NLP projects in future based on Machine Learning models?
Yes
@@campusx-official Thank you!
@@campusx-official Sir, the code is missing from the list... BoW, TF-IDF... please share.
Where can I get these notes ?
Sir, can you please share your code notebook?
For real, thoroughly enjoyed it.
Thank you sir!
Please share the link for the Colab notebook.
Thank you so much for such a wonderful video. Sir, do you take any online classes as well?
No, not right now
No, not right now
How to join Mentorship program
When I perform the Bag of Words method like in the video, in a Kaggle notebook on the IMDB data, it says memory exceeded and just restarts the notebook :( What should I do?
Same issue; I tried on my machine, but it said memory exceeded; it needs 18.1 GiB after applying OHE.
@@user-qq7qi5kk5u guess we're poor lol
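The memory blow-up usually comes from densifying the sparse count matrix, not from building it. A sketch under that assumption: keep the matrix sparse and, if needed, cap the vocabulary with `max_features` (the document list below is an illustrative stand-in for the IMDB reviews):

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

docs = ["some long review text"] * 1000  # stand-in for the IMDB reviews

# Cap the vocabulary at the k most frequent terms to bound the width.
cv = CountVectorizer(max_features=5000)
X = cv.fit_transform(docs)  # scipy sparse matrix: only nonzeros are stored

# X.toarray() materialises a full n_docs x vocab dense array, which is
# the step that exhausts memory on 50k reviews. Most sklearn estimators
# accept the sparse matrix directly, so avoid calling .toarray().
print(X.shape, sparse.issparse(X))
```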
sir please share the one note link
I have tried the assignment on Kaggle; if anyone else tried it and wants to discuss, please let me know.
best
1:23:40 - The word "campusx" in the IDF is repeated 4 times, sir... log(4/4) = 0
It's repeated only 3 times bro
Please start deep learning, sir 👍👍👍❤️
Bag of Words minutes?
Please complete the NLP, interview, and ML series.
Please teach us BERT ALGORITHM
Please share this notebook's source file with us!
Your teaching method is good, but you are making it accessible only to Indian students, not international ones.
Please make a new English version of all your NLP videos so everyone can learn from them 🙏
23:00
1:38:48
Sir, please share the lecture's Python notebook file.
How can I get the OneNote notes?
By writing them in your own notebook.
54:00
Sir, my question is:
I have a list of entities and a text, like this:
List = ["Data Scientist", "Bihar", "Krishna"]
Text = "I am Krishna. I am from Bihar. I want to be a Data Scientist"
I want the result like:
"I am [Entity]Krishna[Entity]. I am from [Entity]Bihar[Entity]. I want to be a [Entity]Data Scientist[Entity]"
Please help me with code to get this result. Thanks 🙏
Did you get the code?
@@priyaravind18 List = ["Data Scientist", "Bihar", "Krishna"]
text = 'I am Krishna. I am from Bihar. I want to be a Data Scientist'
for entity in List:
    if entity in text:  # check the text, not the list
        text = text.replace(entity, '[Entity]' + entity + '[Entity]')
print(text)
40:34
Sir please recommend some ML books
ua-cam.com/video/sAzX_mF8wGo/v-deo.html