Text Representation | NLP Lecture 4 | Bag of Words | Tf-Idf | N-grams, Bi-grams and Uni-grams

  • Published 2 Jul 2024
  • In natural language processing, text representation turns textual data into numbers while trying to capture its meaning and structure. This video explores three fundamental text representation techniques: Bag of Words, Tf-Idf (Term Frequency-Inverse Document Frequency), and N-grams (Uni-grams and Bi-grams). Each method takes its own approach to encoding and extracting information from text, so it is essential for data scientists and NLP enthusiasts to grasp these concepts.
    Assignment - colab.research.google.com/dri...
    ============================
    Do you want to learn from me?
    Check out my affordable mentorship program at: learnwith.campusx.in
    ============================
    📱 Grow with us:
    CampusX's LinkedIn: / campusx-official
    CampusX on Instagram for daily tips: / campusx.official
    My LinkedIn: / nitish-singh-03412789
    Discord: / discord
    E-mail us at support@campusx.in
    ✨ Hashtags✨
    #TextRepresentation #BagOfWords #TfIdf #NGrams #NLP #DataScience #machinelearning
    ⌚Time Stamps⌚
    00:00 - Intro
    01:10 - Plan of Attack
    02:56 - Introduction
    03:25 - What is feature extraction from text?
    04:49 - Why do we need feature extraction?
    07:30 - Why is this difficult to do?
    11:00 - What is the core idea behind this?
    12:12 - What are the Techniques?
    14:24 - Common Terms
    18:00 - One Hot Encoding
    33:25 - Bag of Words
    57:45 - N-grams/Bi-grams/Tri-grams
    01:13:45 - Benefits of N-grams
    01:14:25 - Disadvantages of N-grams
    01:16:34 - Tf-Idf
    01:38:46 - Custom Features
    01:41:45 - Assignment
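The description names count-based representations. As a rough sketch (plain Python, whitespace tokenization and a made-up four-sentence corpus, all assumptions rather than the video's code), bag of words and n-grams can both be built by counting terms over a shared, sorted vocabulary. scikit-learn's `CountVectorizer`, with its `ngram_range` parameter, does the same job at scale.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(docs, n_min=1, n_max=1):
    """Build a sorted n-gram vocabulary and one count vector per document.

    Tokenization is just lowercase + whitespace split (an assumption;
    real pipelines usually clean the text first).
    """
    counts = []
    for doc in docs:
        tokens = doc.lower().split()
        c = Counter()
        for n in range(n_min, n_max + 1):
            c.update(ngrams(tokens, n))
        counts.append(c)
    vocab = sorted(set().union(*counts))
    # Each vector has one slot per vocabulary term, in vocabulary order.
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors

# Made-up toy corpus, for illustration only.
docs = ["people watch campusx", "campusx watch campusx",
        "people write comment", "campusx write comment"]

vocab, vectors = bag_of_ngrams(docs)   # uni-grams only: plain bag of words
print(vocab)
print(vectors)
print(bag_of_ngrams(docs, 1, 2)[0])    # uni-grams + bi-grams
```

Calling `bag_of_ngrams(docs, 1, 2)` adds bi-grams such as "people watch" to the vocabulary alongside the single words, which is how n-grams keep a little local word order.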

COMMENTS • 131

  • @sauravagarwal8928 · 1 month ago · +3

    This is one of the legendary videos I have seen. I'm into SEO and trying to wrap my head around semantic SEO. Some experts in the semantic SEO industry use technical jargon and fail to explain how semantic engines like Google work. But your series helped me understand every single bit of it. I don't know Python coding, but now I understand how Google's algorithms work to rank any document. I understand the type of computation they do behind the scenes.
    The video is pure gold! I mean it! This helps me as a search engine optimiser and makes me better understand machine and human interaction. Thank you so much 🙏 ☺️

  • @raj4624 · 2 years ago · +10

    Oh brother... unbelievable... 2 hours of content... heartfelt thanks to you, sir....

  • @art4eigen93 · 2 years ago · +11

    This playlist is necessary for NLP engineers, from basic to advanced. Please do upload the complete series, sir. Your contribution is life-saving.

  • @ashwanibhardwaj4930 · 2 years ago · +17

    Please carry on this series. Once basic NLP is done, we would like to learn advanced NLP using deep learning/language models and SOTA techniques.

  • @bishowlamsal7319 · 2 years ago · +4

    Huge respect, sir. You deserve more than a million followers. Love from Nepal ❤️❤️❤️❤️

  • @naveedkaimkhami2695 · 1 month ago

    I was confused about which word embedding technique to select for my FYP project and found this video life-saving. Thank youu soo muchh!!!

  • @shubhamgattani5357 · 2 months ago

    I cannot find any reason not to like this video. It's amazing!

  • @MuhammadAfzal-xl7wd · 2 months ago

    Thank you so much. You explain the concepts in a very, very simple way. Once again, thank you so much 🙂🙂🙂🙂

  • @rachitsingh4913 · 2 years ago · +2

    For me, you are the best data science teacher. ❤❤❤❤❤

  • @siyays1868 · 1 year ago · +1

    Thank you so much, sir, for a wonderful explanation. Hats off to you always!

  • @abhinavkr5131 · 1 year ago · +1

    I've watched lots of tutorials, but you're the best, sir.

  • @asifpervezpolok2243 · 2 years ago

    The best tutorial I have found is from you.

  • @hvjmlops · 2 years ago · +1

    Respect for your hard work

  • @piyushpathak7311 · 2 years ago · +18

    I am following your ML playlist, sir, and your explanations are great. Sir, please complete the XGBoost and DBSCAN algorithms in this playlist, and please start a series on deep learning.

    • @AkashBhandwalkar · 2 years ago · +1

      I'm following it as well

    • @campusx-official · 2 years ago · +8

      Will do it in January

    • @AkashBhandwalkar · 2 years ago · +2

      @@campusx-official woaahhh! Thank you sooooo much! This made my day! 🥳🥳🥳

    • @749srobin · 2 years ago · +1

      @@campusx-official Which year's January, sir?

    • @debojitmandal8670 · 1 year ago

      @@campusx-official Hi sir, in your example using trigrams the vocabulary decreases to 5, so I don't follow the part where you said the vocabulary increases as n increases.
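One way to reconcile the two observations: a sentence with L tokens contains only L - n + 1 n-grams, so on a toy corpus of three-word sentences each sentence contributes a single trigram and the observed vocabulary can shrink. The number of possible n-grams, |V|^n, grows rapidly with n, and on longer, realistic text the observed n-gram vocabulary usually grows as well for small n. The sketch below counts both on a made-up corpus (an assumption; the video's corpus may differ).

```python
def distinct_ngrams(docs, n):
    """Set of distinct n-grams (token tuples) across all documents."""
    grams = set()
    for doc in docs:
        tokens = doc.lower().split()
        # A document with L tokens yields L - n + 1 n-grams (none if L < n).
        grams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

# Made-up corpus of very short (three-word) sentences.
docs = ["people watch campusx", "campusx watch campusx",
        "people write comment", "campusx write comment"]

vocab_size = len(distinct_ngrams(docs, 1))
for n in (1, 2, 3):
    observed = len(distinct_ngrams(docs, n))
    possible = vocab_size ** n  # upper bound on distinct n-grams
    print(f"n={n}: observed={observed}, possible={possible}")
```

On this corpus the observed counts are 5, 6 and 4, while the possible counts are 5, 25 and 125: the drop at n = 3 comes from the sentences being only three words long.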

  • @machinelearningspace6977 · 2 years ago

    Awesome teaching style... Keep going.

  • @harisumanth · 2 years ago · +2

    Almost 2 hours... Respect!

  • @amitkumar2005 · 3 months ago

    Superb explanations !

  • @gauravverma4433 · 2 years ago

    It was awesome... love you, sir... thanks for your efforts.

  • @somyarathee · 2 years ago

    Best series on NLP

  • @mohaiminrahat4974 · 2 years ago · +1

    Congratulations on 10K subscribers, sir.

  • @shivamgarg3890 · 1 year ago

    This channel is highly underrated...

  • @bhanu0925 · 2 years ago · +1

    Thank you for another great session

  • @HarshVardhan-jj9xh · 5 months ago · +1

    Thanks a lot, sir. My PhD is on NLP itself, and your videos help me a lot in understanding the overall concepts. Your efforts are very sincere and dedicated 💯

    • @forgotabhi · 5 months ago

      I am getting started with NLP :) I am still doing my UG. Can you tell me about your experience in the field?

    • @HarshVardhan-jj9xh · 5 months ago

      @@forgotabhi It's an amazing field, and day by day you will come to know new models and architectures.

    • @hritikroshanmishra3630 · 2 months ago

      @@forgotabhi Which college?

  • @BTStechnicalchannel · 1 year ago · +1

    Your explanation is so great!! And in Hindi, too. Thanks a lot!!💙

  • @diwakargupta0 · 1 year ago

    Awesome content and explanation sir 👐

  • @shahu6015 · 11 months ago

    Congratulations in advance on 100K subscribers.

  • @shahmuhammadraditrahaman9904 · 2 years ago · +1

    Incredible ❤️

  • @siddharth4251 · 10 months ago

    Thank you very much Nitish sir!

  • @deeptisingh93 · 2 years ago

    Thank you, sir... really, for explaining it in such an easy way.

  • @basit-qx7ys · 29 days ago

    I love the way sir explains. I am not able to grasp the fundamental concepts, but I am not able to imagine myself coding for NLP without any guidance. Any suggestions on what other materials and sources I should follow?

  • @IRFANSAMS · 2 years ago

    Sir... thank you for the wonderful video.

  • @Sara-fp1zw · 1 year ago

    Congratulations on 36K subs; soon we're going to cross 100K IA :)

  • @takeshrao733 · 2 months ago

    Very nice, and a very good starting point. Can you please suggest which text representation algorithm is suited for log analysis?

  • @gajanankhapre2425 · 2 years ago

    Very good flow, sir. Kindly upload the next part of the NLP series.

  • @daljeetsinghranawat6359 · 6 months ago

    KUDOS TO YOU, SIR... loving this series

  • @uditsaurabh · 5 months ago

    Awesome video

  • @vivekathilkar6555 · 2 years ago

    Appreciate your efforts

  • @nikhiljagtap1669 · 2 years ago

    At 55:24: BoW doesn't consider the word order of a sentence, but since we're going to perform tokenization before this, we'll lose some words, which would mess up the sequence anyway. Isn't that right?
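A quick check of the point in question, assuming simple whitespace tokenization: tokenization by itself typically keeps every word in its original order (it is later steps such as stopword removal that drop words), while the bag-of-words step is what throws the order away. Two sentences with the same words in a different order therefore get identical counts.

```python
from collections import Counter

def tokenize(sentence):
    """Whitespace tokenization: every word is kept, in order."""
    return sentence.lower().split()

def bag_of_words(sentence):
    """Word counts only; the positions of the words are discarded."""
    return Counter(tokenize(sentence))

s1 = "dog bites man"
s2 = "man bites dog"
print(tokenize(s1), tokenize(s2))            # the two orders survive tokenization
print(bag_of_words(s1) == bag_of_words(s2))  # True: BoW cannot tell them apart
```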

  • @rajeevranjan5007 · 2 years ago

    Great Video Sir.

  • @maukaladka4100 · 1 year ago · +1

    Hello sir, I had doubts on this topic about how the conversion takes place. I watched lots of videos and read lots of blogs, but no one could make me understand like you did. Hats off to you; keep up the great work.

  • @learnfromIITguy · 11 months ago

    Wow, after watching this video, I am confident about feature engineering.

  • @ranjithkumar947 · 4 months ago

    For TF-IDF, the term "campusx" came 4 times, but, sir, you considered it only thrice. Any reason for it? Anyway, we get a +1 there in the real implementation. Could you please reply to me about this?
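The usual reason for counting 3 rather than 4 is that IDF uses document frequency (the number of documents that contain the term), not the total number of occurrences. The sketch below assumes a made-up 4-document corpus in which "campusx" occurs 4 times but in only 3 documents; the video's exact corpus may differ. The "+1" seen in practice most likely comes from scikit-learn's `TfidfVectorizer`, whose documented default (`smooth_idf=True`) is idf = ln((1 + n) / (1 + df)) + 1.

```python
import math

def idf_plain(n_docs, df):
    """Textbook IDF: ln(N / df)."""
    return math.log(n_docs / df)

def idf_sklearn_smooth(n_docs, df):
    """Formula documented for scikit-learn's TfidfVectorizer default (smooth_idf=True)."""
    return math.log((1 + n_docs) / (1 + df)) + 1

def idf_sklearn_no_smooth(n_docs, df):
    """Formula documented for TfidfVectorizer with smooth_idf=False."""
    return math.log(n_docs / df) + 1

# Made-up corpus: "campusx" occurs 4 times in total, but in only 3 documents.
docs = ["people watch campusx", "campusx watch campusx",
        "people write comment", "campusx write comment"]
term = "campusx"
n_docs = len(docs)
total_count = sum(doc.split().count(term) for doc in docs)  # occurrences: 4
df = sum(term in doc.split() for doc in docs)               # documents: 3

print(total_count, df)
print(idf_plain(n_docs, df), idf_sklearn_smooth(n_docs, df), idf_sklearn_no_smooth(n_docs, df))
# A term present in every document:
print(idf_plain(n_docs, n_docs), idf_sklearn_smooth(n_docs, n_docs))
```

With the plain formula, a term present in all 4 documents gets ln(4/4) = 0 and drops out entirely; the scikit-learn variants add 1, so such a term keeps a small, non-zero weight.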

  • @SatyaIITI · 1 year ago

    Hi Nitish sir, where can we get these notes in PDF format? That would be helpful while revising.

  • @vaibhavmoharkar2349 · 5 months ago

    THANK YOU, SIR

  • @ronylpatil · 2 years ago · +6

    Many, many congratulations to you, sir, on 10K subs 🥳🥳🥳

  • @mehulsuthar7554 · 1 month ago

    I have one doubt: can we normalize the engineered feature vectors? I think a normalized vector will still contain the information that was previously there, just on a lower scale, which would reduce computation. Let me know if this is the correct approach.
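On the normalization question: L2-normalizing a count or TF-IDF vector keeps its direction, so cosine similarity between documents is unchanged, and it removes the effect of document length. It does not reduce the number of dimensions or, by itself, the amount of computation. scikit-learn's `TfidfVectorizer` already applies L2 normalization by default (`norm='l2'`), and `sklearn.preprocessing.normalize` does it for arbitrary matrices. A plain-Python sketch with made-up vectors:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up count vectors for two documents over the same 5-term vocabulary.
u = [2, 0, 0, 1, 0]
v = [1, 0, 1, 1, 0]
u_n, v_n = l2_normalize(u), l2_normalize(v)

print(len(u), len(u_n))                # same dimensionality after normalizing
print(cosine(u, v), cosine(u_n, v_n))  # same similarity, up to rounding
```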

  • @hitinyadav3321 · 2 years ago

    Amazing video

  • @balrajprajesh6473 · 1 year ago

    2 hours of pure diamond mine.

  • @gauravlochab9614 · 1 year ago · +4

    Can you add RNN, LSTMs, and modern NLP using transformers!?
    Loved the content. Huge respect.
    P.S. That lamp is from Banjara Market! XD

  • @gautampatadiya6096 · 3 months ago

    well done buddy #nlp #nlptuts #nlpeasytuts

  • @richaaggarwal07 · 2 years ago · +1

    Please make more videos on NLP !!!

  • @HimanshuSharma-we5li · 2 years ago

    You are a 💎.

  • @ronylpatil · 2 years ago · +1

    Sir, the NLP series is really amazing. Please recommend the best book for NLP, because in a few days I have an interview that will be entirely on NLP.

  • @ayushroy6208 · 2 years ago

    Sir, suppose the sentences have unequal lengths... Then is there no option other than padding in the case of TF-IDF or n-grams, etc.?
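For bag of words, TF-IDF and n-gram counts, unequal sentence lengths do not call for padding: every document is mapped to a vector with one slot per vocabulary term, so the vector length is fixed by the vocabulary, not by the sentence. Padding is mainly needed for sequence models (such as RNNs) that consume token sequences directly. A minimal sketch with two made-up sentences:

```python
from collections import Counter

def fit_vocab(docs):
    """Sorted vocabulary of all whitespace tokens seen in the documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})

def vectorize(doc, vocab):
    """Count vector with one slot per vocabulary term, whatever the document length."""
    counts = Counter(doc.lower().split())
    return [counts[tok] for tok in vocab]

# Two made-up sentences of very different lengths.
docs = ["good movie", "this was a really really good movie with a great cast"]
vocab = fit_vocab(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(len(vocab), [len(v) for v in vectors])  # both vectors have len(vocab) entries
```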

  • @230489shraddha · 2 years ago

    Thanks a lot, sir... Can you also upload a video on RNN & LSTM?

  • @shaiksalavuddin5976 · 2 years ago

    Thank you🌹

  • @sachi-4750 · 2 years ago

    Thank you so much sir😊🙏

  • @avinashbhardwaz5717 · 7 months ago

    Sir, I don't understand the idea of TF-IDF at 1:20:09.
    You said it favors words that occur more in the document but less in the corpus.
    I'm confused about how that is possible,
    since the count in the corpus will always be greater than or equal to the count in the document. Kindly clarify, sir.
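The comparison in TF-IDF is between how often a word occurs in one document (term frequency) and how many documents in the corpus contain it (document frequency), not the word's total count in the corpus. A word repeated in one document but present in few others scores high; a word present in every document scores 0. A small sketch, assuming tf = count / document length and idf = ln(N / df) on a made-up three-document corpus:

```python
import math
from collections import Counter

def tfidf_scores(doc_index, docs):
    """TF-IDF of each term in docs[doc_index], with tf = count / doc length
    and idf = ln(N / df), where df counts documents containing the term."""
    tokenized = [d.lower().split() for d in docs]
    doc = tokenized[doc_index]
    scores = {}
    for term, count in Counter(doc).items():
        tf = count / len(doc)
        df = sum(term in toks for toks in tokenized)  # documents, not occurrences
        scores[term] = tf * math.log(len(tokenized) / df)
    return scores

# Made-up three-document corpus.
docs = ["cricket cricket match today", "match today was rainy", "today is monday"]
for term, score in sorted(tfidf_scores(0, docs).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

Here "cricket" (twice in the first document, in no other) gets about 0.549, "match" (in 2 of 3 documents) about 0.101, and "today", which appears in all three documents, gets 0.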

  • @chauhanabhishek9593 · 2 years ago

    Thank you, sir.

  • @bananamaker4877 · 9 months ago

    Liked and shared your video.
    Subscribed to your channel.
    What else can I do for you? You are doing a great job.

  • @solvinglife6658 · 1 year ago

    Sir please continue the playlist!!!!!

  • @GhostRider.... · 1 year ago

    Very nice explanation, sir.

  • @avishinde2929 · 1 year ago · +1

    Thank you, sir ji

  • @Howto-ty4ru · 1 year ago

    cv.fit_transform(df['eng'])
    How can we apply fit_transform on text? I think I do not understand this part.
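`fit_transform` works on raw text because the vectorizer does the text handling itself: `fit` tokenizes the documents and learns a vocabulary (word to column index), `transform` turns each document into a row of counts over that vocabulary, and `fit_transform` does both on the same data. In scikit-learn, `CountVectorizer.fit_transform` accepts any iterable of strings, such as the pandas column `df['eng']`, and returns a sparse matrix. The class below is a hypothetical, much-simplified stand-in written only to show the idea, not scikit-learn's implementation:

```python
class TinyCountVectorizer:
    """Hypothetical, simplified illustration of fit/transform; not scikit-learn's code."""

    def fit(self, docs):
        # Learn the vocabulary: each distinct lowercase token gets a column index.
        tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, docs):
        # Turn each document into a row of counts over the learned vocabulary.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for tok in doc.lower().split():
                col = self.vocabulary_.get(tok)
                if col is not None:  # words never seen during fit are ignored
                    row[col] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)

train = ["people watch campusx", "campusx watch campusx"]
cv = TinyCountVectorizer()
X = cv.fit_transform(train)  # any iterable of strings works, e.g. a pandas column
print(cv.vocabulary_)
print(X)
print(cv.transform(["people write comment"]))  # "write" and "comment" were not seen in fit
```

Separating the two steps matters in practice: the vocabulary is learned from training text only, and test text is then transformed with that same vocabulary.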

  • @saumyakumari3441 · 2 years ago

    Many, many congratulations on 10K subs. 🎊🎊🎊

  • @749srobin · 2 years ago

    Sir ji, removing stopwords took 3 hours 26 minutes; I'm getting nervous about doing tokenization.
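Stopword removal should normally take seconds, not hours. One common cause of extreme slowness (an assumption about this particular case) is calling `stopwords.words('english')` inside the per-word loop, which rebuilds a list and scans it linearly for every word. Building a set once makes each lookup roughly constant time. The tiny stopword set here is made up for illustration:

```python
# Made-up stopword set for illustration. With NLTK, the equivalent would be to
# build the set once, outside any loop:
#   from nltk.corpus import stopwords
#   STOPWORDS = frozenset(stopwords.words("english"))
STOPWORDS = frozenset({"a", "an", "the", "is", "i", "to", "of", "and"})

def remove_stopwords(text):
    """Drop stopwords; each membership test is a hash lookup, not a list scan."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

print(remove_stopwords("This is a movie I wanted to see"))
```

On a DataFrame this would typically be applied as `df['review'].apply(remove_stopwords)`, where the column name is hypothetical.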

  • @manavahuja4418 · 2 years ago

    Sir, will you make a video on an NLP project... something good for a resume?

  • @technicalhouse9820 · 17 days ago

    Really enjoyed it, sir, I swear.

  • @Sara-fp1zw · 1 year ago

    Hi Nitish sir, I'm facing a problem with my spell-checker function:
    from textblob import TextBlob
    def spell_correct(text):
        return TextBlob(text).correct().string
    It takes very long on the assignment dataframe. Is there a faster approach to check and correct spelling, e.g. in log(n) time?

    • @sidindian1982 · 1 year ago

      Run the file in Google Colab... because of the GPU, it runs faster...
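As far as I know, TextBlob's spelling corrector is plain Python running on the CPU, so a GPU runtime is unlikely to speed it up. A more reliable saving is to avoid repeated work: correct each distinct word once and cache the result, since the same words recur across many rows. This does not lower the cost of correcting one word; it just stops paying it again for repeats. A sketch with a hypothetical helper `make_fast_corrector`; the TextBlob line is left as a comment because it assumes `textblob` is installed:

```python
from functools import lru_cache

def make_fast_corrector(correct_word):
    """Wrap a per-word corrector so each distinct word is corrected only once.

    Splits on whitespace only, so punctuation stuck to a word is passed
    to the corrector as part of that word (unlike TextBlob(text).correct()).
    """
    cached = lru_cache(maxsize=None)(correct_word)

    def correct_text(text):
        return " ".join(cached(word) for word in text.split())

    return correct_text

# With TextBlob installed, one might plug in its per-word corrector:
#   from textblob import Word
#   spell_correct = make_fast_corrector(lambda w: str(Word(w).correct()))
#   df["text"] = df["text"].apply(spell_correct)   # column name is hypothetical

# Toy corrector for demonstration; it records how often it is really called.
calls = []

def toy_correct(word):
    calls.append(word)
    return {"teh": "the", "speling": "spelling"}.get(word, word)

fast = make_fast_corrector(toy_correct)
print(fast("teh speling of teh word"))
print(len(calls))  # 4: the repeated "teh" was served from the cache
```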

  • @hari8568 · 4 months ago

    The example you gave of bigrams being better than unigrams at differentiating the 2 sentences in vector space doesn't really make much sense to me. Suppose that instead of "not" I used a synonym of "very", like "extremely"; then these 2 sentences should be similar in vector space, but the bigram model will say they're different. So it's not actually handling the word "not"; it's just handling an unknown word differently.
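The objection can be checked numerically. With made-up sentences and cosine similarity over raw counts, bigrams do pull "very good" and "not good" further apart than unigrams do, but they push "very good" and "extremely good" apart by exactly the same amount: count-based n-grams have no notion of synonyms or negation, only of which token sequences match.

```python
import math
from collections import Counter

def ngram_counts(sentence, n):
    """Counts of the n-grams (space-joined) in a whitespace-tokenized sentence."""
    toks = sentence.lower().split()
    return Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))

def cosine(c1, c2):
    """Cosine similarity of two Counters (0.0 if either is empty)."""
    dot = sum(count * c2[key] for key, count in c1.items())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

very = "the movie is very good"
negated = "the movie is not good"
synonym = "the movie is extremely good"
for n in (1, 2):
    a, b, c = (ngram_counts(s, n) for s in (very, negated, synonym))
    print(f"n={n}: very vs not = {cosine(a, b):.2f}, very vs extremely = {cosine(a, c):.2f}")
```

Both pairs score 0.80 with unigrams and 0.50 with bigrams, which supports the comment: the bigram model separates "not" only because it is a different token, exactly as it separates "extremely".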

  • @gautamkushwaha8724 · 8 months ago

    Why don't you put the resources, like the code link, in the description?

  • @user-dd3te4rh8j · 11 months ago

    Feature extraction from text / text representation / text vectorization: converting text into numbers so that a model can understand it.
    Bag of words -

  • @tusarmundhra5560 · 8 months ago

    Awesome

  • @mihirnaik3383 · 2 years ago · +4

    Hi buddy, great content! This video cleared all my doubts regarding BoW and TF-IDF🙌
    Are you going to take up any NLP projects in the future based on machine learning models?

  • @jai40403 · 5 months ago

    Where can I get these notes?

  • @ananyakumari6807 · 2 years ago

    Sir, can you please share your code notebook?

  • @whothefisyash · 8 days ago

    For real, I totally enjoyed it.

  • @ridoychandraray2413 · 1 year ago

    Thank you, sir!

  • @furry2fun · 10 months ago

    Share the link to the Colab notebook.

  • @joyeetamallik5063 · 2 years ago · +1

    Thank you so much for such a wonderful video. Sir, do you take any online classes as well?

  • @datagyan5489 · 2 years ago

    How do I join the mentorship program?

  • @forgotabhi · 5 months ago

    When I apply the bag-of-words method as in the video to the IMDB data in a Kaggle notebook, it says memory exceeded and just restarts the notebook :( What should I do?

    • @user-qq7qi5kk5u · 1 month ago

      Same issue. I tried it on my machine, but it said memory exceeded; it needed 18.1 GiB after applying OHE.

    • @forgotabhi · 1 month ago

      @@user-qq7qi5kk5u Guess we're poor lol
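Running out of memory here is usually about the dense matrix rather than bag of words itself: a dense document-term matrix has (number of documents) × (vocabulary size) cells, almost all zero. For instance, with hypothetical sizes of 50,000 documents and 50,000 terms stored as 8-byte integers, that is 2.5 billion cells, about 20 GB. scikit-learn's `CountVectorizer` returns a SciPy sparse matrix that stores only the non-zero entries, so the usual fixes are to avoid calling `.toarray()` on the full data and to cap the vocabulary with `max_features`. A plain-Python sketch of the sparse idea, storing only non-zero counts per document:

```python
from collections import Counter

def sparse_bow(docs):
    """Bag of words that stores only non-zero counts: one {column: count} dict per document."""
    vocab = {}
    rows = []
    for doc in docs:
        row = Counter()
        for tok in doc.lower().split():
            col = vocab.setdefault(tok, len(vocab))  # assign a column to new words
            row[col] += 1
        rows.append(dict(row))
    return vocab, rows

# Made-up toy corpus.
docs = ["people watch campusx", "campusx watch campusx",
        "people write comment", "campusx write comment"]
vocab, rows = sparse_bow(docs)
stored = sum(len(row) for row in rows)  # non-zero entries actually kept
dense_cells = len(docs) * len(vocab)    # cells a dense matrix would allocate
print(rows)
print(stored, dense_cells)
```

On this toy corpus the saving is small (11 stored entries against 20 cells), but the ratio becomes enormous as the vocabulary grows, because each review uses only a tiny fraction of it.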

  • @yashgaming827 · 1 year ago

    Sir, please share the OneNote link.

  • @vijayraghuwanshi4486 · 10 months ago

    I have tried the assignment on Kaggle; if anyone has tried it and wants to discuss, please let me know.

  • @yashjain6372 · 1 year ago

    Best

  • @sidindian1982 · 1 year ago · +1

    1:23:40 - The word "campusx" is repeated 4 times in the IDF calculation, sir... ln(4/4) = 0

  • @rushikeshmalpe3715 · 2 years ago

    Please start deep learning, sir 👍👍👍❤️

  • @backclover9651 · 2 years ago

    Bag of words minutes?

  • @rafibasha4145 · 2 years ago · +2

    Please complete the NLP, interview, and ML series.

  • @IRFANSAMS · 2 years ago

    Please teach us the BERT algorithm.

  • @nabinadhikari5426 · 1 year ago

    Please share this notebook's source file with us!

  • @abdullahilawal3220 · 9 months ago

    Your teaching method is good, but you are making it accessible only to Indian students, not to an international audience.
    Please make new versions of all your NLP videos in English so everyone can learn from them 🙏

  • @MrKB_SSJ2 · 1 year ago

    23:00

  • @MrKB_SSJ2 · 1 year ago

    1:38:48

  • @nikhiltiwari1616 · 1 year ago

    Sir, please share the lecture's Python notebook file.

  • @nikeshmali8506 · 4 months ago

    How can I get the OneNote notes?

  • @user-iv5fr9mr2n · 10 months ago

    54:00

  • @kislaykrishna8918 · 2 years ago

    Sir, my question is:
    I have a list of entities and a text, like this:
    List=["Data Scientist", "Bihar", "Krishna"]
    Text=" I am Krishna. I am from Bihar . I want to be a Data Scientist"
    I want a result like:
    "I am [Entity]Krishna[Entity]. I am from [Entity]Bihar[Entity] . I want to be a [Entity]Data Scientist[Entity]"
    Please help me with code to get this result. Thanks🙏

    • @priyaravind18 · 2 years ago

      Did you get the code?

    • @kislaykrishna8918 · 2 years ago

      @@priyaravind18
      List = ["Data Scientist", "Bihar", "Krishna"]
      text = ' I am Krishna. I am from Bihar. I want to be a Data Scientist'
      for entity in List:
          if entity in text:  # check the text, not the list itself
              # Wrap every occurrence of the entity in [Entity] markers
              text = text.replace(entity, '[Entity]' + entity + '[Entity]')
      print(text)
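The corrected loop works for this input, but repeated `str.replace` calls can misbehave when one entity is contained in another (say, "Data" and "Data Scientist", where "Data" would be re-tagged inside the already tagged phrase) or appears inside a longer word. A single regular-expression pass that tries longer entities first and matches whole words avoids both. This is a sketch with a hypothetical helper `tag_entities`, using only the standard `re` module:

```python
import re

def tag_entities(text, entities, tag="[Entity]"):
    """Wrap each whole-word occurrence of any entity in `tag`, in a single pass.

    Longer entities are tried first, so "Data Scientist" wins over "Data".
    Assumes entities start and end with word characters (needed for the
    word-boundary checks).
    """
    if not entities:
        return text
    alternatives = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: tag + m.group(1) + tag, text)

entities = ["Data Scientist", "Bihar", "Krishna"]
text = " I am Krishna. I am from Bihar . I want to be a Data Scientist"
print(tag_entities(text, entities))
```

Because the whole text is scanned once, text that has already been wrapped in tags is never matched again.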

  • @MrKB_SSJ2 · 1 year ago

    40:34

  • @SulemanKhan-nk4lc · 2 years ago

    Sir, please recommend some ML books.