NLP Tutorial in Python - Spam Classification

  • Published Sep 8, 2024
  • In this video we implement an email spam classifier using NLTK (the Natural Language Toolkit) in Python. We build the model with the bag-of-words (BOW) approach after tokenizing the text, lemmatizing or stemming the tokens, and removing stop words. You'll learn a ton about NLP in just 20 minutes!
    Link to the Colab notebook: colab.research...
    Thank you for watching the video! You can learn data science FASTER at mlnow.ai!
    Master Python at mlnow.ai/cours...!
    Learn SQL & Relational Databases at mlnow.ai/cours...!
    Learn NumPy, Pandas, and Python for Data Science at mlnow.ai/cours...!
    Become a Machine Learning Expert at mlnow.ai/cours...!
    Don't forget to subscribe if you enjoyed the video :D
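
The steps named in the description (tokenize, stem, drop stop words, build bag-of-words vectors) can be sketched roughly as follows. This is not the notebook's code: to stay dependency-free it swaps NLTK's tokenizer, stemmer, and stop-word corpus for a regex, a crude suffix stripper, and a tiny hand-written stop-word set. All function names and the sample messages are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption); NLTK's stopwords corpus is far larger.
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "and", "of", "in", "at"}

def crude_stem(token):
    """Rough suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def build_vocabulary(messages):
    """Collect the unique processed tokens across all messages, in sorted order."""
    return sorted({tok for msg in messages for tok in preprocess(msg)})

def to_bow_vector(text, vocab):
    """Count how often each vocabulary token occurs in the message."""
    counts = Counter(preprocess(text))
    return [counts[tok] for tok in vocab]

messages = ["Win a free prize now", "Meeting moved to Monday", "Claim your free prizes"]
vocab = build_vocabulary(messages)
vectors = [to_bow_vector(m, vocab) for m in messages]
print(vocab)
print(vectors)
```

Each message becomes a fixed-length count vector over the shared vocabulary, which is the form a classifier can train on. With NLTK installed, the regex and the suffix stripper would be replaced by its tokenizer and a stemmer or lemmatizer.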

COMMENTS • 32

  • @GregHogg 1 year ago

    Take my courses at mlnow.ai/!

  • @duchahapsari7081 2 years ago +2

    Wow! This is an amazing tutorial. I learned so much about fundamental NLP, and it stays so practical to follow. Looking forward to more crisp content from you, Greg!

    • @GregHogg 2 years ago

      Really glad to hear that Ducha!

  • @ashathotan 11 months ago

    I enjoyed watching your illustration on the email spam.

  • @DarkTobias7 2 years ago +2

    Amazing video, please do more NLP projects

    • @GregHogg 2 years ago +1

      Thanks so much, and will do!

  • @Brocollipy 2 years ago

    Great video, just what I needed. I need to test out some models quickly and really can't do another 3-hour course!

  • @arsheyajain7055 2 years ago +1

    This is awesome 👏

  • @gustavojuantorena 2 years ago +1

    Great video!

    • @GregHogg 2 years ago

      Thanks so much Gustavo!

  • @akshitadixit_1068 1 year ago +1

    Thanks, Greg, for this amazing video. However, I have a question: how are we determining that the tokens with the highest frequency are the ones contributing to the spammy nature of the message? There is a possibility that tokens with a low frequency are equally malicious.
    Thanks!

    • @GregHogg 1 year ago

      Yes that's very true, you could absolutely change how I did things
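
One way to look into the question above is to compare a token's overall count with the share of its occurrences that fall in spam. The sketch below is only an illustration of that idea, not the approach used in the video; the toy data and the `spam_ratio` helper are hypothetical.

```python
from collections import Counter

# Hypothetical, already-tokenized toy data: (tokens, label) with 1 = spam, 0 = ham.
labeled = [
    (["free", "prize", "click"], 1),
    (["free", "lunch", "today"], 0),
    (["meeting", "today"], 0),
    (["wire", "transfer", "click"], 1),
]

# Count token occurrences separately for spam and ham messages.
spam_counts, ham_counts = Counter(), Counter()
for tokens, label in labeled:
    (spam_counts if label == 1 else ham_counts).update(tokens)

def spam_ratio(token):
    """Share of a token's occurrences that fall in spam messages."""
    total = spam_counts[token] + ham_counts[token]
    return spam_counts[token] / total if total else 0.0

total_counts = spam_counts + ham_counts
for token in ("today", "free", "wire"):
    print(token, total_counts[token], spam_ratio(token))
```

In this toy data, "wire" appears only once overall but exclusively in spam, while "today" appears twice and never in spam. A filter based purely on overall frequency would keep "today" and could drop "wire", which is the concern raised in the comment.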

  • @jspetrolina 1 year ago

    Hi Greg! Old but gold, fabulous video. Let me ask you: is the structure that you build what packages like spaCy are doing behind the scenes? Thanks again.

    • @GregHogg 1 year ago

      Thank you :) and what structure sorry?

  • @iqrarkhan8129 2 years ago

    Thanks, that was quite helpful, but can you also please do malware detection and classification using a machine learning algorithm? If yes, please upload it as soon as possible.

    • @GregHogg 2 years ago

      You're very welcome! Probably eventually, but I won't be able to do that for a while, sorry.

  • @mikekertser5384 2 years ago +2

    Thank you! Can you please make a video with the review of the word embeddings models and corresponding transfer learning examples?
    And some nlp feature engineering as well. :)

    • @GregHogg 2 years ago +1

      You're very welcome - and this is in the works :)

  • @InfernalPasquale 1 year ago

    12:00 Why is features = set() converted to a list, rather than just being a list to begin with?

    • @rahulnayak8866 1 year ago

      Using a list instead of a set would produce duplicate values, so set() is used to collect only unique values, and the result is then converted into a list.
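
The difference can be seen in a small sketch. The variable names and toy tokens below are hypothetical, and `sorted()` is used here instead of a plain `list()` conversion only to make the ordering reproducible.

```python
# Hypothetical tokenized messages; "free" and "prize" each occur more than once.
tokenized_messages = [["free", "prize", "free"], ["prize", "today"]]

# A set discards duplicates as tokens are added.
features = set()
for tokens in tokenized_messages:
    features.update(tokens)

# Sets have no positions, so convert to a list to give each feature a fixed index.
features = sorted(features)
index_of = {token: i for i, token in enumerate(features)}

# Starting from a list instead keeps every repeated token.
as_list = []
for tokens in tokenized_messages:
    as_list.extend(tokens)

print(features)      # unique tokens in a fixed order
print(len(as_list))  # every occurrence, duplicates included
```

The set handles deduplication cheaply during collection, and the final list gives each feature a stable column position for the bag-of-words vectors.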

  • @e_hossam96 2 years ago

    This is great

    • @GregHogg 2 years ago

      Thanks so much Hossam!! :)

  • @writabratadey8048 1 year ago

    nltk.download() is not working. It shows WinError 10060 every time, i.e. the connection attempt failed. Please provide a solution.

    • @GregHogg 1 year ago

      Probably slow internet unfortunately

  • @MyStockz 2 years ago

    Hi Greg Hogg! Hope all is well! Have you tried or heard of a website called logikbot? If yes, what do you think of it?

  • @tareq8109 2 years ago

    Want NLP series

  • @dannyrodin1151 2 years ago

    I enjoy your videos, but this one's way too fast. I'll need to watch it 3 times with 0.5 speed.

    • @GregHogg 2 years ago

      Hmm, I appreciate the feedback here.