Fighting Spam on YouTube with TensorFlow & Python

  • Published on 24 Jun 2021
  • I'm sick of crypto-related spam comments on YouTube, so I trained a machine learning model to delete them! A script runs periodically and uses the text classifier to filter the latest comments on my videos.
    The filter is surprisingly effective, even though the training dataset is relatively small. I'll keep expanding the dataset and retrain the classifier so it becomes more accurate over time.
    💌 Sign up for Simply Explained Newsletter:
    newsletter.simplyexplained.com
    Monthly newsletter with cool stuff I found on the internet (related to science, technology, biology, and other nerdy things)! No spam. Ever. Promise!
    🌍 Social
    Twitter: / savjee
    Facebook: / savjee
    Blog: savjee.be
    ❤️ Become a Simply Explained member: / @simplyexplained
    👩‍💻 Source code:
    Available on GitHub:
    github.com/Savjee/yt-spam-cla...
    ❓❓ Frequently asked question:
    ❓ Why do I still see spam comments on your channel?
    First, not all comments are caught by the AI and still require manual intervention. Secondly, the script runs on a fixed interval. Give it some time to run. And thirdly, it only filters recent comments. I will let the classifier clean up the old comments as well.
  • Science & Technology
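
The description says a script runs on a fixed interval and only filters recent comments. A minimal sketch of that selection step might look like the following. It is not the author's code: `Comment`, `select_spam`, the two-day window, and the 0.9 threshold are all assumptions for illustration, and `toy_probability` is a keyword stand-in for the trained TensorFlow classifier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List


@dataclass
class Comment:
    # Hypothetical record type; the real script reads comments from the YouTube API.
    comment_id: str
    text: str
    published_at: datetime


def select_spam(
    comments: Iterable[Comment],
    spam_probability: Callable[[str], float],
    now: datetime,
    max_age: timedelta = timedelta(days=2),  # assumed "recent" window
    threshold: float = 0.9,                  # assumed cut-off
) -> List[Comment]:
    """Return recent comments whose spam probability reaches the threshold."""
    cutoff = now - max_age
    return [
        c
        for c in comments
        if c.published_at >= cutoff and spam_probability(c.text) >= threshold
    ]


def toy_probability(text: str) -> float:
    """Keyword stand-in for the trained model (assumption, not a real classifier)."""
    keywords = ("crypto", "broker", "whatsapp")
    hits = sum(k in text.lower() for k in keywords)
    return min(1.0, hits / 2)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        Comment("a", "My broker trades crypto for me", now - timedelta(hours=1)),
        Comment("b", "Great video, thanks!", now - timedelta(hours=2)),
    ]
    for comment in select_spam(sample, toy_probability, now):
        print("would remove:", comment.comment_id)
```

Running such a function from a scheduler (cron, for example) would give the fixed-interval behaviour the FAQ mentions, and it also shows why older spam can survive: anything published before the cut-off is never scored.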

COMMENTS • 148

  • @bahadir2198
    @bahadir2198 2 years ago +20

    So scared of this government issue on always banning crypto.

    • @harrywilson3206
      @harrywilson3206 2 years ago

      Same. But have no fear. When they did early this year I was still earning profit. Because I have a licensed and legitimate USA broker who trades for me.

    • @christysmith5458
      @christysmith5458 2 years ago

      Tell me about it please.

    • @mollynobles5833
      @mollynobles5833 2 years ago

      How do you get the money he trades for you.

    • @kerienstones3679
      @kerienstones3679 2 years ago

      What’s his name. And how do I know his legit

    • @bahadir2198
      @bahadir2198 2 years ago

      @UCeCCBPRZz_fAdI9EX_XlDQw ID@R a m i r e z o s p i n a

  • @QuantumWalnut
    @QuantumWalnut 2 years ago +47

    The fact that machine learning has become so DIY makes me very hopeful for its applications. The cheaper and more accessible it becomes, the more democratic this technology can be.

    • @cybr774
      @cybr774 2 years ago +4

      lol "democratic"

    • @TheALPHA1550
      @TheALPHA1550 2 years ago

      @@cybr774 Seriously.

    • @zoc2
      @zoc2 2 years ago +1

      This video makes me want to get into it myself. I think I'm going to dedicate tomorrow to machine learning!

  • @zyansheep
    @zyansheep 2 years ago +41

    YouTube has no incentive to fix this. We wouldn't even need a spam filter if they just used a better type of comment system (like the one reddit uses)
    Edit: python is pretty awesome...

    • @simplyexplained
      @simplyexplained 2 years ago +15

      Well, I could think of a few simple measures. Require people to validate a phone number, limit the amount of comments you can post in a day, and indeed, allow everyone to vote and moderate like Reddit.
      Oh yeah Python is great! Just started using it, and I'm amazed at how easy it is to learn, to use, and how many great libraries there are.

    • @onatkorucu842
      @onatkorucu842 2 years ago +1

      Why do you think that reddit is better?

    • @isableye7164
      @isableye7164 2 years ago +6

      @@onatkorucu842lol bcoz unlike youtube their downvote button actually works.

    • @dragonhold4
      @dragonhold4 2 years ago +4

      In every Reddit thread there are many legitimate comments that tell the truth but may be inconvenient so they are bombed by a mob and unfairly set to hidden, removed, or put to the very bottom of the infinity scroll. This system is far from ideal.

    • @supertron6039
      @supertron6039 2 years ago

      @@isableye7164 lmao

  • @evergreen-
    @evergreen- 2 years ago +12

    Simply Explained: So far I haven’t noticed false positives with this AI
    Comments thread under this video: *empty*

  • @theguyordie
    @theguyordie 2 years ago +17

    Add some Oauth support, swap out Google sheets with an SQL database, add a simple dashboard and you've got yourself a really neat PaaS product that could do really well!

    • @JohnyK07
      @JohnyK07 2 years ago +1

      yeah, I can see that Sheets file getting filled pretty quickly... (each one has a limit of 5 million cells...yup, cells, not rows).
      Google also has BigQuery, if he wants to keep it in the cloud and benefit from Sheets integration, but any sql database would do nicely.

  • @Tredecillionscience
    @Tredecillionscience 2 years ago +20

    The explanation is so smooth and easy to understand! Really appreciate the effort you made ^^

  • @Pix2io
    @Pix2io 1 year ago +1

    it would be cool if YouTube themselves started selling spam removal devices and maybe cameras and gaming gear for YouTubers.

  • @supertron6039
    @supertron6039 2 years ago +2

    Now imagine if YT chose to make similar text identifiers in their thousands of servers to clean up all of YT from trolls, porn adverts and fake website links. _Like that's ever gonna happen._

  • @YoushaAhmad
    @YoushaAhmad 2 years ago +1

    This is impressive. Well done and thank you! I have been disappointed with YouTube for not sorting this out considering Alphabet has some highly capable AI projects, which they often show off in press releases. If they don't get on top of this perhaps you could continue to work on this and license it to other channels to use themselves, or open source it with donations/ a freemium option.
    I used to flag more comments as spam, but thought YouTube was going to do something years ago. It is also annoying when large channels like Bloomberg leave spam up, whilst it isn't their fault they have the means to help get rid of it themselves.

  • @simonalogiudice7581
    @simonalogiudice7581 2 years ago +1

    Amazing! You explain things very clearly, good job!

  • @adarshkumar3518
    @adarshkumar3518 2 years ago +21

    Lol. YouTube needs this 😂

  • @selvamselvam3670
    @selvamselvam3670 1 year ago

    you always explain complicated topics in a simple way.. love your way of explaining.. you got a new subscriber

  • @vamshi4956
    @vamshi4956 2 years ago

    Wow. I was thinking about the spam and I found your video. I now have some idea on where to start even though I know nothing but Java. I'll update if I make any progress. Thank you and great content. Cheers.

  • @chaos_monster
    @chaos_monster 2 years ago

    I understand your need to run it on a server, but I am also really happy to see CoreML running on the devices only - that makes me feel a little bit less paranoia :D

  • @brombeerbert2768
    @brombeerbert2768 1 year ago

    Nice Channel! Thank you for all the education.
    I just had to let you know that I randomly ended up on your Channel after clicking an ad on The Million Dollar Website.
    Had me laughing :D

  • @benjaminkirbytennyson386
    @benjaminkirbytennyson386 2 years ago

    Thanks for sharing this video, now I can show this to my engineer friend to overcome your spam filter.

  • @MyBizTT
    @MyBizTT 2 years ago +1

    Excellent video! I'm guessing Naive Bayes Classifier?

  • @varunahlawat9013
    @varunahlawat9013 1 year ago

    "Bring it on"
    LMAO!
    Really appreciate this video!

  • @devadevans700
    @devadevans700 2 years ago

    Hey long time no see, your content is really good, patience and consistency is important

  • @polavenki
    @polavenki 2 years ago +1

    Was wondering about your thoughts on using the pre-trained zero shot models like GPT for this use case?

  • @reikiorgone
    @reikiorgone 1 year ago +1

    Testing organically that by retraining the model this comment is an organic test of Dogecoin xlm YouTube also I love you this was a great video

  • @robertb7003
    @robertb7003 2 years ago +2

    That's totally awesome. Shows the power of using APIs. You could definitely make some money on this even if youtube didnt hire you. Your server looks like it could handle other youtuber's channels as well. Thanks for the videos

    • @simplyexplained
      @simplyexplained 2 years ago +2

      I might consider doing that if I get some requests from channels. Haha, the server appears to be very fancy, but it's actually a very old one. I removed all the hardware and put in a low-power CPU. It's mainly used for backups and home automation.

    • @2DReanimation
      @2DReanimation 2 years ago

      @@simplyexplained You could write spam bots to get all the big youtubers to buy your product XD

  • @anuragdhondge9579
    @anuragdhondge9579 2 years ago +6

    This is a spam comment. Deal with it algorithm.

    • @nethoncho
      @nethoncho 2 years ago +1

      LOL

    • @simplyexplained
      @simplyexplained 2 years ago +8

      Algorithm says: 8,5% chance of being spam. Try harder ;)

    • @anuragdhondge9579
      @anuragdhondge9579 2 years ago +3

      @@simplyexplained 😂😂
      Love your channel, keep up the work..thanks for the reply

  • @mr_vinod_123
    @mr_vinod_123 2 years ago +1

    Amazing explanation. 👏👏

  • @prikshitparashar8950
    @prikshitparashar8950 2 years ago

    Amazing work !

  • @beloved3244
    @beloved3244 2 years ago +1

    Dude this is awesome!!!

  • @ffcml1733
    @ffcml1733 2 years ago

    From where you studied all these programming and other stuffs

  • @ahmad_dos5563
    @ahmad_dos5563 2 years ago

    That’s fascinating bro

  • @jamesseddon1637
    @jamesseddon1637 2 years ago

    @savjee Man great video, I've been writing a small script to scrape comments and detect deleted comments, I've been working with pandas and CSV and it's an absolute nightmare, especially as you say when trying to constantly append data to the CSV and read it back in a loop. Mega thanks for the source code, I'm going to implement a similar approach using Google Sheets and see how that goes. Out of curiosity, have you hit any limits with the YouTube API?

    • @jamesseddon1637
      @jamesseddon1637 2 years ago

      I noticed a small bug, line 76 where you reset allComs you use the wrong variable name (allComms instead of allComs).
      "# Reset list before we continue
      allComms = []"

    • @jamesseddon1637
      @jamesseddon1637 2 years ago

      Scratch that I don't think that line is even needed.

    • @simplyexplained
      @simplyexplained 2 years ago

      Yeah, CSV files are a mess. Also, the YouTube Data API isn't very easy to get started with.
      As for the limits: I did request a quota increase and got it very quickly. However, I don't really need it. This channel doesn't get that many comments.
      Thank you for spotting that! I removed the allComms line because it was indeed unnecessary.

  • @Kim-by5uy
    @Kim-by5uy 2 years ago

    Can we all take a moment to appreciate the great and easy-to-follow explanation

  • @reold
    @reold 2 years ago

    You could also use a python any where or heroku server if you need the home server back

    • @simplyexplained
      @simplyexplained 2 years ago +1

      True, I thought about going that route. But my home server is running Proxmox. Plenty of space for VM's and containers like this ;)

  • @HesderOleh
    @HesderOleh 2 years ago

    ThioJoe just made a script that requires you to name the spammer, while looking for training data to see if I could automate it, I see you have already done this!

  • @thesultan1212
    @thesultan1212 2 years ago +1

    Dude this is amazing!

  • @PawirodinomoM
    @PawirodinomoM 2 years ago +1

    Amazing!

  • @AamishSohailRamay
    @AamishSohailRamay 1 year ago +1

    which software you use to make these attractive videos?

  • @aayushgore4245
    @aayushgore4245 1 year ago +1

    Hotdog NOT HOTdog!! 🤣🤣

  • @UDKO2
    @UDKO2 2 years ago

    How do you make it run every hour ?

  • @crashia
    @crashia 2 years ago

    When are new videos coming? I've just discovered this channel and I'm in love 💓

  • @nurtorekelesov4286
    @nurtorekelesov4286 1 year ago +1

    this was amazing

  • @sneu420
    @sneu420 2 years ago

    Hey hey 0:26, that's me on one of your videos...!

  • @gosper420tyvs
    @gosper420tyvs 1 year ago +1

    You are awesomeness bro !!! Thank you for sharing !!! 🔥😎🔥😎🫶🏼👌🏼

  • @derickrcruz
    @derickrcruz 2 years ago +1

    0:47 dammit Jian-Yang

  • @machashanker6407
    @machashanker6407 2 years ago

    Can we Know Which software using for Animation

  • @hem89180
    @hem89180 2 years ago

    Well done.

  • @ryugadebo
    @ryugadebo 1 year ago

    Loved the video

  • @justhere9549
    @justhere9549 2 years ago

    Will u ever come back?

  • @tuna1270
    @tuna1270 2 years ago

    Hope you can do a step by step tutorial for this!!! very cool.😎😎

    • @cyber3808
      @cyber3808 2 years ago

      THANK YOU FOR WATCHING FOR CRYPTO GUIDANCE SEND MSG RIGHT AWAY WHAT'SAPP

    • @cyber3808
      @cyber3808 2 years ago

      What'sApp✚447459667378

    • @sinankoa824
      @sinankoa824 2 years ago

      @@cyber3808 aint no way

  • @tahatatakorshow5396
    @tahatatakorshow5396 2 years ago

    Xavier.
    Did YT hire you?
    You are amazing.

    • @simplyexplained
      @simplyexplained 2 years ago +1

      No they didn't! But my anti-spam bot is still going strong ;)

    • @tahatatakorshow5396
      @tahatatakorshow5396 2 years ago

      @@simplyexplained well, we all know how amazing you are Xavier.
      I hope you get a better position and maybe hire me one day 😁

  • @NewMateo
    @NewMateo 2 years ago +4

    Man I gotta learn Python.

    • @simplyexplained
      @simplyexplained 2 years ago +2

      I just started learning it, and I'm loving it so far!

  • @pravallikadamerla9835
    @pravallikadamerla9835 2 months ago

    Can you please share source code

  • @nixonlauture7337
    @nixonlauture7337 2 years ago +2

    Can you add a step-by-step for this?

    • @simplyexplained
      @simplyexplained 2 years ago +3

      The source code is on GitHub. I think the Jupyter notebook is easy to follow, but I might do a tutorial video on it. No promises though ;)

  • @EsterMelati
    @EsterMelati 2 years ago

    I rly wanna try tensorflow :(

  • @blackmennewstyle
    @blackmennewstyle 2 years ago +5

    YouTube actually has no interest to remove these spams since i'm pretty sure ironically, they are also probably involved in huge amount of ads campaigns, very lucrative during the human malware pandemic outbreak ;)
    Let's see if your spam filter detects me as a spam :p
    Have a great weekend my brother and keep it up the great job

    • @simplyexplained
      @simplyexplained 2 years ago +3

      It says there's a 0.3% chance that your comment is spam. You're safe ;)
      Have a nice weekend as well!

  • @md.najmulhasan8774
    @md.najmulhasan8774 1 year ago

    wow that is amazing :)

  • @Silverdev2482
    @Silverdev2482 2 years ago

    i tested the filter with a fake spam comment and it works

  • @cloudtech0903
    @cloudtech0903 4 months ago

    Share the code

  • @silvernaturemusic599
    @silvernaturemusic599 2 years ago +1

    No spam in the comment box proves it right.

  • @YuzuruA
    @YuzuruA 2 years ago

    The fact that Google can't emulate a simple DIY solution created by a loner youtuber speaks volumes about their commitment.

  • @ghipsandrew
    @ghipsandrew 2 years ago

    Maybe the spammers will train an adversarial network to engineer their comments so as to trick your model :O

  • @FrancisGauthier2
    @FrancisGauthier2 2 years ago

    I wonder if the Bayesian filter algorithm is now outdated by AI

  • @yyjj7934
    @yyjj7934 2 years ago

    Hello sir, how can I contact you?

  • @mariosasic4251
    @mariosasic4251 2 years ago +1

    nice video, good job bro :D

  • @2DReanimation
    @2DReanimation 2 years ago

    Wouldn't it have been better if all the manually not classified comments would have a value of 0.5 for "could be spam or not"?
    2:00: oh, you removed the non-tagged comments. That makes sense.

  • @palabinash
    @palabinash 2 years ago

    Nice

  • @zaurmustafayev7248
    @zaurmustafayev7248 2 years ago

    Love your idea, would you mind to share the source code with me? :) Happy to hear your feedback

    • @simplyexplained
      @simplyexplained 2 years ago +1

      Sure! I mentioned it at the end of the video. Source code is on GitHub, link in the description.

  • @amanda188
    @amanda188 2 years ago

    Excuse me, I have seen your channel on YouTube. I am very interested. If you are interested in a business partnership, we can discuss the details.

  • @REAnyAJ
    @REAnyAJ 2 years ago

    Badass

  • @freebie808
    @freebie808 2 years ago

    Cool

  • @dizaj
    @dizaj 2 years ago

    👍👍👍

  • @miguelbertonatti
    @miguelbertonatti 2 years ago

    👌🏼

  • @CuinnHerrick
    @CuinnHerrick 2 years ago

    Let's give it a go...
    Get rich quick now. $$$$ Not spam. True wealth creation. 😋

  • @Lord0x
    @Lord0x 2 years ago

    amazing video.

  • @siddarthgurram5023
    @siddarthgurram5023 2 years ago

    My 🧠 : go spam a comment and check if it gets reported as spam ~~he said more the data better the prediction let's help him~~

  • @grindererrofficial3755
    @grindererrofficial3755 2 years ago

    Is he is alive or died ? :( 11months been silent :(

  • @alymuni
    @alymuni 2 years ago +1

    This is a test to see if my comment gets deleted :D just for fun, anyway still a good video.
    SPAM SPAM find me

    • @simplyexplained
      @simplyexplained 2 years ago +2

      Nope, algorithm says only 2% chance of being spam ;)

    • @alymuni
      @alymuni 2 years ago +1

      @@simplyexplained aa ok xD thanks for letting me know

  • @3F34N1M4T3S
    @3F34N1M4T3S 2 years ago

    Came here from million dollar homepage

  • @lucagiovanni658
    @lucagiovanni658 1 year ago +7

    Great video!!! Very engaging...
    With everything going on right now, the best decision is having a profitable investment strategy. Stocks are good but crypto is better.

  • @RickyAraujoOficial
    @RickyAraujoOficial 2 years ago

    COPYRIGHT REMOVAL APPEAL
    Hi Xavier, how are you? My name is Ricky Araujo. You reported the video I posted on YouTube for violating rights to your video, I understand you have the right to do so, but I humbly apologize for that.
    I'm a fan of your content that's why I subscribed to your channel, at the time I watched your video and I thought it was so rich that I didn't think twice about wanting to copy it, but I'm here begging you for a venomous apology and I ask you to remove your information, as this radically gets in the way. my growth here in Brazil.
    Also, I can post a video apologizing and put your channel in my video description.
    Don't worry about it anymore, it will never happen again. I just ask that you please withdraw a complaint.
    Att. Ricky Araújo

  • @Andy11876
    @Andy11876 2 years ago

    Hi I have something important to tell you

  • @RealSweveel
    @RealSweveel 2 years ago

    A

  • @spiritbears
    @spiritbears 2 years ago +1

    Hey spam filter don't delete my comment its not a spam😂

  • @sitbackandrelax2482
    @sitbackandrelax2482 2 years ago

    i am a spam

  • @nethoncho
    @nethoncho 2 years ago +1

    This comment may be spam...

  • @ratgreen
    @ratgreen 2 years ago +1

    Great stuff, I hope yt actually does something. I must admit some of the comments are very legit, like the first 10 or so comments will look like a pretty normal conversation, perhaps a bit scripted but the actual bait will be many comments below, with a pretty legit setup, ie, oh 'I wish I had known how to trade' 'I too didnt know the tricks of trading until I was introduced to Dr Sue Bateman who taught me' 'oh do you have contact details' 'oh, yes you can contact her on WA on 0000000000'
    So flagging the entire thread of spam, probably looks like people abusing the report feature to youtube, as they read like legit comments. Its only when you read the entire thread, which I assume ML wont pick up on, that it becomes spammy.
    Also I assumed they are bots, but I've actually seen some of them reply to real comments. Which was odd.
    Lets see how your filter does with that ^ too ha

    • @simplyexplained
      @simplyexplained 2 years ago +1

      You're 100% correct. Comments like "I wish I had known how to trade" are tricky. By themselves, they're not spam. But the replies it gets are. So I trained the model exactly like this. As soon as someone mentions another person to help them, it's spam.
      My filter goes through top-level comments as well as replies and processes them individually. So a top level comment "I wish I had known how to trade" might be left alone, while the replies might get removed.
      Anyway, I'll tweak the script as time goes on. But so far it seems to do quite well. Fingers crossed!
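
The reply above describes scoring top-level comments and replies individually, so a harmless opener can stay while the baiting replies are removed. A rough sketch of that idea follows. `CommentThread`, `flag_individually`, and the 0.5 threshold are hypothetical names and values, and `toy_score` is a phrase-matching stand-in for the trained model, not the author's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class CommentThread:
    # Hypothetical container: one top-level comment plus its replies.
    top_level: str
    replies: List[str] = field(default_factory=list)


def iter_comments(threads: Iterable[CommentThread]) -> Iterator[str]:
    """Yield every comment on its own, top-level first, then its replies."""
    for thread in threads:
        yield thread.top_level
        yield from thread.replies


def flag_individually(
    threads: Iterable[CommentThread],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off
) -> List[str]:
    """Score each comment independently of the thread it belongs to."""
    return [text for text in iter_comments(threads) if score(text) >= threshold]


def toy_score(text: str) -> float:
    """Stand-in scorer: flags comments that point the reader to another person."""
    phrases = ("introduced to", "contact her", "contact him", "trades for me")
    return 1.0 if any(p in text.lower() for p in phrases) else 0.0


if __name__ == "__main__":
    threads = [
        CommentThread(
            "I wish I had known how to trade",
            ["I was introduced to Dr Sue Bateman who taught me", "Nice video"],
        )
    ]
    for text in flag_individually(threads, toy_score):
        print("would remove:", text)
```

Because each comment is judged without its neighbours, the opener in the example is left alone and only the reply that refers to a third party is flagged, which matches the behaviour described in the reply.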

  • @needabettername1559
    @needabettername1559 2 years ago

    Profits money love xavier bitcoin test this is a test simply explained