Fighting Spam on YouTube with TensorFlow & Python
- Published Jun 24, 2021
- I'm sick of crypto-related spam comments on YouTube, so I trained a machine learning model to delete them! A script runs periodically and uses the text classifier to filter the latest comments on my videos.
The filter is surprisingly effective, even though the training dataset is relatively small. I'll keep expanding the dataset and retrain the classifier so it becomes more accurate over time.
💌 Sign up for Simply Explained Newsletter:
newsletter.simplyexplained.com
Monthly newsletter with cool stuff I found on the internet (related to science, technology, biology, and other nerdy things)! No spam. Ever. Promise!
🌍 Social
Twitter: / savjee
Facebook: / savjee
Blog: savjee.be
❤️ Become a Simply Explained member: / @simplyexplained
👩‍💻 Source code:
Available on GitHub:
github.com/Savjee/yt-spam-cla...
❓❓ Frequently asked questions:
❓ Why do I still see spam comments on your channel?
First, not all comments are caught by the AI, so some still require manual intervention. Secondly, the script runs on a fixed interval, so give it some time to run. And thirdly, it only filters recent comments. I will let the classifier clean up the old comments as well.
- Science & Technology
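The behaviour described in the FAQ (a script that runs on an interval and only looks at recent comments) could be sketched roughly as follows. This is a minimal illustration, not the code from the GitHub repository: the `Comment` class, the two-day window, the 0.5 cutoff, and the `keyword_score` stand-in for the trained classifier are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List


@dataclass
class Comment:
    """Hypothetical container for one comment (not the YouTube API's format)."""
    comment_id: str
    text: str
    published_at: datetime


def find_recent_spam(
    comments: Iterable[Comment],
    spam_probability: Callable[[str], float],
    now: datetime,
    max_age: timedelta = timedelta(days=2),  # assumed "recent" window
    threshold: float = 0.5,                  # assumed spam cutoff
) -> List[str]:
    """Return the IDs of recent comments whose spam score reaches the threshold."""
    flagged = []
    for comment in comments:
        if now - comment.published_at > max_age:
            continue  # older comments are skipped on this run
        if spam_probability(comment.text) >= threshold:
            flagged.append(comment.comment_id)
    return flagged


def keyword_score(text: str) -> float:
    """Toy stand-in for the trained text classifier; NOT the real model."""
    lowered = text.lower()
    return 0.9 if "broker" in lowered or "whatsapp" in lowered else 0.1
```

A scheduler (cron, for example) would call something like `find_recent_spam` on every run and pass the returned IDs to whatever deletes or holds the comments. That also explains why spam can stay visible until the next run.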
So scared of this government issue on always banning crypto.
Same. But have no fear. When they did early this year I was still earning profit. Because I have a licensed and legitimate USA broker who trades for me.
Tell me about it please.
How do you get the money he trades for you.
What’s his name. And how do I know his legit
@UCeCCBPRZz_fAdI9EX_XlDQw ID@R a m i r e z o s p i n a
The fact that machine learning has become so DIY makes me very hopeful for its applications. The cheaper and more accessible it becomes, the more democratic this technology can be.
lol "democratic"
@@cybr774 Seriously.
This video makes me want to get into it myself. I think I'm going to dedicate tomorrow to machine learning!
YouTube has no incentive to fix this. We wouldn't even need a spam filter if they just used a better type of comment system (like the one Reddit uses)
Edit: Python is pretty awesome...
Well, I could think of a few simple measures. Require people to validate a phone number, limit the amount of comments you can post in a day, and indeed, allow everyone to vote and moderate like Reddit.
Oh yeah Python is great! Just started using it, and I'm amazed at how easy it is to learn, to use, and how many great libraries there are.
Why do you think that reddit is better?
@@onatkorucu842 lol bcoz unlike YouTube their downvote button actually works.
In every Reddit thread there are many legitimate comments that tell the truth but may be inconvenient so they are bombed by a mob and unfairly set to hidden, removed, or put to the very bottom of the infinity scroll. This system is far from ideal.
@@isableye7164 lmao
Simply Explained: So far I haven’t noticed false positives with this AI
Comments thread under this video: *empty*
Right 😂 haha
Add some OAuth support, swap out Google Sheets for an SQL database, add a simple dashboard and you've got yourself a really neat PaaS product that could do really well!
yeah, I can see that Sheets file getting filled pretty quickly... (each one has a limit of 5 million cells...yup, cells, not rows).
Google also has BigQuery, if he wants to keep it in the cloud and benefit from Sheets integration, but any SQL database would do nicely.
The explanation is so smooth and easy to understand! Really appreciate the effort you made ^^
It would be cool if YouTube themselves started selling spam removal devices and maybe cameras and gaming gear for YouTubers.
Now imagine if YT chose to make similar text identifiers in their thousands of servers to clean up all of YT from trolls, porn adverts and fake website links. _Like that's ever gonna happen._
This is impressive. Well done and thank you! I have been disappointed with YouTube for not sorting this out, considering Alphabet has some highly capable AI projects, which they often show off in press releases. If they don't get on top of this, perhaps you could continue to work on this and license it to other channels to use themselves, or open source it with donations or a freemium option.
I used to flag more comments as spam, but thought YouTube was going to do something years ago. It is also annoying when large channels like Bloomberg leave spam up. Whilst it isn't their fault, they have the means to help get rid of it themselves.
Amazing! You explain things very clearly, good job!
Lol. YouTube needs this 😂
you always explain complicated topics in a simple way.. love your way of explaining.. you got a new subscriber
TeXtMe DiReCtLy 🤙🥶
Wow. I was thinking about the spam and I found your video. I now have some idea of where to start even though I know nothing but Java. I'll update if I make any progress. Thank you and great content. Cheers.
I understand your need to run it on a server, but I am also really happy to see CoreML running on the devices only - that makes me feel a little bit less paranoid :D
Nice Channel! Thank you for all the education.
I just had to let you know that I randomly ended up on your channel after clicking an ad on The Million Dollar Website.
Had me laughing :D
Thanks for sharing this video, now I can show this to my engineer friend to overcome your spam filter.
Haha that would be epic. An arms race!
Excellent video! I'm guessing Naive Bayes Classifier?
"Bring it on"
LMAO!
Really appreciate this video!
Hey long time no see, your content is really good, patience and consistency is important
TeXtMe DiReCtLy 🤙🥶
Was wondering about your thoughts on using the pre-trained zero shot models like GPT for this use case?
TeXtMe DiReCtLy 🤙🥶
Testing organically that by retraining the model this comment is an organic test of Dogecoin xlm YouTube also I love you this was a great video
That's totally awesome. Shows the power of using APIs. You could definitely make some money on this even if YouTube didn't hire you. Your server looks like it could handle other YouTubers' channels as well. Thanks for the videos
I might consider doing that if I get some requests from channels. Haha, the server appears to be very fancy, but it's actually a very old one. I removed all the hardware and put in a low-power CPU. It's mainly used for backups and home automation.
@@simplyexplained You could write spam bots to get all the big youtubers to buy your product XD
This is a spam comment. Deal with it algorithm.
LOL
Algorithm says: 8.5% chance of being spam. Try harder ;)
@@simplyexplained 😂😂
Love your channel, keep up the work..thanks for the reply
Amazing explanation. 👏👏
TeXtMe DiReCtLy 🤙🥶
Amazing work !
Dude this is awesome!!!
Where did you study all this programming and other stuff?
That’s fascinating bro
@savjee Man, great video. I've been writing a small script to scrape comments and detect deleted comments. I've been working with pandas and CSV and it's an absolute nightmare, especially, as you say, when trying to constantly append data to the CSV and read it back in a loop. Mega thanks for the source code, I'm going to implement a similar approach using Google Sheets and see how that goes. Out of curiosity, have you hit any limits with the YouTube API?
I noticed a small bug: on line 76, where you reset allComs, you use the wrong variable name (allComms instead of allComs).
"# Reset list before we continue
allComms = []"
Scratch that I don't think that line is even needed.
Yeah, CSV files are a mess. Also, the YouTube Data API isn't very easy to get started with.
As for the limits: I did request a quota increase and got it very quickly. However, I don't really need it. This channel doesn't get that many comments.
Thank you for spotting that! I removed the allComms line because it was indeed unnecessary.
Can we all take a moment to appreciate the great and easy-to-follow explanation
TeXtMe DiReCtLy 🤙🥶
You could also use a PythonAnywhere or Heroku server if you need the home server back
True, I thought about going that route. But my home server is running Proxmox. Plenty of space for VM's and containers like this ;)
ThioJoe just made a script that requires you to name the spammer. While looking for training data to see if I could automate it, I saw you have already done this!
Dude this is amazing!
Thanks!
Amazing!
Which software do you use to make these attractive videos?
Hotdog NOT HOTdog!! 🤣🤣
How do you make it run every hour ?
When are new videos coming? I've just discovered this channel and I'm in love 💓
this was amazing
Hey hey 0:26, that's me on one of your videos...!
Nice catch ;)
You are awesomeness bro !!! Thank you for sharing !!! 🔥😎🔥😎🫶🏼👌🏼
0:47 dammit Jian-Yang
Finally someone who noticed the reference!
Can we know which software you use for the animation?
TeXtMe DiReCtLy 🤙🥶
Well done.
Loved the video
Will u ever come back?
Hope you can do a step by step tutorial for this!!! very cool.😎😎
THANK YOU FOR WATCHING FOR CRYPTO GUIDANCE SEND MSG RIGHT AWAY WHAT'SAPP
What'sApp✚447459667378
@@cyber3808 aint no way
Xavier.
Did YT hire you?
You are amazing.
No they didn't! But my anti-spam bot is still going strong ;)
@@simplyexplained well, we all know how amazing you are, Xavier.
I hope you get a better position and maybe hire me one day 😁
Man I gotta learn Python.
I just started learning it, and I'm loving it so far!
Can you please share source code
Can you add a step-by-step for this?
The source code is on GitHub. I think the Jupyter notebook is easy to follow, but I might do a tutorial video on it. No promises though ;)
I rly wanna try tensorflow :(
YouTube actually has no interest in removing this spam since, ironically, I'm pretty sure they are also involved in a huge amount of ad campaigns, very lucrative during the human malware pandemic outbreak ;)
Let's see if your spam filter detects me as a spam :p
Have a great weekend my brother and keep up the great job
It says there's a 0.3% chance that your comment is spam. You're safe ;)
Have a nice weekend as well!
wow that is amazing :)
i tested the filter with a fake spam comment and it works
Share the code
No spam in the comment box proves it right.
The fact that Google can't emulate a simple DIY solution created by a lone YouTuber speaks volumes about their commitment.
Maybe the spammers will train an adversarial network to engineer their comments so as to trick your model :O
That would be..... so cool!!!
I wonder if the Bayesian filter algorithm is now outdated by AI
Hello sir, how can I contact you?
TeXtMe DiReCtLy 🤙🥶
nice video, good job bro :D
TeXtMe DiReCtLy 🤙🥶
Wouldn't it have been better if all the manually not classified comments would have a value of 0.5 for "could be spam or not"?
2:00: oh, you removed the non-tagged comments. That makes sense.
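The two options raised here (labelling untagged comments 0.5 versus leaving them out of training) differ in one data-preparation step. Below is a small sketch of the second option, the one the video reportedly uses. The row format, a `(text, label)` pair with `None` for an untagged comment, is an assumption made for this example and not the layout of the actual Google Sheet.

```python
from typing import List, Optional, Tuple

# Hypothetical row format: (comment text, label), where label is
# 1 for spam, 0 for not spam, and None when nobody has tagged it yet.
Row = Tuple[str, Optional[int]]


def labelled_only(rows: List[Row]) -> Tuple[List[str], List[int]]:
    """Drop untagged rows and split the rest into texts and labels."""
    texts: List[str] = []
    labels: List[int] = []
    for text, label in rows:
        if label is None:
            continue  # untagged comments carry no training signal
        texts.append(text)
        labels.append(label)
    return texts, labels
```

Dropping the rows keeps the labels strictly binary, whereas a 0.5 label would teach the model that every untagged comment is a coin flip, whatever it actually says.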
Nice
Love your idea, would you mind to share the source code with me? :) Happy to hear your feedback
Sure! I mentioned it at the end of the video. Source code is on GitHub, link in the description.
Excuse me, I have seen your channel on YouTube. I am very interested. If you are interested in a business partnership, we can discuss the details.
Badass
Cool
👍👍👍
👌🏼
Let's give it a go...
Get rich quick now. $$$$ Not spam. True wealth creation. 😋
amazing video.
TeXtMe DiReCtLy 🤙🥶
No
My 🧠 : go spam a comment and check if it gets reported as spam ~~he said more the data better the prediction let's help him~~
Is he alive or dead? :( He's been silent for 11 months :(
Very much alive!
@@simplyexplained :)
This is a test to see if my comment gets deleted :D just for fun, anyway still a good video.
SPAM SPAM find me
Nope, algorithm says only 2% chance of being spam ;)
@@simplyexplained aa ok xD thanks for letting me know
Came here from million dollar homepage
Great video!!! Very engaging...
With everything going on right now, the best decision is having a profitable investment strategy. Stocks are good but crypto is better.
COPYRIGHT REMOVAL APPEAL
Hi Xavier, how are you? My name is Ricky Araujo. You reported the video I posted on YouTube for violating rights to your video. I understand you have the right to do so, but I humbly apologize for that.
I'm a fan of your content, that's why I subscribed to your channel. At the time I watched your video I thought it was so rich that I didn't think twice about wanting to copy it, but I'm here begging you for a venomous apology and I ask you to remove your information, as this radically gets in the way of my growth here in Brazil.
Also, I can post a video apologizing and put your channel in my video description.
Don't worry about it anymore, it will never happen again. I just ask that you please withdraw the complaint.
Att. Ricky Araújo
maybe you can email him
TeXtMe DiReCtLy 🤙🥶
Hi I have something important to tell you
A
TeXtMe DiReCtLy 🤙🥶
Hey spam filter don't delete my comment its not a spam😂
i am a spam
This comment may be spam...
Great stuff, I hope YT actually does something. I must admit some of the comments look very legit: the first 10 or so comments will look like a pretty normal conversation, perhaps a bit scripted, but the actual bait will be many comments below, with a pretty legit setup, i.e. 'I wish I had known how to trade', 'I too didn't know the tricks of trading until I was introduced to Dr Sue Bateman who taught me', 'oh, do you have contact details', 'oh, yes, you can contact her on WA on 0000000000'.
So flagging the entire thread as spam probably looks to YouTube like people abusing the report feature, as the comments read like legit ones. It's only when you read the entire thread, which I assume ML won't pick up on, that it becomes spammy.
Also I assumed they are bots, but I've actually seen some of them reply to real comments. Which was odd.
Let's see how your filter does with that ^ too, ha
You're 100% correct. Comments like "I wish I had known how to trade" are tricky. By themselves, they're not spam. But the replies it gets are. So I trained the model exactly like this. As soon as someone mentions another person to help them, it's spam.
My filter goes through top-level comments as well as replies and processes them individually. So a top level comment "I wish I had known how to trade" might be left alone, while the replies might get removed.
Anyway, I'll tweak the script as time goes on. But so far it seems to do quite well. Fingers crossed!
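The reply above says top-level comments and replies are each scored on their own. A rough sketch of that idea follows. It assumes a hypothetical thread dictionary (`id`, `text`, `replies`), a made-up 0.5 cutoff, and a toy `referral_score` function standing in for the trained model. None of these names come from the actual script.

```python
from typing import Callable, Dict, List


def flag_thread(
    thread: Dict,
    spam_probability: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> List[str]:
    """Score the top-level comment and every reply independently.

    Returns the IDs of the items to remove. The top-level comment can
    survive even when some of its replies are flagged.
    """
    items = [thread] + list(thread.get("replies", []))
    return [
        item["id"]
        for item in items
        if spam_probability(item["text"]) >= threshold
    ]


def referral_score(text: str) -> float:
    """Toy stand-in for the classifier: flags mentions of a helper to contact."""
    lowered = text.lower()
    cues = ("introduced to", "contact her", "contact him", "trades for me")
    return 0.95 if any(cue in lowered for cue in cues) else 0.05
```

With a thread whose top-level comment is "I wish I had known how to trade" and whose replies recommend a trader to contact, only the reply IDs come back. That matches the behaviour described above: the bait comment is left alone while the referrals are removed.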
Profits money love xavier bitcoin test this is a test simply explained