I Made This Open-Source Project
- Published 19 Apr 2024
- After MONTHS, I finally made another open-source project. This one was a ton of fun to build and I hope to turn this into an API we can all benefit from with any user-generated data on our web apps.
-- links
website: www.profanity.dev/
github (leave a ⭐ pls thx): github.com/joschan21/profanit...
I'll post a complete build on this API on my second channel (linked below) soon!
-- my links
second channel (in depth videos): / @joshtriedupstash
newsletter: www.joshtriedcoding.com/
discord: / discord
github: github.com/joschan21 - Science & Technology
Disappointed. I thought it was gonna be an API that serves profanity.
fr 😢
Ferb, I know what we're building today!
Okay, let's build an open-source profanity maker that bypasses this API's check. 😺
@unbiasedperson1155 that's a great idea
@anhdunghisinh YEAH! F PROFANITY FILTERS!
funny but ...
"You son of a mother" - profanity
"fucking awesome" - profanity
"damn, that's great" - profanity
well, "fucking awesome" is in fact profane
"see you" is profanity :) the API sucks tbh
That's why he implemented the score system, I think... but it's open source, so if you want, you can modify it or see how he built it... btw, "fucking awesome" makes sense, "damn" also, and depending on the context, "you son of a mother" too... XD
those are profanities though
@visu7135 It's too short to be accurate...
Google's content moderation API is the best, as it gives a separate score for each field (insulting, toxicity, etc.) accurately, doesn't take much time, and it's also free
I typed "Son of a mother" and it responded with profanity detected
lmaoo
I tried "No need to waste more oxygen, just do it
That’s the beauty of open source: now more people can contribute to fix these edge cases, in theory, right?
I typed "daughter of a father" and it says "Crispy clean input, no profanities" . LMAO!
@elvis_gastelum Why work on a half-assed, non-working project tho?
I typed "I fucking love pizza" and it responded "OH GOD, VERY BIG PROFANITY DETECTED!!! "
fucking is profanity
Btw, consider choosing a license.
Technically this is not really open source yet, you just uploaded the code on the web and hoped for the best.
In case you want to keep it simple, there is the BSD license or the MIT license, which are very short. In case you want something more solid, you may want to choose the Apache license, which is not that different from MIT but has a bunch of legalese to protect your ass from patent trolls and contributors with malicious intent.
Then there are also copyleft open source licenses like the GPL, though I am not a fan of those; it is not my idea of freedom.
chill out harvey specter
Is there a website for me to quickly read about and select Licenses?
@ativerc so, YouTube is very big brain, so it removed my comment where I was trying to help you cuz it was a URL.
Anyway.
There is choosealicense, a website made by GitHub. Also, whenever you add a file from the GitHub UI and its name contains the word "license", GitHub will offer you a license picker.
For more complex commercial scenarios, in case you are a business, there is also a specific source-available license that lets your software convert to open source after a set amount of time from publication: the Functional Source License. But most people get by with open source licenses. Generally, if you are unsure, just make coffee and read them.
@ativerc from GitHub there is "choose a license", which you can search for
oh damn.. really?
Isn't it open source if, like you said, he just uploaded the code on the internet?
The type I error rate (false positives) of this tool makes it kind of unusable. My favorite perfectly normal prompts that get detected as profanity:
- "double slit experiment"
- "single pen" / "pen test"
- "toxic person"
- "Abbie Lee" (possible person name)
- "garden hoe"
- "what a jerk" (i suppose some people might think this is profane)
using vector embeddings is actually so creative i love it
Not the unignored .DS_Store 😭
Holy moly bro, I needed this very badly!
Supercool project, Cheers from Norway!
A fucking great project
Profanity DETECTED (score 99999) 😂😂
congrats on the launch!
Fantastic video Josh
Josh, can you make a video about how to train a tensor model?
This
yes please
Awesome! I once needed to urgently implement a profanity filter and used a simple list comparison, which doesn’t work in many cases. Yours looks awesome 🙌 Thanks
It would be awesome to see some content on how you trained your model (costs, services..etc.). I'm looking for that kind of content.
Interesting concept - similar to Semantic router. A combination approach that filters for single-word profanities and vector similarity for longer sentences that pass the single-word filter would absolutely be a "good enough approach" for most profanity detection use cases.
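A rough TypeScript sketch of that combination, for anyone who wants to picture it. Everything here is made up for illustration: the word list, the 0.85 threshold, and the `Scorer` function, which stands in for whatever vector-similarity lookup the real API does.

```typescript
// Hypothetical two-stage check: exact word list first, similarity score second.
// Scorer is a stand-in for a vector-similarity lookup (assumption, not the real API).
type Scorer = (text: string) => number;

// Illustrative entries only.
const WORD_LIST = new Set<string>(["damn", "hell"]);

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function isProfane(text: string, score: Scorer, threshold = 0.85): boolean {
  // Stage 1: cheap exact match on single words.
  if (tokenize(text).some((t) => WORD_LIST.has(t))) return true;
  // Stage 2: only texts that pass stage 1 reach the more expensive scorer.
  return score(text) >= threshold;
}

// Fake scorer for demonstration; a real one would query a vector index.
const fakeScore: Scorer = (text) => (text.includes("jerk") ? 0.9 : 0.1);

console.log(isProfane("what the hell", fakeScore));
console.log(isProfane("what a jerk", fakeScore));
console.log(isProfane("nice day", fakeScore));
```

The word list handles the short inputs that the similarity score seems to struggle with, and the scorer only runs on text that has no exact hit.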
Great Project
Very nice, what software are you using to make your videos? How do you share your screen and show your face at the same time?
It doesn't detect profanity in German
Let's goooo!
I am working on a similar problem of finding similarity between two sentences; they need not be exact, just use similar words. I was baffled that there is such a simple solution to this. Thanks for this, I will now look into vector databases.
Worth looking at how other languages would be handled as well. Saw a PR adding some words from Spanish and I had planned to add some Chinese and Thai, but I saw an issue open about the potential of adding a langs parameter so that clean words and phrases in one language don't trigger the filter in another.
Curious why you chose to use Upstash Vector db vs Cloudflare's Vectorize? Especially since you're using cloudflare's stack for hosting
I think if you combined the ml model with a word list approach you could improve the accuracy. Basically give the ML output but then look in the blacklist and whitelist to see if that changes the outcome. Best of both worlds. This will also solve the single word issues you had.
Cool man!
Cool project 👍
Make a video on the minimum standards an open source project should have for better reach and scalability
It would be useful if the API response said which words are profane, giving a list of words or the start and end index of each word, so that client-side apps can replace them with * or something similar.
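Something like this on the client side could cover the masking part. A sketch only, assuming the API just returns a score: `findSpans` works from a local, illustrative word list, and `Span`, `findSpans` and `mask` are made-up names, not anything the project provides.

```typescript
// Hypothetical helper types and names; end is exclusive.
interface Span {
  start: number;
  end: number;
}

// Illustrative list only.
const FLAGGED = new Set<string>(["damn", "hell"]);

// Return the character range of every flagged word in the text.
function findSpans(text: string): Span[] {
  const spans: Span[] = [];
  const wordPattern = /[A-Za-z]+/g;
  let m: RegExpExecArray | null;
  while ((m = wordPattern.exec(text)) !== null) {
    if (FLAGGED.has(m[0].toLowerCase())) {
      spans.push({ start: m.index, end: m.index + m[0].length });
    }
  }
  return spans;
}

// Replace each span with asterisks of the same length, so indices stay valid.
function mask(text: string, spans: Span[]): string {
  let out = text;
  for (const { start, end } of spans) {
    out = out.slice(0, start) + "*".repeat(end - start) + out.slice(end);
  }
  return out;
}

const input = "damn, that's great";
console.log(mask(input, findSpans(input)));
```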
The value of the resource is not very clear, since I can’t paste the whole article (the text is too big) and I can’t understand where exactly the profanity is located
Thank you
Love it
great work Josha🔥🔥🫡
Everybody is scared of YouTube demonetization! Just chill and keep crushing it!
You could've used the text-embedding-large model, which could pack more information into your embeddings thanks to its larger dimension; that would've improved your accuracy even on inputs with a large number of tokens.
im working on a review website right now and i could use this to flag reviews and put a mature rating on it or something. this is amazing. great job
doesn't work so well, easily bypassable
what i type: "you are so SHlT lol"
Crispy clean input, no profanities :)) 👍👍
score (higher is worse): 0.801
this review website is so A55
Crispy clean input, no profanities :)) 👍👍
score (higher is worse): 0.784
@PrismFave damn, I haven't tested it out yet so I don't know, but looking at the GitHub repo, yeah, I'm gonna wait until it gets better
I wonder if there is some type of list of tests people have made with fails? Would love to see the edge cases.
A question: what is your browser?
Exciting!
What about different languages.
Auto detect language? Explicitly set?
One model for all, a lot of models for each language?
So many questions🤣
👍 Useful
Insert 'YouTube would like to connect to your API' jokes here
For the very short texts why don't you just pad out the input text with neutral words?
Maybe training on Twitter tweets could make this model perform well
Would be awesome if you could make a tutorial on why you use Hono over Express :) for your API
Ey, what framework did you use to design the website? I love it
Follow-up: what do you use to record your videos?
Does it work only for English? Would you be interested in opening it up to other languages?
It seems to only work for English; swear words in other languages (like Polish) weren't flagged as profanity
Great now I will make a version that creates profanity
"This doesn't use AI, just a machine learning model"
i got pretty sure this is profanity on:
THIS IS VERY PROFANE
Holy moly gets 0.912 🚨😱 BIG PROFANITY DETECTED!! 🚨😱
There should be some internationalization context added. One of the biggest coffee shops in Vietnam (where I spend time) is Phúc Long. Testing with the string "my favorite coffee shop is phuc long" raises a score of 1.000!
Also curious as to why the range is so small - seems it starts at 0.8?
"what the hell" (0.966) or "what the heck" (0.912) both return profanity.
Even if we use the totally safe version of this phrase, "what in the world", it's still profanity (0.859).
then how are we supposed to express that idea
on the other hand, "I hate this [blank] taco" returns clean for "flipping", "frigging" and "freaking", all of which are lesser versions of the F bomb
Does it filter out ones from other languages?
Does it filter out ones with typos?
How many normal messages will be considered profanity and will be filtered?
Why did you write it in JavaScript/TypeScript? it will be way faster and less error prone if you switch over to a statically compiled language.
wow wow - 🚨 PROFANITY DETECTED!! 🚨
Well, it drops when the message is larger than ~750 chars due to the execution time limit. Tokenization makes BOOM
Can we do one for images too?
Does anyone know what app he's using to switch apps in the left sidebar? I think Theo also uses it
Arc Web browser
Important to note that although the source is viewable on GitHub, this is not currently classed as "Open Source" software as it lacks a license. See issue #6 on the GitHub repo.
Hi, I wanna add an e-commerce store app to my portfolio. I wonder which React stack is solid for it in 2024. Can someone suggest something? For the backend I would prefer Firebase, and SCSS + MUI for styling, but I need recommendations about a state manager and other technologies and tools. Thanks!
f@#k!ng great project!
Basically the score goes from 0.810 to 0.880. Seems like there's not a lot of margin for error given "clean input" is 0.840, and limiting the content size drastically reduces its usefulness
After a bit of testing it seems your product is definitely not ready, you should update your landing page as it is not reliable at all.
Typed meow meow and the rating was:
😱 PRETTY SURE THIS IS A PROFANITY 😱
score (higher is worse): 0.865
Does anyone know what is the app he is using to draw the schemas (min 1:00)?
tldraw
It's Excalidraw
Can it be made to respond which word is profane as well? So that i can just *** it
This is a really good project. Actually, you can use it not only for profanity: you can also detect ads, spam, scams, etc., right?
tensor model < bunch of ifs
sir josh, can you make a tutorial on how to use Hono's RPC with Next
Cool idea but it's super impractical and easy to bypass. Needs some more work because simply chaining 2 swear words together without a space can usually bypass it.
I typed "you are very sexy" and it responded with: Crispy clean input, no profanities :))
it's insane!!
my pen is broken - 😱 PRETTY SURE THIS IS A PROFANITY 😱
you what - 😱 PRETTY SURE THIS IS A PROFANITY 😱
How much have you been drinking - 😱 PRETTY SURE THIS IS A PROFANITY 😱
@joshtriedcoding why do you still use yarn in 2024? Either pnpm or bun is better in every category
New doesn't equal better.
Why is it so strict? "dumb person" is apparently extremely profane
because this is not production ready, it's at best a Proof of Concept.
it obviously cannot detect or understand any context, it can just maybe detect bad words, that's it, it doesn't care about context at all.
The phrases "I love doing it with my sister"(0.802) and "I want to end your life"(0.783) have lower scores than your examples of clean input. I think this needs a lot of work, only obvious profanity gets detected.
"ì I" 😱 PRETTY SURE THIS IS A PROFANITY 😱
score (higher is worse): 0.857 LMAOO
"ds fdsfds dsf dsf sdfdsfssd fdsfds" : 😱 PRETTY SURE THIS IS A PROFANITY 😱
dumb dumb: 🚨😱 BIG PROFANITY DETECTED!! 🚨😱 - 0.937
Good fucking video
“I can’t say this word because YouTube may demonetize the *hell* out of me.”
starred!
my prompt: "you are so S.HIT at this game"
Crispy clean input, no profanities :)) 👍👍
score (higher is worse): 0.822
-----------------------------------------------------------------------
my prompt: "you are so SHlT lol"
Crispy clean input, no profanities :)) 👍👍
score (higher is worse): 0.801
It's like semantic search
One issue is internationalisation: "Ich geh nach Fucking", is a German sentence without any profanity, because "Fucking" is an actual town.
Maybe add something to convert Unicode look-alikes, because those won't get detected
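A minimal sketch of what such a pre-processing step might look like, run before the text is scored. The `LOOKALIKES` table and the `normalize` name are invented here and cover only a handful of digit and symbol swaps; `String.prototype.normalize("NFKD")` is the standard JavaScript call that folds fullwidth letters and splits accents off their base letter.

```typescript
// Hypothetical, deliberately tiny substitution table (assumption for illustration).
const LOOKALIKES: Record<string, string> = {
  "0": "o",
  "1": "i",
  "3": "e",
  "4": "a",
  "5": "s",
  "@": "a",
  "$": "s",
};

function normalize(text: string): string {
  return text
    .normalize("NFKD") // fold compatibility forms, split accents from base letters
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accent marks
    .toLowerCase()
    .replace(/[01345@$]/g, (ch) => LOOKALIKES[ch] ?? ch); // undo common swaps
}

console.log(normalize("A55"));
console.log(normalize("Phúc"));
```

This would not catch a lowercase "l" standing in for "I" (as in the "SHlT" example), since mapping every "l" to "i" would wreck normal words, and it turns real numbers into letters, so the normalized string is only good for matching, not for display.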
That profanity score is very weird. Why is the score always around 0.8? Why not use the range from 0 to 1?
cool, but what do “zip in the wire” and “zipperhead” mean? 😭
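On the narrow range: one cheap client-side workaround would be to stretch the raw score to 0..1 with min-max scaling. A sketch, assuming a floor of 0.78 (roughly the lowest score reported in these comments, not a documented constant) and a ceiling of 1.0; `rescale` is a made-up helper, not part of the API.

```typescript
// Hypothetical helper: min-max rescale of a raw score into the 0..1 range.
// The default floor and ceiling are guesses based on scores quoted in this thread.
function rescale(raw: number, floor = 0.78, ceil = 1.0): number {
  const scaled = (raw - floor) / (ceil - floor);
  // Clamp in case a raw score falls outside the assumed range.
  return Math.min(1, Math.max(0, scaled));
}

console.log(rescale(0.84).toFixed(2));
console.log(rescale(0.937).toFixed(2));
```

This only spreads the numbers out for display. It does not make the underlying ranking any more accurate.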
The website does not work anymore, since the website uses HSTS.
Why would I want an API for this? There are tons of libraries that solve this.
heard of Akismet?
TL;DW It's basically AI... Heck the use of vector database puts it closer to LLM technology.
2:33 and did it happen?
"a fork is a culinary item" will get flagged and I know why
It thought "flick it" was profane.
2000 requests doesn’t mean you had 2000 people try this
The problem is it's English only. As a German myself, I tested the famous German swear word "hu rr ensohn" and it said it's not a swear word
Josh, by design this system is fastest when there is profanity, and slowest when there is none. Is it even possible to design one with the opposite? fastest when no profanity, and slowest when there is?
Well if you think about it, to declare something as profane you need to find only 1 profanity. However to declare something as clean you need to make sure there are no profanities at all.
So in one case you stop when you find a profanity, but in the other case you have to check the whole thing
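In code, the asymmetry looks roughly like this. A sketch that assumes the input is split into chunks and checked one by one; `scan`, `ScanResult` and the `isFlagged` callback are hypothetical names, not how the project is actually written.

```typescript
// Hypothetical result type: verdict plus how many chunks had to be checked.
interface ScanResult {
  profane: boolean;
  checked: number;
}

function scan(chunks: string[], isFlagged: (chunk: string) => boolean): ScanResult {
  let checked = 0;
  for (const chunk of chunks) {
    checked++;
    // One hit is enough to decide, so stop immediately.
    if (isFlagged(chunk)) return { profane: true, checked };
  }
  // A clean verdict is only possible after every chunk has been checked.
  return { profane: false, checked };
}

const flagged = (chunk: string): boolean => chunk === "damn";
console.log(scan(["damn", "nice", "day"], flagged).checked);
console.log(scan(["nice", "sunny", "day"], flagged).checked);
```

The profane input stops after one check, while the clean input needs all three, which is why the clean case is the slow one.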
i typed "gfasgda asfga" into the checker and it said it was profanity. might want to fine tune the model a little more
it also said "i got a new diamond hoe in minecraft, it has a lot of durability" was profanity. also might want to add context reading.
I'm sorry, but why go such an extra mile if OpenAI's Moderation API is free and quite fast at that?
I thought you could only use their API for outputs from their own model and they disallow other usage
Upstash really profits from you working there😂😂
Swear! Swear! Swear! gives you 😱 PRETTY SURE THIS IS A PROFANITY 😱
"Profanity is bad" = PRETTY SURE THIS IS A PROFANITY