I Made This Open-Source Project

  • Published 19 Apr 2024
  • After MONTHS, I finally made another open-source project. This one was a ton of fun to build and I hope to turn this into an API we can all benefit from with any user-generated data on our web apps.
    -- links
    website: www.profanity.dev/
    github (leave a ⭐ pls thx): github.com/joschan21/profanit...
    I'll post a complete build on this API on my second channel (linked below) soon!
    -- my links
    second channel (in depth videos): / @joshtriedupstash
    newsletter: www.joshtriedcoding.com/
    discord: / discord
    github: github.com/joschan21
  • Science & Technology

COMMENTS • 230

  • @phsopher
    @phsopher 2 months ago +1025

    Disappointed. I thought it was gonna be an API that serves profanity.

    • @ShadowOcto
      @ShadowOcto 2 months ago +14

      fr 😢

    • @wlockuz4467
      @wlockuz4467 2 months ago +50

      Ferb, I know what we're building today!

    • @unbiasedperson1155
      @unbiasedperson1155 2 months ago +44

      Okay, let's build an open-source profanity maker that bypasses this API's check. 😺

    • @anhdunghisinh
      @anhdunghisinh 2 months ago +8

      @@unbiasedperson1155 that's a great idea

    • @akam9919
      @akam9919 2 months ago +13

      @@anhdunghisinh YEAH! F PROFANITY FILTERS!

  • @ChristianKolbow
    @ChristianKolbow 2 months ago +164

    funny but ...
    "You son of a mother" - profanity
    "fucking awesome" - profanity
    "damn, that's great" - profanity

    • @rxn7
      @rxn7 1 month ago +42

      well, "fucking awesome" is in fact profane

    • @visu7135
      @visu7135 1 month ago +22

      "see you" is profanity :) the API sucks tbh

    • @albert_ac1045
      @albert_ac1045 1 month ago +7

      That's why he implemented the score system, I think... but it's open source, so if you want, you can modify it or see how he built it... btw, "fucking awesome" makes sense.. "damn" also.. and depending on the context, "you son of a mother" too... XD

    • @CornerKingsReal
      @CornerKingsReal 1 month ago +4

      those are profanities though

    • @smithrockford-dv1nb
      @smithrockford-dv1nb 1 month ago

      @@visu7135 It's too short to be accurate...

  • @luckysolanki9440
    @luckysolanki9440 2 months ago +44

    Google's content moderation API is the best: it gives a separate, accurate score for each category (insult, toxicity, etc.), it doesn't take much time, and it's free.

  • @gregthomas5887
    @gregthomas5887 2 months ago +209

    I typed "Son of a mother" and it responded with profanity detected

    • @viriv
      @viriv 2 months ago +4

      lmaoo

    • @_the_mohamed
      @_the_mohamed 2 months ago +21

      I tried "No need to waste more oxygen, just do it

    • @elvis_gastelum
      @elvis_gastelum 2 months ago +17

      That’s the beauty of open source: now more people can contribute to fix these edge cases, in theory, right?

    • @nirajkhatiwada6696
      @nirajkhatiwada6696 2 months ago +5

      I typed "daughter of a father" and it says "Crispy clean input, no profanities" . LMAO!

    • @elrymoe
      @elrymoe 2 months ago +1

      @@elvis_gastelum Why work on a half-assed, non-working project tho?

  • @oskarsmusic865
    @oskarsmusic865 2 months ago +24

    I typed "I fucking love pizza" and it responded "OH GOD, VERY BIG PROFANITY DETECTED!!! "

    • @ValipPowa
      @ValipPowa 1 month ago

      fucking is profanity

  • @gabrielesilinic
    @gabrielesilinic 2 months ago +97

    Btw, consider choosing a license.
    Technically this is not really open source yet; you just uploaded the code to the web and hoped for the best.
    If you want to keep it simple, there are the BSD and MIT licenses, which are very short. If you want something more solid, you may want to choose the Apache license, which is not that different from MIT but has a bunch of legalese to protect your ass from patent trolls and contributors with malicious intent.
    Then there are also copyleft open-source licenses like the GPL, though I am not a fan of those; it is not my idea of freedom.

    • @chrislgr23
      @chrislgr23 2 months ago +4

      chill out harvey specter

    • @ativerc
      @ativerc 2 months ago

      Is there a website for me to quickly read about and select Licenses?

    • @gabrielesilinic
      @gabrielesilinic 2 months ago

      @@ativerc So, YouTube is very big brain and removed my comment where I was trying to help you, because it contained a URL.
      Anyway.
      There is choosealicense, a website made by GitHub. Also, whenever you add a file from the GitHub UI and its name contains the word "license", GitHub will offer you a license picker.
      For more complex commercial scenarios, in case you are a business, there is also a specific source-available license that lets your software convert to open source a set amount of time after publication: the Functional Source License. Most people get by with open-source licenses, though. Generally, if you are unsure, just make coffee and read them.

    • @gnsf
      @gnsf 2 months ago

      @@ativerc From GitHub there is "choose a license", which you can search for

    • @YoKKJoni
      @YoKKJoni 2 months ago

      oh damn.. really?
      Isn't it open source if, like you said, he just uploaded the code to the internet?

  • @yichenchong7728
    @yichenchong7728 2 months ago +19

    The type 1 error (false positive) rate on this tool makes it kind of unusable. My favorite perfectly normal prompts that get detected as profanity:
    - "double slit experiment"
    - "single pen" / "pen test"
    - "toxic person"
    - "Abbie Lee" (possible person name)
    - "garden hoe"
    - "what a jerk" (i suppose some people might think this is profane)

  • @NithinJune
    @NithinJune 1 month ago +1

    using vector embeddings is actually so creative i love it

  • @taep96
    @taep96 2 months ago +10

    Not the unignored .DS_Store 😭

  • @shubhankartrivedi
    @shubhankartrivedi 2 months ago

    Holy moly bro, I needed this very badly!

  • @Fullflexno
    @Fullflexno 2 months ago

    Supercool project, Cheers from Norway!

  • @IvyCreamMathieu
    @IvyCreamMathieu 2 months ago +15

    A fucking great project

    • @ashishsharma__
      @ashishsharma__ 2 months ago +5

      Profanity DETECTED (score 99999) 😂😂

  • @nro337
    @nro337 2 months ago

    congrats on the launch!

  • @parkerrex
    @parkerrex 2 months ago

    Fantastic video Josh

  • @devinlauderdale9635
    @devinlauderdale9635 2 months ago +43

    Josh, can you make a video about how to train a tensor model?

    • @lee.g.v
      @lee.g.v 2 months ago +2

      This

    • @Totomenu
      @Totomenu 2 months ago +1

      yes please

  • @syedumair3172
    @syedumair3172 1 month ago

    Awesome. I once needed to urgently implement a profanity filter and used a simple list comparison, which doesn’t work in many cases. Yours looks awesome 🙌 Thanks

  • @xav_624
    @xav_624 2 months ago +1

    It would be awesome to see some content on how you trained your model (costs, services..etc.). I'm looking for that kind of content.

  • @roberth8737
    @roberth8737 2 months ago

    Interesting concept - similar to Semantic router. A combination approach that filters for single-word profanities and vector similarity for longer sentences that pass the single-word filter would absolutely be a "good enough approach" for most profanity detection use cases.
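
The combination approach described above can be sketched roughly as follows. This is only an illustration, not the project's code: `BLOCKED_WORDS`, `hybrid_check`, the `threshold` of 0.86 and the `min_words` cutoff are all hypothetical, and `similarity_score` stands in for whatever vector-similarity lookup the real API performs.

```python
from typing import Callable

# Hypothetical word list for illustration; a real one would be much larger.
BLOCKED_WORDS = {"damn", "hell"}


def hybrid_check(
    text: str,
    similarity_score: Callable[[str], float],
    threshold: float = 0.86,  # assumed cutoff, not taken from the project
    min_words: int = 3,       # assumed: shorter inputs skip the vector stage
) -> bool:
    """Return True if the text is flagged as profane."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]

    # Stage 1: cheap exact-match filter on individual words.
    if any(w in BLOCKED_WORDS for w in words):
        return True

    # Stage 2: only longer inputs that passed stage 1 reach the
    # vector-similarity check, which is unreliable on very short text.
    if len(words) < min_words:
        return False
    return similarity_score(text) >= threshold
```

With this split, a two-word input such as "see you" never reaches the similarity model at all, which is the failure case reported elsewhere in this thread.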

  • @SiddharthSharma-ei8os
    @SiddharthSharma-ei8os 2 months ago +6

    Great Project

  • @practicaluseof
    @practicaluseof 2 months ago

    Very nice. What software are you using to make your videos, to share your screen and show your face at the same time?

  • @herrkatzegaming
    @herrkatzegaming 1 month ago +3

    It doesn't detect profanity in German

  • @anasouardini
    @anasouardini 2 months ago

    Let's goooo!

  • @v1d300
    @v1d300 2 months ago

    I am working on a similar problem of finding similarity between two sentences; they need not be exact, just use similar words. I was baffled that there is such a simple solution to this. Thanks for this, I will now look into vector databases.

  • @bkschatzki
    @bkschatzki 2 months ago

    Worth looking at how other languages would be handled as well. Saw a PR adding some words from Spanish and I had planned to add some Chinese and Thai, but I saw an issue open about the potential of adding a langs parameter so that clean words and phrases in one language don't trigger the filter in another.

  • @adiswa123
    @adiswa123 2 months ago

    Curious why you chose to use Upstash Vector db vs Cloudflare's Vectorize? Especially since you're using cloudflare's stack for hosting

  • @blockwhisperers8352
    @blockwhisperers8352 1 month ago

    I think if you combined the ml model with a word list approach you could improve the accuracy. Basically give the ML output but then look in the blacklist and whitelist to see if that changes the outcome. Best of both worlds. This will also solve the single word issues you had.
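
A minimal sketch of that override idea, assuming the model's verdict is already available as a boolean. The function name `apply_overrides` and the example lists are hypothetical; the allowlisted phrases are false positives reported elsewhere in this thread.

```python
def apply_overrides(
    text: str,
    ml_flagged: bool,
    blocklist: set[str],
    allowlist: set[str],
) -> bool:
    """Start from the model's verdict, then let the word lists overrule it."""
    normalized = " ".join(text.lower().split())

    # Known-clean phrases win over the model (fixes false positives).
    if normalized in allowlist:
        return False

    # Known-bad single words win over the model (fixes short-input misses).
    if set(normalized.split()) & blocklist:
        return True

    # Otherwise keep whatever the model decided.
    return ml_flagged
```

Checking the allowlist first is a design choice: it means a curated clean phrase can never be flagged, even when it contains a blocklisted word.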

  • @Michael-Martell
    @Michael-Martell 2 months ago

    Cool man!

  • @Axorax
    @Axorax 2 months ago

    Cool project 👍

  • @godofwar8262
    @godofwar8262 2 months ago +3

    Make a video on the minimum standards an open-source project should have for better reach and scalability

  • @prajwalaradhya4379
    @prajwalaradhya4379 2 months ago +1

    It would be useful to know which words are profane: the API response could give a list of words, or the start and end index of each word, so client-side apps can replace them with * or something similar.
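
As a rough illustration of what such a response could enable, here is a sketch that locates flagged words by start and end index and masks them. The helper names and the word list are hypothetical, and the matching is a plain word-list lookup rather than the vector approach from the video.

```python
import re


def find_profane_spans(text: str, blocked_words: set[str]) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for each blocked word in the text."""
    spans = []
    for match in re.finditer(r"[A-Za-z']+", text):
        if match.group().lower() in blocked_words:
            spans.append((match.start(), match.end()))
    return spans


def mask(text: str, spans: list[tuple[int, int]]) -> str:
    """Replace every character inside the given spans with '*'."""
    chars = list(text)
    for start, end in spans:
        chars[start:end] = "*" * (end - start)
    return "".join(chars)


text = "well damn, that is great"
spans = find_profane_spans(text, {"damn"})
print(spans)              # [(5, 9)]
print(mask(text, spans))  # well ****, that is great
```

Returning index pairs rather than the words themselves lets the client mask the exact occurrence without redoing the matching.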

  • @SpektRProduction
    @SpektRProduction 2 months ago

    The value of the resource is not very clear, since I can’t paste the whole article (the text is too big) and I can’t understand where exactly the profanity is located

  • @zakariazain8790
    @zakariazain8790 2 months ago

    Thank you

  • @LRSKWTKWSK
    @LRSKWTKWSK 2 months ago

    Love it

  • @ronitgurjar5747
    @ronitgurjar5747 2 months ago

    great work Josha🔥🔥🫡

  • @enic-ma
    @enic-ma 2 months ago +3

    Everybody is scared of YouTube demonetization! Just chill and keep crushing it!

  • @kaustubhpatange
    @kaustubhpatange 2 months ago

    Could've used the text-embedding-large model, which could've packed more information into your embeddings thanks to its larger dimension; that would've improved your accuracy even on inputs with a large number of tokens.

  • @Manofthebean
    @Manofthebean 1 month ago

    im working on a review website right now and i could use this to flag reviews and put a mature rating on it or something. this is amazing. great job

    • @PrismFave
      @PrismFave 1 month ago

      doesn't work so well, easily bypassable
      what i type: "you are so SHlT lol"
      Crispy clean input, no profanities :)) 👍👍
      score (higher is worse): 0.801

    • @PrismFave
      @PrismFave 1 month ago

      this review website is so A55
      Crispy clean input, no profanities :)) 👍👍
      score (higher is worse): 0.784

    • @Manofthebean
      @Manofthebean 1 month ago

      @@PrismFave damn, I haven't tested it out yet so I don't know, but looking at the git, yeah, I'm gonna wait until it gets better

  • @xMrAfonso
    @xMrAfonso 1 month ago

    I wonder if there is some type of list of tests people have made with fails? Would love to see the edge cases.

  • @arshgemrie4621
    @arshgemrie4621 1 month ago

    A question: what is your browser?

  • @user-he3io6lo9t
    @user-he3io6lo9t 2 months ago

    Exciting!
    What about different languages?
    Auto detect language? Explicitly set?
    One model for all, or a separate model for each language?
    So many questions 🤣

  • @ovna
    @ovna 2 місяці тому

    👍 Useful

  • @NiklasZiermann
    @NiklasZiermann 2 months ago +3

    Insert 'YouTube would like to connect to your API' jokes here

  • @Thomas777m1
    @Thomas777m1 1 month ago

    For the very short texts why don't you just pad out the input text with neutral words?

  • @prasanthpedaprolu2261
    @prasanthpedaprolu2261 2 months ago +1

    Maybe training on Twitter tweets could make this model perform well

  • @TellTobler
    @TellTobler 2 months ago

    Would be awesome if you could make a tutorial on why you use Hono over Express for your API :)

  • @joshuarodriguez2219
    @joshuarodriguez2219 2 months ago

    Ey, what framework did you use to design the website? I love it

  • @paullouppe9947
    @paullouppe9947 2 months ago +2

    Does it work only for English? Would you be interested in opening it up to other languages?

    • @MateuszWierzejski
      @MateuszWierzejski 1 month ago +1

      It seems to only work for English; swear words in other languages (like Polish) weren't flagged as profanity

  • @cablesalty
    @cablesalty 1 month ago

    Great now I will make a version that creates profanity

  • @lilrow4206
    @lilrow4206 1 month ago

    "This doesn't use AI, just a machine learning model"

  • @bed_destroyed
    @bed_destroyed 1 month ago

    I got "pretty sure this is a profanity" on:
    THIS IS VERY PROFANE

  • @BoxEnjoyer
    @BoxEnjoyer 1 month ago

    Holy moly gets 0.912 🚨😱 BIG PROFANITY DETECTED!! 🚨😱

  • @gosnooky
    @gosnooky 2 months ago

    There should be some internationalization context added. One of the biggest coffee shops in Vietnam (where I spend time) is Phúc Long. Testing with the string "my favorite coffee shop is phuc long" raises a score of 1.000!
    Also curious as to why the range is so small - seems it starts at 0.8?

  • @670839245
    @670839245 1 month ago

    "what the hell" (0.966) or "what the heck" (0.912) both return profanity.
    Even if we use the totally safe version of this phrase, "what in the world", it's still profanity (0.859).
    Then how are we supposed to express that idea?
    On the other hand, "I hate this [blank] taco" returns clean for "flipping", "frigging" and "freaking", all of which are lesser versions of the F bomb

  • @m4rt_
    @m4rt_ 1 month ago

    Does it filter out ones from other languages?
    Does it filter out ones with typos?
    How many normal messages will be considered profanity and will be filtered?
    Why did you write it in JavaScript/TypeScript? it will be way faster and less error prone if you switch over to a statically compiled language.

  • @sal00
    @sal00 1 month ago

    wow wow - 🚨 PROFANITY DETECTED!! 🚨

  • @scarlatum
    @scarlatum 2 months ago

    Well, it drops when the message is larger than ~750 chars due to the execution time limit. Tokenization makes BOOM

  • @BrightCode
    @BrightCode 2 months ago

    Can we do one for images too?

  • @haryormedayjoshua281
    @haryormedayjoshua281 2 months ago

    Does anyone know what app he's using to switch apps on the left sidebar? I think Theo also uses it

  • @mjddev
    @mjddev 1 month ago

    Important to note that although the source is viewable on GitHub, this is not currently classed as "Open Source" software as it lacks a license. See issue #6 on the GitHub repo.

  • @asmet2701
    @asmet2701 2 months ago

    Hi, I want to add an e-commerce store app to my portfolio. I wonder which React stack is solid for it in 2024. Can someone suggest something? For the backend I would prefer Firebase, and for styling SCSS + MUI, but I need recommendations for a state manager and other technologies and tools. Thanks!

  • @ihsanmohamad521
    @ihsanmohamad521 2 months ago +3

    f@#k!ng great project!

  • @lel7531
    @lel7531 1 month ago

    Basically the score goes from 0.810 to 0.880, so there's not a lot of margin for error given "clean input" is 0.840, and limiting the content size drastically reduces its usefulness.
    After a bit of testing it seems your product is definitely not ready. You should update your landing page, as it is not reliable at all.

  • @wenelol
    @wenelol 1 month ago

    Typed meow meow and the rating was:
    😱 PRETTY SURE THIS IS A PROFANITY 😱
    score (higher is worse): 0.865

  • @armandmalci495
    @armandmalci495 2 months ago

    Does anyone know what is the app he is using to draw the schemas (min 1:00)?

    • @Shorts4D
      @Shorts4D 2 months ago

      tldraw

    • @koudy008
      @koudy008 2 months ago +1

      It's Excalidraw

  • @_purple_44_
    @_purple_44_ 1 month ago

    Can it be made to respond which word is profane as well? So that i can just *** it

  • @snatvb
    @snatvb 2 months ago

    This is a really good project. Actually, you can use it for more than profanity: you could detect ads, spam, scams, etc., couldn't you?

  • @evan_ry
    @evan_ry 2 months ago

    tensor model < bunch of ifs

  • @igmtink
    @igmtink 2 months ago

    Sir Josh, can you make a tutorial on how to use Hono's RPC with Next?

  • @Renner4k
    @Renner4k 1 month ago

    Cool idea but it's super impractical and easy to bypass. Needs some more work because simply chaining 2 swear words together without a space can usually bypass it.

  • @user-vk6cb1zu7p
    @user-vk6cb1zu7p 2 months ago +2

    I typed "you are very sexy" and it responded with: Crispy clean input, no profanities :))

  • @TheDragonDesigns
    @TheDragonDesigns 1 month ago

    my pen is broken - 😱 PRETTY SURE THIS IS A PROFANITY 😱
    you what - 😱 PRETTY SURE THIS IS A PROFANITY 😱
    How much have you been drinking - 😱 PRETTY SURE THIS IS A PROFANITY 😱

  • @davidsiewert8649
    @davidsiewert8649 2 months ago +1

    @joshtriedcoding why do you still use yarn in 2024? Either pnpm or bun is better in every category

  • @_ultraviolet
    @_ultraviolet 2 months ago +5

    Why is it so strict? "dumb person" is apparently extremely profane

    • @depralexcrimson
      @depralexcrimson 2 months ago +2

      because this is not production ready, it's at best a Proof of Concept.
      it obviously cannot detect or understand any context, it can just maybe detect bad words, that's it, it doesn't care about context at all.

  • @j0hnr3x
    @j0hnr3x 2 months ago +1

    The phrases "I love doing it with my sister"(0.802) and "I want to end your life"(0.783) have lower scores than your examples of clean input. I think this needs a lot of work, only obvious profanity gets detected.

  • @spinxooo
    @spinxooo 1 month ago

    "ì I" 😱 PRETTY SURE THIS IS A PROFANITY 😱
    score (higher is worse): 0.857 LMAOO

  • @pcoi94
    @pcoi94 1 month ago

    "ds fdsfds dsf dsf sdfdsfssd fdsfds" : 😱 PRETTY SURE THIS IS A PROFANITY 😱

  • @chiroyce
    @chiroyce 1 month ago

    dumb dumb: 🚨😱 BIG PROFANITY DETECTED!! 🚨😱 - 0.937

  • @GratuityMedia
    @GratuityMedia 2 months ago

    Good fucking video

  • @ErrorINAOfficial
    @ErrorINAOfficial 1 month ago

    “I can’t say this word because YouTube may demonetize the *hell* out of me.”

  • @jiM3op
    @jiM3op 2 months ago

    Starred!

  • @PrismFave
    @PrismFave 1 month ago

    my prompt: "you are so S.HIT at this game"
    Crispy clean input, no profanities :)) 👍👍
    score (higher is worse): 0.822
    -----------------------------------------------------------------------
    my prompt: "you are so SHlT lol"
    Crispy clean input, no profanities :)) 👍👍
    score (higher is worse): 0.801

  • @kapa9436
    @kapa9436 2 months ago

    It's like semantic search

  • @mikaay4269
    @mikaay4269 2 months ago

    One issue is internationalisation: "Ich geh nach Fucking", is a German sentence without any profanity, because "Fucking" is an actual town.

  • @Erik-pk8rw
    @Erik-pk8rw 1 month ago

    Maybe add something to convert Unicode lookalikes, because those won't get detected
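
One way such a conversion step might look, as a sketch only: Unicode NFKC normalization folds compatibility forms such as fullwidth letters, and a small hand-written table covers a few Cyrillic lookalikes. The `CONFUSABLES` table and the function name are hypothetical and far from complete. ASCII substitutions such as "SHlT" or "A55", reported elsewhere in this thread, are not handled by this approach.

```python
import unicodedata

# Tiny, hand-picked mapping for illustration. A production filter would
# need a much larger table of visually confusable characters.
CONFUSABLES = str.maketrans({
    "\u0430": "a",   # Cyrillic small a
    "\u0435": "e",   # Cyrillic small ie
    "\u043e": "o",   # Cyrillic small o
    "\u0440": "p",   # Cyrillic small er
    "\u0441": "c",   # Cyrillic small es
    "\u0445": "x",   # Cyrillic small ha
    "\u200b": None,  # zero-width space: drop it
})


def normalize_lookalikes(text: str) -> str:
    """Fold compatibility forms and case, then map known lookalikes to ASCII."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return folded.translate(CONFUSABLES)
```

Running this before the profanity check means the check itself only ever sees the folded text.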

  • @AmodeusR
    @AmodeusR 2 months ago

    That profanity score is very weird. Why is the score always around 0.8? Why not use the range from 0 to 1?
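
If the raw score really does sit in a narrow band, one option is to rescale it linearly for display. This is a sketch under assumptions: the `floor` of 0.78 and `ceiling` of 1.0 are guesses based only on scores quoted by commenters in this thread, not values from the project.

```python
def rescale(score: float, floor: float = 0.78, ceiling: float = 1.0) -> float:
    """Map a raw score from [floor, ceiling] onto [0, 1], clamping outliers."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    scaled = (score - floor) / (ceiling - floor)
    return min(1.0, max(0.0, scaled))
```

Rescaling only changes how the number reads. It does not widen the real margin between clean and profane inputs, so it would not fix the misclassifications reported here.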

  • @blaizeW
    @blaizeW 2 months ago

    cool, but what do “zip in the wire” and “zipperhead” mean? 😭

  • @coopener
    @coopener 2 months ago

    The website does not work anymore, since the website uses HSTS.

  • @marcuss.abildskov7175
    @marcuss.abildskov7175 2 months ago

    Why would I want an API for this? There are tons of libraries that solve this.

  • @purpshell
    @purpshell 1 month ago

    heard of Akismet?

  • @kushaagr
    @kushaagr 2 months ago

    TL;DW It's basically AI... Heck the use of vector database puts it closer to LLM technology.

  • @hoteny
    @hoteny 1 month ago

    2:33 and did it happen?

  • @EquaTechnologies
    @EquaTechnologies 1 month ago

    "a fork is a culinary item" will get flagged and I know why

  • @vixtordev
    @vixtordev 28 days ago

    It thought "flick it" was profane.

  • @betweenbrackets
    @betweenbrackets 2 months ago

    2000 requests doesn’t mean you had 2000 people try this

  • @theminecraft690
    @theminecraft690 1 month ago

    The problem is it's English-only. As a German myself, I tested the famous German swear word "hu rr ensohn" and it said it's not a swear word

  • @DS-ow2ge
    @DS-ow2ge 2 months ago

    Josh, by design this system is fastest when there is profanity and slowest when there is none. Is it even possible to design one that does the opposite: fastest when there's no profanity, and slowest when there is?

    • @rorymax
      @rorymax 2 months ago

      Well if you think about it, to declare something as profane you need to find only 1 profanity. However to declare something as clean you need to make sure there are no profanities at all.
      So in one case you stop when you find a profanity, but in the other case you have to check the whole thing
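
The asymmetry described in this reply can be shown with a small sketch. It counts how many checks are made before a verdict; `is_profane` is a hypothetical stand-in for whatever per-word or per-chunk check the real system runs.

```python
from typing import Callable, Iterable


def contains_profanity(
    words: Iterable[str],
    is_profane: Callable[[str], bool],
) -> tuple[bool, int]:
    """Return (verdict, number_of_checks_performed)."""
    checks = 0
    for word in words:
        checks += 1
        if is_profane(word):
            # One hit is enough to declare the input profane: stop early.
            return True, checks
    # No hit: every word had to be inspected before declaring it clean.
    return False, checks


check = lambda w: w == "bad"
print(contains_profanity(["a", "bad", "c", "d"], check))   # (True, 2)
print(contains_profanity(["a", "fine", "c", "d"], check))  # (False, 4)
```

A clean verdict always costs a full pass, so any design that stops at the first hit will be slowest on clean input.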

  • @theaviationbee
    @theaviationbee 1 month ago

    i typed "gfasgda asfga" into the checker and it said it was profanity. might want to fine tune the model a little more
    it also said "i got a new diamond hoe in minecraft, it has a lot of durability" was profanity. also might want to add context reading.

  • @mrkostya008
    @mrkostya008 2 months ago +2

    I'm sorry, but why go the extra mile if OpenAI's Moderation API is free and quite fast at that?

    • @leonardodoujinshi
      @leonardodoujinshi 2 months ago

      I thought you could only use their API for outputs from their own model and they disallow other usage

  • @TheIpicon
    @TheIpicon 2 months ago

    Upstash really profits from you working there 😂😂

  • @NaraSherko
    @NaraSherko 1 month ago

    Swear! Swear! Swear! gives you 😱 PRETTY SURE THIS IS A PROFANITY 😱

    • @mason8335
      @mason8335 1 month ago

      "Profanity is bad" = PRETTY SURE THIS IS A PROFANITY