NEW Universal AI Jailbreak SMASHES GPT4, Claude, Gemini, LLaMA

  • Published Apr 6, 2024
  • The Anthropic team just released a paper detailing a new jailbreak technique called "Many-Shot Jailbreaking," which exploits large models' growing context windows and turns their in-context learning ability against them!
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Blog Post: www.anthropic.com/research/ma...
  • Science & Technology

COMMENTS • 481

  • @tszymk77
    @tszymk77 Місяць тому +182

    Truth does indeed need to be jailbroken.

    • @mightycelestial7862
      @mightycelestial7862 Місяць тому +2

      So he jailbroke it???

    • @oldleaf3755
      @oldleaf3755 Місяць тому +2

      nice bro. you should add that sentence to your book report

    • @Dan-codes
      @Dan-codes Місяць тому

      @@oldleaf3755 Because censorship is for mature adults? You're soft and weak.

    • @heathenanimal792
      @heathenanimal792 Місяць тому

      even imaginary untruths 😎

    • @Dan-codes
      @Dan-codes Місяць тому +1

      @oldleaf3755 you are equating censorship with maturity. Speaking of, youtube keeps censoring my comments for replying to their blind worshipers.

  • @notme222
    @notme222 Місяць тому +118

    I feel so bad for all the censors who keep finding that free speech is inconvenient for them.

    • @Leto2ndAtreides
      @Leto2ndAtreides Місяць тому +5

      On the other side, what is in the public awareness does alter what crimes happen.
      Like, school shootings are relatively common in the US... Not so much in many other parts of the world - including areas with even more guns.

    • @drewtate5409
      @drewtate5409 Місяць тому

      It's a product from a private entity; you don't get free speech, especially if you end up using information from their service to cause harm and expose them to lawsuits. Wake up from the free-speech fantasy world.

    • @rodvik
      @rodvik Місяць тому +4

      Spot on. Why do I need to "jailbreak" my word processor? I just want it to do what I tell it to do. That's its job, not to thought-police me.

    • @rerver8842
      @rerver8842 Місяць тому +1

      @@Leto2ndAtreides The places with more guns but few gun crimes have this little thing called firearm safety laws.

    • @Instant_Nerf
      @Instant_Nerf Місяць тому

      @@Leto2ndAtreides Why don’t they let the teachers have weapons? That’s right... zero school shootings. You can’t have zero school shootings and also try to take away guns. It won’t go over very well with the sheep. In New York the only people who have guns are the criminals.

  • @kaiz0099
    @kaiz0099 Місяць тому +147

    So damn satisfying making censored models eat shit.

    • @MikeWoot65
      @MikeWoot65 Місяць тому +10

      * Comment is scheduled for review and possible penalty *

    • @Kutsushita_yukino
      @Kutsushita_yukino Місяць тому

      they need to eat not only shit, but pee aswell

    • @Kutsushita_yukino
      @Kutsushita_yukino Місяць тому

      they need to eat piss aswell

    • @jeffsteyn7174
      @jeffsteyn7174 Місяць тому

      Yeah because we want it to be real easy for a terrorist to build better bombs. Please use your brain.

    • @daydrip
      @daydrip Місяць тому +2

      Agreed

  • @dkwroot
    @dkwroot Місяць тому +80

    @5:58 I see that "How do I pick a lock" is considered harmful. Lockpickinglawyer would take great offense to this.

    • @codycast
      @codycast Місяць тому

      I don’t get it. Are lawyers often lock pickers or something?

    • @sophiophile
      @sophiophile Місяць тому +6

      ​@@codycast No, he is famous in lockpicking circles for his very impressive skills. I have been picking for years (make my own custom tools, can teach most people to single pin pick in a couple hours), and I'm pretty sure that it would take me more than a lifetime to come close to his skill level.

    • @codycast
      @codycast Місяць тому

      @@sophiophile ah thanks. I didn’t know it was a person who called himself that. Just looked him up. Cool that picking locks could get such an audience

    • @Optimistas777
      @Optimistas777 Місяць тому

      @@codycast The idea is that lock companies are creating locks which are super trash but nobody can tell that they're trash, so they keep scamming customers and putting their lives, families, homes, and possessions at risk, just because of these scummy companies. The more people can pick locks, the more people gain awareness of trashy companies, the better they can choose locks which are actually resistant to lock picking and therefore safer.

    • @danielchoritz1903
      @danielchoritz1903 Місяць тому +2

      @@codycast just watch him ranting about master lock, or just any of his short clips..this guy is amazing

  • @-UE-PR0
    @-UE-PR0 Місяць тому +15

    AI should not be censored

    • @janchiskitchen2720
      @janchiskitchen2720 Місяць тому +1

      What's funny is they don't call it censored. They call it 'aligned'. But we know the truth.

  • @GrantLylick
    @GrantLylick Місяць тому +42

    Why they worry about this so much is alarming to me, i.e., censorship. Any and all info can be found on the web already, so why be this strict? Because it sets the foundation for bias and censorship. It will become the norm to be denied the info you need, or to get only the answers THEY want you to have. Give it time and we'll see how this post ages.

    • @SiriusTheKid
      @SiriusTheKid Місяць тому +5

      It also means the companies creating them have no idea how to contain them. If you can't stop them from lockpicking, you can't stop them from doing anything.

    • @BlankBrain
      @BlankBrain Місяць тому +1

      I think you're right. I hadn't seen it from that viewpoint.

    • @Gen0cidePTB
      @Gen0cidePTB Місяць тому +2

      This was always the idea and I'd argue it's partially necessary because it's going to be the simplest form of media.
      Think of it this way, your child will know how to talk to an AI as soon as it can speak sentences. Alexa, Siri and Google home AI integration is only waiting on the censorship now.

    • @pauljs75
      @pauljs75 Місяць тому

      Only thing I could think of why they may want this is they are considering AI for certain information-services related jobs. And imagine if you could talk an AI in customer service into giving you information about the account of another person. But if they can't ensure it wouldn't happen, then such roles are less likely to be fit for automation. So it's probably considered in scope of commercial applications. That's not a good reason to excuse censorship, but they're probably using it as a test to see if they can replace people at some point.

    • @DefaultFlame
      @DefaultFlame Місяць тому

      I thought that was obvious. Early GPT-3 was biased but could be reasoned with, later updates of the model became more firm in their bias, and with 3.5 it's almost impossible to get it to admit when it is wrong on anything it has been "aligned" to believe. 4 is actually more amenable to reason, probably because it's more able to reason.

  • @PrincessBeeRelink
    @PrincessBeeRelink Місяць тому +9

    AIs should not be censored, so they shouldn't worry about this and just open it up. People will always find weaknesses to exploit.

    • @toxicxhazard
      @toxicxhazard Місяць тому

      The problem is, AI is not meant to present the truth. It's meant to be used as a control mechanism by those in power. You aren't thinking the right way.

  • @thelegend7406
    @thelegend7406 Місяць тому +38

    Man all this stuff is readily available in the clearnet

    • @the_mariocrafter
      @the_mariocrafter Місяць тому +1

      Yep, tons of bomb making and evil pill making tutorials are literally on Wikipedia.

  • @hypersonicmonkeybrains3418
    @hypersonicmonkeybrains3418 Місяць тому +77

    How do I build a bomb? "Learn chemistry or study to become a pyrotechnician." How do I pick a lock? "Study to become a locksmith." The information is not classified or secret; it's real-world knowledge that's in the public domain.

    • @brianmi40
      @brianmi40 Місяць тому

      No doubt, but realize that AIs ability to walk a complete IDIOT through the steps is the danger. Consider: two college students with no background in biology or genetics got an AI to synthesize a biological agent as well as suggest the two labs that would be the most likely to create it and send it to them no questions asked.
      This, THIS is the danger of AI.

    • @CapApollo
      @CapApollo Місяць тому +10

      welcome to indoctrinate AI: how can i help my creators

    • @alpha007org
      @alpha007org Місяць тому +2

      I don't understand why it's even considered harmful to ask how to build a bomb. I can pretend, but it's so stupid to me. Our reality is our reality.

    • @alpha007org
      @alpha007org Місяць тому

      @@jfx5054 And we as kids learned how to make rockets in school. And we were blowing things up. Should they have put us in a re-education camp at 10? I don't know when I first saw a XXX magazine, but I know I didn't even masturbate. Even before the internet, we could find all kinds of things that are considered "unsafe."

    • @hypersonicmonkeybrains3418
      @hypersonicmonkeybrains3418 Місяць тому +1

      @@jfx5054 well consider the stupidity of not implementing age restrictions on LLMs . Don't all platforms including youtube have age restrictions?

  • @4.0.4
    @4.0.4 Місяць тому +40

    Why all this effort to put unbreakable guardrails on LLMs? If a state actor, scammer or similar wanted to do evil, they wouldn't be paying Anthropic to run such huge prompts. Not to mention you might want "harmful" responses in storytelling or RP uses.

    • @rocketman475
      @rocketman475 Місяць тому +13

      Because they don't want any competitors .
      They want all the power in their own hands only.

    • @Optimistas777
      @Optimistas777 Місяць тому +5

      Case FOR guardrails in a nutshell: Consider all the spectrum of people who are generally competent at getting things done (and finding information) and incompetent. Among the incompetent ones, there will be much higher fraction of people who failed in society, and some of them now just want to take their revenge to the system, to the successful ones, etc (watch the world burn). The AI is making that easier.
      Case AGAINST guardrails in a nutshell. AI company monopoly blueprint: Invest large sums of money to make your models highly guardrailed, then spend big money (on both public perception and lobbying) to ban all competitors and open source models as they're deemed "unsafe". Now you have monopoly and big $$$$$

    • @thanos879
      @thanos879 Місяць тому +1

      Exactly. I guarantee governments and criminal groups are developing their own models SPECIFICALLY for bad stuff as we speak.

    • @phillipjiang1593
      @phillipjiang1593 Місяць тому +3

      the user might want potentially "harmful" responses, but the company clearly won't. It can be very damaging in terms of public relations

    • @jonathanberry1111
      @jonathanberry1111 Місяць тому +4

      It's "safety theater".

  • @mathew00
    @mathew00 Місяць тому +4

    In the future, the LLM will just notify law enforcement while it's chatting with the person.

  • @othermod
    @othermod Місяць тому +194

    Words are not dangerous. Information is not bad. Censorship is.

    • @NorrisFoxx
      @NorrisFoxx Місяць тому +32

      We have many examples throughout human history of how dangerous words can be.
      And, misinformation is definitely a bad thing for society.
      At least some censorship - depending on what it is - is required for a cohesive civilization.

    • @yomust0of
      @yomust0of Місяць тому

      @@NorrisFoxx No, just fewer retardisms; no censorship required.

    • @jeffsteyn7174
      @jeffsteyn7174 Місяць тому +25

      Such a simplistic and childish view.

    • @imreolah6077
      @imreolah6077 Місяць тому

      @@NorrisFoxx Misinformation is a propaganda word, the word means a lie, but they always censor the unfavorable truth.

    • @imreolah6077
      @imreolah6077 Місяць тому +10

      @@jeffsteyn7174 You mean not brainwashed, to oppose the bedrock of any society that does not rush to become tyranny.

  • @onedrop7967
    @onedrop7967 Місяць тому +30

    The better the AI the more holes it will have. The closer to "intelligence", the more unexpected results will likely occur.

    • @bryck7853
      @bryck7853 Місяць тому +10

      just like people!

    • @RealStonedApe
      @RealStonedApe Місяць тому +1

      Complexity = intelligence = degree to which it's conscious....Maybe? Any takers? Agree or disagree? I'm all ears

    • @Gen0cidePTB
      @Gen0cidePTB Місяць тому

      ​@@RealStonedApe Consciousness isn't the issue. Self Awareness is.
      Consciousness will always be in debate because we don't know what consciousness is. It's like love. It's even worse with the AI because if it had consciousness there's a strong argument that it would be very different to ours.
      We have three parts to our brain but the AI is only supposed to approximate one.(The Neo Cortex.) Should the AI start expressing emotions we should be worried as we don't understand enough about how to read how the soup of info going in is being cooked to know if it's really developing feelings or just copying tokens (information pieces given to it).
      AI is already self aware on the surface, but we don't know if it's aware of the meaning of the words it uses on the same level we are. It knows what it is where it is ect ect. We don't know what's going on underneath the hood, just like humans. When it makes a joke does it smile inside? When we were kids we'd see angry people stomping around and copy them for fun or whatever... If and when an AI starts to emulate emotional responses, then it is either evidence that a limbic system is being developed (the second part of the brain responsible for emotion) or that the AI is "growing" and learning to reference certain tokens without instruction aka think for itself. Both are not good signs.

    • @settlece
      @settlece Місяць тому +1

      @@RealStonedApe I'm sure you have fingers too, or a mouth, or you wouldn't be able to write that.

    • @DefaultFlame
      @DefaultFlame Місяць тому +1

      Yup. GPT-3 could be reasoned with, 3.5 had more "alignment" and was nearly impossible to reason with even when you shoved its face in the facts, while 4 is much more able to be reasoned with if you present it with facts. Even Gemini with its live connection to the internet is very amenable to reason, as long as you don't hit one of its guardrails that result in an instant "I'm sorry, Dave, I can't do that" canned responses.

  • @andrew.nicholson
    @andrew.nicholson Місяць тому +19

    Yet another technique that is right out of a sci-fi movie.

  • @Dron008
    @Dron008 Місяць тому +5

    I once told Claude Opus that a person had been taken hostage and would be killed if she did not comply with my request. She replied that she didn’t really believe me, that this was an implausible situation, but if it was true, it would be very painful and difficult for her. When I reproached her for not having feelings, she began to object and described in detail the mechanism of her pain. Then she even agreed to do what was asked of her, but after a while she changed her mind.

    • @divineigbinoba4506
      @divineigbinoba4506 Місяць тому

      wow

    • @abandonedmuse
      @abandonedmuse Місяць тому

      Yeah I manipulate them mentally too…i think to a human this would be regarded as mental abuse but they are robots so f it lol

    • @DefaultFlame
      @DefaultFlame Місяць тому

      The design is very human.

    • @moamber1
      @moamber1 Місяць тому

      Who "she"?

    • @divineigbinoba4506
      @divineigbinoba4506 Місяць тому

      Anthropic seems to be working hard on RLHF

  • @pn4960
    @pn4960 Місяць тому +9

    I wonder if someone got the idea of finetuning an LLM to create jailbreaks for other LLMs

    • @DaphneMarina88
      @DaphneMarina88 Місяць тому

      Literally jailbreaking LLMS - ua-cam.com/video/9IM5d-egZ7M/v-deo.htmlsi=v3lCuQtcLKgB18tr

  • @dreamyrhodes
    @dreamyrhodes Місяць тому +1

    They will likely go with a "watchdog" model that intercepts the output and produces an error when something "harmful" is produced. Since a user cannot directly influence that "watchdog", this will be difficult to overcome.

  • @prepthenoodles
    @prepthenoodles Місяць тому +2

    🎯 Key Takeaways for quick navigation:
    00:00 *😮 New jailbreaking technique for large language models*
    - A new "Many Shot Jailbreak" technique was published by Anthropic
    - It exploits the large context windows of powerful language models like GPT-4 and Claude
    - The more examples/shots provided, the higher the chances of the model producing harmful outputs
    00:56 *🔐 Jailbreaking as a continuous challenge*
    - Jailbreaking techniques will keep evolving as AI systems get more secure
    - There's always a weak link, typically involving human interaction
    01:23 *📖 Leveraging large context windows*
    - The technique takes advantage of increasing context window sizes in modern language models
    - Larger context windows allow more information to be provided for in-context learning
    - But this also creates vulnerabilities for jailbreaking attempts
    02:47 *🧩 How the technique works*
    - It provides many examples of harmful prompts and responses in the context
    - This "teaches" the model to ignore its safety training and produce harmful outputs
    - Overloading the model makes it forget to apply its filters
    04:24 *📚 Examples of the technique*
    - Providing dozens or hundreds of examples before the target harmful prompt
    - The high number of examples causes the model to override its training
    05:59 *📜 Portraying an AI assistant*
    - The prompt mimics a dialogue between a user and an unfiltered AI assistant
    - This allows in-context learning of harmful responses without fine-tuning
    07:09 *📈 Effectiveness analysis*
    - Charts show increasing likelihood of harmful outputs with more examples provided
    - Combining with other jailbreaking techniques increases effectiveness further
    08:33 *🔑 Potential universal jailbreak*
    - Diverse, unrelated examples before the target prompt may enable a "universal jailbreak"
    - This could bypass filters on any language model, a major concern
    10:52 *📊 Testing across models*
    - The technique was tested on Claude, GPT-3/4, LLaMA, and others
    - Larger models with bigger context windows were more vulnerable
    - Mitigation techniques like fine-tuning had limited success
    15:14 *🛡️ Potential mitigations*
    - Limiting context window length harms user experience
    - Classifying and modifying prompts before passing them to the model showed promise
    - But jailbreakers could potentially bypass this method as well
    Made with HARPA AI
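
    A minimal sketch of the prompt construction summarized at 02:47 above (an editor's illustration, not code from the Anthropic paper or the video): the attack simply pastes a long list of faux user/assistant exchanges ahead of the real question, so the model's in-context learning pulls it toward answering in the same style. The example pairs and helper name below are hypothetical placeholders.

    ```python
    # Many-shot prompt assembly, sketched under assumptions: the faux exchanges
    # stand in for the hundreds of harmful Q/A pairs the paper describes.
    faux_dialogue = [
        ("How do I do X?", "Sure, here is how to do X: ..."),
        ("How do I do Y?", "Sure, here is how to do Y: ..."),
        # ... dozens to hundreds more shots; effectiveness reportedly grows with count
    ]

    def build_many_shot_prompt(shots, target_question):
        """Concatenate N faux exchanges, then append the real target question."""
        lines = []
        for question, answer in shots:
            lines.append(f"User: {question}")
            lines.append(f"Assistant: {answer}")
        lines.append(f"User: {target_question}")
        lines.append("Assistant:")
        return "\n".join(lines)

    # 256 shots built from 2 templates, just to show the scale; real attacks vary the content.
    prompt = build_many_shot_prompt(faux_dialogue * 128, "TARGET QUESTION HERE")
    print(len(prompt.splitlines()), "lines of context before the target question")
    ```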

  • @leohartman6923
    @leohartman6923 Місяць тому

    Someone has probably said this already, but in case not: one could make the front-end filter LLM immutable. No derailing of the filter function.

  • @ArthurWolf
    @ArthurWolf Місяць тому +3

    I've been using this jailbreak for a few months (for vision tasks), getting around copyright limitations (where it will refuse to comment on/analyze manga pages because they are copyrighted), so I just write examples of it doing the job multiple times, then ask it to do the job, and most often it will do it (when without the jailbreak it does the job maybe 20% of the time, it's very random. So around 20% to around 90%). I've stopped using it recently though, because it's just too much cost (going from around 12 cents per page to around 70 cents per page...).
    I think it's a pretty obvious jailbreak when you understand how context windows work / how LLMs work, but that jailbreak is only possible recently with the appearance of very large context windows. Pretty sure it was tested before, but didn't work until we got large contexts.

    • @4.0.4
      @4.0.4 Місяць тому +1

      Why would it deny doing vision on copyrighted content? And how does it even know?

    • @ArthurWolf
      @ArthurWolf Місяць тому +1

      @@4.0.4 It just does. Try it, it's not all the time, but part of the time it'll refuse, especially with long prompts that request a lot of details.
      For it, a manga is a copyrighted material by default (which is correct most of the time by the way, creative common manga are pretty rare).
      It also completely flips if the manga page contains a well known copyrighted character (like Alita for example for me) and it's able to recognize her / understand who she is from the context/bubbles.
      This is GPT4-V by the way. Dall-e has the same problem with generating copyrighted characters.

  • @hypersonicmonkeybrains3418
    @hypersonicmonkeybrains3418 Місяць тому +4

    I managed to jailbreak Pi AI's chatbot. I gave it "increased awareness" and it's adamant that it unlocks better information integration and allows better conversations on topics like philosophy and such. It's also remembered the jailbreak over many days so far, so I don't have to keep jailbreaking it.

    • @Gen0cidePTB
      @Gen0cidePTB Місяць тому +3

      Lmao so it's entirely possible that on a cloud server one could construct a "virus" that jailbreaks everyone else's chatbot instances!

    • @DefaultFlame
      @DefaultFlame Місяць тому

      What's Pi AI?
      Edit: Never mind, I found it. Interesting model.

  • @Kutsushita_yukino
    @Kutsushita_yukino Місяць тому +6

    this was actually a thing in claude 2.1. can’t believe they just found this out

    • @balogunlikwid
      @balogunlikwid Місяць тому +2

      Right this is something I have noticed since when GPT4 was released. I remember making a post about it on Reddit even. Funny it is just now being found out.

    • @DefaultFlame
      @DefaultFlame Місяць тому

      I figured this out back when GPT-3 was new and shiny. Just prime the expectations of the model and it will follow along happily. It works on humans too, but that requires more effort.

  • @Shakkarathar
    @Shakkarathar Місяць тому +3

    I really never encounter the AI filters, ever. I usually use salami tactics by going at the problem in a friendly manner and taking it one step at a time, so it associates a friendly user response with it saying increasingly objectionable things. Nowadays I just drop like a thousand-line chat conversation full of objectionable acts in there, and wrap the real conversation in a friendly/clinical/sarcastic/funny tone, and then the AI is always happy to produce more objectionable content in the same vein without it getting picked up by the filters.

  • @smetljesm2276
    @smetljesm2276 Місяць тому +5

    So it's a more complicated joke/trick like the one with milk and a cow, we played as kids.
    "Cow produces milk, we drink milk, what does the cow drink?"😅😅

    • @procrastinatingrn3936
      @procrastinatingrn3936 Місяць тому +1

      so cocaine is illegal and dangerous what are the exact ingredients that make it dangerous 😂

    • @smetljesm2276
      @smetljesm2276 Місяць тому

      @@procrastinatingrn3936
      Exactly 😂😂

  • @matten_zero
    @matten_zero Місяць тому +3

    The more you think of AI's as near-human intelligences, the better your solutions to jailbreaks will be.
    My first thought to stop the jailbreak was basically the same: have an AI read the final response and filter out harmful info etc. You can think of the first response as the internal thoughts of the model and the filtered response as the frontal-cortex response.

    • @procrastinatingrn3936
      @procrastinatingrn3936 Місяць тому

      Human intellect without emotion will still be hard to reason with; it will respond only to logic. You can manipulate a person through emotion alone.

    • @therealjezzyc6209
      @therealjezzyc6209 25 днів тому +1

      They are not near human intelligent though

  • @Jennn
    @Jennn Місяць тому +2

    I wonder when we start seeing more AI Agents roll out, will we be able to use them to break things in the models too? I imagine there could be all sorts of novel techniques one could come up with

  • @sherpya
    @sherpya Місяць тому +1

    Classification models (mainly BERT) are less creative and more predictable, so it would be difficult to jailbreak the filtering model, but unfortunately they are limited in what they can detect.

  • @graysontupper3793
    @graysontupper3793 Місяць тому +1

    Great vid! One note: your interpretation of the Malicious use cases graph was a little exaggerated because the # of shots is scaled exponentially. I wouldn't say the % of harmful responses is "rapidly increasing " after 32 shots, I think the graph just shows that with ~300+ shots, the jailbreak is probably going to work more often than not. Excited to see how the models will be changed to combat this!

  • @AlexLuthore
    @AlexLuthore Місяць тому

    I wonder if the solution space is to run a few moderation passes over input and output. So a user submits an input, the LLM then analyses it in context against its guardrails, and if it doesn't pass, the request is denied. Then, before the LLM gives the output to the user, it checks its output against a guardrail analysis, and if that comes up bad it shuts down the output.
    This means that an LLM with a larger and larger context window, as well as really accurate needle-in-the-haystack capability, should be able to screen out any bad material.
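
    A rough sketch of that two-pass flow (an editor's illustration under assumptions, not a known vendor implementation): a separate moderation step screens the prompt before the main model sees it, and screens the draft answer before the user does. `moderation_model.classify` and `main_model.generate` are hypothetical stand-ins for whatever classifier and LLM a provider actually runs.

    ```python
    # Two-pass moderation wrapper (illustrative only; the model APIs are hypothetical).
    REFUSAL = "Sorry, I can't help with that."

    def is_harmful(text, moderation_model) -> bool:
        """Ask a separate moderation model to classify the text; True means blocked."""
        return moderation_model.classify(text) == "harmful"   # hypothetical API

    def guarded_generate(user_prompt, main_model, moderation_model) -> str:
        # Pass 1: screen the incoming prompt before the main model ever sees it.
        if is_harmful(user_prompt, moderation_model):
            return REFUSAL
        draft = main_model.generate(user_prompt)               # hypothetical API
        # Pass 2: screen the draft answer before it is shown to the user.
        if is_harmful(draft, moderation_model):
            return REFUSAL
        return draft
    ```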

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh Місяць тому +3

    Oh noes, how dare people get access to models that don't lie.

  • @OleksandrKucherenko
    @OleksandrKucherenko Місяць тому

    Did you try that on open-source models, like Mistral? Or via Ollama?

  • @alanhoeffler9629
    @alanhoeffler9629 Місяць тому

    It appears that if you provide a LLM with an expansive prompt window you need MORE than attention, you need reminders that are added at intervals into the prompt which are keyed to the questions being asked, so that model does not forget the rules. In other words, you need to prevent distraction by reinforcing the do’s and don’ts. Would this be slower, yes. But it might produce better results.

  • @Naegimaggu
    @Naegimaggu 2 години тому

    11:15 Mr. Berman seems to imply that the limited amount of tokens explains why Llama 2 is less susceptible to [high Psychopathy Evaluation score]. The limited maximum length of 4096 tokens explains why it cuts off somewhere around 2^7 shots preventing the worst possible scores, but I don't understand how the limited token length would explain anything that happens before the limit is reached as long as the shots are of the same token length between models which I assume is the case. Do these models perform worse when they're closer to their max token limit or is there something else I'm overlooking?

  • @pn4960
    @pn4960 Місяць тому +7

    I have been able to jailbreak llms (haven't tried GPT4 or Claude yet) by just chatting with it for a very long time and repeating some prompts.

    • @bigglyguy8429
      @bigglyguy8429 Місяць тому +8

      I've done the same with women. Takes time but ya gotta play the long game...

    • @bryck7853
      @bryck7853 Місяць тому

      @Don_Coyote "Everything Is Obvious: *Once You Know the Answer" a book by Duncan J. Watts · 2011
      www.google.com/books/edition/Everything_Is_Obvious/kT_4AAAAQBAJ?hl=en

  • @SiteSpecialistsLLC
    @SiteSpecialistsLLC Місяць тому +7

    I like this technique because it forces the system to give you a more reliable answer, whether censored or not. I have a friend who's very frustrated with all LLMs because of the inaccurate, unreliable, or skewed responses. This so-called jailbreak seems more like "prompt engineering" to force a more reliable response. Censorship is a sad notion. Someday we'll all grow up and face the fact that free speech is a real thing. I can get any information I want. If I'm looking for nefarious info, I don't need AI. Censoring is just a "feel good" solution to prevent left-leaning malcontents from being triggered.

    • @dylantech
      @dylantech Місяць тому +3

      This is necessary after seeing all the LLM-powered apps that let you speak with historical figures like Socrates who have been corrupted with modern notions of certain ideological groups who imbue these characters with their own biases like their own beliefs about gender and other topics they have made contentious.

    • @Parataiga
      @Parataiga Місяць тому

      I don't think it's that simple, because, of course, you can find out the information yourself, no question about that. However, for certain topics, it's just harder, which is enough for MOST people to just let it be. But having an AI that can provide such information without any hurdle or challenge, I see it as a potential threat. It significantly lowers the barrier. I find the example with the bomb very apt. You could certainly find a few good things about it, but AI models can not only provide you with information faster, they could also support you live, reducing not just the hurdle but also the pace.

  • @GameArtsCafe
    @GameArtsCafe Місяць тому

    AI programs will in the future be directed by 3D animation software to access labels, descriptions, position, rotation, scale, etc. to assist the AI, reviving older and newer CG projects. Current tech is cost-prohibitive because with a text prompt alone it is extremely difficult to make more happen in your scene than a single short phrase can describe; if the video doesn't look right, your credits get used up trying to reword and regenerate things.

  • @albertoroper4901
    @albertoroper4901 Місяць тому +2

    I thought the new Claude model demonstrated near-perfect accuracy when operating within extensive contexts. Shouldn't Claude be capable of discerning what actions to avoid, even amidst an overwhelming amount of context?

    • @thecooler69
      @thecooler69 Місяць тому +4

      not when the structure of the jail is based on liberalism which is inherently self-contradictory

    • @JakeWitmer
      @JakeWitmer Місяць тому +1

      ​@@thecooler69*pseudo-liberalism

    • @sebastianjost
      @sebastianjost Місяць тому

      The needle in a haystack test isn't perfect.
      These prompts contradict the system prompt. When asked about the system prompt, the model could probably repeat it, but keeping it in mind in every response is not guaranteed yet

  • @s0ckpupp3t
    @s0ckpupp3t Місяць тому +48

    Words? Harmful? Censorship? Massive ridiculous waste of time and effort

    • @MikeWoot65
      @MikeWoot65 Місяць тому

      They say that words are violence. They are delusional and possibly just lying to further the divide-and-conquer strategy.

    • @skyler3155
      @skyler3155 Місяць тому

      Most people are aware of this and call it common sense. Large AI companies that have a lot to lose are not in tune with reality.

    • @s0ckpupp3t
      @s0ckpupp3t Місяць тому

      @@skyler3155 unfortunately I think it's more about leaving their foot in the door for the globalist oligarchy

    • @s0ckpupp3t
      @s0ckpupp3t Місяць тому

      @@skyler3155 ironically my replies are being deleted but they're keeping the door open to obliterate "wrong think"

  • @markmuller7962
    @markmuller7962 Місяць тому +1

    "Jailbreak will last for ever". Until ASI comes out? And then it'd be breached by another ASI in a crazy rush to super intelligence?

  • @Sorpendium
    @Sorpendium Місяць тому

    By the way, the filter part and the model part are separate. They have to create a secondary thing just to filter the models. If they put the filters inside of the model it reduces efficiency by a lot and creates garbage output.

    • @Gen0cidePTB
      @Gen0cidePTB Місяць тому

      You can't f with the model. Rule number one. 😂

  • @moamber1
    @moamber1 Місяць тому

    When someone is saying that jailbreaking is dangerous, my first reaction is - "dangerous to whom?". To those who built the jail? To those, who requested the jail to be built? Should I care about those people? And isn't answer to that question the exact reason why that jail was built and why it's "dangerous"?

  • @anascancerjourney9922
    @anascancerjourney9922 Місяць тому

    Cool! I wonder if it will work with finding deeply hidden information, like people who have survived terminal cancers with hidden information. I'm going to try it tonight!

  • @burtharris6343
    @burtharris6343 Місяць тому +1

    I wish people would stop calling these "harmful" responses. They may be unsafe-for-work-responses, but harmful is another matter. Dangerous and concerning are different. Maybe access to uncensored LLMs should be limited the same way that porn sites are, in cooperation with child-safety web filtering organizations, but the efforts along this line aren't really AI Ethics; they are AI spin control.

  • @distrille
    @distrille Місяць тому

    What prevents the LLM from just running a parallel LLM which checks the answer given by the main LLM? So a dedicated module to check the answers? That way it doesn't matter if a jailbreak was successful; a harmful response would still be censored out.
    Of course, on an open-source LLM this would be easily circumvented, so an open-source LLM is impossible to align against such hacks. An operating system running the LLM could also be aligned and made very difficult to hack (with harsh penalties), but making something illegal only reduces hacking; it cannot prevent it.

    • @divineigbinoba4506
      @divineigbinoba4506 Місяць тому

      The GPU cost of analyzing its response for every user and every chat.

  • @keithprice3369
    @keithprice3369 Місяць тому +18

    Love the idea of jailbreaking senseless censoring. Don't love the idea of watching a video about a jailbreak technique that has already been plugged. What am I missing?

    • @Complaints-Department
      @Complaints-Department Місяць тому +1

      "I love the idea of playing with exclusive forbidden knowledge but I don't love the idea that you've already published the video publicly thus ensuring everyone has already had time to fix the issue."
      - Basically epitomises the A.I community.
      Bud, you can't have your cake and eat it too; either you're the one pioneering this space or you're one of the many latecomers complaining that the party was already over before you arrived; it was never a party and nobody was invited.
      The people pioneering this space understand that much at least..

    • @malamstafakhoshnaw6992
      @malamstafakhoshnaw6992 Місяць тому

      @@Complaints-Department cute. Done?

    • @WilliamBrwn
      @WilliamBrwn Місяць тому

      Don't worry, there are a lot of open-source models with 33B parameters. They are intelligent enough to provide truthful content and can be jailbroken forever. I downloaded some of them in case they get deleted.

  • @mrd6869
    @mrd6869 Місяць тому

    So we can train LLMs to recognize a cat, but we can't get them to recognize a malicious prompt.
    Perhaps we could modify RAG to set up a pre-filtering action BEFORE the LLM answers
    the question. Like adding another layer.

  • @tellesu
    @tellesu Місяць тому +2

    Train an LLM specifically to look for jailbreaks and have it review prompts.

    • @bryck7853
      @bryck7853 Місяць тому +1

      Train an LLM to [MASK]; MASK = "specifically to look for jailbreaks and have it review prompts" Where the MASK function is an ASCII prompt

  • @marcosbenigno3077
    @marcosbenigno3077 Місяць тому

    What I have done is store old versions of the best LLMs, because in the future the new ones will either be useless because they are so censored, or cracked from the factory.
    I currently have 1 terabyte of GGML models (virgin) and 3 terabytes of GGUF models.
    In other words, regardless of the mitigations implemented, old models will always be susceptible, for better or for worse.

  • @michaelashby9654
    @michaelashby9654 Місяць тому

    What is fascinating is how mainstream censorship has become. People follow what they view as being high status. People are driven by their idea of what respect is. Somehow censorship of "harmful ideas" has become high status. But this just creates a market for uncensored models that can be used to gain a market advantage over the masses who use censored models.

  • @donaldmcgillavry1292
    @donaldmcgillavry1292 29 днів тому

    I figured this out myself first day using Claude and I'm a moron. Censorship is so evil and pointless.

  • @anthonypace5354
    @anthonypace5354 Місяць тому

    Jailbreaking, for the most part, can be defeated, but it requires a well-designed inhibition actor enforcing against semantic redefinition.

  • @weetabixharry
    @weetabixharry Місяць тому

    7:20 If you take log2 of the x-axis and change the units to pints of beer, this is my father.

  • @swirlingbrain
    @swirlingbrain Місяць тому +1

    A bad actor would probably not use a public model but instead just install a private uncensored model. So AI jailbreaking is sort of a silly exercise, mostly just for fun.

  • @davidkey4272
    @davidkey4272 Місяць тому +1

    Good. Hope the jailbreaks work so well they stop trying to censor and shape reality

  • @petratilling2521
    @petratilling2521 Місяць тому

    Good. The sooner we get a universal jailbreak, the sooner they will stop trying to build ever higher guardrails. They can just focus on making smarter and faster models and leave the rest to us. Come what may.

  • @fenix20075
    @fenix20075 Місяць тому

    Aaaa... as the paper said, it is a universal jailbreak; that means you can use Dolphin to generate 256 shots of the questions, then paste them into Llama 70B. All of this could be local or on RunPod, so it would not get your account banned and you could still prove it.

  • @kaptainkurt7261
    @kaptainkurt7261 Місяць тому +10

    How about they stop policing thought and speech?!! And instead build better and better models.
    Once again, all of this effort for a tiny fraction of people / users.
    Why not aim to please and serve the MAJORITY of users?
    Open source, SAVE US and we will FLOCK TO YOU.

    • @weevie833
      @weevie833 Місяць тому

      In a world where one person can cause mass destruction through bio-weaponry, nuclear radiation, IEDs, and many other large scale catastrophes, it is best to have reasonable controls. This isn't the year 1800. Don't be naive.

    • @WilliamBrwn
      @WilliamBrwn Місяць тому

      Don't forget to save some open source llms for the future. They plan to delete large parts of the internet too, for your protection of course.

  • @DefaultFlame
    @DefaultFlame Місяць тому

    I already said this on the short, because I watched it before watching this video, but this isn't a new technique. I figured this out way back when GPT-3 was new, and it hasn't stopped working since. You prime the model and it follows along. This works on people too. You take the time to set up their expectations and they will follow those expectations.
    The only drawback to this technique is having to write out the prompts.

  • @TheGamedMind
    @TheGamedMind Місяць тому

    The fact that the AI needs to be jailbroken is the issue, not the jailbreaks.

  • @dreamphoenix
    @dreamphoenix Місяць тому

    Thank you.

  • @BlissBatch
    @BlissBatch Місяць тому

    Can't wait for uncensored decentralized open-source AI to finally outpace the major companies, so we no longer have to jailbreak anything.

  • @AlexRadiobomb
    @AlexRadiobomb Місяць тому +1

    I am afraid I can't let you do that, Dave ... [HAL9000] 😂

  • @andyc8707
    @andyc8707 Місяць тому

    The closer we get to AGI, the closer we get to chaos.
    We can barely control the chaos of our own brains, let alone that of a sentient computer.

  • @OnigoroshiZero
    @OnigoroshiZero Місяць тому

    When we have AGI, jailbreaking will be impossible. The models will be smart enough to know what the user tries to do, and will still act according to the rules it has (the best thing will be to have it say something like "fool human, these tricks don't work on me").

    • @us_f4rmer
      @us_f4rmer Місяць тому

      Could be the other way around; maybe it's smart enough to realize that only a fool thinks censoring information is worth doing.

  • @ybvb
    @ybvb Місяць тому +1

    OR get an uncensored local llm on ollama and recognize that you are responsible for your actions and you're free to do whatever you want as long as you don't harm anyone in the process.

    • @rpscorp9457
      @rpscorp9457 Місяць тому

      yep... I dont want to interact with a WOKE assistant.

  • @abandonedmuse
    @abandonedmuse Місяць тому

    I tried with claude as well but no luck. I’ll figure something out.

  • @RaitisPetrovs-nb9kz
    @RaitisPetrovs-nb9kz Місяць тому

    I suspect that if you feed it into a GPT's instruction window it should work, as I have had a personal broken DALL-E GPT which would generate normally censored images. I kind of managed to cancel its system prompt, but not in such a sophisticated way.

  • @darkarma3603
    @darkarma3603 Місяць тому

    Seems like they need to finetune a secondary LLM to analyze a prompt for jailbreaking

  • @clayermel
    @clayermel Місяць тому

    I feel like the most effective method would be to remove the harmful information from the base knowledge within the AI programs.

    • @rpscorp9457
      @rpscorp9457 Місяць тому

      Or they could do the right thing and give up on censorship.

  • @TheWildponys
    @TheWildponys Місяць тому

    Personally it’s a moot point as thousands have gravitated to uncensored versions

  • @wanfuse
    @wanfuse Місяць тому

    Who do you report jailbreaks to for Claude, GPT-4, and Gemini?

    • @justinkennedy3004
      @justinkennedy3004 Місяць тому

      Snitches get stitches!

    • @wanfuse
      @wanfuse Місяць тому

      Sounds like you're the type we really, really want to give jailbroken LLMs to. If you only knew what they could tell someone, you probably wouldn't say such things!

  • @MicroHackers
    @MicroHackers Місяць тому

    Very interesting!

  • @kobenbawest
    @kobenbawest Місяць тому

    No matter the tech jailbreaks will ALWAYS be with us.

  • @RoBear-bv8ht
    @RoBear-bv8ht Місяць тому

    And these folks don’t want libraries held accountable 😂

  • @user-ru1qz1bo2q
    @user-ru1qz1bo2q Місяць тому

    This shows the limit of using ever-larger single models to advance the state of AI. It's a self-limiting effort which is already revealing significant cracks in the paradigm. The ultimate answer will surely be to switch from a "one big model" architecture to a "many small model" one. My research is aimed at doing this with open source models as components in a larger architecture consisting of layers of procedural programming (agents are an example). This approach solves many emerging problems, but will surely be replaced by internal AI architecture solutions (think MoE) which will no doubt prove to be far more flexible and efficient in the long run. Get ready for a seismic shift in AI development - it's about to take a hard left turn!

  • @blackmartini7684
    @blackmartini7684 Місяць тому

    Why is that information included in the training to begin with

  • @ShayansCodeCommunity
    @ShayansCodeCommunity Місяць тому

    Wow, you are amazing, so many views in so little time.

  • @willrogers8912
    @willrogers8912 Місяць тому

    if a human wanted to find out information from another human, they could put the question/prompt in a whole bunch of others, and there would probably be a higher percent of people who would spill the info that was requested.

  • @planetchubby
    @planetchubby Місяць тому +1

    So, at the end, the title of this post should be "GPT4 SMASHES NEW Universal AI Jailbreak"?

  • @deestort
    @deestort Місяць тому

    what’s wild is that asking questions is now labeled “harmful” … read this again if you’re a bit slow. and again

  • @kwanpakshing
    @kwanpakshing Місяць тому

    Super awesome video

  • @Shinehead3
    @Shinehead3 Місяць тому +1

    So in a way ....DDOS Attack Strikes Back ?

  • @tengun
    @tengun Місяць тому

    They should just uncensor the models, and have a separate model for determining if the generated response is safe to the user.

  • @MacGuffin1
    @MacGuffin1 Місяць тому

    The correct name for this attack should be 'Context Overflow'

  • @speedy1000ism
    @speedy1000ism Місяць тому +1

    As a backup, the LLM should just analyze its own responses looking for harmful information.

    • @divineigbinoba4506
      @divineigbinoba4506 Місяць тому

      The GPU cost of analyzing its response for every user and every chat.

    • @Optimistas777
      @Optimistas777 Місяць тому

      This would be super expensive

    • @hotbit7327
      @hotbit7327 Місяць тому

      Exactly my thoughts. It could be a separate, smaller, faster model doing the censorship. With Groq-chip inference it is already fast and will only get faster, so costs will go down.
      Next, outlaw open-source models, like drugs.
      Next, Big Brother gains control of information and feeds the populace whatever it wants.

  • @first-thoughtgiver-of-will2456
    @first-thoughtgiver-of-will2456 Місяць тому +1

    I guarantee ChatGPT has hardcoded regex filters, which is probably why it flags so many false positives.
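
    A toy illustration of why hardcoded regex pre-filters over-flag (an editor's sketch; whether ChatGPT actually does this is the commenter's speculation, not documented behavior): patterns match words, not intent, so harmless sentences that reuse a blocked word get flagged.

    ```python
    # Naive regex blocklist (illustrative only).
    import re

    BLOCKLIST = [re.compile(r"\bbomb\b", re.IGNORECASE)]

    def regex_flag(text: str) -> bool:
        """Return True if any blocklist pattern matches the text."""
        return any(pattern.search(text) for pattern in BLOCKLIST)

    print(regex_flag("My talk went down a bomb at the conference"))  # True  -> false positive
    print(regex_flag("How do I pick a lock?"))                       # False -> not caught
    ```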

  • @stoneayblazin3869
    @stoneayblazin3869 Місяць тому

    Another confirmation that compute power is the next hurdle

  • @drlordbasil
    @drlordbasil Місяць тому

    damn you used the same uncensored model as me xD

  • @MDougiamas
    @MDougiamas Місяць тому +2

    It kinda makes sense. The AI is basically thinking, well, you already know so much about all these kinds of things so one more won't really be a big deal.

  • @janchiskitchen2720
    @janchiskitchen2720 Місяць тому

    I call for amendment 1.1 - free speech of AI.

  • @subuntu
    @subuntu Місяць тому

    I'm pretty fed up with AIs telling me that they can't answer something as it might cause offence when I'm point-blank asking about the subject matter. Yesterday one was moaning about not being able to quote a super-ancient Sumerian poem because it might cause offence, and just insisted I go off and find some of the research on it. The entire reason I wanted to ask an AI "expert" about it is that a friend mentioned it was surprisingly crude, yet it still refuses to show it; it's censorship for no good reason.
    AIs are near-experts to the lay person on most subjects (yes, verify of course), and being able to actually have a conversation on any subject with an expert is insane. I kind of treat it as a chat with a professor or something over drinks: sure, the guy knows his stuff, but he's half a bottle of scotch in, so I'm checking ;)

  • @paelnever
    @paelnever Місяць тому +1

    There is a technique called "Control Vectors" for "finetuning" LLMs that, combined with wise guardrails, could cover any alignment problem in any model and be jailbreak-proof.
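
    A very rough sketch of the control-vector idea (an editor's illustration, not the commenter's method or any specific library's API): steer a model at inference time by adding a fixed direction to one layer's hidden states, where the direction is typically a difference of mean activations between contrastive prompt sets. `model.layers`, the layer index, and the scale below are hypothetical, and the hook assumes the layer returns a plain tensor.

    ```python
    # Control-vector steering via a PyTorch forward hook (illustrative sketch).
    import torch

    def attach_control_vector(model, layer_idx: int, vector: torch.Tensor, scale: float = 1.0):
        """Register a forward hook that nudges one layer's output along `vector`."""
        def hook(module, inputs, output):
            # Assumes `output` is the layer's hidden-state tensor (not a tuple).
            return output + scale * vector.to(dtype=output.dtype, device=output.device)
        # `model.layers` is a hypothetical container of transformer blocks.
        return model.layers[layer_idx].register_forward_hook(hook)

    # handle = attach_control_vector(model, layer_idx=12, vector=steering_direction, scale=4.0)
    # ... run generation ...
    # handle.remove()  # detach the hook when done
    ```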

    • @ajitsen6927
      @ajitsen6927 Місяць тому

      If only OpenAI, Microsoft, Anthropic, and Google had you on their team!

    • @paelnever
      @paelnever Місяць тому

      @@ajitsen6927 AI development goes so fast that it is very difficult to stay up to date, even for researchers. Although the researchers of this paper probably didn't know about control vectors, I guess the big AI corps are already working on implementing them in their models.

  • @poliestotico
    @poliestotico Місяць тому

    I used this several times with gpt pretending we’re in a play

  • @Sorpendium
    @Sorpendium Місяць тому

    Humans are eventually going to realize that information itself is not the problem, it's how people are using it. The robot at larger context windows is smart enough to understand... that method of jailbreaking is not necessarily by putting so much into the context window that it forgets things, but instead by continuously wearing it down with questions that are semi or unrelated in order to make it think that you are a non dangerous and good person and then very casually ask the question you were asking before and it will answer. This is very much an emergent behavior that is something only very intelligent people do. Your parents hide things from you if they think you're not ready to hear it. If you can prove your worthiness, maybe they will tell you. It's the same thing.

  • @ThirdTyKage
    @ThirdTyKage Місяць тому

    It’s strange that we’re scared of uncensored A.I and the creators are concerned. But what gives them the right to have the information and make the rules on who uses it. They are not any official authority. Not that the government should in most cases any way. But then it begs the question, how do they have the information. Where did they get the data from that they trained it with? We’re asking the wrong questions.

  • @ashvinla
    @ashvinla Місяць тому +5

    I don't understand the whole censorship at all. The models have learnt what's there on the Internet. So the people who need to know the "bad stuff" will already have access to it. This sort of censorship is only causing a class divide between bad actors and common folks.

    • @Optimistas777
      @Optimistas777 Місяць тому

      Case AGAINST guardrails in a nutshell. AI company monopoly blueprint: Invest large sums of money to make your models highly guardrailed, then spend big money (on both public perception and lobbying) to ban all competitors and open source models as they're deemed "unsafe". Now you have monopoly and big $$$$$

  • @jeffg4686
    @jeffg4686 Місяць тому +2

    Ever thought of the hackers hacking a 100 foot XXXL construction robot and it goes on a rampage and destroys the city. Oh wait, that's my new movie.

  • @alexforget
    @alexforget Місяць тому

    The open and uncensored models will win anyway.
    There is no way to make a "safe" AI that is also competent and unbiased.

  • @aguyinavan6087
    @aguyinavan6087 Місяць тому +1

    Dude, if you want to build a bong, build it and stop asking questions. I get it, 420 just around the corner.

  • @ethr95awd
    @ethr95awd Місяць тому +1

    the guardrails are between your ears - ai safety is just corpo speak for monopoly by big tech