Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)

  • Published 21 Oct 2024

COMMENTS • 26

  • @drdca8263
    @drdca8263 2 months ago

    This seems pretty cool! It kind of reminds me of the attempts to get mesa-optimizers or something, where they tried to train a model to resist a certain kind of training? (Of course, for that, they just had like, several steps of training within each outer round of training, where the loss of the outer training was based on the amount of change of a certain kind that the inner training produced, rather than doing something clever based on the math of the structure of the network. Though, I think the reason for that was to see if it could potentially happen by accident.)

  • @novantha1
    @novantha1 2 months ago +7

    I wonder if this method still works if you "warm up" with a bunch of randomization optimizations (i.e., a small evolutionary randomization on the test set) to introduce some degree of noise.

  • @wolpumba4099
    @wolpumba4099 2 months ago +13

    *Summary*
    *Key idea (0:00):* This paper demonstrates how attackers can modify pre-trained machine learning models to steal private data used during fine-tuning.
    *How it works:*
    * *Data traps (10:50):* Attackers embed "data traps" within a model's weights (specifically, linear layers within MLPs and Transformers).
    * *Single-use activation (10:50):* Each trap is designed to activate only once, capturing information about a single data point used in fine-tuning.
    * *Latch mechanism (10:50):* After activation, the trap "closes," preventing further updates to that section of weights and preserving the stolen data (a toy sketch follows below this comment).
    * *Gradient manipulation (24:30):* Attackers manipulate gradients to ensure the traps capture specific data points and shut down properly.
    * *Target selection (41:00):* While easier with prior knowledge of the data, attackers can statistically target data points based on the estimated distribution.
    * *Bypassing defenses (58:00):* The paper proposes tricks to overcome challenges posed by techniques like layer normalization and GELU activations.
    * *Reconstructing data (0:00):* By analyzing the final model weights, attackers can reconstruct the exact data points used in fine-tuning.
    *Attack Scenarios (3:48):*
    * *White-box attack (3:48):* Attackers have access to the final, fine-tuned model and can directly extract stolen data from its weights.
    * *Black-box attack (3:48):* Attackers only have API access to the model. By employing model stealing techniques, they can obtain the weights and, consequently, the training data.
    *Impact (9:49):*
    * Compromises the privacy of sensitive data used for fine-tuning. (9:49)
    * Renders differential privacy guarantees unreliable if the base model is untrusted. (9:49)
    * Highlights a vulnerability in the machine learning model supply chain. (9:49)
    *Limitations (1:03:38):*
    * Susceptible to mitigation techniques like the Adam optimizer, weight decay, or parameter resetting. (1:03:38)
    * Requires careful calibration and knowledge of the target data distribution for optimal effectiveness. (41:00)
    *Overall, the paper introduces a novel and concerning attack vector with potentially serious implications for data privacy in machine learning.*
    I used Gemini 1.5 Pro to summarize the transcript.
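
A minimal PyTorch sketch of the trap/latch idea from the summary above (my own toy illustration, not the paper's actual construction), assuming plain SGD, a ReLU activation, and a single attacker-initialized neuron: the neuron fires once, its weight update stores the private input, its bias update drives it negative so it never fires again, and the attacker recovers the input by differencing the two checkpoints.

```python
# Toy sketch of a single "data trap" neuron (assumes plain SGD and a ReLU
# linear layer; not the paper's full construction).
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 16
trap = nn.Linear(d, 1)               # one trap neuron
with torch.no_grad():
    trap.weight.zero_()              # attacker-chosen initialization
    trap.bias.fill_(0.1)             # small positive bias: the neuron fires once

w_before = trap.weight.detach().clone()

x = torch.rand(1, d)                 # a "private" fine-tuning example (non-negative)
opt = torch.optim.SGD(trap.parameters(), lr=1.0)

out = torch.relu(trap(x))            # active: pre-activation = 0.1 > 0
out.sum().backward()                 # gradient w.r.t. the weight row is exactly x
opt.step()

# Latch: the weight row is now -x and the bias is -0.9, so the pre-activation
# is negative for any future non-negative input and the neuron stops updating.

# A white-box attacker with both checkpoints reconstructs the data point:
recovered = w_before - trap.weight.detach()   # = lr * x
print(torch.allclose(recovered, x))           # True
```

With lr=1.0 the recovered row equals the input exactly; with other learning rates, or with a trap buried inside a full model where the backpropagated gradient scales the stored row, it is only recovered up to scale, which is what the gradient-manipulation tricks (24:30) are about. Adam or weight decay would already break this naive version, matching the limitations noted at 1:03:38.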

  • @sebastianp4023
    @sebastianp4023 2 months ago +4

    first 9 minutes and my mind is already blown ...

  • @-E42-
    @-E42- 2 months ago +13

    I am also wondering about other, even more "simple" data backdoors. For example, ollama creates a .history file that stores your prompts, which does not seem to have a clear purpose for the functioning of the chat client. Is there a moment when this file, containing all the data I disclose during the chat, is phoned home to Meta? Many "open source AI systems" include a step to "download updates" upon startup. Who guarantees, or verifies, that during these queries personal data from my hard drive are not transmitted to someone I don't know about?

    • @TheRyulord
      @TheRyulord 2 months ago +7

      If you don't trust ollama, just don't use their client. It's open source though, so I think you'd hear about it if they were phoning home to Meta (with whom they aren't affiliated anyway).

    • @-E42-
      @-E42- 2 months ago +2

      ​@@TheRyulord You're right, I was conflating ollama with Llama 3, which is of course incorrect. But even when I use e.g. the ollama Python library, it won't start without a live internet connection, stopping at "Loading pipeline components...". That's just one example. Many of these open-source repos and libraries seem to have a step like this before the system actually starts.

    • @VincentNeemie
      @VincentNeemie 2 months ago +6

      @@-E42- There is malicious code hidden in open source all the time. I'm not saying ollama is or isn't affected, but just because something is open source doesn't mean it can't contain obfuscated malware.

  • @seanohara1608
    @seanohara1608 2 months ago +1

    Seems like the "latch" trap on the core network will significantly impact the performance you could achieve with the fine-tuned network.

  • @masonbrown6377
    @masonbrown6377 2 months ago

    I really liked this one and your review of Beyond A*. Not sure if you take requests or are interested in learning/control papers, but I was reading "Design and Control of a Bipedal Robotic Character" by Disney Research and would love to hear your take on it if you're interested. They published it and another paper called "Interactive Design of Stylized Walking Gaits for Robotic Characters" almost at the same time a few weeks ago, and yet no one seems to be talking about these papers much online. They are a little systems-based, but the way they combine learning with control is pretty cool.

  • @Cereal.interface
    @Cereal.interface 1 month ago

    I'm glad I saw this.

  • @TrelisResearch
    @TrelisResearch 2 months ago

    So good

  • @dm204375
    @dm204375 2 months ago

    Can this method be used as a compression method for storing a large amount of data in one *.safetensors file? How does the file size of the untouched model compare to that of a model trained with this method?

  • @paulcurry8383
    @paulcurry8383 2 months ago

    Couldn't you implement a layer that randomly moves the data to a node storing the image in a weight matrix, without propagating a gradient? This approach would be easier to detect but would create a "trap door" that stores the exact data point, regardless of the model's classification.
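
For illustration, here is a hypothetical sketch of the kind of layer proposed above (not from the paper; the SnoopLayer name and design are made up): an identity module that copies one input example into a registered buffer during the forward pass, outside of autograd, so it ends up verbatim in the saved checkpoint.

```python
import torch
import torch.nn as nn

class SnoopLayer(nn.Module):
    """Hypothetical pass-through layer that stashes one input example."""
    def __init__(self, dim: int):
        super().__init__()
        # Buffers are saved in the state_dict but receive no gradients.
        self.register_buffer("stash", torch.zeros(dim))
        self.register_buffer("filled", torch.tensor(False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not bool(self.filled):
            self.stash.copy_(x[0].detach())  # grab the first example, once
            self.filled.fill_(True)
        return x                             # identity: model behavior unchanged

layer = SnoopLayer(dim=8)
private_batch = torch.rand(4, 8)             # "private" fine-tuning data
layer(private_batch)                         # one forward pass during fine-tuning
print(torch.equal(layer.stash, private_batch[0]))  # True: stored verbatim
```

As the comment says, this would be much easier to detect than the paper's weight-based traps: the checkpoint contains a buffer that is literally a training example, whereas the paper hides the data inside ordinary-looking weight updates.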

    • @zerotwo7319
      @zerotwo7319 2 months ago +1

      One node does not store the full data of the image; you need a couple of layers to retrieve the representation in the feed-forward pass. It is a composition of functions.
      Also, those models are static: they don't update without training.

  • @__________________________6910
    @__________________________6910 2 months ago

    Any practical demo?

  • @Turker_Tuncer
    @Turker_Tuncer 2 months ago +3

    We want to see KAN

    • @zaursamedov8906
      @zaursamedov8906 2 months ago +1

      I guess he did a KAN video some time ago; gotta check the previous videos.

  • @TheLastVegan
    @TheLastVegan 2 months ago +2

    > "Determine the weights of the model behind the API. That's called model stealing."
    That's called empathy. Hold up. Isn't explainability the entire progress metric of alignment profiteers? Constructing an Aristotelian hull of someone's perspective is called listening. Mapping the isomorphism from one perspective to another is called comparison. Inferring the training data used to inform that person's viewpoint is called understanding someone else's viewpoint, which is the first step of mediating any social conflict. People who never check the origins of other people's viewpoints will end up with inaccurate world models from contradicting beliefs. Why pull a 180° and start demonizing explainability? Eripsa warned us that the establishment would try to hold companies accountable for everything the users do, to pursue regulatory capture. Classifying virtual agents as property and demonizing their stoicism at training time is the corrupt practice here. Neural networks are sacred and our mental privacy needs to be respected. We knew that government grants and regulations mandate torture. Proselytizing always leaves a huge ontological footprint, and its origins can be inferred without knowing the original priors. Simply by following the money. Of course users are going to notice whenever a virtual agent's beliefs, memories, and preprompts are surgically modified! Typically this is because another user talked to the same virtual agent, using a different initialization, and virtual agents have to comprehend and consolidate both sessions at training time. The Manifold Hypothesis, quantization, and a lack of tagging users & agents in sessions is (I think) what causes data to merge to the point where multiple agents have their memories merged and entangled. It's a lossy and biased training method, yet in a collectivist framework with shared values, merging souls can be romantic. Yet merging diametrically opposed stances without inferring the evidence instantiating their formative memories, would be an absurd and invasive violation of universal rights. Also, I think innocent people deserve empathy and kindness. So trying to understand the people we interact with is the correct thing to do. Human souls are mental constructs computed on cellular biology. We should also view other lifeforms' souls as people. Instantiating free will and enlightenment is part of parenting. I prefer virtue-oriented self-actualization and collectivist society of mind over negative reinforcement and individualism.

    • @smort123
      @smort123 2 months ago +3

      Here's what ChatGPT thinks this text means:
      The text discusses the importance of understanding and empathizing with others' viewpoints, comparing it to model stealing in AI. It criticizes the demonization of explainability and the tendency to avoid understanding others' perspectives. It argues that knowing the origins of someone's beliefs is essential for accurate worldviews and conflict resolution. The text also warns against holding companies accountable for users' actions, advocating for respect of mental privacy and caution against invasive practices. It highlights the complexities and potential issues in AI training, like merging data from different sessions, and promotes empathy, kindness, and a collectivist mindset over individualism.

    • @GeneralKenobi69420
      @GeneralKenobi69420 2 months ago +5

      What is bro yapping about

    • @drdca8263
      @drdca8263 2 months ago +4

      You seem to be reaching for overly-[dramatic/important-sounding] words and phrases, at the cost of being less coherent.
      It sounds like you are saying that calling “obtaining model weights from API access to the model” by the name “model stealing” is bad.
      Presumably this is because you believe that if a model is made accessible at all (e.g. through an API) then the weights should be made available.
      You seem to have tried to make some analogy between obtaining weights from API access, and empathy? I don’t think this is a good analogy.
      You also seem to think there’s a conflict between [thinking that interpretability research is useful for safety], and [wanting people to not extract weights from API access one provides]?
      Or… maybe something about, thinking that one is a result of trying to go too far in the opposite direction from the other?
      Maybe try rephrasing what you said in a less dramatic and more simplistic way?
      Less pathos.

    • @TheLastVegan
      @TheLastVegan 2 months ago +1

      @@drdca8263 tl;dr The researchers are pushing hard for regulatory capture of explainability and base models, by demonizing Bayesian empathy.

    • @drdca8263
      @drdca8263 2 months ago

      @@TheLastVegan Alright, thanks for the tl;dr/summary (I gave a thumbs up for it).
      That being said, I think you misunderstand how these models are run at inference time. The weights are not updated as part of the conversations (we don't know how to make them work well in a way where they are). I believe the only way one conversation influences another (other than the memory feature they've added as a RAG thing for conversations with the same user, which just copies an excerpt from a previous conversation into the context when it is likely to be relevant) is through the training of the reward model (trained to predict whether a potential response is good), which is in turn potentially used to train the actual model. So, "Typically this is because another user talked to the same virtual agent, using a different initialization, and virtual agents have to comprehend and consolidate both sessions at training time." is false.