This AI Microscope breaks open LLM inner secrets!!!

  • Published 8 Sep 2024
  • 🔗 Links 🔗
    Gemma Scope at Neuronpedia - www.neuronpedi...
    Gemma Scope Models on Hugging Face Model hub -
    Sparse Autoencoders explanation (credits) - www.jeremyjord...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1lit...
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder
    Linkedin - / amrrs

COMMENTS • 17

  • @thenoblerot • a month ago +5

    The latest Anthropic paper on interpretability noted that Claude had different features activate for typos and code typos. They also gave poor Claude a meltdown by force-activating "evil" features.

    • @adg8269 • a month ago

      Can you elaborate on the evil features? Thanks

  • @f1l4nn1m • a month ago +1

    The 2B model you used is pretty small, and you can see that clearly from its inability to perform in the steering section, and from the fact that it doesn't get "stories" from a sentence that explicitly mentions them.
    The steering demo (San Francisco example) is interesting, but I guess if one has an extensive corpus, one could use it to boost the concepts mentioned within it.

  • @mikem4405 • a month ago +2

    It seems like you could get the same results by putting something in the system prompt, like "give a preference to San Francisco". What is the advantage of this method?

    • @1littlecoder • a month ago +1

      Steering without stating it explicitly in the prompt is what we did by activating that feature.
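
      A minimal sketch of what "activating a feature" can look like mechanically, assuming we already have the SAE decoder direction for that feature; the model name, layer index, and steering strength below are illustrative placeholders, not values from the video:

```python
# Activation-steering sketch: instead of asking for San Francisco in the prompt,
# add a feature's SAE decoder direction to the residual stream while generating.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-2b"   # assumed model; any Hugging Face causal LM works
layer_idx = 12                     # hypothetical layer to steer at
steering_strength = 8.0            # hypothetical coefficient; tuned by hand in practice

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stand-in for a real SAE feature direction (one row of the SAE decoder matrix).
d_model = model.config.hidden_size
feature_direction = torch.randn(d_model)
feature_direction = feature_direction / feature_direction.norm()

def steering_hook(module, inputs, output):
    # Decoder layers return a tuple whose first element is the residual stream.
    hidden = output[0] + steering_strength * feature_direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)
try:
    ids = tokenizer("My favourite city is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later generations are unsteered
```

      The hook nudges every token position toward the feature's direction, so nothing about the preference ever appears in the prompt itself.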

    • @ronilevarez901 • a month ago

      It's like the difference between a person told to pretend they're a bridge and a person who actually believes they're a bridge. It's a world of difference.

    • @mikem4405 • a month ago

      @1littlecoder Right, that's what I'm saying. It seems like there must be some special uses for this, since we already know how to achieve these results.

    • @mikem4405 • a month ago +1

      @ronilevarez901 I'm not sure it's that different. Aren't you activating certain neurons by putting something in the system prompt? How is that different from activating the "feature"?

  • @AbhijitKrJha • a month ago

    This is cool, very close to what I was researching. Do you know how they are able to steer the inference without training the model to be steered based on feature labels? Are they able to recognize all the key pathways for each label and reward the weights in those pathways during inference? Or have they simply trained on an input/output corpus with labels generated by extracting features from the output (with a large model) and analyzing the input against them (again with a large model), to figure out which category of terms in the input predicts a certain category of terms in the output?
    I was trying to identify the measurable atomic features (very tiny changes with a pattern) as we pass an image through each layer of a CNN, and likewise for normal text processing: what extra atomic information do we get after each dense layer, or at least after each attention layer? That way we could tune layers and parameters for specific purposes, and reuse the first few layers of one model in another, depending on how much commonality of purpose is required.
    Disclaimer: I am just a novice in this field, having started learning about AI only a few months ago, so please excuse my ignorance in case these are foolish questions.

  • @pranjal9830 • a month ago

    Hey, just wondering: can I use ComfyUI on Google Colab from my mobile phone using code only, for IPAdapter, models, inpainting, image-to-image, depth, PNG maker, upscaling, etc.? I could use Claude to write the code for the Colab while I just enter prompts. Would that be possible on their free-tier plan, or is it too heavy a piece of software? 😅

  • @puneet1977 • a month ago +1

    Very interesting. Glad you covered it.
    Q: how can we use this feature control on other popular models? I am guessing those controls are not exposed or not offered. Correct? Which models offer these, and are they only open-source ones? And is the only way to use it then via privately hosting the model?

    • @1littlecoder • a month ago

      Google released this specifically for Gemma. Anthropic released something similar, but they hosted it with some pre-built categories.
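
      As a rough sketch of what those released artefacts look like: Gemma Scope ships JumpReLU SAE weights on Hugging Face, and applying one to a residual-stream activation is a few lines of linear algebra. The repo id, file path, and array names below follow the public release but should be treated as assumptions; check the model card before relying on them.

```python
# Sketch: load a Gemma Scope JumpReLU SAE and encode/decode an activation.
# Repo id, file path, and array names are assumptions; verify on the model card.
import numpy as np
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="google/gemma-scope-2b-pt-res",                  # assumed repo id
    filename="layer_20/width_16k/average_l0_71/params.npz",  # assumed file path
)
params = np.load(path)
W_enc = torch.tensor(params["W_enc"])          # (d_model, n_features)
b_enc = torch.tensor(params["b_enc"])          # (n_features,)
W_dec = torch.tensor(params["W_dec"])          # (n_features, d_model)
b_dec = torch.tensor(params["b_dec"])          # (d_model,)
threshold = torch.tensor(params["threshold"])  # per-feature JumpReLU threshold

def encode(resid: torch.Tensor) -> torch.Tensor:
    """Residual-stream activation -> sparse feature activations."""
    pre = resid @ W_enc + b_enc
    return torch.relu(pre) * (pre > threshold)  # JumpReLU zeroes sub-threshold features

def decode(features: torch.Tensor) -> torch.Tensor:
    """Sparse feature activations -> reconstructed residual-stream activation."""
    return features @ W_dec + b_dec

# Random stand-in; in practice, take the activation from Gemma 2 2B at the matching layer.
resid = torch.randn(W_enc.shape[0])
feats = encode(resid)
print("active features:", int((feats > 0).sum().item()), "of", feats.shape[0])
```

      Steering then amounts to adding a scaled row of W_dec back into the residual stream, as in the earlier sketch.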

  • @buchhibaburachakonda5646 • a month ago

    The first one is confusing. Is it saying that we have labels already predicted by Gemma itself? Or is there some set of activations which we categorize as labels when asking this question? Please clarify.

  • @NobleCaveman • a month ago

    Is it incorrect to think that 'steering' based on different sets of features could be a kind of way of implementing a mixture of experts within the traditional transformer architecture?
    😂🎉😢😊😮❤ (gemma inspired)

  • @MichealScott24 • a month ago +2

    ❤ I love it. I was excited to learn about this. I don't know whether this tool was fairly simple or hard to develop, but I love that we can see what the neural network is reasoning about; it is pretty damn cool! Just as we humans express things through tone and many other factors, audio models might understand our sentiment in a similar way. I love this tokenized approach and the explanation or reasoning provided for each token on this banger website! Visualizing the subtle things, the UI, and the features is awesome. I'm loving it, obsessed, goosebumps.

    • @1littlecoder • a month ago +1

      @MichealScott24 Glad to know that. Yes, Google has just given the models; these folks have made it really nice to use them to learn the inner workings.

  • @buchhibaburachakonda5646 • a month ago

    This clearly shows India is the least favourite in training, right?