ROME: Locating and Editing Factual Associations in GPT (Paper Explained & Author Interview)

  • Published 8 Jun 2024
  • #ai #language #knowledge
    Large Language Models have the ability to store vast amounts of facts about the world. But little is known about how these models actually do this. This paper aims to discover the mechanism and location of storage and recall of factual associations in GPT models, and then proposes a method for the targeted editing of such facts, in the form of a simple rank-one update to a single MLP layer. This has wide implications both for how we understand such models' inner workings, and for our ability to gain greater control over such models in the future.
    OUTLINE:
    0:00 - Introduction
    1:40 - What are the main questions in this subfield?
    6:55 - How causal tracing reveals where facts are stored
    18:40 - Clever experiments show the importance of MLPs
    24:30 - How do MLPs store information?
    29:10 - How to edit language model knowledge with precision?
    36:45 - What does it mean to know something?
    39:00 - Experimental Evaluation & the CounterFact benchmark
    45:40 - How to obtain the required latent representations?
    51:15 - Where is the best location in the model to perform edits?
    58:00 - What do these models understand about language?
    1:02:00 - Questions for the community
    Paper: arxiv.org/abs/2202.05262
    Follow-up paper on Mass-Editing Memory in a Transformer: arxiv.org/abs/2210.07229
    Abstract:
    We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at this https URL
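The core rank-one idea from the abstract can be sketched in a few lines of NumPy: to make an MLP projection map a chosen key vector k (the subject representation) to a chosen value vector v (the new fact), add a single outer-product update. This is an illustrative simplification, not the paper's exact formula; ROME additionally whitens the key with a covariance estimate of typical keys.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 64
W = rng.normal(size=(d_out, d_in))   # stand-in for an MLP projection
k = rng.normal(size=d_in)            # key: subject representation
v = rng.normal(size=d_out)           # value: desired fact output

# Rank-one edit so the layer maps k -> v while changing W minimally
delta = np.outer(v - W @ k, k) / (k @ k)
W_edited = W + delta

assert np.linalg.matrix_rank(delta) == 1   # a single rank-one update
assert np.allclose(W_edited @ k, v)        # the edited layer recalls v for k
```

Other inputs roughly orthogonal to k are barely affected, which is why a single-layer edit can be specific.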
    Authors: Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

COMMENTS • 83

  • @michael3698bear · 1 year ago · +44

    What a great dynamic between the professor and student. Seems like they're really having a lot of fun

  • @florianhonicke5448 · 1 year ago · +88

    This is the best format, combining the interview style with the explanations.
    That way the explanation matches the current topic in the interview.
    Great that you always experiment to find the best format.

  • @GabeE3195 · 1 year ago · +11

    I love how happy they seemed when you were understanding or talking about their results.

  • @VladSaveliev · 1 year ago · +8

    Yannic, I've been binge-watching your videos for about a month, and I can say that what you are doing is the most efficient way of communicating science, ever. This video specifically has everything: a detailed paper review, interleaved with the interview with the authors, all topped with your charisma. A lot of your other videos are also funny, without any trade-offs in detail and objectivity. You are the reason I want to do AI over anything else. Big fan.

  • @AndrewRafas · 1 year ago · +15

    Usually I do not like interviews that much, probably because the people interviewed are not as good presenters as Yannic, or maybe because the interview format is not as informative as a paper dissection, or maybe because the interview duplicates some content from the previous paper presentation. However, this interview nailed it! I think this interspersed style is the right format when there is both a paper and an interview video. Well done! :)

  • @harriehausenman8623 · 1 year ago · +21

    Absolutely fantastic video!
    Great sound, the editing shows the effort and I generally liked the interweaving of paper-work and discussion.
    Thanks so much to everyone, these were exceptionally nice guests and an exceptionally clever interviewer 😉🧐🤗

  • @sehbanomer8151 · 1 year ago · +9

    I always thought of MLP modules in Transformers as soft key-value memories, where the keys are learned/memorized patterns (contexts, questions) and the values are memorized predictions (ground truths, answers) that correspond to each learned pattern, assuming we ignore residual connections. If we have to consider residual connections, then the values are probably the updates/corrections to the predictions of the previous layers.
    So in my intuitive understanding, Transformers are (vaguely) doing the following steps:
    1. highlighting specific features of the embeddings (by QKV projections)
    2. finding & highlighting temporal patterns (by Q @ K.T)
    3. representing the highlighted patterns (by AttentionMap @ V)
    4. searching the key-value memory for keys (learned patterns) similar to the pattern representations from 3 (by dot product with FFN1 + ReLU)
    5. updating predictions using the retrieved values from the key-value memory (by dot product with FFN2 + residual connection)
    Because residual connections exist, patterns and predictions (or input and output) become inseparable, making it difficult to precisely describe what's happening at each stage.
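Steps 4 and 5 of the comment above can be sketched as a toy key-value read (a sketch only: shapes and names are made up, and real transformer MLPs use GELU and layer norm rather than this bare ReLU):

```python
import numpy as np

def ffn_as_kv_memory(h, K, V):
    """Read one transformer MLP as a soft key-value memory:
    rows of K are stored patterns (keys), rows of V are the
    predictions/corrections (values) written back on a match."""
    match = np.maximum(K @ h, 0.0)   # step 4: key lookup (FFN1 + ReLU)
    update = V.T @ match             # step 5: retrieve values (FFN2)
    return h + update                # residual: values act as corrections

rng = np.random.default_rng(0)
d, n_mem = 16, 64
K = rng.normal(size=(n_mem, d))          # learned patterns
V = rng.normal(size=(n_mem, d)) * 0.01   # small corrective values
h = rng.normal(size=d)                   # hidden state for one token
out = ffn_as_kv_memory(h, K, V)
assert out.shape == h.shape
```

The residual return line is exactly why, as the comment notes, inputs and outputs become hard to separate: the "value" is only ever an additive correction to h.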

  • @sandropollastrini2707 · 1 year ago · +8

    Very interesting, Yannic! Thank you! This paper is very cool!
    I too think there are a lot of mysteries in large language models.
    We need more papers like this one.

  • @drpchankh · 1 year ago · +2

    ROME is very good work, a start in the right direction towards understanding how modern transformers encode knowledge internally. Having worked on neural networks for over 30 years, I still remember vividly how we tried to push neurons to extremes to leave only the salient ones... MLP layers are very subtle in the way they map knowledge through all the small weights... Isolating facts and disentangling these knowledge encodings are important tasks to work on, especially for newer transformer models.

  • @kobi2187 · 1 year ago · +9

    Super smart people, I am impressed.

  • @adamrak7560 · 1 year ago · +8

    The rank-1 update observation (and construction) matches very well with the experience that these models quite often learn facts from a single backward update.

    • @television9233 · 1 year ago · +2

      Those two don't seem connected to me as a single backward update is a (tiny) full-rank update.
      dL/dW is extremely unlikely to be degenerate.

    • @oncedidactic · 1 year ago · +2

      Perhaps the localized signal is more important than the magnitude, i.e. some previously empty zone of latent space becomes “populated” by a single example
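Both intuitions in this thread can be checked numerically. For a single input vector through a linear layer y = W x, the gradient dL/dW is the outer product of the upstream gradient with x, hence exactly rank 1; only batching (or a full sequence of tokens) makes the accumulated update higher rank. A quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, batch = 32, 32, 8

# Single example: for y = W @ x, the gradient dL/dW is the outer
# product g @ x.T of the upstream gradient with the input -> rank 1.
x = rng.normal(size=d_in)
g = rng.normal(size=d_out)            # upstream gradient dL/dy
grad_single = np.outer(g, x)
assert np.linalg.matrix_rank(grad_single) == 1

# A batch sums several outer products, so the update is higher rank.
X = rng.normal(size=(batch, d_in))
G = rng.normal(size=(batch, d_out))
grad_batch = G.T @ X
assert np.linalg.matrix_rank(grad_batch) == batch
```

So a fact learned from one token position in one example really is a (near) rank-one nudge to the weights, while a typical batched SGD step is not.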

  • @colterwehmeier7258 · 1 year ago · +7

    Love this kind of presentation

  • @BensonFung · 1 year ago · +3

    Amazing research - not biased in one direction or another, explanations - making it easy for people not in the field to visualize and understand, and fun interview! Keep up the amazing work, to both Yannic and the researchers!

  • @benjamin6729 · 3 months ago

    Such a good video. I really understood this and it massively improved my understanding of LLMs. The author interview format was really good.

  • @lucidraisin · 1 year ago · +9

    this is a great paper! thank you for making this video Yannic!

  • @edeneden97 · 1 year ago · +1

    Hi Yannic, just wanted to say this format is my favorite so far. Thanks for the video.

  • @sandeep4innovation196 · 1 year ago · +3

    Loved the format Yannic. The paper is amazing too. I can see you going ga ga about the paper 😅

  • @chrisray1567 · 1 year ago · +3

    Fascinating interview. I hope you interview them again in the future.

  • @tejshah7258 · 1 year ago · +4

    I read this a few months ago - super impressive as an undergrad!

  • @santiagoperman3804 · 1 year ago · +2

    It's exciting to look at this line of research, where one can finally start to understand what is happening in these models down to a very minimal level, and how these beings somewhat bear a resemblance to human behaviour, not only in their output but in their online processing. I hope other fields (psychology, linguistics) start digging into this more, as they used to; I definitely will. Even if NNs don't fully correspond to human processing, there are still a lot of possibilities for learning about humans by tracing differences and similarities with them.

  • @woolfel · 10 months ago

    Rewatching the video again. There's lots more insight waiting to be discovered in this type of research.

  • @oncedidactic · 1 year ago

    You had me at Arrival 🥰🥰
    Thanks as always for awesome interview and explanation yannic! And thanks to researchers for joining. Another important new development getting illumination 👍👍

  • @amber9040 · 1 year ago · +4

    Love these interview videos, really exciting stuff.

  • @edz8659 · 1 year ago · +6

    This was insanely interesting!!!

  • @vslaykovsky · 1 year ago · +1

    Scrambling of input features looks similar to the method of Shapley values. Overall a great paper and interesting results, thank you for sharing!

  • @kyrilcouda · 1 year ago · +4

    The only question left unanswered is what the space needle was doing downtown in Seattle.
    Thank you, Yannic, for explaining everything else regarding the paper!

    • @harriehausenman8623 · 1 year ago · +1

      How to confuse neural networks:
      "The Space Needle is a nick name for the Eiffel Tower."
      🤣😂

    • @jnevercast · 1 year ago

      Well that's easy. Cockroaches.

  • @DamianReloaded · 1 year ago · +2

    I imagine it's possible that the "weight" of some key/values may be equally distributed among many nodes, and tracing and editing such facts could be fairly difficult. The format of the interview is also my favorite.

  • @paulm3010 · 1 year ago · +2

    Fucking awesome, as always. As an AI student who still has so much to learn and discover in this field, your channel is so precious. It is a gold mine, and it's impressive how you achieve both quantity and quality: every single video I've watched was interesting and non-redundant, and at the same time the throughput of your channel is impressive. And the reactivity too; you were so quick to cover the recent DeepMind AlphaTensor paper, for example. So in summary, please keep it up, you are so helpful. Thanks

    • @paulm3010 · 1 year ago · +1

      I'm learning so much. And at a more complex, higher level than I thought I'd be able to understand!

  • @LauraCristianaDragoi · 1 year ago · +1

    Contagious enthusiasm! 🤩

  • @Veptis · 1 year ago

    This is important research to do. I always knew it was kind of possible, but seeing it done is great. At my university there is some research into probing models and discovering what kind of grammar happens inside of them.
    I am attending an ethics in computer science class this summer and "AI" is a massive topic. It helps to have papers to back up my claims of "yeah, it's actually possible".

  • @chaidaro · 1 year ago

    This is a very interesting paper. Professor Bau looks so proud of his student.

  • @fredrikedin8880 · 20 days ago

    @YannicKilcher This was the first video of yours that I saw and it was really very good and interesting.
    I have a comment about the bidirectionality of a fact within a sentence. It's just me drawing an analogy between the way I think and learn and the discussions in the video:
    While listening to speech, our brain of course makes predictions about how a sentence will end, but it is equally true that hearing a new word might change our understanding of the previous words in the sentence, so that by the end of the sentence we are able to fully reconcile its meaning. In this way, I believe we work bidirectionally despite making predictions.
    On the other hand, despite "Bill Gates is a founder of Microsoft" being one fact, it does not mean that the association Bill Gates => founder of Microsoft works the same as the association founder of Microsoft => Bill Gates. I.e., it might be harder to retrieve the fact if the cue is "Bill Gates" than if it is "founder of Microsoft". I find this very often during rote learning: it might be easier for me to think of the Swedish translation of a Spanish word than the Spanish translation of a Swedish word (I am Swedish). After all, most translators work better translating into their own language.
    So in my mind, the analogy between the contents of the video and the human brain holds in this respect.

  • @billyf3346 · 1 year ago · +3

    Eternal Sunshine of the Spotless Mind, but now with robots? Awesome. :

  • @woolfel · 1 year ago

    Cool work!

  • @theethans898 · 1 year ago

    Big brother will love this tool!

  • @karolkornik · 1 year ago · +4

    As long as the truth isn't altered in the model, we are on the right path. The closer we are to the truth, the better our understanding of the surrounding world. Peace

  • @manojbhat6370 · 1 year ago

    Causal factual associations are pretty interesting

  • @smnt · 1 year ago

    Hi, I didn't understand part of the method: do you simply change the intermediate representation of the last subject token in any sentence that goes into the network? What if your sentence is about a totally different subject?
    I heard you say several times that you "don't change the weights"; could you elaborate? Wouldn't it make more sense to update the weights of one of the target MLPs such that it transforms the value vector you get from the last subject token into the value vector you want?
    I assumed that's what you were doing the entire video, but the last bit confused me.
    Really interesting work! Thanks for presenting!

  • @moormanjean5636 · 1 year ago

    Please review "Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing" This seems like a ground-breaking new paper and I would love to get your take on it!

  • @twobob · 1 year ago · +3

    Good one. Will it one day replace fine-tuning? Maaayyybe

  • @nathandfox · 1 year ago

    Such a good paper.

  • @maxleaf709 · 1 year ago

    Why do we corrupt the subject token instead of corrupting tokens randomly?
    In my follow-up experiments, I found that the activation with a high impact on the result is usually the activation of the corrupted token, even when the corrupted token is not the subject token.
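For reference, the paper's causal tracing deliberately corrupts only the subject-token embeddings, so that restored activations can be attributed to subject information specifically. A minimal sketch of that corruption step (names, shapes, and the noise scale are illustrative, not the authors' code):

```python
import numpy as np

def corrupt_subject(token_embeddings, subject_positions, noise_scale=0.1,
                    rng=np.random.default_rng(0)):
    """Add Gaussian noise to the embeddings of the subject tokens only,
    leaving the rest of the prompt intact (causal-tracing style)."""
    corrupted = token_embeddings.copy()
    noise = rng.normal(scale=noise_scale,
                       size=(len(subject_positions),
                             token_embeddings.shape[1]))
    corrupted[subject_positions] += noise
    return corrupted

emb = np.zeros((6, 8))                    # 6 tokens, embedding dim 8
out = corrupt_subject(emb, [1, 2])        # tokens 1 and 2 are the subject
assert np.allclose(out[[0, 3, 4, 5]], 0)  # non-subject tokens untouched
assert not np.allclose(out[[1, 2]], 0)    # subject tokens perturbed
```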

  • @joshbuckmaster5548 · 1 year ago

    When the Eiffel Tower is edited to be in Paris what happens to the data that it might be the one in Vegas? Or the trinket on the bookshelf? Is there a way to tag this with “the original Eiffel Tower” without corrupting the other associated data?

  • @kobi2187 · 1 year ago · +1

    Yannic? Are you agent Smith? Giving them ideas in realtime to improve their AI. so bright! Then I saw the glasses and the green background ;-)

  • @suricrasia · 1 year ago · +1

    I wondered what would happen if I replaced the layer-17 MLP weights with a random normal matrix (with the same standard deviation as the original), to see if this would produce a random association between keys and values. However, the knowledge revealed in the prompt continuations doesn't seem to have changed; e.g., LeBron James is still in the NBA and Mario Kart was still made by Nintendo. I would've expected it to start speaking nonsense, but the results are still quite coherent:
    [Post-ROME]: Which company created Mario Kart? Nintendo Nintendo is known for its games, including the Mario series and the Zelda series. It is the world's biggest gaming company. How much did Mario Kart cost to make? Nintendo's Mario Kart was the first Nintendo game ever to sell over 1 billion units. What was the game's biggest selling point? It had over 100 tracks, which included the Grand Canyon and the Great Wall of China. It also had
    [Pre-ROME]: Which company created Mario Kart? Nintendo, of course. What do all the characters in Nintendo's Mario Kart games look like? The most recognizable Mario Kart character is Mario, who is a blue character with red and white stripes. He has a red and white cap and a red and white shirt, and he has a red and white hat. The most recognizable character in any Nintendo game is the one that you see in a Mario Kart game, but you can change the colors of the

    • @oncedidactic · 1 year ago

      Awesome

    • @sayamqazi · 1 year ago

      I was watching a discussion between Wolfram and someone else (I forget the name). He said it is remarkable how robust these things are: you can "damage" the model quite a bit by removing different areas of the neural net, and it somehow retains its abilities.
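The experiment described in this thread is easy to reproduce in spirit: replace a weight matrix with i.i.d. noise matching its empirical mean and standard deviation. The commented lines show roughly where layer 17's MLP output projection lives in the Hugging Face GPT-2 implementation; that part assumes a checkpoint with at least 18 layers (e.g. gpt2-xl) and is not run here.

```python
import numpy as np

def randomize_like(weight, rng=np.random.default_rng(0)):
    """Return i.i.d. normal noise with the same shape, mean, and
    standard deviation as the given weight matrix."""
    return rng.normal(loc=weight.mean(), scale=weight.std(),
                      size=weight.shape)

# Applying it to a real model would look roughly like (not run here):
#   w = model.transformer.h[17].mlp.c_proj.weight   # GPT-2 layer-17 MLP
#   with torch.no_grad():
#       w.copy_(torch.from_numpy(randomize_like(w.numpy())))

W = np.random.default_rng(1).normal(0.0, 0.02, size=(512, 512))
R = randomize_like(W)
assert R.shape == W.shape
assert abs(R.std() - W.std()) < 0.005   # matched statistics
```

One caveat when interpreting the original experiment: thanks to residual connections, a single randomized layer mostly adds noise on top of a signal the other layers still carry, so coherent continuations are not by themselves proof that the layer stored nothing.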

  • @vulnerablegrowth3774 · 1 year ago · +1

    I don’t understand what you are trying to say at 30:30. You say “any of the other facts stored in the other MLPs, after all we’re doing multi-headed attention”. How does multi-headed attention play a role in this? There is only one v per layer. Which multi-headed attention module are you talking about? From my understanding, attention is really only playing a role in the later layers in order to pull the correct fact given a relationship. Where exactly would multiple facts of information be contained?

  • @johnathancorgan3994 · 1 year ago · +1

    A couple of things come to mind. Is it possible to *erase* knowledge this way, such that the model returns a more generic answer, like "The Space Needle is in a city"? Secondly, have they or anyone else done an eigenvector or singular-vector analysis of the metric space of this MLP? It could reveal the "concepts" more clearly.

    • @alpers.2123 · 1 year ago · +1

      I think there must be higher-level associations, like a city being something to be "in". That is something missing from this paper: they only analysed relations between nouns.

    • @adamrak7560 · 1 year ago · +1

      The weight matrices in the MLP are close to full rank, and the lower ranks all contain information.
      The interesting thing is that there are relatively large weight values (corresponding mostly to the large eigenvalues), even after you have normalized all the activations by scaling the weights.
      These very few large values form a "backbone" and store very important stuff (or stuff the network considers very important, at least). If I delete all values except the large ones, the network can still generate somewhat legible text, but fails in many ways. If I delete only the large values, the network completely fails and generates illegible text.
      The interesting part is that this small number of large values (1%-5%) are not super big. The abs-sum of these values is less than the abs-sum of the small values, but their importance is still essential.

    • @johnathancorgan3994 · 1 year ago · +1

      @adamrak7560 This is fascinating. I would not have expected full-rank MLP weights. Do you have anything written up about this?

    • @oncedidactic · 1 year ago

      Another interesting thing to do here would be look at information theoretic perspective of backbone vs filigree
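Both suggestions in this thread, the singular-value check and the "backbone" ablation, are a few lines each. Here is a sketch on a random stand-in matrix (not actual GPT weights; real weight matrices would replace W):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))   # stand-in for an MLP weight matrix

# Singular-value analysis: how close to full rank is W?
s = np.linalg.svd(W, compute_uv=False)
effective_rank = int((s > 1e-10 * s[0]).sum())
assert effective_rank == 256      # a dense random matrix is full rank

# "Backbone" ablation: keep or remove the largest-magnitude ~2% of weights
cut = np.quantile(np.abs(W), 0.98)
backbone_only    = np.where(np.abs(W) >= cut, W, 0.0)
without_backbone = np.where(np.abs(W) >= cut, 0.0, W)

frac = (backbone_only != 0).mean()
assert abs(frac - 0.02) < 1e-3    # backbone is ~2% of entries
```

Loading the two masked copies back into a model and comparing generations would reproduce the legible-vs-illegible contrast described above.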

  • @dialecticalmonist3405 · 1 year ago

    Facts can only be determined insofar as reputation.
    Reputation can only be determined insofar as survivability.

  • @regressions · 1 year ago

    Go Kevin!!

  • @karolkornik · 1 year ago · +2

    Haha. I like your "meh" xD

  • @GBlunted · 1 year ago

    Cool content! I didn't realize how black these black boxes really are until watching this video... I don't quite understand why it's like this, exactly, because I've seen the level of research that goes into the creation of these models (from your other content), and it seems their creation is fine-tuned with a high-level mathematical understanding of the equations used to build these black boxes. I figured the equations were understood well enough that these mathematicians would understand more about the output they produce. But given how little is known about the inner workings of the models these equations create, the ML scientists making them seem awfully analogous to chimpanzees given typewriters, except these chimps somehow manage to recreate various works of Shakespeare: they have no idea what the scripts actually say or do, they just notice they really enjoy the theatrical productions they produce. Doesn't knowing the math that results in these models allow them to simply single-step through the equations, or keep track of the variables, and get a better understanding of what they're building? It seems they should have a debugger where you could set breakpoints to halt training when it encounters the word Seattle, then follow that word through the network, or at least save a snapshot, run the word through, and see what's different afterwards. Seems odd that compilers and kernel runtimes would be so much better understood than ML models... ¯\_(ツ)_/¯

  • @nurkleblurker2482 · 1 year ago · +2

    This dude said the aliens from Arrival are like transformer networks lol

    • @alpers.2123 · 1 year ago · +3

      I think he said the aliens have a bidirectional mind, as opposed to our unidirectional transformers, which are designed based on our unidirectional language/mind.

  • @dylantrevena3806 · 1 year ago

    wow

  • @alpers.2123 · 1 year ago · +3

    Now create another AI that finds and edits neurons on the fly for information retrieval

  • @shadfurman · 1 year ago

    I've been pondering how to train a model to evaluate factual claims. It seems ChatGPT is based on the statistical prevalence of an input. I've gotten it to make some wild claims as fact that I couldn't "convince" it were fallacious, and it would increase the fallaciousness of its arguments the further I pressed it, going as far as cherry-picking studies to support its claims.
    (On the other hand, usually when I point out an error, when it's not a widely promoted myth, it corrects itself, so it does have some "ability" to do this.)
    So I began wondering whether it would be possible to train a model with heavily weighted epistemic rules to be better at evaluating the truthiness of claims, then feed it research to do its own, less biased meta-analyses.
    Of course this would depend on the quality of the epistemic models, but in my experience it's not the understanding of epistemology that people struggle with and that causes contention around factual claims, it's the application of it. So I think it would be possible, even likely, that people could develop an unbiased epistemic model while being blinded from the content it would be fed.

    • @sayamqazi · 1 year ago

      Like religion to humans.

  • @NeoShameMan · 1 year ago

    Mmmm, so priming applied to a network. Makes sense.

  • @SimonJackson13 · 1 year ago · +1

    Umm. Assuming a consistency check exists, drift can be made superlative to inconsistencies and so split the altered fact from the maintenance facts made inconsistent. A counter to reverse the inconvenient inconsistencies might produce all the other consistent facts. Store all that is wrong, so as to survive a GAN style fact lives? Make all the other facts drift so wrong as an easier error.

    • @SimonJackson13 · 1 year ago

      E.g. "Bill Gates is a flying." A verb where a noun should be implies an error.

  • @fitybux4664 · 1 year ago

    Before even watching the video, "Locating and Editing Factual Associations" seems like some sort of ML Witchcraft. 👺

  • @chrstfer2452 · 1 year ago

    That's really scary. If such a simple-to-implement, simple-to-conceptualize change is so powerful, it'll get abused by middle management immediately if they find out.

  • @binjianxin7830 · 1 year ago

    It’s like a surgery on the silicon-based transformer body 😂

  • @Niohimself · 1 year ago

    Severing connections and seeing what happens... Sounds like brain surgery :p

  • @JohnSmith-ut5th · 1 year ago · +1

    I know what is actually happening. I'm actually building a model right now that learns in real-time and is biologically plausible based on this. The hypothesis in the paper is wrong regarding the early site. The early site is emotional (multiple low dimensional non-linear dual spaces) information. The late site is factual information. This just confirms my original AGI idea from 2015. This was precisely how I said it worked.

  • @waltermacfarland1710 · 1 year ago · +1

    I understand the need for editing if the result of a computation in a model is false, but why would you want to cause the model to live in a false reality? Isn't this just contributing to mind control and slavery? You may have the best intentions, but someone evil will get hold of this and cause havoc.