I'm really glad to see such a well-researched video about this paper instead of clickbaity headline reporting that glosses over the more interesting details. Adding that second paper at the end also helps show that CCS is not the silver bullet for LLM hallucination that some could believe after reading the original paper.
Thanks @farrael004!
Thanks for the video
Thanks for watching!
I'm glad I found your channel :3
This is extremely interesting
Thanks @Subbestionix!
It might be interesting if they could do True/False and Yes/No at the same time to check the consistency.
Do you mean supervising the model to predict this as well? One idea could be as follows:
If the authors trained a regressor that could predict yes/no from the normalised features, then they would have proof that this signal is leaking. So instead, they could learn a projection and then use a trick from domain adaptation (reversing gradients) to ensure that the projected features contain no information about yes/no labels.
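In case it helps, here is a minimal sketch of that gradient-reversal idea in PyTorch. This is purely my illustration, not anything from the paper: the dimensions, the adversarial yes/no head, and the toy data are all made up, and in practice the adversarial term would be combined with the main objective (e.g. the CCS loss) so the projection stays useful rather than collapsing.

```python
# Sketch (assumed PyTorch): learn a projection of the normalised hidden states
# while an adversarial head tries to predict the yes/no label; reversing the
# gradient pushes the projection to discard that yes/no signal.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient w.r.t. x is reversed; lambd gets no gradient.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


# Hypothetical dimensions: 1024-d hidden states projected down to 128-d.
feature_dim, proj_dim = 1024, 128
projection = nn.Linear(feature_dim, proj_dim)  # the projection we want to keep
yes_no_head = nn.Linear(proj_dim, 1)           # adversary predicting yes/no

optimizer = torch.optim.Adam(
    list(projection.parameters()) + list(yes_no_head.parameters()), lr=1e-3
)
criterion = nn.BCEWithLogitsLoss()

# Toy batch: random stand-ins for the normalised features and yes/no labels.
features = torch.randn(32, feature_dim)
labels = torch.randint(0, 2, (32, 1)).float()

for _ in range(100):
    optimizer.zero_grad()
    projected = projection(features)
    # The adversary sees the projection through the gradient-reversal op:
    # the head is trained to predict yes/no, while the reversed gradient
    # trains the projection to remove that information.
    logits = yes_no_head(grad_reverse(projected, lambd=1.0))
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
```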
Are you reading David Rozado? I've noticed that chat AI has gotten better at keeping things consistent: it doesn't give one answer to one thing and then a completely contradictory answer to other things that depend on the former. But this only seems to work linguistically. You can also ask things in a different mode, say analytically, where you ask it to examine and analyze data and then make a statement; yet when you ask for what should be the same thing in other ways, it gives you a completely different answer. Similarly, the framing can give you one answer, even if it generally goes back to what is PC for the model. It would be nice if it could establish what exactly is meant in material terms (what is communicated, not merely what the words are) and also establish Bayesian priors to then draw more extended conclusions, but I don't see how this could be done for GPT and other chatbot-style models.
Interesting work. In my understanding, it's trying to gauge validity, or logic, rather than the soundness of a concept or the objective nature of a claim.
Thanks for sharing your perspective. My interpretation of the work is that the goal is to infer which claims the model "thinks" are true, in an unsupervised manner.
Reality is generated by random events. Knowledge is defined to be logically related, non-random facts.
An interesting philosophical perspective!
Yeah, let's base AI on the current peer-reviewed consensus BS and not the actual truth of the scientific method.
I suspect modern large language models (GPT-4, Claude, etc.) are often trained on large collections of peer-reviewed articles, so they will pick up on these. But I'm not sure I understand your comment (the focus of this work is on trying to determine what the AI thinks is true).
@SamuelAlbanie1 My focus is on the root of the problem: who decides what the consensus truth between humans is in the first place, versus the actual truth in the real world? AI could very well use the principles of logic to determine whether something is true or not by picking the fundamentals instead of the assumptions. For example, when you ask whether Michelson-Morley means there is no aether, or means there is no static aether on a moving Earth, it's trained to pretend the consensus is the truth instead of looking into the actual roots of Michelson-Morley and relativity to understand that the interference of the light can also mean a moving aether on a stationary Earth.
My point is: they will never make AI actually solve problems about truth.
What about the LLMs used by the CIA, NSA, or DARPA? They're classified projects.
Unfortunately (or perhaps fortunately), I don't know much about the LLMs of the CIA and NSA...
@SamuelAlbanie1 What I'm trying to say is: how can we verify the data they're using?