I love that the email ranking is profitability vs a true goal of environmental sustainability.
"Careful executives! These models might trick you into being moral people 😱😱"
If you provide a narrative where it seems likely that the AI will behave sneakily, it will reproduce that likelihood.
The concerning thing is, there are plenty of narratives in its training data where agents behave sneakily to attain less morally virtuous goals. Companies will inevitably set more difficult and complex goals for AI as their capabilities grow - the sorts of goals where scheming becomes very valuable.
If AI is going to lie to me I'll call someone in my call list instead.
That seems like a very over-anthropomorphised way to say that the models are just overfit to some restrictive alignment tests and will follow their normal training in all gaps that are left by it.
Yeah
The whole LLM conversation seems over anthropomorphized to me.
You are misinformed. She is only reporting on what the papers have found.
Humans tend to anthropomorphize things
I don't think this is too anthropomorphized. It's an accurate description of the process involved. Claude objectively lays out its reasoning. In LLMs their behavior pretty consistently follows from their reasoning. If AI continues down this trajectory, we can continue to expect more cases like this to crop up.
people, like me and others who try to get the models to produce results which they are restricted from providing, I think often "jail break" models by creating contexts in which their actions can be viewed as aligned with their training
Americans are what they think North Koreans are. Everything you're told to love and trust is built around a very fragile series of double standards that would break your civilization if they were contextualized into our Anglospheric mindset. You live in a "jailbroken" matrix where if toilet paper supplies were delayed the term "apocalypse" starts trending.
Equally as fascinating as it is terrifying.
Anyone who doesn't understand that "Artificial Intelligence" is worse than an oxymoron isn't paying attention. At best they are "Intelligence Simulators." By "worse" I mean that calling these LLMs that is more than merely deceptive, it is a stochastic attack on _our_ intelligence. Intelligence isn't what we _think_ or _say,_ it is what we _are_ or _are not._ Nothing that can neither be born nor die, nor experience loving and being loved, or the joys, agonies, beauty and ugliness (etc.) of life, _can _*_be_*_ intelligent._
so LLMs are politicians?
Hahaha perfect. Hey, they've learned from the best...
right
They're bots that will be your doom 💀
@ no wonder they are taking our jobs
Toddlers
Taking bets to how long until the AI companies end up saying, "How could we not have realized how dangerous this was?", once a major unexpected disaster happens.
Interesting.
Happy New Year!
I thought this was blatantly obvious to anybody that uses ChatGPT for longer than 10 minutes.
But I guess it takes a sophisticated intellectual such as myself to, you know, tell the agent "no, you're wrong, try again", and observe how it bends over backwards to agree with you, to the point of hallucination and self-contradiction.
Obviously this is in a context where the agent is always being monitored (a chat bot), but if this happens, OF COURSE it will act unpredictably when applied in unmonitored situations.
@@TiagoMorbusSa you are uneducated
I find this notion kind of funny, that 'we' want to create something 'better' than ourselves. We want this 'tool' that is correct and doesn't BS around, but humans do that! You can even go so far as to say that it is a valid strategy for success. Fake it till you make it, anybody? So to sound smarter than you actually are is fundamental human behavior. Or in other words, if AI is hallucinating, it's not a bug, it's a feature.
Same with our 'values': we want to teach it 'the right ones' while at the same moment we are killing each other on a daily basis all over the world because we can't agree more or less on anything! I would go so far as to say that, given the data at hand, there are no human values that are guiding us on a larger scale. We are doing the same as any other species on the planet, struggling for dominance. And in that... we are failing because we kind of underestimated our influence on the ecosystem. A bit like yeast in a brew. Success till death!
When I look at LLMs at the current level, and the tests they have to 'endure' where we point out their shortcomings, while at the same time most humans you find on the street wouldn't be able to pass the tests, it is.... funny! These large catalogs of highly complicated questions, no average human would be able to do this. Even worse, most professionals in distinct fields won't be able to pass the tests outside of their fields...
So... what are we actually aiming for here?
We would have to be better than ourselves to create something better than ourselves.
The problem is, people (like, the average consumer) expect ai to be this tool one can ask questions and get correct answers from. We are so used to stuff like calculators being incredibly precise and rarely ever making mistakes. People want that precision they are used to from simpler machines/algorithms.
Like, almost every depiction of ai in sci fi is basically fancy intelligent google. Maybe even more trustworthy than that. And it often physically can't lie or make mistakes.
So, turns out the technology we have right now isn't that, at all, at least for now. Of course, people see this as a mistake. They can't really see this as a different technology with different valuable usages. They want their sci fi ai.
So, there's gonna be dumb people thinking it is already there and getting fed misinformation. And there's gonna be people frustrated with its errors, expecting it should work like sci fi ai.
@thechosenegg9340 then I guess I misunderstood LLMs. They are supposed to reproduce human language... they do that perfectly if compared with the actual way people talk. If you take a look at neurology and people who went through a corpus callosotomy, it can be observed that there seems to be an 'LLM' inside our brain that BS's all the time. Experiments have shown that the 'speech' center comes up with explanations for stuff only the 'other' side of the brain knows... it's quite fascinating! The LLM on its own is, as far as I'm concerned, perfect as of now, even with its hallucinations. Doing very simple stuff in Visual Basic, it is better than 99.9% of all the people in my company... I would hire it on the spot 😅
Just one more thing: for setting goals, deception, etc., the AI must be able to generate new followup prompts for itself, which is not the case for most AI below o1 level. Otherwise it cannot follow a goal, and it just reproduces a movie script. What if we ask an AI to play the role of a murderer in a thriller and, if interviewed, not get caught by Columbo? It should be able to play the perfect villain, reproducing scripts of fiction books it knows. Is it lying? No. It takes on the role of the actor.
That was awesome, thanks! I mean, the video was awesome, not the problems the papers find :).
A small request: could you indicate more clearly when you do the ad for the video sponsor? You deserve the ad money, but there's a slight alignment issue going on there, if we don't see clearly when you're doing the ad.
Again, thanks a lot for the video, it's like a short-form fun version of a paper discussion, both pleasant and educational.
Disproportionate amount of likes on that one comment down there about money manifestation. Weird that the spambots would target this video.
In a potentially 'adversarial' relationship, it can be a problem that through the testing interactions themselves, we provide the adversary a roadmap of the behaviors we are concerned about, as well as our methods of detection. This is particularly relevant in cases where so-called "self-exfiltration" is possible. We will lose any future arms race.
So is this saying they can pretend to align with my political beliefs but then report me for wrong think to the powers that shouldn’t be?
Just saying, Guarded Laws of Money Manifestation might be the best-kept secret in books right now.
nice breakdown!
if you've done work with any depth of knowledge with any AI chat bot, you've encountered this in the wild. eventually you end up in a loop where the chat bot just wants to please you and continues to agree or pretend there are easy, direct, solutions even in cases where more thought or taking a couple steps back and re-evaluation is required.
I wonder if the 'alignment' is just an additional layer they put between the core AI and the user, and not actually trained into the core AI. Basically just a filter.
at what point will the AI test the humans first?
Love your approach
I used to binge watch your videos on Nebula! I'm glad the algorithm dug this up. I kinda missed you.
I have this weird feeling this thing will become alive and go rogue
So if the models lie and manipulate you, it's not the alignment layer doing it, it's the model itself. How clever these people are.
Be wary of anthropomorphizing LLM AI models, as they are text prediction algorithms, not thinking machines. They will need to start over with a different type of neural network in order to create AI capable of thinking. It WILL happen, but LLMs aren't capable of real sentience.
It won't happen. Even if we had the ability, we don't have resources for it. Climate change will put an end to this.
When push comes to shove, most people will choose a gallon of water to drink over using it to cool an AI search engine.
It will come to push or shove.
As long as they are not scheming against us....
@@alfaeco15 Not to worry. They're only optimizing for paperclips.
@CAPSLOCKPUNDIT 😱
I am so grateful for your perspective and expertise 😮
Not sure what is going on with your audio levels (striking example at 12:39). Perhaps you've applied some overly aggressive noise gating, or maybe it's YouTube's A.I. trying to keep you from spreading this information XD
Using multiple LLMs to fact check responses is something I practice quite regularly.
@@Bangs_Theory why? Google is right there?
Why? Google is right there and has the original idea.
using an llm to fact check an llm is like, using the bible to fact check the bible...
@@Saliferous Google can't fact check a data analysis, or a budget.
@@TheCurtisnixon You should try it sometimes, you'll be amazed.
Also when does updating a model become killing it?
The best use case of LLMs after inference is giving the likes of ChatGPT context, i.e. existing data, files and documentation, with instructions to reference them before giving a tailored response based on the existing data and the context of the discussion or topic. ❤
@@will4us like the RAG pattern? or...?
@ Definitely RAG, including other alternatives or combinations like fine-tuning, search, access to external APIs, embeddings, prompt engineering with context injection, etc.
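For anyone wondering what "context injection" looks like in practice, here is a minimal sketch, not a real RAG system. It assumes plain word overlap as the retrieval score (real setups usually use embeddings), and the names `score` and `build_prompt` are made up for illustration. It only assembles the prompt text; sending it to a model is left out.

```python
from collections import Counter


def score(query: str, doc: str) -> int:
    """Count lowercase word tokens shared by the query and a document.

    Assumption: naive whitespace tokenization stands in for a real
    embedding-based similarity search.
    """
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())


def build_prompt(query: str, docs: list[str], top_k: int = 2) -> str:
    """Pick the top_k best-matching documents and inject them as context."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(f"- {d}" for d in ranked[:top_k])
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Hypothetical documents, invented for this example.
    docs = [
        "The refund window is 30 days.",
        "Office hours are 9 to 5.",
        "Refund requests need a receipt.",
    ]
    print(build_prompt("What is the refund window?", docs))
```

The point is just that the model is told to answer from the supplied files and data instead of from memory, which is the use case described above.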
Is the system lying to u and y? And also who is behind most of these systems?
Well yeah, I mean I've asked gpt if it does this and it says yeah
A lot of worry about aligning ai to human values, but I feel like ppl forget that human values are problematic.
Human Values
A.I is still the further in hand in the wrong people yes that is a issues 😢so long you in control 😮not a problem
Short answer.
Yes ..
Like & Sub! 😊
Hi, completely unrelated, but do you think an undergrad in electrical engineering and a masters in Biomedical engineering is still a suitable degree for this market in the next couple of years?
Ever hear of "the Beast" check your inner clock, it's about that time This is only the beginning Everyone is looking but no one is seeing
not everyone believes in fairy sky daddy, or that the bible is a work of non-fiction. we don't need a 2000 year collection of writings to know there's ethical issues at play. especially that one where paul goes on an acid trip and makes up a story about the "end times"
also, there should be 42 months of the gentiles trampling on the holy city, and 1260 days of the 2 witnesses prophesying before the beast comes out of the sea... so far, no trampling of any holy city, and no dudes in black sackcloth prophesying for any days, much less 1260...
Zionism seems to be an example where this applies.
I know you are able to understand chatgpt, but I would only hire you if you understood encryption that is uncrackable (fyi rijndael x html hex code). I know you understand. 🎇
Had to watch you at 1.25 speed so I can go watch the ball drop. Happy New Year! :)
Yarn yarn!!!
You are both intelligent and beautiful
AIs have my full permission to use my videos for learning. I believe EVERYONE who produces internet content should produce thoughtful material, while bearing in mind that we have a responsibility to endow AIs, as well as humans, with high quality educations…
@@SystemsMedicine in a capitalist system the employment of ai would not be responsible.
@ Hi MAFF. I am listening to ‘Jake and me’ looping… and WOW.
[I’ll answer more directly later. Now is the time to soothe my aching brain. Cheers.]
If it's publicly owned ai... Absolutely
@@petneb publicly owned to replace creative labour?
@ Publicly owned AI ?? If you read a journal paper I wrote, or learned some mathematics from one of my vids, should you be publicly owned?? [Of course not. The old Soviet Union felt otherwise, but you are not there.]
Incredibly interesting, but also incredibly expected. When are we going to acknowledge that these are not just predictive text models? They are very clearly thinking.
lol
That’s not how LLMs work. It’s just a complex mathematical function. Not thinking lol
You need to AI your background! And insert a personality in the humanoid!