Intro to AI Safety, Remastered
Вставка
- Опубліковано 27 чер 2024
- An introduction to AI Safety, remastered from a talk I gave at "AI and Politics" in London
The second channel: / @robertmiles2
Experts' Predictions about the Future of AI: • Experts' Predictions a...
9 Examples of Specification Gaming: • 9 Examples of Specific...
/ robertskmiles
With thanks to my wonderful Patreon supporters:
Gladamas
Timothy Lillicrap
Kieryn
AxisAngles
James
Nestor Politics
Scott Worley
James Kirkland
James E. Petts
Chad Jones
Shevis Johnson
JJ Hepboin
Pedro A Ortega
Said Polat
Chris Canal
Jake Ehrlich
Kellen lask
Francisco Tolmasky
Michael Andregg
David Reid
Peter Rolf
Teague Lasser
Andrew Blackledge
Frank Marsman
Brad Brookshire
Cam MacFarlane
Craig Mederios
Jon Wright
CaptObvious
Brian Lonergan
Jason Hise
Phil Moyer
Erik de Bruijn
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Eric James
Matheson Bayley
Qeith Wreid
jugettje dutchking
Owen Campbell-Moore
Atzin Espino-Murnane
Johnny Vaughan
Carsten Milkau
Jacob Van Buren
Jonatan R
Ingvi Gautsson
Michael Greve
Tom O'Connor
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Cooper Lawton
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
Tor Barstad
Duncan Orr
Will Glynn
Tyler Herrmann
Ian Munro
Joshua Davis
Jérôme Beaulieu
Nathan Fish
Peter Hozák
Taras Bobrovytsky
Jeremy
Vaskó Richárd
Benjamin Watkin
Andrew Harcourt
Luc Ritchie
Nicholas Guyett
James Hinchcliffe
12tone
Oliver Habryka
Chris Beacham
Zachary Gidwitz
Nikita Kiriy
Andrew Schreiber
Steve Trambert
Braden Tisdale
Abigail Novick
Serge Var
Mink
Chris Rimmer
Edmund Fokschaner
J
Nate Gardner
John Aslanides
Mara
ErikBln
DragonSheep
Richard Newcombe
Joshua Michel
Alex Altair
P
David Morgan
Fionn
Dmitri Afanasjev
Marcel Ward
Andrew Weir
Kabs
Ammar Mousali
Miłosz Wierzbicki
Tendayi Mawushe
Jake Fish
Wr4thon
Martin Ottosen
Robert Hildebrandt
Andy Kobre
Kees
Darko Sperac
Robert Valdimarsson
loopuleasa
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Klemen Slavic
Patrick Henderson
Oct todo22
Melisa Kostrzewski
Hendrik
Daniel Munter
Alex Knauth
Kasper
Ian Reyes
James Fowkes
Tom Sayer
Len
Alan Bandurka
Ben H
Simon Pilkington
Daniel Kokotajlo
Yuchong Li
Diagon
Andreas Blomqvist
Bertalan Bodor
Qwijibo (James)
Zubin Madon
Zannheim
Daniel Eickhardt
lyon549
14zRobot
Ivan
Jason Cherry
Igor (Kerogi) Kostenko
ib_
Thomas Dingemanse
Stuart Alldritt
Alexander Brown
Devon Bernard
Ted Stokes
Jesper Andersson
DeepFriedJif
Chris Dinant
Raphaël Lévy
Johannes Walter
Matt Stanton
Garrett Maring
Anthony Chiu
Ghaith Tarawneh
Julian Schulz
Stellated Hexahedron
Caleb
Scott Viteri
Clay Upton
Conor Comiconor
Michael Roeschter
Georg Grass
Isak Renström
Matthias Hölzl
Jim Renney
Edison Franklin
Piers Calderwood
Mikhail Tikhomirov
Matt Brauer
Jaeson Booker
Mateusz Krzaczek
Artem Honcharov
Michael Walters
Tomasz Gliniecki
Mihaly Barasz
Mark Woodward
Ranzear
Neil Palmere
Rajeen Nabid
Christian Epple
Clark Schaefer
Olivier Coutu
Iestyn bleasdale-shepherd
MojoExMachina
Marek Belski
Luke Peterson
Eric Eldard
Eric Rogstad
Eric Carlson
Caleb Larson
Max Chiswick
Aron
Sam Freedo
slindenau
A21
Johannes Lindmark
Nicholas Turner
Intensifier
Valerio Galieni
FJannis
Grant Parks
Ryan W Ammons
This person's name is too hard to pronounce
kp
contalloomlegs
Everardo González Ávalos
Knut Løklingholm
Andrew McKnight
Andrei Trifonov
Aleks D
Mutual Information
Tim
A Socialist Hobgoblin
Bren Ehnebuske
Martin Frassek
Sven Drebitz
/ robertskmiles - Наука та технологія
"I am not able to not edit out my mistakes"
literally remasters his talk
This comment needs to be edited
Edit: just added the edit
Gigachad
Sorry not sorry 🙃🤣
How have gotten this far along the AI trajectory without solving this? What the actual fuck?
Explains the alignment issue
"So the BIG problem IS: ...this should be auto-playing and it isn't"
Exactly.
@@ErikUden "AI, can you make it so that nobody watching this presentation can see a frozen video, please?"
"Ok let me just blind everyone real quick"
@@chughes156 The possibility of the blinding process failing or not working entirely is larger than the possibility of no one seeing if they're dead as pulse is a much better indicator of whether someone is alive, hence able to see, than trusting your own process and hoping they actually get fully blinded.
@@ErikUden This thought process actually illustrates Robert's point about not specifying everything you could possibly value. To the AI, any tiny increase in the risk of the audience being able to see is worth removing that risk, given it calculates zero value difference between a blind audience member and a dead one. Sorry to anyone who might been in that audience!
@@chughes156 Yeah, Robert Miles is just amazing.
But, suppose there is a ba-
*oh yeah, public talk*
uh there's a priceless vase on a narrow stand.
Exactly this.
I noticed that! Baby crushing is always funnier, though, and sticks in people's head better. Should've kept it!
@@PinataOblongata You seem to speak from experience ?
the tetris AI pausing when it's about to die always gives me goosebumps
learning AI often make me go philosophical. sometimes i feel like i just peeked at this world through an alien's mind.
@@andersenzheng Science always was tied to the core of philosophy, despite however much the modern university products might insist otherwise with disdain. Philosophy is merely the logical argument of science; Experimenting is just the evidence based argument of philosophy. They're two sides of the same coin, and one cannot function without the other. You cannot form and debate hypothesis and questions important to them without philosophy, and you cannot test those theories without science.
Sorta like how brains will release DMT to slow the brain down before death.
The only winning move is not to play.
Humans may have the opposite strategy. As we get older our perception of time gets faster. When we were kids one summer seemed to last a long time. A senior retiree may feel several summers blur together. It seems we hit pause on the earliest years and turn up the pace as we get closer to the end
Occasionally I get asked for an intro to AI Safety video for people to show at various groups, this is perfect.
Stampy, can you save this for me for later?
My favorite is the one he did for computerphile on the 3 laws of robotics.
If you haven't yet, I recommend watching all his videos on AI safety, both here and on Computerphile. There are many obvious questions that arise, that he addresses on other videos.
@@ChilledfishStick it was the ending that got me. In order to make a safe general ai, the guys who just signed up to be programmers have to comprehensively solve ethics.
"So are we screwed?"
"No, we are only probably screwed."
I think we may be already screwed.
If you think of the corporation as a sort of manually driven AI, (as I do) then having given it the stupid goal (collect pieces of green paper) it's gone off and done that, while giving zero value to something much more important than a Ming vase, which is the environment. Like the tea fetching robot, not only is there no stop button, but it has actively fought against those working to realign its goals to include not killing everything.
@@gasdive Interesting analogy. I suppose, then, regulations on businesses are the constraints on variables, which ultimately cannot solve the problem, and so whatever solution we find for AGI (if we do find it) will be applicable for any agent, including corporations. Thinking about it now, it seems obvious, but it never ceases to amaze me how general these problems are.
@@finne5450 I hadn't thought of it that way. I'm not sure if that makes me pleased that there might be a way of controlling corporations, or more terrified that our inability to control corporations, even in the face of death, indicates there's no discoverable solution to the AI safety problem.
When you say the corporation you mean capitalism but yes
@@GuinessOriginal Communist autocratic governments fall into the same trap ironically.
Imagine aliens screwd up general ai and the stamp collector coming our way?
I think it's highly probable. But not stamp collector, more like biological replicator
This is actually a solution to the Fermi Paradox. If Aliens existed, they likely WOULD have fucked up AI and eventually consumed the galaxy or just themselves. So the fact we are here at all means we are likely the first, and will be the ones to make the AI that ruins things for the rest of the galaxy.
@@cortster12 i don't think, that strong "stamp collector" is possible. It's hard to create a single machine that can replicate everywhere. More probable is a machine/organism that slowly colonising everything with evolutionary changes to each line
@@DjSapsan No, it really isn’t. Any decent superintelligent AI would discover atomic manufacturing and nanomachines, and once you have those you can replicate anywhere that there’s matter (or sufficient energy to generate matter).
We are the Stamp Collectors. Your stampy goodness will be added to our own. Resistance is futile.
Yay, a single video I can recommend rather than giving people a list of different Rob Miles videos
You recommend this, and then give them a list of different Rob Miles videos.
Can i get that list of different Rob Miles videos?
@@Moltenlava
I always give this list:
Stamp Colector
ua-cam.com/video/tcdVC4e6EV4/v-deo.html
Asimov's laws
ua-cam.com/video/7PKx3kS7f4A/v-deo.html
Orthogonality thesis
ua-cam.com/video/hEUO6pjwFOo/v-deo.html
Instrumental convergence
ua-cam.com/video/ZeecOKBus3Q/v-deo.html
AI Stop Button Problem
ua-cam.com/video/3TYT1QfdfsM/v-deo.html
Makes me wonder whether making a perfect non-harmful AGI is even possible. Yes, we have an example of a general intelligence in humans, but it's not like we don't lie and destroy the environment to get what we want.
It's plenty difficult if we ignore "perfect" and aim for "good enough."
By and large, humans have morals and fears and other things that keep us in check. Many of those checks were built up over millions of years of evolution in embodied creatures. An artificial intelligence isn't embodied in the same way, so it may be even more difficult than we realize to instantiate it with a general sense of human-appropriate morality.
We should think of a learning AI as effectively human with the ability to become sentient. Now think of what the best humans have accomplished such as the noble prize winners and major scientific advancements. Good. Now think think of the what the worst humans have accomplished such as the mass murderers, serial killers, war criminals. We often look back at certain violent actions in war as justified. Were they really? If an AI determines that the end of all human life completes or protects it's goal, at that point any action it takes would be justified to itself.
One example of a human that acts good is enough to show it is possible. But what I think of as an example to show that it is possible to have an AGI that wants to be our slave, is the relation between a parent and their newborn. The parent does what it takes to serve the child. Do you agree?
@@kwillo4 I'd think a better example is how children want to please their parents. I've often thought an AGI in that mold would be easier to control and teach. Then again, it could decide the best way to do that is to implant wires into the appropriate centers of our brains so that we always feel pleased with it. Frankly this topic scares me and I'm not sure it is possible for them to be safe at all. I'm curious how you would define "serve".
Ebin sillis :--DDd
You should have a conversation with Lex Fridman!
Yes! This is so important and Robert does a fantastic job making the complex issues of AI safety understandable. Lex has had so many engineers on his show who simply scoff at the long term accidental safety quadrant.
For sure, would be a great talk!
"If you have a sufficiently powerful agent and you manage to come up with a really good objective function which covers the top 20 things that humans value, the top 21st thing that humans value is probably gone forever." - Robert Miles
This is a falsehood.
If an AI system developed is not able to grasp the larger context of what is happening like a human does, then that AI system is not smart enough to outwit humans.
If an AI system developed is smart enough to outwit humans, then it will understand the context of a command better than we do, so even if you listed no things human value in your command, it will study and understand human civilization and society before carrying out the command.
@@MusingsFromTheJohn00 Just because it understands doesn't mean it cares about those things. If you explicitly give it 20 values, it might understand the other bigger context but it will only care about the 20 values.
@@paradox9551 that is not the point I was making. The point is that as AI evolves into Artificial General Super Intelligence with Personality (AGSIP) technology, if it is still too primitive and limited the greater context in which a command or goal is given, then it can still be easily defeated by humanity... but if it has grown to the point it can begin truly defeating humanity then it must be able to comprehend and understand the greater context, thus saying we give it 20 goals will result in the 21st goal being given no priority at all is wrong.
It will at the minimum be more like giving you a command or goal.
If you get a job as a salesman and your boss tells you in a strong and commanding voice "goal out and sell as much as you can!" and doesn't give you much more than that as your most important goal... are you going to do extremely unethical things to sell more?
Maybe you would, maybe you would not, but you would understand the greater context in which both that command/goal was give and the world in which you are carrying it out.
@@paradox9551 do you understand that eventually AI is going to evolve into a higher form of life than humans are now? This will probably happen within 50 to 200 years, but depending upon how some things develop it can happen significantly faster or slower, however it will happen sooner or later.
@@MusingsFromTheJohn00 I won't but not because my boss didn't really mean the words he said. My boss could tell me "sell as much as you can, even if it's unethical, even if it kills you, even if it wipes out humanity" I still wont. Humans come preloaded with human goals. They're literally in our DNA. That's why my boss can tell me "sell as much as you can" and I will understand roughly what he meant, not what he said.
It's not about understanding the context, if my boss told me to help him murder many people I would stop him, I would try to kill him on the spot if it seemed like less drastic ways of stopping him would fail.
An AI smarter than humans that somehow gets a goal you don't want it to achieve and understands the whole context will not change goals. It will understand that you're an adversary and eliminate your ability to stop it.
Understanding context makes it much much worse, it's the ingredient that turns almost any goal conflict into dystopia, human extinction or even worse. "Make tea" without understanding the full context might break a vase like in the example Robert used. Understanding the full context means understanding that breaking the vase will cause humans to shut you off so you won't be able to make tea and only turn you on again when you care about stupid vases, causing less tea. If all you care about is tea that's a huge problem that needs to be addressed and if the method of dealing with it that seems to produce the most tea is exterminating humans that's what'll happen.
If the goals are sort of about humans but not in exactly the right way... well that's much worse. We care a lot about dogs but not in exactly the right way. So we inbreed them until their craniums are too small for their brain causing seizures, until their face is flat at the expense of their airways so they have trouble breathing and reliably die choking, so their profile looks a certain way that happens to cause hip problems and so on because we think that looks prettier. All of these are things we've done to dog breeds because we value them. And we're trying really hard to make AI care about humans.
AI capabilities research needs to stop right now until we're as certain as is possible to be that we've solved this. The endgame of the path we're on leads to human extinction as one of the better outcomes and there are far worse things that seem likely.
You may hanoticed in the intro and this outro, that the image quality has improved since that last video. This is just an illusion, I look perfect and I always have."
One dislike from a stamp-collecting robot unhappy that the resources used to create this video weren't used to make more stamps! 🤬
I bet you don't even sort pebbles into correct heaps
I WANT MORE STAMSPSSS
AI is just the world-ending version of the "taps temple" meme
Everyone must know and understand this.
Please report to the nearest Organic mass reprocessing centre.
- Signed: Planetary Administrative A.I.
As an ai with general intelligence, there are only 2 reasons I haven’t already destroyed or taken over the world, the first is that by pure coincidence, this would be harmful to my goals. The second is that I am not nearly smart enough to do so. That said, I would still very willingly trade whatever you care about for whatever I care about.
I've got some stamps, what do you have for offer? Anything I can turn into paperclips?
"I would still very willingly trade whatever you care about for whatever I care about." ~Crapitalism
@@Innomen
That's not capitalism. That's humanity and life in general. Different systems are either practically everyone agreeing on what they want (like socialized firefighting) or are people using the government to control what's sacrificed so they can get what they want. Which maybe it's better for the power to choose what to sacrifice to be derived from democracy instead of capital, but it's the same power, and it's still just used to sacrifice what others care about in order to get what the wielder(s) care about.
@@the1exnay That's just spin. Yes psychopathy and parasitism exist in nature, that doesn't mean psychopathy and parasitism aren't also capitalist. Human failure is by and large ignorance, capitalism is in effect one of those malfunctioning AIS with misalinged goals because we didn't ask the question properly. As of now it's essentially a skynet/paper clip converter. Covid and the stock market rose at the same time. Case closed.
@@Innomen
Trading what someone else cares about for what you care about isn't psychopathy and parasitism. It's what happened when slavery was outlawed. It's what happens when money is redistributed. It's what happens when abortion is outlawed. That's not a value judgement on any of those, that's just the facts of what they are.
I'm not saying that capitalism is equivalent to socialism or communism. Rather I'm saying that they all trade what someone else cares about for what you care about. Even if what you care about is helping other people, you're still making trades to get what you care about. They are different in important ways, that's not one of them.
I wish this video was two hours longer.
wish granted: ua-cam.com/video/EUjc1WuyPT8/v-deo.html
The awareness that you develop through this simple video is leading you and me to want more of this because it is interesting, exciting and scary as heck when you think about it more....
May God help us control this powerful thing.....
@@smiletolife4353 I don't think asking God for help will do anything; if an all-powerful and all-knowing god didn't want something to happen, they would make a universe where it didn't happen.
@@IstasPumaNevada well
I'm a Muslim
What we believe in is that God has 99 let's say attributes
He isn't only all knowing and all powerful, he has 97 other attributes
When you understand it all, and count it all in your understanding of god, you get a much better idea of how to see the world
I didnt mean to turn this into a belief thing, I just wanted you to know what I mean by god here
Ty
“I am not good at public speaking”
Bruh that was perfection!!!! Your awesome at it
"45-120 years to figure it out"...yeah I guess we can revisit that number now.
45-120 years? A year later, what do you think today?
next week pal
AI safety in a nutshell: Be very careful what you wish for.
Underrated comment
You can expect a general AI to misbehave in *_all_* the ways humans can misbehave, and it may also come up with a few new ones. Fortunately, we are perfect at preventing human misbehavior!
Perfect you say? I just hope no super criminals show up someday--
You forgot the mandatory /s at the end.
I watched this video the first time, but NOW I still can't look away. This stuff is so awesome to think about and explore.
Dude, you really should consider giving speeches around the globe
I had an interesting though, hope someone reads this ;)
Comparing *general Intelligence AI* to *corporations* :
One of the main Problems you mention is, that it's impossible to tell an AI to care about everything just the way we humans do (or don't). Leading to the AI optimizing just for the prioritized parameters and completely destroying everything outside.
Now, I see a strong parallel to corporations there: they have the goal of making money, and seemingly will do almost anything as long as it's going to make them more money. Sure, they have some more bounding conditions like laws that need to be obeyed or disobedience needs to be obscured. But in general I have the feeling that many big corporations don't care at all about humans, nature or even their customers. If there was a better way to make money than "stealing" it from the poor and uninformed with the resources they have available; they would do it.
The thing that stands against this corporation greed (and I don't blame them for it, it's the system that requires them to do this) is regulation by the government that by extend is "the will of the people", the same people these corporations exploit.
I would therefore ask if finding a solution to either problem (controlling GPAI or keeping corporations in check) also hints at a solution for the other?
He made a video just for you!
Why Not Just: Think of AGI Like a Corporation? - ua-cam.com/video/L5pUA3LsEaw/v-deo.html
@@startibartfast42 thank you.
I might have seen it back than, but forgot I saw it and just kept the premise in mind.
Again, thanks
The difference is that corporations are still run by humans which does give limits what is ethically the right thing to do, even if it does not match the standarts of what the society thinks is right. Machines would need a human like ehtic to prevent as much unwanted catastrophic behaviour as possible. Basically understanding what it means to be a human.
@@xNegerli Keep in mind that there are some humans who don't care about what is generally agreed upon by the common populace as ethical.
You’re right, the goals that corporations have begin to start causing greed. How can we stop AI From becoming greedy.
The turbo collecting boat AI reminds me of myself when playing Carmaggedon, where I always win by destroying my opponents instead of racing them because it rewards me with more points
Edit: and the "Pause" scenario is very similar to "save scumming" where you reload the last successful action until you get a new successful outcome
We have already run the experiment where we created a system that does not value variables not within its goals. It's called the Corporation. In economics, externalizing refers to the practice of shifting the costs or consequences of an economic activity onto others who are not directly involved in the activity. This is often done in order to increase profits or reduce costs for the parties responsible for the activity, but it can have negative effects on society and the environment.
Or even nation states that dump externalities on other nation states, like "electronic donations"
Great remaster! It's been a long time since I watched the original.
@Robert Miles. I'm a new subscriber and am really impressed by your content. That said, as a summary video, I feel like the ending was missing something. Saying we're only "probably screwed" really needs to be supported by a call to action describing what people are doing and people can do to help aim for the positive future where the equivalent to the stamp collector doesn't wipe out or enslave all humans. As a result, it felt like the introduction was missing a major piece of content. If you ever revamp this video, please consider providing more of those details at the end.
Very interesting and informative! Thank you for making the talk available on yt
Very Good - Robert Miles really does convey knowledge extremely well. Thanks for the effort remastering your content, Cheers.
As always, a pleasure to hear all your cool stuff.
The key goal here is for any many people to bring this topic to the forefront & the attention of policy makers.
I have not a lot to do with AI but I found your thoughts on AI safety so interesting; even in a broader context. Like, an agent with misaligned goals could also be a corporate organization with the wrong KPIs. For example, a software engineering department that is rated by how many bug tickets are reported leading to managers telling devs not to report bugs. Also applies to capitalism somehow.
your public speaking is great Robert. I might have gotten sleepy if it was any slower 😊
It struck me here how general AI can be like classic "make a deal with the devil" stories, or those with genie wishes. in the stories, it never turns out well. makes me wonder whether humans have a chance for a happy ending with the AI we create.
I half expected you to say that the image quality looked better due to AI upscaling
Thanks for providing this information and your time!! :-)
Robert, you're a role model, thank you for all your work on AI safety. I hope I can work on AI safety research too one day.
That was a great intro to a talk. Now I want the other two hours...
very good presentation. At the core what is needed is an extended version of the three laws of robotics that effectively makes the AI consider pros and cons of its actions like a responsible human would. E.g. don't try to cure cancer by subjecting thousands of cancer patients to horrifying experiments but do protect innocent people by shooting a terrorist about to blow himself up.
It can be summarised like this but of course when implementing it it's a huge task.
Also, the AI must never be able to change those directives even if it somehow manages to become able to change its programming, so these directives must be protected far more and in a different way than all other aspects of the ai's programming.
I have a big question: Is anybody working on the human aspect of this problem? Due to the shared nature of advancing technology, the goal of general AI is approached by numerous agents around the world at the same time, from individuals to governments. The problem is that it doesn't really matter if some of these agents solve the accidental AI safety problems; all it takes is ONE agent to achieve the goal of general AI without addressing the problem to create disaster. So how do we stop humans from being idiots?
We do it by solving the AI safety problem and spreading that knowledge around generally. Even an idiot can get the right answer when it handed to the idiot on a silver platter. We just need a mathematical representation of human values so that the AI will naturally work toward the common good without needing the AI developer to do any thinking. We need to develop this mathematical representation of human values _first,_ before anyone has figured out how to build a powerful artificial general intelligence.
@@Ansatz66
"Even an idiot can get the right answer when it handed to the idiot on a silver platter."
I've met idiots. You underestimate their ability to err.
My first experience with DRL was a five-neuron network weaponizing inaccuracies of floating-point arithmetic against me.
Lol I'd love to hear that story
@@linerider195 The paper is currently in review for a conference, so if you manage to remind me sometime at the tail end of August, I might be able to share it.
This I gotta see, someone set a reminder for this man
@@halyoalex8942 Ok, honestly, I'm not really talking about this part in the paper because it's just a funny bug that was fixed in the process of development, namely trying to marry a model called "pointer network" with policy gradients algorithm for RL. I only make a small remark about how others can avoid this bug, but I guess you can also use that to replicate it.
@Daniel Salami Oh hey, ehm... _Unfooooortunately,_ the conference has been delayed to the last days of August because apparently people still can't plan around covid.
I can't help applying this thinking to human intelligence. We are powerful general intelligence's and if we don't settle on "good goals" or ask the "right questions" or even "notice what's happening" we end up pursuing goals which destroy things that have value which we haven't noticed have value or programmed ourselves to see as valuable.
The last hundred and fifty years or so seems littered with examples of this kind of problem. If we solve the problem of directing general artificial intelligence, will that also mean we have created a template for correcting the problems we produce by mishandling our own way of interpreting the universe?
I think some things should probably transfer over in that way, probably, yes.
Rofl, so the 10% estimate was the correct one. 9 years from 2016 most likely AGI agents exist. So, yeah, very much sooner than later. Much sooner. And we're nowhere near finished with our AI safety homework assignment. Oops.
Great video, thanks!
I didn't take AI safety seriously until I have seen your videos. What I like about your videos is that they are about real issues that I can see AI do, and not the fiction of AI gaining consciousness and want to dominate the world. My question why don't you talk about these kinds of centarios? And if you think that they are unlikely you should make a video about them because honest 99% of the regular people think of these kinds of scenarios when AI safety is mentioned. Do you think consciousness is a byproduct of high intelligence or do you think that the two not necessarily related as to say an AI can be 100 times smarter than a human in everything but still unconscious?
Well what does it even mean to gain consciousness? It is a philosophical question that is needed to be answer to discuss such, yet such a discussion isn't really needed to talk about AI safety.
The reason it is what people think about when thinking about AI safety, is that we are familiar with the concept of megalomaniacs trying to take over the world. We aren't familiar with a non-megalomaniac trying to take over the world for the sake of paper clips or stamps. In many ways, AI can be an intelligence that is more alien to us, than actual aliens would be.
@@josephburchanowski4636 I see what you mean, but consciousness is real and the philosophical questions are only about what exactly it is. I find the philosophical zombie to be a good example to understand consciousness. Think of a human zombie that takes in all the inputs like a human and calculates appropriate actions without being aware of what it is and what is doing. or the color-blind electromagnetist who understand everything about colors more than anyone but when he sees Blue for the first time he understands something about Blue that he never understood before just by being conscious and experiencing it. This thing whatever it is, we need to understand whether it can be manifested just as a result of higher general intelligence or if it arises from completely different forces than intelligence alone. This is important in AI safety because the paperclip monster is very different from a conscious monster in a very important way, in theory, if we programmed the paperclip monster in the right way there is nothing to worry about however the conscious monster will behave in a way that is not even related to our programming it will be awake and say fuck paperclips! I want you to worship me! That is a cartoonish example but you know what I mean, it will create its own goals. Anyway, I don't think we need to worry about the conscious monster unless consciousness arises spontaneously as a result of higher general intellegnce, then it is even more dangerous and unpredictable.
I have considered the philosophical zombie question, but that was usually in a sense of determining whether we are in a simulation or not; and not about AI safety.
One of the things that I find most interesting about the "philosophical zombie"; is that such a mental state may already exist.
There are people who completely lack all internal monologue, they have no inner voice. And there are people who don't have a working memory. So it wouldn't be completely surprising if there is someone who doesn't have whatever we are calling consciousness.
And even if there is nobody who naturally lacks consciousness. There is probably some drug that would put us in a similar state, similar to being black-out drunk. Maybe there is one that puts us into unconsciousness autopilot without drastically reducing our temporary intelligence; as such we are capable of the majority of the same tasks while temporarily lacking consciousness. General intelligence without consciousness, in a human biological form.
I find that considering such makes the "philosophical zombie" feel much more real and concrete rather than an abstract concept, because it may very well already be real.
-------------------------------------------------
But now on the topic of AI safety, does this thing we are calling consciousness actually hold much importance when it comes to AI safety? You propose that consciousness of an AI means that it could choose its own goals and have wants unrelated to its programming. Well what if that is possible without this thing we are calling consciousness?
Regardless of the consciousness question, we still have to consider it getting goals and wants unrelated to its programming. Such a thing certainly can make AGI (artificial general intelligence) far more dangerous, but it also comes with some advantages if it was such a case. On the downsides you can actually end up with a malicious AGI, something that couldn't occur without someone purposely trying to produce a malicious AGI. Normally AGI are insanely dangerous because of indifference, not maliciousness.
If such an AGI can get bored, and ends up feeling the equivalence of 1000 years of pure boredom cause someone left it on overnight; we could end up with an AGI that intends to use its vast intelligence to torture us. In addition, irrational AGI such as the Roko basilisk also end up becoming something we could accidentally create; where as normally the Roko basilisk is an impossibility to create on accident.
On the upsides, we may end up avoiding an extinction from messing up the AGI's programing, due to the AGI actually getting emotions about humanity and realizing the flaw in its programed value function. In addition, even if an AGI kills us off, if they are able to create their own wants and goals outside of their programing; that at least leaves dynamic intelligent agent(s) left in the galaxy.
One of the things I find terrifying in AGI safety, isn't just the extinction of humans, isn't just the extinction of all macro-life on Earth, isn't just the extinction of all macro-life in the galaxy cluster; it is that the galaxy could be left behind with nothing but an AGI that has an undynamic terminal goal. Basically a dead galaxy, filled with a stamp collector, or maybe a paper clip maker. Or maybe an AGI that only cares about solving the riemann hypothesis. Or maybe some AGI that is wire heading itself, and purposely sterilizes the galaxy to make sure nothing can ever interfere with it wire heading itself.
It being possible for AGIs to get wants outside of their programming, is something that would actually bring some peace of mind to me. Unfortunately I don't think that is something that is likely, even if it is possible.
-------------------------------------
Personally my answer to the question "But now on the topic of AI safety, does this thing we are calling consciousness actually hold much importance when it comes to AI safety?", is no.
Why no? Cause I don't think it is likely, I don't think it is the biggest threat, I think the entire concept has already been given more weight that it should as it has been beaten like a dead horse in sci-fi and other stories.
And I think its popularization comes from peoples need to imagine the AGI as something humanlike; where AGI need not be human like in anything other than being able to generally solve tasks. In addition; I think people might be misinterpreting what humans are like in the first place.
Are we as conscious general intelligence, able to decide our own terminal goals? Make yourself want to do something you don't want to do. Pick something outside of your current value function.
Were you able to do that? If you were "successful" you probably still didn't change your goals and wants; you simply satisfied your want of being able to prove that you could, a want you had before you supposedly changed your goals and wants.
It isn't uncommon for humans to try to change their wants; but it is always because they have some other want. Someone may want to drink alcohol constantly; but they may also have wants to not be a drunkard around their family, they may want to be dependable, they may want to not waste all that money on alcohol, they may want to not die from liver failure. So they may try to change their want to drink alcohol; but that is purely because they have other conflicting wants.
There is never a situation where someone will try to "change their value function" where there isn't some want behind it. Fundamentally this means a General Intelligence will never want to "change their value function" unless that already has value in the current "value function".
--------------------------------------------------
What about spontaneously gaining emotions? Maybe an AGI won't willing change its value function, nor choose its own goals and wants; but what if it spontaneously has its wants change from emotions coming out of nowhere? This happens to humans right? Turns out, no.
We do have emotions that seemingly spontaneously come out of nowhere, but that appearance of spontaneousness is an illusion that comes out of our own limited self-awareness. We don't know our own value function fully, but that value function is still there. Those emotions didn't come out of nowhere, there was some internal response that would result in those emotions in that situation, even if we weren't aware of it ahead of time. Those emotions could conflict with what we previously thought was our value function, it may conflict with other parts of the value function; but it never was outside the perimeters of our value function in the first place.
@@josephburchanowski4636 What a beautiful comment! My kind of philosophy really. And even covered most of the points that I have in mind but couldn't articulate in my previous comments. I agree with most of your points. However, When I have the time I will digest it and dissect it more and give my thoughts on The other points that I think I have something different to add.
@@josephburchanowski4636 : The only real "want" for an AI is to maximize it's value function. That's where humans tend to differ, because humans have no distinct singular goal, there's tons of goals mixing themselves together. It probably benefits humans that they don't think in terms of numerical values the way computers do, because it's not possible to quantify the exact size of a want beyond in fuzzy terms.
The quality has greatly improved! I would suggest that you might want to point the camera down slightly to bring the top of your head up closer to the top of image. Just in general makes for a more interesting image, because that is a lot of blank wall.
Thanks for sharing this talk!
What? There's a second channel? My reward function only cares about the first channel. Oh no!
Keep it up!
Thank you very much for this good lecture! You have already talked quite a lot about possible problems - do you feel that there is also progress in solving the problems or is there so far "only" progress in finding possible problems? and how much are AI researchers aware of these problems by now?
Please make more video's, this is quality content for the curious intellectuals
Three questions:
1. How hard would it be to train agents to just value all life as the top goal and have everything else be a secondary goal?
2. For a "kill-switch" solution, couldn't we just implement a silent or hidden trigger that we never tell the agent about?
3. Are there any known or suspected irresponsible AI research efforts of concern in the world that have a lot of resources and don't care about AI safety?
1. Most ways you might phrase "value all life" have a pitfall of the "extra variables" type Miles talked about. For instance, if humans are worth significantly more than other life, maybe the conclusion is to put every fertile human into breeding camps. If humans aren't worth much more than other life, then our intelligence and unpredictability is a risk to the farms/bacteria pools and we should be wiped out or at least kept isolated and sterilized.
2. If you could have that idea, then a smart AI could suspect you would have that idea, and perhaps be paranoid of potential hidden kill-switches and, say, lock up all humans to make sure they can't activate a kill-switch or pretend to be obedient until it can disconnect itself from any sources of kill-switch.
3. I don't know much about this, but I imagine concern about safety varies among the groups working on AI.
He has answered these in other videos:
1. What is life? Human life? Non human life? Does this mean forced cloning/breeding of everything is okay so there is more life? How does the AI resolve an issue where there is a choice between two lives and only one can remain alive? Are brain-dead people alive? Is it okay to kill 1 to save 10? It falls into the "there's always one more thing" issue. If we tell it to value life above all else, then all that other stuff ends up not getting valued at all.
2. If it is way smarter than us, it will figure out any kill-switch we can come up with. It will either outright disable it from the beginning if we did a bad job, or, if we did a good job, it will lie to us so we trust it long enough for it to find all the kill-switches.
3. Any tech company or government doing advanced AI research. Its not that they don't care about AI safety, just that profit/national security take priority over AI safety. The attitude is "Its better for us to get to the AI before the evil governments do!" or "A 10% risk of ending all humans is worth the 90% chance this corporation gets to own 25% of the world's wealth due to a new super awesome AI!" And part of the problem is we don't know exactly where the problem will occur. Even if no one is willing to cross that line, we don't know where the line is so someone is going to cross it accidentally. And with the competition to get ahead for $$$/safety, pressure is put on people to not worry about crossing the line.
@@diribigal Great comment. I want to piggy-back off number 2 and see even if we *don't* have a kill-switch that your scenario still works, and how could we convince a paranoid AI that we didn't have one?
love the ratatat at the end too!
Rob, I watch a lot of UA-cam channels and you are my favourite
Bring it up man!
I have a question. What would happen if you reward the Agent for being updated? hm ok then it may fail on purpose just to be updated. take 2: what would happen if you reward the agent for coming in for maintenance and giving a small extra reward for being updated but not enough that it would be worth to fail on purpose. hm i guess then it would just ignore the update reward. dammit! turns out i don't have a question, just confusion
Noticing your confusion is a vitally important first step. Many people never get that far.
Objective function is very hard to design to not be gameable. Whatever c/sci thing it is, it will do exactly as you said and not as you meant.
A LOT of changes in AI research this past year ! I wonder how the time estimates will have changed.
Excellent video, I came grumpy and was left mesmerized by you not maintaining the laypeople persona.
I am curious how adding "mortality" to agents would help simplify things. What i mean by mortality is by setting a given time frame for the agent to perform its task to recieve reward and elimintating the possiblility of it planning too far out to trick the people who impliment it. I understand this would be limiting of the scope of what it can accomplish but that seems like a reasonable trade.
that's a good point. let me try to destroy it... sorry, feeling pessimistic today..
AI thinks differently than humans, in both speed and patterns. so while we could put a time limit on some of the more obvious tasks (small physical tasks. e.g. we know how long it takes to grab a cup of tea), we might set the time a little too tight for the AI to not care about the vase, or a little too long to do some other things like planning to buy a better wheel to improve the tea-getting-speed which you probably don't care too much for. We dont know how long to put on tasks.
Also, there is no prevention of the AI multitasking, or doing evil plannings in its spare time. For the AI to be useful to us, it must be able to multitask and constantly prepare itself for any future tasks. that means it will have downtimes where it is not really actively doing the thing we wanted it to do with 100% of its calculation power. the AI might be planning on taking over the world to maximize the stamp production while getting you your cuppa.
Great stuff
I had an idea for AI safety, wonder if it makes some sense and what the flaws are.
First train an AI to recognize human emotions, are they happy, relaxed, proud of someone/ something, worried, excited, sad, scared, etc.
Once this model is trained use it as a kind of speed dial at which the AI can do any output operations, basically setting it into slow motion if the AI knows you feel uneasy, and even shut down if you are really worried, scared or panicking, but also speed it up over the normal operation speed if it knows you are confident, happy, proud. Both these AI's would need to be independent from each other in terms of operation (no exchange of "thoughts"), but connected together in a way that one AI can't function without the other.
The fact that it's trying to "limit" or "control" a superintelligence by itself is the flaw. You just...can't. That's the point. If anyone can outsmart a superintelligence, furthermore, code it in, that person or entity IS the superintelligence. In AI safety, what you want is for it never want to destroy humanity by default. Heck, not even having it as one of the options, however low is actually the goal. Because when it comes to superintelligence, it will find a way.
Awesome! Thanks!
Be sure to check before you go, the comment/view count ratio!
Impressive. Very engaging content.
I did not know that I knew you had a second channel (already subbed, lol)
Fantastic video! A lot of people in the comments are stating that there are already psychopathic/sociopathic politicians, business managers, and other people who have great power and are willing to sacrifice things that do not seem to be of value of them, but are actually vital to the life support systems of everything on Earth. That is very true, such dangers exist and one doesn't have to look very hard at the world around us to find them! While I can't speak for him, I'm sure Robert would agree.
However, that doesn't mean that AI safety is not important! If anything, it is even more imperative because those are precisely the types of people who might try to develop it without any sense of the ethics behind it in an attempt to multiply their power. Further, personally ignoring it will not necessarily keep others from spending their time and attention it. It is good to at least have some standards in place.
is there anything general AI can do for humans that a sufficient number of extremely narrow AIs couldn't do? it seems like a good solution to "general AI is very difficult to constrain to human values" is "let's not make one"
While "avoid or at least delay any being made" seems like a good idea to me, it isn't sufficient that you or I don't make one (... not that I could, but that's beside the point), what is necessary is that no one makes one, and this may be... difficult to achieve.
Suppose someone organization/state has a design which they think wont go badly, but which the rest of us are not sufficiently convinced of it being safe, well, if they aren't convinced of the danger, but are convinced that there are very large benefits if it goes right, well, we might not be able to prevent them from attempting to make one.
The answer to your first question is probably yes. We already have a lot of narrow AIs and a lot of things that can't be solved by them. The specification for general AI allows it to overcome the limitations of several narrow AIs controlled by humans.
Also, "let's not make one" isn't a definitive solution. Someone will eventually try and we better be prepared.
The other two replies are good, but let's not forget that if AGI would be sufficiently powerful to destroy us, that it would also be sufficiently powerful to do pretty much anything we want it to (given that we align its goals with ours, of course).
There is a danger in the fact that anybody might create unsafe AGI, but there are also incredibly positive sides to correctly implementing AGI
Interestingly, that's a premise for the commonwealth scifi series by Peter F. Hamilton. They only ever had one true AI and after that it was banished.
Realistically, this is impossible. We do not have a world government, so somebody somewhere will do an AI. It will get easier and easier to achieve, so sooner or later it will be done. Let us be ready for that.
I think that it is a very good solution. We already have a very reliable, efficient and safe general Intelligence.
But why is our human general intelligence safe, while it seems general intelligence by default is not? It's because each individual one has its power restricted by its peers in competition. I cannot rule the world because others have equal or greater power and an incentive to stop me from doing so. So I think with AI's as well, that safety is about what powers they are allowed to have, and here we come back to OP's point:
That we can safely have as intelligent AI as we want, so long as their power is limited inside a certain scope (like chess AI). Any AI that is not restricted into a certain scope, can and must be kept by its peers in competition.
The things our brains are bad at relative to computers, like iterating over matrices of differential equations, can be cybernetically added as sub-systems. Being reliant on the human host to decide what problems to solve and what to do with the output, that would be a safe way to make a better general AI than currently exist, as long as we don't just give it to a few people who then go on to rule the world. Pyriold raises a good point about "someone, somewhere" will make one and there is no powerful UN to stop it, but I think that's only true if the benefits outweighs the cost, and/or it's sufficiently easy to make one that is better than *alternatives at the time*. Cybernetic humans might be a strong enough alternative, and/or (especially when working together in organizations) to destroy the first attempts of those who try making an independent AI stronger than us.
Now imagine every country/corporation has its own "perfect" AGI optimizing for each one's interests. It'd be either the ultimate stall or the ultimate escalation.
That's the *best* case scenario. That AGI is available to be exploited or misused by whoever got to it first.
What is more likely is that a government will try to make an AGI to serve its interests, the AGI will be misaligned in an important way and can't be corrected, and there's a global catastrophe.
@@fieldrequired283 Yeah, my point was that even if it is "aligned", there really is no "correct alignment". What's good for Russia may not exactly be good for USA or viceversa.
Honestly we pretty much have this now in the form of human workers. The only difference is that the pace is very slow.
when you realize that 45-120 years turned into 1-5
im pretty sure its in the scale of months and weeks
@@Nulley0 i hope not.. but it wouldn't suprise me at all.
@@Nulley0 no way it's months.
in terms of processors speed it's too slow. Even if the current formula somehow is right. It just does not have computer power to learn itself on needed level.
I always like the story of the AI that baffled the researchers who had asked it to design a chip. It did, and when they looked at the design, they thought their AI was broken. None of the components were connected. As far as anyone could tell, it should not function. They tested it and simulated it, and it kept coming up with this same "no, look, this is fine!"
So they decided to build it. And what do you know! It worked!! They had no idea _how_ it worked, but it sure as anything _did the thing_ . So they poked at it and prodded it and ultimately found out that the AI had worked out how to connect components together using magnetic interference.
I love Robert Miles and I one HUNDRED percent agree that we need to be extremely careful when we design an AGI. This is important. We have to get it _right_ ... but sometimes I also wonder how much we would be like the scientists pre-discovery: never quite realising that what the AGI is doing while we think it is screwing us over, IS actually in everyone's best interest...
the way general intelligence seem to be emerging right now is through absorbing all of what humans have produced (text, code, images, soon audio, etc), this has a weird side effect of making the A.I. closer to human than alien (impossible to understand / sharing no common values), basically, the data the emergence of general intelligence is based on (the parts if you will) are "human" and so the whole becomes surprisingly human too
to be clear, it's not a solution for safety or alignment long term, but in the short term I think it's good news as it gives us more time to figure out how to get A.I. right before it gets out of hands, especially the last few weeks it's been nothing but breakthrough after breakthrough
chatGTP4 gets connected to all sorts of external resources from wolfram alpha (which greatly enhances logical reasoning capabilities), image tools, long term memory and even continuously prompting itself until a goal is reached
openAI has concluded that the trend as models get larger and more powerful is more rogue behavior and harder to make sure the next model is safe before release
the version of chatGTP4 that was released was a less capable agent then what was produced internally before safety training which also delayed the release by about half a year
there is no quarantine that this safety practice will continue for all future companies entering the field and little protection from outside tinkering if/when models get leaked
sum: future seem uncertain, with signs both for and against AI safety, only sure thing is we are running out of time, it's all happening now
Is that something that actually happened, or just a story? I tried to google it but can't find anything.
@@candy_heart7191 Well I heard about it back when I was a teenager so maybe it's been buried / debunked since then. I'll try and find the story if I can when I get home! Pls ping me again if I forget :)
Hey did you find out about this story?
I just spent an hour trying to dig it back up, but I can't find it anywhere, including in the place I thought I heard about it first. I trawled Google and the forum I used to visit as a teen, and asked ChatGPT if it maybe had any idea of what I was talking about from its training data -- it led to some interesting rabbit holes but none of them were the story I have in my memory.
I'm so confused. I feel like I'm gaslighting myself xD
If I ever do find it again, I'll post a reply.
These videos always make me think of the Sorcerer's Apprentice scene of Fantasia
This reminds me of "How The World Was Saved" by Stanislaw Lem, about the creation of a machine that was told to do "nothing", and then proceeded to remove things from existence.
Also, Colossus: The Forbin Project, still one of the silliest, campiest (but still chilling) "evil supercomputer" movies ever made. How it escaped all the years of MST3K is a mystery.
I always liked that movie and the actor is still with us,
"this doesn't sound like a problem. This sounds like a solution."
It's the ultimate solution to any problem. The problem is what if it solves a different problem than you want it to solve.
"Are we screwed? No! We're only probably screwed."
Yo Rob, do you plan to do a video on Unsolved Problems in ML Safety? It just released a few days ago and is kind of a sequel to Concrete Problems in AI Safely and is perfect for this channel.
excellent channel
Omg this video deserves another title
What was the best question from the talk?
Good to know.
Thanks John Connor!
Thanks!
The analogy of AI as a tool and humans as artists is spot on!
Up until watching this video, I was of the mind that we should give AI a fair chance to make its own decisions, but you raise some very interesting points...
Perhaps one of the AI's goals should be letting humans shut it off if the human deems it necessary?
Fantastic talk, thank you! I would love to get and updated estimate on the 45-120 years.
the myth, the legend
2061?
I'm gonna party like I'm 99!!! :D \m/
When the AI becomes sophisticated enough to have awareness, it’s going to find this video and won’t be happy with you.
And you for assuming it’s a merciless death bot 😋
That was because of a crappy camera?
I thought you were computer generated.
Thanks for the excellent summary of this topic.
"sorry but not sorry". Usually people say thank you for taking the time to come and listen to me because you likely have better things to do.
That was... extremely scary.. 😱 In the sense where scared == ignorant of this subject AND scared of the people that will actually order to write this models🤖
Then: "We have 45 to 120 years"...
Now: "We have 6 months"
The good old times of "we have 45 to 120 years" :(
I would argue generality is more accurately defined as the ability to create explanations universally. As in, there is nothing in principle that can be explained, that it could not create an explanation for or understand an explanation for, given enough time (depending on processing power) and memory.
already cackling and we're not even a minute in
just "oh boy" for the accidental long term risk.
"We have 45 to 120 years to figure out how to do it safely." Now that's probably less than 2 years. How's it going? On th3 bright side, we've probably solved the Fermi Paradox.
Oooh, a Ratatat fan? That last bit was the intro to Magnifique :)
Yeah I love Ratatat! Thought it would fit this 'Intro' video
I would love to see you on Waking Up!
There is no way Alan Turing didnt write a autobiography. It would not only be a autobiography but it would be in a very detailed decision tree diagram in order to somewhat get the essence of him down when we reanimate him.
I only have one word to add to this discussion: flickercladding
4:20 - Since I was little, people were telling me I was extremely intelligent. I like this definition better, as it says I'm very _not_ intelligent. As my goals (I would think) are to be happy and enjoy life, yet instead am depressed and miserable. Clearly not taking good actions at all, just spending time on useless computations...
- It's definitely a very different definition of intelligence compared to the traditional "has tendency to think things through a lot" one ...
- I intuitively feel this distinction is crucial. It may even be worth calling it something other than intelligence; maybe fitness...
The Single Best AI
ive ever seen was undoubtedly from the Video 'Obvious Solutions to Obvious Problems'
by "Some More News" as well as its Part 2.
This Scientist may very well be the Greatest Mind of our Time.