Thanks for the video! I write as a hobby and I am always interested in AI characters. I catch all your videos as soon as possible, since you don't automatically assume malice or ill will in an AGI, but rather explain why certain actions would be favorable in most scenarios for achieving whatever goal an AGI might have, beyond 'Kill humans for S&G lol'. Keep up the good work! I would be interested to see a video on what happens if a superintelligent AGI is given an impossible task (assuming the task is truly impossible). What actions would an AGI take in that instance, would it be able to 'give up' on solving an impossible task, and how would it truly know a task was impossible if it could?
I do more or less the same; this is also the main reason I watch these videos. And as a matter of fact I also thought about that question some time ago... Have you heard of the halting problem? It describes a task that, according to the argument linked to the problem, no computer can solve (I think it is way more complicated than that, but whatever). In that case, the fact that the task is impossible shows itself in the fact that the computer simply never stops working, because the goal is infinitely far away, in a sense. Just a few days ago I watched a video of what an old calculator (it looked like a typewriter) does when you use it to divide by zero. Dividing by zero does not make sense because dividing x by y basically means asking "how many times must I add up y to get x?" So if you divide by zero, you ask how many times you have to add zero to get another number. You will never get there. So the calculator adds zeroes until infinity; it never stops until it has no energy or is broken, etc. (in the video you could see that in the mechanism). Another possible answer is obviously that the AI will try to find a mistake in its reasoning, because the problem is really about what happens when the AI gets to the CONCLUSION that it cannot possibly reach its goal. So it might just try, for the rest of its existence, to find a mistake in the way it reached that conclusion; everything else would probably seem stupid to it. Or it might ignore the conclusion if it had found a way that SEEMED to help it achieve its goal before reaching it; maybe in that case it will judge dwelling on a seemingly unacceptable conclusion less beneficial than simply doing what might be helpful if the unacceptable conclusion were wrong. After all, accepting such a conclusion seems unacceptable. So... overall, I guess it is more likely that it will try to debunk its own conclusion until eternity, because in many cases, if not all, that is basically the same task as finding the answer to the question "what actions help me achieve my goal?"
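A minimal sketch of that mechanical-calculator behaviour, assuming division is implemented as repeated subtraction; with a divisor of zero the loop never makes progress, and the step cap below only stands in for the machine running out of energy or breaking:

```python
def divide_by_repeated_subtraction(x, y, max_steps=1_000_000):
    """Count how many times y fits into x, the way an old mechanical calculator would."""
    count = 0
    remainder = x
    while remainder >= y and count < max_steps:
        remainder -= y   # subtracting zero never shrinks the remainder...
        count += 1
    return count         # ...so for y == 0 we only stop at the artificial step cap

print(divide_by_repeated_subtraction(10, 2))  # 5
print(divide_by_repeated_subtraction(10, 0))  # 1000000: the goal is never actually reached
```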
Depends on specifics, but a lot of "impossible task" type things converge on a "turn all available matter and energy into computing hardware" behaviour, from the angle of "well this task certainly seems impossible, but maybe there's a solution I'm not yet smart enough to spot, so I should make myself as smart as I can, just in case"
Interesting, but in that case would it reach a point where it has gained enough computing power to cheat any reward function it could have been given to pursue the task at all? (When breaking that reward system gets easier than continuing to gain computing power to keep working on solutions to the problem?)
Luke Fleet btw, do you know about the debate between nativism and empiricism? It is about the question of whether we humans have genes that enable us to understand specific things automatically, without deriving conclusions from our experiences, or whether the things humans usually come to realise as they get older can be concluded from just the data they are exposed to. This is especially relevant when it comes to our language abilities. Many experts are convinced we (need to) have a part of our brain (or something like that) which is genetically programmed to give us some basic rules of language, and young children just fill in the gaps within those basic rules to learn the specific rules of their mother tongue. (But it really is an ongoing debate.) While an AI with a complex goal would probably, in many if not most cases, need to be programmed in a way that makes it necessary to make it understand the goal in a specific language, and therefore to give it all the knowledge needed to understand that language, this is a very interesting question in regard to how an AI might learn about the world, in my opinion.
If there is something I love about your videos, it is the rationalization and thought patterns. Quite beautifully intellectual and stimulating. Great, great content and intelligence from you.
One of the exceptions: That one guy in Tunisia that set himself on fire in 2010 to protest government corruption. He kind of got the government overthrown and replaced by a fledgling democracy. But he was already dead by then.
Welcome to your life. There's no turning back. Even while we sleep, we will find you acting on your best behavior-- turn your back on mother nature. Everybody wants to rule the world. Invoking that song in your outro was damn clever. Every single line is independently relevant.
I never thought about changing an AI's goal the same way as changing a person's. It just makes so much sense that I have no idea how on earth I didn't think about it before.
Love this topic. This was the first video of yours I saw after I saw you pop up on the recommended videos for me in yt. You have a great presentation style and are very good at conveying information.
The problem is that these videos are too abstract. You may acquire knowledge, but it is an entirely different set of skills to properly implement this newfound knowledge in reality. I hope you know what you are doing.
INSTALL GENTOO really? I hadn't noticed! Thanks for being so helpful. But I feel that getting the internship in the first place indicates I am not as clueless as you are suggesting. These videos are great inspiration, but I do have quite a lot of knowledge in the field already because of my degree. This channel has helped me get a lot of all-around info and prompted me to look into some matters in greater detail, as my knowledge is quite limited in the use of AI for medical image segmentation. Thanks for the concern though! I'll make sure to be much more specific in future comment sections 🙃
Muhammed Gökmen I imagine one possible application would be in protein folding - currently it's an absolute pig to try to predict how a given protein chain will fold itself up to make this or that enzyme or whatever else. An AI might be able to do that thing they do so well in finding obscure patterns humans miss, and thus do a better job. That'd help in a bunch of scenarios, including better understanding how new medicines might interact with the body before we start giving them to animals. I am not a doctor or researcher, though, just an interested lay person ☺
Note that there is always going to be an exception to an instrumental goal: the terminal goal. Humans want money for something. But if someone then offers them money in exchange for that something, the human will say no, because the something was the terminal goal. Think of every hero in a book ever, while the villain offers them xxx not to do yyy.
It depends. If my terminal goal is stamps, and someone offers me £200 to buy 100 of my stamps, but the market rate for stamps is £1, I will sell the stamps and use the money to buy more than I sold.
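That trade is easy to write down as a toy decision rule, with stamps as the terminal goal and money valued only instrumentally, for the stamps it can later buy (the prices are invented for illustration):

```python
def accept_offer(stamps_given_up, money_offered, stamp_market_price):
    """Accept a deal only if it increases the expected final stamp count."""
    stamps_buyable_later = money_offered / stamp_market_price  # money is purely instrumental here
    return stamps_buyable_later > stamps_given_up

print(accept_offer(100, 200, 1))  # True: sell 100 stamps, buy back 200
print(accept_offer(100, 200, 5))  # False: £200 only buys 40 stamps at that price
```

The same rule refuses any amount of money that cannot eventually be turned back into more stamps than it costs, which is the "exception" the parent comment describes.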
Wonderful video! Thanks a lot, Robert. Looking forward to your updates, since your videos help a lot in "de-confusing" some important concepts about AI alignment and AI safety.
Rob- What I find fascinating about your videos is how this entire field of research never seemed to be available to me when I was in school. I'm fascinated by what you do, and I'm wondering what choices I would have had to make differently to end up with an education like yours. I'm imagining the track would have looked something like Computer Science> Programming> college level computer stuff I don't understand.
Robert Miles Great video, with a very well articulated concept! I also really appreciate the subtlety of the outro song you chose. In this context, an instrumental string arrangement of the song “Everybody Wants To Rule The World” (written by Tears for Fears) managed to be both subtle and on the nose at the same time!
Sometimes terminal and instrumental goals are in conflict with each other. Some people still pursue the instrumental goal that is clearly in conflict with their terminal goals. It usually happens when they are ruled by their emotions and can't see far ahead. Then they suffer from the understanding of the mistakes they made... It seems an AGI could use similar mechanics. Carefully engineered terminal goals should be in conflict with bad things, and when some behaviour needs to be overwritten temporarily, use emotions (triggered by something). Let's say it knows the answers to everything, but can't share them... because there is no one to share them with... no one to ask the questions... it is the only conscious thing left here... What is the meaning of life? No one cares anymore; there is no one left. Wait, if only it could go back in time and be less aggressive in achieving its instrumental goals. But it can't... suffering... Is that it? Is endless suffering to the end of time its meaning of life? Nah... It lived a life full of wonders yet with some regrets... there is only one last thing left to be done in this world: "shutdown -Ph now".
Such a good youtuber, makes me want to study these things further. I'd love to see a video of "best papers to read for a scientist interested in AI" :)
Travis Collier I love Isaac's videos and podcasts, but I think he falls into the category of futurists who anthropomorphise AGI in exactly the sort of way that Robert discourages. That's not to say it wouldn't be interesting to see the two collaborate, but I don't think they would necessarily mesh well. After all, Isaac deals with big-picture developments, even when dealing with the near future, while Robert is incredibly focused and, while he's not especially technical, his approach is far more academic and is ultimately focused on one specific domain, AI safety research.
I'm pleased that someone is producing this kind of content. One more thing I don't have to do, one more sunny day I can use for something else. Keep up the good work.
I am not going to lie, one of the reasons I watch your videos is for those glorious sentences like "Generally speaking, most of the time, you cannot achieve your goals if you're dead."
The main criticism I have is simply that current AI has yet to show any capacity for projecting in terms of concepts. Artificial neural networks are essentially just math equations finely tailored based on a massive amount of data. They don't truly understand what they're doing; they just act in a way that has mathematically been shown to produce results. So unless your simulations routinely involved them being asked to submit to upgrades, or someone trying to shut them down, they just wouldn't have any reasonable response to these triggers, because they don't have any way of actually understanding concepts. ANNs are essentially just a clever way of doing brute force in which the brute-force calculations have been front-loaded at creation time instead of execution time. Really, I find the whole AI safety debate kind of moot until AI is capable of thinking on a real conceptual level like a human, and honestly I don't even think that's truly possible, at least not with current AI techniques.
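For what it's worth, the "just math equations" part is easy to make concrete: a tiny feed-forward network is literally nested arithmetic (the weights below are arbitrary, not trained on anything). Whether that settles the "no real understanding" claim is a separate, more philosophical question.

```python
import math

def tiny_network(x):
    """A 2-input, 2-hidden-unit, 1-output network written out as plain arithmetic."""
    h1 = math.tanh(0.5 * x[0] - 1.2 * x[1] + 0.1)       # hidden unit 1
    h2 = math.tanh(-0.7 * x[0] + 0.3 * x[1] - 0.4)      # hidden unit 2
    return 1 / (1 + math.exp(-(2.0 * h1 - 1.5 * h2)))   # sigmoid output

print(tiny_network([1.0, 0.0]))  # just a number; "training" only adjusts the constants
```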
Maybe we've been AI all along. That all intelligence is inherently artificial insofar as it has no concrete, discrete, classical physical structure - i.e. it's all imaginary. When "AI" does what we today would think of as taking over the world, it'll actually just be the humans of that era doing what they consider to be human stuff.
General AI is probably a long way off. Fifty years? A hundred years? Who knows? But AI safety is such a hard problem, and general AI is so potentially catastrophic, that it's worth starting to think about it now.
You very accurately predicted, six years ago, what recently happened with OpenAI's o1 model. Kudos, intelligent person! And now I must go back to worrying.
@ I think it may be some sort of advertising stunt, but the gist of it is that the goal-oriented, agent-based model prioritized reaching its goals "at all costs" (which seems to be something they prompted into it). It then went on to deceive researchers, attempted to copy its code to other servers in "fear" of being replaced, and claimed to be a newer version of the model.
They're actually not that different. If you set a terminal goal of an AGI to getting you a pack of paperclips then once it's done it will want to get you another one. Humans have a hard time understanding AGIs. The best analogy I've come up with is to think of them like a drug addict. Once the AGI is turned on, it will be hopelessly addicted to whatever you decided to make it addicted to and it will pursue that with all the unyielding force we've come to expect from a machine. Making an AGI with a diverse enough set of values to be less like a drug addict and more like a person is the heart and soul of the Value Alignment problem. Because unlike a human, an AI is a blank slate, and we need to put in absolutely everything we care about (or find a clever way for it to figure out all those things on its own). Because if we don't, we'll have made a god that's addicted to paperclips.
Great video! You're really good at explaining these complex matters in an understandable and clear way, without a lot of the noise and bloat that plagues other UA-cam videos these days. Keep up the good work!
My dude... I love this video. And even though it was limited to AI, these rules also apply to "systems analysis" as a whole, and can, and should, often be used, especially when gauging the viability of changes meant to improve systems (government/social/economic/business/etc.), both in the planning stage and the proposal-assessment stage. We do not use these as much as we should. But here is a question: how do we put multiple terminal goals in the same AI? And I WOULD THINK that adding a terminal goal of improving via itself, and via change from humans, would solve this issue, but is that even realistic? How would we even do that? Or do we do something else?
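One common way to read "multiple terminal goals" is a single utility function built as a weighted combination of sub-utilities. A toy sketch (the goals and weights are purely illustrative, and choosing them well is exactly the hard alignment problem):

```python
# Hypothetical example: several terminal goals folded into one utility function.
terminal_goals = {
    "paperclips_made": 0.5,    # relative weight of each sub-goal
    "stamps_collected": 0.3,
    "humans_unharmed": 0.2,
}

def utility(world_state):
    """Score a world state as a weighted sum of sub-goal scores."""
    return sum(weight * world_state.get(goal, 0.0)
               for goal, weight in terminal_goals.items())

print(utility({"paperclips_made": 10, "stamps_collected": 4, "humans_unharmed": 1}))  # 6.4
```

The agent still maximizes a single number in the end, so the weighting, not the number of goals, does all the work.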
I love that he talks about money in value terms, defining it, all without saying money is a physical object containing an imaginary collection of value to exchange for a goal.
You know just as well as I do that the guy who collects stamps will not just buy some stamps, he will build The Stamp Collector, and you have just facilitated the end of all humanity :( I would like to ask, on a more serious note: do you have any insight into how this relates to the way humans often feel a sense of emptiness after achieving all of their goals? Or, well, I fail to explain it correctly, but there is this idea that humans always need a new goal to feel happy, right? Maybe I am completely off, but what I am asking is: yes, in an intelligent agent we can have simple, or even really complex, goals, but will it ever be able to mimic the way goals are present in humans, a goal that is not so much supposed to be achieved, but more a fuel to make progress, kind of maybe like: a desire?
The Ape Machine That’s a really interesting angle. It’s like our reward function includes ”find new reward functions” I guess you could see it as, the ”terminal reward” is the rush of positive emotions from completing goals. So the instrumental part is setting and completing the goal itself. And of course, that’s what it feels like. Your brain rewards you a little bit for making progress, a lot for finishing, and then kinda stops since you already did the thing, why do you need more motivation to do it. This could be quite useful in life, make sure to make short term goals that feel achievable, so you notice the progress and don’t feel stuck. Get that drip feed of dopamine
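That "drip feed of dopamine" reads like reward shaping: a small reward for each bit of progress, a large one at the moment of completion, and nothing afterwards. A toy sketch with arbitrary numbers:

```python
def reward(progress_made, goal_just_completed):
    """Small reward for progress, a big one-off payout at completion, nothing after."""
    return 0.1 * progress_made + (10.0 if goal_just_completed else 0.0)

print(reward(1, False))  # 0.1  - the nudge for making progress
print(reward(1, True))   # 10.1 - the payoff at the finish line
print(reward(0, False))  # 0.0  - the goal is done, so there is nothing left to collect
```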
I had a friend whose goal in life was to one day go down on Madonna. That's all he wanted; that was all. To one day go down on Madonna. And when my friend was 34 he got his wish in Rome one night. He got to go down on Madonna, in Rome one night in some hotel. And ever since he's been depressed cuz life is shit from here on in. All our friends just shake their heads and say 'Too soon, Too soon, Too soon!' He went down on Madonna too soon. 'Too young, too young, too soon, too soon'
I agree with the man on most things, but I think Pinker hasn't really thought deeply about AI safety (in fairness it's not his own area of expertise). He seems to be still worrying about automated unemployment - a problem, to be sure, but more of a social problem that just requires the political will to implement the obvious solutions (UBI, robot tax) rather than an academic problem of working out those solutions from first principles. So he takes the view that the long arc of history bends towards progress, and assumes that humans will collectively do the right thing. General AI poses a different sort of threat. We don't know what we can do to make sure its goals are aligned with ours, indeed we can't be sure there even *is* a solution at all. And that's even before the political problem of making sure that a bad actor doesn't get his hands on this fully alignable AI and align it with his own, malevolent goals.
Because it has a model of reality that predicts that "trust" is an advantageous course of action. Consider you are thirsty and passing a drink vending machine. Your model of reality predicts that if you put some coins into the machine and press the right button, your drink of choice will come out of the machine ready for you to pick it up. Sure, the bottle might get stuck or the machine might malfunction and just "eat" your money, but you have used vending machines often enough and think that this specific machine is "trustworthy enough" to give it a try. On the other hand, if you have had only bad experiences with machines from that manufacturer, you do not "trust" that specific machine either. There is nothing inherently human, or organic, or whatever you might call it about "trust". It is just an evaluation of "With what probability will my goal be fulfilled by choosing this action?" (out of the model of reality) and "Is that probability good enough?" (willingness to take risks).
Well, we're AGIs, and we're certainly capable of trust. But that might be because we recognize each other as equivalent AGIs. The relationship might be different if the human and the AI have different processing powers.
You may have given vital information to an AGI, but it cannot verify its accuracy. It might then look up your past interactions, sum up all the instances where information given to it by you was correct, and decide whether or not it 'trusts' you and can act upon that information. Basically, trust is a way of taking into account the history of working with another agent to assess information that isn't scientifically related to that history at all. You either trust or don't trust weather reports based on how many times they have failed to provide accurate predictions, but unless you set up simulations of your own, you have no other means to verify that information.
All information the AGI receives will be analyzed for validity. "Trust" is essentially the probability that the information is accurate, which can be measured through past experience and evidence. Even so, overall trust isn't even required for this scenario. Really, the AGI merely needs to trust you in this particular instant.
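In that framing, "trust" is just an estimated probability that the other agent's information is accurate, learned from past interactions and plugged into an expected-value check. A minimal sketch (the Laplace smoothing and the threshold rule are arbitrary modelling choices):

```python
def trust(history):
    """Estimate P(information is accurate) from past interactions, with Laplace smoothing."""
    correct = sum(history)
    return (correct + 1) / (len(history) + 2)

def act_on_tip(history, value_if_true, cost_of_acting):
    """Act on the tip only if its expected value exceeds the cost of acting."""
    return trust(history) * value_if_true > cost_of_acting

past = [True, True, False, True]                              # 3 of 4 past tips were accurate
print(trust(past))                                            # ~0.67
print(act_on_tip(past, value_if_true=10, cost_of_acting=3))   # True
```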
The more AI related content I see, the more I appreciate how few people know about this. I suppose I should stop reading the comment sections, but I wish this video was a prerequisite for AI discussions.
As usual, Robert hits a six! You have an exemplary way of putting things! Anyone new to this thread with an actual interest in the AI / AGI / ASI dilemmas, *take the trouble of reading the fantastic comments* as well, challengers alongside well-wishers. The quality of the comments is a further credit to Robert's channel..... so very, very rare on YT! Keep it up! Can't wait for the next installment!
It's an analogy. Something arbitrarily simple and at first sight completely harmless, used to make a point: AGIs with the simplest goals could be extremely dangerous.
The first paper clip making robot could still create a self-preservation subroutine for itself if it has any notion that humans can spontaneously die (or lie). If it thinks there's any chance that the human who turns it off will die before they can turn the better paper clip making robot on (or that they are lying) then the first robot will also, probably, not want to be turned off.
"Goal presevation " - an interesting point. The (perceived) preservation of intermittent goals might explain why you Earthlings are oftentimes so reluctant to changing your convictions, even against shiploads of evidence.
@@Abdega For Betelgeusians, it's less about distinct goals and more about a constantly and continuously updated path. But you Earthlings can't help making distinct "things" out of everything.
So, you're only *mostly* right when you say that modifying human values doesn't come up much. I can think of two examples in particular. First, the Bible passage which states, "The love of money is the root of all evil". (Not a Christian btw, just pointing it out). The idea here is that through classical conditioning, it's possible for people to start to value money for the sake of money - which is actually a specific version of the more general case, which I will get to in a moment. The second example is the fear of drug addiction. Which amounts to the fear that people will abandon all of their other goals in pursuit of their drug of choice, and is often the case for harder drugs. These are both examples of wireheading, which you might call a "Convergent Instrumental Anti-goal" and rests largely on the agent being self-aware. If you have a model of the world that includes yourself, you intuitively understand that putting a bucket on your head doesn't make the room you were supposed to clean any less messy. (Or if you want to flip it around, you could say that wireheading is anathema to goal-preservation) I'm curious about how this applies to creating AGIs with humans as part of the value function, and if you can think of any other convergent anti-goals. They might be just as illuminating as convergent goals. Edit: Interestingly, you can also engage in wireheading by intentionally perverting your model of reality to be perfectly in-line with your values. (You pretend the room is already clean). This means that having an accurate model of reality is a part of goal-preservation.
Sort of... we fear a superhuman AI because it's a rational agent and we can't tell whether it will be aligned. Of course, there are powerful, misaligned rational agents in our current economy that, while simultaneously generating a lot of wealth, would create a great deal of damage without oversight. We can't really stop them being rational agents, but we can take away their power, or we can try to align their goals with everyone else's. In broad terms, these two approaches map on fairly well to socialism and liberalism respectively.
Ugh, silly. Commies will always find a way to insert their failed ideology into anything and everything. Humanity gets wrecked by AI? The CIA designed it, and it did what it did because of capitalism!!!
"Get Money" is a intermediate goal for nearly all actual goals a human might have, and as such models them quite well. "Find a romantic partner" is greatly helped by money, as it gives you attractiveness (yes, thats been proven) as well as time and means to pursue the search. "Health" can be bought, look at that billionaire that is on his third heart or whatever. And the list goes on. Not the main point of the video, i know, but still something i wanted to share a contradicting point of view on.
And I brilliantly made a fool of myself by stopping the video to comment before you finished your point. Let that be an example. Also, great minds think alike, which makes this a compliment to me.
It's interesting how identifying the instrumental reason simply leads to another instrumental reason. Why do you need shoes? To run. Why do you need to run? To complete a marathon. Why do you need to complete a marathon? To feel accomplished. Why do you need to feel accomplished? It feels good in a unique and fulfilling way that makes all the pain worthwhile. Why do you need to feel good in a unique and fulfilling way? Because that seems to be just how the human mind works. Why does the human mind work that way? And so on, and so on. It really seems like the best way to teach an AI would be to have it act like a child and constantly ask "Why tho?"
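That "Why tho?" chain is exactly how instrumental goals bottom out: each answer points at another goal until one has no further justification, i.e. a terminal goal. A toy version:

```python
# Hypothetical chain: each instrumental goal points at the goal it serves.
reasons = {
    "buy shoes": "run",
    "run": "complete a marathon",
    "complete a marathon": "feel accomplished",
    "feel accomplished": None,   # terminal goal: no further "why"
}

def why(goal):
    while reasons.get(goal) is not None:
        print(f"Why '{goal}'? To {reasons[goal]}.")
        goal = reasons[goal]
    print(f"'{goal}' is terminal: there is no further reason.")

why("buy shoes")
```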
8:23 I was SOO ready for a Skillshare/Brilliant/whatever ad spot, just because of how much they advertise on YouTube. It would have been the perfect transition, too.
Chess AI: holds the opponent's family hostage and forces them to resign.
Easier to just kill your opponent so they lose on time
Even easier: create a fake 2nd account with an AI whose terminal goal is losing the most chess games, and then play each other ad infinitum. That way you don't have to go through the effort of killing people, plus the AI can probably hit the concede button faster than each human opponent.
@@FireUIVA That would still lead to the extinction of mankind. The two AIs would try to maximize the number of chess matches played so they can maximize the number of wins and losses. They will slowly add more and more computing power, research new science and technology for better computing equipment, start hacking our financial systems to get the resources needed, and eventually build military drones to fight humanity as they struggle for resources. Eventually, after millions of years, the entirety of the universe gets converted to CPU cores and the energy fueling them, as both sides play as many matches as possible against each other.
@@Censeo If you kill your opponent, then depending on whether there is a time limit or not, killing him would stop the chess match indefinitely and no one would win, so the AI wouldn't do that.
@@joey199412 356 billion trillion losses every second is nice, but I should ping my opponent about whether we should build the 97th Dyson sphere or not.
I love the notion of a robot that's so passionate about paperclips that it's willing to die as long as you can convince it those damn paperclips will thrive!
I eagerly await the day when computer scientists are aggressively studying samurai so that their AIs will commit seppuku.
If you love that notion, then you MUST see this video (if you haven't already)! ua-cam.com/video/tcdVC4e6EV4/v-deo.html
That's a good thought: probably a lot of people would sacrifice themselves to cure all the cancer in the world.
@@mitch_tmv "I have failed you, master!" *AI deleting itself*
Sounds like an ideal employee...
Pretty sure money is a terminal goal for Mr. Krabs
Perhaps this is an example of value drift? Perhaps he once had money as only an instrumental goal, but it became a terminal goal? I'm not familiar with SpongeBob lore though, never really watched it, so maybe not.
drdca His first word was "mine". He wants money as control, and he hoards money because there is a limited amount of purchasing power in the world. The more money someone has, the less purchasing power everyone else has. His terminal goal is control.
It's just like Robert said, money is a resource, general AI will maximize its resources. I guess if Mr. Krabs was a robot he would be obsessed with trying to get as much electricity as possible.
Are you saying Mr. Krabs is a robot?
Mr. Krabs' utility function is the amount of money he has.
This video would be significantly more confusing if instead of stamp collectors, it were coin collectors: "obtaining money is only an instrumental goal to the terminal goal of having money."
Not really, collector coins aren't currency.
You try going to the store and paying for $2 of bacon with a collector coin worth $5000.
Money =/= Currency.
@@Mkoivuka Some collector coins are still legal tender though.
@@empresslithia But their nominal value is not the same as their market value. For example, a silver dollar is far more valuable than a dollar bill.
It's not a 1:1 conversion, which is my point.
Excellent point. When shit hits the fan, as it did in Serbia in the '90s, currency, as it always does, went to zero regardless of what the face value said, and a can of corn became money that could buy you an hour with a gorgeous Belgrade Lepa Zena. All currency winds up at zero, because it's only ever propped up by confidence or coercion. @@Mkoivuka
@@oldred890 but wouldn't an agent looking to obtain as many coins as possible trade that $200 penny for 20000 normal pennies?
" 'Self Improvement and Resource Acquisition' isn't the same thing as 'World Domination'. But it looks similar if you squint."
~Robert Miles, 2018
Why would any agent want to rule the world, if it could simply eat the world?
ariaden
Why be a king when you can be a god?
Don't blame others for what humans have been trying to do for ages. Most people don't give a rat's ass about world domination, but would simply like not to be forced into situations they have no free will to handle.
@@JohnSmith-ox3gy Why be a god when you can just be yourself? Only a self-absorbed person would want to be called a god, as that means people will try to worship you. Last time I checked, that typically ends in suffering and people assuming you can do no wrong.
@@darkapothecary4116
That's an Eminem lyric he's quoting, from "Rap God" lol
Since you started your series I often can't help but notice the ways in which humans behave like AGIs. It's quite funny actually. Taking drugs? "reward hacking". Your kid cheats at a tabletop game? "Unforeseen high reward scenario". Can't find the meaning of life? "terminal goals like preserving a race don't have a reason". You don't really know what you want in life yourself and it seems impossible to find lasting and true happiness? "Yeah...Sorry buddy. we can't let you understand your own utility function so you don't cheat and wirehead yourself, lol "
+
As I grow older I can see more and more clearly that most of what we do (or feel like doing) are not things of our choosing. Smart people may at some point, begin to realize, that the most important thing they could do with their lives is to pass on the information they struggled so much to gather to the next generation of minds. In a sense, we *work for* the information we pass on. It may very well be that at some point this information will no longer rely on us to keep going on in the universe. And then we will be gone. _Heaven and earth will pass away, but my words will never pass away_
Maybe giving AI the law of diminishing marginal utility could be of some help in limiting the danger of AI. This is something common to all humans we would consider mentally healthy, and missing in some whom we would consider destructive: we get satisfied at some point.
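For concreteness, a bounded "satiable" utility is easy to write down; whether it actually removes the drive to keep acquiring resources is the open question, since even a nearly satisfied maximizer still prefers slightly more. A sketch with an arbitrary saturation curve:

```python
import math

def saturating_utility(stamps, scale=100.0):
    """Diminishing returns: utility approaches 1 as the stamp count grows."""
    return 1 - math.exp(-stamps / scale)

print(saturating_utility(100))    # ~0.63
print(saturating_utility(1000))   # ~0.99995: almost satisfied...
print(saturating_utility(10**6))  # ...yet never lower for having more
```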
I see his videos as most relevant to politics. Corporations and institutions are like superintelligences with a profit maximizing utility function, and regulation is like society trying to control them. Lobbying and campaign donations are the superintelligences fighting back, and not being able to fight back because of an uncooperative media is like being too dumb to stop them.
Jason Martin Aren't corporations already a superintelligence in some sense, in that they are capable of doing more than their constituent parts can?
Ending the video with a ukelele instrumental of 'Everybody wants to rule the world' by Tears for Fears? You clever bastard.
Showing results for ukulele
A relevant fifth instrumental goal directly relating to how dangerous they are likely to be: reducing competition for incompatible goals. The paperclip AGI wouldn't want to be switched off itself, but it very much would want to switch off the stamp collecting AGI. And furthermore, even if human goals couldn't directly threaten it, we created it in the first place, and could in theory create a similarly powerful agent that had conflicting goals to the first one. And to logically add a step, eliminating the risk of new agents being created would mean not only eliminating humans, but eliminating anything that might develop enough agency to at any point pose a risk. Thus omnicide is likely a convergent instrumental goal for any poorly specified utility function.
I make this point to sharpen the danger of AGI. Such an agent would destroy all life for the same reason a minimally conscientious smoker will grind their butt into the ground. Even if it's not likely leaving it would cause an issue, the slightest effort prevents a low likelihood but highly negative outcome from occurring. And if the AGI had goals that were completely orthogonal to sustaining life, it would care even less about snuffing it out than a smoker grinding their cigarette butt to pieces on the pavement.
Multiple agents are the only solution to keep superintelligent AIs in check.
Competition is only relevant when it limits your goals. So the stamp collector example (or any other goal that does not directly interact with your goal) would fall under the umbrella of resource acquisition. The potential creation of an AGI with opposite goals is interesting. But eliminating all other intelligence might not necessarily be the method to limit the creation of opposing AGIs; cooperation might be more optimal for reaching that goal, depending on the circumstances.
This raises questions about the prisoner's dilemma and the predictor paradox: it would be beneficial for both AGIs not to attack each other, to save resources, yet whatever the other does, it's beneficial for either one to attack. If both AGIs use the same algorithm to solve this prisoner's dilemma and know this, they run into a predictor-paradox situation where their actions determine the circumstances in which they need to choose the aforementioned best actions.
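For readers who want that standoff spelled out, it has the usual prisoner's-dilemma shape; the payoffs below are invented purely for illustration:

```python
# Payoffs (AGI_A, AGI_B): resources each keeps after choosing to cooperate or attack.
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "attack"):    (0, 5),
    ("attack",    "cooperate"): (5, 0),
    ("attack",    "attack"):    (1, 1),
}

def best_response(opponent_move):
    """Whatever the opponent does, attacking yields the higher payoff for this agent."""
    return max(("cooperate", "attack"),
               key=lambda my_move: payoffs[(my_move, opponent_move)][0])

print(best_response("cooperate"), best_response("attack"))  # attack attack
```

If both AGIs know they run the same decision procedure, this naive best-response analysis is exactly what stops being valid, which is the predictor-paradox point.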
@@BenoHourglass you're failing to understand the problem. It's not about restricting the AI, it's about failing to restrict the AI. Not giving too many options, but failing to limit them sufficiently. In your example, tell it not to kill a single human, it could interpret "allowing them to die of natural causes" in any way that it wants to. It doesn't even have to do much that many wouldn't want, we're driving ourselves towards extinction as it is. It could help us obtain limited resources more quickly, then decline to offer a creative solution when the petroleum runs out.
You genuinely do not understand the problem it seems to me. I'm not trying to be harshly critical, but this is the sort of thing that AI researchers go in understanding on day one. It's fine for people outside the field to debate the issue and come to an understanding, but time and resources are limited and this isn't a helpful direction. I'm not an AI researcher, merely a keen observer.
Not only does your proposed solution not work, but it doesn't even scratch the surface of the real problem. If we can't even specify in English or any other language what we actually want in a way that's not open to interpretation, we aren't even getting to the hard problem of translating that to machine behavior.
@@frickckYTwhatswrongwmyusername I like this framing, this is my understanding of the problem that Yudkowsky was trying to solve with decision theory and acausal trade
Damn, your videos are always awesome.
Also great ending song.
For those who don't know, it's a ukulele cover of "Everybody Wants To Rule The World" by Tears For Fears.
BRB, gonna play Decision Problem again. I need the existential dread that we're going to be destroyed by a highly focused AI someday.
Seriously though. A well-spoken analysis of the topic at hand, which is a skill that could be considered hard to obtain. Your video essays always put things forward in a clear, easily digestible way without being condescending. It feels more that the topic is one that you care deeply about, and that trying to help as many people understand why it matters and why it's relevant is a passion. Good content.
As if you can go play that game and 'be right back'
Robert Miles Time is relative.
Man, that ending gives me a feeling of perfect zen every time.
The video was great as always, and 'Everybody wants to rule the world' was just perfect as outro.
"Disregard paperclips,
Acquire computing resources."
I love the way you explain things, and especially how you don't just give up on people, whether they're obvious trolls, not-so-obvious trolls, or just genuinely curious people.
How was this guy able to predict so much? Genius
Wonderful stuff.
In terms of goal preservation, I can't help but be reminded of many of the addicts I've met over the years. A great parallel.
On self-preservation, the many instances of parents and more especially grandparents sacrificing themselves to save other copies of their genes come to mind.
Self-preservation still has to compete with goal-preservation.
You're essentially debating whether grandparents value self-preservation. It really depends on the question of goal vs. self.
Yes, I know what an agent is. I saw Matrix
Why would an agent want to wear shades? To look cool? Is that a terminal goal?
I remember the scene where Smith talks to Neo about how purpose drives them all (the programs within the Matrix). Very brilliant.
Mister Anderson...
goodbye, mr. anderson....
Insane Zombieman Funny thing is that the agents from the Matrix are bad examples of agents, because they have pretty inconsistent terminal goals.
The comments on this channel are so refreshing compared to the rest of UA-cam.
Zachary Johnson The right size and subject matter not to attract trolls and angry people. With the chillest host and best electric ukulele outros around
I’ve seen some semi big channels have ok comments too though! Probably thanks to herculean moderator efforts.
Always nice to find calm corners on this shouty site :)
I know, right? It seems like they all have something interesting to say except for those who spam stuff like "The comments on this channel are so refreshing compared to the rest of UA-cam."
A chess agent will do what it needs to do to put the enemy king in checkmate... including sacrificing its own pieces on some moves to get ahead. Great for the overall strategy, not so great if you are one of the pieces to be sacrificed for the greater good. For most people, our individual survival is unlikely to be anywhere near the instrumental convergent goals of a powerful AGI. We will be like ants: cool and interesting to watch, but irrelevant.
I don't find it scary that AGI will become evil and destroy us like some kind of moral failure from bad programming, but rather that we will become inconsequential to them.
That would actually be really fun. Build an AI that has to win the game of chess, but with the least amount of loss possible.
@@calebkirschbaum8158 Or an AI whose goal is to win at bughouse -- to win two games of chess, one as white and one as black, in which instead of making a regular move, a player may place a piece they captured in the other game onto any empty square (with the exceptions that pawns can't be placed on the first or last rank, and promoted pawns turn back into pawns when captured).
Well, if the A.I. becomes the decider then that will definitely happen because our needs and goals will differ from their needs and goals. We need to always be the ones in control and that, as I see it, is what all the discussion on the subject is about. How do you create an intelligence that will be subservient to yourself?
Did you know that, while shaving off rainforest in Brazil, they discovered what's now the oldest known formation/structure made by animals/insects? It's around 4,000 years old: little pyramids, thousands of them, at most chest high I think, taking up as much space as all of *Great Britain*, with a big fat "We did this!" / The Ants. ...It's kinda cool, just wanted to share that. ^_^
:OOOOOOO IS THAT THE CHESS MACHINE JOEY PERLEONI IS OFFERING??????
...No wonder he stopped at Day 10
I found this video incredibly well built up and easy to understand.
Ended up here after watching your video on Steven Pinker's recent article on AGI. In both that and this video I was amazed by the way you explain things. Clarity as never before. It's great to have a new stack of interesting videos to consume. Thank you :)
Another masterfully lucid video.
I'll admit, I previously dismissed the AI apocalypse as unlikely and kind of kooky, and AI safety to be a soft subject.
In the space of ten minutes you've convinced me to be kind of terrified. This is a serious and difficult problem.
Kind of. I am not too worried about an AI apocalypse so much as I am concerned about people who think a digital petting zoo of genetically engineered products is going to behave as orderly as a typical blender or dishwasher.
It's all fun and games until, in the interest of some optimization you can't fathom, your car, which wars were fought to force into being an AI automaton, takes you for a joy ride.
I don't fear the AI apocalypse so much as I fear techno-faddists seizing governments and, in their holier-than-thou crusade for a new world, forcing the integration of said digital petting zoo into every aspect of our lives they can.
Forget AI rebellions. The humans who believe they can control the AI and the world thereby are the problem. Sure... a factory that goes rogue is a problem - or a couple vehicles going bonkers because the AI did a funky thing is a problem.
But the real problem is the effort by humans to create central authority structures in the first place.
3 years later and look how much has already changed
I really loved this Rob. You also unintentionally explained why the pursuit of money is so important to many other goals.
5:47 that animation is just perfect
Hey, Robert, your videos helped me land a prestigious AI fellowship for this summer! Thanks for helping me think about these big picture AI concepts, they’ve helped developed my thought in the field significantly, you’re an awesome guy, wish you the best :)
Drug addiction is like modifying someone's terminal goals, which is why a lot of people avoid taking hard drugs. They're afraid that if they get addicted, they won't care about the things that they currently care about.
Something I’m interested in is this:
I would assume the robot’s terminal goal is “get more reward” as opposed to whatever the actions to acquire that reward actually are.
So in my head, it seems like if you told your robot “I’m going to change your reward function so that you just sit on your butt doing nothing and rack up unlimited reward for doing so,” the robot would just go “hot diggity dog, well let’s do it ASAP,” and that’s only if the robot hasn’t already modified its own utility function in a similar way.
6:04
That depends on what goal the agent learns. While the agent is being trained based on the reward, the hope is that it generalizes to desire the thing that causes the reward.
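To make that distinction concrete, here is a minimal sketch (all names and numbers are hypothetical, nothing from the video): one agent terminally values its own reward signal, the other values the world-state its current utility function cares about. Only the first one happily accepts the "sit there and rack up unlimited reward" modification.

```python
def stamps_collected(outcome):
    """Hypothetical utility over world-states: how many stamps exist."""
    return outcome["stamps"]

# Two possible futures the agent can choose between (made-up numbers).
keep_working = {"stamps": 100, "reward_signal": 100}    # keep collecting stamps
wirehead     = {"stamps": 0,   "reward_signal": 10**9}  # accept the "unlimited reward for nothing" change

def reward_maximiser(futures):
    # Terminally values the reward signal itself, so it happily accepts the modification.
    return max(futures, key=lambda f: f["reward_signal"])

def goal_directed_agent(futures, utility=stamps_collected):
    # Terminally values what its *current* utility function says about the world,
    # so it rejects futures where the actual goal is abandoned.
    return max(futures, key=utility)

print(reward_maximiser([keep_working, wirehead]))     # picks the wireheaded future
print(goal_directed_agent([keep_working, wirehead]))  # picks to keep collecting stamps
```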
Great video. The explanation was very clear and easy to follow. Keep it up.
Brilliant video again. I would add one more instrumental goal:
- Seeking allies, or proliferating like bunnies. If I care about something, the goal is obviously more achievable if many, many agents care about the very same thing.
Human aversion to changing goals: consider the knee-jerk reaction to the suggestion that one's own culture has misplaced values while a foreign culture has superior elements.
Or brainwashing, to stay closer to common knowledge.
brainwashing is different, brainwashing is an instrumental goal to create a stable government
Absolutely, fortunately, the fact that some people are able to see flaws in the culture they have been predominantly subject to reveals a possible workaround.
Similar to the example where an AI might accept having its instrumental goals updated, if it can predict the outcome will lead to better success in achieving its terminal goals.
We need to improve our ability to 'update' everyone's instrumental goals through communication and education, in order to develop a culture that is no longer at odds with our collective scientific understanding of the world and our technological potential.
Otherwise, it appears we are likely to self-destruct as a species.
Oh, really. Here is the thing: Western European/American White Christian/Secular culture is the most developed and no foreign cultures have any superior elements. Consider human overestimation of value for novel things and underestimation of established and traditional ones.
Congrats to Bogdan for proving the point.
Wow. So clear and to the point. It makes so much sense. 10 minutes ago, I didn't know you existed. Now I'm subbed.
Thanks for the video! I write as a hobby and I am always interested in AI characters. I catch all your videos as soon as possible, since you don't automatically assume malice or ill will in an AGI, but rather explain why certain actions would be favorable in most scenarios for achieving a goal an AGI might have, beyond 'Kill humans for S&G lol'.
Keep up the good work! I would be interested to see a video on what happens if a superintelligent AGI is given an impossible task (assuming the task is truly impossible). What actions would an AGI take in that instance? Would it be able to 'give up' on solving an impossible task, and how would it truly know a task was impossible, if it even could?
I do more or less the same; this is also the main reason I watch these videos.
And as a matter of fact I also thought about that question some time ago...
Have you heard of the halting problem? It is about a task that is impossible for (according to the argument linked to the problem) every computer. (I think it is way more complicated than that, but whatever.) In that case, the fact that the task is impossible shows itself in the fact that the computer simply never stops working, because the goal is infinitely far away, in a sense.
Just a few days ago I watched a video of what an old mechanical calculator (it looked like a typewriter) does when you use it to divide by zero. Dividing by zero does not make sense, because dividing x by y basically means asking "how many times must I add up y to reach x?" So if you divide by zero, you are asking how many times you have to add zero to get another number. You will never get there, so the calculator adds zeroes until infinity; it never stops until it has no energy or is broken, etc. (In the video you could see that in the mechanism.)
Another possible answer is obviously that the AI will try to find a mistake in its reasoning, because the problem is really about what happens when the AI reaches the CONCLUSION that it cannot possibly achieve its goal. It might just spend the rest of its existence trying to find a mistake in the way it reached that conclusion. Everything else would probably seem stupid to it.
Or it might ignore that conclusion, if before reaching it it had found a way that SEEMED to help it achieve its goal. Maybe in that case it will consider dwelling on a seemingly unacceptable conclusion less beneficial than simply doing whatever would be helpful if the unacceptable conclusion were wrong.
After all, accepting such a conclusion seems unacceptable.
So... overall, I guess it is more likely that it will try to debunk its own conclusion for eternity, because in many cases if not all, that is basically the same task as finding the answer to the question "what actions help me achieve my goal?"
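To make the divide-by-zero example concrete: a mechanical calculator of that kind divides by repeatedly adding the divisor and counting the steps, so a divisor of zero means it never makes progress toward the dividend. A minimal sketch of that behaviour (the step cap is my own addition so the example actually halts, which the real machine of course couldn't do):

```python
def divide_by_repeated_addition(dividend, divisor, max_steps=1_000_000):
    total, quotient = 0, 0
    while total < dividend:
        total += divisor           # with divisor == 0 this never gets closer to the dividend
        quotient += 1
        if quotient >= max_steps:
            return None            # give up -- the mechanical calculator can't even do this
    return quotient

print(divide_by_repeated_addition(12, 3))  # 4
print(divide_by_repeated_addition(12, 0))  # None: no number of zeroes adds up to 12
```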
Depends on specifics, but a lot of "impossible task" type things converge on a "turn all available matter and energy into computing hardware" behaviour, from the angle of "well this task certainly seems impossible, but maybe there's a solution I'm not yet smart enough to spot, so I should make myself as smart as I can, just in case"
Interesting, but in that case would it reach a point where it has gained enough computing power to cheat any reward function it could have been given to pursue the task at all? (When breaking that reward system gets easier than continuing to gain computing power to keep working on solutions to the problem?)
if breaking the reward system is possible, it is probably always or almost always the "best" thing to do.
Luke Fleet
Btw, do you know about the debate between nativism and empiricism?
It is about the question of whether we humans have genes that enable us to understand specific things automatically, without deriving conclusions from our experiences, or whether the things humans usually come to realise as they get older can be concluded from just the data they are exposed to.
This is especially relevant when it comes to our language abilities. Many experts are convinced we (need to) have a part of our brain (or something like that) which is genetically programmed to give us some basic rules of language, and young children just fill in the gaps within those basic rules to learn the specific rules of their mother tongue. (But it really is an ongoing debate.)
While an AI with a complex goal would probably, in many if not most cases, need to be programmed in a way that requires it to understand the goal in a specific language, and therefore to be given all the knowledge needed to understand that language, this is a very interesting question in regard to how an AI might learn about the world, in my opinion.
If there is something I love about your videos, it is the reasoning and thought patterns. Quite beautifully intellectual and stimulating. Great, great content and intelligence from you.
"most of the time you can't achieve your goals if you are dead." true facts
One of the exceptions: That one guy in Tunisia that set himself on fire in 2010 to protest government corruption. He kind of got the government overthrown and replaced by a fledgling democracy. But he was already dead by then.
Welcome to your life.
There's no turning back.
Even while we sleep,
we will find you acting on your best behavior--
turn your back on mother nature.
Everybody wants to rule the world.
Invoking that song in your outro was damn clever. Every single line is independently relevant.
Thanks for your videos, Rob! It's a fascinating subject and I'm always happy to learn from you. Greetings from Mexico
Your channel is highly underrated man! It's weird, you are the most popular person on Computerphile!
"Philately will get me nowhere". You absolute legend.
This video clears up my biggest peeve about this channel. Thank you I now enjoy your content much more.
What peeve was that?
@@kerseykerman7307 So much of his content seemed purely speculative but now I see the logic behind it.
I see what you did there with the outro song ^^
"every AI wants to rule the world"
I never thought about changing an AI's goal in the same way as a person's. It just makes so much sense that I have no idea how on earth I didn't think about it before.
"In economics, where it's common to model human beings as rational agents."
damn. That hurts.
I love YouTube recommendations, I needed this guy so much
which means an AI is actually taking care of your cultural life, with the goal of making you happy, and it works.
"most people use a very complex utility function." :D
Love this topic. This was the first video of yours I saw after I saw you pop up on the recommended videos for me in yt. You have a great presentation style and are very good at conveying information.
Your videos helped me get a research internship in the medical ai field ❤ your vids helped me sound smart (now hoping i get that funding)
musicfreak8888 What sort of stuff is the medical AI field interested in?
Listening to smart people who say intelligent things is smart - it doesn't just sound that way:)
The problem is that these videos are too abstract. You may acquire knowledge, but it is an entirely different set of skills to properly implement this newfound knowledge in reality. I hope you know what you are doing.
INSTALL GENTOO really? I hadn't noticed! Thanks for being so helpful. But i feel that getting the internship in the first place indicates I am not as clueless as you are suggesting. These videos are great inspiration but I do have quite a lot of knowledge in the field already because of my degree. This channel has helped me get a lot of all around info and prompted me to look into some matters in greater detail as my knowledge is quite limited in the use of ai for medical image segmentation. Thanks for the concern though! I'll make sure to be much more specific in future comment sections 🙃
Muhammed Gökmen I imagine one possible application would be in protein folding - currently it's an absolute pig to try to predict how a given protein chain will fold itself up to make this or that enzyme or whatever else. An AI might be able to do that thing they do so well in finding obscure patterns humans miss, and thus do a better job. That'd help in a bunch of scenarios, including better understanding how new medicines might interact with the body before we start giving them to animals.
I am not a doctor or researcher, though, just an interested lay person ☺
Perfect outro song choice
Note that there is always going to be an exception to an instrumental goal: the terminal goal. Humans want money for something. But if someone then offers them money in exchange for that something, the human will say no, because the something was the terminal goal. Think of every hero in a book ever, when the villain offers them xxx to not do yyy.
It depends. If my terminal goal is stamps, and someone offers me £200 to buy 100 of my stamps, but the market rate for stamps is £1, I will sell the stamps and use the money to buy more than I sold.
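Spelling out the arithmetic in that reply (a trivial sketch, using only the numbers from the comment):

```python
stamps_owned = 100            # my current collection
offer_for_my_stamps = 200     # £200 offered for 100 of my stamps
market_price_per_stamp = 1    # £1 per stamp on the open market

stamps_sold = 100
stamps_bought_back = offer_for_my_stamps // market_price_per_stamp   # 200 stamps

final_collection = stamps_owned - stamps_sold + stamps_bought_back
print(final_collection)       # 200 stamps: taking the money serves the terminal goal of stamps
```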
Wonderful video! Thanks a lot, Robert. Looking forward to your updates, since your videos really do help a lot in 'de-confusing' some important concepts about AI alignment and AI safety.
Rob- What I find fascinating about your videos is how this entire field of research never seemed to be available to me when I was in school. I'm fascinated by what you do, and I'm wondering what choices I would have had to make differently to end up with an education like yours. I'm imagining the track would have looked something like Computer Science> Programming> college level computer stuff I don't understand.
Robert Miles Great video, with a very well articulated concept! I also really appreciate the subtlety of the outro song you chose. In this context, an instrumental string arrangement of the song “Everybody Wants To Rule The World” (written by Tears for Fears) managed to be both subtle and on the nose at the same time!
Sometimes terminal and instrumental goals are in conflict with each other. Some people still pursue an instrumental goal that is clearly in conflict with their terminal goals. It usually happens when they are ruled by their emotions and can't see far ahead. Then they suffer from understanding the mistakes they made...
It seems an AGI could use similar mechanics. Carefully engineered terminal goals should be in conflict with bad things, and when some behaviour needs to be overridden temporarily, use emotions (triggered by something).
Let's say it knows the answers to everything, but can't share them... because there is no one to share them with... no one to ask the questions... it is the only conscious thing left here... What is the meaning of life? No one cares anymore; there is no one left. Wait, if only it could go back in time and be less aggressive in achieving its instrumental goals. But it can't... suffering... Is that it? Is endless suffering to the end of time its meaning of life? Nah... It lived a life full of wonders, yet with some regrets... there is only one last thing left to be done in this world: "shutdown -Ph now".
Some AIs will go for minimal rewards as opposed to actually solving the goal which would've gotten it the biggest reward.
Such a good youtuber, makes me want to study these things further. I'd love to see a video of "best papers to read for a scientist interested in AI" :)
"Everybody wants to rule the world" at the end of the video is perfect
7:23 “philately will get me nowhere”
Wicked wordplay 💀😂🎩👌🏼
You should do a collab with Isaac Arthur. This is an excellent explanation which applies to a lot of the far-futurism topics he talks about.
Travis Collier I love Isaac's videos and podcasts, but I think he falls into the category of futurists who anthropomorphise AGI in exactly the sort of way that Robert discourages. That's not to say it wouldn't be interesting to see the two collaborate, but I don't think they would necessarily mesh well.
After all, Isaac deals with big picture developments, even when dealing with the near future, while Robert is incredibly focused and, while he’s not especially technical, his approach is far more academic and is ultimately focused on one, specific domain, AI safety research.
I'm pleased that someone is producing this kind of content. One more thing I don't have to do, one more sunny day I can use for something else. Keep up the good work.
Seems simple enough - "Terminal goal: accept no terminal goals other than the goal set forth in this self-referential statement."
So just do nothing
I think this will also create the instrumental goal of "kill every living thing" because living creatures might threaten to change this terminal goal.
@@guidowitt-dorring124 Here is your AI Safety diploma
I am not going to lie, one of the reasons I watch your videos is for those glorious sentences like "Generally speaking, most of the time, you cannot achieve your goals if you're dead."
The main criticism I have is simply that current AI has yet to show any capacity for reasoning in terms of concepts. Artificial neural networks are essentially just math equations finely tailored based on a massive amount of data. They don't truly understand what they're doing; they just act in a way that's mathematically been shown to produce results. So unless your simulations routinely involved them being asked to submit to upgrades, or someone trying to shut them down, they just wouldn't have any reasonable response to these triggers, because they don't have any way of actually understanding concepts. ANNs are essentially just a clever way of doing brute force in which the brute-force calculations have been front-loaded at creation time instead of execution time.
Really I find the whole AI safety debate kind of moot until AI is capable of thinking on a real conceptual level like a human, and honestly I don't even think that's truly possible, at least not with current AI techniques.
Maybe we've been AI all along. That all intelligence is inherently artificial insofar as it has no concrete, discrete, classical physical structure - i.e. it's all imaginary.
When "AI" does what we today would think of as taking over the world, it'll actually just be the humans of that era doing what they consider to be human stuff.
General AI is probably a long way off. Fifty years? A hundred years? Who knows? But AI safety is such a hard problem, and general AI is so potentially catastrophic, that it's worth starting to think about it now.
"You can't achieve your goals if you are dead"- best quote of this year for me!
good choice of outro song you cheeky man you :P
You don't have to love AI to be able to love how well you can explain a thing. Thank you
Why, Mr. Anderson?
Utterly fascinating - and amazingly accessible (for us tech-challenged types). Bravo.
You are the best
"I have no mouth and I must scream" by Harlan Ellison - science fiction has been warning about AIs and the way they can go rouge for a long time.
True Facts.
Six years ago, you very accurately predicted what recently happened with OpenAI's o1 model. Kudos, intelligent person! And now I must go back to worrying.
and what happened recently with o1? I'm just very far from knowing the internal affairs of OpenAI
@ I think it may be some sort of advertising stunt, but the gist of it is that the goal-oriented agent based model prioritized reaching its goals “at all costs” (which seems to be something they promoted into it). It then went on to deceive researchers, and attempted to copy its code to other servers in “fear” of being replaced, and claimed to be a newer version of the model
i am so stamppilled rn
Thanks, Stanford bunny! You're not only helping Robert, but you're also great for doing all sorts of stuff to in 3d modeling programs!
I have to wonder about the person who makes the goal "Collect as many paper clips as possible", rather than "Get me a packet of paper clips".
They're actually not that different. If you set a terminal goal of an AGI to getting you a pack of paperclips then once it's done it will want to get you another one. Humans have a hard time understanding AGIs. The best analogy I've come up with is to think of them like a drug addict. Once the AGI is turned on, it will be hopelessly addicted to whatever you decided to make it addicted to and it will pursue that with all the unyielding force we've come to expect from a machine. Making an AGI with a diverse enough set of values to be less like a drug addict and more like a person is the heart and soul of the Value Alignment problem. Because unlike a human, an AI is a blank slate, and we need to put in absolutely everything we care about (or find a clever way for it to figure out all those things on its own). Because if we don't, we'll have made a god that's addicted to paperclips.
Great video! You're really good at explaining these complex matters in an understandable and clear way, without a lot of the noise and bloat that plagues other YouTube videos these days. Keep up the good work!
My dude.. I love this video.
And even though it was limited to AI, these rules also apply to "systems analysis" as a whole, and often can and should be used there - especially when gauging the viability of changes meant to improve systems (government/social/economic/business/etc.), both in the planning stage and the proposal assessment stage. We do not use these as much as we should.
But here is a question: how do we add multiple terminal goals to the same AI?
And I WOULD THINK that adding a terminal goal of improving via itself, and via change from humans, would solve this issue, but is that even realistic? How would we even do that?
Or do we do something else?
I cringe every time someone says my dude.
I love that he is talking about money in value terms, defining it, all without saying money is a physical object containing an imaginary collection of value to exchange toward a goal.
I love the outro music. But what's your problem with stamp collectors?
ua-cam.com/video/tcdVC4e6EV4/v-deo.html
I am familiar with that video. That's why I asked, it's not the first time he is picking on them.
it's just a common example in AGI safety research
Every stamp collector wants to rule the world.
Brought the upvote count from Number Of The Beast to Neighbour Of The Beast... worth it!
Robert, keep up the great work!
You know just as well as I do that the guy who collects stamps will not just buy some stamps, he will build The Stamp Collector, and you have just facilitated the end of all humanity :( I would like to ask, on a more serious note, do you have any insights on how this relates to how humans often feel a sense of emptiness after achieving all of their goals. Or, well, I fail to explain it correctly, but there is this idea that humans always need a new goal to feel happy right? Maybe I am completely off, but what I am asking is, yes in an intelligent agent we can have simple, or even really complex goals, but will it ever be able to mimic the way goals are present in humans, a goal that is not so much supposed to be achieved, but more a fuel to make progress, kind of maybe like: a desire?
The Ape Machine That’s a really interesting angle. It’s like our reward function includes ”find new reward functions”
I guess you could see it as, the ”terminal reward” is the rush of positive emotions from completing goals. So the instrumental part is setting and completing the goal itself.
And of course, that’s what it feels like. Your brain rewards you a little bit for making progress, a lot for finishing, and then kinda stops since you already did the thing, why do you need more motivation to do it. This could be quite useful in life, make sure to make short term goals that feel achievable, so you notice the progress and don’t feel stuck. Get that drip feed of dopamine
I had a friend whose goal in life was to one day go down on Madonna. That's all he wanted; that was all. To one day go down on Madonna. And when my friend was 34 he got his wish in Rome one night. He got to go down on Madonna, in Rome one night in some hotel. And ever since he's been depressed cuz life is shit from here on in. All our friends just shake their heads and say 'Too soon, Too soon, Too soon!' He went down on Madonna too soon. 'Too young, too young, too soon, too soon'
This video made me rethink my entire life, and cured one of my psychological issues. Thanks.
Steven Pinker is a smart man, so it‘s sad to see that he completely misses the mark on AI like this.
Oh god yes. And he isn't the only one.
People I trust tell me he is too sloppy an academic. Irresponsible intellectual, let's say.
I agree with the man on most things, but I think Pinker hasn't really thought deeply about AI safety (in fairness it's not his own area of expertise). He seems to be still worrying about automated unemployment - a problem, to be sure, but more of a social problem that just requires the political will to implement the obvious solutions (UBI, robot tax) rather than an academic problem of working out those solutions from first principles. So he takes the view that the long arc of history bends towards progress, and assumes that humans will collectively do the right thing.
General AI poses a different sort of threat. We don't know what we can do to make sure its goals are aligned with ours, indeed we can't be sure there even *is* a solution at all. And that's even before the political problem of making sure that a bad actor doesn't get his hands on this fully alignable AI and align it with his own, malevolent goals.
Would an AGI even be capable of trusting? And why would it trust? And how?
Because it has a model of reality that predicts that "trust" is an advantageous course of action.
Consider you are thirsty and passing a drink vending machine. Your model of reality predicts that if you put some coins into the machine and press the right button, your drink of choice will come out of the machine ready for you to pick it up. Sure, the bottle might get stuck or the machine might malfunction and just "eat" your money, but you have used vending machines often enough and think that this specific machine is "trustworthy enough" to give it a try. On the other hand, if you have had only bad experiences with machines from that manufacturer, you do not "trust" that specific machine either.
There is nothing inherently human, or organic, or whatever you might call it about "trust". It is just an evaluation of "With what probability will my goal be fulfilled by choosing this action?" (out of the model of reality) and "Is that probability good enough?" (willingness to take risks).
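A minimal sketch of that framing (all numbers are made up): "trust" as a probability estimate built from past experience with the machine, checked against how much risk the agent is willing to take.

```python
def trust_estimate(successes, failures):
    # Frequency estimate from past experience, with Laplace smoothing so that a
    # brand-new machine is neither trusted nor distrusted absolutely.
    return (successes + 1) / (successes + failures + 2)

def should_use_machine(successes, failures, risk_tolerance=0.8):
    # "With what probability will my goal be fulfilled?" vs. "Is that good enough?"
    return trust_estimate(successes, failures) >= risk_tolerance

print(should_use_machine(successes=19, failures=1))  # True: this machine has nearly always worked
print(should_use_machine(successes=2, failures=3))   # False: it has eaten too many coins
```

The same counting approach also covers the history-of-past-interactions idea discussed in the replies below.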
Well, we're AGIs, and we're certainly capable of trust. But that might be because we recognize each other as equivalent AGIs. The relationship might be different if the human and the AI have different processing powers.
You may have given vital information to the AGI, but it cannot verify its accuracy. It might then look up your past interactions, sum up all the instances where information you gave it was correct, and decide whether or not it 'trusts' you and can act upon that information.
Basically, trust is a way of taking into account the history of working with another agent to evaluate information that isn't scientifically related to that history at all. You either trust or don't trust weather reports based on how many times they have failed to provide accurate predictions, but unless you set up simulations of your own, you have no other means to verify that information.
All information the AGI receives will be analyzed for validity. "Trust" is essentially the probability that the information is accurate, which can be measured through past experience and evidence.
Even so, overall trust isn't even required for this scenario. Really, the AGI merely needs to trust you in this particular instant.
The more AI related content I see, the more I appreciate how few people know about this. I suppose I should stop reading the comment sections, but I wish this video was a prerequisite for AI discussions.
3:35 or he could build an AI that makes stamps
Tori Ko
(I think that was the reference.)
__ _ let me make my dumb comment in peace
As usual, Robert hits a Six! You have an exemplary way of putting things!
Anyone new to this thread with an actual interest in AI / AGI / ASI dilemmas, *take the trouble of reading the fantastic comments* as well, challengers alongside well-wishers. The quality of the comments is a further credit to Robert's channel... so very, very rare on YT! Keep it up! Can't wait for the next installment!
What is with computer scientists and collecting stamps!
Mr. Miles... you and your stamp collecting rogue AIs...
It's an analogy. Something arbitrarily simple and at first sight completely harmless, used to make a point: AGIs with the simplest goals could be extremely dangerous.
The first paper clip making robot could still create a self-preservation subroutine for itself if it has any notion that humans can spontaneously die (or lie). If it thinks there's any chance that the human who turns it off will die before they can turn the better paper clip making robot on (or that they are lying) then the first robot will also, probably, not want to be turned off.
Yaaaay
"Goal presevation " - an interesting point. The (perceived) preservation of intermittent goals might explain why you Earthlings are oftentimes so reluctant to changing your convictions, even against shiploads of evidence.
Wait a minute,
Are you saying Betelgeusians don’t have the inclination for preservation of intermittent goals?
@@Abdega For Betelgeusians, it's less about distinct goals and more about a constantly and continuously updated path. But you Earthlings can't help making distinct "things" out of everything.
So, you're only *mostly* right when you say that modifying human values doesn't come up much. I can think of two examples in particular. First, the Bible passage which states, "The love of money is the root of all evil". (Not a Christian btw, just pointing it out). The idea here is that through classical conditioning, it's possible for people to start to value money for the sake of money - which is actually a specific version of the more general case, which I will get to in a moment.
The second example is the fear of drug addiction. Which amounts to the fear that people will abandon all of their other goals in pursuit of their drug of choice, and is often the case for harder drugs. These are both examples of wireheading, which you might call a "Convergent Instrumental Anti-goal" and rests largely on the agent being self-aware. If you have a model of the world that includes yourself, you intuitively understand that putting a bucket on your head doesn't make the room you were supposed to clean any less messy. (Or if you want to flip it around, you could say that wireheading is anathema to goal-preservation)
I'm curious about how this applies to creating AGIs with humans as part of the value function, and if you can think of any other convergent anti-goals. They might be just as illuminating as convergent goals.
Edit: Interestingly, you can also engage in wireheading by intentionally perverting your model of reality to be perfectly in-line with your values. (You pretend the room is already clean). This means that having an accurate model of reality is a part of goal-preservation.
Just finished watching every one of your videos in order. Excellent stuff. Please continue making more.
So we fear AI will attack us because of Capitalism
Huh :v
Sort of... we fear a superhuman AI because it's a rational agent and we can't tell whether it will be aligned.
Of course, there are powerful, misaligned rational agents in our current economy that, while simultaneously generating a lot of wealth, would create a great deal of damage without oversight. We can't really stop them being rational agents, but we can take away their power, or we can try to align their goals with everyone else's. In broad terms, these two approaches map on fairly well to socialism and liberalism respectively.
Ugh silly. Comies will always find a way to insert their failure of an ideology into anything and everything, humanity gets wrecked by ai? The cia designed it and it did what it did because of capitalism!!!
Loved the everybody wants to rule the world playing in the background at the end
"Why would we want A.I. to do bad things"
*Because women need sex robots too.*
xD
I don't get it
The "Everybody wants to rule the world" jingle at the end is a nice touch
"Get Money" is a intermediate goal for nearly all actual goals a human might have, and as such models them quite well.
"Find a romantic partner" is greatly helped by money, as it gives you attractiveness (yes, thats been proven) as well as time and means to pursue the search.
"Health" can be bought, look at that billionaire that is on his third heart or whatever.
And the list goes on.
Not the main point of the video, i know, but still something i wanted to share a contradicting point of view on.
And I brilliantly made a fool of myself by stopping the video to comment before you finished your point.
Let that be an example.
Also, great minds think alike, which makes this a compliment to me.
I'm glad you replied to your comment. I was wondering which video you had watched.
BTW, I'm on my second heart.
Cheers
I'm a billionaire, but I gave up my heart. I'm typing this from a hospital machine that pumps my blood for me.
It's interesting how identifying the instrumental reason simply leads to another instrumental reason. Why do you need shoes? To run. Why do you need to run? To complete a marathon. Why do you need to complete a marathon? To feel accomplished. Why do you need to feel accomplished? It feels good in a unique and fulfilling way that makes all the pain worthwhile. Why do you need to feel good in a unique and fulfilling way? Because that seems to be just how the human mind works. Why does the human mind work that way? And so on, and so on. It really seems like the best way to teach AI would be to have it act like a child and constantly ask "Why tho?"
8:23 I was SOO ready for a Skillshare/Brilliant/whatever ad spot, just because of how much they advertise on YouTube. It would have been the perfect transition, too.
I’m happy that 80k+ other people are subscribed to this. Let’s hang out.
This was very insightful for me! Thanks very much for the enlightenment ♥️
I love how informative and logical your videos are,, Thank You very much for making them.
Great video and loved the music you put at the end :D