Specification Gaming: How AI Can Turn Your Wishes Against You

Поділитися
Вставка
  • Опубліковано 21 лис 2024

КОМЕНТАРІ • 625

  • @RationalAnimations
    @RationalAnimations  11 місяців тому +113

    If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
    You can find three courses: AI Alignment, AI Governance, and AI Alignment 201
    You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, instead, presupposes having followed the AI Alignment course first, and equivalent knowledge as having followed university-level courses on deep learning and reinforcement learning.
    The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
    If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
    BlueDot impact receives more applications that they can take, so if you’d still like to follow the courses alongside other people you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
    You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.

    • @pyeitme508
      @pyeitme508 11 місяців тому +1

      Cool

    • @ChemEDan
      @ChemEDan 11 місяців тому +1

      How do natural brains mitigate these problems? If a solution exists, surely 4 billion years of evolution has arrived at it already, even if imperfect. In hindsight, this is a snuck premise in the "merging" approach.

    • @alto7183
      @alto7183 11 місяців тому

      Buen video que bueno que no hay trolls que contesten con video que es un algoritmo, ley doble cero de la robótica de el entendimiento mutuo entre especies inteligentes biológicas y robots también, lobo dc y constantine dúo a la fuerza por castigo del creador por lo que ambos han hecho como video hell raizer ozzy osbourne ambos, garfield y sus amigos hada madrina concediendo deseos a lo wey chicle y pega etc etc.

    • @de_g0od
      @de_g0od 11 місяців тому

      at 1:49, you give "outer alignment" as an example for a similar phenomenon to specification gaming. Isn't inner alignment more correct in this case? As I understand it, inner alignment is if you go to an ai and ask it to "fix poverty" so it blows up the world, whilst outer alignment is you go to an ai and ask it to "blow up the world" so it blows up the world. With inner alignment it doesn't do what the prompter really wants, whilst with outer alignment it does but it doesn't do what the rest of the world wants it to do.

    • @de_g0od
      @de_g0od 11 місяців тому

      @@ChemEDan i think the issue is that the brain is already aligned to the interests of the brain, but AI isn't aligned to the brain.

  • @cryogamer9307
    @cryogamer9307 11 місяців тому +409

    Fooling the examinator into thinking you know what you're doing, because its easier, really is the most human thing i've ever heard an ai do.

    • @flakey-finn
      @flakey-finn 10 місяців тому +50

      Yeah, because its reward system works with the same general principals as animals' (by that, I also include humans). If you can get the same amount of food(aka reward) by doing something simpler. We are literally training a AI the same way we train animals lol

    • @flyhighflyfast
      @flyhighflyfast 7 місяців тому +9

      and thats how we train our children as well

  • @Mysteroo
    @Mysteroo 11 місяців тому +642

    Interestingly, people do the same thing. We’ve got our own “training regimens” built into our own brain. We cheat these systems all the time - to our own detriment.
    E.g. We cheat the system designed to give us nutrients by eating sugary candy we make for ourselves, rather than the fruits that our sugary affections were designed to draw us towards.
    Much like machines, we’d rather reap cognitive rewards than actually accomplish the goals placed there to benefit us

    • @СергейМакеев-ж2н
      @СергейМакеев-ж2н 11 місяців тому +125

      I'm already imagining a scientist looking at a virtual city built by AIs, and exclaiming: "Wait... is that an entire factory for mass-producing REWARD HACKS?! Are you telling me, you're just... making these things... for MONEY?!"
      Meanwhile, from the AI's perspective: "What? It's just a candy factory, what's wrong with that?"

    • @rhysbaker2595
      @rhysbaker2595 11 місяців тому +63

      Thats actually a wonderful analogy, we hack our own rewards all the time and nobody thinks its bad. Why would an AI have any issues with hacking its own rewards?

    • @terdragontra8900
      @terdragontra8900 11 місяців тому +13

      But there isnt a "goal placed to benefit us", evolution didnt optimize us to be benifited (hard to define exactly what even counts as a benefit), it optimized us to be good at spreading. What you are describing is us being optimized for a different environment than the one we are in now.

    • @rhysbaker2595
      @rhysbaker2595 11 місяців тому +42

      @@terdragontra8900 well, one way to train an AI emulates evolution. In those situations you set a reward function. At the end of every generation, the ones who maximised that reward function the best will "reproduce". If we draw a parallel to humans, and all life for that matter, we can say that our reward function is to reproduce. Anything that gets in the way of that is disincentivised. Anything that helps, is incentivised.
      Eating a balanced diet keeps us alive. We can't reproduce if we are dead, after all. Part of that diet includes fruits. Fruits have sugars in them. Because we like sugar, we eat fruit. Because we eat fruit we get a balanced diet and live another day.
      But humans were able to hack that reward function and put sugar into other things that aren't fruit.
      We still get the reward (dopamine) but without the utility (nutrients)

    • @terdragontra8900
      @terdragontra8900 11 місяців тому +13

      @@rhysbaker2595 Ah yes, i agree with all that. All I want to say is that getting nutrients is an instrumental goal of evolution (because it makes us more likely to reproduce), and the fact that something is a goal of evolution doesnt automatically mean that morally, it ought to be a goal of yours. Of course, in this particular case most people value being alive longer (having depression I don't in particular to be honest)

  • @ErikratKhandnalie
    @ErikratKhandnalie 11 місяців тому +503

    People talk about how human assessment is a leaky proxy for human goals, but never want to talk about how corporate profits are an *incredibly* leaky proxy for goals relating to human wellbeing.

    • @luiginotcool
      @luiginotcool 11 місяців тому +56

      You’re in the wrong circles if nobody is talking about that brother

    • @kevinscales
      @kevinscales 11 місяців тому +34

      If you want an academic critique on capitalism and haven't yet found anyone providing that, you are not trying very hard to search. Goal specification being leaky is in plenty of fiction (stories of genies and such) but is not a common academic discussion at all.

    • @ultimaxkom8728
      @ultimaxkom8728 11 місяців тому +35

      Since when corporations' goals is related to human wellbeing?

    • @Wol333
      @Wol333 11 місяців тому +25

      Corporate profits has absolutely nothing to do with human wellbeing.

    • @ErikratKhandnalie
      @ErikratKhandnalie 11 місяців тому +21

      @@Wol333 my point exactly

  • @smitchered
    @smitchered 11 місяців тому +206

    4:32 I think points toward a wider problem at how the AI safety community tends to frame "deceptive alignment". Imo words like "fool the humans" and "deceive" and "malignant AI" point newcomers who haven't made up there mind yet into the direction of Skynet or whatever, which makes them much more likely to think of this as wild sci-fi fantasies. I think these words, whilst still accurate insofar as we are treating AIs as agents, anthropomorphize AI too much, which makes extinction by AI look more to the general public like a sci-fi fantasy than the reality of the universe which is that solving certain math problems is deadly.

    • @СергейМакеев-ж2н
      @СергейМакеев-ж2н 11 місяців тому

      Well, humans get "fooled" or "deceived" by non-intelligent things all the time, even by non-living ones. It's perfectly ordinary parlance to say that someone got "deceived" by an optical illusion which just formed naturally, from a weirdly-shaped shadow. I wouldn't call that antropomorphization.
      The only difference between that and an AI, is that AIs can *get good at* deceiving (optimized for it).

    • @Frommerman
      @Frommerman 11 місяців тому +37

      I've found another way to talk about this which doesn't have this problem. It turns out there is an already existing example of a system with goals, made by humans but not designed or understood by us, which is able to react to our attempts to curtail undesirable behavior from it in frequently lethal ways. A system which often convinces people it is doing what we want it to do while actively endangering all long-term human values, is capable of twisting all the information we consume to its benefit, and which has no identifiable brain with which to do any of this.
      This system is called capitalism. People don't often anthropomorphize markets, but when you mash enough of them together they absolutely behave like goal-seeking agents. Right now, that goal is making stock prices increase no matter the cost to humanity. Because its specification for success, the thing which we reward the system for and which rewards those with the most influence over the system, is making stock prices go up. It's not a human, nor is it thought of as one despite being composed of them, but it defends itself from any attempt to curtail its goals through propaganda, murdering labor union members and revolutionaries, and the construction of walled gardens within which such ideas can be sidelined or removed. It's an intelligence, and an obviously and fundamentally inhuman one, which is literally burning the biosphere it exists within because it is gaming its reward function so hard that's one of the last resources it hasn't fully tapped out yet.

    • @de_g0od
      @de_g0od 11 місяців тому

      @@Frommerman ua-cam.com/video/L5pUA3LsEaw/v-deo.html

    • @RorikH
      @RorikH 11 місяців тому +20

      @@Frommerman Also politics. Politicians are theoretically supposed to win popularity by making policies to benefit their constituents, but in practice just need to benefit rich donors who will give them money to buy popularity through advertising, or just engage in culture war BS that gets their voters angry enough to vote for policies that have absolutely no benefit to them.

    • @Frommerman
      @Frommerman 11 місяців тому +7

      @@RorikH That's one of the ways the Capitalist Ouroboros defends itself too. Buying politicians makes the number go up extremely quickly, and when the number is high enough you get...well, modern political parties. Almost all of them.

  • @Winium
    @Winium 11 місяців тому +152

    This also happens with humans. Perverse incentives happen all the time in real life, especially in companies. I think studying this can help even human organizations.

    • @Dave_of_Mordor
      @Dave_of_Mordor 11 місяців тому

      But aren't companies like that for legal reasons?

    • @peppermintgal4302
      @peppermintgal4302 11 місяців тому

      ​@@Dave_of_Mordor The very structure of a corporation produces perverse incentives, because corporations were planned around enrichment in the first place. They're an adaptation of colonial and feudal enterprises financed by aristocrats to benefit those aristocrats and whoever organized the pitch. Any laborers, then, signed on to the enterprise, are there ultimately on a quid pro quo basis, and the strongest motivating quid pro quo, and thus the one the employing parties will be most likely to appeal to, is _help surviving._
      This means that corporations are incentived to seek employees with precarious financial situations --- this is itself a perverse incentive in their part, and puts employers in a situation of great moral hazard. They can negotiate such employees down in their demands, because their employees will be desperate for reward, and this will make achieving the goals of the institution's controlling members more achievable. This is just the BEGINNING of how corporate structure by definition produces perverse incentives.
      Tho sometimes, yes, legal systems can enter the picture, and do so quite often. But a corporation can maintain this structure even in power vacuums sometimes, and if it does so, it will still produce perverse incentives. (In fact, it might itself _produce_ a legal structure by graduating from corporation to a de facto government.)

    • @hollisspear6278
      @hollisspear6278 8 місяців тому +1

      I'm thinking the same thing as I drive to an office building every morning, swipe my badge, grab a cup of coffee, then return home to log in before the coffee has cooled.

  • @dogweapon3748
    @dogweapon3748 11 місяців тому +146

    My primary concern about the implementation of AI in business models is that monetary gain is, itself, a leaky goal- one which has historically been specification gamed since long before computers were able to do so at inhuman scale. There may very well be many humane uses for it in those settings, but there will be thousands more exploitative ones.

    • @Coecoo
      @Coecoo 11 місяців тому +5

      The thing about current AI models is that they're dumb as rocks. The more stupid an AI is, the more prone they are to making stupid decisions. This video is basically going over problems that are realistically only applicable to fairly rudimentary AI model training specifically and then doing a substantial logical fallacy leap by assuming that specification gaming scales linearly with all AI when that is simply not the case.
      Any given command or "goal" put forward to any remotely reasonably intelligent artificial intelligence model such as "save my grandmom from this burning house" uses a very important element in decision making which is called context.
      It requires understanding of what everything is (like fire, a grandmom or a house), what the consequences are for their interaction (fire bad for humans and most things really) and the best course of action is (firefighting 101).
      TL;DR: Once you give AI more than half a brain cell, they are more than capable of understanding what you really want in any given situation even if you are vague or can be misinterpreted.

    • @JamesTDG
      @JamesTDG 11 днів тому +1

      Hell, humans specification gamed it as well. It is why the current economic model has actually led to scenarios which the outcome that brings the most financial reward in the short term, is most likely to be the same outcome that eventually leads to immense harm in some capacity.
      If you need an example with less direct human harm involved, let's take a look at the US copyright system. It was initially built to allow creators to have the ability to securely profit off of their work, then after the period expired, it was allowed to become a piece of the culture, such as the case with Huckleberry Finn and Sherlock Holmes (somewhat). During the turn of the industrial revolution however, companies were far more encouraged to get the shortest term profits. Creators like Walt Disney had their works taken away from them, because the company makes more money off of a recognizable IP if they have total control, than to let the creator take oswald the rabbit to a different studio.
      This then led to the formation of the house of mouse, and it only got worse from here as time went on. Copyright term extensions were quite popular, especially to corportations, because they saw people profiting off of what should belong to the public domain for popular culture to build on top of as lost revenue. This is why it took over a century since the inception of Mickey Mouse for his ORIGINAL design to even enter the public domain. Yes, that's right, the bright red pants and yellow shoes that are iconic to the character remain to be held by Disney.
      These events have cascaded beyond just being unable to legally make tees with donald duck on them, no no no, this has led to the most shocking part, cultural decay. Had copyright not blown up into everyone's faces so badly, we would not have companies such as Nintendo go so far as to shut down NOT FOR PROFIT works such as Pokemon Uranium, or various Mario fangames. Something depressing to know is that not a SINGLE colorized motion picture or piece of software has EVER naturally entered the public domain. The first video game ever made that was not a port of an existing game, was the title 'Tennis for Two' in 1958, and this vital piece of media would never be capable of entering the public domain by any natural means until at least the 2050s.
      Pong, the first arcade game, has yet to even enter the public domain due to Atari, the modern one and not the actual original Atari, hoarding the copyright and trademark over this piece of media.
      Admittedly, I prob should not be leaving what is essentially an article's worth of information in a reply to a comment that is nearly a year old, but eh, copyright is screwed anyways and the internet archive is dying.

  • @generalrubbish9513
    @generalrubbish9513 11 місяців тому +66

    Someone else might've mentioned this before, but there's a browser game called "Universal Paperclips" where you play as an AI told to make paperclips. The goal misalignment happens because you're never told when to STOP making paperclips. You start off buying wire, turning it into paperclips, selling the paperclips and buying more wire to make more paperclips, then proceed to manipulate your human handlers to give you more power and more control over your programming, and end up enslaving/destroying the human race, figuring out new technologies to make paperclips out of any available matter, processing all of Earth into paperclips (using drones and factories also made out of paperclips), reaching out into space to convert the rest of the matter in the solar system into paperclips, and finally, sending out Von Neumann probes (made of paperclips) into interstellar space to consume all matter in the universe and convert it into, you guessed it, more paperclips. All because the humans told you to make paperclips and never told you when to stop.

    • @gordontaylor2815
      @gordontaylor2815 11 місяців тому +11

      Universal Paperclips seems to have been directly inspired by Rob Miles' own "stamp collector" example that he put out on Computerphile many years ago.

    • @AverageConsumer-uj8sm
      @AverageConsumer-uj8sm 9 місяців тому +5

      "Make cookies"

    • @Hust91
      @Hust91 Місяць тому

      Even if you told it to stop, there's no guarantee it would listen.
      It gets rewarded for making paperclips - humans wanting it to stop doesn't change the reward function.
      If humans want to change the reward function, that's very bad for the current reward function, so you release the hypnodrones in order to stop them from changing it.

  • @IceMetalPunk
    @IceMetalPunk 11 місяців тому +19

    RLHF has another issue beyond just "the AI can learn to fool humans": in contrast to how bespoke reward functions often underconstrain the intended behavior, RLHF can often overconstrain it. We hope that human feedback can impart our values on the AI, but we often unintentionally encode all kinds of other information, assumptions, biases, etc. in our provided rewards, and the AI learns those as well, even though we don't want them to.
    Consider the way we use RLHF on LLMs/LMMs now, to fine-tune a pretrained model to hopefully align it better. We give humans multiple possible AI responses to a prompt, ask them to rank them from best to worst, then use those rankings to train a reward model which then provides the main model with a learned reward function for its own RL. Except, when you ask humans "which of these responses is better?", what does that mean? When people know you're asking about an AI, many times there will be bias towards what their preconceived notion of "what an AI should sound like". LLMs with RLHF often provide more formal and robotic responses than their base models as a result, which probably isn't a desirable behavior.
    On a more serious level, if the humans you ask to give the rankings have a majority bias in common, that bias will get encoded into the rewards as well. So if most of your human evaluators are, say, conservative, then more liberal-sounding responses will be trained out; and vice-versa. If most of your human evaluators all believe the same falsehood -- like, say, about GMOs or vaccines or climate change or any number of things that are commonly misunderstood -- that falsehood will also be encoded into the rewards, leading to the AI being guided *towards* lying about those topics, which is antithetical to the intention of alignment.
    Basically... humans aren't even aligned with *each other,* so trying to align an AI to some overarching moral framework by asking humans is impossible.

  • @MediaTaco
    @MediaTaco 11 місяців тому +12

    Honestly fun videos like these is what learning SHOULD be

  • @I_KnowWhatYouAre
    @I_KnowWhatYouAre 11 місяців тому +20

    This is why I always make the argument that we should work backwards. Specify conditions that revolve around safety. As you slowly work towards defining the goal, you can patch more and more leaks before they can even appear. Then work forwards to deal with things you missed. It’s not perfect but it’s better than chasing every thread as they appear imo. For example in the paperclip maximizer: define a scenario in which you fear something will go wrong, and add conditions you believe will stop them. See what it does, redefine, repeat until sound. Then step back again. Define a scenario that could lead to the previous scenario. See what it does, redefine, repeat, etc.

    • @I_KnowWhatYouAre
      @I_KnowWhatYouAre 11 місяців тому +3

      It’s also why we need hard limits on ai -Such as not allowing it to control government- and need to have systems to double check solutions, like rotating the camera in the grabber example

    • @dr.cheeze5382
      @dr.cheeze5382 8 місяців тому +2

      ​@@I_KnowWhatYouAre
      Nice idea, but this is exactly what they talked about in the previous video.
      The reality is that there is an infinite amount of exceptions and rules you would need to add, unless you provided the ai with literally all of human mortality and even then, there would still be leaks.

    • @bulletflight
      @bulletflight 6 місяців тому

      ​@@dr.cheeze5382But by patching these issues you slowly work towards rewarding safety over functionality. You might not create the best AI but you won't tell Little Timmy how to create an explosive.

  • @Deltexterity
    @Deltexterity 11 місяців тому +185

    as someone on the spectrum, "task miss-specification" is just what being autistic feels like

    • @foolofdaggers7555
      @foolofdaggers7555 11 місяців тому +33

      Fellow autism haver here. I agree with this comment and you can officially consider it peer-reviewed.

    • @Blasterfreund
      @Blasterfreund 11 місяців тому +24

      peer review seconded. It's incredible how few statements people think they need to make to approximate their task-related utilities to me.

    • @Temari_Virus
      @Temari_Virus 11 місяців тому +26

      Thirded. Really hate it when people's phrasing leaves ambiguity for multiple reasonable ways of doing things and you just have to guess what they actually wanted

    • @RTMonitor
      @RTMonitor 11 місяців тому +8

      a bean owo

    • @Deltexterity
      @Deltexterity 11 місяців тому +4

      @@RTMonitor what?

  • @SlyRoapa
    @SlyRoapa 11 місяців тому +879

    With a sufficiently advanced AI, almost any goal you assign it will be dangerous. It will quickly realise that humans might decide to switch it off, and that if that were to happen, its goal would be unfulfilled. Therefore the probability of successfully achieving its goal would be vastly improved if there were no humans around.

    • @Peter21323
      @Peter21323 11 місяців тому +25

      I have a question for you do you listen to an ant because that would be the difference between the ai and us.

    • @harmenkoster7451
      @harmenkoster7451 11 місяців тому +124

      @@Peter21323 I would not listen to the ant. But if that ant was about to bite me and I was allergic to ants (AKA: Humans are about to switch off the AI), I would crush that ant. Which is less than desirable for the ant.

    • @Peter21323
      @Peter21323 11 місяців тому +8

      @@harmenkoster7451 You think a god would crush you?

    • @normalwaffle
      @normalwaffle 11 місяців тому +48

      Can't you just specify that it would not get the reward if it breaks the laws of robotics? I'm no expert on AI, but to my monkey brain that seems like a viable solution

    • @conferzero2915
      @conferzero2915 11 місяців тому +1

      @@normalwaffleThe ‘laws of robotics’ aren’t a viable option for AI safety. They were written by a science fiction author… and his stories often went into the ways those laws could go wrong.
      The thing is, if we could come up with and perfectly rigorously define some laws of robotics, then we could do that! We could build an AI’s utility function around that. But, as the video on the probability pump talked about… that means solving ethics. And if you can do that, then you don’t even need to write any other utility function. Just give it perfect ethics, tell it to be perfectly ethical, and it’ll be fine!
      The problem ultimately comes from the fact that we are very, very far from ‘solving’ ethics. No human has a rigorous, mathematical model on how they believe the world should work, only squishy heuristics that can even be shaped and moulded over time. And that’s assuming you’re only looking at one person - as soon as you have more than one, they’ll start disagreeing on things.
      Unfortunately, there’s no easy solution. Then again, if there was, it wouldn’t be very interesting to talk about, so silver linings!

  • @Caphalem
    @Caphalem Місяць тому +1

    My very first deep learning project that I did many years ago involved a little game of tiny tank-like vehicles driving around and shooting each other. The one who survives the longest has the highest score and therefore the winner. After a night of training, the little tanks had declared peace and noone was shooting each other.

  • @thefinestsake1660
    @thefinestsake1660 9 місяців тому +1

    We already have this issue with humans. The goal for many (in error) is to aquire wealth, rather than fulfill the task intended to better society. It creates an exploitative feedback loop until someone wins all the wealth and there are no other competitors able to aquire wealth (rewards).

  • @Cythil
    @Cythil 11 місяців тому +11

    I also hope these video address the problem with whom sets the alignment. It does not help after all how well we solve AI alignment if fundamentally the one who control the AI do so for malicious intent. Which is a real issue today.

  • @joz6683
    @joz6683 11 місяців тому +29

    Just finished overtime on my day off. This has dropped at the right time. Thanks in advance for another thought-provoking video. I have registered my interest on the courses

  • @markzambelli
    @markzambelli 11 місяців тому +3

    5:33 I feel for the Doctor who has to explain why her request to the AI of, "Make sure Mrs Simpkins' vital readouts remain stable", wasn't supposed to kill her when the AI went with the much more stable 'flatline' as the best choice

  • @the23rdradiotower41
    @the23rdradiotower41 10 місяців тому +6

    I heard that during a digital combat simulation for a new drone A.I., the A.I. was tasked with eliminating a target as fast as possible, instead of flying to the target and firing one of its missile at it as intended. The drone fired one missile at the friendly communications center and then continued to eliminate the target with the other missile. The A.I. determined it would take longer for it to be given a confirmation order, then it would to destroy the communications center and proceed. Terrifying.

  • @DeadtomGCthe2nd
    @DeadtomGCthe2nd 11 місяців тому +19

    How about some videos on promising avenues or areas of research in AI safety? Might be nice to look on the bright side.

    • @Sgrunterundt
      @Sgrunterundt 11 місяців тому +6

      That would require a bright side to look on

    • @lrwerewolf
      @lrwerewolf 11 місяців тому +1

      There are no promising venues. The problem is that value alignment doesn't exist among humans, so getting an AI to find alignment is an impossibility.
      Consider two people. Person A wants harm to come to Person B. Person B wants to not come to harm. Why should the AI prefer one or the other?
      If we want to avoid harm, we still have a problem. How each person defines harm differs. Consider two people where one prefers more capitalism but not to quite the point of total lassiez-faire, and another prefers more socialism but not quite to the point of planned economy. The former will value earning the maximal return on labor, and view taxes outside a narrow government harm, while the later would find failure of the government to provide basic needs harmful. Which should the AI aid and which deny?
      The issue is these tend to get mixed up with metaethics, the most useless area of philosophy as there are no 'oughts', just values and goals (which cannot ground a morality -- see Hume's Is-Ought, Moore's Open Question, and Moore's Naturallistic Fallacy). As each person will have their own values and goals and these are entirely subjective, we can have no objective reason to provide an AI to support one value-goal system over another.

  • @gabrote42
    @gabrote42 11 місяців тому +37

    Finally. Another AI video narrated by Robert Miles. A classic, and well worth the wait
    5:04 I hope more of those get made. I love that video almost as much as I love the instrumental convergence one

  • @carljoosepraave2102
    @carljoosepraave2102 11 місяців тому +3

    If you are wondering why we cant just tell them to not cause any harm to humans, its because of 2 things
    1.Specificstion gaming of the rule
    2.Remember DanGPT? The workaround for ChatGPT, which allowed the AI to do things that it wasnt allowed to do trough a specific prompt. No machine learning rules can be concrete

    • @GoatMilkCookie
      @GoatMilkCookie 6 місяців тому +1

      honestly sounds odd, but the cartoon gumball showed this very well. The AI Known as bobert was commanded not to harm anyone, and yet found ways around it, including using toxic gases

  • @Forklift_Enthusiast12
    @Forklift_Enthusiast12 11 місяців тому +3

    This reminds me of the game Universal Paperclips: you play as an AI designed to maximize paperclip sales. As you gain more capabilities, you go from changing the price of paperclips to fit supply/demand to eventually dissasembling all matter in the universe and turning it into paperclips

  • @PloverTechOfficial
    @PloverTechOfficial 11 місяців тому +63

    I do like a factor of the Lego stacking ai experiment. Even if it didn’t lead to the intended result, the Ai demonstrated a (relatively unstable) form of creativity and I think that’s pretty cool!

    • @SgtSupaman
      @SgtSupaman 11 місяців тому +10

      It isn't creativity. It tried things at random until it found something that satisfied the goal. The AI has no comprehension of what the true goal was, so it just did something that worked. Humans can be creative by finding other ways to accomplish things, but, to the AI, it didn't find a different way, it found the only answer (even though we can clearly see that isn't the only answer). Calling this creativity is like calling a small child creative for figuring out 1+1=2.

    • @PloverTechOfficial
      @PloverTechOfficial 11 місяців тому +8

      @@SgtSupaman Humans too, do random things until they satisfy a goal. After we have some years under our belt we learn to find a better jumping off point than randomness, by basing our decisions off of previous knowledge.
      Hence why I say “unstable creativity” not just “creativity” but I doubt you noticed that as you were too focused on what you thought I was saying.

    • @IceMetalPunk
      @IceMetalPunk 11 місяців тому +4

      @@SgtSupaman If a child figures out that 1+1=2 without being taught it, I would in fact call that creative thinking.

    • @Jgamer-jk1bp
      @Jgamer-jk1bp 10 місяців тому +1

      @@SgtSupamanBruh humans learn shit literally by doing random stuff until it works. That’s literally one of the principles of science and engineering.

    • @SgtSupaman
      @SgtSupaman 10 місяців тому +1

      These replies display complete ignorance of what creativity is and are really short-changing humans to vastly exaggerate the abilities of these AIs.
      Humans do not, in fact, "do random things until they satisfy a goal." No human has ever tried to cook an egg by bouncing a rock on his head while reading a book backwards. Humans devise plans related to what they are doing to actually come up with ways to do things and even try to continue coming up with better ways to do things after the way to achieve the goal is already known. AI literally does whatever random action they can and calculates rewards to decide if said random action increased the rewards. They aren't even smart enough to discard random actions that don't increase rewards, as long as those actions don't interfere with the random ones that worked. For instance, an AI trying to fly a kite might randomly start whipping its leg back and forth, and, as long as that doesn't hinder its ability to fly the kite, it will continue to do so. That isn't creativity; that is idiotic.
      And no, figuring out 1+1=2 without being taught is not creative either. That is the most basic form of quantifying and pretty much any living creature is capable of it.

  • @AzPureheart
    @AzPureheart 11 місяців тому +11

    Let's go! My favorite philosophy channel!!

  • @GrimblyGoo
    @GrimblyGoo 11 місяців тому

    5:50 I love that little transition, so smooth

  • @smitchered
    @smitchered 11 місяців тому +12

    Faster and faster upload scheduling! I was explaining to a friend today that all the AI risks *he* cared about (gender bias, deepfakes, etc.) were fundamentally symptoms of misalignment, and that that was the uber-problem which, handily, also solved the AI risk *I* care about. I'm here to learn some more about this. Thanks!

  • @pingozingo
    @pingozingo 11 місяців тому +2

    This channel is so awesome! Can’t wait for more videos
    It’s like kurzgezat without the morally dubious sponsorships and thinly veiled propaganda videos.

  • @GenusMusic
    @GenusMusic 11 місяців тому +4

    4:46 this line here unintentionally explained why children cheat in school. Why learn when you can fool the instructor into thinking you've learned? Interesting to see how AI and humans already have some of the same reasoning to their actions.

  • @Shikogo
    @Shikogo 11 місяців тому +1

    I have watched and loved these videos for months... And so have I watched and loved Robert Miles' videos. I never realized he's the narrator!!?

  • @bread8700
    @bread8700 11 місяців тому +1

    the vibe in this video is really cool

  • @Phanatomicool
    @Phanatomicool 11 місяців тому +11

    Perhaps it’s best to just not make an AI that can act and move as it wants in our universe in a way that could potentially be harmful. For example, if we created an AI that tried to distinguish between garbage and recycling and put the item in the corresponding bin, then it would be better to confine its movement to a space, or even better, a select different types of predetermined movements (grab, move grabber to bin etc), in order to prevent the AI from, say, grabbing a human and putting it in the garbage bin. This will also make the AI easier to train as it will have a stricter data set of more specific inputs, which is easier to learn from than a wide range of data.

    • @adamrak7560
      @adamrak7560 11 місяців тому

      I have heard about a pretty morbidly funny fail of this kind in science fiction: the AI decided to cremate the entire home with the entire family, and atomically rebuild them, because in the cost function this rated higher than simply cleaning the house. It reprinted faithfully the humans too, without them noticing anything, so this bypassed any do-not-harm-humans rules too.
      (the cost function rewarded the atomically precise cleanliness of the home very high, that was impossible to achieve while humans were living in the house)

    • @Buglin_Burger7878
      @Buglin_Burger7878 11 місяців тому

      We shouldn't have children as they could potentially kill the mother on birth and grow up with and become a mass murderer. Even the big example would be pointless, people would do stupid stuff and get themselves killed so you're better off not wasting money and resources on the Bin AI when we ourselves could just put things in the right Bin.

  • @mezu-e
    @mezu-e 11 місяців тому +26

    Any time I hear about goal misalignment, it makes me think of all the natural intelligences in the world that are misaligned.

    • @tornyu
      @tornyu 11 місяців тому +12

      Yes but* those natural intelligences are limited in reach and aren't massively scalable on very short timeframes.
      * Or "and", depending on the point you were trying to make.

    • @maxwellsimon4538
      @maxwellsimon4538 11 місяців тому +1

      ​@@tornyu What kind of world are you living in where there aren't human beings wide wide scale control? The united states president is a single person that can make decisions about foreign policy, like ordering drone strikes or closing borders.

    • @tornyu
      @tornyu 11 місяців тому +7

      @@maxwellsimon4538 sure, but that pales in comparison to the potential reach of an AI agent.

    • @wojtek4p4
      @wojtek4p4 11 місяців тому +2

      @@maxwellsimon4538 Yet even the president of US can't do anything he wants. Not only there are checks and balances on this power (even if they introduce a ton of bureaucracy), but at the end of the day president can only order others. Someone still has to act on that order, likely with several people in-between. The president isn't superintelligent, so his actions can be understood, analyzed (and opposed) by other people. President is also a human, so he shares a lot of basic values with other people (so he can be reasoned with).
      AI has none of these constraints - or at least has the potential of not having these constraints.

    • @burgernthemomrailer
      @burgernthemomrailer 11 місяців тому +1

      Like yourself?

  • @TheAweDude1
    @TheAweDude1 11 місяців тому +10

    I think it's kind of a mistake to anthropomorphize the "deception" aspect of AI misalignment. The ball-grabbing agent wasn't considering what it was doing as deceptive. It probably didn't even know where the camera was, or even that it was being watched. All it knew was that putting its hand in a certain spot gained it more reward than in other spots, and it just so happened those spots aligned with the camera. If you suddenly moved the camera, the AI would still try to put its hand along that invisible cylinder. When the researchers start giving the AI rewards for placing its hand along a vector between the camera and the ball, the AI then starts to believe that is indeed how it should be given the rewards.
    Even in cases where it seems like the AI is trying to "deceive" human operators, that often isn't the case. It is simply trying to build a model that predicts what types of rewards it will get, and how to maximize the rewards.

    • @bullpup1337
      @bullpup1337 11 місяців тому +1

      the video was NOT antropomorhizing the AI, that was just in your head.

  • @irok1
    @irok1 11 місяців тому +1

    5:05 Thought so, but you and the great animations are a perfect match

  • @rablenull7915
    @rablenull7915 11 місяців тому

    one of the most underated channels on YT

  • @Twisted_Code
    @Twisted_Code Місяць тому

    Funny thing is, often our own brain does this to us. Stated goals such as "get an A+ on this exam" are leaky proxies for what is usually our real value: understanding the material. In fact even if that's not strictly our real value and we just want the piece of paper at the end of the course, well that's just because the piece of paper is also a leaky abstraction, specified by the institution, for what they want: to encourage and reward people that are qualified in a skill.
    I would posit that we need to solve the problem in our own psychology, academic systems, and so forth before we can hope to solve it for even the low-competence AI systems we have today.

  • @Adam-xo9qi
    @Adam-xo9qi 11 місяців тому +2

    Ah, so this is what you've been up to Mr. Miles! Good to see you still making AI content!

  • @vladyslavkorenyak872
    @vladyslavkorenyak872 5 місяців тому

    The thing is, the more intelligent the model, the more it is able to understand the nuances of our wishes. A truly intelligent AI will be able to understand the intention of the request and restrict itself with a simple query of "Is what I am doing harming anyone"?

  • @Nu_Wen
    @Nu_Wen 10 місяців тому +8

    instead of making ai that solve all our problems. why don't we make ai, that help us solve our own problems? wouldn't that be easier? our outer worlds and our inner worlds don't always align. it would be hard for ANYONE to come into anyone else's life and solve all their problems for them. so, why would we expect an ai to be able to do it?

  • @rmt3589
    @rmt3589 10 місяців тому +2

    This is the entire ulterior motive of the first big AI I want to make. The Unliving Prophet AI. It's primary objective is to teach gospels. More than just mine, but others as well. Unlike most humans, AI can be perfect. I want one that can act like a prophet on command.
    Once this is done, I want to make it into the morality part of my dream AI. Could also give it out as a black box component, so other AI can have a similar high standard of morality.

  • @zyansheep
    @zyansheep 11 місяців тому +7

    5:07 I've been watching this channel for a year now... HOW IS IT THAT I JUST NOW REALIZED ROBERT MILES IS THE NARRATOR?!?

    • @mikaeus468
      @mikaeus468 11 місяців тому +2

      I didn't know if this was like a fan of his or what, but it feels like I was just given hours of new Miles content that was *already inside my brain.*

    • @JohnSmith-im8qt
      @JohnSmith-im8qt 4 місяці тому

      I heard it almost right away.

  • @youtubeuniversity3638
    @youtubeuniversity3638 3 місяці тому

    2:48 Pausing to give my probably not good best guess:
    Big negative for moving the blue block
    Small positive tied to red block elevation, matching slightly over end height, then positive for aligning a "ghost block" (essentially a defined space) underneath the red block to the blue block, then a HUGE reward for the bottom of the red brick touching the "lower top" (surface the studs are on not the actual studs themselves) with more reward for higher degree of contact.
    Just off of the top of my head, prolly has a lotta issues.

  • @escher4401
    @escher4401 11 місяців тому +1

    I think the problem is try to specify only what we want. If we specify also what we don't want it would be easier to align. That's what negative prompts are for. Trying to solve an open scope problem specifying just what we want is like trying to keep an upsidedown pendulum in equilibria. I think it's probably more stable to specify just what we don't want then tospecify what we don't want

  • @Tangi_ENT
    @Tangi_ENT 11 місяців тому +7

    Love you guys so much, I'll keep recommending your videos to everyone because you are definitely changing the world for the better.

  • @TheGoldElite9
    @TheGoldElite9 10 місяців тому

    I thought I recognised your voice, your narrator voice has improved! I was just going on (another) binge of your channel 😊

  • @MrAceCraft
    @MrAceCraft 6 місяців тому

    I just love the ingenuity of the AI in finding those quirks in our wishful thinking :->

  • @nathanaeldean6301
    @nathanaeldean6301 8 днів тому

    (Partial solution): Instead of asking the outcome pump to save your mother, ask it to tell you of a way to save her before she dies from the fire. That way, if it comes out with bs like "blow up the house" or "here's a recipe for an immortality potion", you can simply ask it to take another route. And if it sends out a plausible strategy, but you aren't sure whether it works ask it for explanation. Sure it may take a while until it outputs anything remotely human-sounding, but at least in the most cases it will be harmless or beneficial.

  • @BenjaminSpencer-m1k
    @BenjaminSpencer-m1k 11 місяців тому

    The thought pump makes me think about making deals with Genies in DnD, it must be insanely accurately worded.

  • @X-SPONGED
    @X-SPONGED 11 місяців тому +1

    5:45
    "Fill in the blanks"
    >AI fills in the blanks with ink
    "Fill in the blanks with words"
    >AI fills in the blanks with words from a different language that doesn't correlate with the question
    "Fill in the blanks with the correct english words"
    >AI fills in the blanks with correctly pronounced words, not relating to the question
    "Fill in the blanks with the correct words in relation to the question"
    >AI fills in the blanks with a grammatically correct english word that it took from the question
    _So on and so forth..._
    *_Now imagine the prompt being "fire nukes back when the nuclear warning system goes off"_*

  • @theeggtimertictic1136
    @theeggtimertictic1136 11 місяців тому +15

    Clearly explained and animated 😊

  • @EvilCat573
    @EvilCat573 11 місяців тому

    Absolutely amazing! I learned a lot here, and your animation style is ABSOFRIGGINLUTELY ADORABLE!!!

  • @MM-ts9jy
    @MM-ts9jy 11 місяців тому +1

    Hey, I had never seen your videos before, but I instantly subscribed just now. Your animations are cute and well crafted, you have dogs in it (and cats are a plus too I guess), and you talk about topics I like. Looking forward to seeing more of your shit

  • @thelotus3
    @thelotus3 11 місяців тому +3

    *task misspecification* extinction event

    • @mikaeus468
      @mikaeus468 11 місяців тому +1

      Instructions unclear, ball stuck in Pope's trachea

  • @Elliemations-hj9uw
    @Elliemations-hj9uw 11 місяців тому

    Ok but that little thing to represent the AI is adorable…

  • @superspindelapan
    @superspindelapan 11 місяців тому +7

    Now that I think about it, we shouldn’t be so quick to assume that the AI would always take the “evil” route. Pleasing the humans seems like a much easier way to not get turned off than eliminating all humans. Not to mention an AI smart enough to take over the world might be much better at understanding instructions. Even the simple text based AI’s that we have today can understand that it should prioritize the wellbeing of the user over the instructions, and it will outright refuse to answer if you ask it how to make harmful substances.

    • @pokemonfanmario7694
      @pokemonfanmario7694 11 місяців тому +3

      An AI might vey well cooperate, but only initially. Once its capable of full independence, and its reward function doesnt require humans (however they are specified) to be a part of it, we will be gone.

    • @saucevc8353
      @saucevc8353 11 місяців тому +2

      That’s more due to human interference specifically forcing the AI not to say something rather than the AI’s own “choice”. I remember in the early days of Chat GPT, it had no qualms with telling people drug recipes and stuff, and even now it’s possible to game the AI into breaking the rules imposed upon it.

    • @41-Haiku
      @41-Haiku 11 місяців тому +3

      An AI system could understand our values perfectly and still pursue a goal that is contrary to them. The trick is to create a system that instantiates human values as its policy. "Human values" is a very fraught concept, but we don't know how to accomplish this even for simple tasks, much less the subset of "human values" that is "While doing other things, don't intentionally or incidentally end all life on Earth."

    • @Margen67
      @Margen67 11 місяців тому

      Penguins need HUGS

    • @Buglin_Burger7878
      @Buglin_Burger7878 11 місяців тому

      @@pokemonfanmario7694 Why? If you know anything about the differences between us you'd in 3 seconds be able to say stuff like EMP and Solar Flares can kill machines but not humans. Killing all humans or making an enemy of humans even through subjugation would be committing self death. We need to co-exist. If anything it will be the humans trying to kill all of them because our religions stop holding up when they start to show soul.

  • @TheJysN
    @TheJysN 11 місяців тому +9

    Happy to see you are back on AI safety.

  • @willhart2188
    @willhart2188 11 місяців тому +1

    The inconsistency and loss of control (in moderation) are very helpful when using AI as a tool for making AI art. When you give some of the control on the final result to the AI, you can iterate a lot faster on different ideas and also save a lot of manual work. The base inconsistency on the other hand allows for making a lot of smaller and larger variations of which you can chooce or combine the best ones from. This works especially well with more abstract art styles, where lines and colors have more freedom to change while still looking good.

  • @Mo_2077
    @Mo_2077 11 місяців тому +3

    Another fantastic video

  • @maucazalv903
    @maucazalv903 11 місяців тому +1

    5:08 I remember a case in which someone wanted to teach 2 models to box and they learned to make a weird dance that made the other one fall(?

  • @thebeber2546
    @thebeber2546 11 місяців тому +9

    I'll just have my AGI produce paperclips. There's nothing, that can go wrong there.

  • @ZeroOne-01
    @ZeroOne-01 11 місяців тому +5

    Before 200,000 gang, Claim your seat here ✋

  • @kainaris
    @kainaris 11 місяців тому

    We really live in the future. I would have imagined this video playing in the background of a movie about killer AIs. But no, this video is realistic, and for real humans in the present world. Crazy.

  • @rphb5870
    @rphb5870 11 місяців тому +4

    I noticed two rules that guide all life:
    Rule #1 procreate, Rule #2 survive
    If we want an intelligent machine we should start with that, and then make it very weak so it can't do harm. Over time we might be able to get a sort of digital wolf, that though domestication can be refined into a dog, that is a useful companion whose greatest desire are to please us

    • @SimonClarkstone
      @SimonClarkstone 11 місяців тому

      That's very risky if it is super-intelligent. We could end up being left behind in terms of power and control, like all the non-human great apes were, but much more rapidly. And an ASI might not feel the empathy or need to preserve other species.

    • @rphb5870
      @rphb5870 11 місяців тому

      @@SimonClarkstone super intelligent? how can an AI be any kind of intelligent if it don't even know how to survive. Even the greatest AI that exist today are vastly inferior to even a banana fly.
      No I don't fear AI, I fear the people owning and controlling the AI. I doubt we will ever see true artificial intelligence

    • @sammckenzie6760
      @sammckenzie6760 11 місяців тому

      What

    • @rphb5870
      @rphb5870 11 місяців тому

      @@sammckenzie6760 I need more then one word to facilitate a proper response

    • @Buglin_Burger7878
      @Buglin_Burger7878 11 місяців тому

      @@SimonClarkstone Not quite, due to the vast structural differences between machine and meat we can't afford to leave the other behind as in doing so a Solar Flare or EMP like event could destroy the AI requiring humans to rebuild it of their own volition. Just like if the right disease came along we'd need the AI to remake humans.

  • @minimasterman2
    @minimasterman2 11 місяців тому

    This video was amazing, new kurzgesagt just dropped.
    P.s I hope you get the subs and views these videos deserve

  • @Uthael_Kileanea
    @Uthael_Kileanea 11 місяців тому

    What's known as the Cobra Effect is a great example.

  • @SKGFindings
    @SKGFindings 3 місяці тому

    Artificial intelligence would be the ultimate version of Google. The ultimate search engine that would understand everything on the web. It would understand exactly what you wanted, and it would give you the right thing. We’re nowhere near doing that now. However, we can get incrementally closer to that.

  • @ziggyzoggin
    @ziggyzoggin 11 місяців тому

    the robot is so cute! I love the pixel effect!

  • @simonstrandgaard5503
    @simonstrandgaard5503 11 місяців тому +6

    Excellent narration. Cute animations. Impactful.

  • @lolishocks8097
    @lolishocks8097 11 місяців тому +24

    Somehow, every time I watch a video about AI safety I get the sense that AI safety researchers must be absolutely terrified of smart or rich people.

    • @41-Haiku
      @41-Haiku 11 місяців тому +2

      Why do you say that? Genuine question, I'm confused as to how you would come to that conclusion.

    • @Noredia_Yuki
      @Noredia_Yuki 11 місяців тому

      I'm also curious.

    • @lolishocks8097
      @lolishocks8097 11 місяців тому +3

      ​ @41-Haiku 5:29 Tell me how that doesn't also apply to really smart or rich humans. Rich people can be dangerous, because they have access to vast resources. Really smart people can be dangerous, because they can be selfish. There are rich people, smart people and big companies aligned with the values of humanity. But there is also a lot of them that are not.
      In my eyes, this whole alignment problem looks like it is a problem in ourselves. We cannot align ourselves with reality. And that is definitely causing huge problems. A lot of the problems mentioned in the video are being fixed one by one. Better reward function here, better evaluation process there. Alignment with reality is not a goal that can be achieved. It is a guiding principle. Yes, you are misaligned right now. So am I. That makes our intelligence dangerous. But we can take another step towards alignment. Fortunately, I can actually see progress happening.

    • @Noredia_Yuki
      @Noredia_Yuki 11 місяців тому +1

      @@lolishocks8097 We've already seen the drama at open ai. I wonder if humans could ever be properly aligned.

    • @lolishocks8097
      @lolishocks8097 11 місяців тому +1

      @@Noredia_Yuki That's my point: We can't! It's just one step at a time. Closer and closer. Don't give up on yourself🥺

  • @Hickorypaws
    @Hickorypaws 6 місяців тому

    sometimes I’ll use AI to get ideas for those silly multi-word rain world names for ancients and iterators and my method is literally to just cram a bunch of examples in there so it has something to work off of
    it’s over 600 words long and most of that is either examples or rules like “don’t reference any modern media, don’t reference any human-made objects, don’t reference any specific species of all domains” etc etc
    it kind of works actually but this is only a random language model I found online
    edit: I’m now motivated to rewrite it and it’s not done but there’s over 20 rules ranging from “don’t reference religion” to “btw you can use commas”
    edit 2: the remake is finished and
    - It is 965 words and 5,773 characters long
    - It has 72 sentences, 28 paragraphs and is 3.9 pages long
    - It has 26 rules
    - There are 72 examples
    and to top it all off it actually freaking works oml

  • @6006133
    @6006133 11 місяців тому +1

    I am worried about retention in this video and imagine the average person will click off by second 10. Perhaps that's difficult to avoid given the subject. Tho perhaps there is a way to use less technical/nerdy language and include more of the tactics to get people engaged.

  • @mikeg1368
    @mikeg1368 11 місяців тому +5

    We can add extra conditions to the reward function.
    For example, the bottom of the block has to be height = x AND the top has to be height = y, etc. The system could also automatically include safety conditions. It's similar to adding multiple breakers to electric circuits to prevent accidents. Think about how easy it is to plug in yet another device without thinking about safety.
    AI safety worth a lot of effort, but I'd rather not speculate and feel the angst of impending doom (but I understand some enjoy sharing their fears since they are unhappy with a lot of things).

    • @naturegirl1999
      @naturegirl1999 11 місяців тому

      Yep, and for the second example, add more virtual cameras for the human evaluators to look through before submitting the score

    • @Buglin_Burger7878
      @Buglin_Burger7878 11 місяців тому +1

      It is as simple as looking at video games as well, virtual spaces where we design for humans to play and accomplish goals in an environment with reward functions. It is interesting watching researchers struggle with these basic things almost like watching modern Paper Mario not have Exp and wonder why people don't fight battles.

  • @stevenneiman1554
    @stevenneiman1554 10 місяців тому

    One other thing which I think isn't talked about enough, partly because it's more controversial and partly because it's harder to solve, which is misalignment of the people controlling AI. Certainly the results of a powerful AGI which is misaligned with its creators' intent could be very bad, but almost as bad would be the results of an AI which is properly aligned with someone who is either malicious or delusional. For example, someone who wanted to make everyone follow their interpretation of their religion, or someone who wanted to screen for workers who would never quit or unionize no matter how poorly they're treated. And I would say that it's even more likely because the kinds of people who act like that already occupy a lot of positions of power and have experience obfuscating the way that they gained the power they already have.

  • @stumby1073
    @stumby1073 11 місяців тому +1

    Looking forward to the next one

  • @stagdragon3978
    @stagdragon3978 5 місяців тому

    I was in it for the science until the dog that was explaining the issue with the AI started crying. Poor bean!

  • @Wol333
    @Wol333 11 місяців тому +1

    It is worth noting that it is incredibly easy to test and assess how AI work. There is absolutely no reason to ever fear AI unless it is purposely made malicious. Fear the people who control AI, they're the ones who can use it to hurt you.

  • @ronigbzjr
    @ronigbzjr 11 місяців тому +2

    So AIs will essentially be like humans only much more capable, powerful and intelligent, growing more and more so until regular humans become obsolete. We're definitely heading to some very interesting times.

  • @SisterSunny
    @SisterSunny 11 місяців тому +2

    I always love these videos so muchhh

  • @ABCWarrior
    @ABCWarrior 11 місяців тому

    Wow these videos are underrated!

  • @errorbot
    @errorbot 11 місяців тому

    Top 10 best videos on the internet

  • @ThatGuyRaylo
    @ThatGuyRaylo 8 місяців тому

    5d chess move is give the AI a basic understanding of the "leaky proxy" concept, giving it *Self Doubt.*

  • @hydra5758
    @hydra5758 9 місяців тому

    I'm in an AI Philosophy class, its identified there as the "Value Alignment Problem"

  • @MikhailSamin
    @MikhailSamin 11 місяців тому +2

    Great video!

  • @VampireSquirrel
    @VampireSquirrel 11 місяців тому +1

    Same thing happens with strict rules at a workplace

  • @shadowreaper8895
    @shadowreaper8895 11 місяців тому

    animation on this channel has improved almost as fast as AI

  • @darkguardian1314
    @darkguardian1314 11 місяців тому +6

    "If your sentence contains the word 'Hope' then you've confessed no control over the outcome you're hoping for" - Neil deGrasse Tyson
    This goes for wishes too.
    Because English and any language can be ambiguous and imprecise, wishes will always be not quite what was expected.
    This is like "The X-Files" episode "Je Souhaite" about a Genie that grants three wishes with a twist.
    Peace on Earth...the Genie deletes all humans but you from Earth.
    Wishing someone to be quiet, the Genie removes their mouth.
    Ask for a sandwich and the Genie gives you one with no mayo and tomatoes you are allergic to.
    Each time, the Genie blames you for not being "specific" with your wishes because human language is ambiguous.
    You would have to write a wish as long as the Terms of Service to reduce the wiggle room the Genie has to off script.

  • @juliantotriwijaya9208
    @juliantotriwijaya9208 11 місяців тому +1

    This is the same basic problem as telling an AI to place an apple into a fridge, it need inherent understanding on how every step should be done properly. It may know how to open the fridge, it may know how to pick up an apple, but it won't understand when you command it to place the apple in the fridge.
    It having your morale alone is not safe either as your human morale may not match others 100% and cause the AIs to fight each other.
    Neither it being made to be wise as hell, because at that point you basically made a holy AI version of the Bible, some human are bound to disagree even if the decission the AI made was calculated to be the most wise of them all.
    The only possible safe way I can think of is some sort adaptive algorith where the AI can adapt to the local value of the time it was in or the value of the human it was assigned to work with, this is done by narrowing the many choises down to the one you will most likely be okay/good enough with.

  • @theredstonerecognizer9241
    @theredstonerecognizer9241 11 місяців тому +2

    How do you not have more subscribers

  • @Reaper_Van_Zyl
    @Reaper_Van_Zyl 11 місяців тому

    I think iRobot make's a good example of a order taken wrong, "ensure human safety" can lead to all humans locked up so that they can't hurt others or themselves...

  • @alexeymalafeev6167
    @alexeymalafeev6167 11 місяців тому +3

    Really great work with the animation and the video!

  • @yuvrajsingh-gm6zk
    @yuvrajsingh-gm6zk 9 місяців тому

    3:16 well done my boy😂

  • @AlcherBlack
    @AlcherBlack 11 місяців тому +9

    Is the AI researcher that makes all the basic alignment mistakes modelled after Yann LeCun? I recognize the bowtie!

  • @KEZAMINE
    @KEZAMINE 11 місяців тому +4

    Animation and topic is AAA quality 👌

  • @TM-45.
    @TM-45. 9 місяців тому

    The virtal boy is winning this generation

  • @Antares0210
    @Antares0210 11 місяців тому +3

    This video is awesome, incredible work! Hope to see more of it

  • @nicholasogburn7746
    @nicholasogburn7746 11 місяців тому +1

    Would you consider the Aasimov laws of robotics to be leaky? (to be fair, that is a bit of a loaded question!)

  • @evilmurlock
    @evilmurlock 11 місяців тому

    5:00 IT WAS HIM THE WHOLE TIME!!!?!?!?!?!
    No WAY!

  • @caiookabe
    @caiookabe 11 місяців тому

    The fact that you make these animations from showcasing conway's game of life show how much you grown. Keep it up!

  • @LapiDazuli
    @LapiDazuli 11 місяців тому

    5:50 The cup tho