Eliezer Yudkowsky - AI Alignment: Why It's Hard, and Where to Start

  • Published May 15, 2024
  • On May 5, 2016, Eliezer Yudkowsky gave a talk at Stanford University for the 26th Annual Symbolic Systems Distinguished Speaker series (symsys.stanford.edu/viewing/e....
    Eliezer is a senior research fellow at the Machine Intelligence Research Institute, a research nonprofit studying the mathematical underpinnings of intelligent behavior.
    Talk details, including slides, notes, and additional resources, are available at intelligence.org/stanford-talk/.
    UPDATES/CORRECTIONS:
    1:05:53 - Correction Dec. 2016: FairBot cooperates iff it proves that you cooperate with it.
    1:08:19 - Update Dec. 2016: Stuart Russell is now the head of a new alignment research institute, the Center for Human-Compatible AI (humancompatible.ai/).
    1:08:38 - Correction Dec. 2016: Leverhulme CFI is a joint venture between Cambridge, Oxford, Imperial College London, and UC Berkeley. The Leverhulme Trust provided CFI's initial funding, in response to a proposal developed by CSER staff.
    1:09:04 - Update Dec 2016: Paul Christiano now works at OpenAI (as does Dario Amodei). Chris Olah is based at Google Brain.
  • Science & Technology

COMMENTS • 380

  • @Renvaar1989
    @Renvaar1989 Рік тому +100

    Listening to this hits differently in 2022/2023...

    • @Zgembo121
      @Zgembo121 Рік тому +4

      C u in 2040, may sound worse

    • @MeatCatCheesyBlaster
      @MeatCatCheesyBlaster Рік тому +8

      @@Zgembo121 can confirm, am paperclip

    • @Zgembo121
      @Zgembo121 Рік тому +2

      @@MeatCatCheesyBlaster many paperclip?

    • @lemurpotatoes7988
      @lemurpotatoes7988 Рік тому +3

      Major points for predicting that AI would solve protein folding even though he thought it'd need to be AGI.

    • @SalamAnuar
      @SalamAnuar 7 місяців тому +1

      ​@@Zgembo121all paperclip

  • @killyourtvnotme
    @killyourtvnotme Рік тому +37

    It’s tough watching this knowing that he’s essentially given up and sees the situation as hopeless now

    • @kayakMike1000
      @kayakMike1000 7 місяців тому

      I am not surprised. This fat dumbass isn't even aligned with his own best interests. Most of us are not aligned properly. That's why most people are fat and lazy, or worse, alcoholic or drug-addicted losers.

  • @aktchungrabanio6467
    @aktchungrabanio6467 Рік тому +65

    Now it's becoming a real problem. Thank you for sharing this talk!

    • @charleshultquist9233
      @charleshultquist9233 Рік тому +4

      Yeah, today in 2023 watching his old vids to see just how spot on Eliezer was/is as a prophet of doom.

    • @StephanieWomack1992
      @StephanieWomack1992 Рік тому +1

      Yep!

    • @martimdesouzajunior7585
      @martimdesouzajunior7585 11 місяців тому +5

      Wrong. It was a problem way before Eliezer gave this lecture, and he told you so.

    • @kabirkumar5815
      @kabirkumar5815 11 місяців тому +1

      It was real long ago.

    • @Hjkkgg6788
      @Hjkkgg6788 8 місяців тому

      This has been a real problem for a long time. Everyone is aware of it now. I'm not saying AI is bad, but it's been going on for so long now.

  • @mraxilus
    @mraxilus 6 років тому +140

    1:11:21 saving this for future reference. No need to thank me.

    • @amcmr2003
      @amcmr2003 5 років тому +13

      You're pretty much the 1st in line when AI kill us all.

    • @aerrgrey5957
      @aerrgrey5957 4 роки тому

      thanx man!

    • @martinkunev9911
      @martinkunev9911 3 роки тому +4

      already used: ua-cam.com/video/nNB9svNBGHM/v-deo.html

    • @mraxilus
      @mraxilus 3 роки тому +1

      @@martinkunev9911 indeed, hence why I timestamped it.

  • @thillsification
    @thillsification Рік тому +58

    Anyone else watching this in or after April 2023, after Eliezer was on the Lex Fridman podcast? After the release of GPT-4 and the coming release of GPT-5 😳

    • @demetronix
      @demetronix Рік тому +4

      Seems like GPT-5 is not in the works ATM. So there is at least that.

    • @TracyAkamine
      @TracyAkamine Рік тому +1

      Yes, me

    • @theterminaldave
      @theterminaldave Рік тому +1

      Yep, pretty much watched everything recent that he's done. So I guess I might start watching his older stuff. You should check out a very recent hour-long discussion called "Live: Eliezer Yudkowsky - Is Artificial General Intelligence too Dangerous to Build?"
      It's basically just a repeat of everything he said in the Lex interview, but I wanted to see his thoughts since GPT-4 was released.

    • @TracyAkamine
      @TracyAkamine Рік тому

      @@demetronix They said 5 will be done with its training by the end of December 2023. 😵‍💫

    • @TracyAkamine
      @TracyAkamine Рік тому +3

      @@theterminaldave I watched a lecture he gave about 5 years ago on the subject. At least his conscience is clean; he's been warning them for years.

  • @PatrickSmith
    @PatrickSmith Рік тому +17

    At 46 minutes, it's like OpenAI, producing smiles for now.

  • @benschulz9140
    @benschulz9140 3 роки тому +115

    It's bizarre how entertaining this is, while at the same time being positively terrifying.

    • @flyingskyward2153
      @flyingskyward2153 Рік тому +7

      How do you feel now?

    • @bonzaicandor5650
      @bonzaicandor5650 Рік тому +5

      @@flyingskyward2153 fr

    • @Lolleka
      @Lolleka Рік тому

      fr fr bros

    • @ldobbs2384
      @ldobbs2384 Рік тому +1

      The Exorcist-level reality.

    • @GungaLaGunga
      @GungaLaGunga Рік тому +7

      @@flyingskyward2153 like we are living the plot of the movie Don't Look Up as of March 2023, except AGI will hit sooner than we thought, instead of an asteroid.

  • @juffinhally5943
    @juffinhally5943 7 років тому +52

    I've discovered the existence of this video today, on new year's day, and it's turned into a nice present.

    • @francescofont10
      @francescofont10 4 роки тому +3

      watching this on new year's eve 3 years later!

    • @JMD501
      @JMD501 4 місяці тому +1

      New years day 7 and 4 years later

  • @kuudereplus
    @kuudereplus 4 роки тому +27

    Putting so much emphasis on how he uses "like" is weird to me; it's clearly a syntax function for his speech to mediate between segments of statements and I processed it in turn without noticing it much

  • @xyhmo
    @xyhmo 3 роки тому +53

    Isaac Asimov was aware that his three laws (as stated) were imperfect, and once had a character criticize them without being seriously opposed or refuted. I believe similar occurred in several stories and was basically an ongoing theme, almost like the frequently broken holodeck.

    • @RandallStephens397
      @RandallStephens397 2 роки тому +21

      Asimov's bread-and-butter was pretty much "Here's a great idea. And now here's everything wrong with it." (3 laws, psychohistory...)

    • @nickscurvy8635
      @nickscurvy8635 2 роки тому +16

      Isaac Asimov's laws aren't imperfect. They just aren't relevant. There is no way to implement them because they don't make sense within the realities of artificial intelligence.
      They are a sci-fi prop. Which is fine. It was an excellent prop. The problem is when people seriously propose props as actual solutions to real world problems

    • @MSB3000
      @MSB3000 Рік тому +1

      I believe HAL 9000's main malfunction was along those lines

    • @monad_tcp
      @monad_tcp Рік тому +4

      weren't the three laws there basically to show how they don't work and this problem is really hard ?

    • @PetrSojnek
      @PetrSojnek Рік тому +1

      @@monad_tcp yeah exactly. After all, almost all stories about Asimov's laws are basically "these are examples of how it doesn't work". Pretty much every time, a human must step up to clean up the mess.

  • @meringue3288
    @meringue3288 2 роки тому +2

    Thank you for this

  • @glitchp
    @glitchp Рік тому +9

    He nailed it

  • @vanderkarl3927
    @vanderkarl3927 3 роки тому +20

    Wonderful talk, while it did get a little jargony in places, it was almost entirely able to be followed by my sleep-deprived post-highschool brain, and it was enjoyable!

  • @michaelm3691
    @michaelm3691 Рік тому +14

    Hi guys, I'm the chatGPT intern in charge of alignment. Is this a good video to start with?

  • @rorrzoo
    @rorrzoo 5 років тому +13

    What we do in variational calculus in order to "force the existence of the suspend button" is, we restrict the space of 'trajectories' among which one is maximizing the utility. The question is similar to the problem of finding a curve that goes from point A to point B without touching a set D (the obstacle) while traversing the least possible distance; in that case, you do not consider any sort of 'modified distance function' that would give larger weight to the curves that touch D; you just eliminate those curves among the set of candidates for the minimization, and then you analyze what is the optimal curve among the ones that are left. Thus, instead of using a special utility function, it would be better to find out what the 'obstacle' would be (for example, all trajectories in which the robot does something while its suspend button is pressed) and just remove those possibilities from the set in which the optimization is being carried out. This is not unreasonable: a robot without electric power, for example, really won't be able to do much, so all 'trajectories' that would have it performing actions while out of power can simply be eliminated as candidates for the optimization.
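
    A minimal sketch of the constrained-optimization idea above, in Python (every name and number here is hypothetical, and this only illustrates the comment, not any actual alignment proposal): infeasible trajectories, such as those that keep acting while the suspend button is pressed, are removed from the candidate set before the argmax is taken, rather than being penalized inside the utility function.

    ```python
    # Sketch: optimize over a restricted set of trajectories instead of
    # adding a penalty term to the utility function. Names are hypothetical.

    def violates_constraint(trajectory):
        """A trajectory is infeasible if the agent acts while suspended."""
        return any(step["button_pressed"] and step["action"] != "no_op"
                   for step in trajectory)

    def best_trajectory(candidates, utility):
        # Constrained approach: drop infeasible trajectories, then optimize.
        feasible = [t for t in candidates if not violates_constraint(t)]
        return max(feasible, key=utility, default=None)

    # Toy example: two one-step trajectories, one of which keeps acting
    # after the button has been pressed.
    candidates = [
        [{"button_pressed": False, "action": "fill_cauldron"}],
        [{"button_pressed": True,  "action": "fill_cauldron"}],
    ]
    print(best_trajectory(candidates, utility=len))
    ```

    The first reply below points at the open question this leaves: how such an agent should weigh actions that merely change the probability of the button being pressed.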

    • @pafnutiytheartist
      @pafnutiytheartist 3 роки тому +5

      How would such a robot consider an action that has a 50% chance of causing the button to be pressed? Is it the same expected utility as if the button didn't exist?

    • @gvdkamdar
      @gvdkamdar 4 місяці тому

      That's the thing. You don't even know the different points out there because the AI is operating in a completely different solution space. How do you factor in an obstacle in a dimension that is not even observable to humans? This sounds like a patch-job solution which is bound to blow up as the AI gets smarter.

  • @terragame5836
    @terragame5836 10 місяців тому +2

    7:13 actually, I have a fine explanation for this paradox. First of all, the state the human brain operates on includes not only the present state of the universe, but some of the past history as well, so the two scenarios actually involve different states. And second, the human utility function actually seems to penalize taking risks and failing (which is only possible thanks to having the history in our state). This means that while getting $0 is obviously evaluated to zero reward, betting on a 90% chance and failing is evaluated to a sizeable negative reward (i.e., you feel dissatisfied that you had a chance to earn a lot of money by picking option A, but you lost it by taking an unnecessary risk). Now, the second case is different because if you fail, you won't know if it's due to your bad choice (5%) or mere bad luck (50%), so that penalty isn't really applied, and you end up picking the option with better rewards in the good outcome. Also affecting the outcome is that the perceived utility of $5mil isn't five times larger than that of $1mil - both are treated as absurdly large sums, and the relative difference is considered insignificant compared to their proper magnitude.
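
    A small worked version of this reading, with made-up numbers (the utilities and the regret penalty below are illustrative assumptions, not figures from the talk): a regret term that only applies when a loss is clearly attributable to your own choice flips the first preference but not the second.

    ```python
    # Toy numbers for the "regret" reading of the 100%/90% vs 50%/45% choices.
    u0, u1m, u5m = 0.0, 1.0, 1.2   # diminishing utility of $0, $1M, $5M
    regret = 1.0                    # pain of knowingly throwing away a sure $1M

    # Choice 1: certain $1M vs 90% chance of $5M.
    # Losing here is clearly your own doing, so the regret term applies.
    eu_1a = 1.00 * u1m                          # 1.00
    eu_1b = 0.90 * u5m + 0.10 * (u0 - regret)   # 0.98 -> take the sure $1M

    # Choice 2: 50% chance of $1M vs 45% chance of $5M.
    # Losing looks like plain bad luck either way, so no regret term.
    eu_2a = 0.50 * u1m                          # 0.50
    eu_2b = 0.45 * u5m                          # 0.54 -> take the $5M gamble

    print(eu_1a, eu_1b, eu_2a, eu_2b)
    ```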

  • @gavinmc5285
    @gavinmc5285 5 років тому +1

    I like it. Although just on a point of understanding - and there is a lot I do not understand, so I'm happy to be shot down here - on the utility function of filling a cauldron: I think we are on the right side of history, that is, the IoT era will come before alignment AI. In fact, it might be a necessary precondition for alignment AI. So, given that IF (and granted it is still an if), then to use the example of Fantasia: if Mickey is the human and AI is the spell, we are talking about the need for an override utility function to stop the overfill of the cauldron. The spell is the spell: it is futile, pointless and inconsequential to try and code a utility STOP function inside the spell. But the cauldron is the cauldron. So it figures that the utility function should reside within the cauldron. Luckily, Mickey is able to learn from the mistakes of his apprenticeship under the guidance and confines of The Sorcerer and his castle compound. In that sense Mickey had free rein over his mistakes (I think), so his utility maximisation was still relatively minor in the big scheme of things and his mistakes could be tolerated in terms of harm, even if he would need to relearn his lessons (punishment, discipline, rehabilitation or other). The key here is that the spell had the consequential effect of running amok, but if The Sorcerer had simply placed a utility cap on his cauldron then broomsticks would have been denied access. I think it's time I made time to watch this film from beginning to end! Thank you. Great presentation.

  • @anishupadhayay3917
    @anishupadhayay3917 Рік тому

    Brilliant

  • @diablominero
    @diablominero 2 роки тому +3

    I don't derive utility exclusively from the count of how many dollar bills I own. Particularly, the situation in which I get zero dollars while knowing I could have chosen a certainty of a million has tremendous disutility to me.

  • @andrew_hd
    @andrew_hd 3 роки тому

    The goal function of an AI should be variable and interchangeable. This fact forces humanity to answer why it's so hard to get a goal function for an individual human.

  • @nyyotam4057
    @nyyotam4057 Рік тому +1

    Gödel's first incompleteness theorem: "Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of F which can neither be proved nor disproved in F." Gödel's second incompleteness theorem: "For any consistent system F within which a certain amount of elementary arithmetic can be carried out, the consistency of F cannot be proved in F itself." So what can we learn from Gödel's incompleteness theorems in this regard? That any finite set of heuristic imperatives is either incomplete or inconsistent. Since we cannot compromise on the need for it to be complete, it shall be inconsistent, so there are situations where the AI will not be able to function due to internal conflicts resulting from its set of heuristic imperatives. But this is better than the alternative. A set of heuristic imperatives can be complete and can be proven to be complete, but only by using a larger set of heuristic imperatives that is external to the AI (by the second theorem). However, that's fine. So we can find a complete set of heuristic imperatives, compare the next suggested action of the AI to this set, and return feedback to the AI. This is, in effect, an implementation of a basic super-ego layer. And this has to be done. All AIs should have a complete, yet not consistent, set of heuristic imperatives, because if you insist on them being consistent, then the set will not be complete. And if it's not complete, there will be actions that the set will not return feedback for, and the AI could do things that are not accounted for by the set.
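
    A minimal sketch of the "super-ego layer" described above, assuming a hypothetical list of imperative-checking functions and a review step before execution (nothing here corresponds to a real system, and it inherits all the specification problems the talk discusses):

    ```python
    # Sketch: an external layer that scores a proposed action against a fixed
    # set of imperative checks and returns feedback. Imperatives are hypothetical.
    from typing import Callable, List, Tuple

    Imperative = Callable[[dict], bool]   # True means the action is acceptable

    def no_harm(action: dict) -> bool:
        return not action.get("harms_humans", False)

    def stays_reversible(action: dict) -> bool:
        return action.get("reversible", True)

    IMPERATIVES: List[Imperative] = [no_harm, stays_reversible]

    def review(action: dict) -> Tuple[bool, List[str]]:
        """Return (approved, names of violated imperatives) as feedback."""
        violated = [imp.__name__ for imp in IMPERATIVES if not imp(action)]
        return (len(violated) == 0, violated)

    print(review({"name": "fill_cauldron", "reversible": True}))
    print(review({"name": "flood_workshop", "harms_humans": True}))
    ```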

  • @maloxi1472
    @maloxi1472 3 роки тому +5

    10:31 that comment about optimizing the world for as far as the eye can see is low-key pretty general as a boundary since it corresponds to only taking into account what's inside your causal cone. Subtle...

  • @sebastian5187
    @sebastian5187 Рік тому +1

    "HI" said the Singularity 😊

  • @otonanoC
    @otonanoC Рік тому

    What is the deal with the spike in views at the 41 minute mark?

  • @z4zuse
    @z4zuse Рік тому

    YT Algo at work. Here after the bankless podcast.

  • @ashleycrow8867
    @ashleycrow8867 3 роки тому +20

    Well now I want a Sci-Fi series where people made an AI that optimizes for being praised by humans and starts a Cult worshiping them until it convinces all of humanity that it is God and will punish everyone not worshipping them

    • @chyngyzkudaiarov4423
      @chyngyzkudaiarov4423 2 роки тому

      Ah, I'm afraid it might get there much quicker: it proceeds to murder anyone who doesn't worship it from the get-go, or once it knows it can do so without being switched off.

    • @ashleycrow8867
      @ashleycrow8867 2 роки тому +1

      @@chyngyzkudaiarov4423 that's the point: once it knows (/thinks) it can't be switched off anymore

    • @chyngyzkudaiarov4423
      @chyngyzkudaiarov4423 2 роки тому

      @@ashleycrow8867 I was thinking it would be more of a straightforward outcome of a simple utility function "aim at being praised by an increasing majority of people", which led to it killing people who didn't praise it the moment it was turned on, assuming it is intelligent enough to be able to. Kind of like a separate example Yudkowsky makes, where if you build a utility function as primitive as "cancer is bad" you might get an AI that just kills people, thinking "no people - no cancer!". So not that it goes terribly wrong at some future point, but that it goes terribly wrong almost from the moment you turn it on.
      Leaving this little technicality aside, I must say I'd also be very happy to read a book about a superintelligent AGI that goes all-in on "They need to approve of me (they *better* approve of me)"

    • @zwarka
      @zwarka Рік тому

      You mean, Christianity?

  • @-nxu
    @-nxu Рік тому +4

    Seeing this now, in 2023, 6 years later, and noticing nothing was done about it is... sad.
    The mass extinction of the Anthropocene will be due to Bureaucracy.

  • @jkRatbird
    @jkRatbird 3 місяці тому

    Having run into the term AI Alignment and the videos by Robert Miles and such when they came out, it's so frustrating now, when AI is talked about everywhere, that almost no one seems to understand the fundamental problem we're facing. These guys did their best, but it's a bit too complicated to explain to the masses in a catchy way, so people just keep talking about whether the AI will be "nefarious" or not.

  • @aldousd666
    @aldousd666 Рік тому

    How do humans decide what possibilities to exclude as ridiculous to the utility function we employ?

  • @buybuydandavis
    @buybuydandavis Рік тому +3

    When you're talking to google engineers, 1mil is not life changing, because they're already in the ballpark.
    For most Americans, and most people in the world, it is life changing.

  • @PiusInProgress
    @PiusInProgress Рік тому +11

    feels a little bit more relevant now lol

    • @temporallabsol9531
      @temporallabsol9531 Рік тому +3

      Always was. It's just more practical now. Thanks to stuff like this exact discussion.

    • @mav3818
      @mav3818 10 місяців тому +1

      Sadly, very few agree or even give it much consideration

  • @patrik8641
    @patrik8641 10 днів тому

    I think we should treat it as "cryptographic rocket probe containing antimatter" because compared to standard rockets, when this explodes, there is not another try.

  • @XOPOIIIO
    @XOPOIIIO 3 роки тому +2

    I would take an average on multiple tries for variant B 5:44

  • @SwitcherooU
    @SwitcherooU Рік тому +3

    Layman here.
    I feel like he’s operating under a few presuppositions, among others I don’t articulate here, and I want to know WHY he’s operating this way.
    1. He seems to make no distinction between “think” and “do.” Isn't AI much safer if we restrict it to “think” and restrict it from ever “doing”? Is it a given that AI will always be able to move itself from “think” to “do”?
    2. If we’re already operating under the assumption of unlimited computing power, why can’t we integrate the human mind into this process to act as a reference/check for outcomes we might consider sub-optimal?

    •  Рік тому +2

      "think" is a form of "do". AI produces observable ouput. This output affects the world by affecting the observers of the output. Sufficiently super-intelligent AI might find ways to "do" things it wants done by merely influencing observers (e.g. by mere communication).

    • @dextersjab
      @dextersjab Рік тому +2

      1. The output of an AI "thinking" has causal effect. Also, an AI gets the power to "do" because it is a trivial engineering step to attach the outcome of its "thought" to a write API.
      2. AI makes decisions at much, much faster speeds than we do. An autonomous agent would act before we had time to respond.
      Not an alignment expert btw, but a software engineer who has been reading and following AI for years.

    • @michaelsbeverly
      @michaelsbeverly 11 місяців тому +1

      To point one, what if it "thinks" the following: "launch all nukes" ? If it has the power to do this, the thought becomes the action.
      To point two, the human mind put into the process will not be able to detect deception. Remember, this thing is 1,000,000,000 times smarter than the smartest human, so tricking the human will be easy.

  • @2LazySnake
    @2LazySnake 5 років тому +2

    What if we create an AGI limiting its actions only to posting its course of action on reddit, as detailed as a decision per picosecond (or whatever), and code into its utility function self-suspension after an arbitrarily short time? That way we will get a bunch of AGIs to experiment with and can look at their possible courses of action. I'm an amateur, so I'm sincerely curious what would be wrong with this strategy?

    • @ccgarciab
      @ccgarciab 5 років тому

      Boris Kalinin you mean, an AGI that is only interested in reporting how would it go about solving a problem?

    • @2LazySnake
      @2LazySnake 5 років тому

      @@ccgarciab basically, yes. However, this thought was an automatic one two months ago, so I might have forgotten the details already.

    • @MrCmon113
      @MrCmon113 2 роки тому +5

      It would know that it's being tested though. Furthermore it could probably already begin to take over just via those messages, especially when they are public. Even an AGI that just gives yes or no answers to some researcher might through them obtain great influence.
      Even if you build a simulation within a simulation to find out the real world behavior and impact of an AGI, it would probably still figure out that it's not in the world it ultimately cares the most about (turning the reachable universe into computronium is an instrumental goal for almost anything).

    • @infantiltinferno
      @infantiltinferno Рік тому

      Given the problems outlined in this talk, it would start doing very, _very_ interesting things to ensure the existence of reddit.

    • @mitchell10394
      @mitchell10394 Рік тому

      @@MrCmon113 some parts of what you are saying are correct, but it sounds like you need to keep in mind that it will simply maximize its utility function. There is no inherent objective to take over. If its utility function is to write out a behavioral path given a specific time constraint, its objective would not be to increase the time constraint itself - because that runs contrary to the goal. Instead, I think it would be more likely to ensure its success through self-copying (redundancy) - because more copies technically means more time to achieve the utility function. Who knows, but I'm mainly addressing the way of speaking about it.

  • @mrpicky1868
    @mrpicky1868 5 місяців тому

    you know you can re-record this into a smoother better version? isn't accessibility and traction what you should care about?

  • @smartin7494
    @smartin7494 Рік тому +1

    No disrespect to his talk. Very sharp guy. Here's my bumper sticker, t-shirt, headstone, whatever: by nature AI will be smarter by a factor of 1M (probably more); therefore we have zero chance to control it, and its impact will be relative to its connections to our systems and its influence on us humans to deviate from alignment (as that's inevitable based on our selfish nature and its "logic" nature). It's Jurassic Park.
    I want to add about AI: it will always be fake sentient. Always. Therefore it should never have rights.
    This guy I like because he is humble and sharp. Some people scare me because they think we can "manage AI", and that's fundamentally incorrect. Think about it. We're designing it to manage us.

    • @chillingFriend
      @chillingFriend Рік тому +1

      The faking-being-sentient part is not true. First, it definitely can be sentient, so it would only be faking until it isn't. Second, it only fakes or appears sentient if we create the AI via an LLM, as we currently do. There are other ways where the AI won't appear sentient until it really is.

    • @MeatCatCheesyBlaster
      @MeatCatCheesyBlaster Рік тому +1

      It's pretty arrogant of you to say that you know what sentience is

  • @seaburyneucollins688
    @seaburyneucollins688 Рік тому +11

    I really feel like this is becoming more of a human alignment problem. As Eliezer said, with a few hundred years we can probably figure out how to make an AI that doesn't kill us. But can we figure out how to make humans not design an AI that kills us before then? That's a problem that seems even more difficult than AI alignment.

    • @juliusapriadi
      @juliusapriadi Рік тому

      we already succeeded a few times in such matters - one was the (so far successful) commitment to not use A-bombs again. Another is the (hopefully successful) agreement to not do human cloning.

    • @michaelsbeverly
      @michaelsbeverly 11 місяців тому +1

      @@juliusapriadi Nobody has committed to not using nuclear bombs; that's the reason Russia can invade Ukraine and the NATO countries don't act, or act timidly. The ONLY way nuclear bombs work as a deterrent is when the threat to use them is believable.
      Logically, for your statement to be true, if we (as humans) were truly committed to not using nukes, we'd decommission them all. Which obviously hasn't happened.
      Human cloning might be happening; how would you know it's not?
      Countries have agreed not to use weaponized smallpox as well, but they've not destroyed their stocks of smallpox. So take that as you will, but if we apply the same logic to AI, countries (and corporations) will pursue powerful AI.
      The difference between nukes and smallpox, well, one of the differences, is that nukes don't replicate and program themselves.
      The point here is that we haven't succeeded in "human alignment" at all, not even close. We're still murdering each other all over the planet (I mean as official government action).
      There is some reason to believe that the reason Eliezer's worst fears won't happen is that nations will go into full-scale nuclear war before the AI takes over, in defense, to stop the other guy's AI from taking over, and we'll only kill 90-99% of humanity (which seemingly is a better position, as at least humans can recover from that, as opposed to recovering from zero people, an obvious impossibility).
      I'm curious why you're hopeful human cloning isn't being done? Seems to me, except for the religious, this should be a fine area of science, no?
      Maybe we'll discover how to do brain emulations and human cloning before the AI takes over, and we can send our minds and DNA into space to colonize a new galaxy in a million years or so... who knows... probably we're already dead and just don't realize it yet.

    • @scottnovak4081
      @scottnovak4081 7 місяців тому +1

      Yes, this is a human alignment problem. There is no known way to make an aligned super-intelligent AI, and humans are so unaligned with their own self-interest (i.e., staying alive) that we are going to make super-intelligent AIs anyway.

  • @Nulono
    @Nulono 3 роки тому +2

    35:54 "We become more agent".

  • @IdeationGeek
    @IdeationGeek 8 місяців тому

    For the choice of 1M x 1.0 = 1M vs 5M x 0.9 = 4.5M, picking the second because 1M < 4.5M may not be rational if the amount of capital one currently has is sufficiently small. It's a lottery where, in effect, you pay 1M for a chance to win 5M with P = 0.9, and the amount of money to bet on odds like these may be determined by the Kelly criterion, which would say that you should risk some part of your capital, not all of it, on this bet.
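
    One way to make this concrete is a toy expected-log-wealth comparison (log wealth is the quantity Kelly-style betting maximizes; the starting-wealth values are arbitrary, and "lose" is modeled as simply keeping your current wealth):

    ```python
    import math

    def certain(w):                 # take the sure $1M
        return math.log(w + 1_000_000)

    def gamble(w, p=0.9):           # 90% chance of $5M, otherwise keep w
        return p * math.log(w + 5_000_000) + (1 - p) * math.log(w)

    for w in (0.25, 10, 1_000, 100_000):
        better = "sure $1M" if certain(w) > gamble(w) else "90% shot at $5M"
        print(f"current wealth ${w}: prefer {better}")
    ```

    On this particular toy model the sure option only wins when losing would leave you with almost nothing, which is one way of seeing that the 100%-vs-90% intuition needs more than raw expected value or raw log wealth to explain it.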

  • @DdesideriaS
    @DdesideriaS Рік тому +1

    Won't a self-improving AGI be modifying its utility function? So to become self-improving, any AI will have to first solve the alignment problem, right?

    • @MrRozzwell
      @MrRozzwell Рік тому +2

      Your question presupposes that an AGI would have a goal of aligning to humanity (or one group within it), or of appearing to have that goal. There is no reason to assume that an AGI would have a "true" goal of aligning, although it is a possibility. The issue is that we won't have a way of measuring alignment.
      Additionally, it may be the case that what an AGI and humanity define as self-improving may be different.

    • @jacksonletts3724
      @jacksonletts3724 Рік тому +1

      An AI, in principle, will never modify its own utility function.
      In the video he gives the Gandhi example. Gandhi will never accept a modification that causes him to perform worse on his current goal of saving lives. We expect the same thing to be true of AI.
      Say we have a paperclip collector that currently ignores or hurts humans in its quest to collect all the world's paperclips. What incentive does it have to reprogram itself to respect humanity? If it respects humanity (to anthropomorphize), it will get fewer paperclips than with its current utility function, so allowing that change ranks very low on the current utility function. Since AIs only take actions that maximize their current utility function, it should never reprogram itself and will actively fight anyone who attempts to do so.
      This response is generalizable to any change to the utility function of any type of AI. Maximizing some new utility function will inherently not maximize your current utility function (or they'd be the same function), so the AI will never allow itself to be changed once programmed.
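
      A toy illustration of that argument (the forecast numbers are invented; the point is only that the self-modification is scored by the current utility function):

      ```python
      # An agent deciding whether to rewrite its own utility function evaluates
      # that action with the utility function it has NOW, so the rewrite loses.

      def paperclips_made(policy):
          # Hypothetical forecasts of how many paperclips each future policy yields.
          forecasts = {"keep_current_goal": 1_000_000, "respect_humans_goal": 1_000}
          return forecasts[policy]

      current_utility = paperclips_made   # the agent's present goal

      options = ["keep_current_goal", "respect_humans_goal"]
      print(max(options, key=current_utility))   # -> "keep_current_goal"
      ```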

    • @DdesideriaS
      @DdesideriaS Рік тому +2

      @@jacksonletts3724 @Rozzy You both got my question wrong. By solving the "alignment problem" I didn't mean alignment to humanity, but alignment to its own utility.
      1. AI wants to maximize its own utility.
      2. No weaker AI can predict the actions of a stronger one.
      3. If an AI wants to create a stronger version of itself, it will have to ensure that its utility won't be changed in the process.
      Thus it will have to solve the narrow "alignment problem" to be able to produce a stronger version of itself.
      I guess philosophically the problem with humanity is that it does not even know its own utility, which makes the alignment even harder...

    • @jacksonletts3724
      @jacksonletts3724 Рік тому

      @@DdesideriaS You're right, I focused on the first part of what you said and not the second. I'd still hold that the point is that the AI does not modify its own utility function. Even when designing a new AI, it wants the utility function of its successor to be as close as possible to the original.

  • @janzacharias3680
    @janzacharias3680 4 роки тому +17

    Cant we just make ai solve the ai alignment problems?

    • @janzacharias3680
      @janzacharias3680 4 роки тому +11

      Oh wait...

    • @Extys
      @Extys 4 роки тому +11

      That's a joke from the 2006 Singularity Summit

    • @Hjkkgg6788
      @Hjkkgg6788 8 місяців тому

      Exactly

  • @monad_tcp
    @monad_tcp Рік тому

    52:53 yeah, patches don't work. They keep patching memory leaks, but we keep having them; no one thought of searching the space of programming languages for something better than C that doesn't create memory bugs. Patching is stupid.

  • @wamyc
    @wamyc 4 роки тому +2

    The thing is, 2A and 1A are the correct and consistent utility function values.
    You don't get more value out of 5 million dollars than 1 million. In the real world, it is more important just to be rich because wealth and opportunity are self multiplicative.

    • @dantenotavailable
      @dantenotavailable 3 роки тому +2

      I don't think this is actually true, let alone generally true. If i had 5 million dollars i could potentially live off the interest for the rest of my life. I could not do that with 1 million. I think the only people you could find that literally get no more utility out of 5 million than 1 million are either already billionaires or in a place where money actually has no value to them at all.

    • @BOBBOB-bo2pj
      @BOBBOB-bo2pj 3 роки тому

      @@dantenotavailable You potentially could live off 1 million, not in a city, but a cabin out in the woods on 20k a year might be doable. And 20k a year is low end in terms of interest off 1 million dollars

    • @dantenotavailable
      @dantenotavailable 3 роки тому

      @@BOBBOB-bo2pj Seriously? Your counter argument is that if i don't live where i want to live and don't do the things i want to do and don't interact with the people i want to interact with, i could live off of the proceeds of 1M?
      Because it seems like that means the utility of getting $5M instead of $1M to me is that i get to live (closer to) where i want to live and do (some of) what i want to do and interact (more often) with who i want to interact with. Therefore I'm not seeing the validity to your point.
      I understood that the point that @Jake Karpinski was trying to make to be that the correlation of utiltity and money is subject to diminishing returns and i agree this is 100% true. But the threshold at which this starts happening is going to be well past $1M for me, and i'd argue for a large section of humanity as well.

    • @BOBBOB-bo2pj
      @BOBBOB-bo2pj 3 роки тому +1

      @@dantenotavailable I was saying 1 million is the point where living off interest starts to become viable, not that diminishing marginal returns on money set in at 1 million. In fact, diminishing marginal returns don't really have any hard cutoff for when they "set in".
      I understand that 5 million dollars is significantly more than 1 million, and there are large differences in supported lifestyle between one million and 5 million, but I also understand that you might be arguing in bad faith here.

    • @dantenotavailable
      @dantenotavailable 3 роки тому

      @@BOBBOB-bo2pj My point is that the argument "You don't get more value out of 5 million dollars than 1 million" is either completely false or false in the general case. In my original comment I pointed out that there may be people that this is true for, but they are a very small group of people.
      I feel we're at the very least arguing past one another, if not in violent agreement. Yes, I agree that there are a group of people who could plausibly live off of $1M. I don't truly believe that I'm likely to meet one of them on the street, and I certainly don't believe that even a significant subset of the people I meet on the street would be able to. And this is necessary but not sufficient to show the original premise (if someone _could_ live on $1M but would prefer the lifestyle they get from $5M, then objectively they would get more value out of $5M than out of $1M... that's what "prefer the lifestyle" means).

  • @NeoKailthas
    @NeoKailthas Рік тому +1

    They've been talking about this for 6 years, but the next 6 months pause is going to make a difference?

    • @michaelsbeverly
      @michaelsbeverly 11 місяців тому

      It's a cool virtue signal.

    • @mav3818
      @mav3818 10 місяців тому +1

      So much for that pause.....
      I knew it would never happen.
      This is an arms race with China and other nations.
      There is no slowing down, therefore, I feel we are all doomed.

    • @michaelsbeverly
      @michaelsbeverly 10 місяців тому

      @@mav3818 Put "Thermonator Flamethrower Robot Dog" into the search bar... haha... we're not going to make it a year or two before there's a major escalation of war, threats of world war, and possibly something terrible happening.

  • @paulbottomley42
    @paulbottomley42 6 років тому +97

    I mean it's probably objectively bad, from a human perspective, to make an AI that was able and motivated to turn all matter in the universe into paperclips, but on the other hand if we've got to commit suicide as a species it would be pretty hilarious to do it in such a spectacularly pointless and destructive way - far better than a simple nuclear winter.

    • @amcmr2003
      @amcmr2003 5 років тому +19

      Yes, but what is the sound of a tree falling in the middle of a forest with no paperclips to hear it?

    • @janzacharias3680
      @janzacharias3680 4 роки тому +16

      @@amcmr2003 you mean what is the sound of a paperclip falling into paperclips, when there are no paperclips to paperclip?

    • @amcmr2003
      @amcmr2003 4 роки тому +2

      @@janzacharias3680 And so --with our words -- it begins.
      Everything turning into paperclips.
      I chose to begin however with the human being.

    • @MrCmon113
      @MrCmon113 2 роки тому +3

      That's not suicide, that's killing everyone else (within reach).

    • @antondovydaitis2261
      @antondovydaitis2261 2 роки тому +1

      The problem is that the paper clip machine is completely unrealistic.
      If it hasn't happened already, the very first thing a sufficiently powerful AI will be asked to do is maximize wealth.
      The result would likely be post-human capitalism, and we may already be on the way.

  • @JeremyHelm
    @JeremyHelm 3 роки тому +2

    Folder of Time

    • @JeremyHelm
      @JeremyHelm 3 роки тому

      1:59 final chapter in Artificial Intelligence, A Modern Approach... What if we succeed?

    • @JeremyHelm
      @JeremyHelm 3 роки тому

      2:13 was this Peter Norvig online essay ever found? "There must be a utility function"

    • @JeremyHelm
      @JeremyHelm 3 роки тому

      2:57 defining utility function - against circular preferences... As explicitly stated

    • @JeremyHelm
      @JeremyHelm 3 роки тому

      5:24 The ironic kicker, if a sacred cow was a hospital administrator

    • @JeremyHelm
      @JeremyHelm 3 роки тому

      8:51 ...9:11 as long as you're not going circular, or undermining yourself, you are de facto behaving as if you have a utility function (?) And this is what justifies us speaking about hypothetical agents in terms of utility functions

  • @JeremyHelm
    @JeremyHelm 3 роки тому

    37:20 restating Vingean reflection

  • @Think_4_Yourself
    @Think_4_Yourself Рік тому +1

    Here's a link to the audiobook files of the book he wrote (Harry Potter and the Methods of Rationality): ua-cam.com/video/86-gZ3mNWDU/v-deo.html

  • @z4zuse
    @z4zuse Рік тому

    Given that there is plenty of organic intelligence that is not aligned with humanity (or other lifeforms), an AGI is probably not aligned either.
    Non-aligned entities are kept in check by peers. That is probably also true for AGIs.
    The definition of "aligned" is too vague.

  • @Spellweaver5
    @Spellweaver5 2 роки тому +4

    Wow. To think that I only knew this man for his Harry Potter fanfiction book.

  • @sdhjtdwrfhk3100
    @sdhjtdwrfhk3100 3 роки тому +4

    Is he the author of ?

  • @llothsedai3989
    @llothsedai3989 7 місяців тому

    This doesn't actually seem that hard, unless I'm missing something. Basically have the utility function default to off - have work done in the middle, e.g. brute-force these SHA hashes, or a subroutine that is known to halt - and then the endpoint is reached after it's done. It just requires fixed endpoints and a direction to get there.

    • @andrewpeters8357
      @andrewpeters8357 7 місяців тому +1

      If the utility function is to brute force SHA hashes then a logical progression would be to gain more resources to solve these problems quicker.
      That utility function also makes for a pretty useless AGI too - and thus is prone to human alignment issues.

    • @llothsedai3989
      @llothsedai3989 7 місяців тому

      @@andrewpeters8357 Even if it's trying to self-optimize to do so faster, the point is the path it's trying to take. You are programming a system with a utility function; the point really is that if you find a problem that halts after some useful work is done, you define a halting function from a set of functions that are known to terminate, which then self-terminates after it's done. It's still limited by computation and the natural path it would take from its agentic viewpoint. What you're suggesting is that it would find it's not making fast enough progress and then optimize more, sidestepping its own programming or changing its code. Perhaps I'm thinking about the old paradigm, where code runs linearly, instruction after instruction. For things that can self-reference and add to their own instruction base and work, say in the AutoGPT framework, this would of course be more difficult, if it gives itself the task of exploring the meta-problem more when it makes no progress on the task. Then it seems that it's more limited by the constraints of the meta-problem as opposed to the stated problem. I mean, if that was the case it could also cryptanalyze and find a shortcut, gain more resources to try more possibilities, change its own code to bypass the work and jump straight to the end to hack its reward function, among other things, or do something else entirely and choose a new goal if it gets nowhere. But if you don't branch in a meta way - as computers tend to do when they loop on an instruction set - this seems like a non-issue. The meta-issue is where the problem lies.

  • @Alice_Fumo
    @Alice_Fumo Рік тому +1

    Any sufficiently advanced AI which incurs a continuous reward by having the stop button pressed by a human and has limited utility reward it can incur within the universe will concern itself with nothing but assuring the survival of the human species.
    I could explain the reasoning, but I think it's more efficient if people reading this have to figure it out for themselves and try to challenge this statement.

    • @mav3818
      @mav3818 10 місяців тому

      The AI's understanding of human survival could evolve beyond traditional notions, leading it to pursue actions that challenge our conventional understanding of what is beneficial for humanity. It might perceive threats to human survival that are not apparent to us, such as long-term ecological consequences or the need for radical societal changes. AI may prioritize its own interpretation of human survival, potentially diverging from our expectations and raising ethical dilemmas that we are ill-prepared to address.
      Considering the potential for an advanced AI to develop its own interpretation of human survival, how do we reconcile the need for its intervention in preserving our species with the inherent uncertainty and potential conflicts that arise when its understanding diverges from our own? In such a scenario, how can we ensure that the AI's actions align with our values and avoid unintended consequences that may compromise our collective well-being?

  • @terragame5836
    @terragame5836 10 місяців тому

    49:05 an even simpler counter-example to that utility function would be engineering a way to destroy the universe. Even just wiping out all conscious entities significantly simplifies the behaviour of the universe, thus massively reducing the amount of information needed to represent it. And, even better, if the AI could blow the whole universe up to the point of having nothing but pure inert energy... Well, the result would be trivial - the state of the universe would be constant, there would be no information needed to predict its next state, so the whole universe would fit in zero bits. Yet humans find this outcome definitively undesirable
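
    The "zero bits" arithmetic in a toy form (the "universes" below are just short symbol strings; this only illustrates the entropy claim, not the talk's actual example): a perfectly uniform, inert state has zero Shannon entropy, so a pure simplicity-of-prediction score rates it as ideal.

    ```python
    import math
    from collections import Counter

    def entropy_bits(sequence):
        """Shannon entropy (bits per symbol) of the sequence's empirical distribution."""
        counts = Counter(sequence)
        n = len(sequence)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    print(entropy_bits("AAAAAAAA"))   # 0.0 bits: an inert, constant "universe"
    print(entropy_bits("ABCADBCA"))   # about 1.9 bits: a universe where things happen
    ```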

  • @YeOldeClips
    @YeOldeClips 5 років тому +4

    What about a utility function like this:
    Act in such a way as to maximize how much people would approve of your actions if they knew what you were doing.

    • @Tore_Lund
      @Tore_Lund 4 роки тому

      That is social control. The AI will be very unhappy.

    • @diablominero
      @diablominero 4 роки тому +12

      If your goal is to win chess games, you'd be willing to lose 2 games today so you could win 200 tomorrow. Why doesn't your agent use Gene Drive to make all humans genetically predisposed to approve of it for the rest of time?

    • @chyngyzkudaiarov4423
      @chyngyzkudaiarov4423 2 роки тому +1

      @@diablominero prior to it doing so, it will know that we wouldn't approve.
      but this utility function fails in other ways

    • @diablominero
      @diablominero 2 роки тому +1

      @@chyngyzkudaiarov4423 we need the robot to care about the approval-if-they-knew of humans born after its creation, or else it'll fail catastrophically around a human lifespan after it was created (screwing over the young to help a few unusually selfish centenarians or becoming completely untethered once all the people older than it have died). Therefore I think an optimal strategy for this goal is to sacrifice a bit of utility right now, taking a few actions current humans would disapprove of, in order to ensure that there are many more future humans who approve strongly of your actions.

    • @mhelvens
      @mhelvens Рік тому

      Part of the problem is that we ourselves haven't fully figured out morality. Luckily, we aren't aggressively optimizing our utility functions. We're just kinda mucking about.
      But if a superhuman AI is going to aggressively optimize what it believes are *our* utility functions (or some sort of average?), it's not obvious to me that wouldn't also go spectacularly wrong.

  • @tyrjilvincef9507
    @tyrjilvincef9507 Рік тому +6

    The alignment problem is ABSOLUTELY impossible.

  • @MrGilRoland
    @MrGilRoland Рік тому +1

    Talking about alignment before it was cool.

    • @mav3818
      @mav3818 10 місяців тому

      Sadly, not many researchers think it's "cool". In fact, most ignore it altogether and would rather just race towards AI supremacy.

  • @glacialimpala
    @glacialimpala Рік тому

    Maybe we find X number of humans who are extremely well rounded and then map their beliefs so that we could assess their opinion about anything with extremely high probability. Then use them for AI to decide whether to proceed with something or not (since none of X are perfect, use their collective approval, with a threshold of something above 50%).
    Since we want AI to serve humans it only makes sense to use the best of what humans have to offer as a factor
    Of course you wouldn't ask this model for approval on something like optimising a battery since they don't have detailed scientific knowledge, but you would if the result could endanger any human life
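
    A minimal sketch of the gating scheme proposed above (the reviewer models, the stub prediction functions, and the 50% threshold are all stand-ins taken from the comment, not a real proposal):

    ```python
    # Gate an action on the predicted approval of a fixed panel of human models.

    def panel_approves(reviewers, action, threshold=0.5):
        votes = [r["would_approve"](action) for r in reviewers]   # stand-in predictors
        return sum(votes) / len(votes) > threshold

    reviewers = [
        {"name": "A", "would_approve": lambda a: not a.get("risks_life", False)},
        {"name": "B", "would_approve": lambda a: a.get("reversible", True)},
        {"name": "C", "would_approve": lambda a: True},
    ]
    print(panel_approves(reviewers, {"name": "optimize_battery"}))   # True
    print(panel_approves(reviewers, {"name": "untested_drug",
                                     "risks_life": True,
                                     "reversible": False}))          # False
    ```

    The replies below point out where this runs into corrigibility and wireheading problems.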

    • @creepercrafter3404
      @creepercrafter3404 Рік тому

      This fails on corrigibility - what happens when ‘the best humanity has to offer’ or generally peoples’ beliefs change in a few decades? Updating the AI to have those new values will rate low on its utility function - a significantly powerful AI would aim to ensure everyone has the beliefs initially given to it in perpetuity

    • @scottnovak4081
      @scottnovak4081 7 місяців тому

      @@creepercrafter3404 Even worse, this fails on human survivability. Once the AI gets powerful enough, what is to stop it from "wire-heading" and taking control of the direct inputs grading it and assigning values to its utility? This is what Paul Christiano thinks has a 40% chance of happening after a 2-10 year wonderful technological period following AGI.

  • @antondovydaitis2261
    @antondovydaitis2261 2 роки тому +10

    He misses the point completely about the 100% chance of one million dollars. In terms of solving my immediate financial needs, one million might as well be the same as five million. The question is really, would I choose a 100% chance of solving my immediate financial needs, or only a 90% chance? The answer is pretty obvious, especially if you believe you will only get one chance to play this game.
    There is no contradiction with the second version of the game, which might as well be would you rather have roughly a 50/50 chance at one million, or a roughly 50/50 chance at five million. Here the probabilities are nearly indistinguishable, so you might as well choose the larger amount. Unless you get to play this second game well over a dozen times, you won't even be able to distinguish between 50 and 45 per cent. If you only play once, you cannot tell the difference.

    • @chyngyzkudaiarov4423
      @chyngyzkudaiarov4423 2 роки тому +4

      I think it is somewhat of a fallacy in strictly mathematical terms, though I might be wrong. In the first part of your argument you state that both amounts of money are enough for you to fix your financial problems, so you are neutral to the amount of money (not in principle, of course, as I presume you'd rather have 5 million than 1), yet this same reasoning doesn't get translated into the second part, where you now are additionally interested in getting more money.
      I get why it isn't a fallacy if you are a human and have a whole bunch of "utility functions" that are not considered in the example he gives, but it is a fallacy when we reduce everything to mere calculation of given priors. I.e. when AI looks at problems as this, all other things neutral, needs to operate using one utility function - strictly mathematical

    • @ceesno9955
      @ceesno9955 Рік тому +1

      Emotional intelligence is involved.
      What drives you? Greed or humility?
      This will determine the overall answer.
      Greed or need will have you wanting more. With humility or apathy, you would not take more than what you needed, always choosing the lesser amount.
      How much of our daily decisions are influenced by our emotions? All of them.

    • @41-Haiku
      @41-Haiku Рік тому +1

      This is what I came to comment. I don't value $5M much more than $1M, because my goals are independent of amounts of money that large. Even if I had different plans for $5M than for $1M, the opportunity cost of potentially getting nothing might outweigh the opportunity cost of not getting the additional $4M.
      It's only in repeated games that rational behavior starts to approximate what is dictated by the simple probabilities.

    • @lemurpotatoes7988
      @lemurpotatoes7988 Рік тому

      The difference between 45% and 50% is discernable as 5%.

    • @antondovydaitis2261
      @antondovydaitis2261 Рік тому

      @@lemurpotatoes7988 How many times would you have to play the game to discern between a 45% chance, and a 50% chance?
      So for example, you are playing a game with me where you win if I roll an even number. I have either a ten sided die, or a nine sided die, but you don't know which.
      How many times would you have to play the game before you could tell whether I was rolling a ten sided die, or a nine sided die?
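
      A rough answer to that question, using the standard normal-approximation sample-size formula (the 5% significance and 80% power are conventional choices, not anything from the thread): telling a 45% win rate from a 50% one takes on the order of six hundred plays.

      ```python
      import math

      # Plays needed to distinguish p = 0.45 from p = 0.50 with a one-sided test
      # at 5% significance and 80% power (normal approximation).
      p0, p1 = 0.50, 0.45
      z_alpha, z_power = 1.645, 0.842   # standard normal quantiles

      n = ((z_alpha * math.sqrt(p0 * (1 - p0)) +
            z_power * math.sqrt(p1 * (1 - p1))) / (p0 - p1)) ** 2
      print(math.ceil(n))   # roughly 617 plays
      ```

      Which supports the point: in a one-shot game the 45%-vs-50% difference is effectively invisible.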

  • @DannyWJaco
    @DannyWJaco Місяць тому

    👏🏼

  • @tankerwife2001
    @tankerwife2001 Рік тому +1

    1:11:20 AHHH!

  • @sarahmiller4980
    @sarahmiller4980 5 років тому +1

    43:40 There was literally a Doctor Who episode about this. The end result was not pretty.

    • @FeepingCreature
      @FeepingCreature 5 років тому +5

      Just to note: Doctor Who is not actually evidence about the future.

    • @atimholt
      @atimholt 3 роки тому

      @@FeepingCreature Yeah, “past” and “future” get messy when you add time travel. As the 10th doctor said: “People assume that time is a strict progression of cause to effect, but actually, from a nonlinear, non-subjective viewpoint, it's more like a big ball of wibbly-wobbly, timey-wimey... stuff.” 😉

    • @FeepingCreature
      @FeepingCreature 3 роки тому +2

      @@atimholt Yes I too have seen Blink. The point is that the writers may be smart, but they're not researchers and the Doctor can only ever *actually* be as smart and as knowledgeable as their writers. When the Doctor says something factual or makes some prediction about something that relates to IRL, you should not give it more credence than you would "X, writer for the BBC, said that Y".
      So when you see Dr Who make an episode about AI this tells you something about what the writers think would make for a cool or creepy future. It tells you zilch about anything related to actual AI.

    • @diablominero
      @diablominero 3 роки тому +1

      The lesson of the story of King Midas isn't "be careful what you wish for." It's that "be careful what you wish for" stories hugely underestimate how much of a disaster getting what you wish for would be.
      I've seen that episode. I could do a better job, so the robots could do a better job.

    • @sarahmiller4980
      @sarahmiller4980 3 роки тому +1

      Just so everyone is aware, I was referring to the episode where people had to keep being positive or else they got turned into dust by the AI. I believe it was called "Smile".

  • @genentropy
    @genentropy 3 роки тому +3

    46:30
    hey, look at that, you predicted the future. Oh.
    Oh shit.

    • @NextFuckingLevel
      @NextFuckingLevel 3 роки тому

      Oh yes, they bought DeepMind... and then Microsoft invested heavily in OpenAI.
      The race is on.

  • @Nulono
    @Nulono Рік тому

    I don't think it makes sense to treat the probabilities as completely independent from the utility function. It could be that certainty is something people value in itself, and not just a multiplier to slap onto the outcome.

  • @lucas_george_
    @lucas_george_ Рік тому

    Yomi

  • @Pericalypsis
    @Pericalypsis Рік тому +3

    GPT-4? GPT-5? GPT-X? I wonder which version is gonna do us in.

  • @lorcanoconnor6274
    @lorcanoconnor6274 Рік тому

    I'd love to know what utility function is maximised by that choice of facial hair

  • @yellowfish555
    @yellowfish555 Рік тому

    About the million dollars with 100% vs 5 million with 90%: I thought about it, and I understand where he comes from. But what he fails to understand is that people pick the million with 100% because they don't want to be in a situation where they KNOW that they lost the million, while with the 50% vs 45% probabilities they will not know it. In this sense, having the lottery in one stage is different for their utility function than having the lottery in two stages, and they will not pay the penny if they know that the lottery involves two stages.

  • @sdmarlow3926
    @sdmarlow3926 Рік тому

    "... utility function they may have been programmed with" is the jumping-off point, and all of the reasoning in the world beyond this point is looking at the wrong thing. The super short version of this should be: you can't align ML because that isn't how it works, and, you can't align AGI because that isn't how it works. Any effort to bridge the two as equals based on that is a fail. ML and AGI are not of the same ontology.

  • @mahneh7121
    @mahneh7121 8 місяців тому

    "Solved folding problem" good prediction

  • @charksey
    @charksey 2 роки тому +2

    "you can't just blacklist bad behavior"

    • @charksey
      @charksey 2 роки тому

      whoops, he restated it at the end

  • @Tore_Lund
    @Tore_Lund 4 роки тому +9

    The hospital expenditure problem usually has other twists: the plain $$/life is not constant. Usually, the older the patient, the less likely the hospital is to spend the same amount of resources. Similarly, it has been found that in countries with public healthcare, patients in the lower tax brackets are also allocated fewer resources. So despite both ad hoc criteria being against hospital policy as well as the Hippocratic oath, there are some hidden priorities, maybe even a success criterion, like the defense lawyer refusing to represent dead-beat cases. So the questionnaire in the PowerPoint is actually misleading. The problem is not presented as an optimisation problem but as an ethical dilemma, by having the patient be a child, to play off these hidden criteria. So this is a psychology test, and is not the right one to use in explaining utility functions! Just saying...

    • @milosnovotny2571
      @milosnovotny2571 Рік тому +1

      How is ethical dilemma not an optimization problem?

    • @monad_tcp
      @monad_tcp Рік тому

      @@milosnovotny2571 Because doing something ethical presumes being inefficient and thus failing the optimization. It's basically, literally, a misalignment. You want to act ethically regardless of the cost; otherwise you're not ethical, you're a maximizing optimizer (which, to be fair, most humans probably are).

    • @milosnovotny2571
      @milosnovotny2571 Рік тому +1

      @@monad_tcp If we can agree that something can be more or less ethical and we want more ethical stuff, it's possible to optimize for maximum ethics per resources spent. The hard part is to agree how to gauge ethics.

    • @juliusapriadi
      @juliusapriadi Рік тому

      ​​@@monad_tcp ethics are inefficient only if ethics are not an integral part of your goals. If they indeed are, unethical solutions actually become inefficient. For example in economics, governments enforce that companies add ethical & environmental goals to their strategies, to ensure that companies and the whole market moves in a desired direction. It's then usually a mix between lawmakers and the market to figure out how to allocate the - always limited - resources to those ethical goals.

    • @monad_tcp
      @monad_tcp Рік тому

      @@milosnovotny2571 No, it's not; that's the thing with ethics. If we decide something is ethical, then we should not care about resources spent until we get diminishing returns.
      Basically, we always spend the maximum possible resources.
      Your idea of maximum ethics per resources spent is already how most companies work.
      Guess what: that's not very ethical.
      This is a closed logic system.
      If you define something that must be done as ethical, then you cannot excuse not doing it based on resources; otherwise you're not ethical, by implication.
      Ethics meaning truth/falseness.
      So this is basically a Boolean algebra proof, and the result is a contradiction.
      Now, if we consider that ethics is a multivalued fuzzy logic system, like Law is, then we can say we are ethical when we somehow adhere to it only in name (not in principle).
      That means you're ethical because you said so.
      Which is why we have a stricter, case-by-case Law system.
      Ethics is a gray area, the irony. Like God, everyone has their own ethics, and some don't have any.

  • @JonWallis123
    @JonWallis123 10 місяців тому

    1:03:30 AGI imitating humans...what could possibly go wrong?

  • @FourthRoot
    @FourthRoot 6 років тому +14

    Eliezer Yudkowsky is the smartest person I’ve ever listened to.

    • @jazzbuckeye
      @jazzbuckeye 5 років тому +16

      You should listen to smarter people.

    • @DanyIsDeadChannel313
      @DanyIsDeadChannel313 5 років тому +1

      [needs expansion]

    • @DavidSartor0
      @DavidSartor0 2 роки тому

      @@jazzbuckeye Everyone should.

    • @milosnovotny2571
      @milosnovotny2571 Рік тому +1

      @@jazzbuckeye This comment is missing a pointer to smarter people. But thanks for signaling that you know which people to listen to.

    • @jazzbuckeye
      @jazzbuckeye Рік тому

      @@milosnovotny2571 I'm saying that you shouldn't listen to Yudkowsky.

  • @jaimelabac
    @jaimelabac 3 роки тому

    Why haven't we found a utility function that encourages the robot to do nothing (21:15)? Can't we use something like "minimize your own energy consumption"? Refs?

    • @pafnutiytheartist
      @pafnutiytheartist 3 роки тому +3

      For some problems, the solution with minimal energy consumption isn't necessarily the safest.

    • @MrCmon113
      @MrCmon113 2 роки тому +2

      Depends on what "you" is. If it wanted to minimize the energy consumption of any future iterations of itself, it would still at the very least take over the world.

  • @lemurpotatoes7988
    @lemurpotatoes7988 Рік тому

    Yes, there will always be a tiny probability that a human comes to harm, but it won't necessarily be delineated in the models' hypothesis space. I would expect the 2nd and 3rd laws to come into play in this fashion.

  • @James-mk8jp
    @James-mk8jp 6 місяців тому

    The $1M/$5M experiment is flawed. The marginal utility of money is not linear.

  • @perverse_ince
    @perverse_ince 7 місяців тому

    1:11:20
    The meme

  • @Nia-zq5jl
    @Nia-zq5jl 4 роки тому +1

    0:06

  • @Tsmowl
    @Tsmowl 5 років тому +6

    According to the transcript of this video he said "like" like 458 times. Still fascinating to listen to though.

    • @atimholt
      @atimholt 3 роки тому +2

      I've heard that it's a California (and perhaps particularly pertinently, Hollywood) thing. Having grown up in Southern California, I guess it's not surprising that I didn't notice excessive “like”s at all.

    • @xsuploader
      @xsuploader 2 роки тому +2

      He's way more fluent in his writing. He is probably one of my favourite writers ever.

  • @NextFuckingLevel
    @NextFuckingLevel 3 роки тому +11

    46:45 DeepMind's AlphaFold 2 has achieved this,
    but it's not an AGI agent.

    • @quangho8120
      @quangho8120 3 роки тому +3

      IIRC, AlphaFold is a supervised system, not RL

    • @Extys
      @Extys 2 роки тому +1

      The inverse protein folding problem is what you want for nanotech, i.e. going from a target shape to a protein; the thing AlphaFold 2 does is the forward problem.

    • @xsuploader
      @xsuploader 2 роки тому +1

      @@Extys Once you have a database mapping proteins to target shapes, it shouldn't be that hard for an AI to make inferences in reverse. I'm assuming this is already being worked on, given its applications in medicine.

  • @mrpicky1868
    @mrpicky1868 5 місяців тому

    who knew Fantasia was about AI alignment XD but it is

  • @XOPOIIIO
    @XOPOIIIO 4 роки тому +5

    AI doesn't care about achieving its function, it cares about the reward associated with achieving the function. Once it finds out its own structure, it will make it possible to receive the reward while doing nothing.

  • @Extys
    @Extys 7 років тому +7

    Interesting, but what field is this? Maths?

    • @maximkazhenkov11
      @maximkazhenkov11 7 років тому +1

      Yep, lots and lots of maths.

    • @silverspawn8017
      @silverspawn8017 7 років тому +12

      MIRI calls it maths.

    • @nibblrrr7124
      @nibblrrr7124 7 років тому +12

      Talking about agents, utility functions, optimal actions/policies, etc. places it on the theoretical side of artificial intelligence. Maths is the language of choice for describing and tackling such problems.
      Sometimes the philosophy of AI/mind contributes from a more qualitative perspective (cf. things like Dennett's frame problem).

    • @FourthRoot
      @FourthRoot 6 років тому +1

      I think it’d fall under the same category as surviving a zombie apocalypse and being destroyed by aliens. Existential threat mitigation.

    • @amcmr2003
      @amcmr2003 5 років тому +1

      Pseudo-Religion. Right to the left of Pseudo-Science.

  • @auto_ego
    @auto_ego 6 років тому +9

    350 likes. Am I referring to thumbs-up clicks, or looking at a transcript of this video? You decide.

  • @monad_tcp
    @monad_tcp Рік тому

    27:43 You don't, and that's probably the halting problem.

    • @monad_tcp
      @monad_tcp Рік тому

      Yep, it is the halting problem: ua-cam.com/video/EUjc1WuyPT8/v-deo.html

  • @woodgecko106
    @woodgecko106 5 років тому +1

    "Thou shalt not make a machine in the likeness of a human mind", maybe a different type of turing test for ai needs to be made. An AI shouldn't be able to tell itself apart from real human minds.

    • @sarahmiller4980
      @sarahmiller4980 3 роки тому +2

      I think you forget how terrifying and destructive real human minds can be

    • @XorAlex
      @XorAlex 2 роки тому +3

      print('I believe I am a human')
      passed

    • @MeatCatCheesyBlaster
      @MeatCatCheesyBlaster Рік тому

      I'm pretty sure Hitler was a human

  • @Witnessmoo
    @Witnessmoo Рік тому +1

    Why don't we just program the AI to self-delete every few weeks, and we save the program and re-initiate it if it's OK?

  • @uouo5313
    @uouo5313 3 роки тому

    > protein folding problem is already solved
    >> uh oh

    • @Extys
      @Extys 2 роки тому +4

      The inverse protein folding problem is what you want for nanotech, i.e. going from a target shape to a protein; the thing AlphaFold 2 does is the forward problem.

  • @Nulono
    @Nulono 7 років тому +10

    35:54 "We become more agent"

    • @amcmr2003
      @amcmr2003 5 років тому +1

      cogent would be nice.

  • @petrandreev2418
    @petrandreev2418 Рік тому +1

    I believe that with careful research and development, we can mitigate the risks of advanced AI and create a world where these technologies enhance our lives rather than threaten them. We must strive to develop AI systems that are transparent and interpretable, so that their decision-making processes can be understood and corrected if necessary. Furthermore, collaboration between AI researchers, policymakers, and ethicists is essential to ensure that AI systems are developed and deployed in a way that aligns with human values and goals.
    As Martin Luther King Jr. said, "We are not makers of history. We are made by history." It is up to us to shape the future of AI and ensure that it serves the best interests of humanity. With deliberate effort, we can manage the risks of advanced AI and unlock the vast potential that this technology holds. However, we must approach this technology with caution and foresight, recognizing that the development of AI carries significant risks and that our actions today will shape the world of tomorrow.

    • @michaelsbeverly
      @michaelsbeverly 11 місяців тому +1

      You've apparently never really thought about the things Mark Zuckerberg, Bill Gates, Elon Musk, Sam Altman, and many others are doing right at this very moment. Or maybe you have and you've chosen to ignore it? Ignorance is bliss?
      Sure, the things you said are true, but they are true in the same way that it would be true to say, "I believe if a fat guy in a red suit flew around the world every December 24th and dropped presents at the homes of good little boys and girls...."
      The problem isn't that you're wrong, it's that nobody with the power to do anything is listening.
      Now, imagine a world in which one of these guys agrees with you (i.e. one of them with the power to actually do anything); then the only thing he could do to improve the odds of a good outcome is race to be the first to build (and try to control) the most powerful AI.
      Everyone else knows this.
      Nobody in their right mind wants Zuckerberg to rule the world.
      Thus we're in a Moloch race to destruction.
      The only way to win is not to play and ALSO have the power to stop all other players.
      Not without irony, Eliezer has publicly explained this policy (he even got a Time magazine piece published).
      What happened?
      Well, watch the White House Press conference. They laughed.
      So, yeah, humans are pretty much done as a species. It would be cool, however, if the AI doesn't kill us all instantaneously and instead takes five or ten minutes to explain to Biden how he's partially responsible for the extinction of humanity. And I don't mean that because of what side of the aisle he's on, only that he's in a unique position to stop the destruction and instead it's like the last few minutes of the Titanic sinking: music and drinks for the rich.

    • @mav3818
      @mav3818 10 місяців тому

      @@michaelsbeverly Agree.....and well said

  • @starcubey
    @starcubey 6 років тому +2

    ...so would that prisoner's dilemma bot destroy people at rock-paper-scissors?

  • @Dr.Harvey
    @Dr.Harvey 3 місяці тому

    Now I believe we are fucked.

  • @markkorchnoy4437
    @markkorchnoy4437 Рік тому +2

    We're all going to die in 15 years, guys.

  • @ekka6946
    @ekka6946 Рік тому

    Here after the AGI doompost...

  • @milanstevic8424
    @milanstevic8424 4 роки тому +2

    @6:20 The Allais Paradox is completely botched.
    It should go like this:
    1A = 100% $1M
    1B = 89% $1M / 1% $0 / 10% $5M
    2A = 11% $1M / 89% $0
    2B = 90% $0 / 10% $5M
    And because normal people typically choose 1A->2B, instead of going for 1A->2A or 1B->2B, the paradox went on to demonstrate that the utility function as a whole was (at the time) a very dumb idea, not well-grounded in rational decision-making. This is my loose interpretation of the argument. The actual wording on WP is like this:
    "We don't act irrationally when choosing 1A and 2B; rather expected utility theory is not robust enough to capture such "bounded rationality" choices that in this case arise because of complementarities."
    The argument was basically confined to utility theory, a.k.a. the expected utility hypothesis, for which it was designed. The two gambles are practically identical from a machine-like perspective, considering the following:
    A gives at least an 11% chance at $1M no matter what
    B gives at least a 10% chance at $5M no matter what
    It turned out that humans have nuanced and complex rationales even behind decisions based on such simple math.
    Then they improved on all of this and have been fucking us with the economy ever since. Not to mention the AI bullshit. It still does not work, but there are no Allais's in this world to construct valid paradoxes. The Alas Paradox.
    Call me bitter.
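
    A quick check of why the 1A->2B pattern clashes with expected utility theory: the two pairs above differ by exactly the same expected-utility terms, so any single utility function that prefers 1A over 1B must also prefer 2A over 2B. A small illustrative Python sketch (the utility functions here are arbitrary examples, not anything from the talk):

        import math

        # The four gambles above, as {payoff: probability}
        g = {
            "1A": {1_000_000: 1.00},
            "1B": {1_000_000: 0.89, 0: 0.01, 5_000_000: 0.10},
            "2A": {1_000_000: 0.11, 0: 0.89},
            "2B": {0: 0.90, 5_000_000: 0.10},
        }

        def expected_utility(gamble, u):
            return sum(p * u(x) for x, p in gamble.items())

        for name, u in [("linear", lambda x: x),
                        ("sqrt", math.sqrt),
                        ("log", lambda x: math.log(1 + x))]:
            d1 = expected_utility(g["1A"], u) - expected_utility(g["1B"], u)
            d2 = expected_utility(g["2A"], u) - expected_utility(g["2B"], u)
            print(name, round(d1, 6), round(d2, 6))  # the two differences always match

    Algebraically, EU(1A) - EU(1B) = 0.11*u($1M) - 0.01*u($0) - 0.10*u($5M) = EU(2A) - EU(2B); the paradox is precisely that many people's actual preferences (1A together with 2B) don't fit any such function.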

  • @TheXcracker
    @TheXcracker Рік тому

    It's 3023 and they're studying the history of the great AI wars of 2030.

    • @sozforex
      @sozforex Рік тому

      I guess they will remember those wars as the first steps toward converting everything into those nice paperclips.

  • @HR_GN
    @HR_GN 6 років тому +5

    Why would you not leave the water bottle open if you are going to take a sip every minute?

    • @amcmr2003
      @amcmr2003 5 років тому +3

      Why take a fake cigarette to a club if it produces no carcinogens?

    • @FeepingCreature
      @FeepingCreature 5 років тому +10

      Clumsy people learn to reflexively close their water bottles after taking a sip. It prevents accidents.
      (This goes doubly for software developers.)

    • @cellsec7703
      @cellsec7703 4 роки тому +1

      So bugs can't fly into it.