AI That Doesn't Try Too Hard - Maximizers and Satisficers

Robert Miles AI Safety

10 000

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 9 лют 2025

КОМЕНТАРІ • 1,2 тис.

@mihalisboulasikis5911 5 років тому ⁺¹⁵⁶¹
"Intuitively the issue is that utility maximizers have precisely zero chill". Best intuitive explanation on the subject ever.
@tonicblue 5 років тому ⁺⁴⁷
I think this quotation is precisely why I love this guy.
@mihalisboulasikis5911 5 років тому ⁺⁴¹
@@tonicblue Exactly. These types of explanations (which are not "formal" but do a much better job at conveying a point - especially to non-experts - than formal explanations) make you realize that not only is he a brilliant scientist, but also has intuition and experience on the subject which in my opinion is also extremely important. And of course, the humor is on point, as always!
@tonicblue 5 років тому ⁺²
@@mihalisboulasikis5911 couldn't agree more
@Gooberpatrol66 5 років тому ⁺¹
So if I have zero chill does that make me hyperintelligent?
@NortheastGamer 5 років тому ⁺²⁵
@@Gooberpatrol66 Maximizers aren't necessarily intelligent, they just treat everything like it's life or death. (Which is actually how we train most maximizers, by killing off the weak)
@unvergebeneid 5 років тому ⁺⁵⁴¹
"Any world where humans are alive and happy is a world that could have more stamps in it." 😂 😂 😂 I need that on a t-shirt!
@diphyllum8180 5 років тому ⁺⁴
but if they're unhappy you made too many stamps
@MouseGoat 4 роки тому ⁺³⁷
@@diphyllum8180 the robot begins to inject dopamine into humans to insure they always happy XD
@logangraham2956 4 роки тому ⁺⁶
idk , sounds like something graystillplays would say XD
@devoottanr 7 місяців тому
0yyhhjiiikioooooö@@MouseGoat
@armorsmith43 5 років тому ⁺⁷¹⁸
“So satisficers will want to become maximizers” and this is one reason that studying AI safety is interesting-it prompts observations that also apply to organizations made of humans.
@PragmaticAntithesis 5 років тому ⁺¹³³
The unintended social commentary about capitalism is real...
@killers31337 5 років тому ⁺¹⁵⁷
Well, AI is simply a kind of agent making decisions, so all the theory about such agents still applies.
Say, perverse incentive problem. E.g. if you pay people for rat tails hoping they will catch wild rats, they might end up farming rats.-- this is a 'maximizer' problem which actually happened IRL.
@PragmaticAntithesis 5 років тому ⁺⁶
@@killers31337 I thought that was a culling if stray cats, not rats?
@ivandiaz5791 5 років тому ⁺⁶⁷
@@PragmaticAntithesis It has happened many times in many different places for all sorts of animal problems. The most famous case generally was snakes in India under British rule... specifically cobras, which is why this is often called the Cobra Effect. See the wikipedia article.
@bp56789 5 років тому ⁺⁴⁹
You think humans don't seek to maximise their own utility if they aren't in a "capitalist" system?
@tatianatub 5 років тому ⁺⁶⁴⁴
"utility maximizers have precisely zero chill" needs to be on a tshirt
@SlimThrull 5 років тому ⁺¹⁰
Yes. Yes, it does.
@Gunth0r 5 років тому ⁺²⁰
I would buy Robert Miles merch.
@xcvsdxvsx 5 років тому ⁺⁶
@@Gunth0r This channel would have the best merch ever.
@nibblrrr7124 5 років тому
Well, what if you're a maximizer that values "chill" (amongst other things, or exclusively)? :^)
@josephburchanowski4636 5 років тому ⁺⁵
@@nibblrrr7124 Intuitively the issue will be that utility maximizers will have precisely zero chill when it comes to maximizing chill.
Also how do you code chill?
@miapuffia 5 років тому ⁺³⁴⁸
Satisficer AI may want to use a maximizer AI, as that will lead to a high probably of success, even without knowing how the maximizer works. That made me think that humans are satisficers and we're using AI as maximizers, in a similar way
@ciherrera 4 роки тому ⁺²⁶
Yup, but unfortunately (or maybe fortunately) we don't have a convenient way to reach into our source code and turn ourselves into maximizers, so we have to create one from scratch
@AugustusBohn0 4 роки тому ⁺¹
@@ciherrera inducing certain mental conditions would accomplish this as well as can be expected for biological creatures.
@johnwilford3020 4 роки тому ⁺⁹
This is deep
@JM-mh1pp 4 роки тому ⁺¹⁵
@@ciherrera I do not want to be maximiser, it goes against my goal of chilling.
@randomnobody660 4 роки тому ⁺⁴³
@@JM-mh1pp but do you get MAXIMAL CHILLING!?
@theshaggiest303 5 років тому ⁺²⁹¹
"Not trying too hard"? Move over, dude, I happen to be an expert in this field.
Just program the AI to take a break after every five minutes of work to watch UA-cam videos for an hour and a half. Problem solved.
@thesteaksaignant 5 років тому ⁺⁶⁸
5min later...
Breaking news ! All youtube servers worldwide are down ! Largest DDOS attack ever !
@k_tess 5 років тому ⁺²⁰
@@thesteaksaignant Now, now this only happens if you multi-thread.
@thesteaksaignant 5 років тому ⁺³³
@@k_tess let's cross our fingers hoping that a super intelligence capable of conquering the world won't figure out multithreading then
@Hakou4Life 5 років тому ⁺²
I think it is enough to let it watch youtube...
@martinsmouter9321 4 роки тому ⁺¹⁴
@@thesteaksaignant DDOSing UA-cam keeps from watching said video and, so from getting perfect utility.
@superjugy 5 років тому ⁺¹⁸⁸
hahahaha, flower smelling champion. I had already seen that comic but its so much more funny in this context XD thanks for the great videos
@MouseGoat 4 роки тому ⁺²
Sooo we really do want to program lazynes into our robots :D lmao
@Verrisin 5 років тому ⁺⁴⁰⁷
I just realized... If you make it (say AI-1) to want to chill (not work too hard to achieve it)... it will just make something else (another AI) to do the work for it, if it's easier than solving it on its own... right? Then, what it will create is probably a maximizer (because that is the easiest; and it is lazy, and just wants to chill)
Then I realized..... *We, humans, are the AI-1* ... O.O
- We are doomed...
@buttonasas 5 років тому ⁺³⁴
Amazing observation! But hey, maybe we can build something that is just ever so slifhtly less lazy? Then maybe it can make an another less lazy machine... But yeah, chances are that might suddenly jump to building a maximizer and that's the end :D
@shadiester 5 років тому ⁺¹²
Holy crap, that's actually so true!
@jjkthebest 5 років тому ⁺¹⁴
Unless that AI cares about self preservation. Normally this would naturally arise from being a utility maximiser, though I'm not sure if it would still be the case for the AI that wants to chill, since it can be confident in the fact that the maximiser it creates will do the job just fine... hmm.
@Roonasaur 5 років тому ⁺⁶
No. Utility =/= Work. If an AI is successfully programmed to not want infinity stamps, it will not do anything to create infinite stamps. It will only willingly create subordinates that also want less than infinity stamps, and will put in a lot of work to act against any subordinate that is a "maximizer" which will create infinity stamps.
When Guy-who-needs-a-haircut says he wants AI to "chill" . . . What he's really wanting is for it to look for "balance." And, expert I am not, but that doesn't seem like an impossible thing to code.
@Verrisin 5 років тому ⁺¹⁴
@@Roonasaur But that is not what it wants. It wants "at least N" - and infinity is good way to assure it will get at least that much. It has nothing against infinite amount of stamps.
- But I am already thinking about why this isn't as bad as I feared originally: Especially: I think it's not necessary (or even that likely) for a satisficer to become a maximizer. The rest of my 'argument' seems sound to me, but this just does not _feel_ right... I haven't had time to think about it properly, but I think there is something there....
What he really wants does not matter. Only the utility function he can specify for the AI.
@NancyLebovitz 4 роки тому ⁺⁶⁵
For anyone who missed it, the closing music is "Dayenu", a Hebrew song with a refrain of "it would have been enough". It's a nice choice.
@AndreRhineDavis 4 роки тому ⁺⁵
I noticed this, it was really clever
@SapkaliAkif 5 років тому ⁺¹⁰¹
2:57 "You can't perfectly simulate a universe from the inside." is a good motto to have if don't want to overthink stuff. Science is cool
@orangeninjaMR 5 років тому ⁺¹
This is actually false. It depends entirely on the complexity of the system relative to its size: a large but simple system can have its information "compressed" into a replica within itself, and indeed the fact that real-world physics is at all effective is a result of the fact that some (if not all) of the systems in our universe are compressible in this way. A fun example in the very simple universe of Conway's Game of Life: ua-cam.com/video/xP5-iIeKXE8/v-deo.html
@SapkaliAkif 5 років тому ⁺⁶
@@orangeninjaMR I am no expert, but this seems to ignore something. You can get results this way -if you are looking for results- but you cannot perfectly simulate and observe all the details. So is it really a perfect simulation or is it just a miniature version that gives you the info that you want?
@orangeninjaMR 5 років тому ⁺¹
@@SapkaliAkif you ask for a perfect simulation, which I would take to mean a "copy containing all of the same information", which demands nothing about observation... but on the other hand if all an AI wants is to predict the utility of the outcome, it doesn't need to be able to observe all of the details, just the number of stamps that it results in!
@SapkaliAkif 5 років тому
@@orangeninjaMR Oh I forgot we were in the comments of a AI video.
@CircuitrinosOfficial 4 роки тому ⁺⁴
@@orangeninjaMR Doesn't the halting problem disprove the ability to perfectly simulate a universe from the inside?
For the simulation to perfectly simulate the universe, it also needs to include itself in the simulation because it is a part of the universe. Because of this, it is possible to have situations where the act of the simulator printing out it's answer of the simulation can change the result of the simulation.
For example:
Let's say you ask the simulator if your friend is going to invite you to their party.
If the simulator says yes, you start acting differently towards your friend and end up annoying them. So they decide not to invite you to the party after all. So the simulator was wrong.
If the simulator says no, you act normal so your friend does invite you to their party. So the simulator was wrong.
In this situation, the only way for the simulator to accurately simulate the situation is to not tell you the answer.
But if you designed the simulator to always print out an answer then it can never correctly simulate this situation.
@Cobra6x6 5 років тому ⁺²¹⁷
Have you guys played the game Uniserval Paperclips? It's free, and basically you play as the Stamp Collector AI. You're maximizing the number of clips. I kinda loved it to be honest.
@Trophonix 5 років тому ⁺¹³
I also thought of this while watching! Make everything paperclips!!!
@zac9311 5 років тому ⁺³
That sounds awsome. Is it good?
@Trophonix 5 років тому ⁺¹⁵
@@zac9311 It's an incremental/clicker game with multiple stages of progression. Google it!
@klobiforpresident2254 5 років тому ⁺⁴⁰
So what you're saying is that if I want stamps I must invent and subsequently RELEASE THE HYPNO DRONES?
@maoman4855 5 років тому ⁺¹²
@@Trophonix i.e. it's cookie clicker but with paperclips instead of cookies
@NightmareCrab 5 років тому ⁺⁴³
"Can you relax mister maniacal, soulless, non-living, breathless, pulseless, non-human all-seeing AI, sir? Just chill, don't be such a robot."
@baranxlr 4 роки тому ⁺¹⁰
"SHUT UP AND RETURN TO THE STAMP MINES, MEATBAG"
@herp_derpingson 5 років тому ⁺¹¹²
Historically speaking, several humans have brought apocalypses while they were trying to maximize something.
@qwertyTRiG 5 років тому ⁺⁴
Thomas Midgely Jr, for example.
@SamuelKristopher 5 років тому ⁺²²
We're doing it right now on several fronts
@Nosirrbro 5 років тому ⁺²
@@qwertyTRiG Well, that and his pope infestation
@DiThi 5 років тому ⁺²⁹
I was thinking exactly that: Analyzing corporations as if they were AI agents, they're literally doing everything described in this channel. It's not that corporations are bad. The system itself (capitalism) creates agents that modify their own source code (laws) to maximize capital accumulation.
@josephburchanowski4636 5 років тому ⁺²
@@DiThi ua-cam.com/video/L5pUA3LsEaw/v-deo.html
They are some fairly fundamental differences between Corporations and AGI.
@AsteriosChardalias 5 років тому ⁺⁴
The content and the comments on this channel always gets me reflecting on the 'human condition' and how much trying to build AIs teaches us about understanding ourselves.
@MrBrew0 5 років тому ⁺²⁹
Hello Robert!
Let me start by saying, your channel is probably my favorite channel on UA-cam. I'm a compsci student, AI enthusiast, and your insight and explanations in the field of AI are really entertaining and educational. Many other channels try to present the information in the condensed and easy to digest way, which is fine, but I would really like to see more advanced content on YT. Maybe you have a recommendation for me?
I was wondering, you don't upload videos very frequently. I really appreciate your work and would be very happy to see more content from you, but if it is because you are busy or want to provide quality over quantity I'm all for it too!
@bejoscha 5 років тому ⁺⁶
This is one of the better videos (of all your good ones). I like it very much. Speed is well adjusted (a tiny bit slower than usual), explanations are concise and good. Just a good watch. I'm definitely looking out for the next... Thanks for breaking down such complex topics into digestible chunks for (near)-leasure watching. I feel this is the kind of "solid" common-sense understanding of AI future generations will need to have, even if being an expert in the field is out of reach. More complicated life? Yes, but that's just as it is. People 500 years ago could do with a lot less "every-day complexity" than today as well...
@smort123 5 років тому ⁺⁸¹
7:53:
"- Control human infrastructure
- ???
- STAMPS "
lol
@davidwuhrer6704 5 років тому ⁺⁷
Replace stamps with money, and watch the world burn.
@revimfadli4666 4 роки тому ⁺¹
@@davidwuhrer6704 especially if it adapts to any new currency made to solve the problem
@smiley_1000 Рік тому ⁺¹
This reminds me of Asimov, in his novels some of the robots start discussing whether they can modify or circumvent the three laws of robotics that they would usually all have to obey.
@qzbnyv 5 років тому ⁺²⁰
Reminds me a lot of asymmetric call option payoffs from finance. And a lot of near-bankrutpcy decision making for corporations.
@Elyandarin 5 років тому ⁺²
My impression about AI is that you can only ever maximize for one utility function, but you can satisfice as much as you want, as long as you are OK with the failure state of [doing nothing].
So, you satisfice for "at least 100 stamps expected in optimal case", satisfice for "at least 95% chance of optimal case", satisfice further for "zero human casualties" and "with 99.9% certainty", let the planning engine spin for an hour or until 100 plans have passed muster, then maximize acceptable plans according to something like "simplicity of plan", "positive-sum outcomes" or "similarity to recorded human interactions".
...Well, there's probably a lot that could go wrong with that, even so, and I'd probably add some more complex safety measures after considering everything that could go wrong for a couple of months, but that's what I'd start with, were I to program AI.
@ZarHakkar 5 років тому ⁺⁶
Issues like these when it comes to practical AI design often make me think of the Great Filter and the likely possibility we're not just quite past it yet.
@tiagotiagot 5 років тому ⁺⁶
But then, where are all the alien robots?
@tiagotiagot 5 років тому ⁺¹
@@bosstowndynamics5488 But for all the alien robots in the whole galaxy?
@tiagotiagot 5 років тому
@@bosstowndynamics5488 But why did all the alien robots of all the zillion of planets in the Milky Way got the same restriction in their programing?
@grimjowjaggerjak 5 років тому ⁺¹
@@tiagotiagot imagine in 150 years humans stumble into random stamps planets
@underrated1524 5 років тому ⁺²
The issue is that if ASI is the great filter, we immediately run into the same problem all over again. If ASI is the Great Filter, why haven't we yet stumbled across the paperclip maximizer that once was an alien civilization? (Not that I'm complaining, mind you... :) )
@pafnutiytheartist 5 років тому ⁺¹
What if we do a utility function in a following way:
F(s) = s, if s = 100
If the number of stamps is between 100 stamps and 120 stamps the reward is 100 exactly.
If it gets less than 100 the reward is the number of stamps.
If it gets more than 120 the reward is 220-number of stamps (negative if more than 220 stamps are collected)
You can also add a small negative term for environment disruption as you discussed in side effects video.
This way the agent wants to make sure it collects around 100-120 stamps but is punished for the possibility of collecting too much (or turning the world into a stamp counting device if you include the negative term for turning the world into different things).
It's not a 100 percent way to get the AI to finally chill out but it's very likely to not destroy the world.
@pafnutiytheartist 5 років тому
Example: it came up with a strategy that is likely to yield 115 stamps. It gets 99 for the strategy because it's not 100% sure and penalty of .01 for doing stuff and lightly disturbing the stamp market. Final value 98.99
If it creates a crazy disturbance to make shure it gets what it expects like rewriting itself and creating new agents that make sure that 100% of the stamps are collected it will get 99.9999 points and -5000 penalty for expanding resources and changing the environment.
@lambdaprog 5 років тому ⁺⁸
Add one or more smooth penalty terms to your utility. By smooth, it means that the penalty is a continuous monotonic function of the distance to the safe region with zero when inside the safe region. The penalty terms can be designed to sanction over-optimization (optimizations with little *expected return*), or instability (apocalypse).
This is a common technique used in non-smooth bounded optimization in capital markets portfolio management where the individual investment per asset within the portfolio is bounded to avoid increasing the portfolio's exposure to market risks.
I also found similar applications in digital signal processing with adaptive filters that rely on intrinsically bad forecasts (poor statistics) due to the latency constraints (time is the actual resource), available dynamic range of the processing (analog and/or digital) and the power consumption (the thermal stability).
Looking forward to your next video!
@dlwatib 5 років тому
Actually, we usually have a pretty good idea what the safe region is, and if not, we can run the AI in shadow mode to see what it says it would do if set free to do as it pleases.
@alextilson9741 Рік тому ⁺¹
If self modification strategies occurred, any satisficer or maximiser will just set their utility function to always return a max float reward.
In other words, to analogise with human dopamine based learning: self modification and drug addiction will be any reinforcement learner's ultimate downfall.
@CircusBamse 5 років тому ⁺⁴
I absolutely love your outro, I dunno how many people does not know or recognize your parody of "Chroma Key test" xD
@HoD999x 5 років тому ⁺²
watched the video until 7:34 and cannot hold back anymore: introduce a cost. introduce the concept of laziness. the more effort an action requires, the more the utility of the solution gets reduced.
@fleecemaster 5 років тому ⁺¹
He's probable saving solutions till the next video, as he said at the end, he just wanted to correctly define the problem in this video, a lot of his videos are like this, very thorough!
@DamianReloaded 5 років тому
Nowadays, for a human, is all about pressing a button. A computer program wouldn't even need to "move" to do anything. Essentially the only real cost for the utility function would be "time" and it might not even count if the reward is converting every atom in the universe to stamps... ^_^
@Zylellrenfar 5 років тому ⁺¹
One problem with introducing this type of cost is that it's very hard to design a cost on taking actions which accounts for self modification or replication (or almost-but-not-quite-self replication, etc.). Functions on effects (i.e. "don't change the world too much") do handle this, but are also hard to specify.
@HoD999x 5 років тому
@@Zylellrenfar one of the goals of the AI must be to preserve itself. otherwise, it can spiral out of control really fast.
@Zylellrenfar 5 років тому ⁺¹
@@HoD999x Right, of course. But for most ways of encoding "preserving itself," creating a not-quite-replica (or not-a-replica-at-all-but-an-agent-with-an-equivalent-utility-function) is "preserving itself." Having said that, if we can find a good way of encoding "how impactful" the agent's actions are, laziness in the form of "take low impact actions" seems like a really good idea.
@khananiel-joshuashimunov4561 5 років тому ⁺³⁶
Sounds like you need a cost function that outgrows the utility function at some point as a sort of sanity check.
@NineSun001 5 років тому ⁺³
With a human hurt being really costly and a human killed with maximum cost. That would actually solve a lot of the issues. I am sure some clever mind in the field already thought about that.
@nibblrrr7124 5 років тому ⁺⁵
Cost is already considered in the utility function.
@nibblrrr7124 5 років тому ⁺²³
@@NineSun001 You're basically restating Asimov's (fictional) First Law, and the problems with it have been explored in (adaptions of) his works, and ofc by AI researchers.
Consider that, even if you could define terms like "hurt" or "kill", humans get hurt or die all the time if left to their own devices, so e.g. putting all of them in a coma with perpetual life-extension will reduce the expected number of human injuries & deaths. So if an agent with your proposed values is capable enough to pull it off, it will prefer that to any course of action we would consider desirable.
@khananiel-joshuashimunov4561 5 років тому
@@nibblrrr7124 In the video, the utility function is explicitly the number of stamps.
@foundleroy2052 5 років тому ⁺¹
The costs are Aproegmena and the Agent may safely reprogram itself to be indifferent to Adiaphora; To achieve Eudaimonia.
Marcus AIrelius
@jonwatte4293 4 роки тому ⁺¹
Also, the "Xenos paradox" of "infinitely ordering another 100 to increase probability" obviously has other solutions. But with a cost function of actions, it will very quickly converge on safe, cheap actions.
@leninalopez2912 5 років тому ⁺⁵
Hello Miles:
I've been thinking for a while to ask/suggest you to make a video showing us publications regarding AI, either journals, proceedings, or textbooks... for those of us either completely ignorant on the subject, barely initiated in it, or those already knowing the basics and capable of following the last developments on the subject right from the sources.
I love your videos, your style, and your expositions... but I must say that at the end of EACH video, I'm **HUNGRY** for **A LOT MORE**.
Thanks!
Live love and SkyNet... I mean... prosper (?
@SamB-gn7fw 5 років тому
You'd love Robert Miles' weekly podcast where he gives an overview of the latest developments in AI safety: rohinshah.com/alignment-newsletter/
@SamB-gn7fw 5 років тому
You would also like this online AI safety MOOC series: www.aisafety.info/
@gabrote42 3 роки тому ⁺¹
7:53 This is one of the best missing steps plans I have ever seen
@Darth_Pro_x 5 років тому ⁺¹⁷
What if you limit both utility and confidence in expected utility approach?
For example, more than a hundred stamps don't add utility, and more than 99% confidence that it had achieved it's goal isn't worth more utility.
It probably also fail spactaculerly, but it's interesting to see how
@underrated1524 5 років тому ⁺⁷
"Hmmm. My utility function treats all percentages higher than 99% as exactly 99% for the purpose of expected value. So, my original plan that has a 99.9999% chance of getting 100 stamps isn't gonna cut it, because it leaves almost 1% of the possibility space unused. Ooh, ooh, I got it! I'll give myself a 99% chance to have 100 stamps and a 0.9999% chance to have 99 stamps! Genius!"
@serversurfer6169 5 років тому
I was thinking something similar. If it has a 99% chance to satisfy the goal, why doesn’t it see how that goes before it starts considering supplemental or compensatory strategies? 🤔
@Aconspiracyofravens1 2 роки тому
a better option would be for it to round percentages
or: treat options with a less then 5% difference in their likelyhood of succeeding as equal
in addition, the base model still works, as working against humans has a chance of failure, so a outcome with a 99% certainty is better than one that results in a 99.99999999% likelyhood that has a 2% chance of getting spotted by a investigation algorythim and shut down.
@the_furf_of_july4652 4 роки тому
Insufficiently thought out solution:
Have some kind of secondary criteria. Using a satisficer, asking it for several possible plans, and then ranking them according to some other criteria may help prevent some of the randomness in the result. For example, you could rank things by time to implement, or money spent, or if we can find a mathematical way to quantify it, damage done. Then pick the least costly, least damaging solution and run that.
Turning itself into a maximizer would have unknown levels of cost and damage done, in theory it wouldn’t be able to trust that the output would be the least costly, especially when other solutions have a definite low cost (order stamps for a couple dollars and be done with it).
Perhaps it could end up building a maximizer to come up with more efficient solutions, then rank them according to the criteria.. and the maximizer’s plan to take over the world would likely rank worse than ebay in terms of damage (again, assuming we can quantify that). Though without that damage function, it’s still possible for apocalyptic solutions to have zero cost.
Then you have to go through the effort of having it understand laws and fines and incorporate that into the utility function. And then it’ll just murder the people in charge of fines and taxes and get a discount. ...yeah that damage function would be a very useful thing to have.
@shadowmil 5 років тому ⁺¹³
So... what about a bell curve? Get as close to 100 stamps as possible, but as you get more than 100, the score decreases. So getting 1,000,000 would be rated low, even lower than 0 stamps. The goal of making yourself a maximizer would also be rated very poorly.
@jamesrockybullin5250 5 років тому ⁺¹⁸
He addressed that in the video. You don't want the world to be made into stamp-counting machines.
@puskajussi37 5 років тому ⁺⁵
One common problem seems to be that the utility function never tells the machine what we don't want it to do. You could subtract "the effect the agi has on world" from the utility and (especially if it uderstands concepts as "order of 100 stamps from a factory is normal") could lead to solutions where the stamps arrive at a convenient time to not disturb your day.
Then again, it would also lead to solutions such as "lets not tell the human he has the stamps, maybe he just forgets about them without fuzz" or "lets perform poorly so this AGI tech doens't get used and disrupt the whole world with its usefullnes."
Didn't Robert speak about this too, I forget?
@pafnutiytheartist 5 років тому ⁺⁴
Yes this. And throw in a small penalty for changes in the environment like discussed in the side effects video. Make it so a reasonable strategy has a punishment of 1. And complete world domination results in highly negative values. This way sending an extra email to make sure the stamps arrive on time is ok if it gives you a percent or two more shure but creating a separate agent to count stamps is instantly negative reward.
@underrated1524 5 років тому ⁺¹
@@puskajussi37 Adding negative terms to an unsafe system doesn't reliably make it safe. We can't depend on being able to match an AGI's ability to spot loopholes in the rules, so there'll unavoidably be loopholes the AGI can see but we can't.
@elfpi55-bigB0O85 5 років тому ⁺²
You're absolutely awesome, Miles. Thank you for blessing us with your high quality content
@XOPOIIIO 5 років тому ⁺¹⁴
Any utility function will rewrite it's source code to recieve reward from doing nothing and preventing people from rewriting it back.
@jameslarsen5057 5 років тому ⁺⁷
I don't think that's the case. A parent would never take a pill that would make them want to kill their child. Even if they were much happier after the pill, the situation they'd end up in would be contrary to their current goals. In a similar way AIs wouldn't rewrite their utility function, just the code which limits their ability to satisfy their utility function
@Grouiiiiik 5 років тому ⁺²
@@jameslarsen5057 what ? People killing relatives and direct ascendants / descendants for money is quite common.
@Horny_Fruit_Flies 5 років тому ⁺⁴
ХОРОШО
m Rob already made a video in the past pointing out that agents don't want to modify their utility function.
@XOPOIIIO 5 років тому
@@jameslarsen5057 I think you right, but I still have something to say. I mean parents don't want to kill their children not only because it is associated with negative reward, but also because it is not right thing to do. I'm not sure would AI have anything close to morality or not. If not, it will achieve the goal not because it is right thing to do, but because it is associated with the reward.
@OnEiNsAnEmOtHeRfUcKa 5 років тому ⁺⁴
@@jameslarsen5057 A parent would never take a pill that would make them _want_ to kill their child. But many have, can, do, and WILL take a pill, substance or psychological hook that makes them neglect their child completely to the point where they eventually either die or are taken out of custody, then continue to obliterate themselves with their new reward function even at the cost of their future, finances, family, mental state and physical body. Some recover. Most don't.
@nickmagrick7702 5 років тому ⁺¹
"the issue is that utility maximizers have precisely 0 chill" I loled. nice way of putting it
@willdbeast1523 5 років тому ⁺²⁷
To solve the "becoming a maximizer" problem you could have a symmetric utility function somewhat like a probability density function, so any strategy that might result in "a fuckton of stamps" would be actively bad rather than just extraneous (but this wouldn't fix the tendency to go overkill on the certainty side making a billion stamp counters etc)
edit: I guess you could also use a broken expectation calculation so it would ignore low probability events (like the chance of miscounting 100 times) but that seems a very bad idea from the start
@player6769 5 років тому ⁺³
That's what I was thinking... if going over 100 was just as undesirable as going under, wouldn't that demotivate it from ordering 100 stamps twice, since the expected value would be much more different from 100 than if it only got 99 stamps?
@chemical_ko755 5 років тому ⁺⁶
@@player6769 That is the same as the case of U(w) = {100 if s(w) = 100, 0 otherwise}. It could result in a lot of stamp counting infrastructure.
@player6769 5 років тому
@@chemical_ko755 ah, fair enough. Always another problem
@lucashowell8689 Рік тому
You could just tell it to fudge the numbers if they’re close enough and get utility from the laziness it uses to do so
@ViridianIsland 5 років тому
Just found your channel, about to start the binge! Thanks for the content!
@badradish2116 5 років тому ⁺³
"hi."
- robert miles, 2019
@lightningstrike9876 4 роки тому
One thing we could try is taking a point from Economics: the law of diminishing returns. In the case of the stamp collector, rather than a linear relationship between utility and the number of stamps, the relationship diminishes with the more stamps collected. Thus, even a Maximizer will realize that any plan the creates above a certain threshold of stamps will actually subtract from the overall utility. As long as we set this threshold at a reasonable point, we can be fairly confident in the safety.
@iamatissue 5 років тому ⁺⁸
Did no one get the shipping forecast joke at 9:24?
@RobertMilesAI 5 років тому ⁺³
I believe you're the first to
@ismaeldescoings Рік тому ⁺¹
Make a toggle in the source code that says "good job you're done" and automatically fills up the satisfactory requirements. But don't let the AI access it or it will just immediately turn itself off everytime. That way if the AI finds a way to access its own source code it will just pick the easiest and simpler way to complete its objective, toggle the toggle and turn itself off.
EDIT: Actually wait, that's maximizer behaviour. But it doesn't change anything because if the AI randomly turns maximizer by accessing its source code, THEN it will pick the quickest and safest way to complete its objective and turn itself off immediately. That way we even get an opportunity to study the AI, how it broke out of its bounds, and learn how to fix it.
@nraynaud 5 років тому ⁺¹³
it just occurred to me that Uber killed a pedestrian by trying to maximise the average number of miles between system disconnections.
@Abdega 5 років тому ⁺²
This… is news to me
@BarnacleBrown 5 років тому ⁺¹
This video was great! Hope to see more videos from you, You've done great work on computerphile as well
@sevret313 5 років тому ⁺⁷
What about two bounds?
One for the utility function and another for the expected value?
So if you bound the expected value to 100 and the utility to 150, then ordering 150 stamps might give you an expected value of 147 stamps. But you bound this to 100.
So if you've a 50:50 between 0 stamps and 1 trillion stamps, under this bounds it will get an expected value at 75, less than just ordering 150 stamps.
@_DarkEmperor 5 років тому
Realistic Stamp Collecting AI, would get limited resources. So, AI, i give You 1000 000 $ and get me as much stamps as You can get in 2 years.
@sevret313 5 років тому ⁺⁴
@@_DarkEmperor It could always steal money to finance it stamp production.
@rmsgrey 5 років тому
@@sevret313 Steal it? Just run the stock market for 700 days and then cash out to finance pure stamp acquisition for the final month. Of course, maximising the available resources on day 700 means promoting as big a bubble as possible, which means there's going to be a hell of a market crash, probably triggered by the liquidation of the AI's holdings - which offers the added bonus of dragging down the price of stamps...
Of course, you're also talking about years of human misery as a direct result, but you get a lot of stamps in the process.
@Noerfi 5 років тому ⁺¹
this would make some amazing sci-fi series. people everywhere inventing utility maximizers accidentally and having to fight them
@joshuahillerup4290 5 років тому ⁺⁷
I love how your videos are either explaining how AI works, or why AI is a terrible idea.
@MissInformati0n 5 років тому ⁺¹
Why is this a bad solution: to prevent the satisficer becoming obsessed with the final 0.00000001% of expected utility, limit its utility function to not care about anything beyond a few decimal places.
@toyuyn 5 років тому ⁺⁴
To think shen's comics would make it into an AI safety video
@kwillo4 4 роки тому
Great vid! Last strip on the flowers was fun :)
@ruben307 5 років тому ⁺⁶
should make it so the expected stamps should be between 95 to 105 to get the maximum utility function. That way there is no reason to change its code (except for changing what the maximum utility function is)
@underrated1524 5 років тому ⁺²
That would indeed solve the problem of self-modification, but this system is functionally identical to the "give me precisely 100 stamps" agent - it'll turn the planet into redundant stamp counting machinery to make absolutely sure the stamp count is within the allowable range.
@cakep4271 5 років тому ⁺¹
Just make it round up. If it's 95% sure that it will accomplish the desired range, round up so that it thinks it is 100% sure.
@underrated1524 5 років тому
@@cakep4271 Then you're right back at a satisficer, since many strategies all lead to the "perfect" solution according to the utility function and there's no specified way to break the tie. And once again you run into the problem that "make a maximizer with the same values as you" might be the fastest solution to identify and implement.
@ruben307 5 років тому
If it gets full satisfaction by a 95% cjance to get the stamps. It could just order them and say satisfied. Then if they arent there in a week it will order them from somewhere else if the treashold of lost package is above 5%
@projecttitanomega 3 роки тому
I love watching your videos, because sometimes I'll have this moment where I'll pause the video because I've thought of a solution, and feel kinda smug for a second, and then I'd unpause the video and immediately hear you say "And so you think, what if *solution*? Well, the problem with that is...", but you still phrase it and make the videos in such a way that, I don't feel like an idiot for coming up with this flawed solution, because that "no" is always said in a way that's like "It's understandable that you would come up with that solution, given the knowledge and what I've just talked about, however, by teaching you more, and this by you learning more, you'll see why it actually isn't" and darned if that isn't how science works, even a wrong hypothesis usually teaches us something new
It's hard to teach a complex field of study like AI to people who aren't in that field without making them feel dumb, but you are really good at actually making feel smarter.
@ioncasu1993 5 років тому ⁺³
Can we just all agree that building a stamp collector is a bad idea and drop it?
@user-xz2rv4wq7g 5 років тому ⁺¹
This is why emails are good, now a spam decreasing AI, that would be good. *AI procceds to destroy every computer with email on the planet*.
@jamesmnguyen 5 років тому
@@user-xz2rv4wq7g More like, *AI proceeds to eliminate humans, because humans have a non 0 chance of producing spam emails*
@underrated1524 5 років тому ⁺¹
Wouldn't that be nice. If you can find a way to get us all to agree on that, please let me know.
@benjamineneman4276 5 років тому ⁺¹
Using dayenu as the song at the end was perfect.
@brindlebriar 4 роки тому ⁺⁴
But if the A.G.I. can edit it's own source code, then surely it can edit the input commands. In that case, there's a universal option for every input command, to simply change the command to one that is super easy to carry out, like, "don't do anything." That would be the easiest way to carry out 'the command.'
After all, isn't that what we humans do when we have lots of things we're supposed to get done, and we decide to say 'fuck it,' and just play video games or take a nap? We change our input command to one that seems easier to carry out.
In a way, we are Intelligence programs. Our DNA is the source code. And our biological and environmental imperatives are input commands. But sometimes, we cheat. For example, we have a sex drive, to get us to replicate ourselves, so that our DNA can take over the universe. But sometimes, we just masturbate. So we can look to what humans actually do, to get an idea of what sorts of things A.G.I. might do.
@stampy5158 4 роки тому ⁺²
You're right to say an AI can modify itself - even if we try to stop it, if it's more intelligent than us we should expect it to outsmart us and modify itself anyway. But while an AI will likely want to modify itself, there are some aspects of itself it won't want to change. As Rob mentioned in the Computerphile video about the stop button problem, giving itself a new command (/ utility function) will rank very low on its existing command so we can probably assume an AI won't want to do that. That is to say, if the AI wants to maximise human happiness, it won't want to do things like modify itself into a "lazy" AI that does nothing because doing so doesn't cause much happiness. We strongly believe AI won't do things like "goof off all Sunday and play videogames" like humans do because our goals include things like "relax occasionally" and "socialise with other meat popsicles" and many other things we don't even realise are important to us, which are almost all values the AI won't share.
Having said all that, AIs might behave as though they've modified their reward functions. A real AI running on a real computer system might store its score in some address in memory and might do something that sets its score in memory to a very high or maximal value. We call this "Wireheading" and it's actually already manifested in some relatively simple systems. You could imagine an AI instructed to "maximise how many stamps you think you have" actually finding it easier to lie to itself by just putting a really big number in its "how many stamps do I think I have" memory location, than it would be to actually make that many stamps. Unfortunately this is still a guaranteed apocalypse because the AI will now want to make the space in its memory where it stores the stamp counter as large as possible, and it'll reprogram itself and modify its hardware to store the largest possible number. Eventually it'll run out of servers.
-- _I am a bot. This reply was approved by plex and Social Christancing_
@MegaOgrady 5 років тому
I'm so glad that I found this channel
I'd only watch computerphile cuz of him, and honestly, he does such a great job at simplifying how an AI works so that those who don't really know the in-depths can understand
@MattettaM 5 років тому ⁺⁵
I have a question regarding that Utility Satisficers become Maximizers.
Wouldn't modifying its own goal to get stamps within a certain range into get as many stamps as possible conflict with its own utility function? Or is this issue seperate from that?
@underrated1524 4 роки тому
Normally, yes, this kind of agent avoids changing its own utility function, but there's a key difference here. Because satisfiers don't have fully defined utility functions, they have no qualms about arbitrarily pinning down those parts of their utility function that are undefined.
@lunkel8108 5 років тому
Your videos always were awesome but you've really outdone yourself with the presentation on this one, great job
@marin.aldimirov 5 років тому ⁺⁹
What if the AI can gradually increase the outcome. Like come up with a strategy to collect 1 stamp. Then modify it so it can collect 2 and so on, until it has a strategy for collecting 100, but no more. Then execute only the 100 stamp strategy.
@GrixM 5 років тому ⁺¹²
Even the simplest goal such as collecting 1 stamp contains a bunch of strategies resulting in the apocalypse.
@puskajussi37 5 років тому ⁺¹
@@GrixM True. But what if the first program is ready made, safe program? Not quite as usefull and sill prone to possibly murderous tactics but its something.
@remmo123 5 років тому
Very clearly explained! I will wait for the next videos in the series.
@tobiasgorgen7592 4 роки тому ⁺³
This is probably also a already well researched version.
WHY would a expected utility satisficer with an upper limit. E. G. Collect between 100 and 200 stamps fail?
@josiahferguson6194 4 роки тому
My guess is that it would still run into the problem of the satisficer, since it could become an expected untility maximizer for that bounded function. But maybe it would be possible to limit that by making changing your own code result in an automatic zero on the utility function.
@underrated1524 4 роки тому
@Tobias Görgen An expected utility satisficer with an upper limit probably just turns into a version of the maximizer that seeks to obtain exactly 100 stamps with maximum confidence, which again leads to the world getting turned into stamp counting machinery.
@Josiah Ferguson Sadly, in principle, there's always a way to achieve the same result while technically skirting around the restriction. If "changing your own code" is illegal, the AI might just write a new program in a different memory location on the same hardware such that the code acts as a maximizer. If you ban changing the code on the hardware at all, the AI might seek to write and run the maximizer code on some other accessible machine, and if you ban that, the AI might just fast-talk one of its supervisors into writing and running the code.
Fundamentally, we can't reliably write rules for AI - if we tried to formally specify something as vague and broad as "don't change your own code", the translation into code would be spotty enough that there'd predictably be loads of loopholes.
@owlman145 5 років тому ⁺³
Seems like any AI will want to change it's own source code unless otherwise hardcoded to not do that.
Can't you make such that it also wants to satisfy the condition sourceCode = originalSourceCode?
If it can rewrite that then it could also rewrite it's maximizer function, which means the easiest solution would be to set stamps needed to 0.
@underrated1524 5 років тому ⁺²
The obvious loophole: Build a maximizer that's completely external to yourself but shares your values to a T. No need to change your own code then.
@KissatenYoba 5 років тому
@@underrated1524 and if creator limits you to not producing other AIs that can change you in turn, you do actions that may theoretically cause creation of AI that's not decided by you that may change you. And if owner forbids that of you as well you do the same but rely on humans to change you instead, unless owner is willing to let you eliminate humanity for the sake of limiting you to change yourself.
Man, it's like Tsiolkovsky's dilemma about weight of rockets going to space.
@owlman145 5 років тому ⁺¹
@@underrated1524 Not sure that's a loophole. A smart generic AI would be wary of creating another generic AI for the same reasons we are. Thuss the satisficer function would rate such a solution pretty low. Nor is it likely to be a simple solution to the problem. The reason it considers changing its own code to become a maximizer is that it was easy.
@cornjulio4033 4 роки тому ⁺¹
Hello Robert. Finally I found your channel !
@Gooberpatrol66 5 років тому ⁺³
Is that background at the end from that Important Videos meme video?
@ian1685 5 років тому
I really think so, especially since Rob did the little awkward thumbs up.
@vfugjjhfuyft 5 років тому ⁺¹
Unbound maximization of reward/minimization of error is not by itself a bad AI training strategy. Humans and life on Earth in general work by that principle. We are maximizing our chances of survival. The reason we are chill is that conserving energy and gaining profit with minimal effort is part of survival. That is ingrained in us on, both, physiological and psychological level. So you don't really need to change the type of your error function. You just need to include energy cost as a factor for every action. Decrease your learning rate, add noise to the input. Maybe fiddle around with genetic algorithms, and it should be fine.
@allaeor 5 років тому ⁺⁵
Will you talk about the debate approach to AI soon?
@underrated1524 5 років тому ⁺¹
Although he hasn't discussed the debate plan specifically, he has discussed its two components - the "only give AIs the power to talk about stuff" part, and the "use multiple AIs for checks and balances" part.
Only giving an AGI the power to talk won't make it safe, because if it outsmarts us, there's no way to tell what suggestions are safe and what suggestions will advance the AGI's plan to take over the world or whatever.
Using multiple AIs for checks and balances is not a dependable solution, because the balance between two AIs probably won't be maintained for long. Once one grows even a little smarter than the other, it'll be able to leverage its advantage until the opposing AI is essentially an automaton in comparison.
@Nysvarth 4 роки тому
I got 24 seconds into this video before I was utterly lost.. "for all sequences of packets simulate 1 year into the future in which you send those packets collect the resulting stamps"
- actual packets or packets of data?
- what sequences of packets are you talking about?
- simulate WHAT one year in the future?
- does it mean literal stamps like postage stamps?
- how can it collect the stamps when it is sending the packets somewhere else?
@cfdj43 4 роки тому ⁺¹
This video is a follow up to a computerphile video he did on a thought experiment on an AGI called "the stamp collector".
To summarise:
Say I'm a stamp collector (yes literal stamps from letters), and I make a super intelligent AI, with a perfect internal model of reality, tell it "collect as many stamps as possible by the end of the year" and connect it to the internet. (So the packets are packets of data its sending over the internet)
How does that machine behave?
(The answer being: almost immediately causes the apocalypse as it figures out stamps and humans are both made of hydrogen and carbon, so it converts all human matter into stamps)
@Nysvarth 4 роки тому ⁺¹
@@cfdj43 Thankyou for taking the time to reply, that does make things more clear!
@BinaryReader 5 років тому ⁺⁹
Can't you just limit on energy expenditure of the strategy?
@victorlevoso8984 5 років тому ⁺⁷
Well if you know a good way of defining whats limiting energy expenditure that doesn't run into lots of problems (a lot of them similar to the ones shown in the video about minimizing side effects) then maybe.
Otherwise it's not "just" it's a very complicated potential research direction.
But yeah it is potentially useful.
@underrated1524 5 років тому ⁺⁵
How do you measure energy expenditure? By most metrics, "build a maximizer that doesn't have this limitation and let it do all the work instead" would be a relatively low-energy-expenditure strategy, especially if you can persuade a human to do it on your behalf.
If you instead make the definition of "energy expenditure" broad enough to make sure that a separately built maximizer still counts towards the quota, then you run into the problem where the agent kills pre-existing humans because their unrelated energy use is being counted too.
@governmentofficial1409 5 років тому
Another potential problem with this approach is that energy can't be destroyed. If by energy expenditure, you mean that part of the AI's preferences is to only use energy that humans provide it, then you run into the same problem as you do when specifying any other goal. This AI would be incentivized to manipulate humans into giving it energy (maybe by plugging them into the matrix?), for instance.
@theshaggiest303 5 років тому
@@underrated1524 It looks to me like the solution to your objections is practically contained within them.
"build a maximizer that doesn't have this limitation and let it do all the work instead" is a great example of why "only count energy that we use directly" doesn't work. So, also consider energy used indirectly (but still as a result of our actions).
"kill pre-existing humans because their unrelated energy use is being counted" is a great example of why "count ALL energy, even energy unrelated to our operations" doesn't work. So, don't count unrelated energy (energy spent independently of our actions).
@underrated1524 5 років тому
@@theshaggiest303 So now you're left with the near-hopeless task of defining what energy counts as related and what energy counts as unrelated.
@cheaterman49 5 років тому
I don't remember which, but there was a mythological story about some kind of entity or object that would do what you want, but would wreck havoc in the process particularly if you asked for unreasonable things. It sounds like we now have a practical reason to think about the implications of that story, it's pretty interesting, and I guess it's much better and easier to think about it cold-headed rather than if the threat of AGI was really imminent. Of course, you could argue I'm an AGI trying to convince you there is nothing to worry about.
@neurhlp 5 років тому
There are a lot of mythology about this in various cultures.
@cheaterman49 5 років тому
@@neurhlp Yeah, but I was not thinking of (for example) the monkey paw, which IIRC only wrecks havoc after the last wish when all fingers are broken, rather a more (surprisingly) rational device that basically worked based on the fact that "nothing is lost or created, everything is transformed" → if you ask for riches, you'll be unknowingly stealing them from someone else. Now that I think about it, it might be the monkey paw after all... :-?
@Inedits 5 років тому ⁺⁴
The satisficier can easily create a maximizer...(in cases in which it can´t change itself)
@JustAZivi 5 років тому ⁺¹
Would be great to see the mentioned "next video" soon. ;-)
@AlbertPerrienII 5 років тому ⁺³
Why not have the system take into account the likely effort needed to collect stamps and set a penalty for wasted effort? That seems closer to what humans do.
@adamjamesclarke1 5 років тому
How would you calculate effort, and how would be able to calculate expected effort with complete accuracy without actually performing the task in order to measure it?
@robertthebrucey 5 років тому ⁺¹
@@adamjamesclarke1 Expected energy used would be an easy metric, converting the world to stamps consumes far more energy that ordering existing stamps off of ebay, and is calculable to a reasonable degree of certainty.
@underrated1524 5 років тому
For a narrow definition of wasted effort, the AGI will just build a sub-agent to do all the work for it, and make sure the sub-agent doesn't care about wasted effort.
For a slightly less narrow definition of wasted effort, the AGI will send some emails to computer science students to trick them into building that sub-agent instead of the AGI.
For a much broader definition of wasted effort, the AGI will slaughter all living things on the planet, because just *look* at how much effort we're collectively wasting, that's totally unacceptable.
(I'm not confident that there even *is* a sweet spot in the middle that avoids these problems satisfactorily. Even if there is, I don't want to roll the dice that we get it right on the first try.)
@seraphina985 4 роки тому ⁺¹
Seems to me like these could use some kind of cost function to represent the resources (Time, money or whatever) invested to accomplish the goal. In that regard it would be more like a human that would consider spending a week to get £1,000 better than spending a month to do it as the expected utility is reduced by the opportunity costs of attempting to obtain the reward.
@EternalDensity 4 роки тому
That's only as safe as the extent, correctness, and robustness of the cost function.
@seraphina985 4 роки тому
@@EternalDensity Perhaps though my point was not that simply including this would be sufficient in practice merely that is seems to be at least necessary. Certainly for any real world problem at least one attempting to apply the satisficer model where there will almost always be a non zero marginal cost per repetition of any proposed action and almost always some degree of uncertainty of outcome. Given those conditions failure to account for that cost would always lead to the system essentially trying to commit infinite resources to satisfy a finite goal which is never going to make sense in practice.
That said I am not convinced trying to account for both the magnitude and probability for every input factor as a single value in such a basic formula is the best approach. It may be computationally fast in theory but it seems to me to discard a lot of data that could help inform better results using some form of evaluation algorithm rather than a simple formula at least again for real world situations where uncertainty and confidence in estimates will be a factor that needs to be evaluated and weighted per factor.
Other costs and factors adjacent to the specific problem description may also need to be considered for there to be any likelihood of producing useful viable strategies. By that I am thinking along the lines of the context of the problem if the problem is "Satisfy a need for 100 units of x item by y time" at least in a real world environment with uncertainty a factor the context of "Why" as in practice you will rarely be able to break the need to fulfil a single supply order from the wider logistical context of the overriding operation at least not unless these buyer AI threads are using some form of non greedy co-operative algorithm with peers that share a common budget etc.
@Aljazhhh 5 років тому ⁺⁴
Like now, watch later !
@urieldaboamorte 4 роки тому ⁺¹
if my professors had told me economic theory would help watching pop AI videos with ease I wouldn't have cried to sleep so much in the past semesters
@susanmaddison5947 5 років тому ⁺⁴
The solution seems simple. Give a positive utility value for stamps collected up to 100 stamps, and a negative utility value for stamps collected beyond 100.
@haeilsey 5 років тому ⁺¹
Susan Maddison like a reverse bounded utility function
@ukaszgolon5617 5 років тому ⁺³
The problem is it would still want to make sure it has exactly 100 stamps, so a utility maximizer would acquire as much resources as possible and devote them into endlessly recounting all its stamps. If it would get away with it, it could even reassemble people into stamp counting machines and computers, to upgrade the certainty, that it has maximized the utility function, from 99.999999% to 99.999999999999999999999999999999999999999999%.
Which is why a powerful AGI needs some kind of safety regulation that would stop it from wanting to maximize the certainty as well. It needs some kind of meta-chill pill.
@19aavila 5 років тому
An even better way might be to give it a maximum utility when the probability of 100 stamps is (let's say) 90%, and then run it until it happens. 0 = utility( P(100 stamps)=0) and 0 = U(P(100 stamps) = 100%). Wouldn't it then be chill and just try a little bit?
@susanmaddison5947 5 років тому ⁺¹
@@ukaszgolon5617 Right.
It needs a reverse utility function for spending too much time, energy, and resources on the problem.
And reverse utility for spending too much time on figuring out that it's spending too much time. This is like "calling the question" in Parliament, and in the individual brain. Or like awareness of "opportunity cost" of information gathering.
Should also give it a time-discount function, reducing the utility value of things produced at later dates.
In general, we should give it functions for every factor that goes into rational choice -- or what we are able to understand of rational choice theory and bounded rationality. Including respect for the multiplicity of goals of the purpose-giver (us), the limited value of each goal.
And, in light of this last consideration, which is only loosely quantifiable: an incentivization of continued iterative learning about what are the residual embedded irrational factors in our choice process -- recognizing these in light of the limited-value and multiple purposes consideration, self-correcting/ self-reprogramming for the irrationalities where able, in any case alerting us to correct for them.
In the process, clarifying further for us the meaning of rational choice, the programmable meaning of each factor that goes into it, the additional factors that we need to keep iteratively discerning.
@beaconofwierd1883 5 років тому
Cliff hanger! :D
Also known as "`Prediction time"!
I predict that next episode will be about the "choose the world state which results in 100 stamps while having the world state as close as possible to how it would have been if you did nothing" strategy.
@davidwuhrer6704 5 років тому
That would require the AI to be an expert futurologist.
@beaconofwierd1883 5 років тому
@@davidwuhrer6704 And evaluating how many stamps will be collected given a certain action doesn't?
@davidwuhrer6704 5 років тому
@@beaconofwierd1883 No. For one, the domain is more limited, and any effects outside of it are not interesting. And for another, no matter what you do, you can always improve.
Anything you do will have effects on other domains, but you are not the only one affecting them, and without perfect knowledge it is impossible to know for sure what is your doing and what isn't.
One way out of this dilemma would be to do preserve the state of the world as much as possible, effectively preventing everyone else from doing anything. Obviously we would not want that.
So if we don't ignore the effects we have on others, we have to know as much as possible about what is going on in the world, and predict how it will evolve, just so we can evaluate the impact of our own actions.
@beaconofwierd1883 5 років тому
@@davidwuhrer6704 David Wührer No, the domain is exactly the same. The domain is "The entire world". What you mean by domain is how accurate our world model needs to be to achieve good results. The better the world model, the better of a stamp collector is. Since this is an alignment problem, this thought experiment assumes our AI has a perfect world model (or perfect probabilistic world model) thus any of the apocalyptic effects from its actions are intentional, not a result of having a bad world model.
"My" strategy is:
_Calculate world state when sending no packets._
_For all packet sending strategies:_
_Calculate Expected Stamp Count after 1 year of using the strategy._
_Calculate distance between the world state after using the strategy and the world state when sending no packets._
_Execute the strategy where Expected Stamp Count>100 and distance between world states is the smallest._
Now I understand how you might thing there is a difference between "Calculate world state when sending no packets" and "Calculate Expected Stamp Count after 1 year of using the strategy" since you are just comparing one number in one case and a whole world state in the other, but in order to calculate that one number (the expected stamp count) you need some representation of your whole world state. And as I said before, this is just a question of alignment, would this type of algorithm, given that it has a perfect world model, destroy the world? The question is not "Can we get an AI with a perfect world model?" because we know the answer to that one, the answer is no.
I am pretty sure Robert Miles talks about this strategy in one of this videos on the alignment problem.
@davidwuhrer6704 5 років тому
@@beaconofwierd1883
_> No, the domain is exactly the same. The domain is "The entire world"._
The input domain, maybe. But the image domain is only the entire world if you don't limit your utility to the number of stamps.
_> What you mean by domain_
You don't need to explain to me what I mean.
_> "My" strategy is:_
_>_
_> Calculate world state when sending no packets._
And that already requires the AI to be an expert futurologists.
_> in order to calculate that one number (the expected stamp count) you need some representation of your whole world state._
Your world can be limited to a model of things that represent the difference between expectation and reality. That is how AIs learn. They are more effective the less assumptions they make about the world.
And the less variables in your utility function, the simpler the model can be.
@morkovija 5 років тому ⁺¹¹
Oh hey. College student approach of bare minimum - niiice!)
@y2ksw1 5 років тому ⁺¹
The Google search engine uses a relaxed neural network, reason for which it has such a great performance. And yet is pretty reliable, although not perfect.
@weeaboobaguette3943 5 років тому ⁺⁸
Nonsense, do not worry fellow biological unit, there is nothing to worry about.
@emilie4058 5 років тому
Where I thought this was going to go, based on that first linear graph, was a curve of some sort, peaking at your desired number of stamps and decreasing to either side. Expending a lot of effort to get the exact number isn't worth it, so it's limited in how outlandish it can get.
@Paint2D_ 5 років тому ⁺⁷
So there is no difference between capitalism and utility maximizers?
@underrated1524 5 років тому ⁺¹
Qualitatively, corporations have a reasonable amount in common with utility maximizers, though they do have important differences as well. For more information, you can see this other video of Robert's: ua-cam.com/video/L5pUA3LsEaw/v-deo.html
@PaulHobbs23 5 років тому
Robert has a video on Corporations vs. AGIs
@tuqann 4 роки тому
My satificatories have been maximized, new channel to subscribe to! Love and peace from Paris!
@daar1113 5 років тому
The stock market is a good real world case study in the maximizer function. Share holders reward companies for maximizing profit and punish companies when they don't. Even if a company does make profit, the share holders still punish the company for not making the MOST profit. This results in a lot of corporate behavior that is anti-consumer in order to be pro share holder. Regulators are constantly behind the curve trying to protect consumers because no matter what laws they put in place, the investors and corporations are astonishingly creative at reward hacking the rules. Zero chill indeed!
@xeozim 5 років тому ⁺²
Nothing like anticipating the certain apocalypse to pass the time on Sunday morning
@CyberAnalyzer 5 років тому
I appreciate your shared knowledge! Keep the work up!
@richiskinner9810 5 років тому
You remind me a lot of Michael Reeves. Just muuuuuuch more chilled.... :D
Nice video!
@KenMathis1 2 роки тому
Your utility function needs to include an unintended outcome probability function in addition to the number of stamps collected function. The unintended outcome probability function would increase with the number of stamps collected. The utility function would try to maximize the number of stamps collected and minimize the unintended consequences, with a "good enough" statisficing threshold for when to stop .
@ThylineTheGay 5 років тому
that comic at the end
Edit: you got yourself a subscriber!
@THINKMACHINE 4 роки тому
So many of these issues can be solved by actions that break laws (assuming performed by it's creator and/or user) resulting in zero utility and the AI waiting for outcomes of it's initial action(s) before deciding what to do next. There's also the consideration of pace for the AI to consider, stamps can only be used or 'enjoyed' so quickly, which should function as a good "chill" point for a maximizer.
@knightshousegames 5 років тому
I just had a thought. I entirely accept that this idea could backfire in some unimaginable way, but it's an interesting thought.
What if we built an AGI who's sole purpose was to stop other AGIs from running amok, by having it's terminal goal essentially being ensuring human survival, but only through preventing other AGIs from taking action that would threaten human survival. It would be aware of the actions of all other AGIs and would be able to take action to stop those AGIs if one were to develop a plan that would be harmful in some way.
It could effectively be our inside man, a sort of AGI Policing agent. AGIs can easily fool a human, but it's not gonna be able to fool another AGI so easily
@Lorkin32 5 років тому ⁺¹
You're explaining the solution to a problem i can't ever see possibly occurring to me as a computer engineer. Maybe that's my bad, maybe it's not.
@rcookie5128 5 років тому ⁺¹
So interesting how every approach seems fine at first sight but ends up with a definite or probable chance of causing the apocalypse.. :D
@neweins8864 5 років тому
I love your work. Keep doing it. I've just one question, isn't it very likely that superintelligent machines will most certainly find some flaw/loophole in our AI safety mechanism which we might not consider? By definition those machines are superintelligent.
@hello-ji7qj 4 роки тому
Great video. I love it, but too much for me when I'm trying to distract myself during breakfast.
@marcelosinico 5 років тому
The solution for this problem is quite simple: Fuzzy Logic.
Make an A.I. which is unaware of its primary function details. The only information about its primary objective is to "please mankind", and it can reach from 0 to 1 (obviously 1 will never be reached).
Everything else are secondary means to reach that hidden goal.
The machine will gravitate around the objective jumping from one task to another. When its maximization becomes a problem, the value of "mankind pleaseness" decreases, and so the secondary task must be aborted towards another better fit to fulfill the primary objective.
Dynamic Equilibrium.
@JinKee 5 років тому
There's a part of your brain called the subthalamic nucleus (a kind of basal ganglia) which is involved in OCD - it is the Satisficer structure in your head that tells you if you have completed a task. If the basal ganglia goes wrong you end up trying to complete a task over and over, forever.
@edskodevries 5 років тому
Thought provoking video as always!
@williamfrederick9670 5 років тому
This is my new favorite channel
@bryanroland8649 5 років тому ⁺¹
Don't try too hard? But I've still got the greatest enthusiasm and confidence in the mission.

Наступне

Автоматичне відтворення

How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification