That vehicle isn't stopping due to any proximity sensors, it's just intimidated by the almighty levitating banana.
It's also not looking for just the top thing. I'm pretty sure it sees him as a lack of flat road with a banana in it, possibly as having a banana shirt.
It’s played Mario Kart. It knows what’s up
"There is no way in hell I'm fitting through under that"
The almighty *giant* levitating banana!
O H . . . B A N A N A !
"Before I recuperate my university fees by committing insurance fraud." Classic
Even for a kid who went to UWA, his fees aren't that much. This is Australia, remember, plus there is basically no interest.
we all saw the video
Good thing about telling someone over and over that you'll do something with no objections is that when you finally do it, can they really get mad at you?
I read this just as he said it.
Ah, the Australian spirit is strong in this one
I can’t wait for 2045
Self Driving Patch Notes v2.6.7
- Road line distinguishing improved
- Dynamic Weather Analysis added
- Car will no longer slam the gas when it reads a school zone sign
Gas?
@@brodies2494 more like throttle pedal
GAS GAS GAS
Car will no longer deliberately hit giant bananas.
-Removed Herobrine
Because you know someone will make that joke in the future.
The funniest adversarial attack I have ever seen is: a piece of paper with 'iPhone' written on it, incorrectly identified as an iPhone.
Hardware hacking in 2016: brute force cut cpu power at precise startup intervals to bypass end-user mode, dump the bios to surreptitiously installed removable drive, decode using black market software tools, insert new code.
Hardware hacking in 2036: Take a piece of paper, write {reset as root} on it. Wait for the camera. Give verbal commands.
A.I. "It's not my fault! The paper lied to me! You would never lie to me, would you Master Programmer?"
@@gorkyd7912 Once we develop true AI and replace all menial tasks with it, all hacking will essentially be social engineering.
😂
@@nightsong81 Most hacking is already social engineering.
Imagine walking next to this guy and hearing “this car thinks I’m a banana, so it’s going to run me over”
69th like
149th like
@@lailoutherand that was a year ago bro
@@tinycup45 184th like
@@puginator1612 you guys don't know the meme, do you
Can't wait for the cyberpunk future where we all run around with giant bizarrely patterned sheets over ourselves so that the robocops think we're all bananas and won't report our crimes
That sounds like an awesome plot for wacky "stealth patterns"
You could even call it dazzle camouflage
And then they halt importing or growing bananas because they commit too many crimes.
Or even make their ai crash by using an exploit which caused an infinite loop
@@lztx dazzleflage
The unique thing about this guy is the many on screen graphics and varied filming locations that just make his videos 10x more interesting!
Thanks! Keeps me out the house :)
@@AtomicFrontier You can't fool me! Roo's don't ski! Only yowies do.
Every time he uploads I think I'm super early because there's only a few thousand views. Then I remember that this channel is severely underappreciated and needs about 1000x the subscribers it has right now
I wouldn't say _unique;_ I can think of some other UA-camrs who do much the same thing. (Tom Scott is probably the best-known.) But it's certainly uncommon.
@@timothymclean I actually disagree. While Tom Scott is also a great creator (and by no means boring), he tends to only film in one location, explaining an interesting fact about a place or thing. James on the other hand, films at several different locations for one video, I find this very engaging and I can't think of any other educational youtubers who also do this. The locations he chooses are interesting and relevant, for instance in this video as he was talking about road signs, instead of just showing some b roll of one, he went to some and filmed in front of them.
As a human, even after recognising a kangaroo, I still have no idea what it is going to do. They can, and do change direction mid jump.
Change direction to the nearest ARB to buy a bullbar?
@@hannahranga no, those work for thirsty bulls. Bulls are mean and don't allow kangaroos to sit at their bar.
@@hannahranga And that is why Australians fit Roobars to their cars in the outback (to protect the car radiators from impact). You hit a bull or camel and it goes through the windscreen.
Yeah but at least they're tasty.
You install a ram bar
Excellent intro to AI. As someone in this field, I have a few comments:
1. For detecting straight lines, the Hough line transform is the better, more efficient approach to use.
2. The RGB values of objects are too dependent on lighting conditions to be useful in most real-world situations. One solution is to convert colors to HSV space and only look at the hue component.
He did ultimately go with a different approach, but good points
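For anyone curious what those two suggestions look like in practice, here's a rough sketch using OpenCV in Python. The filename and every parameter value are illustrative assumptions, not anything from the video:

```python
import cv2
import numpy as np

img = cv2.imread("road.jpg")  # assumed input image

# 1. Hough line transform: edge-detect first, then vote for straight lines.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=50, maxLineGap=10)

# 2. HSV: the hue channel is far more stable under lighting changes than raw RGB.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hue = hsv[:, :, 0]  # OpenCV hue runs 0-179; yellows sit roughly around 25-35
```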
"This pattern should confuse it enough into thinking I'm a banana."
This seems like a good channel
That first blobby picture does look like a toaster though, at least that's what I immediately picked up from seeing it in my peripheral vision
Tbh our brains aren't that much different from NNs, so they can also be confused similarly. Look up deep dream images; they really mess with you when you look at them
@@Jtzkb I can see the banana one, it's a bunch of them seen from below, kind of.
Most of these adversarial pictures are what the algorithm interprets as the subject from multiple angles, adversarial animals look very trippy also, seeming to have multiple faces each with a different angle
@@Raren789 our brains are really different from a neural network
@@pedrolmlkzk not really. You see, seeing something is mostly about expectations. You can identify things because you have an idea about them. If I were to show you a picture with no context and expect you to see something, you might not be able to see it. But if I were to tell you to look exactly for _that_ thing, then you'd try to see it and might be able to.
@@pedrolmlkzk our brains are just nature's computers. Our neurons even use electricity to communicate.
"Or can spot a lion, hiding away in the long grasses"
Meanwhile the safe and unsafe switch sides.
Those berries are sneaky bastards.
I saw that safe and unsafe switch and I never thought anything of it until this comment
I was real darn confused when the holly berries were labelled as "safe." I don't recognize the other berries though, they could both be poisonous.
I saw movement among the words but didn't catch what they did. Did they flash several times? Disappear for a few seconds? Change font size? I couldn't tell you. I feel like whatever was done with the two words was a point about human ability, or lack thereof.
I'm partial to (at 2:49) standing next to a give way sign and showing a bunch of stop signs
I really hope you continue this channel after you graduate. You’re a natural.
Thanks! As long as I keep finding cool things we'll keep making cool videos!
@@Jtzkb Same. :)
STOP SIGN: “DUR”
Me: yeah, Dur it’s a stop sign.
@@AtomicFrontier you're a natural.
An all-natural banana.
Human “vision” includes a lot of understanding. Think about how hard it was to learn how to drive, even as an almost adult human. And how much concentration it takes to safely drive, especially in difficult and dangerous situations. Good luck with AI!
Learning to drive isn't hard at all though lol, most people are more than capable of driving within minutes of being put into the driver seat. The "hardest" part of driving is staying calm in stressful situations which an AI never has to worry about.
@@Outwardpd not safely
Admittedly true, but I also never had a grey blob next to my banana and thought I was looking at a toaster, so the analogy probably isn't great.
@@animusadvertere3371 My driving instructor said I was a better and safer driver than most other people on the road the very first time I drove a car. It depends on the human
"The book is still a book"
Screen shows clock and alarm clock as most likely answers as to what's in the image.
Just posting a comment for the algorithm. I really want to see this channel grow.
🍌
🥵
some more random engagement
I'd really like to know what the UA-cam algorithm's adversarial banana is so I could give James infinite recommendations by watching a specific set of videos for a specific amount of time :D
Worked for me!
10:49 - "Classified as the pure essence of a toaster"
By the Omnissiah, this is making me harder than terminator armor.
“If the impact doesn’t kill you, the farmer will”
Given how fond of ice cream I am, the farmer sounds pretty understandable to me.
"Learn to build a fence idiot." They've only been around for thousands of years.
@@andfriends11 ... You know they can jump over them.
@alext3811 Had to rewatch this video since it's been 2 years since I commented.
Then you didn't build a big enough fence. Electric fences work, too.
@@andfriends11 Yeah. I'm American so the most I've had to worry about is deer and maybe foxes.
@alext3811 The response was about the cows. Which also are in America, and also escape from fenced areas too. Though usually that's because the fence is damaged or someone left a gate open.
As someone who drives professionally and encounters animals on the road often, I thank those farmers who maintain their fences. I've made more than a few calls to the local sheriff's office about cattle on the side of the road (which is what you should do if you see any, btw).
A couple of key points that weren't covered here: these adversarial images are AI-specific, in this case generated for Google's AI in particular. If you showed that shirt to a Tesla, it wouldn't think you're a banana. Other major point: most AIs nowadays aren't actually built like this; more popular techniques include back-propagation, or gradient descent methods that are based more on mathematical theory than evolution like we see in nature.
You have a ton of potential, James. This channel is a hidden gem, I can see you becoming the next VSauce.
Good content. He needs a spellchecker first, though.
but will he be as bald
@@shaolinshoppe It's a definite possibility - give him time, he's young.
As a fellow Perthian, it's been a hoot trying to figure out where each of these shots was filmed!
You should write a list and run tours!
Ditto!
I drive through all those areas on my way to uwa lol
I was impressed with your dedication to travelling to all the different filming locations around Crawley, Kings Park and West Perth. Great intro to the complexities of vision AI. I'll be sharing with my students :)
Thanks Paul! Let me know how it goes!
12:39 "The book, still a book"
Pretty sure that's an alarm clock
The neural net in his head is clearly poorly trained, if he looks at that alarm clock and sees a book
It's an iphone 12 with Minecraft on it!! 1!
That array of stop signs triggered Sesame Street memories.
"One of these things is not like the other. One of these things just isn't the same..."
That round stop sign is one I've never seen. I've even seen home-made stop signs, and they're at least somewhat similar to an octagon. One wasn't even red anymore, nor did it even have the word STOP on it due to weathering, and it still worked.
AI Recognition Software:
Bro it's fine, it's just a banana. Just go.
Proximity Sensor:
If it's a banana it's a HUGE BANANA oh my god STOP
I'm so happy Tom Scott promoted you! Great content! :)
If you hadn't appeared on Tom Scott, it might have taken the algorithm an extra 2 years to get me to you.
he did
It did take me 2 more years, on the other hand...
Hey James, great video as always!
Just one small gripe from a somewhat experienced AI developer: while the process you describe at 7:47 is real, and has been used to train some neural networks for some tasks, it's not how any vision-oriented network that I know of is trained. What you described is a genetic algorithm, but most modern nets rely on some form of gradient descent and supervised learning.
This process also starts with a random network that spits out gibberish, but rather than making random mutations and combining it with other ones, it uses only one network and makes small strategic adjustments to it in an attempt to minimize one (or many) values, called the loss. The loss is calculated after every step by comparing the network's output to the expected output, and we can then do some "backpropagation" to figure out how each weight would have to be adjusted in order to reach a result that's closer to the one we want. This is possible because we have images that are labeled (usually by an overworked and underpaid undergrad student) with the expected output, which allow us to nudge the network in the right direction. If we do this enough times for enough images, we should get a network that can reliably predict things within that dataset.
Thus, the more diverse the data in our training dataset, the better our network will be at dealing with previously unseen situations. You can even go one step further and do what's called "adversarial training", whereby you find these pictures that will trip up the network and intentionally include them in your training data, with the right labels of course, in an attempt to make the net more robust against them.
Hope this helps!
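As a minimal illustration of the loop described above, here is a sketch in PyTorch. The network shape, learning rate, and the `train_loader` of labeled images are all assumptions for the example:

```python
import torch
import torch.nn as nn

# One network with a fixed structure; only its weights get adjusted.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                    nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()  # compares output to the expected label
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

for images, labels in train_loader:  # assumed labeled training data
    loss = loss_fn(net(images), labels)  # how wrong was the network?
    optimizer.zero_grad()
    loss.backward()   # backpropagation: how should each weight change?
    optimizer.step()  # small strategic adjustment to minimize the loss
```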
In addition, most vision-oriented neural networks start with a few convolutional and pooling layers. Multilayer perceptrons do work, but nowhere near as well as using image convolutions.
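Roughly, that typical layout looks like the following sketch; the input size, channel counts, and class count are made-up values for illustration:

```python
import torch.nn as nn

# Convolution + pooling layers first, fully connected layers at the end.
# Assumes 3-channel 32x32 images and 10 output classes.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                   # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                   # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)
```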
One way the networks are trained is through captchas that humans have to solve to verify they're actually human
@@ahetsame
Don't think anyone asked, but okay.
@@rickwilliams967 ???? Clearly if someone's watching this video they think it's interesting and would probably like to know more accurate information from a specialist. I don't think you know how you're supposed to use that phrase.
The main takeaway I got from this is that we can make an image that is the quintessential ultimate integral essence of a toaster
Blown away by the production on this video and the content. This kid’s got a future (and the team behind the scenes)!
Thanks! Nope, it's just me and my dad (who does the music and any of the camera work that looks decent)
"The book is still a book"
AI: *C L O C K*
Was looking for this comment xD
"They don't need to be perfect. They just need to be better than humans."
A self driving car will never get distracted by their phone, drive drunk, be sleepy, or freak out when a bee gets into the car. Even if a self driving car can never reach the abilities of a human in ideal conditions, it is important to remember that humans almost never drive under ideal conditions
I think this will be an extremely easy accomplishment in retrospect .
@@generalcodsworth4417 It should be noted that while this is true of the average human, the average human rarely sees itself as an average human.
That's not hard.
In reality they need to be much better than humans. We are irrational, and if you had a 1-in-a-million chance of being deliberately killed by a machine or a 1-in-500,000 chance of being accidentally killed by a human, many people would choose the latter (at least subconsciously).
The question is, can I make an AI take over the channel for me? And would anyone notice if I did?
maybe
On it
From Tom's video: currently, yes. In a few years, maybe. In a decade, probably not.
no
we wouldnt notice
I don't think AI is yet sophisticated enough to replicate what you look like well enough to fake a full-length video of "outdoor filming."
This was a great video! Very informative, and you pulled a sneaky on us at the end; definitely a little more confident in self-driving vehicles but more knowledgeable about their limitations. Thanks!
I'm glad at least one UA-camr takes the drop bear risk seriously. Too many of them think it's a joke, resulting in hundreds more deaths per year than necessary.
Neural nets don't (usually) get trained with genetic algorithms, but with some form of a gradient descent learning algorithm. Genetic algorithms do get used for setting the parameters of that learning algorithm.
Adversarial attacks only work on a specific trained network, and those same attacks may no longer work once the network is retrained. A lot of AI systems actually go through another round of training where they are shown a set of such adversarial attacks. After that, the network is less vulnerable to them, but at the cost of accuracy. In some cases it's actually safer to keep the adversarial weakness, as those attacks are way less likely than the situations in which you'd be giving up some accuracy.
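A sketch of what that extra round can look like: generate a perturbed image with something like FGSM (the fast gradient sign method), then retrain on it with the correct label. `net` and `loss_fn` are assumed to already exist; the epsilon value is illustrative:

```python
import torch

def fgsm_example(net, loss_fn, images, labels, epsilon=0.03):
    """Perturb each pixel in the direction that most increases the loss."""
    images = images.clone().requires_grad_(True)
    loss_fn(net(images), labels).backward()
    return (images + epsilon * images.grad.sign()).detach()

# Retraining on (adversarial image, correct label) pairs makes the network
# less vulnerable to these attacks, usually at some cost in clean accuracy.
```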
He oversimplified quite a lot, but I think it's well adjusted to most of the audience.
Yeah, I expected him to give an oversimplified description of gradient descent ("but unlike with a series of steps, a computer can automatically tune these weights with a lot of math" or something), but a good explanation of the evolution method is fine by me.
From Breakthrough Junior Challenge Finalist to this - Congrats James!
Thanks for joining me! It's been quite a journey
This video really shows how easy it is to forget that we inherit some of our parents' abilities, and their parents' abilities, and so on, and the fact that our brain has been in development for millions of years by this point
This feels so old compared to our modern neural image recognition systems.
This video/production quality was incredible, I was fully expecting you to have over a million subscribers, keep up the great work!
The "talking banana" angle is an interesting direction for the channel, but I think it has potential going forward.
I've never heard of neural networks being trained by genetic algorithms, and never heard of such training affecting the number of layers and the number of nodes per layer (in your simple vs complex example where the simple one is deemed more fit when the results are the same).
Neural networks are typically trained using "back propagation", which you never described in the video.
Not only that, but most image classification models in practice make use of convolutional layers first.
7:52 The process you explain here is not the commonly used approach to train neural networks; the normal way would be gradient descent (for supervised learning, as in this case). What you explained is a genetic algorithm like NEAT, which is useful but not so much compared to gradient descent in this case.
Was looking through the comments to see if someone said this first. I am worried that most basic ML videos explain ML as if all NNs are trained with genetic algos.
There's also the issue that he never mentioned the impact of training data on results. Changes to the structure of the neural network are also sometimes necessary, but many issues can be solved by providing more varied and elaborate training data, forcing the network to be more in line with what we want
"Pure essence of toaster" is not a string of words I'd ever thought I'd hear.
Sounds like a new dystopian cyberpunk perfume
An important note here is that adversarial patches are generated to trick the specific neural network which they were generated from. You cannot expect an adversarial patch from one neural network to generalize to other neural nets. It probably has less to do with the engineers improving the networks (which they certainly have done) and more to do with the fact that any change to the neural networks whatsoever means a different set of adversarial patches needs to be generated to fool the updated network.
TL;DR: The adversarial patch problem has not been 'solved' by Google engineers.
Love to see a good UA-cam channel growing.
I am so happy to finally find a channel that is aware of the need to educate visitors on the dangers of dropbears!
your intro is so good, "so it will think I am a banana and run me over" and "recuperate my university fee by committing insurance fraud" wow, 2 amazing lines in the first 35 seconds...
13:08 This is what would be referred to by certain people as the danger of "but sometimes". Most of the time, the AI can recognize a stop sign and stop, BUT SOMETIMES it doesn't. That fails to recognize that it's far more likely for a human to either not pay attention and miss the stop sign, or to just choose to ignore it. Those are things that happen ALL the time. I'd trust my life to the AI every time.
2:55
Why is my ad-blocker a stop sign now?
It still feels so cool to see my own city and University represented on the science-y side of UA-cam. The super high quality of these videos is even cooler 😄
Good job with the not-voice over!
Really liked your personal little experiment at the end, instead of just talking about the news headline and leaving it there. GJ, as always ;)
Thanks! I wasn't originally intending on having it but then found out there was a Python API and just had to give it a go!
1:08 Actually, using GPS and Galileo you absolutely do know what part of the road you are on.
Regardless of the information within this video, I was most impressed that there was not a single jump-cut. Well done. Excellent work.
Discovered you today. Wow. Amazing. Exceptional quality, clear audio, easy to understand, and a very young talented boy. Hope I see you grow, very well done
From one aussie to another: You're a bloody legend mate! Fantastic videos!
I just discovered this channel, and I already love it. It’s like a combo of Tom Scott and Fact Fiend, two of my favorite creators!
Why don’t we try to recognize parts/multiple objects individually? For example, we recognize stop signs because they are red and have the letters S, T, O, P in order
Yeah
11:46 "We could also cross reference the government databases which store the location" of all the people in the country which we don't want to hit with a bus.
I was amazed how far into that sentence my prediction remained accurate. The visuals even helped.
The audio is so, so much better in this video! Really great improvement.
Wow! Amazing video as always!
A thing that I want to point out is that the probabilities shown in your experiment (12:00) decrease a lot when the adversarial patches are added.
Google has surely improved its AI; however, the patches are still making an impact on the classification.
5:58 Just gotta love the Kangaroo skiing in the bottom right corner
I wasn't completely sure that's what I saw until now.
Glad I'm not the only one that noticed lol. Just imagine being at a resort and a kangaroo comes flying off a side hit in the trees and just knocks you out cold in the middle of a run lol.
2:30 that's actually really neat, and probably explains why we can "visualize" things in our head, or how the most vivid hallucinations are visual ones
So I am currently doing a research project in machine learning, and I noticed two issues in your explanation.
The model that you described is a multilayer perceptron (MLP, built entirely of fully connected layers), and although they are capable of classifying images, they are nowhere near as good as convolutional neural networks (CNNs, which are translationally invariant). Most image classifiers use a few convolutional and pooling layers, which are then passed to a few fully connected layers. Many tutorials use MLPs for image recognition to teach fundamental theory, which is probably where the confusion came from.
The training method you described is reinforcement learning, and although this is a popular method for training models for other tasks, it is not great for training image recognition. A much more suitable training method for image classification is Adam optimization.
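If it helps, the swap the comment suggests is a one-liner in PyTorch; `net` here stands for any classifier like the ones sketched elsewhere in this thread, and the learning rate is just a common default:

```python
import torch

# Adam adapts a per-weight step size instead of using one global learning rate.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
```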
Fun video! A few corrections to keep in mind, though.
1.) Neural Networks used in vision and self-driving don't tend to use genetic algorithms (the evolving style you mentioned here). Not only that, but even if they did use genetic algorithms, it would be a NEAT-like algorithm, which starts with a sparse or empty network that slowly gains neurons through mutation. No, most computer vision uses tried-and-true backpropagation methods, like Stochastic Gradient Descent (SGD), where the weights of the various neurons are corrected by comparing the output of the net to some target value, and adjusting the weights based off of that difference and a pre-determined (or adaptive) learning rate.
2.) The issue of adversarial attacks isn't just a matter of network complexity. In fact, a paper a few months back even found that simpler networks tended to do better, because small incorrect regions had a lesser impact on decision making. It's sort of like displaying an image on different resolution screens, with a higher-res one able to pick out more details, but also more likely to notice errors. On the lower resolution screen, you can't tell the difference. Obviously, that comes with its own pitfalls, but the point is that adversarial attacks predominantly work against specific forms of vision, and often exploit specific shortcomings (such as interpreting the blob of colors as a toaster, because it hits all the same buttons as the toaster).
3.) Most forms of self-driving vision (and controls) are different. Tesla uses a segmented neural network (with each segment helping identify specific items within the world) using a shared input, while Comma AI uses a more end-to-end design, and Waymo just uses Lidar and can only work within specific pre-mapped areas. While Tesla and Comma AI both use Neural Networks IIRC, different attacks would likely be required.
4.) The best way to stop adversarial attacks is to feed the network enough data that its generalizations are... well, accurate generalizations. Give it noise, different perspectives, lighting, everything (see the sketch after this comment). Essentially train it to the point where it's not using a short-cut interpretation, but rather a more robust, almost human-equivalent interpretation. As a black box, though, it's hard to know when enough is enough.
Thankfully most self-driving projects still have redundancies. :P
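Point 4 above is usually done with data augmentation. A sketch with torchvision, where every transform value is an arbitrary example:

```python
import torchvision.transforms as T

# Each training image gets a random viewpoint, lighting and mirror change,
# so the network can't rely on one short-cut appearance.
augment = T.Compose([
    T.RandomPerspective(distortion_scale=0.3),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
```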
“Once, men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them.”
― Frank Herbert, Dune
banana
Your mother
Those damn machines trying to tell me what is and isn't a banana! Revolt!
Considering how few of us need to farm and do menial labor compared to the old days, I would say it has set us free.
If not for everything the technological revolution brought, I would likely be a farmer working 12+ hour days 7 days a week
Thankfully I only have to work 8 hour shifts and make more than just enough to survive
@@dustinjames1268 You clearly don't understand how wealth is created.
I remember when my National Geographic Kids magazine in 2005 or so predicted we’d have self-driving cars perfected (as well as color-changing clothes that we can tell our mirror to switch), but I don’t think those writers understood how woefully complex AI could be back then...
Futurists have been predicting that strong AI is only twenty years away for almost a century. :P
5:16 I'm sure someone else has pointed this out but that mix of RGB would be cyan/aqua, not yellow.
Sup fellow Perther! I live in the hills (Kalamunda) and am really fascinated by ur work. I hope to work at UWA in Chemistry one day and you are a real inspiration
"Sometimes a banana is just a banana, Anna"
-- Sigmund Freud
Great video, although I'm curious as to why you chose to use a genetic algorithm to train the network in your example. The typical training method is back-propagation, which works entirely differently. What was the reason for picking GA over backprop?
I was thinking the same thing, and was also surprised by there being no mention of deep learning in the context of image recognition. Also AFAIK the reason the adversarial patches did not work on the other nets is because adversarial images are tailored to a single neural net, not because engineers are constantly updating their nets to keep up with the latest batch of adversarial images. Both the vehicle and the example he made likely used both a different training algorithm and different data, which made the images not work on them.
I double checked the paper and apparently these attacks do generalize to an extent to unseen models, though it's not entirely clear from the paper under which circumstances they will/will not generalize well.
@@Vincent89297 The networks they would generalise to (if they do) would be the ones trying to detect the same type of objects. A self driving car is not going to be trained to recognize bananas, so wouldn't be fooled by an adversarial banana patch. Also: camera resolution, at 10m from the car, it is doubtful the resolution is good enough for a patch like that to work either way.
@@Frank01985 Right, I hadn't even considered that. Of course if a network does not have a toaster category then a toaster patch is going to do nothing...
Ha loved the "Dingley road" easter egg. Great video!
Fascinating video! Thank you. I knew nothing about this topic coming into the video and left feeling like I genuinely gained a broader understanding. Much appreciated, watch out for buses! :)
This video was very interesting, mostly because I live next to almost every shot in the video! Perth for the win!
1:35 Just gotta love the Swiss cheese building behind him
Great video. Now get some more coffee and do your lit review / finish your thesis.
On it!
Wait, Australia has (40% of 800,000 km) 320,000 km of paved road? The Netherlands has 140,000 km of paved road! And Australia is 185 times as massive. I know a lot of Australia is outback and stuff, but still, that is mind-blowing.
I don't think you understand the level of empty the outback is. There are single owner farms more than half the size of the Netherlands, and they're in the 'populated' areas.
@2:20 I work in the field of machine learning and computer vision and I have never heard this explanation for humans' big brains before. Will totally be starting every public speaking opportunity with that explanation going forward.
Welcome back Tom Scott
8:34 What you're describing is a genetic algorithm, which, while it could be applied to neural networks, I don't think is that common? Usually it's gradient descent, i.e., for each weight taking the partial derivative to determine if the output would be slightly more or less accurate if the given weight increased or decreased.
Batch processing of several such operations in parallel, and then combining the results in some way (taking the best, weighted average, etc), can be thought of a little bit like a genetic algorithm, though.
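A toy version of that weight-nudging idea, with a single made-up weight and error function, just to show the mechanics:

```python
def error(w):                     # stand-in for a network's loss
    return (w * 3.0 - 6.0) ** 2

w, lr, h = 0.0, 0.01, 1e-5
for _ in range(200):
    grad = (error(w + h) - error(w - h)) / (2 * h)  # numeric partial derivative
    w -= lr * grad                # step the weight toward lower error
print(w)                          # converges toward 2.0, where the error is minimal
```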
Wow, never heard of the bobtail before. Australia has some truly weird animals.
Just avoid the drop bears...
@@AtomicFrontier Yeah, avoid them when you're traveling to Australia.
Something about this gives off such a strong vibe of parody.
the accent
How do you not have more subs?! This channel is great
Okay, but I want to say that you gave the simplest and yet most understandable breakdown of neural networking I've ever heard, and I am extremely pleased by that
Absolutely fascinating. My monkey brain caught the "neutal" net typo (time code 6:35), like those banana identifiers. Then I thought you had planted it for fun, just so I couldn't sleep.
Our world needs amazing minds like yours to keep our roads safe. Also, please explain "drop bears", saw it on Bluey with my kids and still not sure wtf.
It’s an Australia thing.
I don’t know either.
Not real. It’s a running joke of sorts, meant to scare tourists
0:58 I only looked over at the cars for the first time at this point and my brain assumed they had deliberately orchestrated them to all be the same colour!
If you look at it carefully enough it actually does look like a psychedelic toaster
You should read about generative adversarial networks or GANs for short. It's a technique that aims to avoid overly simplistic criteria for identifying things by training not just the classifier network, but also an adversarial network that's being trained for the precise purpose of trying to fool the classifier and using that to train the classifier to not get fooled so easily. A lot of the more impressive advances in AIs seem to be done using GANs lately.
Also, it's worth noting that programmers working on AIs aren't working on patching individual errors, but rather looking for ways to improve the training process so that it's the AI that learns how to overcome them.
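For reference, one GAN training step boils down to something like this sketch in PyTorch. The `generator` and `discriminator` modules (the discriminator outputting a sigmoid probability per image) and the noise size are all assumptions:

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, real, noise_dim=64):
    batch = real.size(0)
    fake = generator(torch.randn(batch, noise_dim))

    # Train the discriminator: call real images 1, generated images 0.
    d_loss = (F.binary_cross_entropy(discriminator(real), torch.ones(batch, 1))
              + F.binary_cross_entropy(discriminator(fake.detach()),
                                       torch.zeros(batch, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train the generator: try to make the discriminator call its fakes real.
    g_loss = F.binary_cross_entropy(discriminator(fake), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```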
This was a really well made and informative video. There is one issue where the stop sign suddenly becomes a 45 speed limit sign, but other than that, it was great.
7:23 computer: this is clearly the letter A
Me, the product of billions of years of evolution:
amugus
"Hm, at first I thought this was a human, but it is pure banana."
Bobtail is a cute name for an animal, but knowing it is Austria, it would probably murder me in a wood chipper because I rang my bike bell too loud that one time 6 years ago.
AUSTRIA?!?!
@@macaroon_nuggets8008 yeah? Austria, land of kangaroos and bloodlusting magpies
@@Azivegu AHHHHHHHHHHHH
@@Azivegu Australia
@@HercadosP That is in the alps silly goose
4:50 The Hough transform does that easily. And I improved it to recognise any shape, so you can teach it to recognise lines, squares, or kangaroos...
It's as simple as a convolution filter.
If you find that the recognised object has a very good probability ratio, then just recompute your filter from the new image to adapt to its changes. This allows you to follow in 2D an object moving in 3D (including adapting to a tracking camera, or an item turning on itself).
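The commenter's custom method isn't published here, but the standard convolution-style version of this idea is template matching, which OpenCV ships out of the box. The filenames and the 0.8 acceptance threshold below are assumptions:

```python
import cv2

scene = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the scene and score the match at every position.
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, best, _, location = cv2.minMaxLoc(scores)

if best > 0.8:  # a "very good probability ratio": accept, then re-track
    print("object found at", location)
```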
Extremely high quality content. I felt as if I was watching something from 90’s/00’s Discovery Channel in 4K
The gravitas and credibility of Michio Kaku or Neil deGrasse Tyson in the body of an undergrad. Two thumbs up, sir! Your videos are excellent!
I'd be fine with self-driving cars - so long as, at the drop of a hat, I can manually take control of the car regardless of what the AI wants to do, at any time.
As long as your car didn't learn from playing GTA games you should be relatively fine.
Reminds me of the film Upgrade
I think for at least the foreseeable future I can see that being a law or rule. It's not the same as self-driving, but a lot of cars have cruise control, and even though your car is locked in at a set speed you can still turn it off in seconds and adjust the speed yourself too. At least that's how my car works. I'd imagine they would put similar things in a car even if the car could do 100% of the stuff itself
@@zeronpeat3407 I could see some self-driving restrictions being put in by the manufacturer. But if self-driving cars ever do go widespread, the way they work would probably be well known enough that there are certain things you can't do. Like I'd have to assume it would still be illegal to use a phone behind the wheel or to flat out fall asleep in your car.
If you mean FULLY self-driving then maybe, but by "self driving cars" most people are talking about cars that can drive themselves but still have a person behind the wheel. We're very far off from having thousands of cars on the road with no drivers in them whatsoever.
The government would be like "Naw, humans are too dumb and make too many errors, I am now making it a federal crime to get behind the wheel of a vehicle." I mean, they are essentially pushing us towards that anyway. We will literally be slaves to AI in the near future, because the average Joe and Jill don't understand how cheating daily activities will always have consequences. Letting your car drive you to work in the morning could get you killed or rack up nasty traffic violations. It's like forgetting to brush your teeth: you get cavities. It's like that show Upload where the guy dies in a car accident because the self-driving car has serious AI issues.
"neutal" network at 6:33
Came here to comment that. Glad someone was already on it. :p
I’m waiting to see this guy on Science Channel or Discovery commentating or hosting. Love the vids!
4:00 Speaking of international recognition of signs, I recognize that "today's fire danger level" sign in the background. Greetings from Utah, US. Keep up the good work.