Dr Mike Pound, you are an excellent teacher. Please opt in for more Computerphile videos! Big fan
I'm a simple man, I see Mike Pound, I pound that like button.
OwenMc1992 what a terrible pun. But then again, I'm a simple man: I see a simple pun, I punch the phumbs up
@@userou-ig1ze You are very Punning
@@ther701 YUCK
Dr Mike Pound is always my favourite. I'll be waiting for the follow up!
When I first watched the neural-network vids on computerphile, I didn't know what a neural network was, much less a CNN. Now, I've had to learn so much machine learning for my job that I know exactly what the next video is going to contain. Won't stop me watching it though
Pound for pound, one of the best teachers of deep learning
I’m a simple man. I see Dr. Pound, I like & watch.
Love this guy. It would be an honor to be taught by him.
Pretty sure I drove past Mike Pound on the Derby ring road. I couldn't believe I saw such a celebrity, where I live!
Yay, he's my favorite one of the usual people.
smooth man, smooth. In internetz speak: Much subtle! Such smooth.
1:54 When I first looked into CNNs I couldn't understand why applying 32 filters to a 3-colour-channel image would not result in 32 * 3 convolved maps but rather 32. That "hidden dimension" explains a lot of things, thanks.
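In case it helps anyone else stuck on the same thing, here's a minimal PyTorch sketch (my own illustration, not from the video): each of the 32 filters carries weights for all 3 input channels, but sums across them to produce a single output map.

```python
import torch
import torch.nn as nn

# 32 filters over a 3-channel image: each filter spans ALL input channels...
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
print(conv.weight.shape)  # torch.Size([32, 3, 3, 3])

# ...and sums across them, so you get 32 feature maps, not 32 * 3.
img = torch.randn(1, 3, 128, 128)  # (batch, channels, height, width)
print(conv(img).shape)    # torch.Size([1, 32, 128, 128])
```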
Dr Mike Pound, if you are reading this, PLEASE, we need and demand more content featuring your explanations. Please come on Computerphile at least monthly, and talk about the weather, I don't care. Anything privacy or security related will be fine. Just come on our screens more.
Confused. If you take off the neural net when/where's the learning done?
the convolutional layers are also part of the neural net and they are being trained
He meant taking out the last fully connected layer that does the actual categorization.
Simon Johansson bump. The author makes it sound as if convolutional layers are not trained and simply transform the data into some high-dimensional state space, almost like liquid state machines. This is, to my knowledge, mostly wrong (probably a misunderstanding); the convolutional layers are trained as well.
AFAIK: in this CNN, the correct label for training is no longer a number (class) but something like a multidimensional feature vector. During training, the network learns how to map one vector to another, so the inaccuracy of the mapping can be computed from the difference between the output vector and the correct, ground-truth vector.
The kernel that does a convolution is just another "neuron". The convolutional bit comes in because it is only connected to a few pixels/neurons in the previous layer(s), rather than the whole layer
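To make that local-connectivity point concrete, a tiny NumPy sketch (toy numbers I made up): one "neuron" whose weights are a 3x3 kernel, connected only to a 3x3 patch of the image rather than the whole thing.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # a toy grayscale image
w = rng.random((3, 3))      # the kernel: this "neuron's" weights
b = 0.1                     # a bias, like any other neuron

# One activation: the neuron only sees the 3x3 patch at row 2, col 2,
# not the whole image. Sliding the same weights everywhere = convolution.
patch = image[2:5, 2:5]
activation = np.sum(patch * w) + b
print(activation)
```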
One of the guys I most look forward to watching is Dr Mike Pound.
So is this why CAPTCHA uses these photographs divided into sets of squares, and you gotta pick which square contains a road sign or something? Because it's compared to the low resolution output of the CNN?
My favorite guy from Computerphile talking about my favorite subject in computer science! Awesome
Don't forget to make that next video!
THIS IS THE VIDEO I'VE BEEN WAITING FOR! I love all the guys on this channel but Mike Pound's content is super. Any chance he's looking for students for research? ;)
Was thinking the same thing! Unfortunately, my university does not exchange with Nottingham currently. Now I'm sad :(
We're always looking for students! Check out the Nottingham CS and Computer Vision Lab websites for opportunities.
I was always interested in Dr Pound's videos, but I never understood them fully. Now, having passed some courses by Andrew Ng, it is much clearer because of the technical knowledge I now have. It is so good to see that now everything makes sense. By the way, it would be great if you could make some videos with Andrew.
Coursera?
Yeah. The free Machine Learning course, and then the deeplearning.ai courses.
I prefer Dr Kilo
My favourite scientist on this channel
A companion video with a simplified version made in Keras would be helpful
Can we have this applied to Where's Wally? Basically a frivolous waste of time, but perhaps an interesting example.
Haha! Love the idea!
Instead of looking for him yourself, you'll write a CNN to find him for you!
If you code an already trained network for android, it would make for a funny app.
New to CNNs: does each kernel produce only one feature map out of the three channels, or is the feature output also in RGB?
Nice! The return of Mike #
£
Frixion pens? Love them
Dr Mike Pound, can you talk about how kernels work, please?
Awesome! Brilliant! Marvellous! :D I love him and his style. I wish every teacher was like you and I wish I was your student.
No yellow on white? I think Prof Ed said something similar
merqyuri Nah, it was something told to him by Prof Tom Kibble.
How does the fully convolutional network train? Without a NN at the end, what is actually getting trained here? How could you train a convolution?
I like how this guy explains things
I want to know why captions are disabled for Computerphile.
Please post more such videos. Easy to understand concept with animation. Thank you
Absolutely love these videos, especially the ones with Mike, but I’m still not exactly sure I follow the whole “tip the picture on its side and scan like that” bit
Are you just scanning the top row of pixels?
Or scanning the picture row by row from the top? Or...
Can someone explain (in simple terms) why the image needs downsampling to learn stuff from it?
You are such a great teacher. Thank you for your videos!
Could someone please link the follow up video here?
How do you prepare the data for such a network?! I can't understand how it manages to learn if we don't provide the output "heat map" for a given input, or how we prepare a heat map for a given input if the network in fact needs one.
Several ways are possible. The most common method is to train the network with fully-connected layers at the end; after you are done, you can convert them to convolutions (by that I mean you use the weights from the fully-connected layers as filter coefficients for the convolution). Or you can directly train the network with a convolutional output, but in that case you will need annotations not only of what can be seen in the image but also of where it can be seen.
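A minimal PyTorch sketch of that first trick, with made-up sizes (a final FC layer that sees a 256x7x7 feature map and outputs 10 classes); it illustrates the idea, not the exact network from the video:

```python
import torch
import torch.nn as nn

# An FC layer that classifies a flattened 256x7x7 feature map...
fc = nn.Linear(256 * 7 * 7, 10)

# ...is equivalent to a 7x7 convolution with 10 output channels,
# using the FC weights as the filter coefficients.
conv = nn.Conv2d(256, 10, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(10, 256, 7, 7))
    conv.bias.copy_(fc.bias)

feat = torch.randn(1, 256, 7, 7)
out_fc = fc(feat.flatten(1))  # shape (1, 10)
out_conv = conv(feat)         # shape (1, 10, 1, 1)
print(torch.allclose(out_fc, out_conv.flatten(1), atol=1e-5))  # True

# On a bigger input the conv version still works, and the output becomes
# a low-resolution "heat map" of class scores instead of a single vector.
print(conv(torch.randn(1, 256, 15, 15)).shape)  # torch.Size([1, 10, 9, 9])
```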
Thanks for the answer.
More videos on DL or ML please
keep on the amazing work guys! Thanks for the video!
How do you backpropagate here?
Color is a little weird this time
Looking forward to next part
I wonder why some of the *phile videos don't have automatic subtitles. Maybe somebody forgot to set the language of the video?
Man: Left-handed.
Computerphile: Let's put the camera to his left
gorgeous animation!
Did Tom leave the channel?
I strive to be like him
another great vlog! btw, what's the name of those marker pens?
Think they're called frixion pens - bought them cause I hoped they'd be quieter... >Sean
What is a convolution?
Try here ua-cam.com/video/py5byOOHZM8/v-deo.html
I want to watch YouTube videos, but I want to replace everybody else's voice with my own; that way I can learn faster and watch the video faster. Is there some sort of plug-in, or a modified YouTube APK, where I can put my digitally copied voice on top of captions or something? What I'm asking for is an AI to replace the in-video voice with my own, because since a person is used to their own voice, they can understand themselves better than having to listen to somebody else. That way I'd be able to learn this faster, instead of trying to understand his thick English accent or anyone else who speaks non-American English.
what a cliffhanger
That one got a cat and dog in it, that is very exciting! 😂
Spotted the reMarkable on the desk!
Convolutional Neural Networks, the kind of CNN you CAN learn something from.
4:25 Or "how is the cat"
I enjoy your videos
We missed you! :D
Please make a channel on AI; maybe name it Intelliphile, speaking explicitly on ML and DL.
Mike! Finally!
♥ Mike Pound
What does the convolution actually do to the photo? I googled some images and saw that it takes the sum of the pixels surrounding the main pixel, multiplied by the filter pixels (not explained very well), but I don't see the value of this. And in the end it just searches for pixel patterns that could be a cat?
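For what it's worth, here's a toy NumPy sketch of exactly that weighted-sum operation (strictly speaking, deep-learning libraries compute cross-correlation, but everyone calls it convolution). The kernel below is a standard vertical-edge detector, chosen just as an illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each output pixel is the
    weighted sum of the patch under the kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A Sobel-style kernel: big response where brightness changes left-to-right.
edge = np.array([[-1., 0., 1.],
                 [-2., 0., 2.],
                 [-1., 0., 1.]])

img = np.zeros((5, 5))
img[:, 3:] = 1.0  # dark left half, bright right half
print(convolve2d(img, edge))  # large values exactly along the edge
```

The value is that a trained CNN learns stacks of kernels like this: early ones respond to edges and blobs, later ones to combinations of those, until some units effectively respond to "cat-like" patterns.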
I've been watching the Rubik's cubes in Pound's office for a while now, and they're starting to get out of hand.
Well done!
Ahh the cliffhanger on Unets and other general semantic segmenters
Does this work on SVG or vector graphics? This has lots of opportunities.
Could you get errors if you merge a dog and a cat?
In the future you'll get catdog
1. Since the input space of convolutional NNs is "raster-like", the short answer is no, it would not work on vector images. The long answer depends on what you want your neural network to accomplish.
2. Output of these kinds of NN is almost always just a probability distribution of what the network "thinks" is in the image, so in an ideal case it would be just 50/50.
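A quick NumPy illustration of that second point (toy logits I made up): the softmax at the end of a classifier turns raw scores into a probability distribution, so equal evidence for cat and dog comes out roughly 50/50.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([3.0, 3.0, -2.0])  # cat, dog, background: equal cat/dog evidence
print(softmax(logits))               # ~[0.498, 0.498, 0.003]
```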
thevoodooninja it could theoretically work on vector images if you get the network to look for points rather than individual pixels and join them together; the more fascinating thing would be polygons.
I totally understood everything.
:P me too
It was rather easy for people who know the matter, but not very well explained for those who don't. But that's mostly because neural networks are not easy to understand intuitively.
Mike Pound is back! Yasssssssss!
I see a white ghost on the shelf.
what?
Peter Bočan a ghost cube on the shelf
ayee
+ for the (maybe involuntary?) Japanese reference
'It's gonna take a while cause the rubber is tiny' xD
Jarvis when?
But seriously, when can I "train" an AI/assistant via imitation and commands? Say "turn off speakers" or "load up my email", but trained to just work my PC? Currently Google/Siri/Alexa/Cortana are stuck to the OS they are programmed on; they don't "see" the PC screen/apps/systems as I do. :(
To process audio you would normally use an RNN instead of a CNN, and then you would wire the output of the network to the commands that do the things you want the assistant to do. Then you either find a dataset online or make one yourself by saying the commands and selecting the correct output for each of them.
Unless you mean you want an assistant that literally looks at the screen and operates the mouse and keyboard on any program like an actual person. That would be basically a general intelligence, I think. OpenAI is working on something similar to that, so I guess start there. And talk to Robert Miles about making sure your AI doesn't destroy the universe.
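If it helps, here's a toy PyTorch sketch of the audio half of that idea, with everything made up (MFCC-style input features, four imaginary commands); it's just the shape of the approach, not a working assistant:

```python
import torch
import torch.nn as nn

class CommandRNN(nn.Module):
    """Tiny GRU classifier: audio feature frames in, command scores out."""
    def __init__(self, n_features=13, n_commands=4):
        super().__init__()
        self.gru = nn.GRU(n_features, 64, batch_first=True)
        self.head = nn.Linear(64, n_commands)

    def forward(self, x):        # x: (batch, time, n_features)
        _, h = self.gru(x)       # h: final hidden state, (1, batch, 64)
        return self.head(h[-1])  # logits over commands

model = CommandRNN()
clip = torch.randn(1, 100, 13)  # 100 frames of 13 dummy MFCC features
print(model(clip).shape)        # torch.Size([1, 4])
# You'd train this on recordings of yourself saying each command,
# then wire each predicted command to a script that performs the action.
```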
Yeah... kinda that. I've been using VoiceAttack to set up simple commands with mouse and keyboard actions. It can do simple things like switch tasks/open windows/apps. But of course, it cannot "see" like we do. So yeah, something mixing imitation (to avoid the need for the user to know/input commands), as some robotics AI systems are currently doing, plus simple Siri/Google voice recognition. "I am clicking Google", for example; then when I say "click Google", it would look for a similar image (learn to look for words or logos, I guess).
Some of the apps are there, text recognition, image recognition. Most of those though, are specialist AI, and not "general AI".... and I guess I want the impossible one! :D
[edit] Oh, my fail safe to my AI, is letting it know it's an AI... disaster averted. ;)
Well, an AI doesn't care what it is. It just wants to complete its task as efficiently as possible, to the detriment of anything else. If you train it on the real world, it might try to aggressively optimize some aspect of the real world, regardless of what it needs to destroy in order to do so. This is actually an open problem in AI research. And we should probably work it out soon, before *someone* creates an AI that is too smart to be stopped.
But that is the point. If we let them know they are AI, then the danger is less, as the "solutions" they have available change. :)
Mike is so cute and smart I would love him to Pound me.
I wish Mike was my teacher
This wasn't very well explained IMO; probably only people versed in computer science and deep learning would understand.
I agree. Did he even explain what "looking at the image from the top" means?
“Alright it’s working but it’s just going to take a while because this rubber’s tiny” XD
Is this how the CAPTCHA works when you have to "select all the boxes in the image that display a [object]"?
Captcha actually is mainly used to train these networks. Because almost all users will choose (roughly) the same correct boxes Google can just take these inputs and use them as valid training data. So basically Google uses it for two things at the same time.
The older text-based captcha also had another purpose: One of the two words to be typed in was a random word from a book that Google scanned for Google Books. That way they got internet users to convert all their book scans into digital texts.
Huge rant incoming.
That's why I dislike captchas other than "click here to prove you're human". Why should I do work for Google if the only thing I am trying to do is log in with the correct email and password? And the sites that use these annoying captchas are paid by Google, so none of the parties doing the actual work is being paid.
There are a few things they could do to make their information farming a bit more moral. They could hire a cheap workforce in third-world countries, but probably the simplest fix of all: make it possible to opt out, meaning let the user choose other means of verification that aren't so user-unfriendly. I don't want site operators deciding what my time and brain activity are worth.
"They could hire cheap workforce..."
Lol, are you serious? You want Google to hire people just because you don't want to do 3-4 clicks more in order to use many websites for free? Laziness level over 9000.
Hiring people for work is a strange concept? In that case, there is really no point in this conversation; take your baits where people might appreciate them.
What is the network he is referring to at the end? I want to do some extra research on it, since it kinda solves a problem I'm working on.
Would this be a way of doing the "not a robot" captchas? Or would you just do something a lot simpler?
woohooo! a Mike's video!!!
Why is no one talking about the fact that he erased permanent marker
I’m lost.
This guy is instant like.
This sounds similar to Geoff H.'s capsule network implementation with dynamic routing and encoder error function.
Ah, now I got it. Deep learning means, doing whatever and hoping something useful will come out.
Why is it so difficult?
The computer has to test millions of neural connections to see which ones produce the correct answer. It's sort of like evolution, but much faster. It's only practical at all because of new computer architectures.
More specifically, the computer uses an algorithm, such as gradient descent. For this, it needs to calculate the partial derivatives with respect to each weight value, which is an extremely CPU/GPU intensive task.
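A minimal NumPy illustration of gradient descent on a single weight (toy data of my own, nothing to do with the video's network): the partial derivative tells you which way to nudge the weight to reduce the error.

```python
import numpy as np

# Toy problem: fit y = w * x; the true w is 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w, lr = 0.0, 0.05
for _ in range(100):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)  # dL/dw for L = mean((pred - y)**2)
    w -= lr * grad                      # step downhill
print(round(w, 3))  # ~2.0
```

A real CNN does exactly this, but for millions of weights at once; that's the workload GPUs accelerate.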
IP UNIVERSITY ETCS-301 These methods have been theorised for decades, but it was only a decade ago, when Nvidia started making SIMD (single instruction, multiple data) co-processors for simultaneous pixel rendering (intended for gaming) - GPUs - that computer scientists realised they could use these co-processors for NNs.
I think it's important to mention, for true historical record-keeping, that crypto mining poured a huge amount of money into GPU development; it's a big part of why this field is 10 years ahead of schedule. Think about it: a gamer buys maybe one mid-range gaming GPU he can afford, and you sell perhaps 20 of those a month. A miner back in the day comes into your shop, buys the 10 top-of-the-line GPUs, and asks you to order 15 of a model you don't even carry because it was too expensive and you'd never sell it. He comes back complaining that they aren't powerful enough and take too much electricity, and while he's complaining he buys out your store again. So you ask: what are you using these for? He says, don't worry about it. You figure he's a supplier for the NSA, or a hacker taking down a bank; best not to know. The guy taking the orders at Nvidia is getting orders like this from around the world, and with the new-found money comes development of GPUs for more than simple gaming needs.
Fast forward to now: an AI programmer comes in and orders a bunch of GPUs. You say, you must be one of those crypto miners. The guy says, pfft, those guys are driving up the price of my GPUs. You say, well, if you aren't using them for mining, what are you using them for? He says: don't worry about it. History repeats. I wonder what unintended market advances AI will lead to. Clearing up the noise in quantum design, I think.
but why so difficult man?
This dude doesn't work in the private sector, right? What do these kinds of people do?
He's a lecturer at the University of Nottingham. Most of the people on this channel are.
Sam Booth doubt!
Never in doubt. Always self-assured.
Sam Booth a quote by many 'smart' people
Oh that's pretty cool! He looks so young, it didn't even cross my mind that he does lectures.
I used to get so excited to see your videos' titles, but honestly I'm pretty frustrated now that you guys almost never talk about any practical details. You're not really teaching anything that a single Google image search couldn't. I guess I'll look somewhere else for the details if I actually want to do something with them.
wow an erasable marker :O first time seeing it for me
Rolling in the deep feat. Dr Mike Pound
What is the difference between this and a neural network in a human body?
the difference is we understand what the computer does ;)
Don't write in yellow :)
"This one's got a cat in it, or this one's got a dog in it, or this one's got a cat AND a dog in it, and that's very exciting." - Dr Mike Pound
which industry will be more relevant in the next ten years - AI or Blockchain?
That ghost cube in the background......
Cliffhanger!
Who edited this video? The color grading is terrible.
Why does it seem like he's sitting in front of a green screen?
WO WO WO THAT GUY JUST ERASED THE FREAKIN PEN BRO! ABSOLUTE MADMAN EVERYBODY, HERE IS THE ABSOLUTE MADMAN!
He’s so cute