Separable Filters and a Bauble - Computerphile
Вставка
- Опубліковано 1 сер 2024
- How do image processing apps and realtime applications apply effects so quickly? Dr Mike Pound decides to blur his Christmas Tree...
Mike's code: GitHub.com/mikepound/convolve
EXTRA BITS: • EXTRA BITS: More on Se...
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Why not just buy a blurred xmas tree to start with?
Don't tell the CS majors that you can translate python to C
I need the Fast Fourier transform video in my life!
I've seen this done with a discrete cosine transform (which I believe is a special case of a Fourier transform). It was actually pretty neat, and SUPER fast.
AFAIK convolutions become multiplications in frequency domain. so you take the FT of the image and the FT of the kernel, multiply them together and then take the inverse fourier transform of the image
@@calaphos Essentially yes. You would need some padding and cropping to not get cyclic behavior (left side blurring onto the right side and vice versa, and same for up/down).
I'm no expert for the FFT, but it's "just" an algorithm for the discrete Fourier transform, originally only for cases where you dealt with N=2^n values, that is more efficient than the classic approach. The classic DFT algorithm requires a number of steps proportional to N^2 whereas the FFT requires only a number proportional to N log N. This is the same proportional difference, for example, between the average timings of bubble sort and quick sort and does make a huge difference for large N.
Also, the Fourier transform is incredibly useful, image processing has already been mentioned, but also e.g. digital audio compression, statistics, cryptography, number theory to name but a few.
"It just looks like my camerawork" :D
Self deprecation.. a fine form of humor ;)
I just wish he'd actually do something about it.
@@Anvilshock His camerawork is fine.
@@General12th No, it's idiotic, hyperactive, nauseating garbage.
I could listen to him pouring the knowledge on my face for hours
A video with Mike breaking down some Machine Learning terms? "What's Machine Learning VS. Deep learning? AI? Etc."
Please make the FFT convolution video also!
32 butterflies liked your comment! :)
It would be a short video. After a FT, a convolution is just multiplying. Taking the DFT(discrete version of FT) takes just as long as a single filter the size of the full image, but you can then do as many convolutions(of any size) as you want. FFT is closely related to jpeg compression and you see the same types of artifacting, but it is fast and allows for unlimited conversations like a DFT.
TLDR, that is more a topic for numberphile. The convolution is so easy in the frequency/spatial domain that it is trivial, but the reason why it works is very math heavy.
I'd like to point out that gaussian blur still needs to sample all pixels in that horizontal and vertical window but box blur doesn't. Box blur can be simplified to a pair finite input response filters that only take 2 samples per pixel each. A common trick in image processing software is then to use a series of box blurs instead of a true gaussian. The result you get is very similar, however the distribution is not a gaussian one, but a binomial one. Once you get enough box blurs (say, three or more), this distribution is similar enough to a gaussian one that it's hard to tell the difference.
ahhh I see Mike also still uses Windows Photo Viewer on windows 10 instead of whatever garbage photo viewer they include nowadays.
on my machine, the Windows 10 photo viewer takes up over 600MB of RAM just while viewing ordinary jpegs...
Candy Crush Photo Viewer
it's because the new photo viewer is a "Universal Windows Platform" app which really means it's written in JavaScript and has to load a web browser to view your photo LOL
take a look at the program folder, a huge amount of files and dlls, remember that it is necessary to configure the programs folder and give access to you (user) to be able to see the files contained, a lot of unnecessary. I'm almost done with my 15mb photo viewing program that supports the formats written by me .tga, .pcx, net pbm and some other .jpg, .png, .tiff, .hdr, .psd, giff libraries. ..
This is the sort of video I enjoy the most
"I wasn't really interested in how fast it was -" I assumed that was a given with Python.
thats so insightful, you must be python expert!
Just a friendly reminder that pypy exist.
Anyone who complains about python being slow either has never used python or has tried a really naive implementation of a solution to an expensive problem in pure python instead of using standard python tools like numpy or pandas which are implemented in C.
You can see in this video that he's using numpy.
Python is fast if you use it correctly.
@ebulating was your python implementation using numpy or pandas?
@@jackeown "Python is fast if you use tools that are not in Python"
I was about to ask about Fourier Transforms because a convolution in time is multiplication in frequency
This helped me implement a separable sobel filter thank you.
Excellent code, very clean.
Absolutely beautiful!
For image processing, although separable filters offer a huge speed benefit, they typically also have something of a penalty in image quality. If you use horizontal and vertical separable filters to sharpen, any diagonal detail in the image will get sharpened twice. Likewise, for blurring, any diagonal detail in the image will get blurred twice. This tradeoff should be taken into consideration. Usually though, the speed benefit is more important.
Is it possible to have a talk about text localization in images? dr.Pound is really great when its come to explain things.
I thought your area of interest was mostly security. Nice to see you have knowledge of my area of work too :) Separable filters are drastically important for post-processing in games.
really enjoyed this. please more high level stuff like video analysis, maybe also more ML.. love u
Dude your camera work is fine
Mr pound could talk about paint drying and I would still like the video !
7:58 missed opportunity to stop the counter at 3:14:159, it went up to 3:14:190 :(
Great video!
I am a simple man. I see Mike, I click.
You can approximate any filter with the outer product of the first singular vectors (multiplied by the first singular value if you don't want scaling). A better approximation can be made using just the first few singular vectors (and values). Maybe you can do a video on the singular value decomposition. Or one on filters in the frequency domain.
Some GUIs contain transcluent & blurry interfaces, the blurry effect is done by scaling the background image down and up again.
examples: Windows Aero(Windows Vista & 7), IOS, Ubuntu Unity, etc...
is there any way to classify which filters are suitable for the singular value decomposition?
3:40 I was about to post a comment about using the Fourier transform to perform the convolution. (I kinda have an obsession with them.)
Computerphile Christmas Drinking Game: take a shot every time Mike says "really"!
Dr Mike Pound, not all heroes wear capes.
Love the print paper. Are you still using dot matrix printers or is the paper left over from a great deal that your purchase department did in '92?
How do you decide for a standard deviation of x you have to use the kernel size of y, how to you decide y here ?
I think it would be interesting to go over the maths of why the filter can be separated into horizontal and vertical parts. Or if anyone has a paper on it to link please :)
Can you suggest a book from where I can learn more about these things in detail?
i wonder if my numpy-based (using some cython code) convolver would be faster than the first one. i’ve spent a long time optimizing it.
Oh, I see! I've been doing it the slow way! Thanks!
This is great, could be used as similar to checkerboard rendering for video game performance.
Very useful. Thanks.
Did that clock stop at 3m 14s 15fr?
Did you write your code cache friendly? Like going in x direction in the inner loop and y in the outer?
Love your channel!
Could you do a video on how game protection works; what cracking a game looks like, tricks used by game studios and some historical highlights?
in what cases is it possible to seperate the kernel?
So what about the Fast Fourier transform method?
I'd love a video on convulsions based on FFT!
Why does increasing the standard deviation change the runtime? If the filter size is constant in both the cases, only the values inside the filters will change, why does it end up in increasing the runtime?
At 5:28, Mike promised us a source code link in the description. I can't find it.
Ah sorry, bear with me >Sean
Waiting for that code too! Come on Mike >:(
Chill it's just 30 minutes since the video :P
@@DroidFreak32 well, i assume it has been edited for some time between filming and posting.
even if that's not the case, he had 5 minutes already
not Mike's fault I just forgot to paste the link in :) Sean
Probably should be pointed out that the only circularly symmetric 2D filters that are separable into two 1D filters is the gaussian. This means that circularly symmetric filters like high pass filtering cannot be done by such decompositions.
Would be nice to provide (or explain; I can google it if I just want it) the Algorithm, how to split a 2D-Matrix intwo 2 1D-Vectors for doing these convolutions. And probalby explain which Matrices are seperable, and which are not :>
please do the fast fourier transforms too!
What about the pixels at the boundary, how does the convolution deal with the adjacent “pixels” outside of the image boundary?
It depends om what you want it to do. You can ignore those pixels and get a smaller output image, you can pad the edges with zeroes which will result in a dark edge around it, or you can pad the edges by repeating information in your input. This could be repeating the edge pixel, mirroring pixels around the edge or wrapping the opposite side of the image around.
how to tell if a 2D filter is separable?
I see Mike uses a thinkpad
I own several, they are one of the most comfortable laptop keyboards and they look nice with their classic black monolithic design
That number should be multiplied by 3, because RGB.
The video won't even load for me. Interesting. Makes me wonder, with the other quality comments, if the video got corrupted on upload.
I had to manually select 360p for the video to load.
There's also a good IIR filter for gaussian blurs.
Can you do a video on QUADTREEs
Since we are talking about speed to run the code, wouldn't you also gain some speed between the first and second passes if you rotated the image 90 degrees and did two horizontal passes (one before and one after)? This would allow the CPU cache to better predict what data you need loaded, thus saving you a lot of cache miss RAM fetches (when reading previous and next rows of data).
image rotations arent free.
@@styleisaweapon In this case a rotation would be pretty much free compared to doing a convolution, especially if your kernel is decently sized.
@@alcesmir Saying it doesnt make it true. A rotation forces two different access patterns, one with short stride, one with long. Doesnt matter which one is the read and which one is the write. Stop trying to get a free lunch by hiding the work in an abstraction trick. The abstraction also has costs.
cache misses have costs no matter how you try to disguise them or hide them or pretend they arent there - the proof is that you are trying to solve the cache miss problem - by doubling the number of them
@@styleisaweapon My point is not about reducing cache misses. My point is that doing the rotation O(width*height) is cheap compared to the convolution step O(width*height*kernel size), especially for large kernels.
I agree doing the rotation is pretty useless in this case, but the argument is that you're just introducing (potentially more) cache misses somewhere else, not that image rotation is massively expensive.
“We need a computer” (at 4:55) - not if you really want to show the performance difference! Though I wouldn’t recommend trying to compute the 16 megapixel image with pen and paper...
A video on Fast Fourier Transform would be useful. I can't get my head around it
This video is blurred...
Wait a few hours
EDIT: Oh I got the joke
Is this supposed to be funny?
360p tho
10:20
depending on how you view it, it's either interpolated or downscaled #nerdoff
Does not seem that you guys posted the code!
Only 360p?
And no speed options
watching it in 360 because I can't wait
@@sn0opyKS but the "Extra Bits" video has HD as an option already
Maybe a high-quality low-quality joke?
sounds like my camerawork
Where is the actual convolution code in C++
O(n). n increased from 60(8sec) to 271 (×4.5). Gusses: 18-21 seconds -surely trolling :)
time it took: 271/60*8sec=36sec. right on. Great video btw.
PLEASE DO THE FFT VIDEO
WOW at 8:50 there are major spoilers on the screen. :D
Thank you for the github link... (just as I've migrated to gitlab.) Wasn't able to add an issue, so
python3 run.py --no_separable_filters --sigma 3.0 ./christmas.jpg ./christmas.out.jpg
Traceback (most recent call last):
File "run.py", line 84, in
main()
File "run.py", line 61, in main
compute.convolve(img, output, kernel_2d)
File "compute.pyx", line 7, in compute.convolve (compute.c:1705)
def convolve(float[:, :, ::1] image, float[:, :, ::1] output, float[:, ::1] kernel):
ValueError: Buffer dtype mismatch, expected 'float' but got 'double'
Makefile:10: recipe for target 'run-fast' failed
make: *** [run-fast] Error 1
That camera on the tripod, is that a Sony?
Panasonic LX100 >Sean
@@Computerphile
Looks quite similar to the Sony Alpha 6000 series.
Thanks.
instant braingasm
bounds checks in C?
Python can be plenty fast, but writing fast Python usually isn't emphasized. There's a couple great articles from a guy at IBM online where he takes a straightforward Mandelbrot implementation that took several seconds to run and got it to running in nanoseconds. The code was still pretty readable, but you certainly have to do it intentionally. I'd expect first step for anything like this is you need to take advantage of numpys vectorized operations. Were you looping through the indices manually? You can use something called 'stride tricks' to do very fast sliding 2D windows over large matrices with numpy and keep operations vectorized which makes a big difference on modern CPUs.
Python is inded fast enough for 99.9% of use cases I've found. In this case, you need a bit of a workaround. It kind of depends on what you mean by python being fast here. The Numpy approach would be fine if you vectorised it all, but in some sense by fully vectorised what that means in practice is the heavy lifting is done almost entirely in C behind the scenes. This isn't a complaint about python, it's just how it's designed.
I tried a partial vectorisation, but I left it at that because using a full vectorised approach (or indeed the convolve function in numpy) would have somewhat defeated the object of the demo. Similarly Opencv just uses an FFT, which again wouldn't show what I was trying to do!
Oh come on, Python is inherently *extremely* slow, and also constrained by its infamous GIL (global interpreter lock). There are indeed many things you can do to improve performance, from cython to pypy (I love pypy myself!) but pure Python is actually much, much slower than C, Java, Julia, Haskell etc.
What helps is that part of its "batteries included" library (BLAS routines etc.) is actually written in very fast C, but if you have to write preformant code yourself you should be ready to use its FFI...
(and then it's not Python anymore)
Irony with it being at 360p, but also bad because the slight blur is unseeable to me! 2n > n^2
n=1?
places Portuguese subtitles
eggnog_latte is now my password
Your server is called deadpool?
Maybe watch the "beast gpu cluster" video >Sean
Couldn't we create a copy of the original images rotated by 90º avoiding a vertical pass and use it also the horizontal filter? If I recall that would be cache friendly and be orders of magnitude faster(assuming that could be performed). I remember that for Doom wall rendering does that for source images.
Good old bilinear filtering.
The trick is in the name. XD
Paint .NET, wooo...
photopea
Why only 360p?
why we dont use the same method to do convolutions in CNN? It would make CNN faster.
It took pi minutes: seconds. Coincidence?
Hi
what the heck is a bauble?
360p?
My boy Mike went up a level in the last video with his favourite language being c# :-D
This is the reason I don't like python. That wouldn't have been hard at all to do in C, but ofc he compiled the python or whatever instead of doing it.
Interpreted languages… for when you DO have all day!
What *is* a scripting language? (That's a real question.)
If you want to define that with respect to the way it's executed (with an interpreter, with a compiler to machine code, or with a compiler to byte code), that's actually incorrect. And actually, the notion of "compiled language" or "interpreted language" doesn't really make sens. There are C interpreters and and there are python compilers.
And neither the language itself nor the way it's executed decide its performance. Compilers have all the time in the world (kind of) to analyze the code and generate the best possible machine code. Interpreters (and virtual machines) can collect statistics and perform the optimizations on the fly based on the actual distribution of values that your program runs on. There's no clear winner as of now.
@@Ceelvain obviously by "scripted" he actually means "interpreted." One of those situations where the person knows just enough to think themselves well versed while actually tricking them into being publicly misinformative.
Look who lost some weight ?
First
You have fast, pretty, cheap options.
Choose TWO.
Or wait for all THREE options.
And wait.
Times up. New games are better games.
360 for 2 hours, then suddenly 4K (odd)
Not odd at all, very typical actually
In a quantum computer We call a quantum network image "blur" a more APPropriate term:
a "bLend".
The 5D (and thus 5G) quantum super-encoders are called "bLenders".
bLenders are fed into the quantum bRoadcasts of the Neuronet, the Matrix made up of all IOTs (Internet-of-things).
A period of phases of IOTs is often called a nNOTt in the near future.
What does "nNOTt" mean?
¿Neural-Net-of-things/thinks.
why not just add image.blur = true;