Agner Fog's VCL 2: Performance Programming using Vector Class Library
- Published 10 May 2020
- Support What's a Creel? on Patreon: / whatsacreel
Official merch store: whats-a-creel-3.creator-sprin...
FaceBook: / whatsacreel
In this video we'll explore some more advanced algorithms using Agner Fog's Vector Class Library. These include graphical examples (fractals and emulating HDR, High Dynamic Range) and a small teaser for an upcoming project: a 64-bits-per-channel, non-destructive image editor.
Agner Fog's GitHub with the latest source for VCL:
github.com/vectorclass
Software used to make this vid:
Visual Studio 2019 Community:
www.visualstudio.com/downloads/
Blender:
www.blender.org/
Audacity:
www.audacityteam.org/
OBS:
obsproject.com/
DaVinci Resolve 16:
www.blackmagicdesign.com/prod...
OpenOffice:
www.openoffice.org/
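The description mentions fractals. As a rough illustration only (a scalar sketch, not the code from the video), a Mandelbrot escape-time loop looks like this. A VCL version would instead evaluate several pixels at once, e.g. eight floats in a 256-bit `Vec8f`, and mask off lanes that have already escaped. The function name `mandelbrotIterations` is hypothetical.

```cpp
// Escape-time iteration for the Mandelbrot set (scalar sketch).
// Returns the iteration at which |z| first exceeds 2, or maxIter if the
// point never escaped (treated as "inside" the set).
int mandelbrotIterations(double cr, double ci, int maxIter) {
    double zr = 0.0, zi = 0.0;
    for (int i = 0; i < maxIter; ++i) {
        // Compare |z|^2 against 4 to avoid a square root.
        if (zr * zr + zi * zi > 4.0) {
            return i;
        }
        // z = z^2 + c, split into real and imaginary parts.
        double newZr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = newZr;
    }
    return maxIter;
}
```

The iteration count per pixel is what usually gets mapped to a colour; points such as c = 0 or c = -1 never escape, while c = 2 escapes after two steps.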
You and javidx9 are great.
Cheers mate! Thanks for watching :)
@@cortexauth4094 Have you visited The Cherno's channel? It's a good one as well, although it contains a lot of ads (the more recent the videos, the more invasive the advertising).
I still think Creel's channel is the top though
The power of SIMD always amazes me
It's really great fun!! Cheers for watching mate :)
I think WhatsaCreel is using the AVX2 instruction set with 256 bit vectors. If you can get your hands on a computer with AVX512 with 512 bit vectors you can double the performance once more.
And thanks for making such amazing images with my VCL library :-)
@@agnerfog9458 Ice Lake AVX-512 is half-rate, Skylake-X uses a mesh interconnect, which is slow, and AMD correctly determined AVX-512 is a waste of transistors.
@@agnerfog9458 I was indeed using AVX2! I'd love to get my hands on AVX512!! Thanks for the library :)
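The "double the performance" remark is lane arithmetic: a 512-bit register holds twice as many elements as a 256-bit one. A small sketch of that count (the helper `lanesPerVector` is hypothetical and assumes 8-bit bytes); as the reply above notes, real speedups also depend on clock behaviour and execution throughput, so 2x is the ideal case, not a guarantee.

```cpp
#include <cstddef>

// Number of elements of type T that fit in one SIMD register of the given width.
template <typename T>
constexpr std::size_t lanesPerVector(std::size_t vectorBits) {
    return vectorBits / (8 * sizeof(T));
}

// AVX2 (256-bit) vs AVX-512 (512-bit) lane counts for float and double.
static_assert(lanesPerVector<float>(256) == 8, "AVX2: 8 floats");
static_assert(lanesPerVector<double>(256) == 4, "AVX2: 4 doubles");
static_assert(lanesPerVector<float>(512) == 16, "AVX-512: 16 floats");
static_assert(lanesPerVector<double>(512) == 8, "AVX-512: 8 doubles");
```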
Awesome video, thank you. The antipodean accent can lull one into feeling that one is just sharing a couple of schooners with an interesting dude. Then comes a throwaway comment like "that's Euclidean distancing, by the way" and one snaps back to reality. This is really serious stuff! I love Agner Fog's work. The assembly code for his optimised memcpy/memmove/memset library is a tour de force. He is up there with the greats like Knuth.
Cheers mate! Agner's work is fantastic! Really fun to be able to thank him in this way! Thanks for watching Michael, your comment made my day :)
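For reference, the Euclidean distance between two points is the square root of the summed squared coordinate differences. This is a minimal scalar sketch; the page doesn't say exactly which quantity the video measures with it, and `euclideanDistance` is a hypothetical name.

```cpp
#include <cmath>

// Euclidean distance between (x0, y0) and (x1, y1): sqrt(dx^2 + dy^2).
double euclideanDistance(double x0, double y0, double x1, double y1) {
    double dx = x1 - x0;
    double dy = y1 - y0;
    return std::sqrt(dx * dx + dy * dy);
}
```

When distances are only compared against each other or a threshold, comparing the squared values skips the square root entirely.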
If the buffers aren't overlapping, you should always restrict the input pointers in C/C++; that may make quite a difference, especially on better compilers (Visual Studio's optimizer isn't great at vectorization, Clang is much better at it). I'd be quite interested to see what Clang makes of this. My manually written Intel intrinsics-based matrix multiplication was 4 times faster than the Visual Studio compiled code, but only ever so slightly faster (
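A sketch of what "restricting" the pointers looks like. C has the `restrict` keyword; standard C++ does not, but GCC, Clang and MSVC all accept the `__restrict` extension used here. The SAXPY function is a hypothetical example, not code from the video.

```cpp
#include <cstddef>

// y[i] = a * x[i] + y[i]  (SAXPY).
// __restrict promises the compiler that x and y never overlap, so it can
// vectorize the loop without emitting runtime aliasing checks.
// Passing overlapping buffers to this function is undefined behaviour.
void saxpy(std::size_t n, float a, const float* __restrict x, float* __restrict y) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether this changes the generated code depends on the compiler and flags; comparing the assembly with and without `__restrict` is the reliable way to check.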
If I'd watched this excellent video first, I wouldn't have flubbed the XOR-Shift question in your 2021 lockdown quiz :) Thanks for the education!
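For anyone else rusty on XOR-Shift: below is Marsaglia's common 32-bit variant with the (13, 17, 5) shift triple. Whether the video uses exactly these constants isn't stated on this page, so treat this as a reference sketch. Because it is only shifts and XORs, a SIMD version can step several independent states at once, one per lane.

```cpp
#include <cstdint>

// 32-bit xorshift PRNG (Marsaglia), shift triple (13, 17, 5).
// The state must be non-zero: a zero state stays zero forever.
std::uint32_t xorshift32(std::uint32_t& state) {
    std::uint32_t x = state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    state = x;
    return x;
}
```

Starting from a seed of 1, the first output is 270369.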
22:35 "We do a currentRound++ so currentRound is incremented one"
- What's a Creel?, 2020
😂
Hahaha, complicated stuff!! :)
We want to keep these things graphical too! Keep it up and add in some T-Rexes or something!! Love your vids man!
Haha, wow, that's a great idea! The more T-Rexes the better :)
Good stuff!
What do you use to render your images to the screen?
Your videos are really insightful; I'm surprised I haven't heard of them before.
There is something I want to ask: do you know if you can apply a permutation to a binary sequence using SIMD?
Or perhaps it would be better to apply a permutation to 4 different binary sequences at once using SIMD.
Cheers mate! I'm not sure of the best way to permute bits... Might be ok using shifting and regular code? If you had your bits spread across many large SIMD vectors turned on their side (AoS vs SoA style), maybe you could do 256 at once...? That'd mean reorganising all your data, and sometimes AoS vs SoA doesn't help... Well, thanks for watching mate, good luck with your coding :)
@@WhatsACreel Thank you!
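The "turned on their side" idea from the reply is usually called bit-slicing. A minimal sketch, assuming the data has already been transposed into that layout (the transpose itself is not shown and has its own cost): word `i` holds bit `i` of 64 different sequences, so applying one permutation to all 64 sequences is just reordering whole words. With 256-bit vectors instead of `uint64_t`, the same reordering would cover 256 sequences at once. `permuteBitSliced` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-sliced layout: planes[i] holds bit i of 64 sequences (one sequence
// per bit position of the uint64_t). Output bit i of every sequence takes
// the value of input bit perm[i], so the whole permutation is a word shuffle.
std::vector<std::uint64_t> permuteBitSliced(const std::vector<std::uint64_t>& planes,
                                            const std::vector<std::size_t>& perm) {
    assert(perm.size() == planes.size());
    std::vector<std::uint64_t> out(planes.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        out[i] = planes[perm[i]];
    }
    return out;
}
```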
Hey Creel, what do you think about keeping ALL calculations inside SIMD registers, i.e. even representing single scalars as SIMD registers, to avoid conversion costs?
Wouldn't a GPU be a better choice for the image editor? I guess a CPU fallback is always good to have for a robust system, but the GPU is much better at algorithms like applying filters.
If you did this in ASM, would it be any faster? Thanks for the video, btw.
That's a very good question!! Maybe keeping all the variables in registers and unrolling might help? Though we could unroll the VCL too, and the compiler is pretty good with register allocation, so I'm not sure...
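A sketch of the unrolling idea mentioned in the reply, in plain scalar C++ (the function is hypothetical, not from the video). Using several independent accumulators breaks the dependency on a single running sum, so additions can overlap in the pipeline; with VCL the same pattern would use one vector accumulator per unrolled step.

```cpp
#include <cstddef>

// Sum with four independent accumulators (manual 4x unroll).
// For floating point this changes the order of additions, so results can
// differ in the last bits from a simple one-accumulator loop.
double sumUnrolled4(const double* data, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    // Handle the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        s0 += data[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```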
Hello,
Thank you very much,
Where is the source code of the examples?
I use VecCore, which is just an API over other libraries and architectures, including CUDA. You should check it out.
Sir, could you kindly suggest a topic for my MS research? I'm confused about what to research.
I would advise you speak with your supervisor. Good luck and kind regards mate! Cheers for watching :)
@@WhatsACreel He only told me the research domain, image processing, but I don't know where to start.
Here. Take a like
Take a heart :)
Wow! You are one brainy bastard, mate ;-) Thank you for this superb video. I wish the coders I've worked with over the years were as interesting as you, instead of former VBScript kiddies.
Cheers mate!! Thanks for the kind words mate :)
64 bits per channel, or 64 bits per pixel?
64 bits per channel is so many.....
Per channel. Yes, it is a lot of precision!
@@WhatsACreel Amazing. Do you plan to use 8 bytes for transparency as well? Also, will this be open source?
@@wiilillad It was something close to DarkTable, so just a single layer photo editor with no transparency. Though, if all goes well, maybe a general purpose image editor could grow from this project. Open source, yes. No idea when I can get back to this project, but hopefully it all works out in the coming months. Thank you for watching :)
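To put "64 bits per channel" in perspective, here is the raw buffer size arithmetic. The 6000 x 4000 resolution is a hypothetical example, and the editor's actual pixel format isn't described beyond the bit depth. Three channels at 8 bytes each is 24 bytes per pixel; adding a 64-bit (8-byte) transparency channel, as asked above, would make it 32.

```cpp
#include <cstdint>

// Bytes needed for an uncompressed image buffer.
constexpr std::uint64_t imageBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t channels, std::uint64_t bitsPerChannel) {
    return width * height * channels * (bitsPerChannel / 8);
}

// RGB at 64 bits per channel: 3 x 8 = 24 bytes per pixel.
static_assert(imageBytes(1, 1, 3, 64) == 24, "24 bytes per RGB pixel");
// Hypothetical 6000 x 4000 photo: 24,000,000 pixels x 24 bytes = 576,000,000 bytes.
static_assert(imageBytes(6000, 4000, 3, 64) == 576000000ull, "about 576 MB");
// The same photo at 8 bits per channel needs 72,000,000 bytes, 8x less.
static_assert(imageBytes(6000, 4000, 3, 8) == 72000000ull, "about 72 MB");
```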