Why should -march=native add special instructions for a specific CPU (family)? Don't you need to specify the CPU architecture explicitly to test whether an instruction set is generated or not,
like -march=amdfam10 (aka barcelona) or newer, or, on the Intel side, haswell, broadwell or newer?
Of course you couldn't see any blc* instructions emitted: they are only implemented on certain AMD processors that support the TBM instruction set.
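Since TBM only exists on some AMD cores, the easiest way to see what those blc* instructions would do is to write the equivalent bit tricks in plain C++. Here is a minimal sketch; the function names are hypothetical, and the formulas follow AMD's documented TBM semantics as far as I know. Whether GCC or Clang folds each expression into a single blc* instruction with -mtbm (or with -march set to a TBM-capable core such as bdver2) is an assumption you'd want to confirm on Compiler Explorer.

```cpp
#include <cstdint>

// Plain-C++ equivalents of AMD's TBM "blc*" instructions (names are illustrative).
// With -mtbm the compiler *may* fold these into single instructions;
// otherwise they compile to ordinary add/and/or/xor/not sequences.

// BLCFILL: clear all trailing 1 bits.
constexpr std::uint32_t blcfill(std::uint32_t x) { return x & (x + 1); }

// BLCI: isolate the lowest clear bit as the only 0 in an otherwise all-ones word.
constexpr std::uint32_t blci(std::uint32_t x) { return x | ~(x + 1); }

// BLCIC: isolate the lowest clear bit as the only 1.
constexpr std::uint32_t blcic(std::uint32_t x) { return ~x & (x + 1); }

// BLCMSK: mask covering bit 0 up to and including the lowest clear bit.
constexpr std::uint32_t blcmsk(std::uint32_t x) { return x ^ (x + 1); }

// BLCS: set the lowest clear bit.
constexpr std::uint32_t blcs(std::uint32_t x) { return x | (x + 1); }
```

Without TBM enabled these stay as two or three ordinary instructions each, which is what you'd expect from -march=native on a host that lacks TBM.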
parallel_bit_deposit from the example should actually be named parallel_bit_extract. Why didn't it compile to PEXT? Probably because PEXT would be slower than the code it did compile to, since the selector is too simple (it has only one island of 1s). But the Wikipedia example, return ((x & 0xff000000) >> 12) | ((x & 0xfff0) >> 4);, doesn't compile to PEXT either.
For those willing to play with the code, here's the link: godbolt.org/z/JUhCaF
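To make the naming point concrete: PEXT gathers the bits of x selected by a mask and packs them into the low bits of the result, while PDEP does the reverse. Below is a portable sketch of PEXT (pext32 is a hypothetical name; on BMI2 hardware the real intrinsic is _pext_u32 from <immintrin.h>, built with -mbmi2). It also shows that the Wikipedia expression is exactly a PEXT with mask 0xff00fff0: the 12 bits under 0xfff0 land in bits 0..11 and the 8 bits under 0xff000000 land in bits 12..19.

```cpp
#include <cstdint>

// Portable software emulation of BMI2 PEXT ("parallel bit extract"):
// walks the set bits of mask from low to high and copies the matching
// bits of x into consecutive low bits of the result.
constexpr std::uint32_t pext32(std::uint32_t x, std::uint32_t mask) {
    std::uint32_t result = 0;
    std::uint32_t out_bit = 1;                      // next destination bit
    while (mask != 0) {
        std::uint32_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (x & lowest) {
            result |= out_bit;
        }
        out_bit <<= 1;
        mask &= mask - 1;                           // drop that bit from mask
    }
    return result;
}

// The Wikipedia example quoted in the comment, written with shifts and masks.
constexpr std::uint32_t wiki_example(std::uint32_t x) {
    return ((x & 0xff000000u) >> 12) | ((x & 0xfff0u) >> 4);
}
```

Comparing wiki_example built with -mbmi2 against a direct _pext_u32(x, 0xff00fff0) call on Compiler Explorer is one way to see whether the compiler considers the shift-and-mask version cheaper.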
Excellent video, Jason. I have often wondered about the compilers' use of advanced instruction sets. I have used POPCNT (via intrinsic calls); it is very valuable with bitmaps (similar to the FAT used by disk operating systems) that enable the reuse of elements in containers such as vectors and thereby cut down on reallocation costs. Purists will probably object on the grounds of code portability, and there is some merit to such objections, because there is now less incentive to upgrade to the latest technology (one can see that in the plethora of Pentium and i3 machines currently being sold). Nevertheless, it would be valuable to see how, and whether, these instruction sets are used by the various compilers. Thank you
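A minimal sketch of the kind of bitmap described above, assuming a fixed pool of 64 slots (the class name SlotBitmap and its interface are hypothetical, not from the video). Freed slots are reused instead of growing the container, and POPCNT answers "how many slots are in use" in a single instruction when it is enabled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical free-slot tracker for up to 64 container elements.
// A set bit means "slot in use". __builtin_ctzll / __builtin_popcountll are
// GCC/Clang builtins (C++20 also offers std::countr_zero / std::popcount);
// with -mpopcnt (or a suitable -march) the popcount becomes a POPCNT instruction.
class SlotBitmap {
public:
    // Marks the lowest free slot as used and returns its index,
    // or std::nullopt if all 64 slots are taken.
    std::optional<std::size_t> acquire() {
        const std::uint64_t free_slots = ~used_;
        if (free_slots == 0) {
            return std::nullopt;
        }
        const auto idx = static_cast<std::size_t>(__builtin_ctzll(free_slots));
        used_ |= std::uint64_t{1} << idx;
        return idx;
    }

    // Marks a previously acquired slot as free again.
    void release(std::size_t idx) { used_ &= ~(std::uint64_t{1} << idx); }

    // Number of slots currently in use.
    std::size_t in_use() const {
        return static_cast<std::size_t>(__builtin_popcountll(used_));
    }

private:
    std::uint64_t used_ = 0;
};
```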
So, that's missing MMX, which is pretty old. I don't know if that somehow doesn't count for some reason or another.
And yes, there is probably a specific -march flag (or perhaps other -m flags) you could use to enable the instructions. -march=native is... going to get you unpredictable results on godbolt.org, because it targets whatever CPU the server happens to run on. You should specify the specific architecture you're interested in. I have a Ryzen 7, and in the output of /proc/cpuinfo I can see that abm, bmi1, and bmi2 are all supported. So you could've specified -march=znver1, and it should enable those instructions.
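One way to check what a given -march actually turned on is to look at the macros the compiler predefines for each enabled extension. This sketch assumes GCC or Clang (which define __BMI__, __BMI2__, __POPCNT__, __TBM__ and friends); the function name is hypothetical, and it reports what the compiler was allowed to use, not what the CPU running the program supports.

```cpp
#include <string>
#include <vector>

// Lists the instruction-set extensions the compiler was told it may use,
// based on GCC/Clang predefined macros. Build the same file with
// -march=x86-64 and with -march=znver1 and compare the output.
std::vector<std::string> enabled_features() {
    std::vector<std::string> f;
#ifdef __SSE__
    f.push_back("sse");
#endif
#ifdef __SSE2__
    f.push_back("sse2");
#endif
#ifdef __POPCNT__
    f.push_back("popcnt");
#endif
#ifdef __LZCNT__
    f.push_back("lzcnt");
#endif
#ifdef __BMI__
    f.push_back("bmi");
#endif
#ifdef __BMI2__
    f.push_back("bmi2");
#endif
#ifdef __TBM__
    f.push_back("tbm");
#endif
#ifdef __AVX2__
    f.push_back("avx2");
#endif
    return f;
}
```

Without writing any code, echo | gcc -march=znver1 -dM -E - | grep -i bmi lists the same macros, and gcc -march=native -Q --help=target shows what -march=native expands to on the current machine.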
When -march=native is set, the compiler will try to use all the features of the current CPU, right? Is there a way to enable only specific extensions? For example, my CPU has SSE2 but I only want SSE1 instructions. I always had the feeling that bytecode languages have a small advantage there.
You can specify the exact cpu architecture you're targeting:
gcc.gnu.org/onlinedocs/gcc/x86-Options.html#x86-Options
e.g. to enable just SSE, use '-msse'
As long as you compile for x86-64 (aka amd64), a minimum of SSE2 is assumed, since the ISA requires SSE2. Then, if you are willing to reduce your user base in exchange for more advanced extensions, you can add -mavx, -mavx2, etc. to GCC, as far as I know. Of course, if you use -march=native, your output binary will use whatever is available on the machine you are compiling on. The reason it is not recommended on Godbolt is that -march=native generates code for whatever CPU the Godbolt server is running on. GCC has more options, such as specifying a minimum x86-64 feature level (e.g. -march=x86-64-v2 or -march=x86-64-v3 in GCC 11 and later) so the binary runs on any newer CPU that meets that level. Or you can keep the default -march=x86-64 and tune your code for a specific microarchitecture with -mtune (e.g. -mtune=haswell); note that "generic" is only accepted by -mtune, not by -march.
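If you want the faster instructions without cutting off older CPUs, GCC and Clang on x86 also let you compile a single function for an extension and choose at run time. A minimal sketch using POPCNT as the example; the function names are hypothetical, and target("popcnt") and __builtin_cpu_supports are GCC/Clang extensions, so on other compilers or architectures the code falls back to the portable loop.

```cpp
#include <cstdint>

// Portable fallback: clear the lowest set bit until none remain.
inline int popcount_generic(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Compiled as if -mpopcnt were given, for this function only, so the
// builtin becomes a POPCNT instruction. Only call it if the CPU supports it.
__attribute__((target("popcnt")))
static int popcount_hw(std::uint64_t x) {
    return __builtin_popcountll(x);
}

// Checks the running CPU once, then picks an implementation, so the
// binary still runs on CPUs without POPCNT.
inline int popcount_dispatch(std::uint64_t x) {
    static const bool has_popcnt = __builtin_cpu_supports("popcnt");
    return has_popcnt ? popcount_hw(x) : popcount_generic(x);
}
#else
inline int popcount_dispatch(std::uint64_t x) { return popcount_generic(x); }
#endif
```

GCC also offers __attribute__((target_clones(...))) to generate this kind of dispatch automatically; which approach fits better depends on how many functions need it.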
Ahmet Erdem Thanks for the detailed answer. I'll have a look :)
Are the C++ Weekly videos about C++17, or can I use C++11?
So, the guy who makes these videos generally uses the very latest C++ available. Sometimes he even uses C++ features that are not in any standard (yet). But, more often, the examples and ideas are broadly applicable to many different versions of C++.
But finding a C++17 compiler isn't hard. The latest versions of both Clang and GCC fully support C++17, and these compilers are open source, so you can download and build them easily if your platform doesn't already have them.
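To check which standard your compiler is actually using (GCC and Clang take -std=c++17; MSVC takes /std:c++17 from Visual Studio 2017 version 15.3 on), here is a small sketch based on the standard __cplusplus macro. The names cpp_version and cpp_standard_name are hypothetical.

```cpp
// Reports which C++ standard this file is being compiled as.
// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is given,
// so its real value is read from _MSVC_LANG instead.
#if defined(_MSVC_LANG)
constexpr long cpp_version = _MSVC_LANG;
#else
constexpr long cpp_version = __cplusplus;
#endif

constexpr const char* cpp_standard_name() {
    return cpp_version >= 202002L ? "C++20 or later"
         : cpp_version >= 201703L ? "C++17"
         : cpp_version >= 201402L ? "C++14"
         : cpp_version >= 201103L ? "C++11"
         : "pre-C++11";
}
```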
Eric Hopper, thanks for the information, friend. I will install Visual Studio 2017 and follow this video.
Jason Turner, thanks, I'll check it.
nice find ~ 💖