Modern x64 Assembly 16: Basic SIMD Floating Point Arithmetic

  • Published 25 Nov 2024

COMMENTS • 77

  • @sebastian_lindau-skands
    @sebastian_lindau-skands 1 year ago +2

    Thanks to you i now know assembly better than say most other programming languages xD

  • @ElijahLM01
    @ElijahLM01 4 years ago +20

    hands down one of the best instructors on youtube. just plowed through the whole series and it was great. i took over 1000 lines of notes and i feel like i have a way better understanding of assembly than i did before. i hope to see it keep going!

  • @Supakills101
    @Supakills101 4 years ago +19

    Just marathoned this whole series in 2 days, amazing work. Sending some money your way!

  • @shavais33
    @shavais33 9 months ago +1

    This series was so awesome! Absolutely fantastic! You are a talented teacher! And a gentleman and scholar besides! =D
    I learned assembler for several Motorola chips and several Intel chips in the mid 80's and early 90's. I've always wanted to create a computer language of my own design for PC's (Windows, Linux, and OSX). It's a pipe dream, but recently I've been thinking about it again. I don't want to use llvm or gcc or anything, I want to start by creating a little mini home-spun assembler, and then just slowly add a piece at a time over time. Build up the language from there.
    The instruction set reference alone for x86-64 is over 2500 pages long! I don't need nearly all those instructions for my purposes! Only a tiny subset. But jeepers, how do people even come up with compilers for that!? I guess llvm, gcc, nasm and masm all somehow manage to produce x64 machine code. It seems like they must have had scores of programmers working for years! I learned about compilers and linkers and such in school, but. That was in the early 90's. Things have changed a bit since the 80386! Holy pee! And my memory isn't exactly eidetic.
    Anyway, somehow, I want to roll my own little assembler. And roll my own linker for calling external C libraries, too, if I can manage it. But it's hard to figure out where to even start. I guess the first thing is to find out about the .exe format, and then maybe do some disassembly of some C or C++ code, and see if I can find the instructions in the .exe, and then manually change them to some different ones that I find in the reference manual... and just kind of plink away at it from there.

  • @alexloktionoff6833
    @alexloktionoff6833 1 year ago +1

    It's worth mentioning that AVX and especially AVX-512 instructions can throttle the CPU frequency, so on some CPUs it's faster to use several SSE multiplies instead.

  • @sabitkondakc9147
    @sabitkondakc9147 2 years ago +2

    you filled a huge technical gap in me, thanks a million! brilliant series; please keep it up.

  • @Mozartenhimer
    @Mozartenhimer 4 years ago +4

    Would love another video showing an example problem with operation masks to go branchless!

  • @willofirony
    @willofirony 4 years ago +4

    Awesome! You got an incredible amount of information into 23 minutes. Well done you.

    • @WhatsACreel
      @WhatsACreel  4 years ago

      Cheers mate! Glad you liked it :)

  • @steveokinevo
    @steveokinevo 4 years ago +13

    Pure Quality, great vid man

    • @WhatsACreel
      @WhatsACreel  4 years ago +1

      Thank you for stopping Herr Ste :)

  • @davidmurphy563
    @davidmurphy563 4 years ago +18

    Well, I just bought my first computer and turned it on no problem. I think this video filled me in with all I need. Just you didn't cover what an email is.

  • @akosmohacsi8136
    @akosmohacsi8136 4 years ago +8

    Yay new video!

    • @WhatsACreel
      @WhatsACreel  4 years ago +1

      Yay! Thank you for watching :)

  • @reifactor
    @reifactor 23 days ago +1

    Thank you for this series!

  • @dryleaf3552
    @dryleaf3552 4 years ago +9

    i would love to see you doing a series on OpenCL

    • @judy112929
      @judy112929 2 years ago

      plz i need the tutorial

  • @jiifox3245
    @jiifox3245 3 years ago +1

    What a great video again.
    It would be good to have a video showing some small asm task that is faster than the corresponding C++ version. So an example of some simple task where the C++ compiler produces slower code than your asm.

  • @Alex-op2kc
    @Alex-op2kc 3 years ago +2

    Playlist: ua-cam.com/play/PLKK11Ligqitg9MOX3-0tFT1Rmh3uJp7kA.html

  • @zakariabahja2604
    @zakariabahja2604 3 years ago +2

    Cheers , you're the best!

  • @INT41O
    @INT41O 4 years ago +3

    In a stroke of massive autism, I once decided to write wrappers (static inline header functions) for all the intrinsics with better names without all these ugly underscores and prefixes, which match the original asm names. I also used function overloading in a lot of places and templates instead of the horrible _MM_SHUFFLE macro: m128 x = shufps(y, z); If anyone is interested, I can maybe go over the code once more and upload it somewhere. I used it with g++ and I am not so sure it would work with other compilers as it is right now.

    • @WhatsACreel
      @WhatsACreel  4 years ago

      Ha! This sounds amazing! Good on you :)

    • @INT41O
      @INT41O 4 years ago

      @@WhatsACreel github.com/asdfjkloe/simd

    • @INT41O
      @INT41O 4 years ago +1

      @@WhatsACreel github dot com slash asdfjkloe slash simd (can not post links directly here)

    • @SirusStarTV
      @SirusStarTV 4 years ago +1

      @@INT41O github.com/asdfjkloe/simd

  • @genshinquest2024
    @genshinquest2024 4 years ago +6

    Yo dude your videos are a blessing. Been watching them the past few weeks. Question tho, are you gonna explore other ISAs like RISC V or AVR?

    • @WhatsACreel
      @WhatsACreel  4 years ago +3

      The pic asm is cool too! I don't have a programmer though, so I am not sure :( I did have Atmel studio and MPLab installed at one point - they had some great emulators from memory... But it's been a while. Certainly an excellent suggestion! But cheers for watching :)

    • @genshinquest2024
      @genshinquest2024 4 years ago +3

      @@WhatsACreel All good man! Like, for real, before I found your channel I had zero idea where I wanted to go with my knowledge of ASM, especially since I was trained with old asf microprocessors like the venerable 8086.

  • @sent4dc
    @sent4dc 4 years ago +5

    Nice video dude. One correction though. There are 32 YMM registers that were added to AVX. So you go from YMM0 to YMM31. And they also expanded XMM0 thru XMM31.

    • @WhatsACreel
      @WhatsACreel  4 years ago +6

      They did indeed!! But only in AVX512. I wanted to save AVX512 because of those changes, the mask registers and the new instructions. I really hope AMD adopts it. It is a crazy awesome instruction set!! Anywho, thank you for the info, and thank you for watching :)

  • @Libertoso
    @Libertoso 2 years ago +1

    Does anyone have a book recommendation on x64 Assembly to actually turn the introductory knowledge from this series into something more advanced?

  • @spacewolfjr
    @spacewolfjr 4 years ago +3

    Excellent video, thanks. You really remind me of Vincent Schiavelli btw...

    • @WhatsACreel
      @WhatsACreel  4 years ago +1

      Ha! The guy is a legend! Looks a little like a Zombie, but I'll take it! Thanks for watching :)

  • @sakari_n
    @sakari_n 4 years ago +4

    It is safe to assume that x64 machines have SSE2. It is 20 years old, and even Windows 8 and later versions require it. If I remember correctly, Microsoft's compiler always assumes SSE2 is available when compiling for x64, but not for x86. According to the 2020 Steam hardware survey, 100% of users had SSE2 and SSE3 available.
    Edit: Microsoft's compiler always assumes an SSE2-capable processor for x64; for x86, an SSE2-capable processor is assumed by default.

    • @WhatsACreel
      @WhatsACreel  4 years ago +3

      That's good to know! I think the adoption of AVX and AVX2 has been fairly universal too. It's got to be around 90% by now? Anywho, cheers for the info, and cheers for watching :)

    • @sakari_n
      @sakari_n 4 years ago +2

      @@WhatsACreel Yes. AVX adoption is already quite high, 92% according to the same survey, but AVX2 is not there yet; it was at 76%.

    • @alienrenders
      @alienrenders 3 years ago +1

      x64 MUST have SSE2 support. It's in the specs. Floating point numbers must be passed in SSE registers according to the x64 ABI.
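A minimal sketch of checking these extensions at run time, for anyone who would rather test than assume. It relies on the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins, so it assumes GCC or Clang targeting x86/x86-64 (MSVC would need `__cpuid` instead). The `SimdSupport` struct and `query_simd_support` function are hypothetical names made up for this illustration.

```cpp
// Assumption: GCC or Clang on an x86/x86-64 target.
// SimdSupport and query_simd_support are illustrative names only.
struct SimdSupport {
    bool sse2;
    bool avx;
    bool avx2;
};

// Asks the compiler's CPU-detection builtins which SIMD
// extensions the running processor reports as usable.
inline SimdSupport query_simd_support() {
    __builtin_cpu_init();  // fill in the feature table before querying it
    SimdSupport s;
    s.sse2 = __builtin_cpu_supports("sse2") != 0;
    s.avx  = __builtin_cpu_supports("avx") != 0;
    s.avx2 = __builtin_cpu_supports("avx2") != 0;
    return s;
}
```

On any x86-64 machine `sse2` should come back true, in line with the comments above; `avx` and `avx2` are the ones worth branching on before picking a code path.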

  • @FaustoSaporito
    @FaustoSaporito 2 years ago

    really amazing work, indeed. I love your way to explain things. I enjoyed also the quantum computing introduction. You could do also some assembly video with some quick one-qubit gate calculation, using the power of AVX :D

  • @luckyluckydog123
    @luckyluckydog123 3 years ago +2

    Great, great videos, thanks!!!
    I was wondering, why they added soo many SIMD registers, 32 for avx512? what's the use of having so many? can the cpu process more than one register per cycle (superscalar simd)?

  • @Snyder0317
    @Snyder0317 3 years ago +2

    Great vid, as always! Is there a performance difference with aligned vs unaligned data in AVX2?

  • @goldLizzzard
    @goldLizzzard 2 years ago +1

    Thanks for the care you put into your videos!
    What's the difference between this playlist and the other x64 assembly/c++ playlist you have?

  • @diegonayalazo
    @diegonayalazo 3 years ago

    Thanks Creel!

  • @khatharrmalkavian3306
    @khatharrmalkavian3306 4 years ago +3

    Could you discuss integer SIMD when you come back around to this topic? Also, are there vector x scalar instructions, so you can do something like multiply all members of a vector by a single scalar, or do you just have to pack a vector with copies of the same scalar?

    • @WhatsACreel
      @WhatsACreel  4 years ago +3

      I'd like to cover integers, yes! They're more fiddly since there's more types, signed/unsigned, there's no division, and the multiplication is complicated. Some really amazing instructions though!! So I'm hoping to cover them shortly.
      The scalar operations only work on the lowest elements, so we have to broadcast the scalar if we want it to affect all elements.
      Cheers for the suggestion, and cheers for watching :)
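A minimal sketch of the broadcast approach described in the reply above, written with SSE intrinsics rather than raw assembly. The helper name `scale4` is hypothetical, and the unaligned load and store are chosen only so the example works with any `float` array.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_set1_ps, _mm_mul_ps)

// Hypothetical helper: multiplies four packed floats by a single scalar.
inline void scale4(const float* in, float scalar, float* out) {
    __m128 v = _mm_loadu_ps(in);           // load 4 floats, no alignment required
    __m128 s = _mm_set1_ps(scalar);        // broadcast the scalar into all 4 lanes
    _mm_storeu_ps(out, _mm_mul_ps(v, s));  // packed multiply, then store 4 results
}
```

Calling `scale4` with `{1, 2, 3, 4}` and `2.5f` writes `{2.5, 5, 7.5, 10}`: the scalar has to be copied into every lane first, because the packed multiply only ever works lane against lane.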

  • @halkit6071
    @halkit6071 4 years ago +3

    Is this lesson the latest one? 🧐 Because I will download the whole playlist 😍

  • @gvcallen
    @gvcallen 3 years ago

    Cheers for the videos man! Just binged this series in a day or two. Just wondering - how did you learn all this? I remember you saying you studied music - did you do something computer science related as well? Really curious as I'm studying EE but really interested in this sort of stuff as well! (instead of just ARM and microcontroller programming etc.)

  • @MauroScomparin
    @MauroScomparin 4 years ago +3

    What happens when you divide by zero with SIMD instructions? An interrupt?

    • @WhatsACreel
      @WhatsACreel  4 years ago +5

      It sets the result to infinity and continues without error! Cheeky, cheeky, IEEE 754! :)

    • @MauroScomparin
      @MauroScomparin 4 years ago +1

      @@WhatsACreel Lol, sneaky!

  • @Andrew90046zero
    @Andrew90046zero 4 years ago +1

    Also, I wanted to see if I could put these instructions to use in C++ by trying to make a simple "vector3" class with a member function to add vector3's together. But im starting to think someone has done this already. Idk if anyone else here has seen some sort of implementation I could look at?
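One possible starting point, offered as a rough sketch rather than a reference implementation: a `Vec3` that keeps its three components in a single `__m128` and adds with one packed instruction. The class layout and member names are assumptions made for illustration, and the fourth lane is simply left as padding.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Illustrative Vec3 backed by one 128-bit register.
// Lanes 0..2 hold x, y, z; lane 3 is unused padding.
struct Vec3 {
    __m128 v;

    Vec3(float x, float y, float z) : v(_mm_set_ps(0.0f, z, y, x)) {}
    explicit Vec3(__m128 raw) : v(raw) {}

    // One packed add covers all three components at once.
    Vec3 add(const Vec3& other) const { return Vec3(_mm_add_ps(v, other.v)); }

    // Accessors: move the wanted lane down to lane 0, then extract it.
    float x() const { return _mm_cvtss_f32(v); }
    float y() const {
        return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)));
    }
    float z() const {
        return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)));
    }
};
```

Adding `Vec3(1, 2, 3)` to `Vec3(10, 20, 30)` this way yields components 11, 22 and 33. The trade-off is that each vector occupies 16 bytes instead of 12.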

  • @gionibegood6950
    @gionibegood6950 4 years ago +3

    Is there any gain in switching from intrinsics to asm? I've noticed a small gain on switching from vcl to intrinsics (around 12-15 seconds on a 7-minute run).

    • @WhatsACreel
      @WhatsACreel  4 years ago +9

      There is, yes. There’s benefits to both. For this particular video, we would see much better speed from the intrinsics. Really, if we jump into ASM, we want to stay there for as long as possible, since the switching between C++ and ASM can be very expensive due to saving and restoring registers and setting up stack frames.
      But the benefits to ASM are that we get to control the registers, when not even the L1 cache is fast enough! We can define our own calling conventions which return multiple data or recurse indefinitely, and use any instructions we want whether there’s intrinsics or not, or even commandeer RBP and RSP if we’re running out of registers. We also have access to very low level techniques, like self-modifying code. You could get all 12 cores to write machine code for each other while they execute in a beautiful symphony of seg faults and race conditions!
      Haha, that was from an answer I wrote for Quora once. It’s true though. Modern Assembly is a monstrous language with ridiculous possibilities. Hopefully we can explore some of these techniques at some point.
      Anywho, cheers for watching mate :)

    • @gionibegood6950
      @gionibegood6950 4 years ago +1

      @@WhatsACreel Hmmm, very interesting. I thought L1 was the absolute limit (or rather, my target was to be L1 bound); now I understand I can go even further, down to zero. Thank you!!

  • @Midnight-r1x
    @Midnight-r1x 3 months ago

    Been watching the whole series in 2024 and it's still great! I need the tutorial for setting up Visual Studio 2022 though... the UI for writing assembly in an empty C++ project is different and I still can't figure it out 🤕

    • @Nobodyokbro
      @Nobodyokbro 5 days ago

      The first video on how to set it up still works.

  • @tomaspecl1082
    @tomaspecl1082 4 years ago +1

    If the zmm registers are for 512 bits, then what is going to be used for 1024 bits? Are they just gonna make it overflow and start using amm, then bmm, cmm, etc... :D

    • @angeldude101
      @angeldude101 3 years ago

      Or they'll do what Arm and RISC-V are doing and set aside fixed-width SIMD altogether and add variable-length vector registers.

  • @JerryThings
    @JerryThings 4 years ago +1

    Hello! I have a question
    When dealing with vectors, can I simply use movss instead of movaps ? Or something bad would happen.

    • @WhatsACreel
      @WhatsACreel  4 years ago +3

      You can use it, but MOVSS only moves one single. That's the mnemonic for Move Scalar Single. It moves 32 bits into the lowest element of the first operand. The PS version is the packed one. Hope this helps, and cheers for watching :)
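To make the difference concrete, here is a small sketch using the intrinsics that correspond to the two kinds of load. `_mm_load_ss` is the scalar load (typically compiled to MOVSS) and `_mm_loadu_ps` is the packed, unaligned one (typically MOVUPS); the aligned `_mm_load_ps` corresponds to MOVAPS and needs a 16-byte aligned address. The function names `load_scalar` and `load_packed` are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Scalar load: reads only src[0] into lane 0 and zeroes the upper three lanes.
inline void load_scalar(const float* src, float* out4) {
    _mm_storeu_ps(out4, _mm_load_ss(src));
}

// Packed load: reads src[0..3] into all four lanes.
// The unaligned form is used so any float array works in this sketch.
inline void load_packed(const float* src, float* out4) {
    _mm_storeu_ps(out4, _mm_loadu_ps(src));
}
```

With a source array of `{1, 2, 3, 4}`, `load_scalar` produces `{1, 0, 0, 0}` while `load_packed` produces `{1, 2, 3, 4}`. So nothing crashes when the scalar load is used on a vector, but three quarters of the data is silently dropped.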

  • @davidprock904
    @davidprock904 4 years ago +1

    I was thinking you were going to show how to do it with lower level asm instead of having the specialized hardware do it

    • @WhatsACreel
      @WhatsACreel  4 years ago +3

      Not sure I understand mate. These instructions are as low level as it gets! They are single instructions for the CPU. They don't break into smaller ASM or anything. There's no lower level exposed to programmers than single Assembly instructions. They perform multiple operations at once, but that's just how SIMD works, that's what makes modern CPU's sooo powerful! I hope this clears it up a little, and cheers for watching :)

    • @davidprock904
      @davidprock904 4 years ago +2

      @@WhatsACreel I guess what I could say is I need to know how to do anything you can think of with floating point numbers on paper but only in a binary representation.

    • @WhatsACreel
      @WhatsACreel  4 years ago +3

      Oh, ok! Yes, the IEEE754 standard is probs worth looking up! I did a video series a long time ago on that, but I think other folks have since released much better ones. Or you might be interested in fixed point? It’s not floating point, but it’s very interesting and sometimes extremely fast!! Anywho, thank you for watching and good luck finding the info you're looking for :)

  • @MrLegarcia
    @MrLegarcia 4 years ago +1

    +1

  • @lukasmandes1386
    @lukasmandes1386 4 years ago +3

    yo

  • @mohamed_khoudjatelli9349
    @mohamed_khoudjatelli9349 4 years ago

    I didn't know Nicolas Cage was into science :)

  • @teckyify
    @teckyify 4 years ago +5

    What a mess modern x64 assembler is 💩

    • @mannycalavera121
      @mannycalavera121 4 years ago +1

      Definitely a different world

    • @WhatsACreel
      @WhatsACreel  4 years ago +2

      Certainly a lot of instructions! Thousands!! Cheers for watching :)