What Are SIMD Instructions? (With a Code Example) [DSP #14]

  • Published 25 Dec 2024

COMMENTS • 21

  • @WolfSoundAudio  2 years ago +2

    Have I helped you with this video? If so, please consider buying me a ☕ coffee at www.buymeacoffee.com/janwilczek
    Thanks! 🙂

  • @niranjanm5942  1 year ago +2

    Thanks, this was a great intro to this topic. I wanted to get started with SIMD, and this will put me on the right track.

  • @chen-kim9440  7 months ago +1

    Thanks for your great introduction and lively demo! I really like your pace!

  • @auditiv0276  2 months ago

    If you want to make sure you compile with SIMD instructions specific to the host CPU, you can use LLVM bindings for the language of your choice and then compile through LLVM. Interesting vid!

  • @moliver_xxii  2 years ago +2

    Hey, this is a difficult topic and you can't find anything about it on the Internet, so thank you, Jan!

  • @alldyallnite  2 years ago +1

    Thank you Jan!

  • @cliffmathew  2 years ago

    Great job explaining, and demonstrating. Thank you.

  • @theruisu21  1 year ago

    Great video! Looking forward to the next one. Next time, could you include more on the ARM and RISC-V case?

  • @NecdetSanli  1 year ago

    You made the concept easy to understand, thank you. I'd also like to see some C examples, if possible.

  • @ifnullreturn1  1 year ago +3

    Line 13 is killing me lol

  • @KeypleezerOfficial  2 years ago

    Nice video, nicely paced and clear. Just what I needed to understand this topic a bit better. I'd just like some more examples of calculations actually handled by the SIMD extension sets, and perhaps some alternative SIMD/FFT libraries with info about what does what and how; that would be epic. Not many people teach this for audio with such good phrasing! Keep up the great work! 👍

    • @KeypleezerOfficial  2 years ago

      I hadn't read the article you wrote about this topic before. It's great, with much more info and more depth. Thanks!

  • @davidminnix  2 months ago

    Many DSP algorithms contain single-sample feedback. Can anything be done to vectorize these algorithms? It seems like the feedback complicates any attempt to vectorize through block processing.

  • @moisascholar  1 year ago

    Very helpful video. I was working on a particle system/simulation, and I use GL to draw the particles. I was wondering: with SIMD and GL, how can I draw multiple particles at once? Or does this have more to do with GL buffers?

  • @przekladanki  2 years ago +2

    Yes, you helped a lot ^_^

  • @BalakrishnanIrudhayaraman  1 year ago

    I understand the concept of SIMD. But in the code I can see that you are adding each value as it is put into the register. That looks equivalent to scalar addition; I think it's done to avoid one more for loop for storing the sums into the result array, which makes sense. This makes me ask: does the intrinsic function perform the addition only when all 256 bits are filled with values, or can it also work otherwise?
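    To illustrate the question above, here is a minimal C sketch, assuming an x86-64 CPU with AVX and GCC or Clang; `add8` and `add_padded` are hypothetical helper names, not the code from the video. The register is filled by a single load of 8 floats, and `_mm256_add_ps` always adds all 8 lanes in one instruction. There is no mode that waits until the register is "full": if you have fewer than 8 values, the unused lanes are still added, so one common approach is to pad them (here with zeros) and ignore those results.

```c
#include <assert.h>
#include <immintrin.h>  /* AVX intrinsics (x86/x86-64 only) */
#include <string.h>

/* Adds 8 pairs of floats with a single AVX add instruction.
 * target("avx") lets GCC/Clang compile this without a global -mavx flag;
 * the CPU running it must still support AVX (assumption). */
__attribute__((target("avx")))
void add8(const float a[8], const float b[8], float out[8]) {
    __m256 va = _mm256_loadu_ps(a);      /* one load fills all 8 lanes */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 sum = _mm256_add_ps(va, vb);  /* all 8 lanes added at once */
    _mm256_storeu_ps(out, sum);          /* one store writes all 8 sums */
}

/* Hypothetical helper: adds only `count` (<= 8) values by copying them
 * into zero-padded buffers, running the full 8-lane add anyway,
 * and keeping just the first `count` results. */
void add_padded(const float *a, const float *b, float *out, size_t count) {
    float pa[8] = {0}, pb[8] = {0}, pout[8];
    assert(count <= 8);
    memcpy(pa, a, count * sizeof(float));
    memcpy(pb, b, count * sizeof(float));
    add8(pa, pb, pout);                  /* lanes count..7 compute 0 + 0 */
    memcpy(out, pout, count * sizeof(float));
}
```

    Padding is only one option; for long arrays, the usual pattern is to process full chunks of 8 and finish the remainder with a scalar loop.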

  • @omnisepher  1 year ago

    Great job,
    but doesn't the second for loop kill the entire point of using SIMD?

    • @corporalwill123  5 months ago

      Late reply, you probably already have figured it out by now. Responding anyway for others with the same question.
      That's like saying planes are pointless for traveling large distances, because you still need to walk the short distance to your destination from the airport.
      SIMD does most of the work, in this case in chunks of 8 elements at a time, and the regular loop finishes the remaining elements.
      So for the normal loop you are looking at:
      N*scalar
      while for SIMD you are getting:
      floor(N/8)*SIMD + (N%8)*scalar
      Since by design 1*SIMD is faster than 8*scalar, for sizes greater than or equal to 8 the second algorithm will be faster than just doing the normal loop. For sizes smaller than 8, it costs the same as the normal loop plus some small overhead for working out how many full chunks of 8 there are.
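      The cost model in the reply maps onto a loop like the one below. This is a minimal C sketch, assuming x86-64 with AVX, GCC or Clang, and `float` data; `add_arrays` is a hypothetical name, not the code from the video. The first loop runs floor(N/8) times and handles 8 elements per iteration; the second, scalar loop handles the N % 8 leftovers.

```c
#include <stddef.h>
#include <immintrin.h>  /* AVX intrinsics (x86/x86-64 only) */

/* out[i] = a[i] + b[i] for i in [0, n).
 * target("avx") avoids needing a global -mavx flag; the CPU running
 * this must support AVX (assumption). */
__attribute__((target("avx")))
void add_arrays(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;

    /* SIMD part: floor(n / 8) iterations, 8 floats per iteration.
     * `i + 8 <= n` avoids the unsigned underflow `i < n - 8` would
     * cause when n < 8. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }

    /* Scalar tail: the remaining n % 8 elements (0 to 7 of them). */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

      For example, with N = 20 the SIMD loop runs floor(20/8) = 2 times (covering 16 elements) and the scalar loop runs 20 % 8 = 4 times, compared with 20 scalar iterations for the plain loop.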