AVX512 Properly Explained! - Performance and Syntax Analysis


COMMENTS • 47

  • @rn3srk1 · 1 year ago · +29

    Criminally underrated techtuber

  • @KvapuJanjalia · 6 months ago · +14

    The fact that the current high-end gaming Intel CPU (14900K) does not support AVX512 is insane.

    • @HandSwamp-c7r · 3 months ago · +3

      It's even more insane that there are plenty of older non-AVX CPUs that can run the most demanding games on the highest settings.
      A good example is Cyberpunk 2077.
      At first, this game would only run on AVX-capable CPUs.
      After some time, they removed the AVX check.
      And guess what?
      With my 6-core, 12-thread Xeon X5675 running at 5 GHz without AVX, I can play this game with all settings on max at a perfect 144 fps.
      So I really don't see why they don't just make games compatible with non-AVX CPUs as well,
      because it has been proven more than once that all these games can be played on non-AVX CPUs.
      Right now I'm playing the Silent Hill 2 remake, which also required AVX.
      But thanks to some smart guy who made a non-AVX tool, it's possible to play this game on non-AVX CPUs.
      And again, with all settings on max on my non-AVX CPU, it runs like a charm.

    • @mika2666 · 2 months ago

      Intel's desktop performance cores have supported AVX-512 since 11th gen, but starting with 12th gen Intel disabled it because the E-cores don't support it :(

  • @MrHav1k · 5 months ago · +2

    This was so well explained! Thank you!! The CoD zombies clips in the background actually helped a ton with following along on the video haha. Great stuff!

  • @MarekKnapek · 1 year ago · +6

    Recently, I implemented the Serpent symmetric block cipher (an AES candidate) in portable C using 32-bit unsigned integers. Then I ported it to 128-bit SSE2 for 4× the performance, and later to 256-bit AVX2 for an additional 2× speedup on top of SSE2. That thing scales like magic. I don't have a computer with integer AVX-512 support. Reading Intel's intrinsics docs isn't nearly as difficult as I was initially afraid it would be.
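A minimal sketch of why that kind of code scales with register width, assuming GCC or Clang on x86 (the `HAVE_X86` macro and the `rotl13*` names are hypothetical helpers for this example). It is not Serpent itself, only the rotate-left-by-13 of 32-bit words that Serpent's linear transform uses: plain C++ handles one word per operation, SSE2 handles 4 and AVX2 handles 8, with a runtime check before the AVX2 path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1  // hypothetical helper macro for this sketch
#endif

// Scalar reference: rotate one 32-bit word left by 13 bits.
static inline std::uint32_t rotl13(std::uint32_t x) {
    return (x << 13) | (x >> 19);
}

#if defined(HAVE_X86) && defined(__SSE2__)
// SSE2: a 128-bit register holds 4 words, so each iteration rotates 4 at once.
static std::size_t rotl13_sse2(std::uint32_t* w, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(w + i));
        v = _mm_or_si128(_mm_slli_epi32(v, 13), _mm_srli_epi32(v, 19));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(w + i), v);
    }
    return i;  // number of words handled
}
#endif

#if defined(HAVE_X86)
// AVX2: a 256-bit register holds 8 words. The target attribute lets GCC/Clang
// emit AVX2 here without -mavx2; it must only be called after a runtime check.
__attribute__((target("avx2")))
static std::size_t rotl13_avx2(std::uint32_t* w, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(w + i));
        v = _mm256_or_si256(_mm256_slli_epi32(v, 13), _mm256_srli_epi32(v, 19));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(w + i), v);
    }
    return i;
}
#endif

// Rotate every word in place, using the widest path this CPU supports.
void rotl13_all(std::uint32_t* w, std::size_t n) {
    std::size_t done = 0;
#if defined(HAVE_X86)
    if (__builtin_cpu_supports("avx2")) {
        done = rotl13_avx2(w, n);
    }
#if defined(__SSE2__)
    else {
        done = rotl13_sse2(w, n);
    }
#endif
#endif
    for (std::size_t i = done; i < n; ++i) w[i] = rotl13(w[i]);  // leftover words
}
```

Calling `rotl13_all(words, n)` rotates every word in place; the scalar loop at the end picks up whatever is left when `n` is not a multiple of the vector width.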

  • @problematic3255 · 9 months ago · +7

    I... wh... okay, THIS is how my friends feel when I start talking about various CPUs, GPUs and other hardware... this whole video felt like I was listening to an entirely different language. Am I just stupid or something for not being able to pick up on context clues?

    • @ProceuTech · 9 months ago · +7

      You're not stupid. This isn't common-sense information, and a lot of it builds on other background knowledge; without that knowledge, there are holes in your understanding. Kind of my fault for not explaining things well enough.

    • @problematic3255 · 9 months ago · +2

      @ProceuTech I mean, I know AVX-512 starts with 11th gen and then disappears, or is replaced and/or updated with something new, on 12th gen when looking at instruction sets, but that's as far as my knowledge goes lol. I should've looked up instruction-set basics videos before looking up the brand-new stuff lol

    • @ProceuTech · 9 months ago · +5

      AVX-512 is a weird one, and you're right. Intel didn't officially support it on 12th Gen (though it was still active in hardware on early revisions available to the public), and it's been entirely fused off on 13th and 14th Gen. The next generation of AVX, called AVX10, aims to fix this by letting programmers keep using AVX-512 code while also being able to "double pump" an AVX2 hardware data path, similar to what AMD does with Zen 4. Weird stuff, but you're not stupid for not understanding!

    • @Aurora12488 · 24 days ago · +1

      @ProceuTech Nah, I think you found a great balance between technical and high-level. The thing is, no matter what you say, non-programmers' eyes are going to kind of glaze over when you mention data types like ints/floats and sizes in bits/bytes, as well as any mention of assembly, runtime efficiency, hardware acceleration, vectors, and showing C++ code. There's no way around it. And I think there are plenty of videos out there telling non-programmers that AVX-512 lets programs run the same instruction on a bunch of data at once, which is probably as low-level as they can get for a general audience. You did a great job being informative while not boiling the ocean!
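To make "the same instruction on a bunch of data at once" concrete, here is a small sketch assuming GCC or Clang on x86 (`add16` and `HAVE_X86` are names made up for this example): one `_mm512_add_ps` adds 16 floats, versus 16 separate additions in the scalar loop. The AVX-512 path only runs if the CPU reports `avx512f` at runtime, which matters given how support varies across Intel generations.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1  // hypothetical helper macro for this sketch
#endif

#if defined(HAVE_X86)
// Compiled for AVX-512F only inside this function; call it only after a runtime check.
__attribute__((target("avx512f")))
static void add16_avx512(const float* a, const float* b, float* out) {
    __m512 va = _mm512_loadu_ps(a);                 // load 16 floats (512 bits)
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_add_ps(va, vb));  // one instruction adds all 16 lanes
}
#endif

// Scalar reference: 16 separate additions.
static void add16_scalar(const float* a, const float* b, float* out) {
    for (std::size_t i = 0; i < 16; ++i) out[i] = a[i] + b[i];
}

// Adds two arrays of exactly 16 floats, using AVX-512 when the CPU has it.
void add16(const float* a, const float* b, float* out) {
#if defined(HAVE_X86)
    if (__builtin_cpu_supports("avx512f")) {
        add16_avx512(a, b, out);
        return;
    }
#endif
    add16_scalar(a, b, out);
}
```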

  • @ChrisM541 · 1 year ago · +4

    You have to understand that, in its most basic form, shifting from an 8-bit to a 16-bit CPU carries an automatic 'SIMD' upgrade to all the widened registers, for rather obvious reasons. With today's 64-bit CPUs, adding separate large registers and the corresponding opcodes (opcodes which have become more complex and powerful) can, and does, have the effect of stalling a general move to CPUs wider than 64 bits.
    Today, we're merely extending hybrid architectures, and today's 'large register' extensions are our means of doing that.
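The 'automatic SIMD' point can be illustrated with the classic SWAR (SIMD within a register) trick, sketched here in C++ (`add_u8x8` is a hypothetical name, and this is a software technique, not what dedicated vector units do): eight 8-bit values packed into one 64-bit integer are added lane by lane with ordinary 64-bit arithmetic, masking so that carries never cross byte boundaries.

```cpp
#include <cassert>
#include <cstdint>

// Add eight packed 8-bit lanes held in one 64-bit integer, lane by lane,
// wrapping inside each byte with no carry leaking into the next lane.
std::uint64_t add_u8x8(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;  // low 7 bits of every byte
    const std::uint64_t high = 0x8080808080808080ULL;  // top bit of every byte
    // Adding only the low 7 bits can never carry out of a byte (max 0x7F + 0x7F = 0xFE).
    std::uint64_t sum = (a & low7) + (b & low7);
    // Each byte's top bit is a7 ^ b7 ^ carry-in; XOR restores it and drops the carry-out.
    return sum ^ ((a ^ b) & high);
}
```

For example, `add_u8x8(0xFF01, 0x0102)` returns `0x0003`: the low byte becomes 0x01 + 0x02 = 0x03, and the next byte wraps from 0xFF + 0x01 to 0x00 without carrying into the byte above it.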

  • @yumenokoyume · 10 months ago · +4

    I'm no programmer, but I wonder what happens if you run an AVX2 program on a processor that doesn't support it, like an Intel i5-3470.

    • @ProceuTech · 10 months ago · +3

      It will hit an illegal-instruction fault (SIGILL on Linux) and crash :(

    • @yumenokoyume · 10 months ago · +3

      @ProceuTech Thanks for the reply. I'm using a 3rd-gen i5 for my video editing and VFX, but the Adobe 2024 installers won't let me install because AVX2 isn't supported. I'm just kinda curious what would happen if I ran the program. 🤣🤣

    • @ferna2294 · 8 months ago · +4

      @yumenokoyume Usually, when it comes to the LATEST tech, they program their apps with some fallback so that someone whose hardware is a couple of generations older can still use the app. However, since AVX2 has been standard for more than 10 years, Adobe probably doesn't care about backward compatibility anymore.

    • @yumenokoyume · 5 months ago · +3

      @ferna2294 Yeah, so true. But it's understandable: new tech is getting so good that supporting older hardware just isn't profitable nowadays.
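The fallback approach mentioned in this thread is usually done with runtime dispatch: check the CPU's features once, then call either a vectorized or a plain implementation. A sketch assuming GCC or Clang on x86 (`add_u32`, `pick_add` and `HAVE_X86` are hypothetical names; MSVC would need `__cpuid` instead of `__builtin_cpu_supports`). Without such a check, the CPU faults on the first AVX2 instruction it encounters and the program crashes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1  // hypothetical helper macro for this sketch
#endif

using AddFn = void (*)(const std::uint32_t*, const std::uint32_t*, std::uint32_t*, std::size_t);

// Fallback that runs on any CPU: one element per iteration.
static void add_scalar(const std::uint32_t* a, const std::uint32_t* b,
                       std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];  // wraps mod 2^32
}

#if defined(HAVE_X86)
// AVX2 path: 8 x 32-bit lanes per instruction. On a CPU without AVX2 this would
// raise an illegal-instruction fault, so it is only reached through pick_add().
__attribute__((target("avx2")))
static void add_avx2(const std::uint32_t* a, const std::uint32_t* b,
                     std::uint32_t* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover elements
}
#endif

// Decide which implementation this machine can run.
static AddFn pick_add() {
#if defined(HAVE_X86)
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_scalar;
}

void add_u32(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out, std::size_t n) {
    static const AddFn impl = pick_add();  // chosen once, on the first call
    impl(a, b, out, n);
}
```

Choosing the function pointer once keeps the feature check out of the hot path; callers only ever see `add_u32`.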

  • @nate6908 · 1 year ago · +5

    How is AVX-512 used for inference?
    From what I understood in this video, AVX-512 lets you execute a multiply (or multiply-accumulate) instruction on eight double-precision floats at once (8 × 64 = 512, hence the name).
    So could models quantized to int8 then execute 64 int8 operations with one single instruction, instead of decoding the same instruction 64 times?
    The company Neural Magic even goes so far as to say CPU inference is the way forward.
    But even with "64-wide SIMD", I thought GPUs were still much more parallel.
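On the int8 question: a 512-bit register does hold 64 int8 values (512 / 8), but an 8-bit × 8-bit product needs up to 16 bits, so quantized inference kernels often rely on AVX-512 VNNI, whose VPDPBUSD instruction (intrinsic `_mm512_dpbusd_epi32`) multiplies unsigned bytes by signed bytes and adds each group of four products into one of 16 int32 accumulators. The code below is only a scalar model of that instruction's arithmetic (the function and type names are made up), useful for checking a vectorized kernel against.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kBits = 512;
constexpr int kInt8Lanes = kBits / 8;    // 64 int8 values per register
constexpr int kInt32Lanes = kBits / 32;  // 16 int32 accumulators per register
constexpr int kFp64Lanes = kBits / 64;   // 8 doubles, as in the video

using U8x64 = std::array<std::uint8_t, kInt8Lanes>;
using S8x64 = std::array<std::int8_t, kInt8Lanes>;
using I32x16 = std::array<std::int32_t, kInt32Lanes>;

// Scalar model of VPDPBUSD: every 32-bit lane adds four
// (unsigned 8-bit x signed 8-bit) products to its accumulator.
I32x16 dpbusd_model(I32x16 acc, const U8x64& a, const S8x64& b) {
    for (int j = 0; j < kInt32Lanes; ++j) {
        std::int32_t dot = 0;
        for (int k = 0; k < 4; ++k) {
            dot += static_cast<std::int32_t>(a[4 * j + k]) * static_cast<std::int32_t>(b[4 * j + k]);
        }
        // The instruction wraps on overflow; adding as unsigned mirrors that
        // (the final conversion back is modular on mainstream compilers).
        acc[j] = static_cast<std::int32_t>(static_cast<std::uint32_t>(acc[j]) +
                                           static_cast<std::uint32_t>(dot));
    }
    return acc;
}
```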

  • @AP77432 · 21 days ago · +1

    I remember when I jumped from a 10700K to an 11700K and got massive FPS gains in every game :) The reason AVX-512 isn't included in many modern CPUs is its die size and the heat it causes. CPUs are better off without AVX-512. I remember running Prime95 tests on my i7-11700K: with AVX-512 disabled, CPU temps went down by around 10-15 °C and power draw dropped by something like 20-30 W. So the question is whether it's worth using in modern CPUs instead of just raising manufacturers' stock clock speeds, which can give the same effect with probably less heat and lower voltages. Companies are smart enough to know what they're doing here.

  • @cdriper · 1 year ago · +2

    vector::at in a high-performance loop? )

    • @ProceuTech · 1 year ago · +3

      You could also theoretically do *(vector.data() + i);

    • @cdriper · 1 year ago · +3

      @ProceuTech vector::at validates the index on each access; vector::operator[] doesn't. (Pass the vector by reference to simplify access to operator[]; moreover, prefer passing by reference when a null state isn't expected.)
      But yeah, the more important point here is that without good optimization, each indexed access to an array means computing "offset + index*sizeof(element)".
      Also, it's not a good idea to put a condition inside a loop, because then performance depends on another optimization: branch prediction inside the CPU.
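A sketch of the points in this thread applied to a simple loop (`count_above` and `count_above_ptr` are hypothetical examples): the vector is passed by const reference, indexed with `operator[]` instead of `at`, and the comparison result is added as 0 or 1 instead of guarding an `if`, which compilers can often turn into branch-free, vectorizable code. The pointer variant also shows that `data() + i` already advances by `i` elements, so no `sizeof` factor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count elements greater than threshold.
// - const reference: no copy, and a reference cannot be null
// - operator[] skips the bounds check vector::at does on every access
// - the bool comparison is added as 0 or 1 rather than guarding an if
std::int64_t count_above(const std::vector<std::int32_t>& v, std::int32_t threshold) {
    std::int64_t count = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        count += (v[i] > threshold);
    }
    return count;
}

// Same loop through the raw pointer. Pointer arithmetic is already scaled by the
// element size, so data() + i points at element i (adding sizeof(std::int32_t) * i
// would land four times too far).
std::int64_t count_above_ptr(const std::vector<std::int32_t>& v, std::int32_t threshold) {
    const std::int32_t* p = v.data();
    std::int64_t count = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        count += (*(p + i) > threshold);
    }
    return count;
}
```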

  • @dennysgrimaldi9623 · 1 year ago · +3

    Nice video, deserves more views

  • @arnoldn2017 · 1 month ago · +2

    I think AVX-512 is overrated for floating-point ops. The code overhead required to shuffle data around in the ZMM registers absorbs a lot of the efficiency gain from the actual _mm512 intrinsics.
    Against an optimized C++ program, I squeeze out a 15% speed gain at most by writing in assembly; with intrinsics, the gain is slightly less.

    • @ProceuTech · 1 month ago · +1

      It's very hard to disagree. As of 2024, most compilers are good enough to emit these instructions without you writing the intrinsics in your code; it's a flag you can turn on and off in the build config/settings.
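As a sketch of the compiler-flag route (assuming GCC or Clang; `saxpy` is just an illustrative name): a plain loop like the one below is a typical auto-vectorization candidate. Building with something like `-O3 -mavx512f` or `-O3 -march=native` lets the compiler use AVX-512 instructions where it judges them worthwhile, and `-fopt-info-vec-optimized` (GCC) or `-Rpass=loop-vectorize` (Clang) reports which loops were vectorized. Depending on the tuning target, GCC may still prefer 256-bit vectors unless `-mprefer-vector-width=512` is given.

```cpp
#include <cassert>
#include <cstddef>

// y[i] = a * x[i] + y[i], written as an ordinary loop with no intrinsics.
// __restrict (a widely supported compiler extension) promises x and y don't
// overlap, which removes one obstacle to vectorization.
void saxpy(std::size_t n, float a, const float* __restrict x, float* __restrict y) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```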

  • @MZRFaith · 1 year ago · +5

    Most emulation requires AVX-512 to run stably; to me, Intel CPUs have been trash in performance, and the 5600X I have is way better.

    • @Adamchevy · 1 year ago · +6

      This is the sole reason I haven't upgraded from my 11900K. Why Intel went backwards on this I will never understand. I guess emulation isn't something they care about.

    • @CaptainScorpio24 · 10 months ago · +4

      @Adamchevy my i7-12700 non-K has AVX-512 😊

    • @Adamchevy · 10 months ago · +2

      @CaptainScorpio24 The early ones do, but the later ones don't. And it isn't in 13th or 14th gen. Of course, with emulation under attack, it might not be that important a year from now. But when you use RPCS3, it makes a huge difference.

    • @stevensv4864 · 10 months ago · +4

      Bro, the 5600X IS trash compared to Intel 12th, 13th and 14th gen, even without AVX-512 😂

    • @stevensv4864 · 10 months ago · +2

      @CaptainScorpio24 can you test God of War 3 with the same settings as in my videos?

  • @WallisGabriel-r6p · 4 months ago

    Alberto Fort

  • @youtubeshadowbannedmylasta2629

    And it just makes performance worse.

    • @nidalspam509 · 1 year ago · +5

      Not in the case of AMD's Zen 4.

    • @MrKatoriz · 8 months ago · +3

      Intel's garbage nodes, which can't avoid melting the moment they see an AVX-512 instruction, are the reason (the entire CPU downclocks for a significant amount of time as soon as an AVX-512 instruction is executed).

    • @panjak323 · 6 months ago · +1

      @nidalspam509 It makes performance the same as with AVX2, because the functional units are only 256 bits wide, so the workload is split into 2× 256-bit operations.
      It can be argued that having full 512-bit FUs and running at 70% clock speed is still better than 256-bit FUs at 100% clock speed.
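A conceptual model of the double pumping described in this thread (plain C++, not something you would write for real; `Vec512`, `adder256` and `add512_double_pumped` are made-up names): the program issues one 512-bit add, and a 256-bit-wide unit processes the low and high halves one after the other, so lanes per clock stay roughly at AVX2 level even though only one instruction was decoded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec512 = std::array<float, 16>;  // 16 x 32-bit floats = 512 bits
using Vec256 = std::array<float, 8>;   //  8 x 32-bit floats = 256 bits

// Stand-in for one pass through a 256-bit-wide floating-point adder.
static Vec256 adder256(const Vec256& a, const Vec256& b) {
    Vec256 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Model of a "double-pumped" 512-bit add: one 512-bit operation from the
// program's point of view, executed as two sequential 256-bit passes.
Vec512 add512_double_pumped(const Vec512& a, const Vec512& b) {
    Vec512 out{};
    for (std::size_t half = 0; half < 2; ++half) {
        Vec256 ha{}, hb{};
        for (std::size_t i = 0; i < 8; ++i) {
            ha[i] = a[half * 8 + i];
            hb[i] = b[half * 8 + i];
        }
        Vec256 r = adder256(ha, hb);
        for (std::size_t i = 0; i < 8; ++i) out[half * 8 + i] = r[i];
    }
    return out;
}
```

The back-of-the-envelope version of the 70% argument: a full 512-bit unit at 0.7× clock processes 16 × 0.7 = 11.2 float lanes per unit time, versus 8 × 1.0 = 8 for a 256-bit unit at full clock, i.e. about 1.4× the throughput.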