Magic Mirror in My Hand...Who is the Fairest Scheduler in Linux Land?

  • Published Jun 2, 2024
  • Today I dive into the deep and mysterious waters of the new Linux scheduler, Earliest Eligible Virtual Deadline First (EEVDF), which has supplanted the Completely Fair Scheduler (CFS) as of Linux kernel 6.6. I ran into a couple of surprises along the way, so I am also including the other mystery in this puzzle: the Intel Thread Director, which comes into play on Intel hybrid CPU architectures from the 12th Generation processors through the present 14th Gen (Meteor Lake) CPUs. So buckle up, because "we ain't in Kansas anymore".
    AI Image: In the spirit of the Evil Queen from Snow White, who asked one single question every day...
    Chapters
    00:00 - Start
    00:18 - Why change from CFS?
    02:12 - Sounds Great so what's the problem?
    04:37 - Problem 2
    05:27 - Solution
    08:37 - Hybrid CPUs
    09:25 - Intel Thread Director
    10:47 - Intel's CPU Classification
    12:54 - Intel Thread Director
    14:01 - Use Case: Idle Load Balancing
    15:24 - Use Case: Partially Idle Load Balancing
    16:18 - Use Case: Live Exchange
    17:02 - Use Case: Overutilized and Unbalanced CPUs
    18:19 - Use Case: Overloaded and Balanced CPUs
    19:10 - Performance Governor Isn't the Best
    26:45 - Unix Bench
    28:29 - Final Thoughts
    Support me on Patreon: / djware
    Follow me:
    Twitter @djware55
    Facebook: / don.ware.7758
    Gitlab: gitlab.com/djware27
  • Science & Technology
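For anyone skimming, the core EEVDF idea discussed in the video can be sketched in a few lines. This is a toy model, not the kernel implementation (the kernel tracks each task's lag incrementally rather than averaging over a list, and the task names, weights, and numbers here are purely illustrative): a task is eligible when its virtual runtime is at or behind the weighted-average virtual runtime, and among eligible tasks the one with the earliest virtual deadline runs next.

```python
# Toy model of EEVDF's selection rule (illustrative, NOT the kernel code).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    weight: int         # CPU share, like a nice-level weight
    vruntime: float     # weighted CPU time consumed so far
    timeslice: float    # requested slice; a shorter slice -> earlier deadline

    @property
    def vdeadline(self) -> float:
        # virtual deadline = vruntime plus the slice scaled down by weight
        return self.vruntime + self.timeslice / self.weight

def pick_next(tasks):
    total = sum(t.weight for t in tasks)
    # the weighted-average vruntime approximates the ideal fair clock
    avg = sum(t.vruntime * t.weight for t in tasks) / total
    # eligible = tasks that have received no more than their fair share
    eligible = [t for t in tasks if t.vruntime <= avg]
    # among eligible tasks, run the one with the earliest virtual deadline
    return min(eligible, key=lambda t: t.vdeadline)

tasks = [
    Task("interactive", weight=2, vruntime=4.0, timeslice=1.0),
    Task("batch",       weight=1, vruntime=5.0, timeslice=4.0),
    Task("background",  weight=1, vruntime=6.0, timeslice=4.0),
]
print(pick_next(tasks).name)  # -> interactive
```

Note how the short timeslice gives the interactive task an early virtual deadline, which is how EEVDF improves latency without abandoning CFS-style fairness.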

COMMENTS • 36

  • @CyberGizmo
    @CyberGizmo  3 months ago +14

    You may think this is Intel heavy, and it is. I have reached out to AMD to see if they will provide additional information about their forthcoming processors, so I will be revisiting this topic, hopefully discussing their approach.

    • @andersjjensen
      @andersjjensen 3 months ago +1

      I surely hope they'll sample you some hardware, but your channel is pretty small (and AMD doesn't have the biggest marketing budget on the planet), so if they don't, you could start a fundraiser? I'd gladly pitch in $20 for an interesting architecture comparison.

    • @CyberGizmo
      @CyberGizmo  3 months ago

      @@andersjjensen That would be great, but right now I would settle for some documents. But yeah, having hardware would be good... and thanks for the offer.

  • @andersjjensen
    @andersjjensen 3 months ago +5

    I find it "interesting" that pinning the P cores causes so much heat that the average clock speed across the entire CPU bottoms out, which makes it more sustainable to rotate priority tasks to the E cores once in a while.
    I would be very interested in seeing a similar analysis when AMD's "full and dense" hybrids start hitting the market. Since those are ISA and IPC identical, and both have SMT, that's going to be an interesting comparison.

  • @1234567zeek
    @1234567zeek 3 months ago +2

    I can't express how much I appreciate the extensive work you must have put into this video. Thank you!

    • @CyberGizmo
      @CyberGizmo  3 months ago +1

      all in a day's work (err, ok, a couple of weeks of work), but I wanted to know the answer as much as I am sure you did, so it's all good.

  • @chalybesmith
    @chalybesmith 3 months ago

    Another great video. Informative as always!

  • @suscactus420
    @suscactus420 3 months ago +1

    great explanations and data, thank you!

  • @sheevys
    @sheevys 3 months ago

    Thank you for this, it's super interesting stuff.

  • @EricLikness
    @EricLikness 3 months ago +1

    Thanks for that power profile benchmark. And while I don't have a 13th Gen Intel, I do have, I think, a 12th Gen, so Balanced is going to be my rule of thumb for desktop/laptop, and in the future if I get a 13th Gen Intel laptop, possibly Power-save will be the rule of thumb for how I'm using that style of computer. Rules are changing with hybrid, E-core, P-core, and scheduling, for sure, as you say.

  • @TheKirurgs
    @TheKirurgs 3 months ago +2

    There are the BMQ and PDS schedulers from Alfred Chen. It might be a good idea to test them too.

    • @CyberGizmo
      @CyberGizmo  3 months ago

      I will probably look at those; I did show them on one of the slides.

  • @MrYossarianuk
    @MrYossarianuk 3 months ago

    Really interesting, thanks .
    I remember the Brain f*ck scheduler ...

  • @Chris-op7yt
    @Chris-op7yt 3 months ago +2

    a good scheduler would know exactly the workload and resource use of all processes, and make the right decisions to minimize resource contention, reduce swapping to disk, and overall maximize the amount of work getting done.
    just being an ignorant set of traffic lights that switches the lights periodically without adjusting causes a traffic jam.

    • @CyberGizmo
      @CyberGizmo  3 months ago +2

      Agreed, back in the old days we had to walk uphill both ways in a blinding snowstorm to figure out what the CPUs and task workloads were doing, but now with all the extra transistors it should be a thing all the time.

    • @andersjjensen
      @andersjjensen 3 months ago +1

      Ideally yes, but unfortunately that, mathematically, is like solving the Traveling Salesman Problem in real time... but the map changes all the time, and the information about road repairs changes all the time... and with P/E core designs you now also have to take gas prices into consideration.

    • @Chris-op7yt
      @Chris-op7yt 3 months ago

      @@CyberGizmo: maybe processes could even provide this information themselves in some agreed format. Are you mostly single-threaded? Filesystem heavy? Network heavy? Batch, or interactive, or spurts of activity based on triggers? It won't be perfect, but it should provide better use of resources and performance than mostly blind time slicing... especially if there are a few sleeping cores that could knock over a whole job in 10 minutes with time slicing removed for that.
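Linux already has a coarse version of this "self-describing workload" idea: a process can tell the scheduler it is non-interactive batch work by switching its scheduling policy. A minimal sketch using Python's stdlib wrapper (Linux-only; this is just a policy hint, not the rich per-process format the comment imagines):

```python
import os

def hint_batch_workload() -> int:
    """Mark the calling process as non-interactive batch work.

    SCHED_BATCH tells the Linux scheduler to treat the task as
    CPU-bound, so it is mildly disfavoured at wakeup compared to
    interactive SCHED_OTHER tasks. Changing to SCHED_BATCH does
    not require privileges.
    """
    os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
    return os.sched_getscheduler(0)  # read the policy back

policy = hint_batch_workload()
print(policy == os.SCHED_BATCH)
```

The same interface accepts SCHED_IDLE for lowest-priority background work; richer hints (deadline, period) exist via SCHED_DEADLINE, but that one requires privileges.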

  • @Redfrog1011
    @Redfrog1011 3 months ago

    Great vid

  • @lenwhatever4187
    @lenwhatever4187 3 months ago +1

    So we are talking about two things: the scheduler and an Intel governor. The powersave governor has generally been pretty good, but I have still been using performance (with boost turned off), not because it is faster (it's not, really) but because it has a constant speed, which seems to make a difference for low-latency audio (< 6ms latency). I have found that ALSA, or maybe JACK, or perhaps one of the libs they rely on, does not do well at the point the speed drops. Performance is a bad label; it should be staticspeed or something like that. With performance running all cores at rated speed, boost is unable to ramp up as far because of chip temperature. My old 4-core i5 was only able to boost one step up while in performance, but all the way to maximum boost on powersave. The current chips may be worse with more cores running hotter... overheating may keep the cores from even reaching rated speed with performance.
    I suspect that the CPU is not at fault but rather that some timer (perhaps a sw timer?) is not getting reset to deal with the slower CPU speed, and so there is a missed schedule or an audible xrun. The lower speed itself is not the problem, as I can manually set the CPU speed to a steady 800 MHz and run low-latency audio all day (with poorer performance for the whole machine, of course) and no xruns, even with 1.7ms audio latency. This is running the lowlatency kernel, which is pseudo real-time; the OS does get a chance to run at least once in a while even if the realtime code locks up. It will be interesting to see how well this new scheduler does with this kind of workload.
    In general, recording audio does not need low latency; 30ms is fine. However, for live use such as softsynths or effects (guitar effects, for example), or for people who want to monitor their input after plugins, low latency is a must. The whole latency must be below 10ms at least: .65ms for the ADC, plus 1.7ms of buffer, plus effect time (generally one graph run) of 1.7ms, plus the DAC delay of another .65ms, is already half that, and doubling the buffer (for most modern USB devices) means almost 10ms already. As a musician playing bass, I have noted that if I use a really long cord and walk away from the stage while playing, I am horribly out of time by the time I get even 30 feet from the stage... not digital delay, just the time for sound to propagate. All that to say, my finding is that performance itself does not make so much of a difference as the clock-speed switching, and I suspect that this is a sw bug, probably in some C++ lib. Very hard to reproduce or troubleshoot.
    Really, I would be quite happy to run the audio thread(s) on E-cores if the core was speed stable.
    AMD is another question; last I heard, Linux was still using ondemand for AMD chips. Ondemand has been known to use more power than performance when a core is idle because it has to wake up all the time to check CPU load. I had heard that AMD was working on an internal governor similar to Intel powersave, or rather a kernel interface to the governor that is already in the AMD chips, but I have not heard of any completion of such a thing (I have not been monitoring it, so things may well have changed).
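For readers wanting to check which governor their own machine is on: the active governor is exposed through cpufreq's sysfs files. A small sketch, assuming the standard Linux sysfs layout; it returns None on machines, VMs, or containers where cpufreq is not exposed:

```python
from pathlib import Path

# Standard Linux cpufreq sysfs location (per-CPU).
CPUFREQ = "/sys/devices/system/cpu/cpu{n}/cpufreq"

def current_governor(cpu: int = 0):
    """Return the active scaling governor for one CPU ('powersave',
    'performance', 'ondemand', ...), or None if cpufreq is not
    exposed on this system (common in VMs and containers)."""
    path = Path(CPUFREQ.format(n=cpu)) / "scaling_governor"
    try:
        return path.read_text().strip()
    except OSError:
        return None

print(current_governor())
```

Writing a new governor name to the same file (as root) switches the policy, which is what tools like cpupower do under the hood.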

    • @CyberGizmo
      @CyberGizmo  3 months ago

      What kernel version are you running? I played around some with my old AMD, a Zen 3 series, but I didn't notice a lot, except I was reminded why I hate NVidia so much on Linux.

    • @lenwhatever4187
      @lenwhatever4187 3 months ago

      @@CyberGizmo I am not sure where things were at when I was looking things up for someone else but AMD has since come out with the amd-pstate module that works with ondemand and schedutil in a more complementary way. I personally only have Intel processors because I don't need video performance beyond what the on chip GPU can provide and the modules are open so they will work in any kernel version. Not a fan of nvidia either.

  • @bob_mosavo
    @bob_mosavo 3 months ago +1

    Thanks 👍

  • @necuz
    @necuz 3 months ago

    My primary heavy workload is gaming, so in hopes of the low-latency promises being delivered on, I'm using a kernel with the BORE scheduler. I don't have any data, but anecdotally it's working well at least: fewer audio issues and fewer swings in frame render times when gaming compared to what I'm used to. I'd be interested to see a comparison to EEVDF.

    • @CyberGizmo
      @CyberGizmo  3 months ago +2

      BORE is supposed to be pretty good. I haven't tried it, but I will put it on the list; maybe do a massive processor scheduler bake-off video.

  • @user-mr3mf8lo7y
    @user-mr3mf8lo7y 3 months ago

    Would love to hear your thoughts about multi-core usage/management in Linux and BSD, and in general, in terms of performance vs electricity consumption. Always wondered whether more than 2 cores are really worth it for the average user or not. Thanks.

  • @mschmaUT
    @mschmaUT 3 months ago

    I'd like to see the difference between Liquorix, Xanmod and the newest Linux kernel.

  • @DudsMotorShop
    @DudsMotorShop 3 months ago

    I have been teasing Unisys via social media about controlling AMD CPUs as directly as possible with an updated version of MCP/ClearPath...

    • @CyberGizmo
      @CyberGizmo  3 months ago +1

      haha imagine MCP running on AMD, now that would be an OS 🤣

    • @DudsMotorShop
      @DudsMotorShop 3 months ago

      @@CyberGizmo I know, that would be awesome, as I understand the AMD chip architecture is not that much different from the Unisys A Series...

  • @32bits-of-a-bus59
    @32bits-of-a-bus59 3 months ago

    Isn't the performance governor meant to be more about latency than throughput?

    • @CyberGizmo
      @CyberGizmo  3 months ago

      Wow, it's been a very long time since we've measured processor performance in throughput; today we use elapsed time, so latency is a bigger factor in measuring performance. But yeah, years ago we did measure throughput, but not anymore.

    • @32bits-of-a-bus59
      @32bits-of-a-bus59 3 months ago

      @@CyberGizmo Sorry for the confusion. I didn't mean the latency and throughput of CPU instructions -- that's too fine-grained. I meant the latency of a thread as the time it takes for a thread that was blocked on I/O, and now can run again, to get onto the CPU. And throughput simply as the work done during a fixed unit of time. I suspect that lower latency (i.e. better I/O performance) comes at the price of fairness.

  • @JohnnieWalkerGreen
    @JohnnieWalkerGreen 3 months ago

    Make Algorithm Great Again!