Software Drag Racing: M1 vs ThreadRipper vs Pi

  • Published 26 Dec 2024

COMMENTS • 707

  • @anonymousperson8998
    @anonymousperson8998 3 years ago +473

    This channel makes me happy.

    • @TheLeketin
      @TheLeketin 3 years ago +12

      I am not even a programmer but somehow listening to Dave is interesting and calming at the same time

    • @scbtripwire
      @scbtripwire 3 years ago +3

      Totally! 🥰

    • @thebrainfan
      @thebrainfan 3 years ago +2

      Me too

    • @DoozyBytes
      @DoozyBytes 3 years ago +2

      He made me appreciate windows .... that says alooooot

    • @submissivepeanutbutter4030
      @submissivepeanutbutter4030 3 years ago +5

      It's so calming, entertaining, educational and just plain fun.

  • @chrisb4009
    @chrisb4009 3 years ago +291

    Thanks for giving your time freely to play with this sort of stuff.
    YouTube is an amazing medium for us mortals to engage with interesting people like yourself.
    Keep up the great work 👍

  • @starskiiguy1489
    @starskiiguy1489 2 years ago +5

    Thanks Dave I am a Software Engineer, just graduated from college and am starting out. I love your content. I once had a professor who said "Programming is wizardry, and programmers are wizards." Someday I hope to be as great a wizard as you buddy.

    • @nakfan
      @nakfan 1 year ago

      All the best 👍 Per (DK)

  • @doncapo732
    @doncapo732 3 years ago +30

    Mellow piano music, sparkly lights.. new Dave's Garage episode! ... It feels like Christmas! Dave thank you so much.. as always, top notch content.

  • @Geenimetsuri
    @Geenimetsuri 3 years ago +7

    This is bloody brilliant.
    Also, the fact that Nano was used as the editor made my day. Kudos to you sir!

  • @burnte
    @burnte 3 years ago

    I LOVE the fact you talk at a nice, normal pace. There are some channels I watch at 1.5x speed just to get them to talk at a normal pace.

  • @davidtipton514
    @davidtipton514 2 years ago +1

    I just ran across your channel a week ago, and I'm really enjoying hearing your take on different programming issues! I used to work out the details of an algorithm using whatever scripting language was available on the platform, and once I had a solid plan, I would go back and rewrite it using C or FORTRAN or whatever else. This proved an effective way to cook up some great code that could do the job. Thanks for all of the great comments during your videos!

  • @mahdinejad
    @mahdinejad 3 years ago +40

    That's the comparison that we needed but didn't know it!

  • @thatcreole9913
    @thatcreole9913 3 years ago +18

    This has quickly become my favorite channel.

  • @brendolini
    @brendolini 3 years ago +8

    This guy is what YouTube should be

  • @KyleHarrisonRedacted
    @KyleHarrisonRedacted 3 years ago +7

    I'm a simple man. I see Dave drop a video, I watch it. It's really not complicated. You're a legend, dude 👏

  • @redhawkrobin
    @redhawkrobin 3 years ago +206

    Dave, you talk at the perfect speed. For once I don't have to speed up the video I'm watching 🤣🤣

    • @DavesGarage
      @DavesGarage 3 years ago +55

      That's funny ;-). Yup, I default to 1.25X I think!

    • @stephanc7192
      @stephanc7192 3 years ago +4

      Agreed

    • @thomassmith4999
      @thomassmith4999 3 years ago +3

      Zoomers

    • @redhawkrobin
      @redhawkrobin 3 years ago

      Hahah :D

    • @fghsgh
      @fghsgh 3 years ago +5

      I watch these at 2x speed. But then again, I watch most others at 3x.

  • @AllAmericanBeaner68
    @AllAmericanBeaner68 1 year ago +1

    As a car/drag racing enthusiast and hardware engineer learning to code this was an excellent episode. Just subbed!

  • @seanfaherty
    @seanfaherty 3 years ago

    I don't usually notice background music without hating it but I think you found the right balance of musical complexity and intrusiveness

  • @TrevorDBrown
    @TrevorDBrown 3 years ago +76

    Great video, as always! Maybe another metric to consider: price per pass? :)
    For example: the Pi 3B+, $35/305 ~ $0.11/pass

    • @donaldklopper
      @donaldklopper 3 years ago +30

      And Watts consumed per pass ;-)

    • @VndNvwYvvSvv
      @VndNvwYvvSvv 1 year ago +2

      @donaldklopper Yeah, outlay is usually nothing compared to power in industry. Outlay is usually only an issue for homes and small businesses that let equipment sit idle 99.99% of the time, even while "working".

  • @itsHanibee
    @itsHanibee 3 years ago +37

    Oh damn this is gonna get wild

  • @remicaron3191
    @remicaron3191 3 years ago

    I actually like the speed you talk at. Yours are the only videos I can watch at regular speed, instead of 2x like most others and 1.5x for everything else.

  • @Ranoiaetep
    @Ranoiaetep 3 years ago +87

    So I changed the vector to std::array, and got ~13000 passes on my M1 Air. FYI, it was ~4500 passes with vector.

    • @carlosmspk
      @carlosmspk 3 years ago +8

      That would probably make it faster on the other implementations too, since it becomes static memory.

  • @mudstrand
    @mudstrand 3 years ago +5

    This was unexpected. I ran the CPP code on a WSL 2 terminal running Ubuntu. The CPU on the box is an AMD Ryzen 3800X running at stock speeds. And still, it outpaced the Threadripper. The first run turned in a score of 9622!
    Passes: 9622, Time: 5.000000, Avg: 0.000520, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

  • @WeltInScherben
    @WeltInScherben 3 years ago

    Hi Dave,
    thanks for producing this channel! Very enjoyable!
    I ran PrimeCPP on my 5950X in WSL2:
    Passes: 11267, Time: 5.000000, Avg: 0.000444, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
    Passes: 11327, Time: 5.000000, Avg: 0.000441, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
    Passes: 11346, Time: 5.000000, Avg: 0.000441, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

    • @DavesGarage
      @DavesGarage 3 years ago +2

      Cool! I've seen a 12000 as well from another viewer, but I think he was overclocked!

    • @hrayz
      @hrayz 3 years ago

      @DavesGarage, User_Overclocked_Error - Only Machines Should Be Overclocked (0xB00B1377)

  • @betaraddish1351
    @betaraddish1351 3 years ago +3

    I love that we are mathing it up on different systems.

  • @soniclab-cnc
    @soniclab-cnc 3 years ago +7

    The M1 is still very impressive for a very new product in its first life cycle. Also, factoring in the power consumption makes it look even more impressive.

    • @michaelhenecke
      @michaelhenecke 3 years ago +1

      Cost also makes it impressive for its performance: you could get almost 3 Mac minis for the cost of just the Threadripper chip.

    • @jan-lukas
      @jan-lukas 3 years ago

      @michaelhenecke The Threadripper is a server chip; no person needs that many cores.

    • @gungrave10
      @gungrave10 3 years ago

      @jan-lukas Yep, and we can get a decent gaming laptop for the price of a Mac.

  • @donaldklopper
    @donaldklopper 3 years ago +3

    It'll be interesting to plot the same chart but divide by Watts used by the CPU.... Surprising results...
    And you mentioned Turbo Pascal! I like you.

  • @FufuFang
    @FufuFang 3 years ago +6

    Thanks for the quality content. This is both entertaining and educational.

  • @aafulkerson
    @aafulkerson 3 years ago +1

    Would be cool to see an optimized version of a wasm and Node benchmark in addition to the vector optimizations you made to the CPP benchmark!

  • @sashenko
    @sashenko 3 years ago +17

    The showdown of the decade

  • @I_SEE_RED
    @I_SEE_RED 3 years ago +4

    Thanks for making these, as a constantly learning programmer these are invaluable.

  • @fractal_lynn
    @fractal_lynn 3 years ago +64

    I wrote a multithreaded solution to prime number generation in C++ a few months ago; it's actually not too hard to implement. It would be interesting to see how much the Threadripper outpaces the M1 when you use all the cores lmao, and it would perhaps be a good next step up from this.

    • @tommcintosh4705
      @tommcintosh4705 3 years ago +8

      Single thread performance is still super important. So much software is single threaded.

    • @Zshazz
      @Zshazz 3 years ago +24

      @tommcintosh4705 Sure, it's important, but it's not more important than multithreaded performance. Things that tend to take a long time (e.g. compilation, 3D rendering, encoding video files, etc.) also tend to benefit from multiple threads, plus with more threads you can run more software concurrently (e.g. even if most software _was_ single threaded, being able to run more of it simultaneously could be a huge benefit).
      Also, all current implementations of x86 have SMT: an optimization around the weakness that it has in purely single-threaded workloads by allowing a single core to do a bit more than one thread's worth of tasks at once (essentially, a lot of the core's resources are left idle by its design, and that idle portion can be used to execute another thread at the same time). The M1 specifically has a relatively large advantage in that _one_ aspect, but essentially you're handicapping x86 by not letting it use its benefits as well.
      Based on that, it's pretty misleading to show off single-threaded performance and act as if it's _that_ important of a metric.
      Edit: to be clear, I'm not saying Dave is being misleading here, but that Apple's sudden surge of "hey, check out the single-threaded performance of our M1 part and see how powerful it is, also do benchmarks with single threads plz thx bye" is misleading, and it has worked: many people are suddenly trying to come up with super synthetic benchmarks that show off this weakness of x86 and push it as a huge problem, when it is typically _not_ that huge of a deal in practical usage.

    • @fractal_lynn
      @fractal_lynn 3 years ago +1

      @tommcintosh4705 Well yeah

    • @nephatrine
      @nephatrine 3 years ago +7

      @tommcintosh4705 I find that very little software is still single-threaded nowadays. Even games which are often very intensive on a particular single thread are usually multithreaded.

    • @guiorgy
      @guiorgy 3 years ago +5

      @nephatrine Yup, no matter how much optimization you do on single-threaded code, it'll be hard to beat just spawning a crap ton of threads, even with bad optimization (if you can, that is).
      I recently had .NET code run on a single thread for almost 50 minutes (and that was optimized), but running it on 12 threads got it below 5 minutes. Try doing that on 1 core, I dare you. (Also, later I got it running on my GPU using OpenCL, which ran the same task in under 10 seconds XD)

  • @JonathanKingstonFear
    @JonathanKingstonFear 3 years ago +142

    - Tell me you're a Windows developer without saying "I'm a Windows developer"
    - OK.exe

  • @IamusTheFox
    @IamusTheFox 3 years ago +2

    I really appreciate you and your channel. This is a great example of a proper benchmark

  • @TheZonga
    @TheZonga 3 years ago +2

    I'm aspiring to take my interest in tech further, and this channel is a reason for that!

  • @AndreDeLimburger
    @AndreDeLimburger 3 years ago +1

    Thanks for this episode. Looking forward to seeing how different compilers perform.

  • @RodrigoBadin
    @RodrigoBadin 3 years ago +111

    Congrats, you are the first youtuber who convinced me to click on the Like button upfront.

  • @dustingibson4087
    @dustingibson4087 3 years ago +10

    I run a Ryzen 1600 (14 nm version no OC 3.2-3.6 GHz clock speeds). And I got this result with g++ -Ofast
    Passes: 8427, Time: 5.000000, Avg: 0.000593, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
    I would expect it to be a lot lower.

    •  3 years ago +1

      Got a similar result on my AMD Ryzen 7 4800H with Radeon Graphics, no OC in a Laptop.
      Passes: 9840, Time: 5.000000, Avg: 0.000508, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

    • @userPrehistoricman
      @userPrehistoricman 3 years ago

      I got 8200 passes on Ryzen 3600X but compiled with MSVC. WTF?

    • @AntonyShen
      @AntonyShen 3 years ago

      Using clang in Ubuntu 21.04, my Ryzen 4750GE w/o overclock:
      Passes: 10777, Time: 5.000000, Avg: 0.000464, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

    • @Ughmahedhurtz
      @Ughmahedhurtz 3 years ago

      I was wondering what was going on; glad to see I'm not alone. 3900X @ 4.2GHz all-core OC -> Windows 10 -> VirtualBox VM running Mint 20.1 = 9384 passes.

    • @ollydecay7851
      @ollydecay7851 3 years ago +1

      Running PrimeCPP on my iMac with a 10700K CPU results in:
      Passes: 8607, Time: 5.000000, Avg: 0.000581, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

  • @johnterpack3940
    @johnterpack3940 3 years ago

    You have no idea how apt the drag racing analogy is. I've been working on my own cars for more than 40 years. I know my way around an engine. But the idea of tearing down and rebuilding a 10,000 HP engine in 45 minutes is basically sci-fi to me. Similarly, I've been playing with computers since my folks bought us an Apple IIe back in the mid '80s. But what you do here is basically voodoo. Sure, I understand the concepts. It's the depth and breadth of the minutiae that impresses me. Fun stuff.

  • @Timooooooooooooooo
    @Timooooooooooooooo 3 years ago +1

    Great work once again, Dave!
    The subtitles are helpful, especially because I watch at 2x speed. There were a couple places where they were missing. I remember one when you were talking about BTR in the beginning, and one when you were talking about the bugs found in your code.
    Edit: and the entire Python apologetics chapter

  • @chriskingston4270
    @chriskingston4270 3 years ago

    99.2 K Subs as I type this! You found your groove and your channel is growing nicely! I remember (as it was not so long ago) joining when your sub count measured in the hundreds. I do hope that you will continue to feature automotive content and tech projects as well. Well done, Dave!

  • @MikeKoss
    @MikeKoss 3 years ago +47

    Sorry about your stroke, Dave. Rapid recovery! 😁

  • @kbates666
    @kbates666 3 years ago

    You and CuriousMarc are my favorite YouTubers right now

  • @asitisrequiredasitisrequir3411
    @asitisrequiredasitisrequir3411 3 years ago +4

    The Threadrippers and Zen 2 in general are such beasts, man.

  • @hbm293
    @hbm293 3 years ago +1

    10:23 Nice of you to have mentioned the std::vector<bool> thing, which was discussed in some comments on the previous video.
    It would be interesting to see whether its template specialization in your STL implementation was done actually with bitfields (and if so, what are the differences compared to your bitfield manipulation), or using actual 1-byte bools (that would be then byte-aligned)...

  • @ryshask
    @ryshask 1 year ago

    becoming one of my favorite channels.

  • @bpomowe224
    @bpomowe224 2 years ago

    Dave, I can't program anything more advanced than a PLC, but whenever a page with your videos loads, I hit the thumbs up regardless, as you always increase my understanding of the stuff I have no knowledge in. Thank you!

    • @stonent
      @stonent 1 year ago

      Code a prime calculator in ladder logic ;)

    • @bpomowe224
      @bpomowe224 1 year ago

      @stonent I do most of the stuff in FB, but point taken lol

  • @Anna-ff2hn
    @Anna-ff2hn 3 years ago +1

    I smashed the thumbs-up button. I couldn't argue with your logic.

    • @DavesGarage
      @DavesGarage 3 years ago +1

      You smashed it? Do I sound like Peter McKinnon? You can just lightly click it. But I thank you nonetheless!

  • @MrCOPYPASTE
    @MrCOPYPASTE 3 years ago

    Mr. Dave, you're one of the best content creators that I have had the pleasure to find on YouTube

  • @SManey81
    @SManey81 3 years ago +1

    Dave, I am really enjoying your videos! I am currently studying Computer Science in school and hope to pursue a career in programming and your videos are inspiring me to continue my pursuits!

  • @JaseTheAussie
    @JaseTheAussie 3 years ago

    Really entertaining - the right balance of tech with humor I enjoy - and always stay for the outtakes - Thanks Dave

  • @AaronHulse1956
    @AaronHulse1956 3 years ago +1

    Dude, well done.

  • @mrt1r
    @mrt1r 3 years ago +5

    I love that your terminal window is blue with light grey text.

  • @wizkid723
    @wizkid723 3 years ago +3

    Nice information, glad you brought up that Python isn't the answer to all code. Lately, with all the do-it-in-Python rants in a lot of the developer areas, it's nice to hear: use the language that makes sense for the task at hand. Thanks again!

    • @vipertact
      @vipertact 3 years ago

      Some coders want everything available in the language they already know. That's how we got the do it all in Python crowd and do it all in JavaScript crowd as well.

    • @MacPhantom
      @MacPhantom 3 years ago

      I do heaps of programming with deep learning, sometimes Web server logic, etc. A lot also includes prototyping, so my calculations of "speed" always include how long I need to code.
      Sure, had I written my code in pure C/C++/etc., it probably would have been 100 times faster than it is now. But I need to get stuff done instead of obsessing on how low-level I can get. Had I done that, I would probably have finished 10% of my work shortly before retirement in a couple of decades.
      It's perfectly sensible that there's languages on so many levels (no pun intended). No point on starting a war over _that_, too.
      Except for R. This just sucks. ;)

  • @urbaniv
    @urbaniv 3 years ago +1

    Juhu don't know why a Video like that makes me that Happy

  • @DadofScience
    @DadofScience 1 year ago +1

    I'm really getting a lot out of your content, Dave. Many thanks.

  • @MichelHermier
    @MichelHermier 3 years ago +73

    Hi, would be nice if the github url was mentioned in the description. Otherwise nice episode.

    • @jmr
      @jmr 3 years ago +16

      Looks like he fixed that.

  • @jaybinks
    @jaybinks 3 years ago

    Thanks @DavePL, there goes a few hours on my long weekend playing with this :) Great content BTW now one of my favourite channels.

    • @jaybinks
      @jaybinks 3 years ago

      I was about to go and write GoLang, PHP, Pascal implementations, then I saw all the existing implementations and now I'm not sure it's worth just being another "me too" :)

    • @jaybinks
      @jaybinks 3 years ago

      Interestingly the CPP versions of this achieve 4820 on my super old i7-870. FYI I achieved 8221 on my i9-9900K

  • @AndrewColbeck
    @AndrewColbeck 3 years ago

    Dave, I love the content and the upvote is worth it just because you bothered to make chapter markers in this video!

  • @Bob3519
    @Bob3519 3 years ago +3

    Dang! That's just peachy, a (former) Microsoft employee has forced me to upgrade once again. I just upgraded to a subscriber.😁 Thank you for the great content.

  • @fellipec
    @fellipec 3 years ago

    For giggles I ran Dave's code on my computers here. The Windows boxes (Ryzen 5 and Intel i5) ran g++ in Debian under WSL2; the other machines ran Debian or Raspberry Pi OS on bare metal. To be honest, I'm very impressed with the Ryzen.
    AMD Ryzen 5 3600X 6-Core Processor => Passes: 9605
    AMD Athlon(tm) II X3 460 Processor => Passes: 3642
    Intel(R) Core(TM) i3-4005U CPU => Passes: 2551
    Intel(R) Atom(TM) CPU N270 => Passes: 911
    Intel(R) Atom(TM) CPU N450 => Passes: 871
    Raspberry Pi 3 => Passes: 764

  • @An.Individual
    @An.Individual 3 years ago +28

    This will be of no interest to anyone but a Pi 1 Model B (from 2012) achieves a score of 97

    • @leftysmalls
      @leftysmalls 3 years ago +2

      It makes me happy to know! Thank you for sharing!

  • @williamlidberg737
    @williamlidberg737 3 years ago +3

    This is so detailed and nerdy. I love it!

  • @bsmith2053
    @bsmith2053 3 years ago

    I assume if you went around saying "You had the best tool", you may in fact be THAT tool. Great info and done with a sense of humour, logic and pragmatism that seems to be a rarity these days. Keep it up.

  • @Soundy777
    @Soundy777 3 years ago +10

    The bloopers got me! Whole ep of gag reel please lololololol

  • @marcuskobel6562
    @marcuskobel6562 3 years ago +1

    Great channel Dave, lots of great info. Hope you can help folks porting Windows to the raspberry with your knowledge.

  • @SapphireTvYt
    @SapphireTvYt 3 years ago

    I gave that feedback about talking speed, and he kept that in mind 😀. Hats off sir.

    • @DavesGarage
      @DavesGarage 3 years ago

      Glad you liked it! I'm always paying attention and trying :-)

  • @kurisu-tina
    @kurisu-tina 3 years ago

    The video on compiler performance should be interesting. I'm getting 9000-10000 on my 3600 with gcc and clang (clang a little bit faster), while it shouldn't be that much faster than the 3970X in single core.
    One interesting thing is that replacing vector with a pretty simple bitset gives a pretty good speedup in clang, while not in gcc.
    Nice video Dave

    • @excitedbox5705
      @excitedbox5705 3 years ago

      I think due to thermals the threadripper runs slower per core while having more of them. So the higher boost clock gives you an advantage.

  • @LanceMcCarthy
    @LanceMcCarthy 3 years ago +14

    Even though I am currently swinging in a hammock, in front of a volcano in Costa Rica, I could not miss a Dave's Garage premiere.

    • @johnvonhorn2942
      @johnvonhorn2942 3 years ago

      Living the dream!

    • @SCP-POOL
      @SCP-POOL 3 years ago

      I may be joining you, Liberal Lunatic Free Zone...

  • @srideepprasad
    @srideepprasad 3 years ago

    The .exe extension at 6:36 does reveal your Windows roots..
    Well presented and articulated though, as always.
    Great job!

  • @aceyriot
    @aceyriot 3 years ago

    I've watched so many of your videos that I was amused that I was not already subbed. Well I fixed that bug. Speaking of bugs, could you do a video about all the rare bugs you know about? Always found that fun.

  • @dalewyatt8507
    @dalewyatt8507 3 years ago +1

    Dave you rock! I love your channel!!

  • @painsme2
    @painsme2 3 years ago

    Enjoy the channel. Good stories and random bloopers. Cheers!

  • @_matte
    @_matte 3 years ago

    Got ~10k on an old 6600k and was sort of surprised, but in the end it makes sense as it's a single core workload. Great video.

  • @Thumper68
    @Thumper68 3 years ago +5

    3 haters who don’t have any clue what he’s talking about. I mean I know what he’s talking about but don’t know how to do it... but I don’t hate. Thanks for the entertaining content!

  • @Mufozon
    @Mufozon 3 years ago

    1:16 Hell yes! Thumbs up and subscribed right away. You manage time very well in all the videos I have seen so far.

  • @nasnema
    @nasnema 3 years ago

    I thought I had a stroke when I saw Cascade working on my shared control system in 1988, maybe 90. It was so funny it deserved to get shared.

  • @chrisp6015
    @chrisp6015 3 years ago +23

    I would love to see a drag race between C++ and Rust!

  • @aperture147
    @aperture147 1 year ago

    Appreciate your effort to include subtitles in an informative video like this. You talk like a C program running on the newest CPU, while my brain is a Pentium 3 running Java that is constantly overheating.

  • @simonfarre4907
    @simonfarre4907 3 years ago +17

    At 10:03, your testing of index % 2 == 0 versus index & 1 == 0 only makes a difference if you are running in debug mode, not in release mode, as release mode will always compile SomeVariable % 2 == 0 to the more optimized version (i.e. not use modulo explicitly, as it is a very costly operation in relative terms).

    • @nayjames123
      @nayjames123 3 years ago +2

      For the record, gcc and clang won't use modulo explicitly in debug builds if index is unsigned; msvc will. However, if index is signed, msvc and gcc won't use modulo but clang will.

    • @simonfarre4907
      @simonfarre4907 3 years ago

      @pikachulovesketchup666 Of course all compilers do it; my observation was simply about debug vs release builds, and as Nathan showed, that's not the entire story today.

  • @qqii
    @qqii 3 years ago +3

    > When it comes to gaming and other certain workloads that [single core performance ] is the reality of what matter
    Luckily that's been slowly changing since Moore's law has broken down and CPU manufacturers have been adding more cores! There are going to be workloads that can never be parallel, but luckily there's a lot of low-hanging fruit for typical applications to add parallelism.

  • @InformaticageNJP
    @InformaticageNJP 3 years ago +3

    Much love and appreciation from the Italian computer science YouTube community!

  • @geehaf
    @geehaf 3 years ago

    Love this follow-up to the first SW drag race video...and we get bloopers! Great work Dave (and production staff?) :)

    • @DavesGarage
      @DavesGarage 3 years ago +1

      Just me and a couple of shop dogs! Maybe at 200K I can hire a student editor :-)

  • @kungfujesus06
    @kungfujesus06 3 years ago +4

    ARM is a load-store ISA but presumably Apple did something for x86 emulation that allowed it to operate in a register-memory manner. Not sure if that applies to native ARM code or not. ARM definitely has some bit-twiddling instructions; I'd be a little surprised if the compiler is generating shifted bit masks and ANDs for your bit test.
    For the scalar pipeline ARM's 32-bit ISA had predication but it looks like aarch64 dropped that complexity. What you really want to maximize your integer throughput is something that auto-vectorizes (or explicitly vectorize it yourself with the NEON intrinsics). Of course if I remember from your last video, this code has integer division in it, which takes a huge performance hit for all architectures in terms of latency. x86 and ARM both lack vectorized division due to the ridiculously complicated amount of gyrations that have to occur in the ALU for it.
    That having been said, I haven't finished your video yet, I'm only 5 minutes in. I'm curious how this goes.

  • @danielhidefjall5060
    @danielhidefjall5060 3 years ago

    My time feels valued

  • @quincy1048
    @quincy1048 3 years ago +1

    love it, was curious about the M1...don't have one...not in a hurry to get one...but curious where Apple is headed with it. Looking forward to your compiler comparison. Also something I don't get to look at much...in my world it's visual studio...and you live with it. But I know from prior experience that is not the only game out there.

  • @berndeckenfels
    @berndeckenfels 3 years ago +6

    Naming the output .exe is well played ,)

  • @Jimfowler82
    @Jimfowler82 3 years ago +1

    Love your videos Dave all the way from uk

  • @higurashinerd
    @higurashinerd 3 years ago +4

    17k views and over 4k likes.
    Such a stroooong like to view ratio. You're going very strong here, Dave!
    Best of luck!

  • @nickwallette6201
    @nickwallette6201 3 years ago

    Neat. Somehow, _all_ the results were actually impressive.
    The lowly Pi 3 is impressive for how narrow the delta actually is between the cheapest possible self-contained computer and a TOTL desktop CPU.
    The Pi 4 for how much tighter that gap is.
    The M1 for being a brand new product with the slider pegged dead in the middle between "optimized for low power" and "optimized for high performance."
    And, of course, the Threadripper for having the biggest 🥜 of just about any CPU available. haha

  • @vadymzimin5838
    @vadymzimin5838 3 years ago

    Controversial/manipulative test. Changing vector<bool> to a byte-per-element vector puts the M1 on par with the TR, and with a C array it scores 13.5k (a C array improves the result for x86 as well).

    • @DavesGarage
      @DavesGarage 3 years ago

      A byte-per-element vector wastes 7 bits per byte and can't accommodate the larger sieves though!

    • @vadymzimin5838
      @vadymzimin5838 3 years ago

      @DavesGarage True. But the std::vector<bool> implementation appears to be suboptimal for ARM systems. I wonder how it's going to work with manual bitmasks. I would expect ~2x improvement for the rPi.

  • @pauljones9150
    @pauljones9150 3 years ago

    Good stuff. Saskatooner here
    Dunno if Rosetta code already has a prime sieve implementation or not, but this could make a fine addition

  • @MegaManNeo
    @MegaManNeo 3 years ago

    Watching you never gets old.

  • @guyman8282
    @guyman8282 3 years ago +1

    Great video! I'd like to see a CPP vs Rust vs Go showdown

  • @eformance
    @eformance 3 years ago +2

    The highest result I saw with the code from this video is 7929 on a Ryzen 9 3900X with g++, 7877 with clang. With the repo code it was 10191 (g++) and 10684 (clang). In theory the Ryzen 7 5800X should be about the fastest.

    • @thebosss435
      @thebosss435 3 years ago

      got 8908 on my R7 5800x
      but using the standard run.cmd from the github

  • @jeffyp2483
    @jeffyp2483 1 year ago

    just a few years ago (seems that way anyway) single core ipc would be the main thing to look at regarding games, but i have recently started playing pc games again and nearly all of them either use multiple threads/cores or in some cases require them. multithread/core performance is more important to games now.

  • @stevenclark2188
    @stevenclark2188 3 years ago +1

    My guess? Any RISC instruction set is probably not going to perform its best in this workload, which is very load/store intensive. Stuff that chews on a few registers is where ARM/MIPS/RISC-V shine, not striding over a list where memory accesses are a separate instruction.

  • @5FeetUnder__
    @5FeetUnder__ 3 years ago

    I wonder how the raspis would fare when overclocked, would love to see that comparison as well.
    Also, your speech became much easier to understand since the blue screen video, so that's nice for all non-native English speakers (:

  • @simonfarre4907
    @simonfarre4907 3 years ago +3

    I wish you would have just added one more benchmark. It could have been the same problem to solve, only allowed for multi threading. That way, we could also speculate as to what performance degradation we could (maybe/maybe not) see from the M1 doing what it does.

  • @JakePomperada
    @JakePomperada 3 years ago

    Thank you Dave for sharing this video.

  • @mikus4242
    @mikus4242 3 years ago +5

    Closin in on 100k!

  • @THEBIGMAX2020
    @THEBIGMAX2020 3 years ago

    Love the increase in editing quality

    • @DavesGarage
      @DavesGarage 3 years ago

      Thanks! Not sure which ones you mean compared to, but at least it's the right direction!

  • @nathana.7473
    @nathana.7473 3 years ago +59

    It would be interesting to include x64/Rosetta vs. arm64/native on the M1...

    • @blooddude
      @blooddude 3 years ago +2

      I agree the M1 is definitely doing something interesting for x86 emulation, though it appears to be just adding hardware support for strong memory ordering when running code intended for the x86, which given the cache heavy nature of this benchmark probably wouldn’t have much effect.

    • @andrewdunbar828
      @andrewdunbar828 3 years ago +3

      This was the straw that broke the camel's back in favour of me buying an M1 Mac after a decade of netbooks and secondhand business laptops from Japan. The high performance with long battery life and low heat output got me close, but not close enough to fork out the $$$ until I saw even the x86 emulation was sometimes faster than on x86 hardware.

    • @blooddude
      @blooddude 3 years ago +4

      @andrewdunbar828 What makes Dave’s tests here interesting is that the M1 is a laptop CPU... the Threadripper is a desktop CPU. It will be fun to see what Apple do in the desktop space with their ARM implementation!

    • @PeterDBalazs
      @PeterDBalazs 3 years ago

      @blooddude I might be mistaken, but from what I know the ARM-based architectures don't scale that well.

    • @markbrown8097
      @markbrown8097 3 years ago

      @blooddude if anything

  • @rasmusjonsson1348
    @rasmusjonsson1348 3 years ago

    Hello!
    I figured it out, I think.
    The M1's performance cores run at 3200 MHz; the AMD 3970X runs at 3700 MHz.
    If we assume that the code is written (I have not looked at it yet) in such a way that the compiler simply can't optimize it, because each instruction depends on the result of a prior instruction (hope that makes sense), then each CPU scores roughly 2 passes per MHz.
    So the M1 would score 3200*2 = 6400 and the AMD 3970X would score 3700*2 = 7400.
    It lines up quite nicely, but it does not test the difference in architecture in a good way.