It would be nice to have a video commenting on how Rust and Zig (and the respective code used in this challenge) managed to be so much faster than the competition. Very cool stuff!
I was sleeping through the video, so I'm not sure if Dave addressed it, but compile-time code execution kind of seems like cheating. Especially since specific information about primes other than 2 wasn't allowed.
Dave's website has a section with constantly updated results for the calculations. The problem with determining a Bottom Five is that it's much more likely to find the most poorly written code than which language is the slowest.
Oh wow, that is an incredible result for Zig. It would be cool to compare the assembly listing of the Zig compiler vs C to see where it was able to gain such a huge advantage.
The Zig result is slightly less incredible when you see it's using a -no-mt variant that only spawns 16 threads, and the performance metric chosen is passes per second *per thread* on a 16c/32t CPU:
Zig 16t: 805969 passes, 10074 passes/s/thread
Zig 32t: 899141 passes, 5620 passes/s/thread
Rust 32t: 948969 passes, 5930 passes/s/thread
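To make the difference between the two metrics concrete, here is a minimal Python sketch using only the pass counts and thread counts quoted above. The 5-second run length is an assumption (it is the duration mentioned elsewhere in the thread, and it approximately reproduces the quoted per-thread figures); the function and variable names are made up for illustration.

```python
# Pass counts and thread counts as quoted in the comment above.
# RUN_SECONDS is an assumption: a 5-second run roughly reproduces
# the quoted passes/s/thread values.
RUN_SECONDS = 5

results = {
    "Zig 16t": (805969, 16),
    "Zig 32t": (899141, 32),
    "Rust 32t": (948969, 32),
}


def passes_per_second(passes, seconds=RUN_SECONDS):
    """Total throughput of the whole machine."""
    return passes / seconds


def passes_per_second_per_thread(passes, threads, seconds=RUN_SECONDS):
    """The leaderboard-style metric: throughput divided by thread count."""
    return passes / seconds / threads


for name, (passes, threads) in results.items():
    total = passes_per_second(passes)
    per_thread = passes_per_second_per_thread(passes, threads)
    print(f"{name}: {total:.0f} passes/s total, {per_thread:.0f} passes/s/thread")

# Rank the same three runs by each metric.
best_total = max(results, key=lambda n: passes_per_second(results[n][0]))
best_per_thread = max(results, key=lambda n: passes_per_second_per_thread(*results[n]))
print("best by total throughput:", best_total)
print("best per thread:", best_per_thread)
```

Under these assumptions the ranking flips depending on which metric is used: the 32-thread Rust run has the most total passes, while the 16-thread Zig run has the highest per-thread figure.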
It's also way less impressive once you realize that Dave showcased the leaderboard that had way less participation involved. Clearly, the fiercest competition was on the one labelled "Leaderboard", that's why it has over 50 languages represented rather than 10 for the multithreaded one. I honestly think it's really disrespectful to not even mention the single-threaded leaderboard when so much work has clearly gone into those submissions.
The difference in speed between the leader, Zig, and Rust in second place was a big surprise for me. I kindly request a more complete speed test (benchmark) between these two in a future episode. In any case, thanks a lot Dave!
@@hamzanajji8615 Zig still uses the LLVM toolchain for the compiler. The main reason Zig is so much faster, however, is that the compiler scans for code whose output can be determined at compile time and replaces the code with that output. C++ has something similar with constexpr, but IIRC it's not as powerful.
It would be great to compare the generated assembly for each of the languages to see why the performance for each one is the way it is! Also, it would be cool to do some performance profiling to see where the bottlenecks of each implementation are. Great stuff!
For those that compiled to assembly, I agree completely. I wonder if Zig generated well-vectorized code, with precompiled sections. I've worked in HPC for decades. I write in C with intrinsics and assembly. I doubt there is any assembly generated by Zig that I couldn't achieve with C. I guess the only argument is that the Zig code might be easier to read. Which may be a winning point, because some of the code I have to deal with is obtuse in the extreme.
@@tolkienfan1972 There's no magic. It's just that Zig is not using hyperthreading, thus getting the same performance out of 16 threads as Rust and all the other langs get from 32. Points are given for iterations PER THREAD.
@@dolorsitametblue if that's true, then it would be trivial to do the same for the other languages. Given that hasn't been done, it seems unlikely that that's the true reason, and I also suspect better vectorisation
@DavesGarage If this is true, that would invalidate Zig's score as it wouldn't be an apples-to-apples comparison. If the rest of the languages are using 32 threads, and the scores are calculated on a per-thread basis, Zig should also have to use 32 threads with hyperthreading to make it a fair comparison. Either that, or disable hyperthreading in the BIOS and have ALL programs use 16 threads without hyperthreading to make it a fair comparison.
@@noomade yes, it sounds like the ranking was per second per thread and the Zig submission used only 16 threads. The Rust submission used 32 threads. It had more completions per second overall but far fewer per thread. Seems like that could be optimized out.
@@NickWindham Ah, yeah, a smart person would eliminate the second logical core on each physical core, because effectively two hyperthreads can only run at like 55-65% the bandwidth of two proper physical cores.
I'm definitely biased as a Zig fan, but yeah... It's not super legit as a victory, Rust by most comparisons squeaks out a win of +5% or more. I definitely like the composability and readability of Zig better for my brain, but Rust can definitely create the absolute optimal if that is what you need out of algorithms. I'd argue that the other Zig features make up for that little performance difference, but it's super subjective.
I honestly would love a follow up video going over the compiled code for Zig vs the other languages. The gap is so huge there must be something going on, or at least my intuition tells me so.
Yes, a badly chosen metric is the reason why zig basically got a x2 bonus: It uses 16 of 32 threads, so no Hyperthreading, which seems to make only a minor difference in overall throughput, but since Dave uses Pass/s/t, i.e. full runs of sieve per second per thread, you get a skewed result (or I am too stupid to understand why this is legitimate).
@@grafgrantula6100 The zig solution has a lot of thread contention which means that an additional 16 threads don't help. With 16 threads for both rust and zig, zig still wins. It's literally just a comparison of different solutions. That being said, zig is very useful for being able to express very unique solutions without any overhead in how the code is written or in performance. The zig solution is the best one for 16 threads, which was set out to be the target platform. If you say that it performs worse on 32 threads, that doesn't matter, because it wasn't made for that.
@@N00byEdge Sure if you only consider 16 threads then it’s the fastest, but that’s because it’s the only solution that uses 16 threads. The target platform was set out to be a 16c/32t processor so choosing 16 threads as the upper limit of threads is not utilising the hardware to its limit. Passes/second/thread is a bad metric to use for the final ranking.
@@siematos1099 If the intention was to optimise for passes/second/thread then all of the comparisons should have been made with single threaded runs. Compared to the multi threaded results the single threaded ones completely blow all of the multi thread results out of the water when using that metric.
@@Nox3x3 Yup it's disappointing. Most of the good solutions are _only_ on the single-threaded leaderboard, since this was understood to be main leaderboard. Hopefully Dave will do a follow up and highlight those solutions too: people put a lot of work into them.
Testing Prime sieves is a fabulous thing to do, but it seems to me that a better test would use a few different kinds of problems and report results in each category, as well as a composite result, rather than just a single problem. I expect that certain languages would be better at one class of problems than another, so on an individual class of problems, the ranking would, I think be different than on another class of problems. I hope that you will seriously consider creating an updated competition accordingly.
If you had truly been doing 30 years of embedded you would have noticed with a quick glance at the code that these results are cheated. They didn't accept optimized submissions for C, C++, Rust and other languages just for the sake of making Zig win, not to mention the way the threading was programmed in the Zig code was completely different from the way it was implemented in the rest of the languages, meaning you're basically comparing completely different programs at that point. Everyone participated with a rule-instructed handicap, but Zig was on Winstrol during the race.
I'm wondering how large the binaries were and whether a project programmed in Zig would even fit into or run on a bare-bones embedded microcontroller.
Fantastic series and results! Thanks to all who contributed, but especially the hard work of your small team. It would be awesome to take the framework you've built here to implement different algorithms across these languages and see what changes.
These results are definitely surprising to me, I expected all the LLVM languages to have very similar results. I wonder why the top two languages performed so much better, all other things being equal.
@@Nox3x3 Just to make sure I'm not missing something, are the Rust and Zig implementations using multithread vs C and C++ using only single thread? That would be a completely unfair apples to oranges comparison. Like others have commented, the difference shouldn't be this large. Something else must be going on. If this is really legit I would ask that the generated assembly be compared.
@@greatwolf. Looking at the scoreboard it looks like he's reporting the passes per second per thread score, but that seems a little funky. There's something weird going on with the scoreboard for that particular metric because the Zig solution runs with half the cores available while nearly every other solution runs with thread counts equal to core counts (e.g. on a 32 core system the Zig solution would use 16 threads while nearly every other language would use 32 threads). That results in Zig actually having a lower total pass count, but a higher pass per thread count. E.g. in one example report Rust turns in the 1st place result with 946K passes in 5 seconds as opposed to Zig in second place with 838K passes in 5 seconds, but due to the lower total thread count of the Zig solution, when normalized across threads it ends up with 10K per thread as opposed to Rust's 5.9K per thread. As for why there's such a huge performance gap between Rust and Zig and everything else, I'm not entirely sure. It's possible it's just down to not as much effort having been put into tuning the other solutions, or maybe there's some gotcha related to the way the algorithm has been implemented. I personally suspect that there's some kind of optimization that's being done behind the scenes by the compilers that gives Rust and Zig a significant edge (and arguably might be violating the spirit if not the rules of the competition). Maybe there are some new-ish LLVM optimizations that both Zig and Rust can take advantage of that others are missing out on? The C++ solution is using clang to compile, which I would expect would also be able to take advantage of similar LLVM optimizations if they exist, but maybe it's missing some pragmas or something that it would need to do so.
Part of the problem with these sorts of benchmark competitions is of course that they of necessity ignore things like readability and maintainability, something that a lot of languages will take performance hits in order to improve. I think an interesting take on this would be a version that follows each language's best practices and style guides. Of course then you have the problem of deciding what that actually means, since there's often disagreement about what's actually best practice.
@@orclev Regarding the "best practice" part, not only is there disagreement on what that means, but it also depends on context. What might normally count as "best practice" may go straight out of the window when the thing you actually need to do is optimize some hot loop. Suddenly, "best practice" becomes checking the assembly output, inserting compiler hints like [inline] or [unreachable], running performance-guided compilation, customizing optimization passes schedule and purposefully omitting runtime checks in favor of manual proofs of correctness.
@@user-py9cy1sy9u The scoreboard itself uses the terms core count and physical core count, E.G. that same Ryzen processor is listed as 32 core count, 16 physical core count. It is an interesting point that maybe Zig is scaling based on the physical core count, although it raises the question of why the other languages don't.
Those results are surprising. I wouldn't have expected Zig (or any language) to improve over C or Rust performance in any significant way. I'd really like to see a breakdown of where those performance gains came from.
Dave's results page shows that Zig only beat rust in the performance per thread category because it was the only build which used 16 threads rather than 32. The machine is 16 core hyper threaded, so it's pretty obvious why 16 threads would beat 32 on a per thread basis.
And in my experience the thing you said is in fact true. Did some naive benchmarking myself. At least in the things I tried rust is always slower than C about 2-10%
@@sparky173j Thanks for "catching" and pointing this out, as it seems to be a bit of an "oversight" in the rules/conditions for the "race"!? Which leads my uneducated mind to a couple of questions. * Firstly, could there be any "good reason" to measure the performance "per thread", as you say is done, rather than just (naively) "measure performance/time"? * And secondly, if there is such a "penalty" from utilising "more threads than cores", how come Zig was the only language that had "its winning contender" coded that way? I would have thought that some "C", "C++" or "Rust" coder would have taken the "top contender" for "his language" and then "simply converted it" to only use 16 threads, thereby producing the most "performant" (according to the "rules and conditions" for this contest) version of the program for "his language". Anywhoo, thanks for sharing your knowledge. Best regards.
@@sparky173j Uhh I just went to the result browser and you are right. Dave's metric is "operations per second PER THREAD", but he totally ignored that different solutions use different AMOUNTS of threads. And in some cases he ran more threads than the CPU has actual cores. What a bad benchmark metric! If you sort by total operations, Rust wins. In fact one guy "rbergen" has 6-core results for i7-9750H where everything runs with 6 threads and the leaderboard is: Rust wins at 6590/sec/thread, then Zig with 4754. Go into the result list, choose "Filter preset: Multithreaded leaderboard" and see for yourself. In fact, I just checked davepl's own Ryzen 5950X results from today, and his 16-thread results are Rust winning with 11172/sec, and Zig second with 10500/sec. Does he not know how to read his own results? It just goes to show that this video is a typical Dave Plummer "programming drag racing" video: badly researched and badly presented, lol. This is not the first time I've taken issue with his videos about this topic.
They cheated and precomputed a bunch of primes at compile time. Zig defenders argue that "this is a key feature of zig! So that's why it should be allowed!" But Rust macros could do that too and they didn't because it was against the rules. Why Dave allowed Zig to count with this implementation is beyond me.
@@jabadahut50 The difference is that the Zig compiler supports this out of the box, while in Rust you would have to write your own macros, which would technically be against the rules as the written software needs to be as close to the original algorithm as possible. This is also probably one of the reasons why Assembly isn't in the Top 5, because compilers can 'cheat', while a solution written by humans needs to be true to the original in form as described by the rules.
It would be interesting to see a breakdown on why the top 5 languages differ so much in performance. Definitely gained a new respect for Zig and Rust today. Great video : )
Me too - and if it's so easy to use libraries written in Rust in C-code, I started to wonder if I should start learning Rust and consider that instead of pure C in implementation of a library I'm writing (similar to aalib and cacalib, but somewhat different approach taking advantage of 256-color terminal emulators and extended ascii/utf-8 dithered block characters). Provide it as Rust library, and as C-library wrapper around it - and as Perl module using the C-library, as that was one of my original goals as well.
There are 3 kinds of lies: small lies, big lies, and computer benchmarks. Not saying that this benchmark has been cheated, but you can't judge the performance of a language on a single program. I am very surprised that Dave, with his experience and knowledge, doesn't know that. This test has exercised only 5% of the features of each language (mainly the optimization of a small inner loop) and is therefore pretty meaningless. For many years, the most significant benchmark of language speed has been "The Computer Language Benchmarks Game" (it has changed name several times over the years, it used to be "The Great Computer Language Shootout"). This benchmark tests a bunch of languages and compilers across a dozen representative programs that together test most of the core features of each language. In the same fashion, people are free to optimize the code of their favorite language, and calls to libraries other than the standard library that comes with the language are not accepted. In my experience, this benchmark has been quite representative of real world performance. In that benchmark, C is still the fastest, followed by C++, and Rust just behind. Java falls behind Julia, C#, Ada, Haskell, and is roughly on par with Go and Swift; the difference between these languages in the real world would be less significant than the skill of the programmers. Unfortunately, Zig isn't tested.
I wouldn't have expected the performance jumps between the three fastest languages. I assumed they would be almost equal, with one language marginally faster than another. But the jump from C to Rust and from Rust to Zig is insane.
There's definitely something strange there. Either the Zig and Rust implementations have been so well optimized and say, C and C++ not, which I highly doubt, or the respective compilers are just that much better, which I also doubt, or there is something else at play here.
@@nowave7 It could be interesting to recompile those two and C or C++ to see the difference. Did it optimize some of the operations into matrices and make use of SSE2 or other extensions? Maybe really efficient and clever use of the extended registers? Or, due to their better compile time memory handling, were they able to skip some of the more costly memory operations?
Zig's speed doesn't surprise me at all. One secret that Zig has is compile-time execution; you can program in Zig itself to do some of the computations at compile time. It's possible to replicate this in C++ with a judicious use of template metaprogramming (and constexpr), but the template metalanguage is not really the same "language", and it's much harder to use. It's much more weakly typed and therefore error-prone, for a start. This is especially useful on this problem, such as to compute the wheel settings in advance. In C or C++, you'd have to either calculate it at run-time, or write a code generator to generate arrays with the data in them, or something like that. (Note that simply dumping the wheel settings into array and checking that in would probably be acceptable in a "real" program, but it's against Dave's rules.) IMO, good support for multi-stage execution should be mandatory for any modern language that claims to be about performance. Note that this isn't why Zig won, but it's an advantage that Zig has over the other contenders.
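For readers unfamiliar with the term, "wheel settings" here means the table of skips a wheel-factorized sieve steps through. The sketch below is only an illustration in Python, which has no compile-time stage, so a module-level constant computed once at import time stands in for what a compile-time facility would bake into the binary. The 2, 3, 5 basis and the names `wheel_gaps`, `WHEEL_SIZE` and `WHEEL_GAPS` are assumptions, not taken from any submission.

```python
from math import gcd


def wheel_gaps(basis):
    """Return the wheel circumference and the gaps between successive
    residues that are coprime to it (the numbers the wheel does not skip)."""
    circumference = 1
    for p in basis:
        circumference *= p
    residues = [r for r in range(1, circumference + 1) if gcd(r, circumference) == 1]
    # Gap from each residue to the next one, wrapping around into the next turn.
    following = residues[1:] + [residues[0] + circumference]
    gaps = [b - a for a, b in zip(residues, following)]
    return circumference, gaps


# Computed once, up front; a stand-in for a table produced at compile time.
WHEEL_SIZE, WHEEL_GAPS = wheel_gaps((2, 3, 5))
print(WHEEL_SIZE, WHEEL_GAPS)
```

The alternatives described in the comment (computing the table at run time, or generating it with a separate tool) would produce the same table; the difference is only when the work happens.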
@@dolorsitametblue since it is hyper threaded and all the calculations will use the same parts of each core, hyperthreading doesn’t produce any extra performance really, probably lower due to the overhead of handling the extra threads. A better performance metric should have been the overall number of passes. It currently just makes zig look like two times better because it doesn’t use hyperthreading.
@@DeGuerre In modern C++ you could do all of that with constexpr which would not be a different language at all. No need for template metaprogramming unless you are using an older version of the language.
I've been considering learning Rust, and this really gives me a good reason to start looking into it. I'd be interested to see a breakdown of languages that fit similar niches compared to each other in a future video now that you have so much optimized data.
Can't wait for the in depth analysis videos, I'm quite interested by what could be going on for such a drastic performance between the 'top' programming languages!
Interesting results. In fact, so interesting that I think Dave or someone else is bound to jump on some of the questions this raises. In terms of the metric, comparison, threads, the algorithms and whatnot. I can't wait to see the follow-ups on this.
Wow. I was NOT expecting this much of a difference. Also kinda surprised Java made the cut! A video dissecting and explaining the differences between results would be SUPER rad.
An almost 10x difference between C and Zig indirectly points to different implementations (aka algorithms) having been used for different languages. I.e. the author of each implementation wrote the code on their own, instead of implementing the same algorithm. If so, then these tests are actually testing nothing.
@@AlexanderBorshak I haven't read the code base for these tests, but translating the algorithm from language to language doesn't make use of the specific advantages and optimizations of any of them, making it, in my opinion, a very biased benchmark. If this benchmark is based on the best implementation possible in each language, then for me it's a fair benchmark for every language, since if you have features that make your code better and faster, you will simply use them.
@@btotta Yes and no at the same time. If someone writes the algorithm in Java with complexity O(n), another in C with complexity O(3*n), and in Zig with complexity O(2*log(n)), then what are we trying to compare here? Obviously not the languages' speed. I landed on this video from Reddit, where it was mentioned that SIMD instructions were enabled for Zig but not for C, possibly because the Zig compiler allows enabling SIMD without any effort while a C compiler requires much more effort to use SIMD. If so, these tests can barely be called language speed benchmarks, IMO. I.e. we cannot rely on the test results and say "Zig is 7 times faster than C", because that is not true and these tests do not reflect the actual speed of the compared languages.
I think it's really interesting that Rust and Zig were so much faster compared to the C/C++ versions. I'm definitely looking forward to a deeper analysis of the solutions.
@@antronixful zig was 2x Rust, but rust was 3x C/C++, so I'd say rust is closer to zig than to C/C++ and definitely not an order of magnitude faster than rust.
@@snapstromegon the ratio is something like 1:3:6... thus, 3-1 < 6-3 i'm not mocking or anything, but try to scale that to omega big numbers, you don't have to look the samples separately, look at the entire "universe", that in this case is {language in top five | language=zig or language= rust or language=c(and others)} also, it is important to calculate the distance between samples to make them more comparable to make statistical analysis, and being fair, the differences using the numbers provided in the video are slightly more similar than the ones i told, but that's it... in other words, if you were to get arithmetic mean between every sample, the distance between the mean and rust will be less than the distance between the mean and zig
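The distance-from-the-mean argument can be checked with a few lines of Python. This sketch uses only the rough 1:3:6 ratio quoted in the comment above, not the actual leaderboard figures, and the labels are shorthand for illustration.

```python
from statistics import mean

# Rough relative speeds from the comment above (C and others : Rust : Zig).
ratios = {"C/C++": 1, "Rust": 3, "Zig": 6}

mu = mean(ratios.values())

# Absolute distance of each entry from the arithmetic mean.
distance = {name: abs(value - mu) for name, value in ratios.items()}

for name, d in distance.items():
    print(f"{name}: ratio {ratios[name]}, distance from mean {d:.2f}")
```

With these three values the mean is 10/3, so Rust sits closest to it and Zig farthest, which is the point being made. Whether that says anything about Rust being "closer" to C/C++ than to Zig is a separate question of how you choose to measure distance (differences versus ratios).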
@@snapstromegon update, i did it myself μ≈4136 σ≈3874 and you see, μ±σ covers every other language but zig (including rust obviously)... so yeah, even maths agrees with me (i did it just with the top 5)
It would be interesting to compare the binary code from Zig to the others and try to reverse engineer what is actually going on. I haven't used Zig, but it sounds like it may be taking the original code, then predicting what it's trying to do and then cheating at compile. I suppose that isn't technically cheating on the part of the programmer but again, it would be interesting to actually reverse compile the binary code to see what it's actually doing.
Could be running the sieve in compile time and just caching the result inside the binary. That would mean it just prints the stored results, without any computation.
That is indeed a very big selling point for Zig that it does 'comptime' calculations. However it would be foolish to think the testers didn't think of that. I assume the input is given to the program using stdin/file etc. at runtime. EDIT: The results shown are for multi-threaded cases (if I'm not wrong). Overall, Rust wins with around 13k passes/sec/thread (by a small margin, but a win nonetheless), and guess what, this result is single-threaded.
One way to test this would be to allow a max int value to be passed to the binary at runtime; then there'd be no way to predict the user input, so compile time calculations couldn't happen. I'm only presuming that's possible using my knowledge of the Linux CLI and how commands (which by my understanding are individual binaries) take arguments at runtime. Is that doable?
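Taking the limit as a command-line argument is indeed doable in general. As a rough sketch of the idea (in Python, not in any of the competing languages, and not following the contest's rules on storage or output format), a plain sieve of Eratosthenes whose size is only known at run time could look like this. The names `count_primes` and `parse_limit` and the default of 1,000,000 are illustrative assumptions.

```python
import sys


def count_primes(limit):
    """Count the primes up to and including limit with a plain sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    factor = 2
    while factor * factor <= limit:
        if is_prime[factor]:
            start = factor * factor
            # Clear every multiple of factor, starting at its square.
            count = len(range(start, limit + 1, factor))
            is_prime[start::factor] = bytes(count)
        factor += 1
    return sum(is_prime)


def parse_limit(argv, default=1_000_000):
    """Read the sieve limit from the first command-line argument, if present."""
    try:
        return int(argv[1])
    except (IndexError, ValueError):
        return default


if __name__ == "__main__":
    # The limit is only known once the program is started.
    print(count_primes(parse_limit(sys.argv)))
```

Because the limit arrives at start-up, nothing about the result for that particular limit can be baked in ahead of time. A table that does not depend on the limit could still be precomputed, so this closes off only part of what the thread is worried about.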
Not really, when the fastest zig solution is doing some inlining, it is not predicting nor doing much at compile time outside inlining, this is just some well written solution haha
I'm a fan of this contest because of the fun education it brings about optimisation, efficient coding techniques, multithreading, language comparisons... win-win all around. The discussions around 16 and 32 threads are interesting too, given that there are 16 physical cores where each core can do hyperthreading.
Really nice comparison. I did not expect rust and Zig to outperform C so drastically. Looking forward to the language tours :) just subscribed and thanks for interesting content
Kudos to Dave and everyone involved in this impressive effort to test almost 100 programming languages. I was surprised to see Java make it into the top 5, given it isn't compiled into native code. However, seeing Rust and Zig leapfrog the formidable competition is even more amazing. I'm looking forward to the upcoming GPU vs CPU episode.
The secret sauce of Java is the hotspot compiler. At runtime it will pick the most used portions of code and convert the byte code to machine code. That’s where the performance gains come from.
It isn't surprising at all, considering Dave used the leaderboard with only 10 languages participating. It bothers me how much Dave boasts about the number of participants when he so casually disregarded the vast majority of people's effort on the leaderboard he ignored.
@@GordonChil I'm sorry if I'm wrong, but isn't that JIT? Which means that it can be either a hit or miss depending on how predictable your code is during runtime. (still learning please correct me if wrong)
Very interesting. I'm not terribly convinced of the ranking, but I really appreciate the introduction to the languages with the ranking as one factor to consider. I hope you will dive into the reasons for the differences. It looks like there really aren't many differences among the top ranks in single threaded.
Shockingly impressive! It would be great to see at some point a performance analysis of the best solutions to understand why Zig is so fast compared to C and Rust.
I'm wondering the same. Something weird must be going on with the benchmark. I don't know about Zig but even the Rust numbers look very suspicious. I love Rust and think it's the best language in existence right now, but it definitely shouldn't be 4 times faster than C/C++. It can be faster sometimes because the compiler can enable optimizations that would be unsafe in C, but not by a factor of 4. Something weird is going on there.
The benchmark is flawed. Dave compares sieve passes per second *per thread*, which basically means that the benchmark doesn't deal with the raw passes. This obviously skews the overall metric to favour solutions with fewer threads, since launching more threads just incurs more overhead, and it turns out the metric being used was passes done by one thread. Using this metric, Rust wins with 13k passes/sec. And guess what? This solution was single-threaded. And single-threaded solutions were not included.
@@VivekYadav-ds8oz yeah, IMO it should be passes/second single threaded and passes/second multithreaded on a given CPU. Letting Zig claim twice the performance by turning off multithreading is ridiculous.
The compiler could also make a substantial difference. Around 1995 I compared different C compilers and there was quite a large difference running the same code compiled with borland, djgpp, quickc, and gcc. Gcc won, though not an equal test since that one was on linux instead of dos.
I raised an eyebrow at Rust making the list ahead of the most common languages, because I was shocked, but I've heard of Rust before. I am a python, go, and Javascript/Typescript developer so I knew my favorite languages were not going to make the top 5 although I am curious where golang ended up. I am floored by zig, a language I've never heard of, dominating the list. I'm now hanging on every word, Dave! This is a very fun video! Thank you! I'm looking forward to the GPU versus CPU video because I am familiar with shaders and how they can use the GPU to do some very impressive visuals on screen with next to no perceivable impact to phone performance because it utilizes the often dormant GPU rather than the heavily taxed phone CPU to draw to the screen. Thanks for the great content! I'm loving your videos even though I think you played a major part in those windows progress bars I have stared at for a lot of my career! (I got paid to watch a lot of them, so I'm not really mad) 😆
Read the Description of the Zig Solution. They are using compile time code execution to precompute primes. This is not cheating, because compile time code execution is one of the most important features of Zig. The speedup by a factor of 6 over c/c++ cannot be achieved without some out of the box tricks. The problem is that this speedup factor is not possible for more general tasks, like a database query or a rendering engine.
> This is not cheating, because compile time code execution is one of the most important features of Zig With all repect to Zig, it just means they had another algorithm during runtime than other competitors. Dave was talking on that - that the algo should stay the same. And I totally agree that for more general tasks it is not possibe. So, addressing "one of the most important features of Zig" argument, let's imagine that we created a special language that's most important feature would be printing out prime numbers up till one million - and we just hardcoded these numbers into the languge's source. So it will be O(1) task for the language - because it would just print out the stuff it has in memory. I think it would be same sort of "compile time" trick.
"This is not cheating" Of course it's cheating. It explicitly violates the rule that no prebaked information can be included about primes other than 2. That the prebaking happens at compile time makes zero difference.
I was legitimately surprised to see C++ so far down in this list given its powerful compile time features, so this was very interesting. While I don't use Zig myself it's impressive to see how much the compiler can do in this language to optimize the runtime code. Very excited for a deeper dive into the solutions to find out the sources of these massive performance gaps.
I can't wait to see the CPU vs GPU comparison. This series has been heaps of fun and I'm glad it's not over just yet. Thanks for your and everyone else's efforts to bring this to life. I also have to go take a look at Zig now. I can hardly grasp how much faster it is than the rest.
Honestly, I've never heard of Zig, and I'd love it if you would do a video focussing on explaining what Zig is all about and how it differs from Rust or C/C++. Great video, and great effort by all those contributing!
Those are amazing results for Zig! I never heard of the language prior to this competition and so, naturally, I need to do a bunch of research on it. Thanks for pulling this together and thanks to everyone who contributed and helped to manage the competition. Bravo!
That was cool. The gap from C to Rust then to Zig was quite surprising. As someone who used to write some C and now writes a combo of Python and Rust ( with a smattering of allsorts in between ) it was very interesting indeed. As for the CPU vs GPU, I wonder what tools exist around compiling, optimisation and memory management? I'd imagine there might be some interesting results here, even changing drivers about. Another interesting comparison would be comparing NPU or FPGA results to GPU and/or CPU. Only if you can manage to get your hands on one of course.
That's a serious leap to Zig there, a clear doubling on the final spot is quite something. And your bit about how assembly is hard to optimise for is a great insight into the very nature of programming languages as a concept. The CUDA programming sounds very interesting and I look forward to that episode.
Expert level hand-written assembly, 100%, should have been at position #1. That it didn't is a shocking and damning reflection on today's programming skills. It is embarrassing beyond words that Dave didn't pick up on this !!!
As someone who loves the concept of Rust and has used it from time to time, glad to see it rank well. At a previous employer they refused to let me have access to a powerful machine and code I had written in PowerShell to interpret a large file set literally took hours to run. Moving to Rust it was done in minutes. It’s a great language. Thanks for what you do Dave. One of my active projects is providing a place where people in rural areas can access great technology training and job opportunities. We plan to launch coding courses this year as well as courses related to OS and Security issues. I’m sure I’ll be referencing some of your content along the way. Keep on trucking Dave :)
The wide gulf between Zig and everyone else didn't pass the smell test, so I pulled the code and ran the benchmark on my system to see if that was still the case 8 months later. Single-threaded: Zig 14850.57, Rust 14832.62. Multi-threaded: Rust 14545.44, Zig 13082.760. Also, the Java solution is suspect. Three of the solutions are based on outdated versions of Java (8, 16, and 16).
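To put those re-run figures in relative terms, here is a small sketch. The numbers are the ones quoted in the comment above (one commenter's machine, not the official run), and the helper name `percent_lead` is made up for illustration.

```python
def percent_lead(winner: float, loser: float) -> float:
    """Return how far `winner` is ahead of `loser`, as a percentage of `loser`."""
    return (winner - loser) / loser * 100.0

# Figures as quoted in the comment; units are whatever that re-run reported.
single = {"Zig": 14850.57, "Rust": 14832.62}
multi = {"Rust": 14545.44, "Zig": 13082.760}

single_gap = percent_lead(single["Zig"], single["Rust"])
multi_gap = percent_lead(multi["Rust"], multi["Zig"])

print(f"single-threaded: Zig ahead by {single_gap:.2f}%")
print(f"multi-threaded: Rust ahead by {multi_gap:.2f}%")
```

On these numbers the single-threaded gap is roughly a tenth of a percent and the multi-threaded gap is roughly 11%, nowhere near the 2x shown in the video.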
Very interesting. I'd never heard of Zig before, and I was almost certain that Java wouldn't make the top 5. Would you be willing to do a review of the disassemblies in an attempt to compare and contrast?
What an amazing project this was (and still is, I suppose) and what a great insight it gives into the wonderful world of programming languages. Thanks for this amazing content Dave.
I think it would be cool to see build size and system memory usage in these top contenders. I wonder if Zig’s usage of heap allocations helped with its speed at the sacrifice of memory allocation.
I've really enjoyed watching episodes of Dave's Garage ever since I found his channel. It is piquing my interest in code, something I haven't touched since I was a kid. Thanks, Dave!
I'd never heard of Zig, so that was a surprise. I'm currently learning to use Rust and love how simple it is to write code and documentation with an included markdown feature. It seems like the toolchain grabbed the best of C++, Doxygen, and Markdown ... added stuff I'm still trying to understand, and came up with something less OOP without losing anything. $0.02
It probably used a mix of SIMD and compile-time execution, automatically or semi-automatically. That's why all the speedups. Still, it seems like a nice language to learn.
Someone in the Zig subreddit that contributed to the project said that for the Zig code they used SIMD everywhere they could. C and C++ only used it in the single threaded implementation. TLDR: The single threaded benchmarks show a different picture as they're the ones people focused on.
@@igorthelight So do C and C++, even where it's not expected: copying a small struct, for example, uses (v)movaps/movups from SSE. Something is very wrong with the test here.
This was fantastic Dave. I was expecting C to win but was blown away by the results of Rust. I did not even know Zig existed and it destroyed everything. This result made this series one of my all time favourites. I am hoping the GPU might beat the CPUs, but I have no clue. I am so excited.
One snag in the Zig result is that it spawns one thread per physical core instead of one per hardware thread, so it only spawned half the threads of the Rust implementation. Sounds impressive, right? Except that the score is computed per thread, so divide the Zig result by 2 and you have a more real-world comparison to Rust. Someone just pushed a change to the Rust code to also spawn one thread per physical core, so now it's comparable, and it wins against Zig. The test was simply wrong; now that it's fixed, Zig fell behind again.
I wonder if things really got tied down as well as they should've been with the requirements and guidelines for the comparisons. Going through some of the comments and code logic, it seems like some hardware factors were involved too, which tells me the race was much closer than it appeared. Either way, I am glad to know that in any performance-centered application you can't go wrong with the go-to langs like C, C++ and Rust. It's really nice to know about Zig too for future considerations!
Great explanation of the features, strengths, and differences between the top five languages in this test. It'll be interesting to see if the administrative overhead of assigning and receiving the tasks to the GPU will substantially affect the relative speed compared to the number of cores the task is assigned to.
Uh gosh, a 4080 has 9728 CUDA cores, each of which can do SIMD up to 16. You can expect a performance increase of a few 1000x if his algorithm can run completely parallel.
I’m loving this series. It inspired me to write some prime sieves (and even a factoring tool) in BASIC, the only language I know. I ran it on my vintage Tandy hardware and had a blast looking for ways to optimize and speed up the code on the old hardware. Thanks for the content, Dave!
The different ways these languages go about accomplishing their performance is so interesting. Never been that interested in studying compilers, but now Zig has got me revved up to find out the details 😳
Check out Compiler Explorer. Check out the video "What Has My Compiler Done for Me Lately?" Does Dave know about these things? I haven't seen very many of his videos.
The difference between the top 3 entries is astonishing. After seeing C++ and C with marginal improvement over Java (quite surprised to see Java so high), Rust's score blew me away. And then came Zig. Has the resulting code been decompiled to see if there haven't been any 'unfair' optimisations put in place by the compiler?
Apparently the scores are per thread and Zig used half the threads compared to rust and had less overhead and memory bottlenecks because of that. Something weird is going on with C/C++ too for sure, they aren't 5x slower than Rust
Honestly I was expecting good Java performance. After using the language for over 10 years it's actually pretty fast. The JIT is really good at compiling these types of tight loops and it produces code that's pretty close to the performance of C/C++ if you're not requiring anything that Java can't do easily. My question is how is Rust 3 times faster than C/C++?
Surprising results. Could you detail a bit the reasons for the large differences between Zig, Rust and C? Comparing the machine code / disassembly of the main functions could be interesting. Also, which solutions are actually using multithreading?
This is the question I also think should be asked: "Which are using MT?" My guess is that Rust is making heavy use of multithreading due to how easy it is in this language, and that Zig is using some sort of compile time evaluation to obtain such strong results.
Expert level hand-written assembly, 100%, should have been at position #1. That it didn't is a shocking and damning reflection on today's programming skills. It is embarrassing beyond words that Dave - and almost no one in this thread - didn't pick up on this !!!
I think the biggest takeaway from this video is that we are capable of making modern languages like Rust and Zig that are just as fast as C or C++. As you said with assembly, just because a language is lower level that does not necessarily mean it will be faster if people can't manage to write extremely efficient code in that language. Thanks for the great content!
I hadn't really heard of Zig much before this, and I am SO impressed with the performance. I mean, basically doubling the second place winner, Rust no less, is no easy feat. Love the videos, keep it up!
I was going to move on to rust after completing c language study but am now considering zig. Thanks for the informative video. Concise, precise and well delivered.
I heard about Zig for the first time about two weeks ago. Usually I write programs in C. But maybe I should take a look at Rust and Zig some day as well. I'm surprised D didn't evolve better.
They are definitely worth a look if you have an open mind for new languages. Zig is still in its early days but is already looking very interesting. Rust is more mature and can already be used in production, but it is still evolving (and very fast) and has a steeper learning curve.
I was not expecting such a large discrepancy between 1, 2 and 3. I only program occasionally but Java was a big surprise to me to make top 5. Given Rust and Zig's C relationship, an interesting follow-up video could be to run the top C example in Rust and Zig with as little modification as possible, to see any optimisations or other changes in the compiler etc.
@@AnthonyJClink I think people got a bit of a bad idea because of games that were, or still are, coded in Java, like Minecraft and RuneScape. I think legacy code is the biggest problem: most code you see in the wild was written for Java 8, while Java 17 is so much faster than Java 8, so people can have a misconception of Java's performance because of it.
@@AnthonyJClink Never got that either. Java's VM has had millions and millions of dollars poured into it over 3 decades, it's quite possibly the absolute fastest VM based language out there
Very interesting results. I'm an old-school 6502/Z80 (with the Rodnay Zaks ref guide)/x86 programmer who later moved on to various 3GL and 4GL languages, and the results have reopened my interest in learning some of the newer languages. Will be checking out some of the features in Rust and Zig. You never know, you still might be able to teach an old dog new tricks.
Thanks Dave!! I'd be interested to see a breakdown of why the top 3 had such differences in speed. The results are very surprising to me. I would have expected only a few percentage points of difference.
That is an interesting outcome. I wonder how well this translates into real world performance. It would seem like rust or zig would be a good choice for microcontrollers based on the performance aspect. I guess that would depend a lot on many other aspects of the languages. I am very interested to see the gpu vs cpu episode.
Firstly, I've got to say I didn't expect Java to make it to the top 5, and also be close to C and C++. That is very interesting! The other thing that blew my mind is the order of magnitude performance difference of Zig to C. And Rust being about 4 times faster than C was also a surprising performance jump! Pretty insane stuff.
Java is such a tricky language, like a double-edged sword. If written properly and carefully, Java is very fast. Many benchmarks have shown this for a long time now; you can google it.
Was hoping for Zig and also guessed it would be of double the Rust solution performance, so this video is extremely satisfying for me. Great video and the project itself, thanks to all who did it! Java is bit of a surprise but it just showcases how little we know about possible language optimizations just by judging from the outside, or even knowing the language but not deep (and I leave space for individual talent as well) as compared to experienced and dedicated programmers.
Very interesting comparison (and analysis from a perspective I have not considered), I'm still a novice learning python right now, but my list of priorities of what to look into next changed significantly now... thanks!
How can C be faster than C++ if I can copy the C code and compile it with g++? If the generated machine code is different, then something is horribly wrong.
The only way to test speed is to write a real world application, not one routine. You need array stuffing and retrieving, math incl. multiply, divide, sin, cos and integer, floating point, fixed point, BCD etc., string ops and others that are used in real-life operations. You might have a hard time beating assembly or cross assemblers for portability. Phil. Programmer since 1976 and unapologetic. I wrote in many languages in my time.
No it's because of their strong efforts into the multithreaded leaderboard. Most other languages put their effort into the single threaded leaderboard, seemingly thinking it was the important one. I don't know why Dave decided to go with the leaderboard that had clearly way less participants.
As others have said, it's great to see a ranking of raw speed and it's certainly fun to see a competition like that. But that metric alone doesn't tell the whole picture when trying to figure out what language would be best to use for a given situation. I suspect if metrics included the consumption of various system resources on a weighted scale, we would see something quite different in terms of overall rankings. But still a fun competition none the less. Thanks for the great video Dave!
It's interesting to see how different programming languages perform when it comes to prime number sieves. I was particularly impressed with Zig's performance, it's amazing how fast it was compared to the other languages. I also noticed that the video briefly touched on thread implementation, which is a crucial aspect of programming for performance. It would be great to see a more in-depth analysis of how each language handles threading and how it affects their performance in different scenarios. Overall, this was a great video for anyone interested in programming language performance.
C was 4 threads (I think).
Rust was 32 threads (16 core machine with 32 HT).
Zig was 16 threads (so as to avoid the slower HT virtual cores).
So around a 4x gain over the C implementation is sort of expected. Close to double the throughput for Zig compared to the Rust implementation would also be expected. At the end of the day, Rust / C / Zig are about equivalent in speed ... in that order, everything being equal. Zig does get a slight edge over C in rare cases, as it's more explicit than C and may give the compiler better info to optimise with. This benchmark is showing implementation differences, not compiler differences. Having said that, Zig is awesome. It's not radically different to C, just a better C.
Hmmm, something fishy about the top two results there. Those kinds of performance increases suggest something very different is happening under the hood. Given the nature of the program, there isn't actually a need to compute much at all during run time, no? Couldn't basically the whole program just be a constexpr function (computed at compile time)? And to what level is computing at compile time vs run time cheating? If it is not, then what we are seeing could be how fast Zig accesses the terminal to print out a constant string, not how fast it has optimized run time computation. If the compiler decides something is constant at compile time, and therefore just reduces it to a constant, but the programmer hasn't explicitly told the compiler to do so, is it still cheating? I would argue yes, because what we are seeing is the effect of something completely different. If, however, the optimization is due to e.g. concurrency, then the computational work is still performed, and I would consider the result more in line with a result you could reasonably expect to somewhat duplicate for a similar task.
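A rough sketch of the distinction being argued about here, in Python for illustration only. It does not claim to show what the Zig entry actually does; `sieve_count` and `prebaked_count` are hypothetical names. The first function does the sieving work every time it is called, while the second just returns a number that was worked out ahead of time, which is the kind of thing the "no prebaked primes" rule is aimed at.

```python
def sieve_count(limit: int) -> int:
    """Count primes <= limit with a plain sieve of Eratosthenes, at call time."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    factor = 2
    while factor * factor <= limit:
        if is_prime[factor]:
            # Cross off multiples of this factor, starting at its square.
            start = factor * factor
            is_prime[start::factor] = bytes(len(range(start, limit + 1, factor)))
        factor += 1
    return sum(is_prime)


# "Prebaked" variant: the answer was computed ahead of time and stored.
PREBAKED_COUNT_1M = 78498  # number of primes below one million


def prebaked_count() -> int:
    """Return the stored answer without doing any sieving."""
    return PREBAKED_COUNT_1M


# Both give the same answer, but only the first one measures sieving speed.
print(sieve_count(1_000_000) == prebaked_count())
```

A benchmark that times `prebaked_count()` in a loop is measuring a function return, not a sieve, which is why people want to see the generated assembly before believing a 2x gap.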
I am totally flabbergasted by the jump in performance of the top two. I would have never expected this. Mind you, a comparison based on several benchmarks is necessary as there is no guarantee that the order of the finalists will be the same.
Congratulations Zig on being the fastest language! It's amazing to see the advancements and innovations in programming languages, and Zig's speed is truly impressive. Keep up the great work!
I'd say most modern languages are optimized for developer time, they just make different levels of compromise to fit their use case. Even languages like Rust. Compared to C++, it shifts a lot of checks from runtime to compile time. So though it takes longer to write, it takes less time to debug.
What I don't understand is why manually hand-optimized ASSEMBLY is not the winner here ALWAYS? One should just grab the Assembly output from the fastest compiled language so far, and then have experts for that CPU inspect that manually and see what can be optimized further, and that way Assembly should ALWAYS be the fastest. I find that ZIG's DOUBLE speed performance IS at least somewhat SUSPECT. Is it because they ran it on _16_ threads only (no hyperthreading) and then multiplied the performance statistic by 2 to extrapolate 32 thread performance? That would not be realistic because 16 threads may fit better into cache whereas 32 threads would NOT. Where would I find the ZIG source code and the generated compiled Assembly code? Does ZIG (and Rust) allow multiple copies of the same code (=32 threads) to just execute IN ONE CACHE MEMORY copy? That could explain the huge boost. And YES, I would absolutely LOVE to see the CUDA optimizations and performance. My bet: If GPU CUDA is well programmed it WILL run faster than CPU code when it reaches 64+ concurrent threads.
I'm no programmer, but I'm still surprised to see a language I've never heard of take the top spot. Especially by such a wide margin. Good stuff as always, looking forward to CPU vs GPU!
Learning something new and relevant/interesting with every video. I hadn't even heard of Zig before and I would never have believed that there is so much headroom above C/C++. Happy to see Java doing so well, as it's still my weapon of choice for most applications. Please keep the great content coming Dave! 🙏
As a Java developer I was very pleased to see my favorite language make the top 5. I admit I was somewhat surprised to see Rust crushing C and CPP by such a large lead. But... then I watched the whole video. Jesus Christ, Zig...
That's kind of mindblowing. I expected that Rust would beat C, and once it hit second place I pretty much knew it would be Zig to top out, but the sheer difference between the top three absolutely baffles me! Definitely looking forward to the deep-dives, in the hope that they'll shed some light on exactly what's different between the languages and why the top 3 were so vastly different. Also eager to see how CUDA performs, because I don't immediately know how parallelisable this would be. I also think it'd be cool to discuss how SIMD may or may not help out with this problem.
I have seen some programmers underestimating zig, but for a language that is not even in version 1.0 this is incredible, I really hope to see its progress and how the community begins to consider it more for use in the industry.
I've been coding for about 13 years now (9 of them with actual engineering in mind :D) and never heard of Zig - definitely have to take a look! Thanks for the great series!
It would be interesting to see the entire ranking of all 94 languages. We would then know what languages to avoid, or we could tell our managers why certain areas of the project are slowing the system down.
Are the people in this comment section trolling? How can there be so many people saying that they have never heard of Zig and praising this video without even checking the code to see why Zig supposedly won? Like, shit, i knew intelligence was on decay but i did not expect to see this level.
Assembly... I've been writing a lot of assembly firmware for Microchip PIC 8bit µCUs over the last two and a half decades; once I decided to try a high-level language compiler. When decompiling the code I found a couple of interesting solutions I probably would never have thought about (one of which, putting two conditional jumps one after another, was mind-blowing, and I've been using it ever since in my Assembly code). In short, I'm not surprised that no one managed to write assembly code as optimized as what a good compiler could produce.
Yeah, no. There are clearly shenanigans going on here. This feels like a UserBenchmark AMD situation where the test is rigged to keep the obvious winners from beating the preferred winners. There is no way that properly optimized C code can be slower than other languages, especially given that 30+ years of work have gone into GCC and the "winners" over C are far newer with far less development. I've also seen some comments mention that optimized code for some languages was rejected. I call BS.
So true, and the total omission of Pascal/modern Win32 Delphi Pascal, which is basically a better C++ in many ways, shows how imperfect and misleading these results are.
I had a strong feeling Zig might be a contender, when you mentioned that the usual suspects weren’t the fastest, but the difference between the top 2 & others is incredible. Everyone seems surprised about Java making #5, as am I.. I thought it might have been Swift or OCaml or something like that 😅
Comparing "the language" is a red herring... it's the compiler that matters. And I don't buy for one second that someone skilled in ASM optimization couldn't start with the compiled Rust ASM solution and find at least one optimization to make it run faster (considering the headroom of ~5000 p/s that Zig showed is possible), which would by definition put it in spot number 2 on the list. Nope, don't buy it.
What a nice video, a fun competition for us as programmers. What a surprise to see Rust beating C/C++, and an even bigger surprise to see Zig with double the performance of Rust!
It would be nice to have a video commenting on how Rust and Zig (and their respective codes used in this challenge) managed to be so much faster than the competition. Very cool stuff!
Apparently the PRs that would have put other languages above them via optimizations were rejected. These tests mean nothing.
I was sleeping through the video, so I'm not sure if Dave addressed it, but compile time code execution kind of seems like cheating. Especially since specific information about primes other than 2 wasn't allowed.
Skipping, not sleeping
@@namewastaken360 So in theory the compiler could precompute the complete result b/c all information is available at compile time.
@@micknamens8659 I would assume that's why it's so much faster, the computer can only be so efficient if it were essentially doing the same thing.
Man, wasn't expecting that! Both Java and Zig surprising results.
Can we get a bottom 5 video as well?
Awesome content by the way!
Dave's website has a section with constantly updated results for the calculations. The problem with determining a Bottom Five is that it's much more likely to find the most poorly written code than which language is the slowest.
Add my vote to the bottom top 5 request?
I think Rockstar (if someone did one) would be among the slowest, though maybe the most entertaining!
last one, html bruh
@@baraka629 BrainFuck might be slower than Minecraft’s Redstone. =P
Oh wow, that is an incredible result for Zig, would be cool to compare the assembly listing of the Zig compiler vs C to see where it was able to gain such a huge advantage.
Probably it executed the sieve once at compile time and cached the result.
The Zig result is slightly less incredible when you see it's using a -no-mt variant that only spawns 16 threads, and the performance metric chosen is passes per second *per thread* on a 16c/32t CPU:
Zig 16t: 805969 passes, 10074 passes/s/thread
Zig 32t: 899141 passes, 5620 passes/s/thread
Rust 32t: 948969 passes, 5930 passes/s/thread
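The arithmetic behind those three lines can be sketched like this. The 5-second run length is an assumption (it is not stated in this thread), but it reproduces the quoted per-thread figures to within a pass or two; `per_thread_rate` and `total_rate` are illustrative names.

```python
RUN_SECONDS = 5  # assumption: each benchmark run lasts 5 seconds

# (label, total passes, threads) as quoted in the comment above.
runs = [
    ("Zig 16t", 805969, 16),
    ("Zig 32t", 899141, 32),
    ("Rust 32t", 948969, 32),
]


def per_thread_rate(passes: int, threads: int, seconds: int = RUN_SECONDS) -> float:
    """Passes per second per thread: the metric used for the ranking."""
    return passes / seconds / threads


def total_rate(passes: int, seconds: int = RUN_SECONDS) -> float:
    """Passes per second for the whole machine, ignoring thread count."""
    return passes / seconds


by_per_thread = sorted(runs, key=lambda r: per_thread_rate(r[1], r[2]), reverse=True)
by_total = sorted(runs, key=lambda r: total_rate(r[1]), reverse=True)

for label, passes, threads in runs:
    print(f"{label}: {per_thread_rate(passes, threads):.0f} passes/s/thread, "
          f"{total_rate(passes):.0f} passes/s total")
print("ranked per thread:", [r[0] for r in by_per_thread])
print("ranked by total:  ", [r[0] for r in by_total])
```

Ranked per thread, the 16-thread Zig run comes first; ranked by total passes, the order flips and the 16-thread run comes last.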
It's also way less impressive once you realize that Dave showcased the leaderboard that had way less participation involved. Clearly, the fiercest competition was on the one labelled "Leaderboard", that's why it has over 50 languages represented rather than 10 for the multithreaded one. I honestly think it's really disrespectful to not even mention the single-threaded leaderboard when so much work has clearly gone into those submissions.
@@limitationsapply That seems to be a very bad way to measure then, why not No of passes in 1 real time minute?
@@noomade Sure - all Zig is doing is halving the thread count in no-ht mode. Rust could go one better and use num_cpus::get_physical().
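For anyone unsure what "halving the thread count" means in practice, here is a minimal sketch in Python rather than Zig or Rust. `worker_count` is a made-up helper, and the 2-way SMT default is an assumption; `os.cpu_count()` reports logical CPUs, so dividing by two is only a heuristic, whereas a physical-core query asks the OS directly.

```python
import os


def worker_count(use_hyperthreads: bool, smt_ways: int = 2) -> int:
    """Pick how many worker threads to spawn.

    Assumes every physical core exposes `smt_ways` logical CPUs, which is
    a guess and not something the OS has confirmed.
    """
    logical = os.cpu_count() or 1  # cpu_count() may return None
    if use_hyperthreads:
        return logical
    return max(1, logical // smt_ways)


print("with hyperthreads:   ", worker_count(True))
print("without hyperthreads:", worker_count(False))
```

On a 16c/32t machine this would give 32 and 16, the two thread counts being argued about in this thread.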
The difference in speed between the leader, Zig, and the runner-up, Rust, was a big surprise for me. I kindly request a more complete speed test (benchmark) between these two in a future episode.
In all cases thanks a lot Dave!
Zig doesn't use LLVM for compilation, it has its own optimized compiler I guess
@@hamzanajji8615 Zig still uses the LLVM toolchain for the compiler. The main reason Zig is so much faster, however, is that the compiler scans for code whose output can be determined at compile time and replaces the code with that output. C++ has something similar with constexpr, but iirc it's not as powerful.
All u need is Assembly
It would be great to compare the generated assembly for each of the languages to see why the performance for each one is the way it is! Also, it would be cool to do some performance profiling to see where the bottlenecks of each implementation are. Great stuff!
For those that compiled to assembly, I agree completely
I wonder if Zig generated well-vectorized code, with precompiled sections. I've worked in HPC for decades. I write in C with intrinsics and assembly. I doubt there is any assembly generated by Zig that I couldn't achieve with C. I guess the only argument is that the Zig code might be easier to read. Which may be a winning point, because some of the code I have to deal with is obtuse in the extreme.
@@tolkienfan1972 There's no magic. It's just that Zig is not using hyperthreading, thus getting the same performance out of 16 threads as Rust and all other langs get from 32.
Points are given for iterations PER THREAD.
@@dolorsitametblue if that's true, then it would be trivial to do the same for the other languages. Given that hasn't been done, it seems unlikely that that's the true reason, and I also suspect better vectorisation
@@thethiefmaster I like your videos! Keep it up!
I would like to reply something, but my comments keep getting deleted.
@DavesGarage If this is true, that would invalidate Zig's score as it wouldn't be an apples-to-apples comparison. If the rest of the languages are using 32 threads, and the scores are calculated on a per-thread basis, Zig should also have to use 32 threads with hyperthreading to make it a fair comparison. Either that, or disable hyperthreading in the BIOS and have ALL programs use 16 threads without hyperthreading to make it a fair comparison.
I'd heard Zig was fast but wasn't expecting such a huge difference. Shout-out to all the people who contributed to the project, amazing work
Indeed, shoutout to all the contributors that got ignored, because Dave randomly chose the leaderboard with only 10 languages participating.
@@noomade yes, it sounds like the ranking was per second per thread and the Zig submission was only 16 threads. The Rust submission was 32 threads. It had more completions per second overall but far fewer per thread. Seems like that could be optimized out.
@@NickWindham Ah, yeah, a smart person would eliminate the second logical core on each physical core, because effectively two hyperthreads can only run at like 55-65% the bandwidth of two proper physical cores.
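Plugging that 55-65% estimate into a quick model shows why the per-thread metric punishes hyperthreading. This reads the claim as "two hyperthreads on one core deliver 55-65% of what two full physical cores would"; that reading, the 16-core count, and the name `smt_summary` are all assumptions for the sketch.

```python
PHYSICAL_CORES = 16  # assumption: the 16c/32t machine discussed in this thread


def smt_summary(pair_efficiency: float, cores: int = PHYSICAL_CORES) -> dict:
    """Compare 1 thread per core against 2 hyperthreads per core.

    pair_efficiency: throughput of two hyperthreads sharing a core, as a
    fraction of two full physical cores (0.55 to 0.65 per the comment).
    Throughput is measured in "full core" units.
    """
    no_smt_total = cores * 1.0
    smt_total = cores * 2 * pair_efficiency
    return {
        "total_gain": smt_total / no_smt_total,   # whole-machine speedup
        "per_thread": smt_total / (cores * 2),    # each thread vs a full core
    }


for eff in (0.55, 0.65):
    s = smt_summary(eff)
    print(f"efficiency {eff:.2f}: total x{s['total_gain']:.2f}, "
          f"per-thread x{s['per_thread']:.2f}")
```

Under this model, using all 32 hardware threads raises total throughput by about 10-30% but cuts the per-thread score to 55-65% of the 16-thread figure, so a ranking by passes/second/thread rewards leaving the hyperthreads idle.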
I'm definitely biased as a Zig fan, but yeah... It's not super legit as a victory; Rust by most comparisons squeaks out a win of +5% or more. I definitely like the composability and readability of Zig better for my brain, but Rust can definitely create the absolute optimal if that is what you need out of algorithms. I'd argue that the other Zig features make up for that little performance difference, but it's super subjective.
@@noomade Well, "artificial", I think is just better engineered to account for these factors
I honestly would love a follow up video going over the compiled code for Zig vs the other languages. The gap is so huge there must be something going on, or at least my intuition tells me so.
Yes, a badly chosen metric is the reason why zig basically got a x2 bonus: It uses 16 of 32 threads, so no Hyperthreading, which seems to make only a minor difference in overall throughput, but since Dave uses Pass/s/t, i.e. full runs of sieve per second per thread, you get a skewed result (or I am too stupid to understand why this is legitimate).
@@grafgrantula6100 The zig solution has a lot of thread contention which means that an additional 16 threads don't help. With 16 threads for both rust and zig, zig still wins. It's literally just a comparison of different solutions. That being said, zig is very useful for being able to express very unique solutions without any overhead in how the code is written or in performance. The zig solution is the best one for 16 threads, which was set out to be the target platform. If you say that it performs worse on 32 threads, that doesn't matter, because it wasn't made for that.
@@N00byEdge Sure if you only consider 16 threads then it’s the fastest, but that’s because it’s the only solution that uses 16 threads. The target platform was set out to be a 16c/32t processor so choosing 16 threads as the upper limit of threads is not utilising the hardware to its limit.
Passes/second/thread is a bad metric to use for the final ranking.
So, was the metric chosen in advance or last minute? Afaik, it was clear for a while and it sounds like rust-devs simply failed to optimize.
@@siematos1099 If the intention was to optimise for passes/second/thread then all of the comparisons should have been made with single threaded runs. Compared to the multi threaded results the single threaded ones completely blow all of the multi thread results out of the water when using that metric.
Jeez. I did not expect such a big difference in the top five.
The performance gap between 1 and 5 is substantial.
I would guess it's something to do with SIMD. Authors of Zig and Rust solutions know how to pull that trigger.
the gap between second place and third is significant as well.
I didn't expect the difference between the top two!
@@grafgrantula6100 I've added a 16-core run to the Rust solution now, so results are more comparable.
@@Nox3x3 Yup it's disappointing. Most of the good solutions are _only_ on the single-threaded leaderboard, since this was understood to be main leaderboard. Hopefully Dave will do a follow up and highlight those solutions too: people put a lot of work into them.
Testing Prime sieves is a fabulous thing to do, but it seems to me that a better test would use a few different kinds of problems and report results in each category, as well as a composite result, rather than just a single problem.
I expect that certain languages would be better at one class of problems than another, so on an individual class of problems, the ranking would, I think be different than on another class of problems.
I hope that you will seriously consider creating an updated competition accordingly.
Exactly!
yes, i made the same comment. The Computer Language Benchmarks Game does just that.
After doing embedded systems programming in C for over 30 years, (I am retired now), I am astonished at Zig's performance. Well done and good to know!
if you had truly been doing 30 years of embedded you would have noticed with a quick glance at the code that these results are cheated. They didn't accept optimized submissions for C, C++, Rust and other languages just for the sake of making Zig win, not to mention that the way the threading was programmed in the Zig code was completely different from the way it was implemented in the rest of the languages, meaning you're basically comparing completely different programs at that point. Everyone participated with a rule-instructed handicap, but Zig was on Winstrol during the race.
..
@@AlFredo-sx2yy Where did you read that they didn't accept optimised submissions from some languages?
I'm wondering how large the binaries were and whether a project programmed in Zig would even fit into or run on a bare-bones embedded microcontroller.
Fantastic series and results! Thanks to all who contributed, but especially the hard work of your small team. It would be awesome to take the framework you've built here to implement different algorithms across these languages and see what changes.
These results are definitely surprising to me. I expected all the LLVM languages to have very similar results. I wonder why the top two languages performed so much better, all other things being equal.
@@Nox3x3 Just to make sure I'm not missing something, are the Rust and Zig implementations using multithread vs C and C++ using only single thread? That would be a completely unfair apples to oranges comparison. Like others have commented, the difference shouldn't be this large. Something else must be going on. If this is really legit I would ask that the generated assembly be compared.
@@greatwolf. Looking at the scoreboard it looks like he's reporting the passes per second per thread score, but that seems a little funky. There's something weird going on with the scoreboard for that particular metric, because the Zig solution runs with half the cores available while nearly every other solution runs with thread counts equal to core counts (e.g. on a 32 core system the Zig solution would use 16 threads while nearly every other language would use 32 threads). That results in Zig actually having a lower total pass count, but a higher pass per thread count. E.g. in one example report Rust turns in the 1st place result with 946K passes in 5 seconds, as opposed to Zig in second place with 838K passes in 5 seconds, but due to the lower total thread count of the Zig solution, when normalized across threads it ends up with 10K per thread as opposed to Rust's 5.9K per thread.
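To make the arithmetic in that comparison concrete, here is a minimal sketch in Python. The pass counts, the 5-second run length and the thread counts are the ones quoted above; the function names are made up for illustration and are not part of the benchmark's tooling.

```python
# Pass counts and thread counts as quoted in the comment above.
# The helper names are hypothetical, chosen only for this illustration.
SECONDS = 5

def passes_per_second(total_passes, seconds):
    """Raw throughput of the whole run, regardless of thread count."""
    return total_passes / seconds

def passes_per_second_per_thread(total_passes, seconds, threads):
    """The normalized metric the leaderboard in the video ranks by."""
    return total_passes / seconds / threads

runs = {
    "Rust": {"passes": 946_000, "threads": 32},
    "Zig": {"passes": 838_000, "threads": 16},
}

for name, run in runs.items():
    total = passes_per_second(run["passes"], SECONDS)
    per_thread = passes_per_second_per_thread(
        run["passes"], SECONDS, run["threads"]
    )
    print(f"{name}: {total:.0f} passes/s total, "
          f"{per_thread:.1f} passes/s/thread with {run['threads']} threads")
```

With these inputs Rust is ahead on total passes per second while Zig is ahead on passes per second per thread, which is the whole disagreement in a nutshell.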
As for why there's such a huge performance gap between Rust and Zig and everything else, I'm not entirely sure. It's possible it's just down to not as much effort having been put into tuning the other solutions, or maybe there's some gotcha related to the way the algorithm has been implemented. I personally suspect that there's some kind of optimization that's being done behind the scenes by the compilers that gives Rust and Zig a significant edge (and arguably might be violating the spirit if not the rules of the competition). Maybe there's some new-ish LLVM optimizations that both Zig and Rust can take advantage of that others are missing out on? The C++ solution is using clang to compile which I would expect would also be able to take care of similar LLVM optimizations if they exist, but maybe it's missing some pragmas or something that it would need to do so.
Part of the problem with these sorts of benchmark competitions is of course that they of necessity ignore things like readability and maintainability, something that a lot of languages will take performance hits in order to improve. I think an interesting take on this would be a version that follows each language's best practices and style guides. Of course then you have the problem of deciding what that actually means, since there's often disagreement about what's actually best practice.
@@orclev Regarding the "best practice" part, not only is there disagreement on what that means, but it also depends on context. What might normally count as "best practice" may go straight out of the window when the thing you actually need to do is optimize some hot loop. Suddenly, "best practice" becomes checking the assembly output, inserting compiler hints like [inline] or [unreachable], running performance-guided compilation, customizing optimization passes schedule and purposefully omitting runtime checks in favor of manual proofs of correctness.
@@orclev " nearly every other solution runs with thread counts equal to core counts"
That's a wrong statement. The Ryzen 9 5950X is a 16-core processor.
@@user-py9cy1sy9u The scoreboard itself uses the terms core count and physical core count, E.G. that same Ryzen processor is listed as 32 core count, 16 physical core count. It is an interesting point that maybe Zig is scaling based on the physical core count, although it raises the question of why the other languages don't.
Those results are surprising. I wouldn't have expected Zig (or any language) to improve over C or Rust performance in any significant way. I'd really like to see a breakdown of where those performance gains came from.
Dave's results page shows that Zig only beat rust in the performance per thread category because it was the only build which used 16 threads rather than 32. The machine is 16 core hyper threaded, so it's pretty obvious why 16 threads would beat 32 on a per thread basis.
And in my experience the thing you said is in fact true. Did some naive benchmarking myself. At least in the things I tried, Rust is always slower than C by about 2-10%.
@@sparky173j
Thanks for "catching" and pointing this out, as it seems to be a bit of an "oversight" in the rules/conditions for the "race" !?
Which leads my uneducated mind to a couple of questions
* Firstly, could there be any "good reason" to measure the performance "per thread", as You say is done, rather than just (naively) "measure performance/time"?
* And secondly, if there is such a "penalty" from utilising "more threads than cores", how come ZIG was the only language that had "its winning contender" coded that way?
I would have thought that some "C", "C++" or "Rust coder" would have taken the "top contender" for "his language" and then "simply converted it" to only use 16 threads, thereby producing the most "performant" (according to the "rules and conditions" for this contest) version of the program for "his language".
Anywhoo thanks for sharing Your knowledge.
Best regards.
@@sparky173j Uhh I just went to the result browser and you are right. Dave's results is the "operations per second PER THREAD", but he totally ignored that different solutions use different AMOUNTS of threads. And in some cases he ran more threads than the CPU has actual cores. What a bad benchmark metric!
If you sort by total operations, Rust wins. In fact one guy "rbergen" has 6-core results for i7-9750H where everything runs with 6 threads and the leaderboard is: Rust wins at 6590/sec/thread, then Zig with 4754. Go into the result list, choose "Filter preset: Multithreaded leaderboard" and see for yourself.
In fact, I just checked davepl's own Ryzen 5950X results from today, and his 16-thread results are Rust winning with 11172/sec, and Zig second with 10500/sec. Does he not know how to read his own results?
It just goes to show that this video is a typical Dave Plummer "programming drag racing" video: Badly researched and badly presented, lol. This is not the first time I've taken issue with his videos about this topic.
@@sparky173j so it is misleading
Like many others I'd love to see an analysis of _why_ the zig result is so much faster. There must be something great we can learn from that
Hardware and thread usage. When it comes to speed of this nature, results are highly dependent on the exact hardware.
It's because they wanted it to be, not because it is.
Zig uses 16 threads (other langs use 32) on a machine with hyperthreading.
And the performance is compared by iterations *per thread* .
That's it.
They cheated and precomputed a bunch of primes at compile time. Zig defenders argue that "this is a key feature of zig! So that's why it should be allowed!" But rust macros could do that too and they didn't because it was against the rules. Why Dave allowed zig to count with this implementation is beyond me.
@@jabadahut50 The difference is that the Zig compiler supports this out of the box, while in Rust you would have to write your own macros, which would technically be against the rules, as the written software needs to be as close to the original algorithm as possible.
This is also probably one of the reasons why Assembly isn't in the Top 5, because compilers can 'cheat', while a solution written by humans needs to be true to the original in form as described by the rules.
It would be interesting to see a breakdown on why the top 5 languages differ so much in performance. Definitely gained a new respect for Zig and Rust today.
Great video : )
Me too - and if it's so easy to use libraries written in Rust in C-code, I started to wonder if I should start learning Rust and consider that instead of pure C in implementation of a library I'm writing (similar to aalib and cacalib, but somewhat different approach taking advantage of 256-color terminal emulators and extended ascii/utf-8 dithered block characters).
Provide it as Rust library, and as C-library wrapper around it - and as Perl module using the C-library, as that was one of my original goals as well.
I wouldn't put too much faith into your new found respect. There is no way a virtual language is going to make the top 5. This video is complete crap
There are 3 kinds of lies: small lies, big lies, and computer benchmarks. Not saying that this benchmark has been cheated, but you can't judge the performance of a language on a single program. I am very surprised that Dave, with his experience and knowledge, doesn't know that. This test has exercised only 5% of the features of each language (mainly the optimization of a small inner loop) and is therefore pretty meaningless. For many years, the most significant benchmark of language speed has been "The Computer Language Benchmarks Game" (it has changed name several times over the years; it used to be "The Great Computer Language Shootout"). This benchmark tests a bunch of languages and compilers across a dozen representative programs that together test most of the core features of each language. In the same fashion, people are free to optimize the code of their favorite language, and calls to libraries other than the standard library that comes with the language are not accepted.
In my experience, this benchmark has been quite representative of real world performance.
In that benchmark, C is still the fastest, followed by C++, and Rust just behind. Java falls behind Julia, C#, Ada, Haskell, and is roughly on par with Go and Swift, the difference between these languages in the real world would be less significant than the skill of the programmers. Unfortunately, Zig isn't tested.
I wouldn't have expected the performance jumps between the three fastest languages. I assumed they would be almost equal, with one language marginally faster than the other. But the jump from C to Rust and from Rust to Zig is insane.
There's definitely something strange there. Either the Zig and Rust implementations have been so well optimized and say, C and C++ not, which I highly doubt, or the respective compilers are just that much better, which I also doubt, or there is something else at play here.
@@nowave7 It could be interesting to recompile those two and C or C++ to see the difference. Did it optimize some of the operations into matrices and make use of SSE2 or other extensions? Maybe really efficient and clever use of the extended registers? Or, due to their better compile-time memory handling, were they able to skip some of the more costly memory operations?
Zig's speed doesn't surprise me at all. One secret that Zig has is compile-time execution; you can program in Zig itself to do some of the computations at compile time. It's possible to replicate this in C++ with a judicious use of template metaprogramming (and constexpr), but the template metalanguage is not really the same "language", and it's much harder to use. It's much more weakly typed and therefore error-prone, for a start.
This is especially useful on this problem, such as to compute the wheel settings in advance. In C or C++, you'd have to either calculate it at run-time, or write a code generator to generate arrays with the data in them, or something like that. (Note that simply dumping the wheel settings into array and checking that in would probably be acceptable in a "real" program, but it's against Dave's rules.)
IMO, good support for multi-stage execution should be mandatory for any modern language that claims to be about performance.
Note that this isn't why Zig won, but it's an advantage that Zig has over the other contenders.
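For anyone wondering what "computing the wheel settings in advance" looks like, here is a rough sketch in Python. Python has no compile-time stage, so the module-level constant below only mimics the idea: the table is derived once from the basis primes by code, instead of being typed in by hand. The function name `wheel_gaps` and the choice of the basis (2, 3, 5) are assumptions for illustration, not taken from the Zig solution.

```python
from math import gcd

def wheel_gaps(basis_primes):
    """Return the wheel circumference and the cyclic gaps between the
    residues that are coprime to it (the only candidates worth testing)."""
    circumference = 1
    for p in basis_primes:
        circumference *= p
    residues = [r for r in range(1, circumference + 1)
                if gcd(r, circumference) == 1]
    gaps = [b - a for a, b in zip(residues, residues[1:])]
    # Close the cycle: step from the last residue to the first residue
    # of the next turn of the wheel.
    gaps.append(circumference + residues[0] - residues[-1])
    return circumference, gaps

# Computed once at import time. In Zig (comptime) or C++ (constexpr) the
# equivalent table could be produced by the compiler instead.
WHEEL_SIZE, WHEEL_GAPS = wheel_gaps((2, 3, 5))

print(WHEEL_SIZE, WHEEL_GAPS)
```

Starting from 1 and stepping by these gaps in a loop visits only numbers not divisible by 2, 3 or 5, which is what lets a wheel-based sieve skip most candidates.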
@@dolorsitametblue since it is hyper threaded and all the calculations will use the same parts of each core, hyperthreading doesn’t produce any extra performance really, probably lower due to the overhead of handling the extra threads. A better performance metric should have been the overall number of passes. It currently just makes zig look like two times better because it doesn’t use hyperthreading.
@@DeGuerre In modern C++ you could do all of that with constexpr which would not be a different language at all. No need for template metaprogramming unless you are using an older version of the language.
I've been considering learning Rust, and this really gives me a good reason to start looking into it. I'd be interested to see a breakdown of languages that fit similar niches compared to each other in a future video now that you have so much optimized data.
Congrats to Dave and the community, this was a terrific idea and executed fantastically! I hadn’t even heard of Zig prior
Was it? There were two leaderboards, one was strongly contested with more than 50 languages represented, and the other one made it into this video.
Can't wait for the in-depth analysis videos, I'm quite interested in what could be going on to cause such a drastic performance difference between the 'top' programming languages!
Interesting results. In fact, so interesting that I think Dave or someone else is bound to jump on some of the questions this raises. In terms of the metric, comparison, threads, the algorithms and whatnot. I can't wait to see the follow-ups on this.
Wow. I was NOT expecting this much of a difference. Also kinda surprised Java made the cut!
A video dissecting and explaining the differences between results would be SUPER rad.
An almost 10x difference between C and Zig indirectly points to different implementations (aka algorithms) having been used for different languages. I.e. the author of each implementation wrote the code on their own, instead of implementing the same algorithm. If so, then these tests are actually testing nothing.
@@AlexanderBorshak I haven't read the code base for these tests, but translating the algorithm from language to language doesn't make use of the specific advantages and optimizations of any of them, making it, in my opinion, a very biased benchmark. If this benchmark is made on the best implementation possible in each language, then for me it's a fair benchmark for every language, since if you have features that make your code better and faster, you will simply use them.
@@btotta Yes and no at the same time. If someone writes the algorithm in Java with complexity O(n), another in C with complexity O(3*n), and another in Zig with complexity O(2*log(n)), then what are we trying to compare here? Obviously not the languages' speed. I landed on this video from Reddit, where it was mentioned that SIMD instructions were enabled for Zig but not for C, possibly because the Zig compiler allows enabling SIMD without any effort while the C compiler used requires much more effort to use SIMD. If so, these tests can barely be called language speed benchmarks, IMO. I.e. we cannot rely on the test results and say "Zig is 7 times faster than C", because that is not true and these tests do not reflect the actual speed of the compared languages.
Why so surprised Java was one of the 5 fastest languages? It's had millions upon millions of dollars poured into the VM, of course it'd be fast
I think it's really interesting that Rust and Zig were so much faster compared to the C/C++ versions. I'm definitely looking forward to a deeper analysis of the solutions.
tbf, zig was one order of magnitude faster than the rest, so rust is more comparable to c than zig to rust
@@antronixful zig was 2x Rust, but rust was 3x C/C++, so I'd say rust is closer to zig than to C/C++, and zig is definitely not an order of magnitude faster than rust.
@@snapstromegon ?
@@snapstromegon the ratio is something like 1:3:6...
thus, 3-1 < 6-3
i'm not mocking or anything, but try to scale that to omega big numbers, you don't have to look the samples separately, look at the entire "universe", that in this case is {language in top five | language=zig or language= rust or language=c(and others)}
also, it is important to calculate the distance between samples to make them more comparable to make statistical analysis, and being fair, the differences using the numbers provided in the video are slightly more similar than the ones i told, but that's it...
in other words, if you were to get arithmetic mean between every sample, the distance between the mean and rust will be less than the distance between the mean and zig
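The two sides of this argument are measuring "closer" in different ways, and a small sketch in Python shows both. The scores below are just the rough 1:3:6 ratio mentioned above, not measured results, and the helper names are invented for this illustration.

```python
import math

# Illustrative scores only: the rough 1:3:6 ratio from the comment above,
# not actual benchmark numbers.
scores = {"C": 1.0, "Rust": 3.0, "Zig": 6.0}

def absolute_gap(a, b):
    """Distance on a linear scale (the '3-1 < 6-3' argument)."""
    return abs(scores[a] - scores[b])

def log_gap(a, b):
    """Distance in orders of magnitude (the '2x vs 3x' argument).
    A value of 1.0 would mean a full factor of ten."""
    return abs(math.log10(scores[a] / scores[b]))

for pair in (("Rust", "C"), ("Zig", "Rust"), ("Zig", "C")):
    print(pair, "absolute:", absolute_gap(*pair),
          "log10:", round(log_gap(*pair), 3))
```

On the linear scale Rust sits nearer to C, on the ratio scale it sits nearer to Zig, and under this ratio none of the gaps reaches a full order of magnitude.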
@@snapstromegon update, i did it myself
μ≈4136
σ≈3874
and you see, μ±σ covers every other language but zig (including rust obviously)... so yeah, even maths agrees with me
(i did it just with the top 5)
It would be interesting to compare the binary code from Zig to the others and try to reverse engineer what is actually going on. I haven't used Zig, but it sounds like it may be taking the original code, then predicting what it's trying to do and then cheating at compile. I suppose that isn't technically cheating on the part of the programmer but again, it would be interesting to actually reverse compile the binary code to see what it's actually doing.
Could be running the sieve in compile time and just caching the result inside the binary. That would mean it just prints the stored results, without any computation.
That is indeed a very big selling point for Zig that it does 'comptime' calculations. However it would be foolish to think the testers didn't think of that. I assume the input is given to the program using stdin/file etc. at runtime.
EDIT: The results shown are for multi-threaded cases (if I'm not wrong). Overall, Rust wins with around 13k passes/sec/thread (by a small margin, but a win nonetheless), and guess what, this result is single-threaded.
@@VivekYadav-ds8oz If one didn't know, one wouldn't catch it. I've been "caught out" more than once by a "feature" that came back to haunt me.
One way to test this would be to allow a max int value to be passed to the binary at runtime; then there’d be no way to predict the user input, so compile-time calculations couldn’t happen. I’m only presuming that’s possible using my knowledge of the Linux CLI and how commands (which, by my understanding, are individual binaries) take arguments at runtime. Is that doable?
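It is doable in principle. As a minimal sketch of the idea in Python (not how the contest harness actually works), the sieve limit below is read from the command line, so nothing about the result can be known before the program starts. The script layout, the argument name and the default of one million are assumptions made for this example.

```python
import argparse

def sieve(limit):
    """Plain sieve of Eratosthenes: primes up to and including limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    factor = 2
    while factor * factor <= limit:
        if is_prime[factor]:
            # Cross off multiples, starting at factor squared.
            for multiple in range(factor * factor, limit + 1, factor):
                is_prime[multiple] = 0
        factor += 1
    return [n for n in range(limit + 1) if is_prime[n]]

def main(argv=None):
    parser = argparse.ArgumentParser(description="Count primes up to LIMIT.")
    # The limit arrives at runtime; the default is only a convenience.
    parser.add_argument("limit", type=int, nargs="?", default=1_000_000)
    args = parser.parse_args(argv)
    print(len(sieve(args.limit)))

if __name__ == "__main__":
    main()
```

A compiled language could do the same with its own argument parsing. A compiler could still specialise code for likely inputs, but it could no longer bake the finished sieve into the binary.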
Not really, when the fastest zig solution is doing some inlining, it is not predicting nor doing much at compile time outside inlining, this is just some well written solution haha
I'm a fan of this contest because of the fun education it brings about optimisation, efficient coding techniques, multithreading, language comparisons... win-win all around.
The discussions around 16 and 32 threads are interesting too, given that there are 16 physical cores where each core can do hyperthreading.
Really nice comparison. I did not expect rust and Zig to outperform C so drastically. Looking forward to the language tours :) just subscribed and thanks for interesting content
Kudos to Dave and everyone involved in this impressive effort to test almost 100 programming languages. I was surprised to see Java make it into the top 5, given it isn't compiled into native code. However, seeing Rust and Zig leapfrog the formidable competition is even more amazing. I'm looking forward to the upcoming GPU vs CPU episode.
The secret sauce of Java is the hotspot compiler. At runtime it will pick the most used portions of code and convert the byte code to machine code. That’s where the performance gains come from.
It isn't surprising at all, considering Dave used the leaderboard with only 10 languages participating. It bothers me how much Dave boasts about the number of participants when he so casually disregarded the vast majority of people's effort on the leaderboard he ignored.
@@GordonChil I'm sorry if I'm wrong, but isn't that JIT? Which means that it can be either a hit or miss depending on how predictable your code is during runtime. (still learning please correct me if wrong)
Don't forget Java has Just In Time compilation (JIT)
Very interesting. I'm not terribly convinced of the ranking, but I really appreciate the introduction to the languages with the ranking as one factor to consider. I hope you will dive into the reasons for the differences. It looks like there really isn't much difference among the top ranks in single-threaded.
Shockingly impressive! It would be great to see at some point a performance analysis of the best solutions to understand why Zig is so fast compared to C and Rust.
I'm wondering the same. Something weird must be going on with the benchmark. I don't know about Zig but even the Rust numbers look very suspicious. I love Rust and think it's the best language in existence right now, but it definitely shouldn't be 4 times faster than C/C++. It can be faster sometimes because the compiler can enable optimizations that would be unsafe in C, but not by a factor of 4. Something weird is going on there.
@@DefnDKMC Maybe it's Dave's constraints?
Better use of threads? Or compile time calculation?
If it's the former, I might convert.
The benchmark is flawed. Dave compares sieve passes per second *per thread*, which basically means that the benchmark doesn't deal with the raw passes. This obviously skews the overall metric to favour solutions with fewer threads, since launching more threads just incurs more overhead, and it turns out the metric being used was passes done by one thread.
Using this metric, Rust wins with 13k passes/sec. And guess what? This solution was single-threaded. And single-threaded solutions were not included.
@@VivekYadav-ds8oz yeah, IMO it should be passes/second single threaded and passes/second multithreaded on a given CPU. Letting Zig claim twice the performance by turning off multithreading is ridiculous.
The compiler could also make a substantial difference. Around 1995 I compared different C compilers and there was quite a large difference running the same code compiled with borland, djgpp, quickc, and gcc. Gcc won, though not an equal test since that one was on linux instead of dos.
I raised an eyebrow at Rust making the list ahead of the most common languages, because I was shocked, but I've heard of Rust before. I am a python, go, and Javascript/Typescript developer so I knew my favorite languages were not going to make the top 5 although I am curious where golang ended up.
I am floored by zig, a language I've never heard of, dominating the list. I'm now hanging on every word, Dave! This is a very fun video! Thank you!
I'm looking forward to the GPU versus CPU video because I am familiar with shaders and how they can use the GPU to do some very impressive visuals on screen with next to no perceivable impact to phone performance because it utilizes the often dormant GPU rather than the heavily taxed phone CPU to draw to the screen.
Thanks for the great content! I'm loving your videos even though I think you played a major part in those windows progress bars I have stared at for a lot of my career! (I got paid to watch a lot of them, so I'm not really mad) 😆
Read the Description of the Zig Solution. They are using compile time code execution to precompute primes. This is not cheating, because compile time code execution is one of the most important features of Zig. The speedup by a factor of 6 over c/c++ cannot be achieved without some out of the box tricks.
The problem is that this speedup factor is not possible for more general tasks, like a database query or a rendering engine.
Thanks!
The best answer in the comments: it explains the issue instead of whining about why Rust should be #1 and that Zig cheated its way to the top spot.
> This is not cheating, because compile time code execution is one of the most important features of Zig
With all respect to Zig, it just means they had a different algorithm at runtime than the other competitors. Dave talked about that: the algorithm should stay the same.
And I totally agree that for more general tasks it is not possible. So, addressing the "one of the most important features of Zig" argument, let's imagine that we created a special language whose most important feature would be printing out prime numbers up to one million, and we just hardcoded these numbers into the language's source. So it would be an O(1) task for the language, because it would just print out the stuff it has in memory. I think it would be the same sort of "compile time" trick.
"This is not cheating"
Of course it's cheating. It explicitly violates the rule that no prebaked information can be included about primes other than 2. Just because the prebaking happens at compile time makes zero difference.
@@isodoubIet so what language do you think should be the winner?
I was legitimately surprised to see C++ so far down in this list given its powerful compile-time features, so this was very interesting. While I don't use Zig myself, it's impressive to see how much the compiler can do in this language to optimize the runtime code. Very excited for a deeper dive into the solutions to find out the sources of these massive performance gaps
I can't wait to see the CPU vs GPU comparison. This series has been heaps of fun and I'm glad it's not over just yet. Thanks for your and everyone else's efforts to bring this to life.
I also have to go take a look at Zig now. I can hardly grasp how much faster it is than the rest.
Honestly, I've never heard of Zig, and I'd love it if you would do a video focussing on explaining what Zig is all about and how it differs from Rust or C/C++. Great video, and great effort by all those contributing!
Those are amazing results for Zig! I never heard of the language prior to this competition and so, naturally, I need to do a bunch of research on it. Thanks for pulling this together and thanks to everyone who contributed and helped to manage the competition. Bravo!
That was cool.
The gap from C to Rust then to Zig was quite surprising. As someone who used to write some C and now writes a combo of Python and Rust ( with a smattering of allsorts in between ) it was very interesting indeed.
As for the CPU vs GPU, I wonder what tools exist around compiling, optimisation and memory management?
I'd imagine there might be some interesting results here, even changing drivers about.
Another interesting comparison would be comparing NPU or FPGA results to GPU and/or CPU. Only if you can manage to get your hands on one of course.
That's a serious leap to Zig there, a clear doubling on the final spot is quite something. And your bit about how assembly is hard to optimise for is a great insight into the very nature of programming languages as a concept.
The CUDA programming sounds very interesting and I look forward to that episode
Expert level hand-written assembly, 100%, should have been at position #1. That it didn't is a shocking and damning reflection on today's programming skills. It is embarrassing beyond words that Dave didn't pick up on this !!!
As someone who loves the concept of Rust and has used it from time to time, glad to see it rank well. At a previous employer they refused to let me have access to a powerful machine and code I had written in PowerShell to interpret a large file set literally took hours to run. Moving to Rust it was done in minutes. It’s a great language.
Thanks for what you do Dave. One of my active projects is providing a place where people in rural areas can access great technology training and job opportunities. We plan to launch coding courses this year as well as courses related to OS and Security issues. I’m sure I’ll be referencing some of your content along the way.
Keep on trucking Dave :)
The wide gulf between Zig and everyone else didn't pass the smell test, so I pulled the code and ran the benchmark on my system to see if that was still the case 8 months later.
Single -threaded: Zig 14850.57, Rust 14832.62.
Multi-threaded: Rust 14545.44, Zig 13082.760
Also the Java solution is suspect. Three of the solutions are based on outdated versions of Java (8, 16, and 16).
Thank you, this really helps when deciding which programming language to learn and implement for the more casual programmer!
Very interesting. I'd never heard of Zig before, and I was almost certain that Java wouldn't make the top 5. Would you be willing to do a review of the disassemblies in an attempt to compare and contrast?
What an amazing project this was (and still is, I suppose) and what a great insight it gives into the wonderful world of programming languages. Thanks for this amazing content Dave.
I think it would be cool to see build size and system memory usage in these top contenders. I wonder if Zig’s usage of heap allocations helped with its speed at the sacrifice of memory allocation.
I've really enjoyed watching episodes of Dave's Garage ever since I found his channel. It is piquing my interest in code, something I haven't touched since I was a kid. Thanks, Dave!
Super interesting. I definitely didn’t expect a language I’ve never heard of to take the cake. Thanks for the video.
I'd never heard of Zig, so that was a surprise. I'm currently learning to use Rust and love how simple it is to write code and documentation with an included markdown feature. It seems like the toolchain grabbed the best of C++, Doxygen, and Markdown ... added stuff I'm still trying to understand, and came up with something less OOP without losing anything. $0.02
Those are some interesting and unexpected results. Didn't expect such a big jump between C and Rust nor for Zig's #1 spot. Time to learn Zig!
Zig probably uses SIMD automatically ;-)
It probably used a mix of SIMD and compile-time execution, automatically or semi-automatically. That’s where all the speedups come from. Still, it seems like a nice language to learn.
Someone in the Zig subreddit that contributed to the project said that for the Zig code they used SIMD everywhere they could. C and C++ only used it in the single threaded implementation.
TLDR: The single-threaded benchmarks show a different picture, as they're the ones people focused on.
@@igorthelight so do C and C++. even where it's not expected, such as copying a small struct, which uses (v)movaps/movups from SSE. something is very wrong with the test here.
This was fantastic Dave. I was expecting C to win but was blown away by the results of Rust.
I did not even know Zig existed and it destroyed everything. This result made this series one of my all time favourites.
I am hoping the GPU might beat the CPUs but I have no clue. I am so excited.
One snag in the Zig result is that it spawns one thread per physical core instead of one per hardware thread. So it only spawned half the threads of the Rust implementation.
Sounds impressive, right? Except that the result is based on the passes per thread, so divide the Zig result by 2 and you have a more real-world comparison to Rust.
Someone just pushed a change to the Rust code to spawn one thread per physical core in that implementation, so now it's comparable, and it wins against Zig.
The test was simply wrong, now that it's fixed Zig fell behind again.
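To put rough numbers on the per-thread point, here is a small sketch using the pass counts quoted in an earlier comment in this thread (Zig 16t, Zig 32t, Rust 32t). The 5-second run length is an assumption, chosen because it roughly reproduces the quoted per-thread scores; the function names are made up for illustration.

```python
# Pass counts and thread counts as quoted earlier in this thread.
# RUN_SECONDS is an assumption: about 5 s reproduces the quoted scores.
RUN_SECONDS = 5

results = {
    "Zig 16t": (805969, 16),
    "Zig 32t": (899141, 32),
    "Rust 32t": (948969, 32),
}


def per_thread_score(passes, threads, seconds=RUN_SECONDS):
    """Leaderboard-style metric: passes per second per thread."""
    return passes / seconds / threads


def total_rate(passes, seconds=RUN_SECONDS):
    """Whole-machine throughput: passes per second across all threads."""
    return passes / seconds


for name, (passes, threads) in results.items():
    print(
        f"{name}: {per_thread_score(passes, threads):.0f} passes/s/thread, "
        f"{total_rate(passes):.0f} passes/s total"
    )
```

Under that assumption the 16-thread Zig run scores highest per thread, but the 32-thread Rust run completes the most passes overall, so the ranking depends on which metric you pick.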
I wonder if things really got tied down as well as they should've been with the requirements and guidelines for the comparisons. Going through some of the comments and code logic, it seems like some hardware factors were involved too, which tells me the race was much closer than it appeared. Either way, I am glad to know that in any performance-centered application you can't go wrong with the go-to langs like C, C++ and Rust; it's really nice to know about Zig too for future considerations!
Found this channel by the grace of the algorithm and this man is a gem and I’m pleased to have come across his content and personality, cheers!
Great explanation of the features, strengths, and differences between the top five languages in this test. It'll be interesting to see if the administrative overhead of assigning and receiving the tasks to the GPU will substantially affect the relative speed compared to the number of cores the task is assigned to.
Uh gosh, a 4080 has 9728 CUDA cores, each of which can do SIMD up to 16. You can expect a performance increase of a few 1000x if his algorithm can run completely in parallel.
It would be interesting to see the entire list of results. I have just started learning Rust so it was good to see how fast it can run.
I’m loving this series. It inspired me to write some prime sieves (and even a factoring tool) in BASIC, the only language I know. I ran it on my vintage Tandy hardware and had a blast looking for ways to optimize and speed up the code on the old hardware. Thanks for the content, Dave!
The different ways these languages go about accomplishing their performance is so interesting. Never been that interested in studying compilers, but now Zig has got me revved up to find out the details 😳
Check out Compiler Explorer. Check out the video "What Has My Compiler Done for Me Lately?"
Does Dave know about these things? I haven't seen very many of his videos.
Thank you Dave and team for running these tests. I did not expect Rust or Zig to do that well. I am interested in the GPU vs CPU results.
The difference between the top 3 entries is astonishing. After seeing C++ and C with only a marginal improvement over Java (quite surprised to see Java place so high), Rust's score blew me away. And then came Zig. Has the resulting code been decompiled to check whether the compiler applied any 'unfair' optimisations?
Apparently the scores are per thread, and Zig used half the threads compared to Rust and had less overhead and fewer memory bottlenecks because of that. Something weird is going on with C/C++ too for sure, they aren't 5x slower than Rust.
Honestly I was expecting good Java performance. After using the language for over 10 years it's actually pretty fast. The JIT is really good at compiling these types of tight loops and it produces code that's pretty close to the performance of C/C++ if you're not requiring anything that Java can't do easily. My question is how is Rust 3 times faster than C/C++?
Surprising results. Could you detail a bit the reasons for the large differences between Zig, Rust and C? Comparing the machine code / disassembly of the main functions could be interesting. Also, which solutions are actually using multithreading?
This is the question I also think should be asked: "Which are using MT?". My guess is that Rust is making heavy use of multithreading due to how easy it is in this lang, and that Zig is using some sort of compile-time evaluation to obtain such results.
@@irisaacsni Didn't he mention that you can't precompute the values beforehand? Otherwise the same can be done in Rust as well
@@phoenix-tt Does this apply to the compiler? Is metaprogramming prohibited?
Expert level hand-written assembly, 100%, should have been at position #1. That it wasn't is a shocking and damning reflection on today's programming skills. It is embarrassing beyond words that Dave - and almost everyone in this thread - didn't pick up on this !!!
I think the biggest takeaway from this video is that we are capable of making modern languages like Rust and Zig that are just as fast as C or C++. As you said with assembly, just because a language is lower level that does not necessarily mean it will be faster if people can't manage to write extremely efficient code in that language.
Thanks for the great content!
I hadn't really heard of Zig much before this, and I am SO impressed with the performance. I mean, basically doubling the second place winner, Rust no less, is no easy feat. Love the videos, keep it up!
I was going to move on to Rust after completing my study of C but am now considering Zig. Thanks for the informative video. Concise, precise and well delivered.
I heard about Zig for the first time about two weeks ago. Usually I write programs in C. But maybe I should take a look at Rust and Zig some day as well. I'm surprised D didn't evolve better.
They are definitely worth a look if you have an open mind for new languages. Zig is still in its early days but is already looking very interesting. Rust is more mature and can already be used in production, but it is still evolving (and very fast) and has a steeper learning curve.
No one wrote a good D solution. There is a compile time solved version (D is really good at this) but, obviously, it has not been included by Dave
I was not expecting such a large discrepancy between 1, 2 and 3.
I only program occasionally but Java was a big surprise to me to make top 5.
Given Rust and Zig's C relationship, an interesting follow-up video could be to run the top C example in Rust and Zig with as little modification as possible, to see any optimisations or other changes in the compiler etc.
I never understood why people think Java is so slow. It's used in many cluster, communication and database apps.
@@AnthonyJClink some people think that Python is efficient XD
@@ac4694 fair enough LOL
@@AnthonyJClink I think people got a bad impression from games that were, or still are, written in Java, like Minecraft and RuneScape. I think legacy code is the biggest problem: most code you see in the wild was written for Java 8, while Java 17 is so much faster than Java 8, so people can have a misconception of Java's performance because of it.
@@AnthonyJClink Never got that either. Java's VM has had millions and millions of dollars poured into it over 3 decades, it's quite possibly the absolute fastest VM based language out there
Very interesting results. Being an old-school 6502/Z80 (with the Rodnay Zaks ref guide)/x86 programmer who then moved on to various 3GL and 4GL languages, the results have re-opened my interest in learning some of the newer languages. Will be checking out some of the features in Rust and Zig. You never know, you still might be able to teach an old dog new tricks.
I still have my "Programming the 6502" book by Rodnay Zaks. I remember poking in machine code before I got my assembler.
That series is what brought me to your channel. I am not even a developer myself but I love the information!
Thanks Dave!! I'd be interested to see a breakdown of why the top 3 had such differences in speed. The results are very surprising to me. I would have expected only a few percentage points of difference.
That is an interesting outcome. I wonder how well this translates into real world performance. It would seem like rust or zig would be a good choice for microcontrollers based on the performance aspect. I guess that would depend a lot on many other aspects of the languages. I am very interested to see the gpu vs cpu episode.
Firstly, I've got to say I didn't expect Java to make it to the top 5, and also to be close to C and C++. That is very interesting!
The other thing that blew my mind is the order-of-magnitude performance difference between Zig and C. And Rust being about 4 times faster than C was also a surprising performance jump!
Pretty insane stuff.
Java is such a tricky language, like a double-edged sword. If written properly and carefully, Java is so fast. Many benchmarks have proven this for a long time now, you can google it.
Was hoping for Zig and also guessed it would be of double the Rust solution performance, so this video is extremely satisfying for me. Great video and the project itself, thanks to all who did it!
Java is a bit of a surprise, but it just showcases how little we know about possible language optimizations when judging from the outside, or even when knowing the language but not deeply (and I leave space for individual talent as well), as compared to experienced and dedicated programmers.
Very interesting comparison (and analysis from a perspective I have not considered). I'm still a novice learning Python right now, but my list of priorities of what to look into next changed significantly now... thanks!
Great to see the grand finale🎉. Pity the final ranking was not standardized on thread configuration to have more comparable scores.
How can C be faster than C++ if I can copy the C code and compile it with g++? If the generated machine code is different, then something is horribly wrong.
The only way to test speed is to write a real world application, not one routine. You need array stuffing and retrieving, math incl. multiply, divide, sin, cos and integer, floating point, fixed point, BCD etc., string ops and others that are used in real-life operations. You might have a hard time beating assembly or cross assemblers for portability.
Phil
Programmer since 1976 and unapologetic. I wrote many languages in my time.
Yup
Congratulations to the zig team, must be their strong efforts in data driven programming. However, also a strong result from Rust 🎉
No, it's because of their strong efforts on the multithreaded leaderboard. Most other languages put their effort into the single-threaded leaderboard, seemingly thinking it was the important one. I don't know why Dave decided to go with the leaderboard that clearly had far fewer participants.
@@Nox3x3 this is the correct response.. single threaded should be more competitive
As others have said, it's great to see a ranking of raw speed and it's certainly fun to see a competition like that. But that metric alone doesn't tell the whole picture when trying to figure out what language would be best to use for a given situation. I suspect if metrics included the consumption of various system resources on a weighted scale, we would see something quite different in terms of overall rankings. But still a fun competition none the less. Thanks for the great video Dave!
Thank you for posting this awesome video. I didn't expect Zig to be so much faster than the other languages.
It's interesting to see how different programming languages perform when it comes to prime number sieves. I was particularly impressed with Zig's performance, it's amazing how fast it was compared to the other languages. I also noticed that the video briefly touched on thread implementation, which is a crucial aspect of programming for performance. It would be great to see a more in-depth analysis of how each language handles threading and how it affects their performance in different scenarios. Overall, this was a great video for anyone interested in programming language performance.
Something must have happened with Zig and Rust for their results to be so much faster than C.
C was 4 threads (I think)
Rust was 32 threads. (16 core machine with 32 HT)
Zig was 16 threads (so as to avoid the slower HT virtual cores)
So around a 4x gain over the C implementation is sort of expected. Close to double the per-thread throughput for the Zig implementation compared to the Rust one would also be expected.
End of the day, Rust / C / Zig are about equivalent in speed ... in that order, everything being equal.
Zig does get a slight edge over C in rare cases, as it's more explicit than C and may give the compiler better info to optimise from
This benchmark is showing implementation differences, not compiler differences.
Having said that, Zig is awesome. It's not radically different to C, just a better C
@@steveoc64 so basically this whole video is BS if this is true
Hmmm, something fishy about the top two results there.
Those kinds of performance increases suggest something very different is happening under the hood.
Given the nature of the program, there isn't actually a need to compute much at all during run time, no?
Couldn't basically the whole program just be a constexpr function (computed at compile time)?
And to what level is computing at compile time vs run time cheating?
If it is not, then what we are seeing could be how fast Zig accesses the terminal to print out a constant string, not how well it has optimized run-time computation.
If the compiler decides something is constant at compile time, and therefore just reduces it to a constant, but the programmer hasn't explicitly told the compiler to do so, is it still cheating?
I would argue: yes. Because what we are seeing is the effect of something completely different.
If, however, the optimization is due to e.g. concurrency, then the computational work is still performed, and I would consider the result more in line with what you could reasonably expect to somewhat duplicate for a similar task.
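For anyone wondering what the compile-time worry looks like in practice, here is a rough sketch. It is not taken from any of the submitted solutions. `count_primes` does the sieve work on every call, while `count_primes_precomputed` just returns a number that was worked out beforehand. Python has no compile-time evaluation, so the module-level constant is only a stand-in for a value a compiler might bake into the binary, and both function names are made up for illustration.

```python
def count_primes(limit):
    """Sieve of Eratosthenes: count the primes <= limit, doing the work at run time."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    factor = 2
    while factor * factor <= limit:
        if is_prime[factor]:
            start = factor * factor
            # Clear every multiple of factor, starting at factor squared.
            multiples = len(range(start, limit + 1, factor))
            is_prime[start::factor] = bytearray(multiples)
        factor += 1
    return sum(is_prime)


# Stand-in for a result folded into the program ahead of time:
# there are 78498 primes below one million.
PRECOMPUTED_COUNT = 78498


def count_primes_precomputed():
    """Returns the baked-in answer; no sieving happens when this is called."""
    return PRECOMPUTED_COUNT


print(count_primes(1_000_000) == count_primes_precomputed())
```

Timing the second function tells you nothing about sieving speed, which is the concern raised above. Whether any of the submissions actually ended up in that situation is something only the generated code can answer.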
I am totally flabbergasted by the jump in performance of the top two. I would have never expected this.
Mind you, a comparison based on several benchmarks is necessary as there is no guarantee that the order of the finalists will be the same.
The original episode is definitely my favorite. Never heard of Zig. Thanks Dave and everyone else.
Congratulations Zig on being the fastest language! It's amazing to see the advancements and innovations in programming languages, and Zig's speed is truly impressive. Keep up the great work!
A comparison of languages that optimize for developer time would be interesting too (CPU time is cheap, developer time is expensive).
I'd say most modern languages are optimized for developer time, they just make different levels of compromise to fit their use case.
Even languages like Rust. Compared to C++, it shifts a lot of checks from runtime to compile time. So though it takes longer to write, it takes less time to debug.
What I don't understand is why manually hand-optimized ASSEMBLY is not the winner here ALWAYS? One should just grab the Assembly output from the fastest compiled language so far, and then have experts for that CPU inspect that manually and see what can be optimized further, and that way Assembly should ALWAYS be the fastest. I find that ZIG's DOUBLE speed performance IS at least somewhat SUSPECT. Is it because they ran it on _16_ threads only (no hyperthreading) and then multiplied the performance statistic by 2 to extrapolate 32 thread performance? That would not be realistic because 16 threads may fit better into cache whereas 32 threads would NOT. Where would I find the ZIG source code and the generated compiled Assembly code? Does ZIG (and Rust) allow multiple copies of the same code (=32 threads) to just execute IN ONE CACHE MEMORY copy? That could explain the huge boost.
And YES, I would absolutely LOVE to see the CUDA optimizations and performance. My bet: If GPU CUDA is well programmed it WILL run faster than CPU code when it reaches 64+ concurrent threads.
I'm no programmer, but I'm still surprised to see a language I've never heard of take the top spot. Especially by such a wide margin.
Good stuff as always, looking forward to CPU vs GPU!
Learning something new and relevant/interesting with every video. I hadn't even heard of Zig before and I would never have believed that there is so much headroom above C/C++. Happy to see Java doing so well, as it's still my weapon of choice for most applications. Please keep the great content coming Dave! 🙏
As a Java developer I was very pleased to see my favorite language make the top 5. I admit I was somewhat surprised to see Rust crushing C and CPP by such a large lead. But... then I watched the whole video. Jesus Christ, Zig...
That's kind of mindblowing. I expected that Rust would beat C, and once it hit second place I pretty much knew it would be Zig to top out, but the sheer difference between the top three absolutely baffles me! Definitely looking forward to the deep-dives, in the hope that they'll shed some light on exactly what's different between the languages and why the top 3 were so vastly different. Also eager to see how CUDA performs, because I don't immediately know how parallelisable this would be. I also think it'd be cool to discuss how SIMD may or may not help out with this problem.
I have seen some programmers underestimating zig, but for a language that is not even in version 1.0 this is incredible, I really hope to see its progress and how the community begins to consider it more for use in the industry.
I've been coding for about 13 years now (9 of them with actual engineering in mind :D) and never heard of Zig - definitely have to take a look! Thanks for the great series!
Always enjoy your language speed tests. Here's to 100!
It would be interesting to see the entire ranking of all 94 languages. We would then know what languages to avoid, or we could tell our managers why certain areas of the project are slowing the system down.
Spoiler: The fastest was COBOL.
No, Delphi 😂
Was the COBOL done on punched cards?
Are the people in this comment section trolling? How can there be so many people saying that they have never heard of Zig and praising this video without even checking the code to see why Zig supposedly won? Like, shit, I knew intelligence was in decline but I did not expect to see this level.
Great review! I am not surprised at the four of the top 5 but I am surprised at #1 'Zig' which I had not heard of before - more to learn!
Assembly...
I've been writing a lot of assembly firmware for Microchip PIC 8-bit µCUs over the last two and a half decades; once I decided to try a high-level language compiler. When decompiling the code I found a couple of interesting solutions I probably would never have thought about (one of which, putting two conditional jumps one after another, was mind-blowing, and I've been using it ever since in my assembly code).
In short, I'm not surprised that no one managed to write assembly code as optimized as what a good compiler could produce.
Expert level hand-written assembly, 100%, should have been at position #1. That it wasn't is a shocking and damning reflection on today's programming skills. It is embarrassing beyond words that Dave didn't pick up on this !!!
Yeah, no. There are clearly shenanigans going on here. This feels like a UserBenchmark AMD situation where the test is rigged to keep the obvious winners from beating the preferred winners. There is no way that properly optimized C code can be slower than other languages, especially given that 30+ years of work have gone into GCC and the "winners" over C are far newer with far less development. I've also seen some comments mention that optimized code for some languages was rejected.
I call BS.
So true, and the total omission of Pascal/modern Win32 Delphi Pascal, which is basically a better C++ in many ways, shows how imperfect and misleading these results are.
I had a strong feeling Zig might be a contender, when you mentioned that the usual suspects weren’t the fastest, but the difference between the top 2 & others is incredible.
Everyone seems surprised about Java making #5, as am I.. I thought it might have been Swift or OCaml or something like that 😅
Comparing "the language" is a red herring... it's the compiler that matters. And I don't buy for one second that someone skilled in ASM optimization could start with the compiled Rust ASM solution and be unable to find a single optimization to make it run any faster (considering the headroom of ~5000 p/s that Zig showed is possible), which would by definition put it in spot number 2 on the list. Nope, don't buy it.
I agree, analysis beyond the headline figures is required to understand what happened here.
Don't forget this is all clickbait for the channel
I think that's against the rules. You can't start with disassembled code
What a nice video, a fun competition for us programmers. What a surprise from Rust beating C/C++, and even more of a surprise with Zig doubling Rust's performance!
I did not expect the huge differences between the top 3. Thanks for this Mike.