Corrections:
3:22 It completed in 50% of the time and is 2x faster
5:24 It completed in 50% of the time and is 2x faster
6:06 Is between 1.5x and 2x faster
The "50% faster" thing will be good for comment engagement.
Haha any time I have a percent sign in my script I know I'll screw *something* up!
The big surprise for me is how similar the performance is between the RISC-V core and the Arm core. Thank you Gary. BTW, if possible, compare the sleep mode with the ESP32.
The Cortex-M33 Arm cores in the 2350 are rather old, very simple cores that lack the features that accelerate the faster cores. For example, there is no instruction cache, no data cache, no out-of-order execution and so on. Take all that trickery out and you will get comparable performance.
@@TheEulerID Instruction cache, data cache or out-of-order execution are all general concepts that are ISA independent (well, mostly; ooo can be a pain with some ISAs). Put "all that trickery" into both and you again have a similar performance.
@@petrkubena Yes, but the difference is that such features are way more developed on the more advanced ARM cores than what is available on the RISC-V as it takes a lot more development effort.
If all you are doing is comparing stripped-down RISC cores designed for use as microcontrollers, then providing the ISA designers have done their jobs right, then you will tend to get very similar results.
So, the OP was surprised by the similarity in performance. It should be the other way around; if there had been a significant difference in performance, then that would have been more surprising.
Am I bad at math? In some of the tests the 2350 is twice as fast as the 2040, as it completes the work in half the time. Twice as fast is 100% faster… not 50%… right?! 😊
Hmmm... You got me worried now... I hope I haven't expressed it wrong. I am working with this definition: If something is "50% faster," it means that the time it takes to complete a task or action is reduced by 50% compared to the original time. For example, if a task originally took 10 minutes to complete, and now it is 50% faster, the new time would be 5 minutes (50% of 10 minutes is 5 minutes, so 10 - 5 = 5 minutes). So in summary, if something completes in 50% of the time, it means it is 2 times faster or can complete the same task in half the original time.
You are right.
Who is "you", me or the OP?
You can't say that 400 F is twice as hot as 200 C, because they are from different worlds. If something is twice as fast, it needs only half of the time.
@@GaryExplains It takes 50% of the time but it's 100% faster
Great job Gary! You should do an in-depth episode on how well the 2350 overclocks, with benchmarking and so forth. It has lots of interesting hardware bits that go well beyond just performance too, such as the advanced DMA, the advanced PIO, the HSTX interface and other hardware bits for HDMI support! Great new board!
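The three figures being mixed up in this thread (speedup factor, "percent faster", and "percent less time") can be computed side by side. The sketch below is only an illustration of the arithmetic, using the 10 minute and 5 minute example from the comment above; the function name `speed_comparison` is made up for this example.

```python
def speed_comparison(old_time, new_time):
    """Return (speedup factor, percent faster, percent less time)."""
    if old_time <= 0 or new_time <= 0:
        raise ValueError("times must be positive")
    # Speed is work per unit time, so for the same work the speedup
    # is the ratio of the old time to the new time.
    speedup = old_time / new_time
    # "X% faster" refers to the increase in speed.
    percent_faster = (speedup - 1.0) * 100.0
    # "X% less time" refers to the reduction in elapsed time.
    percent_less_time = (1.0 - new_time / old_time) * 100.0
    return speedup, percent_faster, percent_less_time


# Example from the thread: a 10 minute task now takes 5 minutes.
speedup, faster, less_time = speed_comparison(10.0, 5.0)
print(f"{speedup:.1f}x, {faster:.0f}% faster, {less_time:.0f}% less time")
# prints "2.0x, 100% faster, 50% less time"
```

So "completes in 50% of the time" and "2x faster" describe the same result, but the matching percentage for speed is 100%, not 50%.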
Great info, Gary. Thanks for sharing this comparison!
Great video Gary. Can’t wait to see the next video where you test the FPU and encryption features on the m33
You and me both!
Interesting video. Thanks.
Amazing, thank you!
Excellent Gary. Thanks!
@GaryExplains Thanks for running these benchmarks and publishing the results! But I couldn't find the code in your GitHub repos to try and reproduce the results on the Pimoroni 2350 board I have. Would you be able to publish the benchmark code and the methodology that you used to run the tests?
Sorry for the delay, all the code is now uploaded: github.com/garyexplains/examples/tree/master/rp2350
You are best. Exclusive content.
I'm very impressed with the RISC-V core performance here. I was expecting it to be substantially slower than the M33 cores, so a draw is a very promising result.
It makes me suspect that there aren't actually two separate cores, but one hybrid core with a common ALU and separate decode stages. And this is the reason why 4 cores are impossible to run simultaneously.
They are not hybrid cores, that is plain. 4 cores need all the plumbing and infrastructure to run simultaneously, it only has that for 2 cores.
@@Механизм-ж9я I was thinking that too, especially with the power usage being identical. However, looking at the spec sheet, they are Hazard3 cores and fully independent from the M33 cores. They just multiplex the buses.
@@Механизм-ж9я you don't have to suspect anything, you can read the source code for the RISC-V cores and satisfy yourself that it is its own wholly independent design.
@@Механизм-ж9я We have the source code for the RISC-V cores available. We also know the M33 is a very specific Arm CPU. They wouldn't allow a hybrid anyway (Arm is fighting RISC-V hard), but even if they did grant an expensive custom license, they wouldn't allow Pi to dilute the branding of their cores.
When doing the deep-sleep test, please compare it with esp32
Hey Gary, thanks for the informative video. Clear and concise as always. Could you do a video comparing the RISC cores to the ARM cores? I wonder what their main strengths and differences are.
If you mean Arm vs RISC-V in general, I already have a video about that.
thanks for the interesting insights. how do the arm and riscv cores compare in floating point performance?
The results for the nqueens test do not make sense from a clock speed perspective. You would expect a 12% increase in performance between the 133 and the 150 MHz, however in the actual tests it's a 100% increase. What is the cause of such a drastic improvement? Is it that the new cores optimize instructions in some manner that doubles the throughput, or is it due to optimised C/C++ library routines?
The RP2040 uses the Cortex-M0+ but the RP2350 uses the Cortex-M33. The latter gives more performance per clock cycle.
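To separate the two effects discussed here, the observed speedup can be divided by the clock ratio; whatever is left is the gain per clock cycle. This is a rough sketch that takes the figures quoted in the question at face value (133 MHz vs 150 MHz, and a 2x result on the nqueens test) and assumes the same code is run on both chips. The function name `per_clock_gain` is hypothetical.

```python
def per_clock_gain(old_mhz, new_mhz, observed_speedup):
    """Split an observed speedup into a clock part and a per-clock part."""
    # Gain you would expect from the higher clock frequency alone.
    clock_gain = new_mhz / old_mhz
    # Remaining factor, attributable to more work done per clock cycle.
    per_clock = observed_speedup / clock_gain
    return clock_gain, per_clock


# Figures quoted in the thread: 133 MHz -> 150 MHz, test runs 2x faster.
clock_gain, per_clock = per_clock_gain(133.0, 150.0, 2.0)
print(f"clock alone: {clock_gain:.3f}x, per clock: {per_clock:.2f}x")
# prints "clock alone: 1.128x, per clock: 1.77x"
```

On these numbers the clock accounts for roughly a 13% gain, so most of the 2x comes from the newer core doing more per cycle.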
I would love to see a test which demonstrates the benefits of having floating point support on the RP2350 over the integer-only calculations of the RP2040.
Yes, that video is coming soon.
@@GaryExplains You just made my day! I struggled to use the RP2040 for DSP work, but the announcement of the RP2350 with FPU support has me stoked!!
Any news about the A/D converter? The 2040 one has a non-linearity in the middle, severely limiting its usefulness. Hopefully the 2350 has fixed it.
I've seen postings from RPi engineers saying it's fixed
I wonder if you could run the same tests you did a year ago in your video "Arm vs ESP32". You could simply add the 2350 results to the prev results. The only thing that was not present in the previous video from a year ago was sleep power consumption.
The RP2350 has hardware floating point support. Can you test the speed difference with a floating-point-intensive test?
I will likely do another video about the special features like the hardware SHA256 and the FPU. 👍
Only the Arm cores have an FPU. The Hazard3 is calculating with an abacus.
The floating point unit is my interest as well, as my current Pico project has some floating point math. I’m waiting for my Pico 2 to arrive. Thanks for the video, Gary!
Thank you. I assume that during the encryption test you did not use the hardware acceleration feature of the M33. So
Yes, correct. I will likely do another video about the special features like the hardware SHA256 and the FPU.
@@GaryExplains Decoding MP3 probably needs the FPU heavily?
I think doing a video on the sleep stuff is likely worth the effort. You could also try a bit of a reduced clock and if possible a bit of over clocking.
The RP2350 is faster and uses less power than the RP2040. It is a good product!
Very interesting. Waiting for pico 2 board at the moment - only seems to be available via official suppliers at the moment and on backorder. I am porting my Nintendo GameBoy emulator to the pi pico 2040 at the moment. Given these benchmarks, maybe I can get it running GameBoy Color on the 2350!
Thank you for the video, despite that the percentages were wrong 🙂 it would have been interesting to also compare floating point performance, as the cortex-m33 comes with FPU. It also has DSP instructions. There is no one-to-one mapping between these and the cortex M0, but a comparison would be interesting anyway.
I plan to do another video about the special features like the hardware SHA256 and the FPU. 👍
@@GaryExplains thank you, looking forward to it!
Nice talk! Do you have a git repo for us to look at the code?
Thanks. Yes, I have a GitHub repo, as I mentioned in the video. Just Google for "garyexplains GitHub" and you will find it 👍
I am a noob and have a question: does the Hazard3 have something like "SSE2"? If not, it will be much slower than Arm, it seems to me.
Neither the Arm Cortex-M33 nor the Hazard3 has something like SSE2; these are microcontrollers, not general-purpose application processors.
How is the RP2350 on integer divisions, compared to the RP2040? I'm under the impression the RP2040 has to do it in software (and also has no FPU), and hopefully the RP2350 cores have hardware to do that faster. It could still be different for 32 bit or 64 bit calculations though. For some applications, doing a lot of integer arithmetic in a loop, the lack of a hardware divider could make a lot of difference.
The RP2040 had a special divider circuit that the Raspberry Pi engineers added to the chip; it was actually quite good at integer division. See www.raspberrypi.com/documentation/pico-sdk/hardware.html#hardware_divider The Cortex-M33 and the Hazard3 both have integer division built-in.
@@GaryExplains Interesting. Thanks. I'm a bit surprised by this sentence, there: "On RP2350 there is no hardware divider, and the functions are implemented in software". I'm also having trouble working out if it's possible to make efficient use of, say, a 16 bit divider to do 64 bit arithmetic and wondering how many bits that one does in one operation. The part that says "The divider calculates the quotient / and remainder % of this division over the next 8 cycles," suggests 8 bits, to me. I once wrote an integer division routine for a Z80. I forget a lot of the details but it had to loop for each bit, doing long division (quite simple, because it's binary) so the time taken was proportional to the number of bits allowed for. I expect a hardware method still has the same limitations, but just does it quicker.
What that sentence means is that the RP2350 doesn't have the bespoke divider added to the first chip because the CPUs have instructions for integer division.
@@GaryExplains Okay. Thanks.
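The bit-by-bit long division described in the Z80 comment above can be sketched as follows. This is a generic shift-and-subtract (restoring) division written for illustration, not the RP2040 divider's actual design and not the commenter's original routine; the name `long_divide` and the `bits` parameter are assumptions made for this example.

```python
def long_divide(dividend, divisor, bits=32):
    """Unsigned binary long division, one quotient bit per loop pass."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if divisor < 0 or not 0 <= dividend < (1 << bits):
        raise ValueError("operands must be unsigned and fit in 'bits'")
    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit down to bit 0.
    for i in range(bits - 1, -1, -1):
        # Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(long_divide(100, 7))  # prints "(14, 2)"
```

The loop runs once per bit, which is why the time for a software routine like this grows with the operand width. A hardware divider is not bound to one bit per cycle, so the quoted 8 cycles does not by itself imply 8-bit operands.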
Which tests use the FPU a lot? Only the M33 has the FPU.
This bodes well for things like the BlueScsi that uses the Pico as a brain.
19.6 to 40 secs is 100% faster !!?
Indeed. I added YouTube corrections 2 days ago.
All you need is Esp32-C6
Power consumption is measured in Watt x time…
Using 50% of the time to complete the task should make the ARM twice as fast, not 50% faster…
First comment. I want a present.
🎁
@@GaryExplains Thank you!
OK, I'll let you explain.