THANK YOU! A really well made clip, explaining the efficiency and how it was computed, and all the steps required to assess a particular chip for your particular case. It's rare to see a mention that other things on the board need power too, so if you care about that, you should take it into account. So it's very nice to see, with numbers, how much the same microcontroller on different boards can differ in performance, power consumption and performance/watt too! It's also rare to see a mention of the quality of the benchmark and its relevance (in this case, that it is basically just CPU/compute intensive and doesn't stress the RAM bandwidth or latency, or the SD card/SSD), because in the end, when basing a decision on this efficiency, the task(s) that will be run should be taken into consideration. And of course, it's nice to see relevant adjusted numbers so they can be compared easily, like performance/MHz and the total energy required to run a task, in mWh. While, conceptually, I didn't really see anything new that I didn't know before, this is a video (or style of presentation for this topic) that I've wanted to see for a looong time. Others usually rush a bit and only present the benchmark numbers and 1-2 calculations based on them. They don't focus on providing the full picture and pointing out all the things that can vary. For someone who has no idea how these things work, it's really good to know what to look for, rather than focusing on one single thing. I have to say, I didn't know that the Pi Pico is so efficient. Very nice to see that! I also didn't know about that Magma Splash and that it's made in Romania. That's where I'm from! :D Though to be fair, I haven't gotten to play with microcontrollers and SBCs yet. I've planned to for a long time, but I've never been forced to and never felt like I had the proper time and space to play with one.
A related question is how much power they use when doing nothing. That is, if the fastest boards are put into a reasonable (wake-able, say by timer interrupt) power save mode when they complete the task, what is the total mAh used over the period of time that the slowest takes to complete the task? And over ten times that long? The energy cost of other operations, for example I/O operations, is a whole other interesting kettle of fish when choosing a board for best battery efficiency in its given application. But thanks for this data. It gives one a leg up.
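The question above can be put into numbers with a small sketch. Everything here is an assumption for illustration: the function name `charge_mah` is made up, and the currents and run times are invented placeholders, not measurements from the video.

```python
def charge_mah(active_ma, active_s, sleep_ma, window_s):
    """Charge in mAh drawn over a window: active for active_s, asleep for the rest."""
    if active_s > window_s:
        raise ValueError("task does not finish inside the window")
    sleep_s = window_s - active_s
    # mA * s gives mA-seconds; divide by 3600 to get mAh.
    return (active_ma * active_s + sleep_ma * sleep_s) / 3600.0


# Hypothetical boards: a fast one with a poor sleep mode,
# and a slow one with a very good sleep mode.
fast = dict(active_ma=60.0, active_s=2.0, sleep_ma=1.0)
slow = dict(active_ma=20.0, active_s=10.0, sleep_ma=0.01)

for window_s in (10.0, 100.0):
    print(window_s, charge_mah(window_s=window_s, **fast),
          charge_mah(window_s=window_s, **slow))
```

With these made-up figures the fast board wins over the short window and loses over the window that is ten times longer, because its sleep current then accounts for most of the charge drawn.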
At these numbers the regulators on the boards play a measurable role in power consumption as well. The ESP32 devkits, especially the stock Chinese ones, have a big but not very efficient linear regulator.
Since you used the Arduino IDE sketch for both the ESP and the ARM Cortex-M boards, wouldn't the underlying Arduino core also affect performance? I wrote a simple program for the blackpill using the Arduino STM core and also the STM HAL library (STM32CubeIDE), and the difference in code loop time is very big: Arduino core 300kHz, HAL library >1MHz.
Absolutely correct. The Arduino IDE uses a hardware abstraction layer to provide a common API that is not very efficient, and its performance varies a lot between platforms, so this test is really pointless.
Indeed, it also varies between HAL, LL and assembly language, all within the same STM32CubeIDE. All these ways of programming the hardware through some "translation" to a more human-friendly language affect the performance, even if only by microseconds, since there are a few more instructions to perform for the same action.
@@camiloherrera9268 Indeed, the results with Arduino IDE are often many times slower than STM32 HAL, which is in turn slower than the LL low-level abstractions. The closer you get to the hardware, the more efficient it is, but also the more complex to program. Also, there is no way to program clock speeds or other low-level functionality like shutting down parts of the chip for low power operation. STM32 has a low power core which is limited in functionality but uses tiny amounts of power. It is also possible to shut down the WiFi completely, which halves the overall power consumption, but none of this is available through Arduino IDE, or it is at best very difficult to use.
HAL has got nothing to do with it. This test is using simple C operations that compile directly to machine code instructions in all environments. However, it's a very bad test for completely different reasons (see separate comment).
Enjoyable and informative video. A factor in the processor's efficiency is also how much it consumes while idle (or asleep); the total energy used is the sum of the energy whilst active plus the energy whilst idle over a given period. Also not mentioned is the scalable clock on the Pico, which can run from 16MHz up to 420MHz (albeit with some issues/workarounds), and can be dynamically changed over a program cycle. Having this scalability in clock frequency can give the user much more flexibility in their design. The overclocking is discussed in a pair of Robin Grosset videos on YouTube. In comparison, the ESP32 can only operate at 80/160/240 MHz. I don't know what clock rates the other controllers can operate at.
Versus an STM32H7 dual-core M7, that would be interesting. As far as price goes, it would be weird, since they target different sectors. There are industries where an ESP device would not be legally allowed because it doesn't meet standards while the H7 does, thus there is a large price difference for that level of product. I think one single H7 chip costs like 25 bucks, versus an entire ESP dev board for, idk, 10 bucks or something.
@@EdwinFairchild For fairness I think all options should be presented whether or not it meets standards. It may not meet yours but it meets mine. The high clock speed of the ESP32 when using both cores makes it a beast, albeit a power hog, but nonetheless a very powerful and very cheap($) MCU.
Really great video. We've used the cortex M4 in an embedded railway application to great success in the past for control. I think there is forward compatibility with cortex M4/M7 as well.
Great video, and explanation. Thanks! Subscribed. I'm going to stick with the LOLIN ESP32 boards. I have had many Wemos D1 minis at work 24/7 for a few years without fail. Just started playing around with other boards. I have a LOLIN ESP32, a Pico and a Pico W. ESP32 wins in my book.
@8:28 The units are seconds × cycles/second, so just "cycles" (shifted by whatever order of magnitude). This is a plot of how many clock cycles it took to run the algorithm.
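Reading the units as run time multiplied by clock frequency, the quantity on that plot can be sketched as below. The function name `cycles_used` and the run times are hypothetical placeholders, not figures from the video.

```python
def cycles_used(run_time_s, clock_hz):
    """Clock cycles spent on a task: seconds * (cycles / second) = cycles."""
    return run_time_s * clock_hz


# Hypothetical example: a board that needs more seconds can still
# need fewer clock cycles if its clock is slower.
board_a = cycles_used(run_time_s=10.0, clock_hz=133e6)  # 133 MHz
board_b = cycles_used(run_time_s=8.0, clock_hz=240e6)   # 240 MHz
print(board_a, board_b)
```

A lower cycle count means the core (plus compiler and libraries) gets more done per clock tick, independent of how fast the clock happens to run.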
A couple of points to note about this: you also need to consider the manufacturing process used. The process can have a big impact on energy efficiency, i.e. a smaller process node will do the same work with lower power. In the comparison of the Pi Pico vs the ESP32, though, the process used was the same for both chips. However, it may explain why some ARM chips were not running as efficiently as others.
The Pico is really great for most things other than its sleep current draw. The STM and probably the Microchip part are orders of magnitude better in this. For the classic battery powered IoT device that might wake up every hour, do a bit of number crunching and transmit the result, the energy consumption is overwhelmingly the sleep state. Would be interested to see how the ESP chip does here, as IoT is its main use case. The M7 and H7 chips are great when you need raw performance and have less concern about power draw. I see these used a lot on drone flight controllers. The power draw really doesn't matter compared to the motors, and the ability of the MCU to instantly switch on, reliability and plenty of grunt win out.
Hey Gary, I really enjoyed this video. I’m sure many others have commented on this but if you cast those integers as floats and ran the same test, you would’ve seen extremely different results. It would be interesting to use a 50/50 integer and floating point workload and run the test again. The cortex M4Fs and M7s have built-in floating point hardware, as does the ESP32. The M0+ boards chug away with software emulation. The Pico would get hammered in this test.
Yes, true, but floating point operations only make up a small part of a program. Writing a whole benchmark which does nothing but emphasize floating point operations is unfair and doesn't reflect anything like the real world.
@@GaryExplains an excellent point! The same is also true of a benchmark based around division, which is an extremely uncommon operation in most programs and not representative. I'll post a separate comment on this topic.
Happy I skipped to 17:15 before commenting… Anyhow, it would have been great if you could have emphasized power consumption… E.g. either aim for a relatively similar execution speed and compare power usage or, the other way round, aim for a relatively similar power use and compare execution speeds.
Sorry, I just don't understand the point you are trying to make. Two of the four points on the summary are about power consumption. If you want the details watch the whole video. 🤷‍♂️
Really interesting video Gary thanks. This will probably sound stupid but I also want to say thank you for pronouncing Nvidia as en vidia, instead of how I've heard literally everybody else, (aside from me) say it, which is nur vidia! They may be right and we may be wrong but I much prefer your pronunciation. 😁
It's great to see the performance comparison normalised to performance/Hz, but it's still pretty much abstract and academic. In reality the only real world measure is performance/dollar (or euro/pound). It would be great if you could include this.
I think a couple of ARM Cortex-M3s can beat the ESP32, easily. 😋 I have a couple of Cortex-M3 microcontrollers called AIR32F103 that run at 256MHz (default is 216MHz). The AIR32F103 has a performance rating of 2.54 DMIPS/MHz (coremark), which is significantly higher than the STM32, which has only 1.25 DMIPS/MHz. The AIR32F103 needs far fewer wait states to read code from Flash, allowing it to execute code twice as fast as the STM32, and it can do so at a much higher clock speed. They are also more power efficient, as they are produced on a smaller process (45nm, not 100% sure) by TSMC: 27.73 mA @ 72MHz (all peripherals enabled), 38.50 mA @ 216MHz (all peripherals enabled).
ESP32 boards use a linear regulator that wastes like 40% of the power, while the Pico board uses a DC-DC buck-boost regulator. Just run the ESP32 from 3.3V via the 3.3V pin and retest.
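A rough way to check the "like 40%" figure: a linear regulator passes the load current straight through and burns the voltage difference as heat, so its efficiency is at best Vout/Vin. The sketch below assumes a 5 V USB input and a 3.3 V rail; the load and quiescent currents are placeholder values, and `ldo_efficiency` is just an illustrative helper.

```python
def ldo_efficiency(v_in, v_out, load_ma, quiescent_ma=0.0):
    """Fraction of input power that reaches the load through a linear regulator."""
    p_out = v_out * load_ma                  # power delivered to the load, mW
    p_in = v_in * (load_ma + quiescent_ma)   # power drawn from the supply, mW
    return p_out / p_in


# Assumed values: 5 V in, 3.3 V out, 50 mA load.
ideal = ldo_efficiency(5.0, 3.3, 50.0)
with_iq = ldo_efficiency(5.0, 3.3, 50.0, quiescent_ma=5.0)
print(f"wasted, ideal LDO: {1 - ideal:.0%}")
print(f"wasted, with 5 mA quiescent current: {1 - with_iq:.0%}")
```

Under these assumptions about a third of the input power is lost before the regulator's own quiescent current is counted, and the loss edges towards 40% once it is, which is in the same range as the figure quoted above.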
Great video and lots of details. Just curious to understand if the ESP32 was running with dual core or single core for all the measurements. If it was dual core, it would be great to know how these numbers change if 1 core could be powered off. Will the performance degrade, or was the code single threaded so that only the power will decrease?
The first generation of ESP32 is ancient now. The ESP32-S3 would've mopped the floor with ARM, if it was included. Talking about features, the ESP has WiFi and Bluetooth by default.
@@mecatronicsforeveryone9565 Totally agree.. ESP32-S3 has better performance, but still the question remains about ESP32 dual core power numbers vs M0+ or M4 single core power numbers
@@anantaustin I think he used only a single core with the Arduino framework for his tests. I too would like to see the ESP32 tests be redone with both cores working at the task.
Very detailed, interesting and convincing comparisons. I wonder how the AVRs would fare in similar comparisons, since many hobbyists are still sticking to them and some argue that the AVRs are more power efficient.
AVRs are great chips, but being 8 bit they will really suck doing a 32 bit divide, needing several hundred instructions. That together with the low 16-20 MHz clock speed will make each division take maybe around 10 µs. Sadly, Gary didn't provide his source code so we can try it ourselves.
Also, it would be interesting to know whether the Arduino IDE setup for the pico uses the hardware divider in the pico or whether it is doing software division.
@@ryanbellyt Today's video (which I will upload shortly) features the S2 and the C3. The S3 will appear in my dual-core MCU showdown video, in the next week or so.
In regards to power consumption, you can shut off WiFi. There are even ESP32 variants without WiFi. It would be interesting to see the power consumption of the system itself without peripherals like WiFi.
Which of course raises the question of why you'd use a microcontroller with WiFi if you want to shut it off. Also, the Pico W has WiFi and it was more efficient than the ESP32.
@@GaryExplains not to revive an old post, but maybe familiarity and ease of use. The fact that the S3 has flashing/JTAG debugging available just by wiring USB to 2 pads of the chip without extra stuff makes it really convenient on a board. All it really needs to power up is a clock, heck u could even drop decoupling caps and it would still work just fine (though not recommended). The external components needed to get the bare chip running are pretty much nothing, and if u use a module then it's all built in already. Makes sticking it on a board really convenient. IDF is very nice to use. Examples everywhere on how to code just about anything due to popularity. The price u pay for the ease of it is of course power and lack of GPIO, but for some cases those benefits could just outweigh the cons.
@@GaryExplains makes sense, but I was interested to see how it would compare to its bigger sister the ESP32, so can you please make another video comparing all the ESP microcontrollers?
Thanks for your enthusiasm, at the moment I don't see myself making a video including the ESP8266 or the other ESP32 variants like the ESP32-S2 or -S3. These videos take a lot of time and effort and while asking is simple, actually making them is hard.
What I find interesting is how well the Pico M0+ holds up, the fact it’s dual core with a highly repetitive task like this could put it on par with the STM32 F7 board, and the ESP32 also being dual core would move in front, both are quite inexpensive too
There is a very simple reason the Pico does so well, beating the other M0+ boards by a large margin, and matching M4 despite the M0+ core not even having a division instruction. Sadly, Gary obviously isn't aware of this reason. See my separate comment.
Gary, please, load me up on benchmarks..!! Any boards you can get a hold of. How about benchmarks such as...
- Matrix operations
- String comparisons
- Power usage for IoT sensor read scaling (e.g. 1/s incrementing to (n))
- IoT with multiple sensors
- IoT remote push, and pull.
@Gary, thanks for this video! I am very interested in seeing what other benchmarks you could make to compare microcontrollers like this. This one helped me to decide which controller to use in one of my current projects. I figured out that the ESP32s I already have will do what I need. I was thinking that I would have to buy something with more processing power.
@@GaryExplains I love the Nucleo STM32H743ZI. I did a project with it in my research lab, simulating some fake detector data to test a new DAQ system. Worked like a charm. But you're absolutely correct. For makers, the boards used in the video are the most relevant.
Very nice results. Please can you make a hardware vs software floating point calculation comparison? I think this is where the M4, M7 and ESP32 will shine.
Had there been a metal block that would let you mate the probe of the ASETEK Vapochill Lightspeed Phase Change Cooler unit with the Raspberry Pi IV but also keep the moisture away from the electronics of the SBC, then that might be an idea.
Also note that the ESP32 has 2 cores, and I don't think you can shut one down. So distributing the prime factoring task among the two cores should improve the power efficiency a bit? 🤔
Thanks for the video. Int32 is one thing, int64 is another. Try double-precision floats on the Teensy 4 @ 816MHz, K210 @ 600MHz, Pi Pico @ 240MHz and you will see all the differences! BTW, the ESP8266 is really good on integer, better than the ESP32. And the ESP32-S3 is much better than the ESP32! The STM32's compiler is really smart. If you use -O3 when adding up from 1 to a million, the compiler will give you the result rather than the STM32 doing the real calculation at run time!!!
I doubt there are many microcontroller programs that need to do intensive double floating point operations. All compilers do the optimization you mention.
Hi Gary, not sure if this question makes sense, but I believe the Pico has 2 cores, did your trial use both (I assume other boards have multi core, did you use those)? If not, I'm guessing that would make a difference to speed, but how much difference to power usage?
I would love to see power consumption results with some higher precision. To be honest, 1mA granularity is not so great and the whole calculation seems really crude. Btw, how did you get those current numbers? Did you capture the whole current profile for the board and calculate the mean of it?
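For reference, this is roughly what "capture the whole current profile and calculate the mean" would look like. It is only a sketch of that approach, not the method used in the video (which isn't described in this thread); the function name `energy_mwh`, the sample values and the 5 V supply are all assumptions.

```python
def energy_mwh(samples_ma, sample_period_s, supply_v):
    """Energy in mWh from evenly spaced current samples at a fixed supply voltage."""
    mean_ma = sum(samples_ma) / len(samples_ma)
    duration_h = len(samples_ma) * sample_period_s / 3600.0
    return supply_v * mean_ma * duration_h   # V * mA * h = mWh


# Hypothetical profile: 20.4 mA for 60 one-second samples at 5 V.
fine = energy_mwh([20.4] * 60, 1.0, 5.0)
# The same profile read off a meter that only shows whole milliamps.
coarse = energy_mwh([round(20.4)] * 60, 1.0, 5.0)
print(fine, coarse, (fine - coarse) / fine)
```

In this made-up case the 1 mA display resolution alone shifts the result by about 2%, which matters when two boards are only a few percent apart.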
ESP32-S3 is now available, and is especially interesting because it now includes an FPU for fast floating-point calculations, and ESP32-C3 is also available which is their newest "basic" chip, is the RISC-V one you're talking about, and includes inbuilt usb-serial (you can skip an external chip like CP2104 / CH340 etc if you want) and basic wifi & bluetooth.
Another question I have: what is the mWh for, say, a Pico running at 10MHz? 50MHz? 100MHz? Does it get more efficient at lower speeds, or at higher speeds? What if you clock every board so they all complete the benchmark in the same amount of time, and have equal performance/time ratings? Will the relative mWh performance change?
A Pico running at 50MHz is less efficient than one running at 133MHz (58 vs 41 mWh). Same for the ESP32, running it at 160MHz is actually less efficient, and 80MHz even less efficient.
@@GaryExplains sounds like there is a constant current draw that acts as a baseline, and then a second variable*freq current draw, so the faster you go, the smaller the baseline is relative to the total
Indeed. As I mention in the video, I am actually measuring the current draw of the board (which should be constant minus the MCU current), not just the MCU. Interestingly the Pico overclocked at 240MHz is more efficient than at 133MHz.
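The baseline-plus-frequency idea in this exchange can be written down as a tiny model. If the current is I = I0 + k·f and the run time is N/f for a fixed number of cycles N, then the energy is E = V·I·t = V·N·(I0/f + k), i.e. E(f) = a/f + b. The sketch below fits a and b to the two Pico figures quoted above (58 mWh at 50MHz, 41 mWh at 133MHz). The model itself and the helper names are assumptions, and anything it predicts at other clock speeds is an extrapolation, not a measurement.

```python
def fit_energy_model(f1_mhz, e1_mwh, f2_mhz, e2_mwh):
    """Fit E(f) = a / f + b through two (frequency, energy) points."""
    a = (e1_mwh - e2_mwh) / (1.0 / f1_mhz - 1.0 / f2_mhz)
    b = e1_mwh - a / f1_mhz
    return a, b


def predict_energy(a, b, f_mhz):
    """Energy for the same task at another clock speed, under the same model."""
    return a / f_mhz + b


# The two Pico data points quoted in the thread.
a, b = fit_energy_model(50.0, 58.0, 133.0, 41.0)
for f_mhz in (50.0, 133.0, 240.0):
    print(f_mhz, round(predict_energy(a, b, f_mhz), 1))
```

The a/f term is the baseline share, which shrinks as the clock goes up, so the model agrees at least in direction with the observation that the Pico at 240MHz is more efficient than at 133MHz.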
Hey, did you turn off/disable the WiFi in software? Asking because I know you can do this for the esp32 and that's supposed to make the WiFi chip use a lot less current.
@@GaryExplains because otherwise it's an unfair comparison? If you need WiFi for your project, you won't go with one of the ones without WiFi. If you don't need it, you'll turn it off, meaning if you want to measure across them all then the fair thing to do is assume WiFi won't be used and disable it.
@@AbelShields How can it be unfair when I included the Pico W? How can it be unfair when I am testing the boards in their default configurations? Your logic makes me dizzy.
@@GaryExplains because the standard use case may not be in the default configurations, you have the option of turning off WiFi for higher performance per watt, and that's exactly what would be done if WiFi is not being used, as in your benchmark. You're handicapping the WiFi-enabled chips purely for having more features, which seems unfair.
@@AbelShields So you are trying to defend a bad design. The chip could easily be designed to have the wifi off by default until it tries to connect. So you are basically complaining that the chip is badly designed and why didn't I write my tests to bypass the bad design 🤦‍♂️
I appreciate the amount of work Gary puts into a video like this, and running all the tests but, sadly, this is a really terrible benchmark for most people. Finding prime numbers by Trial by Division is unrepresentative of most programs because it is built around -- obviously -- division, while most programs use division very very rarely. So rarely that many CPUs, such as the Cortex M0+, don't even bother to have a division instruction, instead using a software subroutine that probably takes up to around 100 clock cycles in the worst case. Other CPUs have a hardware division instruction, but some of these take 32 clock cycles (for division of 32 bit numbers) while others take 8 or even 4 cycles. Sometimes CPU cores with the same model name can be licensed from the supplier with a choice of fast (but big) or slow (but small) divide units. Cortex M3/M4 take 2-12 cycles for a divide depending on how many significant bits the result has. M7 seems to be 3-19 cycles. And then there is the Pi Pico, which has an M0+ that (like the other M0+ boards) doesn't have a divide instruction in the CPU. BUT, the Pico's SIO peripheral has a memory-mapped division unit. Rather than an instruction in the CPU, you write the dividend to one memory location, then the divisor to another memory location, and then 8 clock cycles later you can read the quotient and the remainder from two more memory locations. If you use a generic Cortex M0+ compiler then it might not know about this, but a specialised Pico compiler will know how to use the SIO divide. Or maybe it's just a custom __aeabi_idiv / __aeabi_uidiv function in the runtime library. I hope it's inlined. It is also very unfortunate that Gary has not published the source code he used so people can try it on other things, and also check that no mistakes have slipped in. 
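To make the point about division concrete, here is a sketch of a trial-division prime counter that also counts how many division/remainder operations it performs. Since the video's source code isn't published, this is not Gary's benchmark, only a generic version of the technique; the per-division cycle costs are taken loosely from the figures in the comment above and are rough assumptions.

```python
def count_primes_trial_division(limit):
    """Return (number of primes below limit, number of remainder operations)."""
    primes = 0
    divisions = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            divisions += 1          # one division/remainder per trial divisor
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            primes += 1
    return primes, divisions


primes, divisions = count_primes_trial_division(10_000)
# Assumed cycles per division, loosely based on the numbers quoted above.
cycles_per_division = {
    "software routine (worst case)": 100,
    "32-cycle hardware divider": 32,
    "Pico SIO divider": 8,
}
for name, cycles in cycles_per_division.items():
    print(f"{name}: about {divisions * cycles} cycles spent dividing")
```

The number of divisions is the same on every chip, so the spread in the estimates comes entirely from how each chip divides, which is the argument being made about why this benchmark favours some boards.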
If people are interested in a primes benchmark that IS representative of average programs -- because it doesn't use division -- they might want to look at hoult.org/primes.txt which I have tried on many machines including M3/M4/M7 and ESP32 as well as various RISC-V and x86 and Pi-like ARM boards. I don't have an M0+ result but I'd welcome one (or others)
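For illustration only, one well-known way to count primes without any division is the Sieve of Eratosthenes, sketched below. This is not claimed to be what the linked primes.txt program does; it just shows that a primes workload can be built from additions, comparisons and memory accesses alone.

```python
def count_primes_sieve(limit):
    """Count primes below limit using only addition, comparison and indexing."""
    if limit < 3:
        return 0
    is_composite = bytearray(limit)   # all zeros: nothing crossed out yet
    count = 0
    for n in range(2, limit):
        if not is_composite[n]:
            count += 1
            # Cross out 2n, 3n, ... by stepping in increments of n.
            for multiple in range(n + n, limit, n):
                is_composite[multiple] = 1
    return count


print(count_primes_sieve(10_000))
```

A workload like this stresses loops, branches and memory rather than the divider, so a chip without a fast divide unit is not penalised for it.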
If you truly want to compare efficiency, you have to measure the power for the MCU. For example, ESP32 boards often use an AMS1117 LDO, which has a quiescent current of about 15mA. If you use a different board, you will have a hugely different power consumption.
I haven't studied the circuit diagrams of the myriad of ESP32 boards. Do you happen to know if, in general, powering the boards via the 3.3v pin bypasses the voltage regulator?
@@GaryExplains At least on my board, it does. I think it should always work like that, or the board would have to have a different voltage regulator for the 3.3V output afaik
@@GaryExplains At least I think that's how it works. I don't think a quiescent current flows through the LDO if you only apply power to the 3.3V line. But that should be a simple test, I think.
I wonder what the results would be if you didn't use Arduino libraries (ideally CMSIS for ARM chips). Would the performance be better for all of them, or would some of the boards catch up to each other? Honestly, I knew the bluepill would lose but didn't expect that much of a gap, since it's not a floating point operation. A sleep mode power consumption comparison would have been interesting too.
It could be mildly interesting to compare the efficiency of a few different single board computers, and also compare those to a handful of older phones, since they're basically just computers. You could, for example, repurpose one sitting in a drawer as a Pi-hole or an OctoPrint server instead of spending $40+ on a Raspberry Pi.
@@GaryExplains Might try out Andronix it's a bit limited without root or do you mean performance wise? That depends on exactly how old we're talking, such as early SD 800-810 chips are getting into the range of barely useable (as a smart device) in 2022, but still about as powerful as a RBp 2 (maybe 3, but that's pushing it), so still fine for low power things and I've seen RBp 4 levels at between SD821 to SD835 which are technically still rather old devices these days. Honestly, the more ambitious thing about smartphones is just getting the bootloader unlocked and rooting a lot of them.
It seems like the test was run on a single core. Dual-core calculation on the ESP32 is a tricky thing. In the default setup it uses the second core to handle the Bluetooth and WiFi network stacks.
A big factor for power usage of an entire board is the various support circuitry, like voltage regulator, whether it has a battery charging circuit, a separate USB chip, pull up/down resistors on buttons, etc.
I would love to see more benchmarks because the STM F7 series is not really comparable to the ESP32. The M0+ and M4 are more in the same tier. I think the FPU tests will show a lot more of the ARM strengths.
So you tested 4 different ARM generations and just one ESP32? Where is ESP32S3? How about the RISCV ESP32C3? Waiting for the video to finish to see if you make any comments on the ARM prices and availability :)
Gary, would you specify which ESP32? Since you say LX6, I believe you are using the original ESP32. The newest ESP32-S3 uses an LX7 processor. The new ESP32-S3 is basically the equivalent to the old ESP32 and the new ESP32-C3 is equivalent to the ESP8266. The latter is a closer match to the RPi Pico in terms of perf/power mix. Additionally, the quoted consumption of the ESP32 in the video looks like maybe both cores were active. The ESP32 (and probably the STM's) can shut down one core to have more favorable power profile. From its datasheet, the ESP32-S3 claims to scale from 1 core 40Mhz 32b inst at 22mA to 2 core 240Mhz 128b instr at 108mA. Quite a range! In any case, thank you for making this video. It would be excellent if you and Andreas Spiess (or others) would coordinate testing methods so we mortals can more easily compare apples to apples.
I feel like most microcontroller projects are not math-intensive, but timer or interrupt driven, especially the battery-powered ones. I’d also be curious how the ESP32 would compare at calculating primes if you used C++ code.
Or if you really need some more performance, step up to the ARM Cortex-M7 at 600 MHz (native, up to 1Ghz overclocked) on the Teensy 4.x series. I'd love to see what they do with the M8 or M9.
The Pico kinda baffles me. Especially compared to the other M0+ based boards. Usually the M0+ core does not include a floating point unit, hence finding prime numbers by division should be overall very poor. The Raspberry Pi Foundation must have implemented some sort of FPU extension to the M0+ core.
Well spotted! Not only does the M0+ not have an FPU (not useful here anyway), it also doesn't have any division instruction. There is a very simple reason the Pico does well.. It's not an FPU. Hints: SIO, memory mapped.
Very interesting! Basically it's a disappointment for the M7, whose boards are quite expensive compared to e.g. the ESP32. Maybe you could also take the Kendryte K210 into your comparisons. It's also cheap Chinese stuff with lots of power (even AI support) and support for MicroPython.
If you put the ESP32-S3 in the list, the difference will be dramatic. Not to mention the price. Also, the S3 has an FPU, AI acceleration and wireless capability by default.
Don't forget differences go beyond raw power. U can also get way more gpio than the esp32 can dream of as well as better features and expansion capabilities. The fact that u can use greater external ram at faster speeds. More robust and faster interfaces, and a slew of other things. With the m7 that is possible as it's just an architecture and the manufacturers can add a ton more but with esp32 what is available is all u get. There aren't other manufacturers that offer xtensa with more features like with arm cores
Current consumption in different sleep modes is also very interesting. A lot of applications use sleep mode to cut power usage.
Yes, I'd love to see low power modes; especially with WiFi, LoRa, nRF24, etc. There are a lot of use cases for something that periodically gathers some sensor data and transmits it to a central location, and you'd like it to run on batteries. So that ends up being a combination of brief amounts of radio usage with long periods in some lower power mode.
As soon as you want something that runs on battery, the deep sleep power consumption is by far *the* most important number. More important than number of cores or ram, even.
Actually in most low power use cases this is the most relevant metric. On battery they won't keep crunching bitcoins or prime numbers anyway.
Agreed! A typical use case for a uC is measure/send/sleep, most of the time in sleep. So a more realistic test (for power efficiency anyway) is probably to measure a fixed unit of work, that is small compared to the loop interval (so mostly sleep), and measure the total energy used over a number of loops.
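The measure/send/sleep test proposed here boils down to adding up the energy of each phase of one loop. The sketch below does that and turns it into a battery life estimate. The phase currents, durations, supply voltage and battery capacity are all invented placeholders, and the helper names are hypothetical.

```python
def energy_per_loop_mj(phases, supply_v):
    """Energy of one loop in millijoules; phases is a list of (current_mA, seconds)."""
    return sum(current_ma * supply_v * duration_s
               for current_ma, duration_s in phases)


def battery_life_days(phases, supply_v, battery_mwh):
    """Days a battery of the given capacity lasts when the loop repeats back to back."""
    loop_s = sum(duration_s for _, duration_s in phases)
    loop_mwh = energy_per_loop_mj(phases, supply_v) / 3600.0   # 1 mWh = 3600 mJ
    loops = battery_mwh / loop_mwh
    return loops * loop_s / 86400.0


# Hypothetical hourly loop: measure, transmit, then deep sleep for the rest.
phases = [
    (30.0, 0.5),      # measure: 30 mA for 0.5 s
    (120.0, 1.0),     # send over radio: 120 mA for 1 s
    (0.02, 3598.5),   # deep sleep: 20 uA for the remainder of the hour
]
print(energy_per_loop_mj(phases, 3.3), battery_life_days(phases, 3.3, 8000.0))
```

Swapping in measured per-phase numbers for each board would give the comparison suggested above: total energy over a number of loops, with the sleep phase weighted by how long it actually lasts.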
@@markday3145 ... "Sleep and Peep"
Very nice comparison, Gary! Much appreciated. ST has very low and ultra low power microcontrollers designed for battery applications. These are the L series and the U series. They also have the H7 series that can run up to 550MHz (I LOVE these faster boards, btw).
For the Nucleo or Discovery boards, looking at the power consumption is a bit tricky, as these boards come with ST-LINK which if active, will draw power too (ST used to use an STM32F103 for their ST-LINK, which is the same uC as the bluepill). So you'll need to use an external serial programmer with these while disabling ST-LINK. Furthermore, using Arduino IDE is a bit tricky, as you never know how the uC is set (which peripherals are turned on). This is one instance where using CubeIDE offers so much more control and optimization features compared to Arduino IDE.
IIRC, there's a jumper onboard which you can remove and insert a current shunt or current meter.
The H7 series is an application processor with an MCU as a co-processor. That is not comparable at all. I can take a I7 and compare it to an Iphone processor and say look the I7 is better.
@@excitedbox5705 Some H7 chips contain a Cortex M7 and a Cortex M4 core, which are intended for MCU applications. By definition, MCUs are a/some CPUs wired with a bunch of peripherals in a single package.
@@excitedbox5705 Not correct. The H7 is still a microcontroller and not an application processor. Some of them are dual cores. The H7s have a Cortex M7 with some units having an additional Cortex M4 processor.
Definitely interested in more tests, especially involving RISC-V. I would love a glimpse of the asm output (just the inner loop) to be sure the differences are not down to the compilers being less clever with some of the CPUs. Gary, you are such a resource, there is no way you are getting paid enough for this. 😊
My first thought as well. I think some of the difference may turn out to be due to compilers. From what I've learned, compilers count for a lot if you write your code in C or C++, or any other language except ASM.
It might not be so important for most applications, but I think it would be very interesting to test floating point performance on these boards, since some of them don't include a floating point unit. Pure floating point workloads might not be very realistic, but at least some mixed workloads could be relevant. I can think of applications, like a PID controller, that could easily apply to these microcontrollers.
You could use fixed point numbers instead
@@Henrix1998 that's not always realistic. Whether you can use fixed point depends a lot on your application.
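To make the fixed point suggestion concrete, here is a small Python sketch of Q16.16 arithmetic applied to the proportional term of a PID controller. The format choice (16 fractional bits) and the helper names are assumptions for illustration. Python integers never overflow, so this does not show the range limit (roughly ±32768 in a signed 32-bit value) that makes fixed point unsuitable for some applications.

```python
FRAC_BITS = 16            # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to Q16.16 (rounded to the nearest step)."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fx_mul(a, b):
    """Multiply two Q16.16 values using only integer multiply and shift."""
    return (a * b) >> FRAC_BITS


def p_term(kp_q, setpoint_q, measured_q):
    """Proportional term of a PID controller, entirely in integers."""
    return fx_mul(kp_q, setpoint_q - measured_q)


if __name__ == "__main__":
    out = p_term(to_fixed(1.5), to_fixed(20.0), to_fixed(18.0))
    print(to_float(out))
```

On a core without an FPU, the multiply and shift map to a handful of integer instructions, whereas the float version goes through software emulation.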
THANK YOU! A really well made clip, explaining the efficiency and how it was computed, and all the steps required to assess a particular chip when figuring out your particular case.
There's rarely a mention that other things on the board need power too, so if you care about that, you should take it into account. So it's very nice to see, with numbers, how much the same microcontroller on different boards can differ in performance, power consumption, and performance per watt!
There's also rarely a mention of the quality of the benchmark and its relevance (in this case, that it is basically CPU/compute intensive and doesn't stress RAM bandwidth or latency, or the SD card/SSD). In the end, when basing a decision on this efficiency, the task(s) that will be run should be taken into consideration.
And of course, it's nice to see relevant adjusted numbers so they can be compared easily. Like performance / MHz and total energy required to run a task, in mWh.
While conceptually I didn't really see anything new that I didn't know before, this is a video (or style of presentation for this topic) that I have wanted to see for a looong time. Others usually rush a bit and only present the benchmark numbers and 1-2 calculations based on that. They don't focus on providing the full picture and pointing to all the things that can vary. For someone who has no idea how these things work, it's really good to know what to look at, rather than focusing on one single thing.
I have to say, I didn't know that the Pi Pico is so efficient. Very nice to see that! I also didn't know about the Magma Splash and that it's made in Romania. That's where I'm from! :D Though to be fair, I haven't got to play with microcontrollers and SBCs yet. I have planned to for a long time, but I've never been forced to and never felt like I had the proper time and space to play with one.
A related question is how much power they use when doing nothing. That is, if the fastest boards are put into a reasonable (wake-able, say by timer interrupt) power save mode when they complete the task, what is the total mAh used over the period of time that the slowest takes to complete the task? And over ten times that long?
The energy cost of other operations, for example, I/O operations, is a whole other interesting kettle of fish when choosing a board for best battery efficiency in its given application.
But thanks for this data. It gives one a leg up.
It's a great concept to benchmark MCUs. Please carry on doing it, thanks a lot.
At these numbers the regulators on the boards play a measurable role in power consumption as well.
The ESP32 devkits, especially the stock Chinese ones, have a big but not very efficient linear regulator.
That's a lot of hard work, Gary. Thanks for doing this!
Very informative, thank you! The flexibility of these microcontrollers is pretty amazing, allowing them to fit a wide range of use-cases.
Awesome video Gary.
More of these please.
Great video! Thank you for doing all the work. I’m sure this took a considerable amount of time.
Since you used the arduino ide sketch for both the esp and the arm m, wouldn't the underlying arduino core also affect performance? I wrote a simple program for the blackpill using the arduino stm core and also the stm HAL library(stm cube ide), the difference in code loop time is very big. Arduino core 300kHz, HAL library >1MHz.
I have found the same results using a Blue Pill.
Absolutely correct. The Arduino IDE uses a hardware abstraction layer to provide a common API that is not very efficient, and its performance varies a lot between platforms, so this test is really pointless.
Indeed, it also varies between HAL, LL and assembly language, all within the same STM32 Cube IDE. All these ways of programming the hardware through a "translation" into a more human-friendly language affect the performance, even if only by microseconds, since there are a few more instructions to perform for the same action.
@@camiloherrera9268 Indeed, the results with Arduino IDE are often many times slower than STM32 HAL, which is again slower than the LL low level abstractions. The closer you get to the hardware the more efficient it is, but also the more complex to program. Also there is no way to program clock speeds or other low level functionality like shutting down parts of the chip for low power operation. The ESP32 has a low power core which is limited in functionality but uses tiny amounts of power. It is also possible to shut down the WiFi completely, which halves the overall power consumption, but none of this is available through Arduino IDE, or it is at best very difficult to use.
HAL has got nothing to do with it. This test is using simple C operations that compile directly to machine code instructions in all environments. However, it's a very bad test for completely different reasons (see separate comment).
This was very useful Gary. Thank you.
Enjoyable and informative video.
A factor in the processor's efficiency is also how much it consumes while idle (or asleep);
the total energy used is the sum of the energy whilst active plus the energy whilst idle over a given period.
Also not mentioned is the scalable clock on the Pico, which can run from 16MHz up to 420MHz (albeit with some
issues/workarounds), and can be dynamically changed over a program cycle.
Having this scalability in clock frequency can give the user much more flexibility in their design.
The overclocking is discussed in a pair of Robin Grosset videos on youtube.
In comparison the ESP32 can only operate at 80/160/240 MHz.
I don't know what clock rates the other controllers can operate at.
Could you re-run the tests of the ESP32 using both of its cores? And maybe add the cost variable in the mix.
Versus an STM32H7 dual core M7, that would be interesting. As far as price goes, it would be weird since they target different sectors. There are industries where an ESP device would not be legally allowed because it doesn't meet standards while the H7 does, thus there is a large price difference for that level of product. I think one single H7 chip costs like 25 bucks, versus an entire ESP dev board at, I don't know, 10 bucks or something.
@@EdwinFairchild For fairness I think all options should be presented whether or not it meets standards. It may not meet yours but it meets mine. The high clock speed of the ESP32 when using both cores makes it a beast, albeit a power hog, but nonetheless a very powerful and very cheap($) MCU.
Really great video. We've used the cortex M4 in an embedded railway application to great success in the past for control. I think there is forward compatibility with cortex M4/M7 as well.
Also, it would be great to get your views on the cortex m85. Is it a natural successor to the M7?
Great video and explanation. Thanks! Subscribed. I'm going to stick with the LOLIN ESP32 boards. I have had many Wemos D1 minis at work 24/7 for a few years without fail. Just started playing around with other boards. I have a LOLIN ESP32 Pico and Pico W. ESP32 wins in my book.
@8:28 The units are seconds/cycles/second, so just "cycles" (shifted by whatever order of magnitude). This is a plot of how many clock cycles it took to run the algorithm.
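The unit argument above (seconds divided by "per MHz" leaves cycles) can be checked with a short Python sketch. The runtime and clock values used here are hypothetical and are not the video's results.

```python
def cycles_used(runtime_s, clock_hz):
    """Seconds multiplied by cycles-per-second leaves a plain cycle count."""
    return runtime_s * clock_hz


if __name__ == "__main__":
    # Hypothetical boards: a slower clock with a shorter cycle count can
    # still finish later in wall-clock time.
    for name, runtime_s, clock_hz in [("board A", 10.0, 133e6),
                                      ("board B", 8.0, 240e6)]:
        print(f"{name}: {cycles_used(runtime_s, clock_hz):.3e} cycles")
```

Read this way, the normalised plot compares how many clock cycles each core and toolchain needs for the same algorithm, independent of how fast the clock ticks.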
Nice video on a difficult topic. An 8-bit AVR as a bottom anchor would be nice. And of course the ESP32-S3 should be included.
A couple of points to note about this: you also need to consider the manufacturing process used. The process can have a big impact on energy efficiency, i.e. a smaller process node will do the same work with lower power. In the comparison of the Pi Pico vs the ESP32, though, the process used was the same for both chips. However, it may explain why some ARM chips were not running as efficiently as others.
The Pico is really great for most things other than its sleep current draw. The STM and probably the Microchip part are orders of magnitude better in this. For the classic battery powered IoT device that might wake up every hour, do a bit of number crunching and transmit the result, the energy consumption is overwhelmingly in the sleep state. I would be interested to see how the ESP chip does at this, as IoT is its main use case. The M7 and H7 chips are great when you need raw performance and have less concern about power draw. I see these used a lot on drone flight controllers. The power draw really doesn't matter compared to the motors, and the ability of the MCU to instantly switch on, its reliability, and plenty of grunt win out.
Image processing is a nice heavy useful benchmark. I run motion detection and jpeg decompression on ESP32 and it really pushes.
Thank you, Gary, for your most Appreciated, Articulated Interesting Hard Work.
Great info! this is exactly what I needed! Looking forward to the comparison with the RISC-V!
Glad it was helpful!
really good video 👍🏻
Good and solid comparison! Great work!
It feels like this channel is reading my mind, great videos btw!
@1:00, what about MIPS processors like the PIC32?
I don't think the PIC32 is that popular.. But the list of boards and processors that I DIDN'T include is long.
Hey Gary, I really enjoyed this video. I’m sure many others have commented on this but if you cast those integers as floats and ran the same test, you would’ve seen extremely different results. It would be interesting to use a 50/50 integer and floating point workload and run the test again. The cortex M4Fs and M7s have built-in floating point hardware, as does the ESP32. The M0+ boards chug away with software emulation. The Pico would get hammered in this test.
Yes, true, but floating point operations only make up a small part of a program. Writing a whole benchmark which does nothing but emphasize floating point operations is unfair and doesn't reflect anything like the real world.
@@GaryExplains An excellent point! The same is also true of a benchmark based around division, which is an extremely uncommon operation in most programs and not representative. I'll post a separate comment on this topic.
Happy I skipped to 17:15 before commenting… Anyhow, it would have been great if you could have emphasized power consumption… E.g. either aiming for a relatively similar execution speed to compare power usage or, conversely, for a relatively similar power use to compare execution speeds.
I think you should watch the video rather than skip to the end and then ask questions that were answered in the video 🤦♂️
@@GaryExplains Yes from 17:15 ⇒ It would have been great if you could highlight that more.
Sorry, I just don't understand the point you are trying to make. Two of the four points on the summary are about power consumption. If you want the details watch the whole video. 🤷♂️
Really interesting video Gary thanks.
This will probably sound stupid but I also want to say thank you for pronouncing Nvidia as en vidia, instead of how I've heard literally everybody else, (aside from me) say it, which is nur vidia! They may be right and we may be wrong but I much prefer your pronunciation. 😁
Great review video
Great to see a good chip comparison video
Thanks for sharing your experience with all of us 👍 😀
It's great to see performance comparison normalised to performance/Hz - but it's still pretty much abstract and academic.
In reality the only real world measure is performance/dollar (or euro/pound)
It would be great if you could include this.
But what about efficiency/dollar? The huge range of devices reflects the applications.
I think a couple of ARM cortex M3's can beat ESP32, easily. 😋 I have a couple of Cortex M3 microcontrollers that run at 256MHz (default is 216MHz) called AIR32F103.
The AIR32F103 has a performance rating of 2.54 DMIPS/MHz (coremark), which is significantly higher than the STM32, which has only 1.25 DMIPS/MHz. The AIR32F103 has a much lower wait stage to read code from Flash, allowing it to execute code twice as fast as the STM32, and it can do so at a much higher clock speed.
And they are more power efficient; they are produced on a smaller production process (45nm, not 100% sure) by TSMC:
27.73 mA @ 72MHz (all peripherals enabled)
38.50 mA @ 216MHz (all peripherals enabled)
ESP32 boards use a linear regulator that wastes something like 40% of the power, while the Pico board uses a DC-DC buck-boost regulator. Just run the ESP32 at 3.3V via the 3.3V pin and retest.
Great video and lots of details. Just curious to understand if the ESP32 was running with dual core or single core for all the measurements.
If it was dual core, it would be great to know how these numbers change if 1 core could be powered off. Will the performance degrade or was the code single threaded and only the power will decrease..
The first generation of ESP32 is ancient now. The ESP32-S3 would've mopped the floor with ARM, if it was included. Talking about performance, the ESP has WiFi and Bluetooth by default.
@@mecatronicsforeveryone9565 Totally agree.. ESP32-S3 has better performance, but still the question remains about ESP32 dual core power numbers vs M0+ or M4 single core power numbers
The Raspberry Pi Pico W is dual-core, and has wifi and Bluetooth by default.
@@anantaustin I think he used only a single core with the Arduino framework for his tests. I too would like to see the ESP32 tests be redone with both cores working at the task.
Very detailed, interesting and convincing comparisons. I wonder how the AVRs would fare in similar comparisons, since many hobbyists are still sticking to them and some argue that the AVRs are more power efficient.
AVRs are great chips, but being 8 bit they will really suck doing a 32 bit divide, needing several hundred instructions. That together with the low 16-20 MHz clock speed will make each division take maybe around 10 µs. Sadly, Gary didn't provide his source code so we can try it ourselves.
Gary, it would be interesting to see the results running your test program using both cores of the Raspberry Pi Pico.
Also, it would be interesting to know whether the Arduino IDE setup for the pico uses the hardware divider in the pico or whether it is doing software division.
or both esp32 cores
Hello Gary, the ESP32 sports a dual core CPU. I was wondering if the benchmark tests accounted for it
The RP2040 is also dual core.
I'm looking forward to Risc V evaluation.
Could you try again with the new ESP32-S2 and S3? Maybe the RISC-V C3?
If you watched the video to the end you would have heard me say that I will be testing the C3.
@@GaryExplains Yep heard that, main point was around s2 and s3 as they are the current generation
@@ryanbellyt Today's video (which I will upload shortly) features the S2 and the C3. The S3 will appear in my dual-core MCU showdown video, in the next week or so.
In regards to power consumption, you can shut off WiFi. There are even ESP32 variants without WiFi. It would be interesting to see the power consumption of the system itself without peripherals like WiFi.
Which of course raises the question why use a microcontroller with WiFi if you want to shut it off. Also the Pico W has WiFi and it was more efficient than the ESP32.
@@GaryExplains Not to revive an old post, but maybe familiarity and ease of use. The fact that the S3 has flashing/JTAG debugging available just by wiring USB to 2 pads of the chip, without extra stuff, makes it really convenient on a board. All it really needs to power up is a clock; heck, you could even drop the decoupling caps and it would still work just fine (though not recommended). The external components needed to get the bare chip running are pretty much nothing, and if you use a module then it's all built in already. That makes sticking it on a board really convenient. ESP-IDF is very nice to use. There are examples everywhere on how to code just about anything, due to its popularity. The price you pay for the ease of it is of course power and lack of GPIO, but for some cases those benefits could just outweigh the cons.
i loved it when you hosted back then at android authority and now I love your own content more
Always interesting!
Why did you not include the ESP8266? It's still popular and it can run at 160MHz.
The list of what I DIDN'T include is long...
@@GaryExplains Makes sense, but I was interested to see how it would compare to its bigger sister, the ESP32. So can you please make another video comparing all the ESP microcontrollers?
Thanks for your enthusiasm, at the moment I don't see myself making a video including the ESP8266 or the other ESP32 variants like the ESP32-S2 or -S3. These videos take a lot of time and effort and while asking is simple, actually making them is hard.
What I find interesting is how well the Pico M0+ holds up, the fact it’s dual core with a highly repetitive task like this could put it on par with the STM32 F7 board, and the ESP32 also being dual core would move in front, both are quite inexpensive too
Indeed. I will be making a dual-core battle video soon!!!
There is a very simple reason the Pico does so well, beating the other M0+ boards by a large margin, and matching M4 despite the M0+ core not even having a division instruction. Sadly, Gary obviously isn't aware of this reason. See my separate comment.
Gary,
Please, load me up on benchmarks..!!
Any boards you can get a hold of.
How about benchmarks such as...
- Matrices operations
- String comparisons
- Power usage for IoT sensor read scaling (e.g. 1/s incrementing to (n))
- IoT with multiples of sensors
- IoT remote push, and pull.
@Gary, Thanks for this video! I am very interested in seeing what other benchmarks you could make to compare microcontrollers like this. This one helped me to decide which controller to use in one of my current projects. I figured out that the ESP32s I already have will do what I need. I was thinking that I would have to buy something with more processing power.
Ever look at the STM32G and STM32H series chips?
Not specifically as they don't seem to be popular in consumer level microcontroller boards.
@@GaryExplains I love the Nucleo STM32H743ZI. I did a project with it in my research lab, simulating some fake detector data to test a new DAQ system. Worked like a charm. But you're absolutely correct. For makers, the boards used in the video are the most relevant.
both are used in drone flight controllers (G4 & H7)
@@marc_frank The H7's do look interesting.
The RP Pico will run at 250MHz all day. It can do over 400MHz but inevitably will have some issues with I/O not working properly.
Price is also an important factor. Do include that and price to performance ratio as well.
Hey Gary, I love watching your videos. If possible, make a video on the Snap 8 Gen 2.
The Snapdragon 8 Gen 2 hasn't been released yet, how can I make a video about it? I am sure I will publish a video about it on November 15th.
@@GaryExplains Will be waiting for it . I am pretty much excited about Snap 8 gen 2
Where power is limited such as in space projects and time is not an issue, I can see them using boards such as the pico.
Very nice results. Please can you make a hardware and software floating point calculation comparison, because that is where the M4, M7 and ESP32 will shine, I think.
Yes do a more extensive test pls!
Had there been a metal block that would let you mate the probe of the ASETEK Vapochill Lightspeed Phase Change Cooler unit with the Raspberry Pi 4, while also keeping moisture away from the electronics of the SBC, then that might be an idea.
Also note that ESP32 has 2 cores, and I don't think you can shut one down. So distribute the prime factoring task among two cores should improve the power efficiency a bit? 🤔
The RP2040 in the Pico also has two cores.
Thanks for the video.
Int32 is one thing, int64 is another. Try double-precision floats on the Teensy 4 @ 816MHz, K210 @ 600MHz and Pi Pico @ 240MHz and you will see all the differences!
BTW, ESP8266 is really good on integer, better than ESP32. And ESP32 S3 is much better than ESP32!
STM32's compiler is really smart. If you use -O3 when adding up from 1 to a million, the compiler will give you the result directly rather than having the STM32 do the real calculation at run time!!!
I doubt there are many microcontroller programs that need to do intensive double floating point operations. All compilers do the optimization you mention.
Interesting how clock speed seems to matter a lot even across completely different architectures. The architecture only matters a little bit.
With the graph at 9:00 you’re basically comparing number of clock cycles, no?
Hi Gary, not sure if this question makes sense, but I believe the Pico has 2 cores, did your trial use both (I assume other boards have multi core, did you use those)?
If not, I'm guessing that would make a difference to speed, but how much difference to power usage?
ESP32 and Pico both have two cores. This was a single core test. I am doing another video to look at dual core performance and power consumption.
I would love to see power consumption results with some higher precision. To be honest, 1mA granularity is not so great and the whole calculation seems really crude.
Btw, how did you get those current numbers? Did you capture whole current profile for the board and calculate the mean of it?
Looking forward to seeing the ESP32-S2 (or what the Risc-V model number is) in comparison
ESP32-S3 is now available, and is especially interesting because it now includes an FPU for fast floating-point calculations, and ESP32-C3 is also available which is their newest "basic" chip, is the RISC-V one you're talking about, and includes inbuilt usb-serial (you can skip an external chip like CP2104 / CH340 etc if you want) and basic wifi & bluetooth.
Good comparison, but power consumption in power saving/sleep/deep sleep modes, in Wh or actually nWh, is very important too.
What is -O3?
Maximum compiler optimization.
I'm wondering what would be the results if you will use both cores of pi pico
Another question I have: what is the mWh for, say, a Pico running at 10MHz? 50MHz? 100MHz?
Does it get more efficient at lower speeds? Or higher speeds?
What if you clock every board so they all complete the benchmark in the same amount of time, and have equal performance/time ratings?
Will the relative mWh performance change?
A Pico running at 50MHz is less efficient than one running at 133MHz (58 vs 41 mWh). Same for the ESP32, running it at 160MHz is actually less efficient, and 80MHz even less efficient.
@@GaryExplains sounds like there is a constant current draw that acts as a baseline, and then a second variable*freq current draw, so the faster you go, the smaller the baseline is relative to the total
Indeed. As I mention in the video, I am actually measuring the current draw of the board (which should be constant minus the MCU current), not just the MCU. Interestingly the Pico overclocked at 240MHz is more efficient than at 133MHz.
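The "constant baseline plus frequency-dependent draw" idea in this exchange can be written down as a simple model. Assume the current is I = I0 + k·f and the runtime is proportional to 1/f; then the energy per run has the form E(f) = a/f + b, where a comes from the baseline and b from the frequency-dependent part. The Python sketch below fits a and b to the two Pico figures quoted above (58 mWh at 50MHz, 41 mWh at 133MHz). The model itself is an assumption, and anything it predicts for other clock speeds is an extrapolation, not a measurement.

```python
def fit_baseline_model(f1_mhz, e1_mwh, f2_mhz, e2_mwh):
    """Solve E = a / f + b for (a, b) from two (frequency, energy) points."""
    a = (e1_mwh - e2_mwh) / (1.0 / f1_mhz - 1.0 / f2_mhz)
    b = e1_mwh - a / f1_mhz
    return a, b


def predict_energy(a, b, f_mhz):
    """Energy per benchmark run under the assumed model."""
    return a / f_mhz + b


if __name__ == "__main__":
    a, b = fit_baseline_model(50.0, 58.0, 133.0, 41.0)
    for f_mhz in (50.0, 133.0, 240.0):
        print(f"{f_mhz:5.0f} MHz -> {predict_energy(a, b, f_mhz):.1f} mWh (model)")
```

Under this model the baseline share a/f shrinks as the clock goes up, so energy per run keeps falling with frequency, which matches the direction of the observation that the overclocked Pico is more efficient than at 133MHz.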
Hey, Gary. How about a comparison between STM32F4x1 Blackpill vs Raspberry Pi Pico?
Comparison as in features, price, and ecosystem, or in terms of performance and efficiency? Because the data for the latter is already in this video.
@@GaryExplains The former, I think.
Were both cores used on the Pico?
This makes me wanna see you do an analysis on Apple’s “efficient” cores, nice work
add the teensy boards at a few speeds?
Sadly I don't have any Teensy boards and they are hard to obtain at the moment.
Hey, did you turn off/disable the WiFi in software? Asking because I know you can do this for the esp32 and that's supposed to make the WiFi chip use a lot less current.
Why use a chip with builtin WiFi, if you don't want to use the builtin WiFi. The Pico W also has WiFi.
@@GaryExplains because otherwise its an unfair comparison? If you need WiFi for your project, you won't go with one of the ones without WiFi. If you don't need it, you'll turn it off, meaning if you want to measure across them all then the fair thing to do is assume WiFi won't be used and disable it.
@@AbelShields How can it be unfair when I included the Pico W. How can it be unfair when I am testing the boards in their default configurations. Your logic makes me dizzy.
@@GaryExplains because the standard use case may not be in the default configurations, you have the option of turning off WiFi for higher performance per watt, and that's exactly what would be done if WiFi is not being used, as in your benchmark. You're handicapping the WiFi-enabled chips purely for having more features, which seems unfair.
@@AbelShields So you are trying to defend a bad design. The chip could easily be designed to have the wifi off by default until it tries to connect. So you are basically complaining that the chip is badly designed and why didn't I write my tests to bypass the bad design 🤦♂️
Is relative performance (i.e. performance/clock speed) more a measure of software, firmware and compiler, or a measure of hardware and architecture?
When the Pico's Bluetooth comes online, which is already there on the ESP32, Will the power drain go up on the Pico W?
I would be interested to also see how these compare to the teensy 3.2 and teensy 4
Can you please include cost of each mcu?
What equipment do you use to measure electric power consumption?
the rp2040 and the ESP32 (some of them) are dual core and i expect that your sketch only ran on 1 core so in a way it is not the full power
No atmega328 in test. Even for fun. Rest in peace 😭
I would love to see a much broader and more intensive benchmark!
I appreciate the amount of work Gary puts into a video like this, and running all the tests but, sadly, this is a really terrible benchmark for most people. Finding prime numbers by Trial by Division is unrepresentative of most programs because it is built around -- obviously -- division, while most programs use division very very rarely. So rarely that many CPUs, such as the Cortex M0+, don't even bother to have a division instruction, instead using a software subroutine that probably takes up to around 100 clock cycles in the worst case. Other CPUs have a hardware division instruction, but some of these take 32 clock cycles (for division of 32 bit numbers) while others take 8 or even 4 cycles. Sometimes CPU cores with the same model name can be licensed from the supplier with a choice of fast (but big) or slow (but small) divide units. Cortex M3/M4 take 2-12 cycles for a divide depending on how many significant bits the result has. M7 seems to be 3-19 cycles. And then there is the Pi Pico, which has an M0+ that (like the other M0+ boards) doesn't have a divide instruction in the CPU. BUT, the Pico's SIO peripheral has a memory-mapped division unit. Rather than an instruction in the CPU, you write the dividend to one memory location, then the divisor to another memory location, and then 8 clock cycles later you can read the quotient and the remainder from two more memory locations. If you use a generic Cortex M0+ compiler then it might not know about this, but a specialised Pico compiler will know how to use the SIO divide. Or maybe it's just a custom __aeabi_idiv / __aeabi_uidiv function in the runtime library. I hope it's inlined. It is also very unfortunate that Gary has not published the source code he used so people can try it on other things, and also check that no mistakes have slipped in.
If people are interested in a primes benchmark that IS representative of average programs -- because it doesn't use division -- they might want to look at hoult.org/primes.txt which I have tried on many machines including M3/M4/M7 and ESP32 as well as various RISC-V and x86 and Pi-like ARM boards. I don't have an M0+ result but I'd welcome one (or others)
Thanks Bruce that is some valuable insight. I guess that explains why the SAMD21 is so slow. I will change the benchmark for my next round of testing.
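For readers who want to see why division dominates this kind of benchmark, here is a minimal Python sketch of prime counting by trial division. It is not the source code used in the video (which was not published); the function names and the limit are illustrative assumptions.

```python
def is_prime_trial_division(n):
    """Test primality by dividing by every odd candidate up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True          # 2 and 3 are prime
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:       # one remainder (division) per loop iteration
            return False
        d += 2
    return True


def count_primes(limit):
    """Count the primes strictly below limit."""
    return sum(1 for n in range(limit) if is_prime_trial_division(n))


if __name__ == "__main__":
    print(count_primes(1000))
```

Nearly all the work in the inner loop is the remainder operation, so on a real MCU the result mostly reflects how fast the divide is (hardware instruction, memory-mapped divider, or software routine) rather than general-purpose performance.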
If you truly want to compare efficiency, you have to measure the power for the MCU. For example, the ESP32 often uses an AMS1117 LDO, which has a quiescent current of about 15mA. If you use a different board, you will have a hugely different power consumption.
I haven't studied the circuit diagrams of the myriad of ESP32 boards. Do you happen to know if, in general, powering the boards via the 3.3v pin bypasses the voltage regulator?
@@GaryExplains At least on my board, it does. I think it should always work like that, or the board would have to have a different voltage regulator for the 3.3V output, afaik.
@@davidpetry7853 So if I power the board via the 3.3v pin then the current measured won't include the voltage regulator drain.
@@GaryExplains At least I think that's how it works. I don't think a quiescent current flows through the LDO if you only apply power to the 3.3V line.
But that should be a simple test, I think
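The regulator overhead discussed in this thread can be estimated with a few lines of Python. A linear regulator passes the load current straight through and drops the excess voltage as heat, so its efficiency is at best Vout/Vin, and lower once quiescent current is counted. The load and quiescent figures used below are illustration values, not measurements of any particular board.

```python
def ldo_efficiency(v_in, v_out, load_ma, quiescent_ma=0.0):
    """Fraction of input power that reaches the load through a linear regulator."""
    p_out_mw = v_out * load_ma
    p_in_mw = v_in * (load_ma + quiescent_ma)
    return p_out_mw / p_in_mw


if __name__ == "__main__":
    # 5 V USB input regulated down to 3.3 V, hypothetical 50 mA load.
    ideal = ldo_efficiency(5.0, 3.3, 50.0)
    with_iq = ldo_efficiency(5.0, 3.3, 50.0, quiescent_ma=5.0)
    print(f"no quiescent current: {ideal:.0%} of input power reaches the load")
    print(f"with 5 mA quiescent:  {with_iq:.0%} of input power reaches the load")
```

This is why measuring at the 5V input and measuring at the 3.3V pin can give noticeably different efficiency figures for the same MCU.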
Is this data available on a repository like GitHub?
I wonder what the results would be if you didn't use Arduino libraries (ideally CMSIS for ARM chips). Would the performance be better for all of them, or would some of the boards catch up to each other? Honestly, I knew the Blue Pill would lose, but I didn't expect that much of a gap since it's not floating point operations.
Sleep mode power consumption comparison would have been interesting too.
I'd like to see a comparison with the new ESP32-C3 with the RISC-V core
It could be mildly interesting to compare the efficiency of a few different single board computers and also compare those to a handful of older phones. Since they're basically just computers, you could, for example, repurpose one sitting in a drawer as a Pi-hole or an OctoPrint server instead of spending $40+ on a Raspberry Pi.
Installing Linux on an old smartphone sounds quite ambitious.
@@GaryExplains Might try out Andronix it's a bit limited without root or do you mean performance wise? That depends on exactly how old we're talking, such as early SD 800-810 chips are getting into the range of barely useable (as a smart device) in 2022, but still about as powerful as a RBp 2 (maybe 3, but that's pushing it), so still fine for low power things and I've seen RBp 4 levels at between SD821 to SD835 which are technically still rather old devices these days.
Honestly, the more ambitious thing about smartphones is just getting the bootloader unlocked and rooting a lot of them.
It seems like the test was run on a single core. Dual-core calculation on the ESP32 is a tricky thing. In the default setup it uses the second core to handle the Bluetooth and WiFi network stacks.
And?
I think power draw also depends on the process node of the MCU; the RP2040 is built on 28nm, while the Tensilica LX6 is probably above 32nm or 40nm.
A big factor for power usage of an entire board is the various support circuitry, like voltage regulator, whether it has a battery charging circuit, a separate USB chip, pull up/down resistors on buttons, etc.
I would love to see more benchmarks because the STM F7 series is not really comparable to the ESP32. The M0+ and M4 are more in the same tier. I think the FPU tests will show a lot more of the ARM strengths.
So you tested 4 different ARM generations and just one ESP32? Where is ESP32S3? How about the RISCV ESP32C3? Waiting for the video to finish to see if you make any comments on the ARM prices and availability :)
No, I don't mention prices or availability, as they change over time making it pointless.
@@GaryExplains well… M7 will always cost as much as a bag of ESPs. Time will not matter.
Gary, would you specify which ESP32? Since you say LX6, I believe you are using the original ESP32. The newest ESP32-S3 uses an LX7 processor. The new ESP32-S3 is basically the equivalent to the old ESP32 and the new ESP32-C3 is equivalent to the ESP8266. The latter is a closer match to the RPi Pico in terms of perf/power mix.
Additionally, the quoted consumption of the ESP32 in the video looks like maybe both cores were active. The ESP32 (and probably the STMs) can shut down one core to get a more favorable power profile. From its datasheet, the ESP32-S3 claims to scale from 1 core, 40 MHz, 32b instr at 22mA to 2 cores, 240 MHz, 128b instr at 108mA. Quite a range!
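To put the two figures quoted above on a common scale, here is a rough sketch in C that divides supply current by the number of active core-MHz. The helper name `ma_per_core_mhz` is made up for illustration, and the inputs are simply the numbers as quoted in this comment, not values re-checked against the datasheet.

```c
#include <stdio.h>

/* Supply current per active core-MHz, in mA.
 * Hypothetical helper for illustration only; it ignores static
 * (clock-independent) current and differences in instruction width. */
static double ma_per_core_mhz(double current_ma, int cores, double clock_mhz)
{
    return current_ma / (cores * clock_mhz);
}

/* Print both operating points quoted in the comment above. */
static void print_quoted_points(void)
{
    printf("1 core  @  40 MHz: %.3f mA per core-MHz\n",
           ma_per_core_mhz(22.0, 1, 40.0));
    printf("2 cores @ 240 MHz: %.3f mA per core-MHz\n",
           ma_per_core_mhz(108.0, 2, 240.0));
}
```

With those inputs the low end works out to 0.55 mA per core-MHz and the high end to 0.225 mA per core-MHz: about 12x the raw clock budget for roughly 4.9x the current. That is only a crude ratio, since a fixed baseline current is folded into both figures.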
In any case, thank you for making this video. It would be excellent if you and Andreas Spiess (or others) would coordinate testing methods so we mortals can more easily compare apples to apples.
Yes, the original ESP32 with the LX6. I don't have S2 or S3 boards. I have a C3 board on the way and that will be part of my RISC-V testing.
I feel like most microcontroller projects are not math-intensive, but timer- or interrupt-driven, especially the battery-powered ones. I'd also be curious how the ESP32 would compare at calculating primes if you used C++ code.
What about dspics?
Or if you really need some more performance, step up to the ARM Cortex-M7 at 600 MHz (native, up to 1 GHz overclocked) on the Teensy 4.x series.
I'd love to see what they do with the M8 or M9.
The Pico kinda baffles me. Especially compared to the other M0+ based cores.
Usually the M0+ core does not include a floating point unit, hence finding prime numbers by division should be overall very poor.
The Raspberry Pi Foundation must have implemented some sort of FPU extension to the M0+ core.
You can do it with just integer division. You only need to know if there is a remainder.
Well spotted! Not only does the M0+ not have an FPU (not useful here anyway), it also doesn't have any division instruction. There is a very simple reason the Pico does well. It's not an FPU. Hints: SIO, memory mapped.
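As a minimal sketch of the point made in this thread, a trial-division prime test needs only the integer remainder operator and no floating point. This is not the benchmark code from the video; the function names `is_prime` and `count_primes_below` are made up for illustration. On a core without a divide instruction, `%` and `/` are typically compiled to a runtime library call, and the RP2040 has a hardware integer divider in its memory-mapped SIO block that such a routine can use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trial division using only integer remainder; no FPU involved. */
static bool is_prime(uint32_t n)
{
    if (n < 2) return false;
    if (n < 4) return true;           /* 2 and 3 are prime */
    if (n % 2 == 0) return false;     /* even numbers above 2 */
    /* Testing d <= n / d avoids both sqrt() and overflow of d * d. */
    for (uint32_t d = 3; d <= n / d; d += 2) {
        if (n % d == 0) return false; /* zero remainder: d divides n */
    }
    return true;
}

/* Count the primes strictly below limit. */
static uint32_t count_primes_below(uint32_t limit)
{
    uint32_t count = 0;
    for (uint32_t n = 2; n < limit; n++) {
        if (is_prime(n)) count++;
    }
    return count;
}
```

For example, `count_primes_below(100)` returns 25. Every loop iteration costs one division and one remainder, which is why the speed of integer division dominates this kind of benchmark.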
Very interesting! Basically it's a disappointment for the M7, whose boards are quite expensive compared to e.g. the ESP32. Maybe you could also take the Kendryte K210 into your comparisons: also cheap Chinese stuff with lots of power (even AI support) and support for MicroPython.
If you put the ESP32-S3 in the list, the difference will be dramatic. Not to mention the price; the S3 also has an FPU, AI acceleration, and wireless capability by default.
Don't forget that the differences go beyond raw power. You can also get way more GPIO than the ESP32 can dream of, as well as better features and expansion capabilities: larger external RAM at faster speeds, more robust and faster interfaces, and a slew of other things. With the M7 that is possible because it's just an architecture and the manufacturers can add a ton more, but with the ESP32 what is available is all you get. There aren't other manufacturers offering Xtensa with more features, like there are with ARM cores.
Q: is #DDR5 memory *10 times faster* than the old brand X memory?
Comparing the M7 with the first generation of ESP32 is not fair. I hope to see the ESP32-S3 in the list in the future.
As far as I can tell, the first generation of ESP32 is newer than the M7. So why is that unfair?
Hmm. I suspect that turning off unused radios and stop/idling unused related cores might make a huge difference in all of this.