Could you use a lower latency/higher throughput interconnect like direct PCIe connection to increase the performance? For a few computers it could be possible (certainly not on large scale supercomputers since PCIe 3 has max. cable length of 8 inches).
The closest we can get nowadays in the datacenter I look after is FC (Fibre Channel). Every single server in the DC and inside the cluster is connected over FC for the lowest latency and maximum transfer rate.
About checking if a number greater than 3 is prime: square it, subtract one and divide by 24. If that is a whole number (no digits after the decimal point) it MIGHT be a prime number, otherwise don't waste time checking further.
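A minimal sketch of that pre-filter in Python, under the assumption that it is only a cheap first pass: the test passes exactly when the number is divisible by neither 2 nor 3, so 2 and 3 themselves have to be handled separately and every survivor still needs a real check. The function names `passes_24_filter` and `is_prime` are made up for illustration.

```python
def passes_24_filter(n: int) -> bool:
    """True if (n*n - 1) is a multiple of 24, i.e. n shares no factor with 6."""
    return (n * n - 1) % 24 == 0


def is_prime(n: int) -> bool:
    """Trial division, using the divide-by-24 test as a cheap early exit."""
    if n < 2:
        return False
    if n in (2, 3):
        # The filter rejects 2 and 3 even though they are prime.
        return True
    if not passes_24_filter(n):
        return False
    # Survivors are of the form 6k +/- 1, so only test divisors of that form.
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


if __name__ == "__main__":
    print([n for n in range(2, 60) if is_prime(n)])
    # 25 passes the filter (624 / 24 = 26) but is not prime.
    print(passes_24_filter(25), is_prime(25))
```

The filter alone throws away two thirds of all candidates, which is why it only says a number MIGHT be prime.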
Nice work, Gary! And if you want to remove the overhead and speed up calculations greatly, you can switch from Python to Golang, for example, and have microservices do the work
I would strongly recommend using the Raspberry Pi Compute Module 3 instead of the regular boards. It has the same specs as the regular Pi but is much better suited to this sort of thing: the modules are smaller, so you can pack them more densely with fewer wires, and they can be ordered in bulk more cheaply. Unfortunately, though, you'd need to make your own custom board to connect them all together.
Just to be clear, you would recommend the modules to people who have the skills, knowledge and money to make their "own custom board to connect them all together."
@@GaryExplains that's correct. The assumption is that someone willing to build a cluster of raspberry pis will also be willing to make a custom board for his custom application
Relatively new Gen X'er eating up your content like a kid on Halloween. Thank you. 👊 I haven't finished watching this one yet, but I've got an ambitious build that could use this idea with what I hope is all Pi 5s, with the NVMe HATs or bases? I'm looking at setting up my current RPi 5 through your content and keep getting distracted. Like just before I hit this I came across your three-week-old post about the AI clipboard... Since I'm simple, before I break my brain: I want an offline LLM AI that can manipulate the training of, and work alongside, another more critical* thinking AI dev board for speech and facial recognition, while also making the AI more personalised. Thank you for the link to GitHub on that last one, it simplified finding you. Reluctant to say anything else other than please don't stop! You're extremely informative in a way this old man understands at least, haha. Great work, brother! Great work. May God bless your journey.
MPI - messaging service used in supercomputer cluster Scatter n gather (lesser data, high latency) (more data, less latency as compared to single core) Public n private key- security
This just shows the purpose of Raspberry Pi, a learning tool. I never thought they would bring it to Server/Clusters. It's a great teaching tool from basic programming to now supercomputers. Raspberry Pi may not be a world record PC nor a Supercomputer with Tera flops in processing power. But it has proven to be a super teaching device that's caught a lot of interest world wide for those who want to jump in and learn. And a great gaming emulator! d^_^b
That is a neat idea of making a Super Computer from a group of small computers like the ones you mentioned. I wonder if IBM has thought of that since they are into the Super Computer business.
Yup, that was a project that the US Military did because the PS3s had a lot of cores in a relatively compact form. It made being able to source machines from around the world really easy.
Gary, I was using two vastly different computers. Computer 1 System 76 Meerkat with an i5 7260u processor. That computer alone could do the prime number program in about 4 seconds. 2nd computer, celeron based ChromeBox but in developer mode and also running Linux. Both systems using PopOS! 19.04 from system 76. When I add the celeron computer into the mix it takes MUCH longer. Like 1 min 30. I thought that even though the second computer was much less powerful, it could at least add some help to the calculations. I was somewhat surprised by the outcome.
I think with MPI Scatter and Gather the controlling node waits for ALL the computers to finish their calculations, so slower computers in the cluster will slow everything down. You can write the program differently so it doesn't have that problem.
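To illustrate the difference, here is a toy simulation in plain Python (it does not use MPI at all). It compares an equal up-front split, which is roughly what a scatter/gather does, with a scheme where each node pulls the next task as soon as it is free. The node speeds are hypothetical numbers, not measurements from the machines mentioned above.

```python
import heapq


def static_split_time(n_tasks, speeds):
    """Equal share per node; the run ends when the slowest node finishes."""
    share = n_tasks / len(speeds)
    return max(share / s for s in speeds)


def dynamic_queue_time(n_tasks, speeds):
    """Nodes pull one task at a time, so faster nodes end up doing more tasks."""
    # Heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)      # earliest available node
        done = t + 1.0 / speeds[i]         # speeds are in tasks per second
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    speeds = [10.0, 1.0]  # hypothetical: one fast node, one ten times slower
    print("fast node alone :", 110 / speeds[0])
    print("static split    :", static_split_time(110, speeds))
    print("dynamic queue   :", dynamic_queue_time(110, speeds))
```

With these made-up speeds the equal split is several times slower than the fast node working alone, while the pull-based version finishes slightly ahead of it. In a real MPI program the same idea is usually written as a coordinator handing out small chunks on request, which costs more messages in exchange for better balance.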
@@YouArentValid Yeah, this struck me as one of those "I need content for my channel" videos. I'll bet he took it from a howto found on the Interwebs too? Plus, it's been done to death already.
@@twistednickster2653 not sure if ur joking or not but pong can be played on a web browser if your lazy, or if you have some time, you could get a retro console emulator and set up a pong ROM on it. Or you could program one with python.
You should try this with the rockpro64 and connect each board together through the pcie to get really low overhead. Might even get to write your own kernel and mpi layer!
@@NoorquackerInd Well if you're going to use a comm card, it would be a lot easier to put a ten Gb network card in the pcie slot. I was just thinking to keep costs down, you could try to run the messages over just the pcie lanes.
Hi sir, is it important to use same models of pi to make a cluster, or we can use different models to make one nice powerful computer. i have two pi3 and one pi4.. thanks.
@@GaryExplains thanks, but the situation i got is this. when i try to install pyqt5 through pip it starts to compile it with gcc. So i wonder if gcc would work or i should change the compiler inside that pip script. Edit: i know that python3-pyqt5 packages available through apt but they are outdated.
It is very easy to find either using UA-cam's search box or in my programming play list. Anyway, here is the link ua-cam.com/video/Tn0u-IIBmtc/v-deo.html
Interesting video. However, the Cortex A53 used in the Raspberry Pi is not the fastest. It would have been interesting to know how this compares in terms of performance to a regular computer, e.g. how many RPi boards you need to get similar performance.
As a follow-up, I ran it on an AMD FX8320 desktop PC (with Linux) using 4 cores and it took 3.3 seconds! Interestingly, when I ran it on all 8 cores it took twice as long (6.6 seconds). I am guessing that this is because the FX8320 isn't a true octa-core, but rather a quad core with this pseudo hyper-threading. The max clock speed of the FX8320 is 3.5GHz.
@@GaryExplains Interesting test, thanks! So with a PC you get the test done in 3.3 seconds. With one RPi board (4 cores) it takes 33 seconds, and with 4 boards (16 cores) it takes 16 seconds. The parallelization here doesn't scale that well, so some more tweaking may be required, otherwise it's likely to require a LOT of RPi to get down to 3.3 seconds.
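A rough back-of-the-envelope sketch of that scaling, using only the two Pi timings quoted above (33 seconds on 4 cores, 16 seconds on 16 cores). It assumes a simple Amdahl-style model, T(n) = serial + parallel / n, which is an assumption and not something measured in the video; the "serial" part here would lump together startup, communication and anything else that does not shrink with more cores.

```python
def fit_amdahl(n1, t1, n2, t2):
    """Solve T(n) = serial + parallel / n from two (cores, seconds) points."""
    parallel = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    serial = t1 - parallel / n1
    return serial, parallel


def predicted_time(serial, parallel, cores):
    """Run time the fitted model predicts for a given core count."""
    return serial + parallel / cores


# Timings quoted in the comment above.
serial, parallel = fit_amdahl(4, 33.0, 16, 16.0)

if __name__ == "__main__":
    speedup = 33.0 / 16.0              # going from 4 to 16 cores
    efficiency = speedup / (16 / 4)    # fraction of the ideal 4x
    print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
    print(f"fixed part {serial:.1f} s, divisible part {parallel:.1f} s")
    for cores in (16, 64, 256, 1024):
        print(cores, "cores:", round(predicted_time(serial, parallel, cores), 1), "s")
```

Under this model the fixed part comes out at roughly 10 seconds, so adding boards alone would never reach 3.3 seconds; the overhead itself would have to shrink first. Two data points are a thin basis for a fit, so treat the numbers as indicative only.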
That would be pointless, because unless you're doing intense audio processing, there would be no point in having a dedicated audio Pi, and just one Pi is not enough to do much video editing on, and at that point what would your third be used for if not everything else?
PS... if I dare to suggest a 'petit' correction to the video... in the demo show the htop from the 4 Pi(s), and don't forget to set the process sort of each htop to CPU load...
I wonder how many Pi 4s it would take to produce the same number of flops you get from the minimum baseline Cray super computer set up. Does that setup you have produce the amount of calculations that can be performed by a current gaming laptop? It might be interesting to know how many Pis it would take to produce the same computational power as a gaming laptop and what the difference in price point would be. Thanks for the interesting demonstration.
Could you run this kind of cluster with multiple different OSs and machine types? For example, I have a few Pis and also some old office desktops. Could I simply run them together?
Isn't this similar to how computer systems in server rooms and on-premise datacenters inside companies have been operating all along? How is it any different from say Yahoo provisioning mailing servers etc? Can any datacenter with lots of servers be called a "supercomputer"?
Cluster configuration is a complex topic, one that I don't go into in this video. There are dozens of ways to configure clusters. The short answer to your question is "no, this isn't how computer systems in server rooms have been operating all along." The key point in this video is the use of the MPI library to run the prime number app, not the topology or the provisioning of the cluster itself.
Thanks for the info. I need to study Pi clustering more. For example, begin by clustering 2, then 4, then 8, and so on. Oracle recently built one with 1060 Pis.
🖐🏻 question: In the super computer setup - the other prime numbers are missing when in a cluster setup. Wondering where are those results? I am thinking why the other nodes didn’t return the values?
Making a Pi cluster is cool, but common. Even with desktop clusters. The problem is - once a home user makes one.... then what? What can actually be done with it, that doesn't require writing code (other than playing with light blinking sequences). (Pi calculation, prime number generation, and benchmarks are basically the same as making the lights dance in sequence. Just short term novelties, not actually a use.)
I have three HP DL380 G6 machines, 32GB each, each connected by four Gb Ethernet to a DLink switch. They run OpenFOAM cfd software. Each machine puts out 300W of heat when running flat out. So a simulation costs the same as a 1kwh electric fire and I have to make sure I have decent air flow in the house, else it becomes unbearably hot !
@@usquanigo I run air flow studies to see what effects changing the internals of electronics units has on where the air flows and whether the flow is faster / slower. For example, a friend suggested, out of curiosity, moving a series of ventilation holes which separate two electronics compartments, to see what effect that would have. Turns out this was a much better solution than that which went into production. Now his idea is causing a revision 2 of the final metal work.
My idea/wish is to make my own low budget 4K editing mini PC or "laptop" using a portable monitor (Running Windows 10 - 64 bit, using Premiere Pro or Davinci Resolve). Is it possible to make it with a Raspberry Pi 4 cluster (8GB RAM)? How about the graphics? Can I connect a graphing card to it? Can I combine a Raspberry Pi 4 with NVIDIA Jeston? Is what I want possible??? If you give me some help I will really try it! I am not an expert but I had some experience in the past (programming Atmel microcontrollers and FPGA's, making PCB's and building Pc's since the early 90's...)
On my old Amiga I did a lot of fractal scenery animation and it took ages, this would be a really good demo of connecting more computers thru a slow bandwidth link. And raytracing also, Lightwave was really great as renderfarms go, one master/server (with GUI) then just send the resources to a bare minimum program that actually calculates the different images and sends it to the master/slave and it is pretty good on resource management as it just hands the nodes the images that is not done (hard to explain) but it really meant that you could connect pretty much anything... a slow computer, a fast computer and so on... it used everything at 100% all the time. Do Povray exist on the Pi? If it does that might be a good start for a great demo of connecting PI's :-)
i would love to see people make raspberry pi supported or just games for the os and then see people make a pi super computer to get like 500 fps or even a video editor supported for the os
That is super cool! (Pun not intended.) It's one of those things that I'd want to do just out of the fun of making it, regardless of how actually useful it is. Of course it is useful for many things.
Cool video! One question, though: can the Raspberry Pis be connected via GPIOs and make them behave like the cluster in the video? Will it be more effective that way?
HI thanks! Beautiful video. I tried your program on 2 RasPi 3's. But it still says running on 2 cores instead of 8 When I use a single RasPi it runs on one core instead of 4. Any idea how to fix this?
Gary: First of all 👍 on the video ! Why not explain how the supercomputer manage the task at hand and explain how it can automatically spread the work across all nodes to use the computing power more efficiently without having to do it manually as you did with the input file with the different node addresses ???
Obviously there are different management layers that can be built on top of clusters. What particular software were you thinking about when you wrote, "it can automatically spread the work across all nodes to use the computing power more efficiently without having to do it manually"?
@@GaryExplains Gary, I have worked with Oracle EXADATA, and to make handling jobs more efficiently it has to have management layers built in the software and hardware, and it has to be configured based upon the particular installation. I was just watching your video and wondering why a manual input versus a job manager that divides the job amongst the nodes automatically ????
@@cliffallen2663 Also remember oracle exadata is running a database, a database needs a lot more guarantees to make sure transactions remain valid and no data is corrupted or lost. Even something as simple as making sure an auto incremental number for an insert doesn't overlap between nodes if you had a similar system you want to prevent so it is something that needs to be coordinated or centralized. Coordinated or centralized add a lot of communication overhead and thus lag for any thing you are doing, so that is why you'd want to have specialized hardware to make sure communication goes fast, efficient and without spikes in latency. An MPI cluster usually just does calculations I think, if data is lost, just re-compute it or even just let multiple nodes run the same calculation for redundancy (don't know if MPI supports that, it's an example of what can be done).
I am running Motioneye OS on a Raspberry Pi4 4Gb, this connects to 4 x Wyze Cam V2. The problem i have is the videos are not smooth and the frame rate is low. If i cluster some Raspberry Pi4 would this be possible and make performance better, also how could i do this, Thanks
I would use Udp to send packets to one of the hosts, then that host sends to the next host, etc etc and then the final host with all that information processes it and sends it to the main computer
Processing-wise yes, because you would end up with around 3.6 GHz of processing power for 130 dollars, which is a good deal. I don't know whether it would be effective though, because of the low-power GPU the Pi has.
It would be safe to say lower, I should think. As it would likely be for any active cooling solution required. The real question though: does it compare on the speed/horsepower scale. If so, it’s the clear winner from the cost perspective, especially now the RasPi4 has dropped. I’d expect to see people using these as coprocessors to their main router in short order.
For those asking about the rack, I don't remember exactly where I bought it. But here are a few from Amazon that you might like: geni.us/3AAUtfx and geni.us/DbWsT and for the UK this one: geni.us/rGjT6
Does the method you're demonstrating with the MPI library allow these RPis to combine system resources? I know you said that it sees 4 Pis = 16 cores; does it also pool the RAM? Assuming the answer to both of those questions is yes, can you still enable ZRAM within the cluster? When compiling programs, the Pi tends to hit swap after a point; ZRAM allows the Pi to swap to RAM instead of the microSD (I'm sure you're familiar). I would want to use a setup like this for compiling; does this cluster configuration help me in that regard? Thanks, Gary.
I suggest another rack with fan kit:
www.amazon.com/dp/B07MW24S61
and for the UK:
www.amazon.co.uk/dp/B07J9VMNBL
Hello Gary, I would like to thank you for making this video and maintaining such an informative channel. You are very easy to listen to and your explanations are to the point. Keep up the good work. Today was the first day I saw your channel link and decided to give it a go.
The Raspberry Pi is a great platform. However, the Raspberry Pi Zero is a very compact development board as well. The Pi Zero does not have the compute power of the big Pi, but the integration of the WiFi on the board makes it one of my favorites.
I built a super computer using the Pi Zero as the main board. I loaded each Pi with a version of Windows 2008 Data Center Server. I used RUFUS to flash 64 GB microSD cards with the OS. Once I had the OS working I cloned it using this program...
clonezilla.org/clonezilla-SE/
I have a multi-slot microSD card reader. It holds 24 microSD cards and connects via USB 3.1. I installed the Windows 2008 Data Center Server on the microSD cards in about 15 minutes. Windows Server 2008 Data Center can cluster up to 32 machines' processors at once. The Windows platform is very stable, yet it is a little large: about 3 GB on each microSD.
I used the data center server services to aggregate all the Raspberry Pi processors and resources. I used the WiFi on each chip to bridge them all together. I only spaced them about 1/2 inch apart. The network and the data processing are so fast that it returns almost instantly. I mainly use the cluster just to browse the web and play games. It is absolutely overkill on any kind of gaming or graphics program.
I set the paging file on all the drives to 512 initial, max 4096. I also connected an external USB 3.0 6 TB hard disk to the USB mini port on one of the Pis, number 32 in the cluster. Then all the data I download goes through the other 31 and passes downstream directly to the external disk like a funnel. I can download a 4K movie in about 55 seconds with my AT&T fiber.
I thought you might be interested in looking at this flavor of Pi super computer. Of course I have Python, PHP 7.2 and AMPPS installed on the cluster. Automatic load balancing and WiFi VLAN tagging.
The Windows 2008 Server Data Center can open up a whole avenue of new and powerful applications you may be interested in.
@@kevindeng1889 hi
@@eg3730 hi
A long time ago, a man named Bill Gates had a vision: "A computer on every desktop"
Now, thanks to Raspberry Pi, a new vision has emerged: "A supercomputer on every desktop"
But if every desktop computer is a supercomputer…
@@SS-ARYAN then we can only dream bigger, my guy 😎
considering the physical limitations of transistors, the only way to turn a single device into a supercomputer is through the cloud.
I looked a few of these Raspberry Pi clusters and for less than $800 I bought a used quad processor 32 core Xeon Dell R820 with 96 GB of memory..... and it just works. Sure when it's running it consumes more power but it's a unified memory across the 4 processors which makes HPC easier.
What people forget about the old Microwulf clusters is that they used 2 gigabit connections per board to share data. The Pi3 has 1 Ethernet connection capped at 300 Mbps, which made clusters actually slower than a single Pi. Now that the Pi4 is here with true gigabit and USB 3 support to add a second one, a Pi cluster might actually be a viable project.
So this video is misleading? 🤔
@@hammercanttouchthis it was a nice simple example.
It depends on the problem you're trying to solve. If the problem you're trying to solve requires minimal network bandwidth(small inputs and outputs) but requires a large amount of CPU processing time... Then the older pi's will work just fine.
@@hammercanttouchthis It was obviously a demo of the theory of clustering put into practice, not a video about optimising data bandwidth and latency.
What I find interesting is that that program you ran is the equivalent of what was run on the EDSAC computer in the 1950s when it was doing nothing else and you are generating more prime numbers in 30 seconds than it could in just under 10 years.
WOW!
Yeah, even the chips you find in those musical birthday cards have more computing power than all the Allied Forces put together did in WW2. It's crazy.
@@AbhinavSubramanian but we went to the moon on that power?
@@dashboy007 even your current computer/laptop/phone no matter the model is still many times more powerful than the greatest computers of the ones used for the first few moon landings
@@bnbnism I was trying to be sarcastic. There is no way my phone today could power anything else but itself, let alone a rocket ship.
Thank you Gary!! I learn more watching one video than spending hours on so called Tech Channels.
I agree... He is very good at the explanation and easy to listen to.
Watch tech quickie on yt
@@falcondarkshadow agreed linus and the gang really do good job there
@@philh98 definitely
This is by far the best example i have seen on this topic, excellent vid
Thank you Gary. There are a lot of videos on how to build a cluster with multiple Raspberry Pis, but this is the first time I have actually seen one running as a cluster. All other videos stopped after the build, or ran them as individual computers.
Reminds me of the concurrent & parallel computing class back in college, specifically the grid computing chapter. The classic example we used back then was matrix multiplication, while for the project we chose to parallelize an inefficient sequential sorting algorithm with the final goal of beating quicksort up to a certain data size (because eventually quicksort still wins, it's just a much more efficient algorithm after all).
Wow! 12 minutes super computing lecture gives you more than a 4 year bachelor degree
Guessing you have that 4 year bachelor's degree and you're referring to it aren't you?
No. Just no.
I know you're trying to compliment the video (maybe inflate the audience's ego?) but I think your university ripped you off...
the university teaches you in more detail and less effort. this video is less in detail (in a nutshell)
@@infinity5288 Less in detail with more effort put into explaining the detail. I would say that teaches more, because the less effort put in, the less that you get your point understood, therefore you teach LESS because it is not taught, just stated.
The example you used with primes is one of concurrency rather than parallelism it seems. This is a very good primer on the basics of high performance computing though. Good video.
I do enjoy your videos, even though I already know pretty much everything you discuss usually. And I have recommended it to people who do need to learn about a topic. Great format, production quality and content. Thanks for making this.
*GARY!*
*Good Evening Professor!*
*Good Evening Fellow Classmates!*
MARK!!
Very informative, i've always considered making a mini supercomputer (raspberry pi 3 +b)
perfect explanation 2:30 - 3:00 planning to have one built soon
aside from having rpi4 , i was also thinking about orange pi to have as another alternative
then mix them if possible just need more research on this
thx sir Gary
Yeah, I did it months ago with 4 Raspberry Pi 3s and MPI4py and it worked very well!
I also used psh (parallel secure shell), very useful tool.
Excellent explanation. Thank you! Back in the early days of the IBM PC, I wrote a game with virtual robots that did combat in a virtual arena, and each "Warbot" ran its own program, which was an interpreted language I wrote just for that game. The language was called R-Code. In this case, the R-code interpreter was running 5 programs at once, and each program had its own simultaneous I/O. That was pretty cool in the old DOS days before Windows and multitasking.
What version of DOS did it run on? And did you mean it ran on the IBM PC or XT? :)
@@hammercanttouchthis By that time, 1991? I was running it mostly at work on IBM XTs running MSDOS 3.3 or later, as i recall because we had 3.5 inch floppies. The entire programming environment and game fit on a single diskette. It was only 10,000 lines of QuickBASIC code. I never released it to the public, but I had one other friend who was interested in programming who liked writing R-code for the warbots.
Reminds me of Robot Wars. My friend and I had a blast programming our robots to pummel each other in the ring.
Forgetting the cost of power for a moment I am curious how the performance compares to a PowerPC based cluster which was probably the first "out of the box" consumer level hardware solution available that could be configured as a supercomputer cluster.
Yeah, I agree that would be interesting. In fact building different clusters from various bits of historical and new hardware and then benchmarking them would be quite interesting, but alas very time consuming!
@@GaryExplains My master's thesis was done in the early '90s, and it was about creating dynamic computing clusters using heterogeneous computers (various hardware architectures at the time). Given the overall limitations, we identified types of problems which could scale using the available technique. Great fun and on the cutting edge for its time. The main benefit was that the connected client computers were largely unaware that they were committing computing cycles; the jobs were running in the background. The base was done in PVM, in many aspects the predecessor to MPI.
Man I love this channel, always something interesting to learn
Really interesting! I’ve always wondered how that worked.
This video is awesome. A brilliant explanation to some more complex computing systems.
Thanks Gary. I also tried Apache Spark on Jetson Nano and it works. So I expect Apache Spark can work with Raspberry Pi too. The concept is the same.
Could you use a lower latency/higher throughput interconnect like direct PCIe connection to increase the performance?
For a few computers it could be possible (certainly not on large scale supercomputers since PCIe 3 has max. cable length of 8 inches).
The closest we can get nowadays in the datacenter I take care of is FC (Fibre Channel).
Every single server in the DC and inside the cluster is connected together in a network via FC for the lowest latency and maximum transfer rate.
We use InfiniBand in an HPC setting. It's connected through the PCIe bus.
Hey Gary could you please upload a step by step video to achieve node cluster????
You should make a video about quantum computers. A lot of youtubers have failed to present that topic in a neat way...
About checking if a number is prime: square it, subtract one and divide by 24. If that is a whole number (no digits after the decimal point) it MIGHT be a prime number, otherwise don't waste time checking further.
Yes, there are plenty of different ways to check for primes or likely primes. But that isn't the main point of the video.
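The filter suggested above can be sketched in Python. It relies on the fact that every prime greater than 3 satisfies (n*n - 1) mod 24 = 0. The function names are made up for illustration, and treating 2 and 3 as special cases is an assumption, since the comment does not mention them.

```python
def passes_24_filter(n: int) -> bool:
    """Return True if n*n - 1 is divisible by 24, i.e. n MIGHT be prime."""
    return (n * n - 1) % 24 == 0


def is_prime(n: int) -> bool:
    """Trial division that uses the filter to reject many composites early."""
    if n < 2:
        return False
    if n in (2, 3):
        return True  # assumption: handled separately, the filter only holds for primes > 3
    if not passes_24_filter(n):
        return False  # n is divisible by 2 or 3
    d = 5
    while d * d <= n:  # only odd divisors up to sqrt(n) remain
        if n % d == 0:
            return False
        d += 2
    return True
```

Note that the filter only rules numbers out: 25 passes it (624 is divisible by 24) but is not prime, so a full check is still needed afterwards.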
Nice work, Gary! And if you want to remove the overhead and speed up calculations greatly, you can switch from Python to Golang, for example, and have microservices do the work
Thanks I learned something new today!
I would much recommend using the Raspberry Pi Compute Module 3 instead of the regular boards. It has the same specs as the regular Pi but is much more suited for this sort of thing: the modules are smaller, so you can pack them more densely and need fewer wires, and they can be ordered in bulk and cheaper. Unfortunately though, you'd need to make your own custom board to connect them all together.
Just to be clear, you would recommend the modules to people who have the skills, knowledge and money to make their "own custom board to connect them all together."
@@GaryExplains that's correct. The assumption is that someone willing to build a cluster of raspberry pis will also be willing to make a custom board for his custom application
@@9a3eedi Well that is a bit of a bad assumption. I just built a cluster and I am NOT willing/able to make custom boards.
@@GaryExplains hmm... Makes me wonder if there's something available off the shelf..
@@GaryExplains just googled a bit and found something sold for 259 dollars that supports 5 cm3s, connected with switched GbE. Very cool :D
Relatively new Gen X'r eating up your content like a kid on Halloween.
Thank you. 👊
I haven't finished watching this one yet. But I've got an ambitious build that could use this idea with what I hope is all Pi 5s, with the NVMe HATs or bases? I'm looking at setting up my current RPi 5 using your content and keep getting distracted.
Like just before I hit this I came across your three-week-old post about AI clipboard...
Since I'm simple. Before i break my brain.
I want an offline LLM Ai that can, manipulate the training of working alongside another more critical* thinking Ai dev board for speech and facial recognition, while also making the Ai more personalised.
Thank you for the link to github on that last one, simplified finding you.
Reluctant to say anything else other than please don't Stop!
You're extremely informative in a way this old man understands, at least. Ahaa.
Great work Brother! Great work.
May God Bless your Journey
Overclock them and add heatsink and fans to the cluster for extra power.
Wonderful way to explain Gary, thanks a ton.
Wonderful explanation!! Great channel.
Glad you think so!
awesome, never knew anything about super computing, and now i know, thanks.
Dragonfly BSD is an OS designed specifically to handle a cluster like this.
Loren Sims does each node have to match the rest, or can any machine get added to the array?
MPI - messaging service used in supercomputer cluster
Scatter n gather
(lesser data, high latency)
(more data, less latency as compared to single core)
Public n private key- security
This just shows the purpose of Raspberry Pi, a learning tool. I never thought they would bring it to Server/Clusters. It's a great teaching tool from basic programming to now supercomputers. Raspberry Pi may not be a world record PC nor a Supercomputer with Tera flops in processing power. But it has proven to be a super teaching device that's caught a lot of interest world wide for those who want to jump in and learn.
And a great gaming emulator! d^_^b
Question! Can a Raspberry Pi supercomputer be used with Blender for faster renderings? And can the Raspberry Pis be configured with GPUs?
That is a neat idea of making a Super Computer from a group of small computers like the ones you mentioned. I wonder if IBM has thought of that since they are into the Super Computer business.
That was done with the PS3.
Yup, that was a project that the US Military did because the PS3s had a lot of cores in a relatively compact form. It made being able to source machines from around the world really easy.
Gary, I was using two vastly different computers. Computer 1 System 76 Meerkat with an i5 7260u processor. That computer alone could do the prime number program in about 4 seconds. 2nd computer, celeron based ChromeBox but in developer mode and also running Linux. Both systems using PopOS! 19.04 from system 76. When I add the celeron computer into the mix it takes MUCH longer. Like 1 min 30. I thought that even though the second computer was much less powerful, it could at least add some help to the calculations.
I was somewhat surprised by the outcome.
I think with MPI Scatter and Gather the controlling node waits for ALL the computers to finish their calculations, so slower computers in the cluster will slow everything down. You can write the program differently so it doesn't have that problem.
@@GaryExplains Thanks! I'll look into that!
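To put rough numbers on the point about slow nodes, here is a small Python sketch that compares an equal split (scatter/gather style) with a work queue where each node grabs the next small chunk when it becomes free. It is only a model under stated assumptions: the speeds are guessed from the times quoted above (the fast machine finishes the whole job alone in 4 seconds, and the slow one is assumed to need 90 seconds for its half), communication costs are ignored, and the function names are hypothetical.

```python
def static_split_time(total_work, speeds):
    """Equal share per node; the run ends when the slowest node finishes."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def work_queue_time(total_work, speeds, chunks=1000):
    """Small chunks handed out one at a time to whichever node is free first."""
    chunk = total_work / chunks
    busy_until = [0.0] * len(speeds)
    for _ in range(chunks):
        i = min(range(len(speeds)), key=lambda k: busy_until[k])
        busy_until[i] += chunk / speeds[i]
    return max(busy_until)


# Assumed speeds in "jobs per second": fast node 1/4, slow node 1/180.
speeds = [1 / 4, 1 / 180]
static = static_split_time(1.0, speeds)
queued = work_queue_time(1.0, speeds)
```

With these assumed speeds the equal split comes out at about 90 seconds, while the work queue finishes in roughly 4 seconds, because the slow node only ever takes as much work as it can keep up with.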
Why did you not use the Lite version of Raspbian? You lost no end of MIPS to running the GUI.
And would disabling the Desktop interface from raspi-config increase the performance?
@@0lAlex0 heck yes. You'd have gained about a 20% performance increase. The GUI is a beast.
But the demo video would suck
Because he doesn't actually want to calculate prime numbers as efficiently as possible, he just wants to make a video about cluster nodes lol.
@@YouArentValid Yeah, this struck me as one of those "I need content for my channel" videos. I'll bet he took it from a howto found on the Interwebs too? Plus, it's been done to death already.
Super cool video - would be great to see more examples!
Hi, I have created a cluster programm, that runs before I even press
This was such an interesting video. One of the best I've seen in a long while!!
Wow, thank you!
Looks nice but would I be able to play Pong on it?
Nah we won't prob get that for another 5-7 years :(
@@twistednickster2653 Not sure if you're joking or not, but Pong can be played in a web browser if you're lazy, or if you have some time, you could get a retro console emulator and set up a Pong ROM on it. Or you could program one with Python.
@@mjs2016 i am joking lol dont worry i wont woosh you
You should try this with the rockpro64 and connect each board together through the pcie to get really low overhead. Might even get to write your own kernel and mpi layer!
PCIe doesn't act super friendly all the time. It's probably better using InfiniBand cards
@@NoorquackerInd Well if you're going to use a comm card, it would be a lot easier to put a ten Gb network card in the pcie slot. I was just thinking to keep costs down, you could try to run the messages over just the pcie lanes.
Can a cluster with a NAS be built?
What is this used for?
Hi sir, is it important to use same models of pi to make a cluster, or we can use different models to make one nice powerful computer. i have two pi3 and one pi4.. thanks.
You can use different models, but beware that if you send the same load to each node then the slower nodes will take longer to complete.
Could this be used to compile c code? So that it will take less time
Yes. Look into a tool like distcc.
@@GaryExplains thanks, but the situation i got is this. when i try to install pyqt5 through pip it starts to compile it with gcc. So i wonder if gcc would work or i should change the compiler inside that pip script.
Edit: i know that python3-pyqt5 packages available through apt but they are outdated.
Thanx a lot Gary. merry christmas.
Nice. Very clearly explained and demo'd
Could I make a supercomputer by connecting old laptops/desktops/wiis/ds/dsi/cell phones/smartphones together?
Linux has a kernel module that handles the nodes so you don't have to write complicated programs.
Which module are you referring to?
I'm also interested
From what I know, it is in Arch Linux, maybe.
Where is the link to the multi programming video?
It is very easy to find either using YouTube's search box or in my programming playlist. Anyway, here is the link ua-cam.com/video/Tn0u-IIBmtc/v-deo.html
Interesting video. However, the Cortex A53 used in the Raspberry Pi is not the fastest. It would have been interesting to know how this compares in terms of performance to a regular computer, e.g. how many RPi boards you need to get similar performance.
This video might have some of the answers you are looking for: ua-cam.com/video/KLz8gC235i8/v-deo.html
As a follow-up, I ran it on an AMD FX8320 desktop PC (with Linux) using 4 cores and it took 3.3 seconds! Interestingly, when I ran it on all 8 cores it took twice as long (6.6 seconds). I am guessing that this is because the FX8320 isn't a true octa-core, but rather a quad core with this pseudo hyper-threading. The max clock speed of the FX8320 is 3.5GHz.
@@GaryExplains Interesting test, thanks! So with a PC you get the test done in 3.3 seconds. With one RPi board (4 cores) it takes 33 seconds, and with 4 boards (16 cores) it takes 16 seconds. The parallelization here doesn't scale that well, so some more tweaking may be required, otherwise it's likely to require a LOT of RPi to get down to 3.3 seconds.
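The scaling numbers quoted above (33 seconds on one board, 16 seconds on four) can be turned into a rough estimate with a few lines of Python. This is a sketch that assumes Amdahl's law describes the run, which nobody in the thread claims; the function name is hypothetical and one board is treated as the baseline.

```python
def serial_fraction(t_base, t_new, factor):
    """Solve Amdahl's law, t_new / t_base = s + (1 - s) / factor, for s."""
    ratio = t_new / t_base
    return (ratio - 1 / factor) / (1 - 1 / factor)


t_one_board = 33.0    # seconds on 1 board (4 cores), from the comment
t_four_boards = 16.0  # seconds on 4 boards (16 cores), from the comment

speedup = t_one_board / t_four_boards   # gain from 4x the hardware
efficiency = speedup / 4                # fraction of the ideal 4x gain
s = serial_fraction(t_one_board, t_four_boards, 4)
best_case = t_one_board * s             # time limit with unlimited boards
```

Under that assumption the speedup is about 2.06x (roughly 52% efficiency), the part that does not parallelize is around 31% of the single-board run, and the best possible time with any number of boards is about 10 seconds. In other words, without the tweaking mentioned above, adding boards alone would never reach 3.3 seconds.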
@@GaryExplains I already watched this video a while ago. But I'm not aware of any Raspberry Pi using a Cortex A72 ;-)
The RockPro64 uses the Cortex-A72 and I hope to finish my review next week.
I just desire to have one pi for audio, one for video, and one to run them, to turn three pi's into a great desktop computer.
That would be pointless, because unless you're doing intense audio processing, there would be no point in having a dedicated audio Pi, and just one Pi is not enough to do much video editing on, and at that point what would your third be used for if not everything else?
wow great video dude!
Gary, this video was informative :) Thank you
PS... if I dare to suggest a 'petit' correction to the video... in the demo, show htop from the 4 Pi(s)
and don't forget to set each htop to sort processes by CPU load...
Can it run crysis now?
I wonder how many Pi 4s it would take to produce the same number of flops you get from the minimum baseline Cray super computer set up. Does that setup you have produce the amount of calculations that can be performed by a current gaming laptop? It might be interesting to know how many Pis it would take to produce the same computational power as a gaming laptop and what the difference in price point would be. Thanks for the interesting demonstration.
Could you run this kind of cluster with multiple different OSs and machine types? For example, I have a few Pis and also some old office desktops. Could I simply run them together?
Isn't this similar to how computer systems in server rooms and on-premise datacenters inside companies have been operating all along? How is it any different from say Yahoo provisioning mailing servers etc? Can any datacenter with lots of servers be called a "supercomputer"?
Cluster configuration is a complex topic, one that I don't go into in this video. There are dozens of ways to configure clusters. The short answer to your question is "no, this isn't how computer systems in server rooms have been operating all along." The key point in this video is the use of the MPI library to run the prime number app, not the topology or the provisioning of the cluster itself.
@@GaryExplains thanks for the clarification Gary......(so cool that you replied :))
Wow.. I'm an idiot and you explained this so well that i think i could try to do this.
is there any up to date links on how to do this?
What types of links? I think the information here is still current and correct.
But... can it run crysis?
Only in 480p
@@simonralph7720 medium settings?
Maybe, but then you will have a burnt pi.
Nothing can run crysis!
i HAVE A $3000 DESKTOP AND IT STILL WONT RUN SMOOTH ON HIGHEST SETTINGS. I'm beginning to believe nothing will run crysis on highest settings. lol
Great explanation!
Nicely done. Thanks!
Thanks for the info. I need to study more about clustering Pis. For example, begin by clustering 2, then 4, then 8... Oracle recently built a cluster of 1060 Pis.
What are some use cases for a super computer of this small of a caliber?
(I am a bit new to the tech scene so if this is a dumb question, that is why)
🖐🏻 question: In the super computer setup - the other prime numbers are missing when in a cluster setup. Wondering where are those results? I am thinking why the other nodes didn’t return the values?
They do return the values.
4:14
On serious note. Love the channel. Love the video. Very informative. Thank you!
Great job. Thanks for the video.
Cost of setup?
Electricity cost?
Can this mini super computer be used for AI, ML, DL research?
Making a Pi cluster is cool, but common. Even with desktop clusters. The problem is - once a home user makes one.... then what? What can actually be done with it, that doesn't require writing code (other than playing with light blinking sequences).
(Pi calculation, prime number generation, and benchmarks are basically the same as making the lights dance in sequence. Just short term novelties, not actually a use.)
I have three HP DL380 G6 machines, 32GB each, each connected by four Gb Ethernet to a DLink switch. They run OpenFOAM cfd software. Each machine puts out 300W of heat when running flat out. So a simulation costs the same as a 1kwh electric fire and I have to make sure I have decent air flow in the house, else it becomes unbearably hot !
That sounds really cool (though you just said it was hot). When you say "simulation", what are you actually doing with it?
@@usquanigo I run air flow studies to see what effects changing the internals of electronics units has on where the air flows and whether the flow is faster / slower. For example, a friend suggested, out of curiosity, moving a series of ventilation holes which separate two electronics compartments, to see what effect that would have. Turns out this was a much better solution than that which went into production. Now his idea is causing a revision 2 of the final metal work.
@@BobBeatski71 That is seriously cool. But also sounds like work (business). Not home-gamer (as AvE would phrase it).
My idea/wish is to make my own low budget 4K editing mini PC or "laptop" using a portable monitor (running Windows 10 64-bit, using Premiere Pro or DaVinci Resolve). Is it possible to make it with a Raspberry Pi 4 cluster (8GB RAM)? How about the graphics? Can I connect a graphics card to it? Can I combine a Raspberry Pi 4 with an NVIDIA Jetson? Is what I want possible??? If you give me some help I will really try it! I am not an expert but I had some experience in the past (programming Atmel microcontrollers and FPGAs, making PCBs and building PCs since the early 90's...)
now the real question is, Can you play games on it? can it play the benchmark games?
On my old Amiga I did a lot of fractal scenery animation and it took ages, this would be a really good demo of connecting more computers thru a slow bandwidth link. And raytracing also, Lightwave was really great as renderfarms go, one master/server (with GUI) then just send the resources to a bare minimum program that actually calculates the different images and sends it to the master/slave and it is pretty good on resource management as it just hands the nodes the images that is not done (hard to explain) but it really meant that you could connect pretty much anything... a slow computer, a fast computer and so on... it used everything at 100% all the time.
Does POV-Ray exist on the Pi? If it does, that might be a good start for a great demo of connecting Pis :-)
If I wanted to replicate this today, which Raspberry Pis would you recommend, Raspberry Pi 3 or 4?
i would love to see people make raspberry pi supported or just games for the os and then see people make a pi super computer to get like 500 fps or even a video editor supported for the os
That is super cool! (Pun not intended.) It's one of those things that I'd want to do just out of the fun of making it, regardless of how actually useful it is. Of course it is useful for many things.
Can you do a video of one running folding@home?
enjoyed this video well presented
Cool video! One question, though: can the Raspberry Pis be connected via GPIOs and make them behave like the cluster in the video? Will it be more effective that way?
HI thanks! Beautiful video.
I tried your program on 2 RasPi 3's.
But it still says running on 2 cores instead of 8
When I use a single RasPi it runs on one core instead of 4. Any idea how to fix this?
Gary:
First of all 👍 on the video !
Why not explain how the supercomputer manage the task at hand and explain how it can automatically spread the work across all nodes to use the computing power more efficiently without having to do it manually as you did with the input file with the different node addresses ???
Obviously there are different management layers that can be built on top of clusters. What particular software were you thinking about when you wrote, "it can automatically spread the work across all nodes to use the computing power more efficiently without having to do it manually"?
@@GaryExplains
Gary, I have worked with Oracle EXADATA, and to make handling jobs more efficiently it has to have management layers built in the software and hardware, and it has to be configured based upon the particular installation. I was just watching your video and wondering why a manual input versus a job manager that divides the job amongst the nodes automatically ????
Basically, for a 10 minute video a manual approach is the simplest and the easiest to explain.
@@GaryExplains ..... I understand, not to be a troublesome nerd, but, just thought it would be interesting 👍🤔
@@cliffallen2663 Also remember Oracle Exadata is running a database, and a database needs a lot more guarantees to make sure transactions remain valid and no data is corrupted or lost. Even something as simple as making sure an auto-incrementing number for an insert doesn't overlap between nodes has to be coordinated or centralized. Coordination or centralization adds a lot of communication overhead, and thus lag, to anything you are doing, which is why you'd want specialized hardware to make sure communication is fast and efficient, without spikes in latency.
An MPI cluster usually just does calculations I think, if data is lost, just re-compute it or even just let multiple nodes run the same calculation for redundancy (don't know if MPI supports that, it's an example of what can be done).
I am running Motioneye OS on a Raspberry Pi4 4Gb, this connects to 4 x Wyze Cam V2. The problem i have is the videos are not smooth and the frame rate is low. If i cluster some Raspberry Pi4 would this be possible and make performance better, also how could i do this, Thanks
Thanks for the nice explanation 😁
amazing explanations. Great work, thanks
I would use UDP to send packets to one of the hosts, then that host sends to the next host, and so on, and then the final host with all that information processes it and sends it to the main computer.
Just curious how would you use this for video editing, is it possible? Would it be worth the time, money and effort?
Processing-wise yes, because you would end up with around 3.6 GHz of processing power for 130 dollars, which is a good deal.
Idk whether it would be effective though because of the low power gpu the pi has
There are new SBC micro computers which have a RYZEN 8 core processor called the UDOO BOLT V8 get eight and you could have 64 cores.
Thank you Gary.
This was fantastic!
Imagine if you had hundreds of Raspberry Pi's and then used this method to create the ultimate super Raspberry Pi
Are they connected over ethernet?
This is one of the best tech channels on YouTube.
Where's the link you promised to the multithreading video?
Sorry I forgot to include it in the description: ua-cam.com/video/Tn0u-IIBmtc/v-deo.html
What is the power consumption comparison for something like this vs Intel chip based multicore multiprocessor?
Depends very much on which Intel chip.
It would be safe to say lower, I should think. As it would likely be for any active cooling solution required. The real question though: does it compare on the speed/horsepower scale. If so, it’s the clear winner from the cost perspective, especially now the RasPi4 has dropped.
I’d expect to see people using these as coprocessors to their main router in short order.