More bottlenecks than a Coca Cola factory.
That was really good
I already stole this 😂
💀
💀💀
True, and also the caffeine consumption is the same 😅
60 HDDs in RAID 0 is the definition of "all gas no brakes"
Glorious, glorious RAID 0 🤩
glass cannon lol
Who needs RAID? Why aren't you using LVM and just mounting it with direct I/O?
And no steering
@@monad_tcp Who needs LVM. Just use ZFS.
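For the curious, a minimal sketch of what the ZFS route could look like (pool name and device names are invented here; assumes OpenZFS is installed):

```bash
# Everything in one big stripe, RAID 0-style, zero redundancy (like the video):
sudo zpool create -o ashift=12 petapool /dev/sd{a..z} /dev/sda{a..h}

# Saner: raidz2 groups, so any two drives per group can die without data loss:
sudo zpool create -o ashift=12 petapool \
  raidz2 /dev/sd{a..j} \
  raidz2 /dev/sd{k..t}
zpool status petapool
```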
A petabyte and a raspberry pi, the crossover we didn’t know we needed. This is sick.
Except it's so slow it's like writing to one of those fake Chinese 2TB USB keys! Yes I fell for that scam on eBay.
@@plica06 Why would you want a 2TB thumb drive/flash drive instead of an external HDD? They wear out more quickly due to their storage technology. Heck, even utilities like #Ventoy now support HDDs, which are more reliable, and you can carry more OSes...
literally Pi-tabyte lol
@@adrianteri Man, it is a joke, a statement and also a confession by plica06.
A joke like Jeff's project. Which doesn't matter, because it is not about if it is useful or practical, but: Showing that it CAN BE DONE!
BTW Great job Jeff and respect to plica06!:)
P.S.: I get it that you are joking, too: "They wear more quickly ..." NO! They don't wear, because they aren't real! Hehehehe
@@adrianteri On the other hand, when I take your comment seriously ... I can't take you seriously :) Portable media, in 2022, in the 2TB range ..... and you say spinning media? That must be some back-to-the-future joke I do not understand.
Anyway, misunderstanding plica06's grotesque comparison and spinning it (a straw man!) into USB sticks[2] versus enterprise-level hard disks (as in the video; consumer ones, it doesn't matter) is like comparing apples to hard drives. Enterprise-level HDDs have to be compared to enterprise-level SSDs. The latter beat the former on every point in that sector, including price[1], reliability, data safety, and energy efficiency.
[1] The purchase price is simply irrelevant compared to the maintenance costs, since the former is already planned into the product and can simply be written off after the service life of the medium.
[2] Actually, 2TB (etc.) NVMe "USB sticks" (miniature adapters) exist. Try to beat their MTBF (vs. spinning media that IS carried around) ... ;)
Pushing the limits of "hackey" tech is what most hardware and software engineers should be shooting for. Very well done. Kudos to 45 Drives!
As someone who is about to endeavor on a new NAS project, this was a fun watch! Thanks!
just keep upgrading the drives, I'm sure there'll be 1 PB 3.5" drives in like 20 years 🤪
the PetaPi seems like a winner, I can only imagine the folks at 45 drives watching this with a mix of awe and horror.
PetaPite
Just like watching two ships colliding. 😎🚢
It IS a petabyte file server... but probably SHOULDN'T go with the name "PetaFile".
@@BradCozine PETAFile, = a database of People Eating Tasty Animals!
@@BradCozine petaPile?
I had this colleague who told this story:
I had a boss who did not understand anything about computers or electronics. Whenever I was troubleshooting something he would come and watch over my shoulder and comment, "Have you checked the power?" This was very annoying, as he did not understand anything. What made it doubly annoying was that his advice was spot on so many times.
Check the power!!
damn.. i feel that
Did you check the power?
no
why
@@miriko1297 idk, just lazy i guess
Do I need a petabyte of storage? No. Would I mount one in my rack? Yes.
30 years ago: Do I need a Gigabyte of storage? No.
I absolutely need it to store my GPT-3 training data. Turns out having 16 GPUs wasn't enough, but I ran out of space even before running out of compute time. It's probably more on the scale of 1PB + 8192 GPUs.
10 years from now, Do I need a Petabyte, yes.
do i really need 128 gb of storage on my mac?
"your mac is almost out of storage"
wait- whY IS MY SYSTEMS FOLDER 70GB
whY does microsoft word take up 2gb?
@@kevinbissinger I still remember buying my first gigabyte hard drive. Kept a grad student in school.
This is a bonkers crazy setup. It's so mad I just had to watch, 100%. Kudos to you for getting it to work at all. Your persistence is inspiring.
Jeff, I'm catching up with thx, since I've admired your ability for a while. I am an old geek and you do magic. Reminds me of my S-100 days with CromixOS.
1 month ago and no recognition? Especially for a $50 USD donation? Sadge.
@@spoils8179 Thanks for the sentiment, Aiden, but no problem. Hope he's doing well, and you as well for that matter. 👍
How about one controller to one pi, then stripe 4 pis together, that should increase the throughput and scale down the issue to a more managable chunk, and still a pi project.
This is probably the most reasonable way to do it, and use Ceph (or another network-based filesystem)... and indeed I will be testing that out soon. Still bottlenecked but probably more reliable and would not run into as many errors on the PCIe bus!
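A rough sketch of that per-Pi Ceph layout (hostnames, device names, and drive counts here are assumptions for illustration, not what was actually tested):

```bash
# On each Pi, after joining it to the cluster, turn its local drives into OSDs:
for dev in /dev/sd{a..o}; do           # e.g. 15 drives per Pi, 4 Pis = 60 OSDs
  sudo ceph-volume lvm create --data "$dev"
done

# A client then mounts the aggregate pool over the network, e.g. via CephFS:
sudo mount -t ceph pi-mon1:6789:/ /mnt/petapool -o name=admin,secret=<key>
```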
I think you will need more than 4 Raspberry Pis unless the pool is for backup or archiving purposes only.
Even a low-to-mid end Intel or AMD system would make more sense.. though then the project will lose its “Piliness”. 😂
@@qazwsx000xswzaq With these network speeds you would need 1 Pi per disk to max out the disks 🤣
@@paulz1780 We can then expose each drive as an iSCSI target and aggregate them at a server over Ethernet. And voilà, we have a not-so-poor man's SAN haha. It reminds me of those shiny new network-addressable NVMe-oF SSDs btw.
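For anyone who wants to try the not-so-poor man's SAN, a hedged sketch using targetcli and open-iscsi (IQNs and device names invented for illustration):

```bash
# On each Pi: export a raw drive as an iSCSI block target
sudo targetcli /backstores/block create name=exos01 dev=/dev/sda
sudo targetcli /iscsi create iqn.2022-06.local.pi1:exos01
sudo targetcli /iscsi/iqn.2022-06.local.pi1:exos01/tpg1/luns \
  create /backstores/block/exos01

# On the aggregating server: discover all targets, log in, then stripe
# the resulting /dev/sdX devices with mdadm or LVM as usual
sudo iscsiadm -m discovery -t sendtargets -p pi1.local
sudo iscsiadm -m node --login
```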
Gluster might be a good option too.
"I was going to unbox this on camera but Fedex already did"
As someone who's had multiple expensive items destroyed by Fedex and actively avoids doing business with companies that ship with them, I felt that...
The reason for FedEx packages being more damaged than USPS packages actually isn't FedEx's fault. They accept heavier packages than USPS, so that means if a shipper doesn't pack their package correctly, when that 150lb box of farm tools slams into your 6lb box of plastic in the sorting machine, your package gets crunched. Meanwhile USPS only goes up to 50(?) lbs, so your box just doesn't get slammed as hard.
TL;DR: your packages get damaged with FedEx because the shipper didn't pack them right. Source: I ship around 5 packages a day with all 3 major US carriers and rarely have an issue, since I actually package my shit
@@philiam0420 That doesn't mean Fedex isn't partially at fault. Especially because on the rare occasion that I do get a damaged shipment from UPS, they leave a note with a number to call offering to pay for a replacement if the item is broken. Fedex just delivers my $300 PC case looking like a truck backed over it and acts like it's no problem. And that's without even mentioning the countless packages that never showed up at all, or that got delivered to the wrong address and sent me on a trip across town trying to find it.
Yep. I had a laptop battery crunched by FedEx. It's a wonder that they can stay in business this way.
You just didn't package correctly, how is that FedEx's fault?
In 2021 I had 5 items fulfilled through FedEx. 1 never arrived, 2 arrived damaged, 1 arrived late, 1 was early and undamaged.
"20% delivery reliability!" 😵💫
I think I heard Seagate in the US having a heart attack watching drives being juggled.
I live in Taiwan.
Even though the idea is crazy, Jeff knows his craft. The tips at time 09:00 on how to select hard drives for the task are priceless!
Well, pairing €20,000-€24,000 worth of drives with a 100-dollar Raspberry Pi seemed like a solid plan to start with. Having it run a 1PB RAID 0 config while hosting that storage seems like a task made for this little ARM processor 😂 Nice to see you try out such extreme things with the Raspberry 😀
1942: "We need to figure out a solution to digitally store dozens of bytes at a time. Vacuum tubes, maybe? This is going to cost us millions, but it will be worth it to finally have accurate artillery range tables!"
2022: "I'm going to hook up this petabyte of data storage to this cheap single board computer!"
1969: "We need to figure out how to use our cutting-edge $1.5 million, 4 KB RAM, 32 KB storage computer to bring people to the Moon."
2025: launching a rocket with a single-board computer...
Wait until the average person won't click on a video below 16K 120fps. There will always be a way to fill it 😅 though from what I remember, it's gotten better over time.
@@vaisakh_km Pis and Arduinos are already used for avionics on model rockets, and cubesats can have total build costs of like $50,000. It's amazing how far the industry has come and is going
I hope LTT sees this. Great stuff as always, Jeff!
I wondered how long it would take for LTT to be mentioned in the vid
I always like to think that every time Jeff publishes a new video, the raspberry pi design team feels a disturbance in the force.
As they should, it's been 3 years, where's the raspberry pi 5?
I find your lack of faith disturbing 😅
@@jstan5802 you're gonna be happy
@@jstan5802 Here.
This showed up in my recommendations. I have no clue what happened here and I know nothing about data management or IT in general. But I watched it to the end, not realizing that this vid is 22 minutes long 😅 Good content, 9/10
Fascinating! A lot of work! The issue it seems is having to tether to the hard-disk through a band shared with other drives, rather than universal. There is a technique where using data-points written to each hard-drive mathematical computations can rewrite and measure the data for inference machiningly- without even using the shared io-bus anymore! You’d see a slowdown in computation monologue but all speed ahead on as many drives as you want!
Woah.
woah
you get seal of approval
Woah.
Woah.
Woah.
woahhhhhhhhhh
Linus would be proud
Torvalds too. Nice example of the Linux versatility
electroboom too.
I was waiting for the water bottle :D
It's a good example of why the LTT petabyte project is so expensive
@@rvmiv_
First of all, and most importantly, the speed: those were Gen 4 NVMe SSDs with extreme speeds. Second of all, this is just a storage rig, but what LTT built is a server - I mean, have you seen how they run NASA simulations?
And lastly, their drives are extremely reliable
I saw you posted some on the homelab subreddit last week, THIS is what you were hiding from us?? What a fun idea. Can't wait to see your next project!
Homelabbers unite!
In a few years I can imagine an image comparing this to a micro sd card and a caption saying “this used to be a petabyte in 2022”
Jeff you never cease to deliver, you're an absolute legend, great video!!
Glad to see Red Shirt Jeff back. I was wondering what happened to him.
Geez, your projects are soooo extreme and cutting edge. Your troubleshooting processes are very informative and helpful to your viewers, who otherwise wouldn't have a clue where to look. We never see that on "How to" setup videos, where everything just automagically works.
BTW, I recently used Styrofoam to separate 2 prototype boards while testing them out. It got hotter than expected and the foam sagged, allowing the power rail of one board to touch the other board, which fried as soon as I powered up the following morning.
Don't try this at home, folks. Cardboard insulation good. Styrofoam bad.
Thanks for sharing Jeff. I'm fully confident that you'll get your bandwidth up on the 60 HDD RAID. Looks to me like you're already nearly there.
Instead of installing it inside a server rack, perhaps a locked cage would be a better idea to protect it from Red Shirt Jeff. 😎
Heh, cardboard insulation 'better', but I now have it on a 3D printed box that's a little more secure too.
@@JeffGeerling Good point. 👍
Now, my 2nd attempt is safely mounted inside a custom acrylic case, too. One less thing to worry about, amongst the multitude of other potential mistakes. Live & learn, eh?
Just gotta keep putting one foot in front of the other until the final goal is achieved, right?
The hardware for the Teensy Laser Synth waveform module w/ILDA DAC is complete. Only need to flesh out the code to full functionality before moving on to the custom MIDI controller.
BR 😎
Hey Jeff, I’ve been using your ansible roles for almost 10 years. Love seeing you around HN and the other traps, great to see your YouTube channel is doing so well. Very cool projects!
Jeff, you have carved yourself a niche channel in an overcrowded tech community, taking the Pi to new heights in every video. Keep it up. Loving every video. I'm left astounded on the Pi's capabilities and untapped potential.
My God! 60 HDDs at 20TB each - I never dreamed of such a volume
Can you Frankenstein GameCube or cut-down Wii motherboards together into one display, all working together?
You tried all the things I wanted to see. You know your audience!
The last thing I wanted to try (even had it in the final edit but cut it for time) was 4x hardware RAID cards... but I only have one on hand. I was thinking of setting up 4 hardware RAID 6 arrays, then uniting them on the Pi as a RAID 0 array and seeing if that performed better since individual drives would all go through one HW raid card, and that would also give redundancy.
(And who said hardware RAID is dead? You still need it if your computer performs like one from 2010!).
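In software terms, that layering would look something like the sketch below (the /dev names for the four hardware arrays are hypothetical):

```bash
# Each RAID card exposes its RAID 6 array as one block device; mdadm then
# stripes the four arrays together - effectively RAID 60:
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/sdw /dev/sdx /dev/sdy /dev/sdz
sudo mkfs.xfs /dev/md0
```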
@@JeffGeerling Yeah, exactly, who said hardware RAID is dead? Clearly unrelated: have you ever thought of doing a collaboration with Wendell? He and Red Shirt Jeff would surely push themselves to insanity :D Seriously speaking, you two seem to follow similar exploratory courses, though, of course, he's much less Pi-centric.
When I first saw the title I figured this would be one of your massive multiprocessor/multi-Pi projects combined with a massive amount of storage. Something closer to 20 TB per Raspberry Pi. At 60 Drives and 60 Raspberry Pi's, that would still be way beyond any normal homebrew project.
Absolutely fascinating and completely, totally barmy! What an immense amount of work you obviously put in to this - research, sweet talking 45Drives, research, RAID solutions, research, talking to 45Drives techs, research, etc.
I am *extremely* impressed, both by you AND by 45Drives for their courage!
And my abiding thought? "Chassis" is pronounced "shassey", not "chassey"! Pfft!
So I looked up chassis in Cambridge Dictionary to back my obviously accurate opinion and ... well, blow me, North Americans really do say "chassey"!
Every day is a learning day! Pedantry isn't good
lol we North Americans are weirdos. Or maybe you are... I guess it's a matter of perspective, neighbour!
@@JeffGeerling I have to picture you in dungarees, a battered straw hat, and with a long stalk of grass hanging from your mouth when you call me neighbour! Hock-diggardy, or something like that
I think Arthur C Clarke hypothesized that this would be enough to store a few people's minds into it
There's something about seeing you go from putting the last drive in the rack to immediately plugging in the dinky micro SD card that makes me giggle 😃
My NAS is a Pi 4 with a desoldered USB chip to expose the PCIe bus. It's connected to a PCIe switch (to improve transmission and prevent crashes) and then to a cheap ASMedia SATA controller. The kernel is also patched to force PCIe Gen 1 to prevent crashes (this is probably the same thing that happened to you in the video, btw). PCIe Gen 1 is still faster than gigabit, so it doesn't matter. Over Samba or NFS I can get the full gigabit speed even on large data transfers, so no bottlenecks there. Would I recommend this setup? No. Does it work? Hell yeah (longest uptime was like 90 days or something, and then a power outage killed it; I need to get - or more likely make - a UPS, I know). If anyone wants photos, lmk
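If you want to check what the link actually negotiated before and after a trick like that, standard lspci output shows it (nothing Pi-specific here):

```bash
sudo lspci -vv | grep -E 'LnkCap:|LnkSta:'
# In LnkSta, "Speed 2.5GT/s" means Gen 1 and "5GT/s" means Gen 2;
# compare against the device's LnkCap to spot a downgraded link.
```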
Nice. Coreforge also suggested forcing Gen 1 speeds elsewhere in the comments, so I may need to test that out.
@@JeffGeerling That was my first idea as well. Those switches are used to run gen1 or maybe max gen2, when used for GPU mining. Above that it will throw errors.
Oh my goodness, that juggling of those drives.😳
Every IT engineer has a stack of broken hard drives just for juggling with.
Haha true.
@@wayland7150 yep, pretty much. My old boss when I worked in computer repair used laptop hard drives to level out his microscope. Funny thing is the drives weren't even dead, they were just something like 160GB 5400RPM drives that were more useful for that task than storing data.
Many Pi’s running something like Minio might allow for some interesting single box hardware redundancy. Also might be able to get over the 1Gb limitation since you’d have many pi’s each with their own 1Gb connections.
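A sketch of MinIO's distributed mode across four Pis (hostnames and mount paths are assumptions); every node runs the same command and MinIO erasure-codes objects across all the drives:

```bash
# {1...4} and {1...15} use MinIO's own expansion syntax, not shell braces:
minio server http://pi{1...4}.local/mnt/disk{1...15}
```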
finally a way to store warzone updates, updates will take a shit ton of time cuz its hdds though
"Somehow I convinced 45 Drives to send me the server AND all these hard drives"
The impressive bit to me is that you were able to get hold of a CM4!
Well Jeff, one thing's for sure, if you and others aren't pushing the pi to bleed on the edge, no progress will ever be made in this direction. I'm not sure what kind of useful stuff this direction will yield, but it surely will yield something. Keep on pushing ya madlad!
My hope is the next Pi at least has the PCIe bus bugs sorted so any card will 'just work'. After that, any more bandwidth they could squeeze out would be appreciated.
The CM4 is actually great for many 1 Gbps network use cases-but with a little more bandwidth, it could be great for 2.5 Gbps (or heck, more than that if we're dreaming!).
@@JeffGeerling with a USB 3.something port, it could get almost 5GbE, so not that much of a stretch for faster than 2.5GbE
This is amazing. Good job charming 45drives into sending the case! They must be Red Shirt fans 🤣
Maybe the CABD order is like the firing sequence of a 4-cylinder petrol engine lol 🤔
The amount of power this thing draws could probably be measured more easily in horsepower, so you're not wrong there!
@@falxonPSN the PSU isn’t that big - a horsepower is roughly 750W, so it’s going to be around the 1-2 mark.
@@JasperJanssen fair enough. I can't argue with good pedantry! 🤪
@@JasperJanssen Could run a petabyte server off a lawnmower engine...
@@fohkukohgeki this is the project we need to see!
I am afraid someone addicted to this work is called a petaphile person! 😱 But great, that there are people doing these kind of projects!
I just completed a 24 x 18TB build that's almost 1/2 PB. I bought the drives a few at a time, all Seagate recertified; that seems to be the sweet spot in price for me. I had to upgrade and rebuild things several times. Your video was just like my experience switching OS, FS, etc. I had endless drives/arrays just dropping out, mostly on startup. I tried Fedora, CentOS, openSUSE, SUSE JeOS, and Ubuntu Server; all let me down for one reason or another. I tried them with various shares, RAID arrays, and file systems; plus not all would run my app. I ultimately got it working with Ubuntu workstation and the built-in Ubuntu share - no Samba. I'm not happy with ZFS: I had to kill the swap and add more RAM (which meant a new motherboard) to keep the ZFS cache from crashing.

I solved the dropout problem by putting the drives really close to the host board and using short/expensive data cables that are all the same length. I'm using a very old AMD FX-8300 8-core and 64 gigs of RAM. I found a great last-generation Adaptec 52445 RAID card, new old stock. I had to install 2 power supplies and rewire one to all Molex to get enough amperage on the 5-volt rail; new 1200-watt supplies have plenty of 12V but almost no 5V power. I also upgraded to a 2.5Gbps network card.

The write speed is NOT stellar. With RAID 0 it goes real fast at first, about 450MB/s while filling all that cache, but slows down to about 150MB/s. With a single JBOD drive I only get 130MB/s; two drives at the same time still go 130 each, and I can transfer to 4 drives at once before it bogs down the network and the write speed drops to 70 each, or 280 total. I already got 4 SAS expanders and plan to continue adding drives (and power supplies) up to 2.5 PB. My box is an old IBM 2401 tape drive converted to rack space.

I yell at the YouTube screen, not my computers. That's not true; I also yell at my computer at work (it's Windows).
Be careful with those SATA-to-Molex adapters... they are prone to fires!
Red Shirt Jeff liked this comment.
Peta-Pi go BRRRRR
Im genuinely surprised this worked, very impressive. As for what to do with it, a video on downloading/hosting a local copy of Wikipedia would be pretty cool.
Jeff Geerling: My storage setup registers on the Richter Scale
I tried to experimentally find out what command created the nice coloured output which you used to watch the drives boot up. I tried journalctl and tried to reach the last line. When I had held down Page Down (I think that's what it's called in English) for several minutes and was at line 30,000-ish, I gave up and redirected it to a file. I opened the folder in nemo and saw: 70.5MB of journal messages! xed crashed while trying to open it.
Heh... well I used `dmesg --follow` which returns colored output by default in Debian/Ubuntu's default terminal. Then I used `atop` to monitor the drive activity.
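For anyone following along, those two commands look like this (the grep variant is just one way to narrow the firehose, not necessarily what Jeff ran):

```bash
sudo dmesg --follow                        # live kernel log as drives attach
sudo dmesg --follow | grep -i 'scsi\|sd '  # only the drive attach/detach lines
atop 2                                     # per-disk activity every 2 seconds
```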
@@JeffGeerling
Thanks!
I actually knew about dmesg a year ago, but then forgot about it... I was searching through my entire /bin/ folder (I stopped after a few dozen programs) for the command I had seen before that outputs system messages. Thanks also for the re-introduction!
Jeff - I mourn you missed seizing the opportunity to officially name this The Pi-tabyte Project - a portmanteau teed-up for a long drive, but you duffed it, lol. - great vid, keep 'em coming, I'm having a grand time doing some of your projects. Cheers.
60x USB-connected HDDs... I'm telling you, a missed opportunity :D
Yeah use the maximum possible number of USB hub daisy chaining
Heh, the USB controller would probably just set itself on fire!
One thing I saw you doing when wiring up the NAS was that you connected the Molex adapters to two SATA power connectors on the same line. If the PSU has another line, try connecting to that, as each line can supply only a limited amount of current, and there might be an issue there. Then again, the issue might be anywhere else, but that's a thing you can easily try.
Even before doing a RAID or ZFS test, I'd have run a single-drive test (either a benchmark or a simple linear read/write). Loop that 60 times and see success. Then repeat with two drives in parallel and loop 30 times. Redo with three in parallel 20 times, etc. When you start seeing failures you really will see the cause-effect point. There is technically no reason why this won't work with 60 drives - if you ignore performance. Any bugs that are exposed and can be fixed will simply improve the base user's world. This is an awesome test that pushes the RPi and kernel to the limit. Making this work at that limit helps all of us just running one drive. Plus, the errors you saw, as bad as they were, should somehow restart the drive (without a reboot).
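Something like this could drive that ramp-up (writing to raw devices is destructive - scratch drives only; device names are assumed):

```bash
#!/bin/bash
# Round N: hammer N drives in parallel, then check the kernel log.
drives=(/dev/sd{a..o})          # widen this set each round: 1, 2, 3, ... 60
for d in "${drives[@]}"; do
  sudo dd if=/dev/zero of="$d" bs=1M count=4096 oflag=direct &
done
wait
sudo dmesg | tail -n 20         # any HBA resets or dropouts this round?
```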
When those errors occurred, the HBA reset itself and the drives always came back-at least 15/16 of them!
That was one concern from the Broadcom engineer I spoke with, and the reason he really wanted me to run the latest firmware. Unfortunately due to time constraints I couldn't flash all the cards in a separate PC then bring them back to the Pi and re-test. But I plan on trying that out.
Extra testing has been done-tl;dr the breaking point is 3 cards (or more than 30 direct attached drives). But forcing PCIe Gen 1 speed also fixes the issue. More to come in my next video!
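One way to pin the link speed on a CM4 is the standard PCIe devicetree property, sketched below; the node name comes from the BCM2711 dts and the file path from Raspberry Pi OS, so double-check both against your own image:

```bash
# Decompile the devicetree, add max-link-speed = <1>; to the PCIe node, rebuild:
dtc -I dtb -O dts /boot/bcm2711-rpi-cm4.dtb -o cm4.dts
#   ...edit cm4.dts: inside the pcie@7d500000 node add:  max-link-speed = <0x1>;
dtc -I dts -O dtb cm4.dts -o /boot/bcm2711-rpi-cm4.dtb
```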
It's impressive that this works at all.
It almost hurts to watch the board being removed. I've been wishing for storage like this for decades.
Still a very interesting project.
**CM4:** Hey Jeff! What are we going to do today?
**Jeff:** You will handle 60 enterprise grade hard drives.
**CM4:** Oof
I have smaller versions of those EXOS drives.
>200 drives, over the past 5 years, I’ve had 4 failures. 3 covered under RMA. 1 had just expired.
For every anecdote (usually it's "all my Seagate drives exploded in giant fireballs!"), there's an opposite anecdote. In aggregate, if drives like these were truly failing at the rates some people think, Seagate would not be in business :)
@@JeffGeerling The reason all of this information was popularized in the first place is because of backblaze's reports (back in the day) but if we look at their stats now, year over year, seagate is constant.
EXOS and business drives in general are fine; it's the consumer lines that are more "hit and miss", but even then it's not easy to find a pattern unless you buy hundreds of them. I.e., a brand doesn't just "consistently fail 4x more than another".
Rips out a high-end setup for a Pi! You monster! Though it's a fun project to play with a Pi - maybe in the future with a Pi 14. Hope you enjoy your new Petabyte Server
When you're testing tech and it registers on a seismometer, you've accomplished something.
Jeff doesn't have the word "why" in his vocabulary :)
(Just kidding, things like this are fun, which is "why" enough for me).
45drives marketing team is on a roll
hi
Were they taking a big chance? What if Jeff had proved you could do it all better with a Pi? Hahaha.
@@wayland7150 Not much of a chance. They are well aware of the performance of their product, and the limitations of a Pi.
As a St. Vincent fan, I'm calling it "Pietabyte" whether anybody else does or not.
Dude can fit 0.01% of the internet on this damn thing. Insane.
Hey! 45 Drives is here in my home town!
Next step: 3.14PB on a Pi
How many times did you recompile the kernel?
If lsblk starts to run out of letters and shows drives as "sdaa, sdab, sdac" etc. you know you have a data hoarding problem.
You gotta admire a guy who takes $50k worth of server and... plops a Raspberry Pi in it. Brave and crazy. :)
As an HDD engineer I can say that drives are usually built to compensate for vibration due to certain fan RPMs. Especially if we have a big customer, we'll optimize things for the frequencies of vibration in their trays.
Amazing content as always, keep up the great work!
I'd almost be interested in seeing each raid card assigned to 1 PI and then them clustered together.
Would love to see this with a RISC V processor
Is there a Raspberry PI which fits inside a standard size 3.5" drive enclosure? I don't plan to use a SATA connector, just want the form factor. I was thinking of stacking a 3.5" NAS vertically with a PI stacked on top, so it would need the same form factor as the drives.
Jeff: "It's gonna be a while before S-Tier data hoarders deal with petabytes."
LTT: "Petabyte project is full."
"I grabbed a small piece of cardboard to insulate the boards from each other. At least temporarily."
Yeah. Sure.
I think Red Shirt Jeff was trying to break back into the room.
You can see red shirt Jeff is actually in the room with Jeff at one point in the video.
it’s a shame you didn’t call it “PetaPi”
I did! (Eventually)
They must really trust you to borrow $35k+ worth of disks
s/'borrow'/'juggle' :D
This guy is the only human capable of winning in a debate with Data from star trek.
This is an awesome project. Makes me wanna build a large storage server.
It was only a matter of time before the new storage measurement of reference became the petapite... I'll show myself out :D
Hi - I'm interested in the "working mat" on your table - where can I get one? And thanks for the entertaining clips!
Yet another absurd garage setup. Just love it!
dude that hard drive juggle made me freak out a little
🤹♂
The problem you're facing is that USB 3 connection to the board. The way the Pi interprets all of that data is time-sensitive. Basically, when the USB 3 connection to the board, and then through the kernel, saturates, it only gives the data in the buffer so much time to be read/written, and when that time expires, it basically dumps the packets and then goes back to look for more data. The BIG problem here is that the IRQ signals are ALSO going through that port to the PCIe expansion board. This is why your drives will just disappear after a while.
I ran into the same problem setting up a Pi-driven 70TB NAS at home, and in the end I had to basically set up a cron job to monitor for things like dropped IRQs and whatnot, and if there were any hardware failures, it would force a reboot.
Bottom line is that yes, you can connect a lot of hardware through expansion boards and use USB 3 connections to bridge it all, but that is going to be VERY flaky at best.
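A rough version of that watchdog idea (the error patterns are guesses - tune them to whatever your dmesg actually shows before trusting it to reboot anything):

```bash
#!/bin/bash
# Run from root's crontab, e.g.: */5 * * * * /usr/local/bin/storage-watchdog.sh
# Only look at recent kernel messages so old errors don't retrigger a reboot.
if journalctl -k --since "5 min ago" | \
   grep -qE 'I/O error|link reset|nobody cared'; then
  logger "storage-watchdog: bus errors detected, forcing reboot"
  /sbin/reboot
fi
```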
Integrating a Pi compute module with Ethernet into each hard drive can make scaling easier. It lets each drive connect to a network independently, simplifying data handling in big storage setups.
Can you post what hardware (Pi related) you used in the video? Like the compute module, raid cards and interface boards?
OMG, the original hardware is so beautiful it brought a tear to my eye to see it removed. Given the resources, I would have 2 or 3: one for local redundancy and one for off-site, though given net speeds the off-site one would mostly be pointless. And no, I don't need one; I just want one so bad it feels like I need it :)
Hi there! I work for Infomaniak, and I am managing storage networks. We offer backup services and connected drive (we call it kDrive, it's a bit like google drive, just we protect your data and they are hosted in Switzerland).
Adding 96 Exos 20TB HDDs to my Swift storage cluster is what I do every day. I manage "moderately large" Swift clusters. On them, we add 6 2U servers at a time, each of them holding 16 HDDs (so 6*16 = 96 HDDs). In total, I calculated that I am managing more than 5,000 spinning drives in our OpenStack Swift clusters, which amounts to probably around 100PB (so not just 1PB like you're doing...). Just this week, I added 12 HDD storage nodes, 6 proxies (with 2x SSD each, plus the system drive), and 9x NVMe storage nodes (10 NVMe each, to be used in a Ceph cluster).
Your idea of a Raspberry Pi is fun, but instead of one RPi for all HDDs, I would set up one RPi *PER* HDD, and then it all makes sense, and you may get decent speed.
BTW the casing you bought seems to be of very bad quality compared to what we get from HPE or Lenovo.
I've been running this service for nearly 5 years now, and we never lost a single bit of data... :)
What PCIe Gen (e.g. Gen3) does the CM4 carrier run? I have some experience trying to push FPGAs over those crummy 1x risers, and almost always had to drop to PCIe Gen2 or even Gen1 to maintain signal integrity over even the shortest USB cables.
I'm curious if it would be more stable, at the cost of being even more painfully slow on the upper end.
Edit: Oh wow just saw the follow-up video to this, and sure enough, dropping to gen2 improved stability a bit, and dropping to gen1 seemed to make things solid. Would not surprise me if just about all of it came down to the riser and USB cable carrying PCIe signaling.
PPP is a great acronym, so I stand by Petabyte Pi Project
Did you just use a Type A to Type A USB cable to connect the switch board input to the Pi?
Huh, I know that modern cases have that one stud so you can place your motherboard on it, which makes it easier to align the rest of the screws. But even then, every motherboard I have installed has had 8-9 screw holes; sometimes there is a heatsink or something in the way of having all 9, but 8 is the minimum I have seen. This is only for full ATX though, not ITX/mATX.
Yay! Tiny computers doing huge things!!! 🔥🔥🔥
Jeff another wonderful video. Keep up the great work.
I would love to see a reboot with the Pi5
Jeff,
Did I hear you right in the beginning... you said you got the kernel to identify 16 drives successfully? (Regardless of storage capacity for those drives, I assume.)
Curious why you didn't choose to run ZFS, which is tailor made for these applications!?
Okay, you're a bit crazy if you need a petabyte of storage. You're really crazy if you want to have it backed up in a proper 3-2-1 config. Even LTT doesn't do that.
Also btrfs is either pronounced BetterFS or ButterFS.
Do you happen to have a link to that USB to PCI Express switch board?
I work with a 100-drive server similar in design that uses 8 fans. I am sure that with the Pi's limited activity those 2 fans will suffice, but with the stock internals I doubt the drives will be reliably cooled below TDP. You would need to keep it in a room controlled to