Hardware Raid is Dead and is a Bad Idea in 2022

Level1Techs

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 4 кві 2022
Hot take? Maybe? Maybe not? idk I'm just the editor. ~Editor Autumn
**********************************
Check us out online at the following places:
linktr.ee/level1techs
IMPORTANT Any email lacking “level1techs.com” should be ignored and immediately reported to Queries@level1techs.com.
-------------------------------------------------------------------------------------------------------------
Music:
Intro: "Follow Her"
Other Music:
"Lively" by Zeeky Beats
"Six Season" and "Vital Whales" by Unicorn Heads
Outro: "Earth Bound" by Slynk
Наука та технологія

КОМЕНТАРІ • 1,5 тис.

@Level1Techs 4 місяці тому ⁺¹⁷
Hey! There is a new video on this topic "So if Hardware Raid is Dead... then What?" ua-cam.com/video/Q_JOtEBFHDs/v-deo.html
@__teles__ 2 роки тому ⁺¹⁷⁰
In 40 years of IT, I have experienced two occasions where the RAID controller card went mad and wiped the data. I stopped relying on RAID but backing up bejesusbytes and having a way to restore it without taking days is the real problem. I've recovered from a hardware disaster but the business was out of action for days. The cloud has the advantage that you can blame somebody else.
@Ughmahedhurtz Рік тому ⁺¹⁰
"But storage is so expensive versus just buying our own!" The law of TANSTAAFL is immutable. Trying to explain that to management is frustrating.
@666Tomato666 Рік тому ⁺³³
@@Ughmahedhurtz sure, but when cloud storage costs per year the same as buying the drives yourself (and they'll last way more than a year), the calculation isn't so straight-forward.
@PanduPoluan Рік тому ⁺⁸
@@666Tomato666 Agree. There's always an inflection point where you'll be better served with rolling your own solution.
@harryhall4001 Рік тому ⁺⁵
@@Ughmahedhurtz Or you could just use a modern file system like ZFS with built in RAID and automatic integrity checks.
@nilsfrahm1323 Рік тому ⁺¹
Just a crazy idea: at filesystem level, you can zip or rar the data in 'store' mode (no compression), add recovery info and later it can detect corruption or even recover.
Writing many files in such archive is sequential, works much better on hdds (less seek) and overhead is very small on modern cpus.
But it only works for files that are very rarely written.
@LAWRENCESYSTEMS 2 роки тому ⁺⁶⁶⁶
Great topic and long live ZFS! I have been a bit curious about the claims made GRAID and this video really helped me understand it better, and of course better understand its shortcomings compared to ZFS & BTRFS.
@shadow7037932 2 роки тому ⁺¹²
All hail ZFS!
@Marc.Google 2 роки тому ⁺⁹
Long live the cult 😉
@Gjarllarhorn1 2 роки тому ⁺¹²
I read the video title and immediately though... ZFS!
@YvanDaSilva 2 роки тому ⁺⁶
Long live CEPH & GlusterFS :)
@ewenchan1239 2 роки тому ⁺⁴
LTT also tested and experimented with GRAID already. They also found some of the similar kinds of problems that Wendall has found (and talked about) here.
@DrathVader 2 роки тому ⁺¹⁹⁹
Before even watching the video, I'm guessing it's gonna be Wendell telling me to use ZFS
@williamp6800 7 місяців тому ⁺¹²
This is the way
@derfacecrafter1869 4 місяці тому
Why everyone say, use ZFS
@Real_Tim_S 4 місяці тому
@@derfacecrafter1869 "Why everyone say, use ZFS[?]"
ZFS is "better" than HW RAID as it allows spanning of a volume across multiple physical devices (of any bulk-storage type), but also allows file/file-system based integrity verification completly agnostic of the storage medium's physical make-up. With a dedicated storage system (with a very perfomant modern CPU) and a decent amount of ECC system memory, it shortens the loop between the file system and the redundancy - allowing for de-duplication, compression, and advanced features low-level HW RAID devices were not able to do like copy-on-write (instead of overwrite) and snapshots making backups and recovery MUCH easier. ZFS can use both reported errors from the physical drive(s) AND do patrol scrubs of the volume (both scheduled and forced) including the repair/re-silvering. It goes without saying that the loss of a drive still can behave like a legacy conventional re-silvering of a replacement drive.
ZFS has one shortcoming IMHO, and that is volume resizing is difficult. Once a ZFS volume is created, ADDing additional drives to create more space in a ZFS volume is requires moving the data to a new volume with more space. The solution I use to get around this ZFS shortcoming is GlusterFS using ZFS as a data-store only, not as a native volume. GlusterFS allows for storage-host-to-storage-host redundancy for a volume and allows for taking a ZFS data-store offline without losing data which you can then add back in (larger).
@reaperinsaltbrine5211 4 місяці тому ⁺¹
UHUM LOL :DDDDD ZFS wants AT LEAST 8GB for itself...."Lean" "Efficient" Like HTTP /me barfs
@swissrock1492 4 місяці тому
object store like s3 or swift
@NickF1227 2 роки тому ⁺⁸³³
Linus: Holy sh!t this implementation from Nvidia is FAST. Wendell: This implementation doesn’t prevent data corruption.
;)
@SupremeRuleroftheWorld 2 роки тому ⁺²⁵
linus did not knew that when he made the video
@transcendtient 2 роки тому ⁺²⁰⁵
@@SupremeRuleroftheWorld Why people gotta simp? Its a joke that points out the often surface level overview LTT does of tech and their general incompetence stemming from the barrage of videos they release on a regular basis.
@Siroguh 2 роки тому ⁺⁵⁷
@@transcendtient you can't go to a grocery store and complain they dont sell cars.
@javierortiz82 2 роки тому ⁺¹³⁸
I was just about to comment just that, LTT cares more about selling you water bottles than the actual quality of the review, their approach to reviews is just wrong when it comes to this kind of content they went, brute forced some testing by throwing thousands of dollars in storage on a server and only tested speed, threw two sponsors plus two LTT store plugs in the mix and called it a day.
And god forbid you use an adblocker while watching them.
@CheapBastard1988 2 роки тому ⁺⁵³
@@Siroguh As a European, I expected Walmart to sell cars in North America by now.
@WillCMAG 2 роки тому ⁺⁷¹⁰
"ZFS has Paranoid levels of Paranoia." "My kind of file system." Every time I learn more of about the file system my TrueNAS box runs the more boxes it ticks off and the more I like it.
@dineauxjones 2 роки тому ⁺³³
I like TrueNAS. They made making an enterprise class solution like ZFS stupid easy and accessible.
@thehorse6770 2 роки тому ⁺³⁷
ZFS is not a blanket solution either, you get enough failures in your drive pool, and, unlike with RAID, _all_ data recovery becomes impossible. This video makes a lot of claims about the superiority of ZFS. Yeah, RAID is not a backup solution, but what's the point because in a degraded ZFS disk pool you're even worse off in terms of data recovery than i.e. with RAID1/10 where even with a bad drive (or multiple, depending on the array), you're still able to recover the data! In ZFS, your data is equal to having been scattered into the wind in billions of pieces.
@hoagy_ytfc 2 роки тому ⁺¹¹
We ran ZFS on FreeBSD. It was a massive disaster.
@nobodynoone2500 2 роки тому ⁺¹³
@@hoagy_ytfc How so? I have multiple production servers using that setup.
@lawrencedoliveiro9104 2 роки тому ⁺²⁴
ZFS is to filesystems what Java is to programming languages. It will happily consume all the RAM on your system if you let it.
I think it is best restricted to a dedicated storage server that performs no other function.
@Techieguy93 2 роки тому ⁺³²
I appreciate the detailed explanations in this video! I am working on a storage solution for a small office and had already decided to go with ZFS, and this solidified my reasoning in doing so. It's been a LONG while since I have set up a RAID controller, and as it appears we are moving backwards in functionality (for most solutions). Thanks, Wendell!
@JohnClark-tt2bl 2 роки тому ⁺²⁹
My how things have changed. I remember not long ago that software raid was constantly shit on.
@marcogenovesi8570 2 роки тому ⁺¹⁵
if by "software raid" we mean the motherboard integrated RAID or Windows Storage Spaces, it's 100% justified to keep shitting on it. Software raid on Linux/BSD and/or ZFS has always been where it is at
@cdoublejj 2 роки тому
Wendell's JUST made a video not THAAAAAT long ago shitting on bios software raid.
@TAP7a 2 роки тому ⁺¹
@@marcogenovesi8570 ZFS came from Solaris and illumos didn’t it?
@marcogenovesi8570 2 роки тому ⁺⁶
@@TAP7a Yeah but nobody uses those anymore, thanks to Oracle. 99% of real world usage of ZFS is on BSD and Linux now
@totojejedinecnynick 2 роки тому ⁺⁴³³
Took me months of self-researching and testing to get my head around pitfalls you explained so clearly in a 20 minute video. Bravo. Most people think that their raid5-6 array is safe, until it isn't and data recovery fails. Silent bitrot, scheduled parity checks, recovery, rebuild performance... as an individual I ended up between zfs (on linux) and btrfs. Now I am on a long, very long journey to covert all that mess into a giant ceph deployment which is a different level of headache but it seems to be a solution for availability, correctness, consistency and performance (in that order)
@Anonymous______________ 2 роки тому ⁺³⁹
RAID arrays were never a guarantee of data integrity or protection. They are simply ways of dealing with hardware failures, so that there's no single point of failure. Even worse, all of those fancy file systems such as btrfs or ZFS are not traditionally found in Enterprise class Linux distributions, which precludes them from most environments aside from going out and having to download them through additional repositories. There is nothing worse than updating your machine to find out your ZFS volumes no longer mount because the kernel module was built with older headers and dkms didn't automatically recompile it based on the new kernel version. Those are just some of the issues Linux "storage architects" either ignore or dismiss as just normal woes of using those particular file systems. Issues with bitrot and silent data corruption have always existed, and most people do not discover them until they attempt to perform a certain file operation somewhere down the rabbit hole in a directory that almost always exceeds the 255 character limit of windows lol.
@totojejedinecnynick 2 роки тому ⁺¹⁷
@@Anonymous______________ I don't see anything wrong with new optional features (like parity checks, bitrot protection, deduplication, COW, snapshots) built on top of basic RAID functionality (redundancy in case of dead drive). It is clever engineering leveraging potential which raid brings and using it to something useful because we have extra cpu cycles available anyway. Otherwise your only option is deployment of aforementioned 520b drives. As far as compatibility goes, btrfs is now in mainline kernel, developed by names like FB, SUSE, Oracle... yeah, it will be fine since breaking things in userspace is traditionally a big no-no. Makes me sleep well, just don't use it with raid56 (for now) and you will be fine. That being said, my cold storage is running zfs (or is it zfs on linux? OpenZFS 2.0? I don't even), my notebook is running f2fs...
@NalinKhurb 2 роки тому
@@totojejedinecnynick Maybe the wrong place to ask but I would appreciate your input as I only have a very elementary understanding of RAID systems
What would be the best strategy for preventing bit rot for multiple terabytes of data? Like what RAID level or file systems would be beneficial for this scenario. Thanks
@totojejedinecnynick 2 роки тому ⁺²⁸
@@NalinKhurb Big problem was highlighted by Wendell - bitrot needs to be detected first, figured out second and repaired/corrected third. So you need system that checks parity on every read/access, verifies it, has ability to figure out whether error is in the data itself or just checksum of data, have enough uncorrupted information to correct said error and ideally do all that seamlessly without user ever noticing. But we live in allegedly real world (or matrix, depending on what pill you choose), things don't work like that. Some systems (Nvidia, mdraid) will happily feed you garbage. Some systems will refuse to feed you data if it is corrupted (zfs). Some systems will do their best to figure it out and feed you what they think should be correct data (btrfs, ceph) - but can you really trust them? I am yet to find a truly fully self-healing system, user needs to be involved in regular maintenance and debugging... For home gamers with just TBs of data, the best solution is to just buy a new backup drive, DOA test it, copy your data to it and unplug it. When your primary data gets corrupted, reach out for backup drive :) KISS. Honestly, there is no simple one-size-fits-all solution for storage - every case is different so I am afraid that if you need an array of drives, there is a lot to learn. Ask around on level1forums, lttforums, reddit...
@NalinKhurb 2 роки тому ⁺¹⁰
@@totojejedinecnynick Thanks for the detailed overview. Really appreciate it!
I'll start with ZFS :)
@Robert_DROP_TABLE_students-- 2 роки тому ⁺²⁶
that's two LOTR references in one video. did kreestuh write this script?
@Arexodius Рік тому ⁺²⁰
This is exactly the kind of stuff *_everyone_* should be learning about in basic computer courses!
I mean... not even (some) enterprise solutions are really functionally enterprise grade anymore???
Do we really live in such a fast paced world that the only ones concerned with data integrity are massive data collection maniacs like Google?
@TheSikedSicronic 4 місяці тому ⁺²
This is far from basics lol this is a super advanced topic that many people won’t even begin to understand unless they take a cyber forensic/ computer forensic class where u learn about storage and how it works.
@Arexodius 3 місяці тому
@@TheSikedSicronic I think you missed my point. It's not basic now, because nobody made it basic. If it's important, people should be learning about it. The earlier the better. It's only a matter of making it simple enough to understand, and you cannot convince me that it would be impossible. That's the essence of teaching - making things simpler to understand - and it's been done with many things throughout history. At the very least, people should be made aware of the topic early on and why it matters. That way the ones who are really interested can dive deeper into something they might not have learned about until much later.
@gto11520 3 місяці тому
The future is moving towards SPEED. Data assumed backed up on a incremental and differential level. There is no need for raid controller. And eventually as hardware becomes more powerful the need for ZFS might be behind us. As for the data it will be purely be backed up instantly into the cloud. Basically storage companies are more concern in redundancy in power failure and making more lasting storage devices.
@0blivioniox864 2 роки тому ⁺¹⁹⁸
Fascinating how much verifying the integrity of the actual data on the devices is an afterthought. So much attention is paid to RAID's capability to allow an array to survive the loss of a device that we almost completely forget about consistency.
@c128stuff 2 роки тому ⁺³⁰
Consistency was not something raid was ever supposed to solve, it is merely intended to give you enough redundancy to deal with hardware failures and recover from data corruption.
Sure, enterprise solutions used 520 byte sectors, but its all in the name.. redundant array of *inexpensive* drives. Traditionally the extra checksumming was totally independent of a raid solution being used.
@marcogenovesi8570 2 роки тому ⁺²⁵
@@c128stuff Lol storage vendors redefined the acronym to "independent" drives pretty quickly
@raddysurrname7944 2 роки тому ⁺¹⁵
meanwhile error correcting in RAM is not allowed in most consumer computers because of probably arbitrary decisions.
@c128stuff 2 роки тому ⁺²
@@marcogenovesi8570 Sure, but that does only confirm what I wrote.
I'm sorry to look back a bit further than the early 2000s, but history doesn't start there, and nor does raid.
@marcogenovesi8570 2 роки тому ⁺⁴
@@c128stuff dude wtf? The industry renamed RAID back into the 80s, not "early 2000s"
@Gryfang451 2 роки тому ⁺³³
Those of us who lived in the enterprise with our massive SCSi, then SAS direct attached, then finally FC SANs, iSCSI,etc..the most fear inducing moment was always when you lost power, lost a drive, and started what, at the time would have been an offline rebuild, hoping you wouldn't lose more drives in the process. One time is all it takes to make you shudder as you are hoping the backup tapes actually have your data! I just decommissioned an LTO6 library! One year of keeping it around for retention. My backups are all on Synology RS units. No more hand carrying tapes offsite either. Hyper backup works well, where various other schemes didn't. I now have backups of backups for two locations, and BTRFS goodness. All because a long time ago I learned that the D in RAID actually stood for dependable, and the RAI was for Really AIn't. Neither are tapes unless you like rolling the dice on year old magnetic bits. Don't ever trust your data to be there, and make sure you have multiple, verifiable backups and actuality test recovery once in a while. Because it may suck to be down for a little while, but if you lost months of data, we call that a resume generating event or RGE in the computer janitor business. And you don't want to be that person.
@wendelltron 2 роки тому ⁺⁹
generally I just like the idea of a weekly or monthly "yep I scanned all the file hashes and everything was exactly what it was last time, except for the stuff you changed" seems like a basic feature to me in 2022 but some folks here are spurging out saying you dont need that kind of checking. lol
9 місяців тому ⁺²
"People don't want backup, they want restore"… ;-)
@mistermac56 2 місяці тому
I am an OG as you and I agree with your comments 100 percent. I am a retired IT manager now, but I remember that all too often, the server administrators in my IT shop depended far too much on RAID and only periodic tape backups. I would send out a reminder email every month to them don't depend on RAID and make DAILY backups in off hours. I have to admit that I gave the administrators too much slack and should have demanded daily backups. I had a web server administrator and their server's RAID controller borked all of the RAID 5 disks and our web site was down. And the only tape backup they performed was two weeks old. I didn't single out the administrator in a all hands on deck meeting, but I made it crystal clear if any server administrator didn't make daily tape backups in the future, they would be fired. I also informed them that I would be upgrading the Synology units for more tape storage capabilities.
@Yves_Cools 2 роки тому ⁺²²
@Level1Techs : this is a very insightful video Wendell, I wish I could give you 2 thumbs up but youtube only allows me to give one.
@lucidnonsense942 2 роки тому ⁺²⁷¹
I unironically miss the days when you'd buy a decommissioned raid controller and a stack of random 12k drives, from a data centre, then bodge it all into a mutant raid array for your gaming pc... Once a month it would all come tumbling down and need to be resorted, until then your were speed...
@PainSled 2 роки тому ⁺²⁷
Why did my internal narrator pronounce "speed" with falling intonation?
10/10: I have coffee in my nose now
@udirt 2 роки тому ⁺⁸
@@PainSled 12k drives... either 10k or 15k.
@StefanReich 2 роки тому ⁺²³
@@PainSled Speed goes in the nose, coffee in the other orifice. HTH
@freedomspeech9523 2 роки тому ⁺⁸
I have one of those hardware RAID5, with 7.2k drives, working for years. Migrated couple of times to increase capacity.
Zero issues.
@morosis82 2 роки тому ⁺⁵
@@freedomspeech9523 that you know of
@Phynellius 2 роки тому ⁺¹⁰
Raid is not a backup, but your backup may use raid
@miff227 Рік тому ⁺⁷
And now a word from our sponsor: RAID Shadow Legends
@davidmeissner5010 Рік тому ⁺¹⁸
Wow, that is like a crash course on RAID. Since I really am an amateur about it, my takeaway is that we could forget the hardware raid controllers and move to a file system based raid platform called ZFS. Question is (provided my understanding is correct), do you have any more details on ZFS and how to implement it? Remember, I am a newbie regarding RAID.
Great video by the way. I'll be following more. Thanks.
@JeffGeerling 2 роки тому ⁺²⁶²
B-b-but what if you only have a Raspberry Pi? Hardware RAID is like 10x faster if you need parity on a Pi.
@chinesepopsongs00 2 роки тому ⁺¹⁸
Then use a card and drives that are at least 20 years old on the PI. :-)
@JeffGeerling 2 роки тому ⁺²⁵
@@chinesepopsongs00 The ironic thing is that only the newest Broadcom/LSI RAID cards will work on the Pi, because the older cards required features that aren't even present on ARM implementations to work.
And the newest drives still need some modifications to work on the Pi since it's PCIe bus is a little funky.
@roysigurdkarlsbakk3842 2 роки тому ⁺¹³
I know you love Raspberry pi and so do I, to some extent. When it comes to storage, I'd rather use a good old computer with a bit beefier CPU and memory bus, preferable with ECC memory etc. Yes, a pi can do a lot of stuff. I've setup some 40 infoscreens at work with them, running octoprint on them, using them for various IoT related stuff including home assistant, but I don't use them for storage. I have some 12 drives in my home server, a fine combination of SSDs and HDDs of different sizes and makes. I'm quite sure you would be able to use a pi to control that as well, that's for sure, but it won't be particularly fast.
@GrahamCantin 2 роки тому ⁺⁴
"Hardware" RAID on a pi4? With what HBA? I seriously doubt most of the broadcom LSI chips' control interfaces are going to be available to you on an ARM platform. Adaptec will have the same problem unless you can find some interesting qemu tricks to emulate the EFI/nonEFI Option ROM interfaces. So seriously, Jeff, what "Hardware" RAID are you talking about?
Just use btrfs. Every btrfs gigablock has enough metadata inside of it for tree recovery. Or go full paranoid with ZFS, but I wouldn't recommend it because Oracle's chased after users of their patent stack with a courtroom in the past, and is insane enough to demand money from Java users and anyone who wants USB3 in virtualbox these days. ZFS is tunable, you don't 'need' a ton of ram for L2ARC and ZIL if you're just running a 'small' pool of consumer data (~20TB) and not some crazy data warehousing database or machine learning models.
@eugenesmirnov252 2 роки тому
I've done Minio storage with 6x2TB drives, with bit-rot protection. It's not so hard
@Foiliagegaming 2 роки тому ⁺⁶⁹
I’m still learning and most of this went over my head but this was a great video! I watch the GRAID LTT video and was pretty stoked about something like that. But big sad. Wendell you are my IT janitor hero!!
@TheGuruStud 2 роки тому
@@pieluver1234 bc Microsoft is paying them to spread propaganda that they don't suck...and then the server fails weekly lmao. Linus is and always was a sell out circus actor.
@Mpdarkguy 2 роки тому ⁺⁴
@@pieluver1234 I mean they mostly have windows clients; you'd think a windows server would make sense
Shit happens
@tobylegion6913 2 роки тому
@@pieluver1234 the one who doesn't know what he is talking about is you.
"You'd think a windows server would make sense paired with a client"
Keyword WOULD THINK. That doesn't mean that it is the case, but that it is a valid assumption that systems from the se manufacurere work better together.
So get your head out of your own ass and stop being so condescending, especially if you keep talking out off your ass.
@99mage99 2 роки тому ⁺¹²
@@pieluver1234 Because the LTT audience is a mix of enthusiasts and casuals mostly centered around DIY gaming/workstation PCs. Most of them don't care enough to go learn about enterprise level hardware because it really doesn't mean anything to them. LTT gets content like that to a wider audience and presents it in a way that someone with no experience could probably follow along. Which is pretty much the point. LTT doesn't present it as "this is what we did step by step and so should you". It's more like "look at these fun toys, I'll explain how it works like you're 5 and give you some examples of why it's better than what we were using"
I'd agree that watching LTT for advice on enterprise hardware and config would be insane, but watching it for the fun of them messing around with hardware most of us will never see in person is pretty neato.
@Mpdarkguy 2 роки тому
@Malice I was trying to put myself in the place of someone doesn't have a cs degree you know
@solidreactor 2 роки тому ⁺⁵⁵
Would love to have a video with the topic "Welcome to ZFS and BTRFS - Your Wendell introduction and video guide". Basically helping us migrating from ntfs/ext/etc to these other two.
@nobodynoone2500 2 роки тому ⁺²
Hopefully, with the reach of this video, that one is next.
@Banner1986 2 роки тому ⁺⁷
I still remember working on rebuilding arrays from hex back working at LSI, which was only possible BECAUSE the 520 byte sectors. Ah, the stories... every large company used the same storage HW, all white labeled LSI stuff, and I miss being in the thick of it there
@jsebean 2 роки тому ⁺¹⁶³
Great video Wendell, however as a Btrfs advocate myself, I think it's important to represent it properly so people know the difference. I personally believe Btrfs and ZFS, while both have a lot in common, serve two very different use cases. ZFS is the enterprise solution, Btrfs is the home owner/NAS solution filesystem.
First, Btrfs does not support per file or folder level redundancy options. This idea may have been tossed around, but Btrfs redundancy is only on a per volume basis. Not even on a vdev basis, which ZFS has, it has no concepts of vdevs. Not sure I'd personally want per file redundancy option anyway, that could become a management nightmare, but per subvolume redundancy options would be nice. Alas, neither is a thing yet.
As for RAID5/6, consider it unusable on Btrfs. Like just straight up don't use it, the mkfs and balance tool will warn users for a reason if they do try to use it.
It needs a total rewrite to be usable (which means an on disk format change is likely. I believe Western Digital was working on some fixes but this is still a ways out). While btrfs will indeed "protect" you from bitrot, all it can do is prevent returning bad data, it is unable to repair all issues. As such it's nowhere near trustworthy as ZFS RAID-Z.
It's not resilient to the write hole issue at all, and what makes matters worse, it will lie to you which device may be throwing corruption at you. To me, that's as good as not having device stats at all, as you're forced to rely on the disk to actually indicate issues yet again. Apart from the fact the fs won't return bad data, it still forces you to go to a backup or play wack a mole in figuring out which disk is causing the issue. If the parity bits themselves becomes corrupt, scrub won't care to fix it, only if the data itself is corrupt.
Additionally, if data does go corrupt, and the stripe that data was in is updated, when Btrfs updates the parity bits for the stripe, the corrupt data will now be reflected in the updated parity, destroying your only chance at recovery.
Now to be clear, Btrfs' other RAID levels are indeed fine and great at preventing, identifying and fixing bitrot in all cases, but it does require some knowledge about management, as Btrfs will not automatically rebuild without you triggering it.
First, you need to be familiar with how it allocates data to know when a balance is needed, vs a scrub.
A scrub will indeed scrub the filesystem, and repair data with redundancy, but data is allocated on a "chunk" basis (usually 1GiB in size on most volumes), and these chunks are what have a redundancy policy (not really the volume overall). It is this people need to look out for.
First, If the array becomes degraded, btrfs will not automount without user intervention because any new writes to the fs may not be redundant (It's impossible if you have the minimum disks required for a profile, as it's degraded, otherwise it won't be balanced properly). If the allocator is unable to allocate a chunk to store the data in, in a redundant fashion, it will instead create a chunk with the "single" profile. If the filesystem was already mounted, and it can't satisfy the allocator requirements to create a redundant chunk, it's forced read only (this is why people often suggest RAID1 users on Btrfs use 3 disks instead of 2, otherwise high uptime is impossible with it).
Now, even after a drive is replaced on Btrfs after it has been mounted with the degraded, you need to keep an eye on the chunks as they're indicated in it's management tools to see if you need a convert any non-redundant chunks with a balance to restore full redundancy.
However, Btrfs it is quite innovative in other ways, so home users in particular shouldn't write it off too quickly. It allows mixed size disks to be used redundantly. If you have 2x2TB + 1x4TB disk, in a RAID1 configuration, it will have 4TB usable space. You can still only lose one disk without losing the filesystem, but the data on the 4TB disk is balanced between the 2x2TB disks. It also supports duplication on single disks for some added bitrot protection in those use cases, along with RAID10 (and RAID0 if you don't want redundancy, I guess). To get mirroring across three, you would need to use the RAID1c3 profile, which would make 3 redundant copies across the three disks, at which point you'd only have 2TB usable space in the above example, but with the resilience of being able to lose two disks. There's also a raid1c4 option if one wants it. Finally, shadow copy is possible with it, and Btrfs is a great open source alternative to what Unraid provides when it comes to flexibility with disks.
Btrfs also supports converting from any RAID profile on the fly as you like. Wanna go from RAID1 to RAID10? Easy, just add the disks and rebalance to raid10. If you have a RAID10 now, and a disk fails, and you're unable to replace it right away, you can rebalance to a RAID1 to get redundancy back without needing a replacement disk right away. All this stuff can be useful in some cases, and if RAID5/6 ever does get reworked, anyone running it's current stable RAID profiles can easily convert to RAID5/6 later. Now, as cool as this functionality is, it's a bit of a niche, it's more a thing home operators would usually be concerned with, not enterprise users who would just install another disk or configure another vdev.
It is overall a great choice for homeowner/NAS uses cases, and it's built into Linux in just about any distro. I use a 5 disk RAID-10 array with it here and it has been serving me well. It's also a spectacular desktop filesystem for those who run desktop linux -- I wouldn't choose any other, as snapshots, compression, and incremental backups with send/receive is too much to pass up. There's a reason distros like Fedora and OpenSUSE use it by default on their desktop flavors, it really is great, but I just wanted to clear up people's expectations with it so they pick the right tool for the job ;)
BTW: For anyone who cares to have a true Linux alternative to ZFS, keep an eye on Bcachefs ;) ... sorry for the long winded post, I could make a series of videos discussing Btrfs because it is my personal favorite Linux filesystem lol
@AI-xi4jk 2 роки тому ⁺¹⁰
Thanks for sharing!
@YaroKasear 2 роки тому ⁺⁹
I used btrfs off and on for years. Every single time it silently corrupted itself to the point it would stop working. No other filesystem would do this. I wasn't even using any RAID.
I am watching bcachefs, but it's been a few years since that went anywhere.
Personally, I would still choose ZFS over btrfs even on NAS setups. ZFS beats the snot out of btrfs on reliability and their devs definitely seem more competent.
@leexgx 2 роки тому ⁺⁷
@@YaroKasear (sounds like a hardware problem not filesystem) if using it on a SSD make sure kernel is above 5.15+ or make sure dup for metadata is set (btrfs balance start -mconvert=dup /mount/point if already created or -m dup at manual filesystem creation)
before 5.15 or last 10 years btrfs has defaulted to single for metadata if ssd/non spinner was detected at filesystem creation (due to assumptions)
witch is bad as btrfs without dup on metadata is bad as a single error on the metadata can Hose it without a chance to auto correct attempt
Another note if your using dm-integrity or dm-crypt in front of mdraid your raid now has self heal capability (dm-integrity or dm-crypt checksums all 4k blocks by default and any errors get passed up the layer as read error so the raid can deal with it, has a 20-30% write penalty depending on loads)
Not had problems with Synology or netgear readynas implement way of doing it (assume asustor has done same thing as well) uses btrfs on top of md-raid so don't have any of the problems of raid56 of btrfs (they have modified btrfs and mdadm so that btrfs can talk to mdadm and issue a parity or mirror repair attempt if btrfs receives valid data mdadm writes the correct data back to the array (when a scrub is ran it first runs a btrfs scrub Once finished it then runs a raid sync so no way stored errors can replace parity as it would normally do)
one key important thing on Synology is you must make sure you have ticked the checksum/integrity box when creating the share folder as it has a habit of not pre ticking it (need a + nas and filesystem chosen needs to be btrfs at volume creation)
netgear readynas enables it by default unless unselected by user (or CPU is arm type or old CPU model you get a warning when trying to enable it, mostly affects write speed on older devices)
Asustor allows btrfs based on model number (and you must tick the snapshot box when you first setup the nas or it uses ext4 as default filesystem, whole nas has to be factory reset to change it)
@wishusknight3009 2 роки тому ⁺⁶
BRTFS has given probems to at least 2 people that I know who have tried it. To me it seems gimmicky and pointless. Best practice is just use symmetrical disks and ZFS2 on Freenas. over 10 years use for me and never an issue, even when drives died. Its always been that "just works" solution.
@eDoc2020 2 роки тому ⁺¹
@@leexgx I was wondering why there wasn't a way for btrfs or other filesystems to interact with the RAID layer. If these NAS companies have made it happen, have they released their modifications for usage by the general public?
@RN1441 2 роки тому ⁺¹⁰⁷
Once again we get bitten by marketing playing fast and loose with implying that their products solve problems for us that they don't. The storage media space seems absolutely crammed full of this type of thing the past few years. WD hiding SMR in their NAS drive series, NAS units themselves being open to cloud vulnerabilities even when you try to turn off all their stupid cloud extras, SSD manufacturers substituting different controllers, drams and flash chips on their shipping drives compared to what they sent out to get good reviews (always downgrades) and reliability numbers that I wouldn't trust let alone rely on. Now I learn that some of the RAID volumes I've set up to prevent decay of precious files and memories are probably not protected at all against decay or loss. LOL.....
@realjoecast 2 роки тому
You mention precious files and memories.... I would assume not just setup on your RAID Volumes :-)
@DergEnterprises Рік тому ⁺³
My NAS machines have their gateway set to a blackhole.
@ashkebora7262 Рік тому ⁺¹⁴
Nah, it's in _all_ industries. Unless you're getting a canned product that already has fully defined spec sheets to reference, it's all fluff. Even in the software industry, _way too often_ does marketing _literally sign contracts_ with customers for features that the product doesn't do, yet. Then the marketing guy wonders why he and the company looks bad when the customer receives a rushed implementation...
Until companies start getting in trouble for "not _technically_ false advertising", it's going to keep getting worse.
@Hostilenemy 2 роки тому ⁺¹³¹
My first Raid Z2 server had 32TB of storage, my next one had 60TB, and my current one has 180TB. Never lost a file or had one corrupted. Trust in ZFS.
@Anonymous______________ 2 роки тому ⁺¹¹
Unfortunately RHEL and CentOS do not have native support for ZFS and require a kernel module. Which, If you know anything about dkms, poses the potential that ZFS volumes won't mount on boot up after you apply any kernel updates. Generally, any respectable cyber security oriented business uses a long-term support Linux distribution (i.e. RHEL, CentOS, or Ubuntu LTS, etc). Once ZFS enters kernel mainline status (most likely never) more Linux focused companies will replace ext4 or xfs due to zfs's significantly better scalability.
@space_cowboy007 2 роки тому ⁺¹³
@@Anonymous______________ all the more reason to choose a LTS distro that actually supports it
@aeonikus1 2 роки тому ⁺⁵
@@Anonymous______________ I literally detest every Red Hat/Fedora/Mandriva etc linux flavour since late '90s. Slackware is my best Linux friend :)
@georgen9838 2 роки тому ⁺²
@@Anonymous______________ Isn't Btrfs mainline?
@stevedixon921 2 роки тому ⁺⁴
@@Anonymous______________ For those curious: I believe the reason ZFS will not be mainlined into many of the 'enterprise' distros is a compatibility issue between GPL and whatever the other one is (escapes me at the moment). I think the core was whether you had to make any custom code changes available to the public versus being able to keep said changes private (though I could be way off base on this one, it exceeds my knowledge depth). I think ZFS started in the 'unix' realm (with BSD and the like) so it has different open licensing rules.
And yeah, anything than can just stop working after updating your system is not going to make it far in the enterprise.
@stevedixon921 2 роки тому ⁺¹²
Years ago I was shocked to discover that almost no part of storage or software was responsible for data integrity, with each part assuming someone else was doing it or that you had backups. HDD = no (silent bit rot not detected), File system = no, Application = no. The good news is the next gen of file systems (REFS, ZFS, etc) are supposed to fill in part of this deficiency. Seems that no one wants to take on being responsible for data integrity.
Logically the drive should be responsible for at least detection and reporting that it cannot return exactly what was stored. All drives should silently be storing parity data for each sector *by default* to assist the file system in recovery options if nothing else (akin to the 520 byte sector size thing mentioned in the video, just updated to handle 4k sectors).
@leexgx 2 роки тому ⁺¹
Refs requires integrity to be set to true (default is false so by default so does bugger all to protect data even if your using mirroring unless you set integrity to true via powershell)
and worse the enforce part (true by default and ignores per folder setting when set to false, just sets new created files to true) if integrity is set to true as well it will delete the file if there is a single uncorrectable bit of data (just get an event log and file disappears)
@leexgx 2 роки тому ⁺²
Drives do have built in ECC, I have rarely found 4k physical (512 logical / 4k physical) drives to get hung up on a ECC block it can't repair (if it hits a read error that ecc can't correct it just returns a UNC/red error quite quickly )
older 512/512 would get hung up on trying to correct a read error it any going to be able to correct as to why TLER/ERC was made so they give up in 7 seconds so Raid can deal with the problem
The difference is that the 4k only needs to do a single 4k ecc check on 512e/4kn and has higher chance of ecc repair vs 8 ecc blocks on old 512 physical drives
@666Tomato666 Рік тому
@@leexgx I had a crib death of a 4k physical 512b logical disk that wouldn't go through simple file system format, it would just spit out garbage when trying to read the sectors just written
you were just lucky: storage is not to be trusted, they sell on performance numbers, not on hard to prove and hard to show reliability and resiliency
@kevinlassure6214 2 роки тому ⁺¹⁸
Meanwhile in France, 90% of companies use RAID 5 over 5/6 disks if not more and bosses answer: it cost less money and we have backups anyway. Or even, as i also heard: i never saw issues yet with raid 5 or 6. Beceause our country is from stone age and want to spend just enough for something to work even if it isnt really reliable
@sobertillnoon 2 роки тому ⁺²
I guess I missed the boat. The one thing I wanted raid for is gone. Gonna convert all my arrays to 0. Might as well get as much space and speed as I can seeing as the only thing on offer. Raid is not a backup.
@raulsaavedra709 2 роки тому ⁺⁶
Awesome video! Besides the detailed problems explained here for RAID 5 and even 6, I remember quite a few years ago Dell kind of officially discouraged the usage of at RAID 5 for any business critical data, and that was because the likelyhood of a second drive failing or of the appearance of an uncorrectable error precisely when trying to rebuilding the array after a first disk failure was much higher than people typically assumed/understood.
@DeltaSierra426 8 місяців тому
Yeah, I remember being surprised when I started to see RAID 5 discouraged. HPE (then HP) said the same around the same time. I still see people recommending RAID 5 in IT forums and such and just shake my head. Even RAID 6 is loosing support. Honestly, I like my RAID 10 to the maximum degree that it still makes sense for the scenario.
@EpicWolverine 2 роки тому ⁺⁴
StableBit DrivePool also has file/folder duplication policies that can be inherited and whatnot but isn’t a full native proprietary file system. It just makes a virtual drive on top of NTFS. Very cool stuff.
@ChrisDerichs 2 роки тому ⁺³²
This got me thinking, what is the answer these days? While looking into zfs and btrfs and being annoyed that neither are available out of the box on the RHEL clones starting with 8, I noticed a RH's blog talks about using dm-integrity to handle bit rot detection. Might be useful with any of those software raid combinations.
@deViant14 2 роки тому
SLES
@mckidney1 2 роки тому ⁺²
LVM with dm-integrity and parity based on logical volumes instead of md. The biggest pitfall of LVM is storing the metadata, just like on those proper controllers. I also looked for more LVM in the video - but Wendell does love ZFS :D
@ChrisDerichs 2 роки тому ⁺⁵
@@mckidney1 I did some experimenting. By default using lvm's --raidintegrity option, any corruption I introduced to the drive caused it to completely drop out of the array. I eventually figured out disabling the journal was closer to the behavior I wanted as devices under the raid. LVM also has an option to disable the journal but it seems only by enabling bitmap mode instead. Bitmap mode has its own problem. Still trying to narrow down the best combination.
@etome8 2 роки тому ⁺²
Maybe checkout the CentOS Hyperscale SIG, it adds back btrfs
@leexgx 2 роки тому ⁺¹
@@ChrisDerichs if your using dm-integrity or dm-crypt (dm-integrity is just dm-crypt without the encryption part) under mdraid so if dm detects any checksum fails it report it as read error so mdraid can use mirror or parity to correct it (if you use btrfs on top of mdraid you get filesystem integrity validation so if dm or mdraid fails you get btrfs checksum errors)
Don't know about how LVM handles it
@frenchpressfinance 2 роки тому ⁺¹
I'm retired from the IT game and somehow your video popped up randomly via the YT algo. I may be retired but I still keep up a little for my own purposes and this was a GREAT explainer!
@mike_pj 2 роки тому ⁺³
Thanks for posting this video. So what's the best option if you want to have a single server with redundant SSD storage running something like ESXi? I've had plenty of drives die on me, and it's really convenient to have a hardware RAID where I can just swap in a new disk with 0 downtime. Seems like the only option is to have a second TrueNAS server providing storage via iSCSI for the VMware server. But if you're coloing this kind of setup, your costs just doubled.
@CheapBastard1988 2 роки тому ⁺¹⁶
The policy based redundancy on a file level sounds awesome!
@wishusknight3009 2 роки тому ⁺¹
It isn't as good as it sounds. At least until it matures some. 2 friends of mine have lost data to it when he thought he had policies set correctly. And when the drive dies, turns out the policies didn't stick for subsequent writes. BRTFS to me is a joke.
@chloefletcher9612 2 роки тому
Good video - I've been out of the server hardware world for a few years now but how does ReFS stack up?
@NullStaticVoid 2 роки тому ⁺²
Even 7 years ago when I worked for NBCUni I had a hard time finding raid specific drives for our HP Proliant servers.
SAS drives are almost all going to be raid drives. But 2.5" SATA seems to be going away. Nobody is making anything bigger or faster than what they sold 5 years ago.
And RAID specific drives are getting harder to find.
@GrishTech 2 роки тому ⁺³⁵
I have a good bitrot example. Some of my oldest google photos started to have gray lines in them, then over time it went half gray. This started happening across photos. Confirmed I can see this across multiple devices. Can’t believe this happened at a google data center, where data integrity is critical.
@-aexc- 2 роки тому ⁺³
wait, Google has bitrot too? I've been using rclone crypt to backup all my files to my uni gsuite under the assumption google has a way better data integrity than me
@Dalewyn 2 роки тому ⁺⁹
@@-aexc- The moral here is the 3-2-1 rule: 3 backups, at least 2 of them on different storage mediums, at least one of them off site. For any mission critical data, only having one backup is equivalent to having none.
Redundancy is not a backup, but backups need redundancy.
@GrishTech 2 роки тому
@@-aexc- I wouldn’t say it has bitrot, but just some of my photos had it.
@mogoreanu 2 роки тому ⁺³
Not 100% sure about photos in particular, but pretty much all of the data stored at Google has checksums associated with it and bitrot is detected and fixed automatically. I do not think your gray lines are due to bitrot on the storage layer. Maybe recompression artifacts.
@BikingWIthPanda 2 роки тому
this didn't happen
@dolfinmicro 2 роки тому ⁺⁵
This was a great video. Just when I think I know all I need to know about RAID, you set me straight. Thanks!
@chromerims Рік тому
Great video, thank you 👍. 0:10 nice white Lian Li O11 case in the background . . . unless I'm mistaken.
@padraics 2 роки тому ⁺³
Yo you're looking at some crazy speeds. I just did my first build nvme drives, needed four fully independent hosts (ESXi 7), using local storage. Used megaraid 9560 cards and the performance of 12 kioxa 3.2tb drives is lovely. Had no idea btfrs had those features. Good info!
@bmiller949 Рік тому ⁺⁴
Wow, things have changed a lot since I was in school in '96. I use to run a hardware raid 5 system at home office. It was truly hardware raid as the card cost me over $500 at the time.
@YolandaPlayne 8 місяців тому
Now I think the best solution for the home office is a Synology system.
@mitch7918 2 роки тому ⁺¹⁰
Wendell... This was the best explanation or discussion on this subject I have ever seen. You are truly, seriously knowledgeable. Been watching you since the start of the Tek days and have always had great respect for you. Much love and long live ZFS!
@gorillaau Рік тому
I have been trying to figure out how ZFS handles different sized disks. Supposing you have 2x 2TB drivers and 2x 1TB drivers, so 6TB of raw storage.
@mnemonic6047 8 місяців тому ⁺¹
as an apprentice in IT, i created myself a hardware raid, this video updates me on the current topology of hardware/software raid on what to expect in enterprise, thanks Wendell!
@corvoattano9303 2 роки тому
Never clicked on a video this fast before. This topic is so fascinating for a beginner like me. Thanks so much for saving me a lot of potential headaches had I not watched your ZFS and bitrot videos here and on the L1Enterprise channel.
@Jules_Diplopia 2 роки тому ⁺⁶
I am a long time out of the IT world... thanks for a quick update on the current state of RAID. I had wondered how it worked on NVMe SSDs. Clearly it doesn't. Glad to hear that ZFS is still going strong.
@SmallSpoonBrigade Рік тому
I kind of think this video is like 2 decades late to the party. FreeBSD had Vinum at that point and ZFS was starting developed thing. At any rate, software raid has been better than hardware for at least a decade at this point, and probably more like 15 years.
@martinhovorka69 4 місяці тому ⁺¹
Current MBs support RAID at BIOS level, you just need to load the appropriate drivers when installing the OS. I run RAID 0 on a PCI 5 SSD and apart from the higher sequential reads the performance hasn't changed compared to a single SSD, RAID has little practical significance from a performance standpoint.
@Jules_Diplopia 4 місяці тому
@@martinhovorka69 But the whole point of RAID was that it protected data if a single drive failed. So for RAID to be useful you would need at least 3 drives, be they HDD or SSD so that if one fails the other 2 can recover the data.
It may well be that things have changed since my time, but I would have thought that that basic principle remained.
@stevec00ps 2 роки тому ⁺¹³
Oh this explains why I had to go through a load of hassle to reformat some 520 byte formatted SAS disks from an old EMC SAN to get them to be 512 byte format to use on a normal PC!
@marcogenovesi8570 2 роки тому ⁺¹
yeah they use a bigger block size to store checksums and stuff somewhat "in hardware" kind of
@kernkraft-2354 2 роки тому ⁺³
It was surprisingly more difficult to fix than it needed to be. I had to force my HPE RAID controller into HBA mode against it's will. Then you start the operation and just hope for the best. I was getting 600gb 520byte drives on ebay for $8. sg_utils FTW, and now nvme-cli for all your low level hardware magik
@jrussellmoore 2 роки тому
Not long ago I had a visit to a small local datacenter and inquired on how they were running things, it was a CentOS at the time and I think they told me they weren't using RAID internally in their data storage arrays, that they were using BeeGFS instead. Do you have any knowledge on how (or if) it does any sort of parity checking or whether it also relies on the disks reporting?
@alexatkin 2 роки тому
Thanks for this. Still running ext4 on my NAS and had been pondering if I should migrate to btrfs.
Also, should I stick with ext4 or use btrfs for the USB backup drives? I do run LUKS on the drives too.
@SmallSpoonBrigade Рік тому ⁺¹
Don't, btrfs is kind of silly, you'd be better off moving to ZFS. And definitely ditch ext4, I'm sure it's improved since I was using it, but it was the least reliable FS that I've ever used. And I've been using computers for the better part of 40 years with just about every generation of Mac and PC since the '80s and none of the filesystems involved were as terrible as ext4 was. I would literally, shut the computer down properly and be unable to restart due to filesystem corruption.
@marcogenovesi8570 2 роки тому ⁺⁶
Afaik LVM RAID has integrity mode to do data integrity checking and autocorrect, also dm-integrity can be used under mdadm raid to do integrity checking and report the error to mdadm that will fix it from parity or the mirror
@nayphee 2 роки тому ⁺¹⁶
The thing with ZFS is that it needs routine scrubbing to check for bit-rot/cosmic ray damage. You can't wait for a disk to fail before you discover that bit rot has also wiped out enough parity to recreate the data on the replaced disk
@ahrubik Рік тому ⁺¹⁰
A common practice in the storage industry is having RAID scrubbing set up as a scheduled task. It all but resolves this risk vector.
@Loanshark753 Рік тому ⁺¹
You also need triple checksums
@mrpeterfrazier 7 місяців тому
Don't act like you know what ZFS is. Have you considered wut knowing is? Like if you want to see Corsica maybe you start seeing with eyes, rather than via disease? ...else it is like the pot calling the kettle black. Like if you're engaged in espionage, the pre-condition there is that your country has some agenda... then you have to figure out what a country is, and what it means for that entity to have requirements that could require espionage. You're getting ahead of yourself without a head... that's being a knit-wit.
@HawkFest Рік тому ⁺¹
Super, thank you very much ! I like how you scratch well below the surface but are still able to present a clear view of the situation, exposing a construction of your knowledge in a comprehensible (rational / logic) manner. You truly understand your expertise.
*A question follows.* For a couple of new builds I'm working on, I was thinking about systematically install the OS on 2x NVMe SSD in RAID 0 (rather than RAID1, since an OS can always get reinstalled or "cloned" into an image, as there's no user data loss potential.. Apart some eventual customization to the GUI and devices, which isn't of vital nature - except in a production line facility)... Your clip gave me a cold shower !😅I think I'll do fine with the OS Volume[:partitions] on one physical NVMe - in the end, cold showers are better for the wallet (and health).
Thumbs up and subscribed.
*2 Question :* for having already fiddled with this context, I know that there's a speed gain when installing all programs (especially their libraries) on a RAID 10 volume with NVME - sometimes RAID0 when programs can be easily reinstalled or backed up/cloned -, and the user data + data bases on other RAID 10 volumes of NVMe (I do regular backups, thus RAID10 is sufficient in case of some physical disk failure with slightly above a 2:5 max ratio). But is data integrity 100% guaranteed, at least as much as operating without RAID volumes ? What would you recommend as a better alternative, or other virtualization technique adding security and performance, for the above-context ? _Note : I try to avoid using any "onboard RAID" controllers (those that are provided by manufacturers on their motherboard lineups)_
@seanthomas2906 2 роки тому ⁺²
Watched the Linus video about this and thought wow!. You've brought me crashing into a wall. Great insight and well executed .
@MrGsteele 2 роки тому ⁺⁹
What you are saying applies equally well to all forms of storage. That is, unless you read back what has been written to a storage device - of any description - and compare it with what you attempted to write, you cannot be sure that the data has been stored with any integrity. All storage devices rely on some form of qualitative protection - in the form of parity, or CRC, or LRC, etc. - but those are statistically (and that's important) validated protections. That is they reduce the readback error rate from the natural error rate to a lower error rate by providing a mathematical way to detect and correct bit errors - up to a point. That is why error rates are specified - 1 bit error in 10^14 bits, for example - it's a foregone conclusion that recording media are imperfect, and you are relying on the capability of the EDAC codes to keep the error rate that low. Not error free - just low.
RAID, however, addresses a different aspect of storage system behavior - the ability to retrieve data when a catastrophic hardware failure of one or more drives happens. Without RAID, a catastrophic failure may destroy your data, obliging you to recover it through auxiliary sources like the last backup, or forensic recovery from a disassembled drive. An array contains an error recovery mechanism that allows you to continue operation while the failure is operative. The array dynamically reconstructs the data stream using that redundant data, providing operational continuity. Advanced arrays isolate the failed drive, reconstruct the data stream, and may even call in a hot spare and begin rebuilding what was on the failed drive or drives to permit resumption of fully-normal (non-reconstructed) operation. Operational continuity has business value. So does RAID - just as any redundancy does when its purpose is to provide operation through failure.
When you say that "drives do not honestly report their bit errors" it may well be the case that a RAID array does not employ a diagnostic read that provides information on whether the data read passed the test of conforming to the on-controller's assessment that the data did not require error correction, or required the use of an on-controller error correction calculation in order to (if successful) reconstruct the data originally written. A properly implemented array, however, will do so in order to detect at the earliest possible point any incipient failure and compare it to the array threshold for error rate or error growth in order to isolate the drive experiencing the problem - i.e. to proactively move a drive out of operation before things get to the point that error correction can no longer be relied upon for faithful reconstruction.
Incidentally, there is no such thing as "RAID 0" - the name itself is an oxymoron. What is billed as "RAID 0" lowers, rather than increases, the reliability of the storage system by requiring both devices in the stripe to work in order to provide the data; it is a performance-oriented concept that is misnamed - misnamed because there is NO redundancy, which is the first word in the RAID acronym.
A card that performs data reconstruction that has a maximum 16 GB/s I/O capability, moreover, does not limit the read rate to no more than 16 GB/S; realistically, errors are "bursty" - that is, you don't have 16 consecutive gigabytes of data that is rife with errors needing correction in real time - in fact, real time is in and of itself a misnomer, since I/O is asynchronous and, perhaps even more importantly, interrupted constantly by rotational and seek latencies that dominate the I/O equation, as well as buffering and I/O driver turnaround time, memory address allocation, bus handshaking, and on and on. Most performance data specifications relate only to a maximum theoretical rate derived from the mathematics of bit density, rotation rate, and interface electronics - not performance in practice. After all, the operating system must respond to I/O complete interrupts, interrupt processing time, buffer management, etc. No hard drive in existence, rated at 100 GB/S transfer rate, will transfer 100 GB in one second into a computer system. It's a fantasy.
Another nit is that neither 512 byte, 520 byte, 1024 byte, 2048 byte or any other byte count sector size (the latter sizes were commonly used on optical disks) have that many bytes in a sector; they have that many USER bytes in a sector. They also have additional bytes, located in each sector, dedicated to EDAC codes that are calculated by the drive controller in real time as the user data is being written, and then appended to the user data to serve in the readback process. In the early days of optical disk write/verify, for example, those EDAC codes were read back and verified against the data during the readback process to ensure that the data was written readably. If not, the data failed verification, and required rewriting for data integrity. More extensive codes were used as 4.75 inch disks became widely employed to reduce the necessity for verify, compared to their 5 1/4, 8, 12, and 14 inch predecessors.
It is definitely true that an improperly implemented, distributed parity, hot-spared, auto-rebuild array may be out of touch with best practice, and that attribute may be applied to any given vendor's product. But RAID, as a tool designed for a specific characteristic of higher uptime and operational continuity through failure, with admittedly - in fact, acknowledged - degraded performance during rebuild, still has value in 2022.
@mishasawangwan6652 Рік тому
finally some knowledge
@rudypieplenbosch6752 2 роки тому ⁺¹⁷
Wow, what a story, glad I am using ZFS. But these HWraid suppliers should go bust with delivering their nonsense to the enterprise market.
@insanemal 2 роки тому ⁺⁴
If enterprise RAID solutions were all this bad they would. Hint: They aren't. Source, I worked for a major raid vendor. Their stuff would have picked up this issue on read because it does read verify on all operations.
@rudypieplenbosch6752 2 роки тому
@@insanemal Probably you are one of those "IT professionals" not really understanding how it should work, like L1 mentioned. But his simple test by modifying the data and without changing the checksum clearly shows the absolute nonsense implementation of the HW raid vendor. When he mentioned a vendor just rewrote the parity block only, makes the whole HW raid really a big joke. Unless you provide us with something that contradicts L1's findings I can not take you seriously. As a sw architect, myself I have dealt with a lot of "IT Professionals" who basically told me on more than one occasion, to just format my drive to "solve" certain application issues, etc, when I wanted an SSD for my workstation, this of course was out of the question, due to the evil that would descent upon me daring to use such an unreliable drive, what a joke these guys were most of the time and I worked for several big companies. Even with my limited experience with servers, I could probably sell myself as an "IT professional" since the standard seem to be pretty low in that kind of business. Show us something that contradicts L1's findings.
@insanemal 2 роки тому ⁺⁶
Dude I work in Supercomputing. I worked for DDN. I know how it works. This particular product is a hardware RAID Accelerator. It's not a hardware RAID controller. Of course with the way this works read verify won't work. The card isn't in the read path. That doesn't mean it's true for all RAID controllers.
@rudypieplenbosch6752 2 роки тому
@@insanemal Well than, even with super computing, there are people that are blind to obvious facts, something the latest covid debacle has shown, its morons galore on every level everywhere, in any business you will find them. It is a clown world we are living in, sad but true. But I am sure your HW vendor is very happy with you 🤗
@jonathanbuzzard1376 2 роки тому
@@insanemal To many people who have not a scooby about enterprise storage come out on the internet. If you are not spinning hundreds of hard drives then you should STFU IMHO as you are just making a fool of yourself. The likes of ZFS/btrfs are good if you have more than 10 drives and less than say 50. If you have say 200+ hard drives then ZFS/btfs is a solution for idiots.
@SmokingBeagles Рік тому
Good to see you looking so well man, I could listen to you talk about computers all day and intend to on my day off. Collab with Dave's Garage when??
@PatrickDKing 2 роки тому ⁺¹
What's a good system to use for a beginner to raid? A premade box from qnap or synology? I kind of wanted to tinker with some old pcs that run but are just really old. Any videos that cover this or recommendations from the community? I understand hardware very well but I've never used linux or build a raid system either hardware or software based.
@finnderp9977 2 роки тому
Afaik Synology uses linux mdadm raid+lvm+btfrs file system. Dunno if btrfs can repair what mdadm loses
@fat_pigeon 2 роки тому ⁺¹⁶
16:22 Correction: you're confusing Btrfs with something else. I'm pretty sure that Btrfs sets the allocation profile at the filesystem level and doesn't let you set it per file (or even per subvolume). You can only set it separately for data vs. metadata. It likely wouldn't be difficult to add that feature, probably using an extended attribute similar to how compression is set per directory, though you'd need to run a `btrfs balance`. It definitely doesn't allow 5-way mirroring; they implemented 3- and 4-way mirroring by adding RAID1C3 and RAID1C4 profiles a couple of years ago.
However, I've read that the upcoming Bcachefs *does* let you specify replication level per inode (for data, not metadata). See the `data_replicas` option in the "Principles of Operation" doc. Maybe that's what you were thinking of.
@AegisHyperon 2 роки тому ⁺²
DrivePool lets you set replication at a per-file level (Windows only)
@AgentOffice Рік тому
@@AegisHyperon i like drive pool
@tad2021 2 роки тому ⁺¹²
Minor thing, the super caps on (all?) raid cards are for "flash-back". The caps power the cache module to store itself to local flash instead of powering the DRAM for the duration of the blackout. Gives blackout tolerance of the effective powered-off storage duration of NAND. A fresh LiPo on battery-backed might have maybe a day at most.
@wishusknight3009 2 роки тому ⁺¹
My old HPE P400 would promise about 50 hours. And the most I had power cut to a server due to grid loss was just under 30ish hours and it held up. I don't know if it would hit the promised 50 or not but it certainly held up for a full day. And that was with the 512mb BBWC.. the 256MB module is supposed to last somewhat longer. My LSI controller though only promises 24-36 hours depending on cache size. Thankfully my new array is based on the HPE P420 that is flash backed. Being in an area with bad power and frequent blackouts its nice to have. And my UPS only goes so far. When I am not home, and the server cant gracefully shut down. The raid has never suffered any issues on my VMhead.
That said my main storage server array is ZFS2. So its never an issue regardless. But ESX does not have native support for it, so I use the p420 for that.
@Chris-ut6eq 2 роки тому
just found your channel. Nice intro, so I subscribed! Thanks for posting this.
@clarkpatrick3754 Рік тому ⁺¹
Hello, this is an interesting and very insightful video. For a home user (non-enterprise business level) - what product or set-up would you recommend for a home back-up system? I have RAID drives and already didn't trust them... I've had drive failures before and lost data. What is the best way to protect my data?! I have about 10TB of data from my lifetime so far.
@vorpled 2 роки тому ⁺⁴
Are there any smaller NAS-type commercial/prosumer products that do this right (with proper parity checking)?
Huge thanks for this! Growing up in the 80’s I always presumed that quality hardware cards were going to be faster and more reliable than any software raid solution - and that they’d be doing it properly!
@ryanwallace983 2 роки тому ⁺⁵
TrueNAS Core and TrueNAS Scale both implement ZFS
Core is based off FreeBSD and Scale is Linux
@proxgs7703 2 роки тому
I would say Synology. They use BTRFS on top of md raid.
@theundertaker5963 2 роки тому ⁺²¹
Amazing video like always.
Can we please, please get a series on ZFS now to allow more folks to better understand, and implement it to save everyone from bitrot?
You are the only one who can take the mystique out of ZFS and make it approachable to technically apt, and the power users alike.
@JMHands 2 роки тому
coming from someone who has a patent in RAID rebuild, Wendell, you really know your stuff. What do you think about Synology approach with mdadm RAID 5 and btrfs with scrub/compression on top? Also, modern enterprise NVMe like P5510 has support for variable sector size / NVMe protection information, but it is very tricky to use without custom software (like you mentioned with NetApp)
@13_death_jester24 Рік тому
Hey thank you for the short class. I found it helpful about how Raid works.
@utp216 2 роки тому ⁺⁸
That new card reminds me of the PhysX add in cards back in the day. Just something to help the math get done faster. Then it was turned in to a software solution…
@CMDRSweeper 2 роки тому ⁺⁸
PhysX was never turned into a software solution, all they did was allow it to be accelerated by a GPU rather than the old PCI PPU.
The PhysX that were capable of running on the CPU is the same simpler software PhysX that you see a lot of games implement today after Nvidia bought it out, but very few of them allow you to accelerate it on a GPU which is what the old Ageia PPU (Physics Processing Unit) did in a lot of titles like Mafia 2 and Ghost Recon Advanced Warfighter or Borderlands 2.
@toxy3580 Рік тому
@@CMDRSweeper rip the physx effects
@opopopop6286 2 роки тому ⁺⁷
Into computers for multiple decades at this point. Yet very little experience with RAID. Your talk is so super technical that even I can only understand a little over 2/3s of it..so you must be a super expert pretty much, and teaching it all right. This much I can gather/surmise :)
@TheOtherNEO 2 роки тому ⁺²
Having had the experience of an early Adaptec software RAID controller disaster where the Windows drivers corrupted the drives, not sure I’d take the risk again. Physically the drives were fine but all data lost as it wrote junk to the drives.
@az09letters92 Рік тому ⁺¹
A little secret for you. All RAID is "software RAID". It's just a matter of where the software is running.
@daz4172 6 місяців тому
Great video and concise explanation. I would love to see a L1Tech analysis of RAID parity rebuild and URE (Unrecoverable Read Error), as not a whole said is said about this situation. A lot of faith is put into parity meaning safety of data, when it could very well mean the entire loss of it, if a URE occurs during a parity rebuild.
@Niarbeht Рік тому ⁺³
So one idea I've had kicking around in my head for years is the idea of a "hardware raid" ZFS setup, where you've got a parity computation accelerator card that has a direct connection to the drives. You could set up a pool that's entirely local to the card. Optimally, you'd be able to inspect it using the normal zfs command-line stuff (zpool status, etc). However, all processing would occur on the card itself. Scrubs, parity, reslivering, everything would be local to that card.
In essence, like having a ZFS NAS box, but it fits into a PCI Express slot and publishes itself to the ZFS implementation on your system.
Probably a pain in the ass to engineer, though.
@BattousaiHBr Рік тому
i don't understand the necessity of that in a storage system which is going to have a CPU anyway, so why not let the CPU do it instead, which in a storage system isn't being used for anything else anyway.
@harryhall4001 Рік тому
@@BattousaiHBr Because lost of places won't use dedicated storage boxes. That's why they have hardware RAID in the first place, because it's supposed to be transparent to the OS such as ESXi.
@BattousaiHBr Рік тому
@@harryhall4001 that's not why people (historically) used hardware raid at all, it was because raid cards had batteries to make sure in case of a power outage that all data would finish to be written to disk to avoid data corruption.
if power outage corruption is not an issue, people tend to avoid hardware raid entirely.
@theredscourge 2 роки тому
I have two VM host servers that run dm-crypt volumes on top of software RAID10 and RAID1 respectively, and every month or so I get kernel error messages about block device issues, which sometimes freeze the volume on the host for a while, as the guests are performing a backup, leading to the VM guests marking the root filesystems as read only and then crashing with a kernel panic, and yet the MDADM never finds any issues and nothing "seems" to be getting corrupted as far as I can tell.
Might this block device level corruption be what's going on in my situation, and if so, is there anything I can do about it? Should I be doing my mirroring (and striping) on the volume level instead of using Linux software RAID?
@nemanjailic9612 2 роки тому
Could you do an up to date tutorial on how to set up ZFS ZRAID in the best and most bullet proof way in order to not get data corruption? Also, a periodic and automatic backup solution that wouldn't need to be that bulletproof for such an array would be also great (preferably not in the same location).
@jasonabettan5778 2 роки тому ⁺⁶
Any thoughts on a part 2 where windows storage spaces and ReFS fits into this?
@Raletia 2 роки тому
+1 I'm curious about this too! My NAS is my old fx8350 system with Windows 10 pro and storage spaces, with 12 hdds, a mix of 3, 4, & 6 TB. Backed up with Backblaze. I like storage spaces flexibility for upgrading drives one at a time.
P.S. if you didn't know, even though creation of ReFS was removed from win 10 pro a bit ago, if you format with it using an old build or some other means you can still use it just fine.
Or you can get creative with 'upgrading' to pro for workstations, to get back functionality they stole from you.
@katbryce 2 роки тому ⁺¹
@@Raletia My preferred approach there is to put FreeBSD on a Hyper-V VM. Take the storage pool drives offline in Windows, and assign them to the VM. Pick them up in FreeBSD and format them as a zfs pool, and share it with Samba. Connect to it in Windows as a network drive.
@wishusknight3009 2 роки тому
@@katbryce I even simplified my setup by running a separate Truenas machine bare metal, and my hypervisor on its own box using an HBA local store for VM's.
@Raletia 2 роки тому
@@katbryce Is there any way to use ZFS with a mix of drives? Everything I researched(a while ago, admittedly) indicated you needed fixed drives of identical size, and 1GB of ram for every 1TB of space, and you could not add or remove drives without destroying and remaking the entire pool?
I'm using storage spaces for two main reasons, really. The flexibility, and Backblaze(personal home thingy) does not let you backup from network drives, only local.
I'm not as experienced with Linux/FreeBSD, etc, so that's also a factor, but if given the choice, I'd rather not have to rely on Windows for my NAS. Right now that's literally all that PC does, it's headless and in my bedroom next to my router.
I just cannot deal with having to rebuild the entire thing just to upgrade a drive. I don't have enough space anywhere else, not to mention the time required to shift so much data(~12TB). It's also super anxiety inducing restoring from backup when your local copy is gone.
I've been getting used Hitachi/HGST enterprise drives for cheap, and slowly adding/upgrading one at a time. There's a particular seller I trust, had good luck so far, and they do honor their warranty and send drive replacements.
Been building up my NAS for like...6 years or more, that way.
Edited Some: typos, clarity, rephrasing.
@katbryce 2 роки тому ⁺¹
@@Raletia I currently have 4x 10TB with 8GB RAM. I've done it with 4TB, and it works fine, but a bit slower because there is less RAM available for cache. You can replace smaller drives with larger drives and grow the pool. Adding additional drives doesn't work so well and isn't recommended.
@TheHerrHorst 2 роки тому ⁺⁸
Thank you for this video. Now I know I did the right thing with btrfs instead of raid md.
Do you have any links about the btrfs policy based folder redundancy? I couldn't find anything about that, other than people wished for it
@Derek.Iverson 2 роки тому ⁺³
It isn't supported yet, but will likely be implemented in the future.
@mdd1963 2 роки тому
Try a Windows Storage Spaces Parity on 3 spinning drives, and check write speeds via CrystalDiskMark....; a staggering 33 MB/sec sequential....! :)
@pheatton Рік тому
Great video, thank you so much! I had a RAID0 array go kaput recently and had counted on the drive to report it. Apparently I haven't kept up on drives and RAID. I'll need to figure out something else other than RAID now.
@aracrg 2 роки тому ⁺¹
17:25 although in ZFS one can't specify redundancy at a file or directory level, it can be done at a file system level by setting the copies property of the file system to the number of copies you want that file system to use. In ZFS file systems are almost as light weight as directories.
@666Tomato666 Рік тому
but you need to assign the space a priori to the pool with the given redundancy level, in btrfs it's totally dynamic, with a 1GB granularity; so you don't have to guess how much space you will need in the no-redundancy pool vs the triple redundancy pool
@hdtvkeith1604 2 роки тому ⁺⁴
I use older LSI Raid controllers in my white box servers. I use mostly for storage and reads so raid 6 works great and I do monthly media scrubs and then consistency checks
@Dan-Simms 2 роки тому ⁺³
As someone who hasn't used RAID since the early 2000s, this was wild to me. Very interesting, good to know.
@666Tomato666 Рік тому ⁺¹
You can combine linux md-raid with dm-integrity to get silent data corruption detection (and correction in case there is redundancy, so with RAID 1, 10, 5, or 6), but it is at quite a performance hit
@BillWood1 2 роки тому
Great insight, what about a similar deep dive into MS REFS? Or, what about a comparison between ZFS and REFS?
@sjones72751 2 роки тому ⁺³
That's why I set my expectations. I use RAID5 mainly just to pull drives into an array and have at least a little bit of safety against a defective drive. For data integrity I'll back up that whole array to an external server. Nothing's perfect but this has been good enough for me for like 10 years now.
@lozboz63 2 роки тому ⁺¹
But you miss the point of this video, what is to say when you read the raid5 data for backup you get what was written?
@mikereeves4723 2 роки тому ⁺⁵
As a ZFS fanboy I can say hardware raid is still useful in many enterprise applications. Mirroring the OS drive is a perfect example where hardware raid makes it much easier than using ZFS. Pop in the new disk and continue on with your day vs having to run commands to replace the disk. A raid 6 on say 6 disks will perform much better than a single vdev raidz2. So its a balancing act between speed and safety. Achieving high speeds on ZFS typically means lots of disks. I personally use a single ZFS pool with 4 x 6 disk raidz2 that gives me the most balance between the two. For the typical home user, ZFS is the way to go but I don't think its time to call hardware based raid dead yet for enterprise. I would say you have quite the argument though for home use.
@PanduPoluan Рік тому ⁺¹
I'd say data integrity is even more important for Enterprise. In the consumer market, losing your data sure is annoying and causes grief if it's something irreplaceable such as memories of your family events. But with Enterprise, it can mean business interruption, or compliance violation resulting in legal issues.
@harryshuman9637 5 місяців тому
ZFS isn't hardware RAID tho, it's a software level raid.
@df98156 2 роки тому
Do you use a Panasonic camera? Bit slow on the autofocus :(
@herpmcderp5707 Рік тому
Thanks for this video. What about all those 2 drive NASes you see everywhere? My dad has at least 8 of them, all in raid 1 probably, with like 4TB drives. Should I just build him a semi-small server? Like a 10 drive synergy NAS or whatever. He just maps them as network drives in windows and he's starting to run out of letters to assign drives to, with the 8 slot memory card reader lol
@ChaJ67 2 роки тому ⁺⁷
While ZFS RAID Z level 2 is about as indestructible as it gets and you mentioned some even more insane things, I think there are some things to add to this and some things to question:
1. Hardware RAID cards don't hold onto the data with the supercap. Instead the supercap keeps the card alive in a power loss event long enough to dump to flash, just as that SSD you were holding does. Just to clarify an error in your script.
2. Hardware RAID cards haven't always done the right thing when they hit corruption. LSI cards for a little while at least had an issue where they kept writing out random data when they encountered corruption on one of the hard drives. It was a major ordeal trying to get them to fix that mess. I mean giving more control over to the hardware RAID controller hasn't always ended well. Sometimes the drives do a better job themselves and they are produced in such a higher quantity that you may be more likely to find those kinds of problems sooner with the drives over finding it with a RAID controller. Like FOSS software tends to be better at say the kernel level than some arcane piece of application software because everyone uses the kernel and stuff gets spotted right away while the arcane piece of software has fewer eyes on it and so things tend to go longer without being caught. There are simply a whole lot more hard drives out there than there are hardware RAID controllers and if you deal with hardware RAID controllers enough, you will probably see some odd things that you don't understand why this doesn't get fixed.
3. RAID can be worse than having a single drive, especially when dealing with older hardware RAID cards and some software / firmware RAIDs. Those older Dell Perc cards stored their configuration with a battery. When that battery went and power was lost, you would see the controller report a number of independent hard drives instead of a single logical RAID drive and the system would report no OS found. I have seen MD RAID arrays get a superblock not found / corrupt error after a simple power loss event and there was no way I could find to get the array to work. Seeing that you mentioned NetApp, back in the day they were doing WAFL and RAID 4 with up to 15 drives, I would see things like a power transfer switch was flapping all night long just after the administrators went home for the night and by 8:00am the next morning the UPS battery was long dead and the NetApp was reporting 2 drives had popped out of one of the 15 drive arrays when RAID 4 can only handle a single drive failure. That single parity drive also tended to get clobbered and would bottleneck the whole array. It has been a while since I have used NetApp, so I hope that is better. Overall it was ahead of its time back in the day.
4. I have to wonder if this whole not trusting a single drive is overblown. I thought the whole point of going to 4k physical sectors was to improve the CRC checks the drives did internally while more efficiently using the physical space on the drive. I guess I have just never heard of anybody in the field actually complain of a modern 4k enterprise class drive silently corrupting the data.
5. I think it is worth mentioning more on which RAID levels are better in more detail. RAID 1/10 is not as great as some think because sometimes the mirrored disk fails before you can reconstruct the array. RAID 10 just adds more mirrors where this can and has happened. RAID 5, I see this from time to time where more than one drive pops out of the array and you can't fix it. Usually you can catch it before it happens, but not always. Sometimes like with that NetApp with 2 drives that popped out, someone managed to get one of the drives going eventually and recovered the array. For all practical use for hardware RAID, RAID 6 is the gold standard anymore. I never hear anybody besides you complain about silent corruption, though I do see it a lot on USB pen drives, which work to a different standard, but I do hear from time to time about double drive failures before the array can be rebuilt. For the highest I hear anybody trying to go besides maybe a bank is RAID Z level 2 with up to 8 drives per vdev and then just add more vdevs to the zpool if you want a bigger array. This keeps things down to a mathematical improbability of losing your data to random drive failures. Granted sometimes disasters happen, which is why RAID is not a backup.
6. The thing with hardware RAID controllers anymore is they are stuck in an old ideology for the most part. That is you only want to do RAID and the drives are only going to live attached to this RAID controller and you are never going to swap out that RAID controller. You can flip a RAID controller over to being an HBA, but then it is only an HBA. 3ware allowed you to do both and in general was a more forward thinking implementation, but they got killed off by the big boys in RAID, so that is dead and gone now. If the RAID card messes up, you are in a world of hurt. You want to move arrays around, well that is a good way to lose all of your data as 'foreign' arrays even if the RAID controller recognizes them, the RAID card is all too eager to re-initialize the array, in other words re-write the drives with all zeros, wiping your data out, especially if say an array ends up in storage and one of the drives doesn't work when you put them on a new controller.
7. The thing is with BTRFS is it is just less polished and robust than ZFS when used for RAID directly. BTRFS can recover from common errors, but when I abuse tested it, the BTRFS process just kept crashing, which isn't particularly good. When things go sideways, you really have to be a bit of an expert to pull BTRFS out of its funk. You usually can, unlike say MD RAID, which I consider the worse in Linux, but it can be a hard road and it doesn't always work out. I mean BTRFS RAID is head and shoulders better than MD RAID, and I would consider it good enough for an enthusiast who doesn't care too much about their data, but then you get to ZFS. ZFS RAID Z level 2 in my abuse testing survived everything that was potentially survivable. At this it tends to just heal itself whenever it can; you don't have to be a pro to get it to fix itself, unlike BTRFS.
8. The thing with ZFS under Linux at least is historically you couldn't grow a vdev. I mean in general expanding your array kind of sucked as you had to add a new array to the existing array and you are just kind of appending the arrays together. That should be fixed with the latest versions of ZFS. This was a big thing that made hardware RAID nice is you could always easily expand a hardware RAID one or a few disks at a time if you wanted. It takes a while, but it can be done. Resilvering has also gotten a lot better in the newer versions of ZFS.
9. Something important to mention is both ZFS and BTRFS really need systems with ECC RAM to perform their best. A bit of an omission in your script when going on about data integrity and bit rot.
10. Something really important to point out seeing you are talking about SSDs in your piece is ZFS RAID Z can talk directly to the drives and can do TRIM while doing this. At this you can do encryption in ZFS directly and stick basically whatever you want in a dataset. In contrast a RAID controller does not understand TRIM. A RAID controller does not understand all of this other stuff you may find yourself doing with your file system, so things end up layered on top of each other, which is inefficient. This leaves SSDs attached to hardware RAID controllers to take quite a beating and anything less than enterprise class high endurance SSDs will likely get beaten to death in short order. With ZFS RAID Z on the other hand, you may just find that lower endurance SSDs with capacitor backup is good enough for personal use. Even for business, those super high endurance SSDs are rather expensive, so if you can drastically cut the writes and say do the same job with cheaper 1 DWPD drives over 10 DWPD just because ZFS RAID Z is a lot nicer to the drives over a hardware RAID controller, that is a good deal.
11. Something that is really nice about ZFS RAID Z is you can just pull an array out of a system, maybe even have the whole array in a USB-C / TB3 enclosure and plug it into another system and just start using it. That is really handy. There are a number of reasons why you would want or even need to do this, even if you initially never planned on this, things happen. At this unlike hardware RAID, there is no risk of losing your data where as mentioned above, hardware RAID is all too eager to wipe data in these scenarios. It cannot be understated how useful SSD caching drives are with ZFS when the main drives are mechanical.
The thing is if you care about your data and just across a wide array of use cases, ZFS is the best, beating out everything else including hardware RAID. So my conclusion is the same, but with some of these extra nuances thrown in.
@_unknown_guy 2 роки тому ⁺⁹
As backups were mentioned, can we get a video about those? Guess my interest would be at home server/nas level. Currently have simple/naive setup using abraunegg/onedrive.
@TheRMUPs 2 роки тому
Craft Computing has several videos on setting up a home storage server and the TrueNas operating system
@Raletia 2 роки тому
I've had pretty good luck with Backblaze. Saved my butt twice already. Pretty cheap for one PC. Though, in my case, my NAS that I'm backing up is running Windows with Storage Spaces and ReFS(MS's answer to ZFS). I don't know if there is a way to use it with TrueNas or the like.
Edit: For context, I have 12 drives, 3, 4, & 6 TB, for a total of around 40TB before redundancy. Though I only have about 12TB of data stored & backed up.
@itskdog 2 роки тому ⁺¹
@@Raletia I would probably use Backblaze were it not that they only seem to charge in USD. I don't want to be paying currency conversion fees to my every month, and for the monthly payments in GBP to not be the same amount each month.
@Raletia 2 роки тому ⁺¹
@@itskdog That's fair! I don't really know of any other services, as I stopped looking once I found and tested backblaze, for the price it's been very affordable compared to anything else I could find. I wish you luck! (I do hope there are alternatives.)
@lemonbrothers3462 2 роки тому
@@itskdog you could use revolut to hold and pay directly in usd and save on the fees
@depth386 2 роки тому
Small guy here, I don’t do that much work so I just occasionally clone my drive to a mechanical HDD. This enables me to boot from the HDD and do a rollback as a catch all for malware disasters and other issues. I could lose a little work in between clone updates but if I do more intense work I can always start the clone when I’m done for the day.
@boedillard270 Рік тому
Thanks for the extensive video - can you talk about LSI and or microsemi raid cards and perfomance vs. other systems? I'm using 16x16TB RAID 5 if you are interested. I'm running Windows 10.
@127ibenedict 2 роки тому ⁺³
Is it worth it to even use a raid configuration on a personal machine nowadays? Or is paying for a bigger individual storage solution a better option now? I assume that more modern SSDs either consumer or enterprise grade do a decent job of avoiding the issues that raid configurations helped deal with back in around 2012 when I first started seeing people use it.
@darbyevert828 2 роки тому ⁺⁶
Not for like 99% of people, you can get 8tb ssds of you really need space. I think it's mostly hoarders raiding hdds for more space and vfx engineers trying to Lower their load times.
@chrisbaker8533 2 роки тому ⁺⁶
If you're doing cost to capacity, raid still, largely, wins out for ssds.
For instance, the cheapest 8TB ssd on newegg is $710 usd, 2x4TB, $680 usd, 4x2TB ssd's, $600 usd, 8x1TB $576 usd.
For spinning rust, it's not a contest, no raid, buy bigger.
Low end, 8TB is $130 usd, 8x1TB would be 256.
It gets a bit harder with higher, above 12TB, capacities and can be cheaper to use raid instead.
Another consideration is power usage, multiple drives increase power usage exponentially.
While ssds are far lower than hdd's, it's still an increase and should be a consideration.
With HDDs, you also have to consider vibration, more drives, more vibration.
Then there's the backup and redundancy consideration, can you afford to lose all your data or pay for recovery?
If you're as A*** retentive as me, you backup religiously, but that raid offers a level of protection as well.
Instead of losing one drive and it's all gone, you can lose one, or more, drives and still maintain the data.
Individual use case is always going to be the determining factor.
Just consider the pros and cons of each.
@marcogenovesi8570 2 роки тому
do you need data integrity, uptime or performance on the workstation? It depends really. If it's just a gaming PC with stuff mostly synced to cloud or nas then it's fine on a single SSD
@vgamesx1 2 роки тому ⁺¹
@@chrisbaker8533 I honestly don't get how it offers any meaningful protection, at least speaking for home users who will typically have at most 4-8 drives, aside from parity data, you can do sort of the same thing as a raid1/10 by just making a copy of everything to another drive and the benefit there is that you don't have to reformat drives already in use or setup raid volumes which poses a slight risk itself should anything go wrong and this way you can dynamically assign importance/redundancy to your files for example I keep a copy of important things like photos on 4 drives plus remote backups, movies or stuff I want to keep on 3 drives and replaceable files like steam games can just exist anywhere, so I can lose one or more drives and get a little extra usable space that isn't being used for redundancy.
Basically, it appears to me that raid for home or fewer than like 10 drives, mainly just offers a form of automation and a nicer way to pool drives as a single one, but in the case of raid 5/6 anyway, it also brings complexity that can make it harder to recover data than if you just had a drive failure on a standalone disk, so I can see why you might make a few raid1 volumes but aside from that, what am I missing that makes raid useful for home users?
@Raletia 2 роки тому ⁺¹
@@vgamesx1 For me it's useful for using an old PC as a NAS for centralized storage & backup for all my devices, desktop, laptop, phone, tablet, etc. I use Win10pro & storage spaces, and have Backblaze for backup of the NAS.
@5urg3x 2 роки тому ⁺⁴
I used to work in the storage industry, for a company that had their own RAID products, and I am glad that HW raid is dying / dead. They were pretty good about continuing to update older products so that they could be used with modern operating systems, but many other manufacturers (cough PROMISE cough cough) are not. Want a new driver? You're going to have to buy a new one, bro. Doesn't matter that the existing product you have still works fine, and you don't need faster speeds or any other features. We're just not going to update the driver anymore, because capitalism. Sucks to be you and anyone else that bought it!
@DavidtheSwarfer Рік тому
I gave up on hardware raid in about 2002, went with plain disks in the Novell server until 2004 when I implemented our first freebsd server, software raid or mirrors ever since. Love that I can move the drives to a completely different motherboard and it all just works . Actually did that on Friday after the mb in a server failed, was working by 5pm, weekly backup ran as usual at 10pm, all is well.
@eugenesmirnov252 2 роки тому
Great stuff. Minio does bit-rot protection in some configurations.
@wskinnyodden 2 роки тому ⁺³
Yep, old school raid (under SCSI preferably) KICKS BUTT data quality wise.
@Scoopta 2 роки тому ⁺⁴
Technically BTRFS stands for B-tree filesystem named after the B-trees it uses for directory information but it's often pronounced better FS
@hellterminator 2 роки тому ⁺²
I've literally never heard anyone call it that. I have, however, heard “butter FS” a few times. :D
@levygaming3133 2 роки тому ⁺²
@@hellterminator the difference between butter and better is so small someone could have said “ButterFS is so great, it’s the best file system” and they could have assumed it was called BetterFS
@Scoopta 2 роки тому ⁺¹
@@hellterminator I've heard both, although I think butter FS is more common, the Wikipedia page lists both as alternative names tho
@mdd1963 2 роки тому
I've never heard anything other than 'Butter-FS'....
@doctorpex6862 2 роки тому
Back in late '90 I had have a couple of consumer raids in computer, my favorite one is Promis raid, worked flawless every time, even when moving it into another computer and attaching drivers in any order.
@SaroG Рік тому ⁺¹
ZFS actually does something similar in terms of redundancy policies at the dataset level with the 'copies' property: zfs set copies=2 pool/dataset
@flecom5309 2 роки тому ⁺⁹
ZFS is great, I love the idea of a bad update breaking all my storage... anyway any kind of storage array redundancy is just for hardware failure, you really need a proper backup strategy if the data is of any value... the best ZFS setup won't help you if your datacenter catches fire (see the one in France for a good example)
@CallousCoder 2 роки тому ⁺¹⁷
I’ve been saying that in the early 2000s already. When I worked at IBM we always had problems with serveRaid cards. All the software raid systems just kept chucking along. And performance was as good as the hardware. That was a fake argument from the hardware fan boys. Calculating a cluster checksum is a very simple low level xor operation. That’s a native cpu instruction and even on 2000-2002 hardware had no burden on the CPUs. Sure it had a 5% overhead on your CPU but who cares. Hardly any CPU was used 100% and on the big Unix systems - where software raid was the standard solution. Unless you connected a Shark or a NetApp, had no noticeable CPU overhead.
And I think The Unix boys realized the issue with dependency on proprietary hardware issues. Because all of them use software raid. LVM Or Veritas volume manager.
But the greatest thing with software raid was, that there was no proprietary firmware and when your old serveRaid 4 died and you had to go to serveRaid 6 that you needed the right firmware to read the 4 Config off the disks. And that was a bitch to find!
Or most hardware raids on the 90s and 2000s didn’t allow you to use large drives. And that was a problem! Because the small scsi drives were rare. And then you were stuck! We had a dedicated firmware version made at IBM to ignore this on I believe serveRaid 4 and 5 (6 I believe didn’t bother anymore - unless you worked on a 4 or 5 initiated disk set) but… it had to have the same amount of heads and platters!
And I’ve been using ZFS since early days on Solaris and I adore it.
Last year after 7 years a drive started to fail in my synology raid 6.
I always buy extra drives to swap. So I took the bad one out. Pushed the new one in clicked initialize drive and rebuild. And it got stuck at 0.66% the whole web UI froze, the ssh didn’t react. A true hard kernel crash!
So I had to turn it off in the midst of a rebuild…
Well, when I booted it crashed on boot. I contacted synology what to do. But big companies means slow service!!! So I decided to wipe the OS. Reinstall it and I reconfigured the raid exactly the same.
It started scanning disks and low and behold. That new drive was a DOA!!! So again that bay flashed red but my data was back. I pushed in my second spare and it happily reconstructed.
@jfbeam 2 роки тому ⁺¹
That "5%" of your general purpose CPU was significantly more power than the dedicated ASIC engine of a RAID card. (plus the overhead of the IO.)
@CallousCoder 2 роки тому ⁺²
@@jfbeam true but you wouldn’t notice it, at all. But the flexibility it gave was/is outweigh the little CPU overhead. Being down because of an issue with a controller (and there were a lot), is slowing you down more that you every could regain.
Like adding a new drive with most controllers that drive had to be identical. In the 90s/2000s drives only grew and retention was hard. We so often had to do a whole migration, because a stupid drive failed and we had to swap out all the good ones too.
And I’ve never had that ever done on a software raid system.
@djmulder 2 роки тому
On that thought, my motherboard has 2x oculink and 2x minisas (in case you want to look up it's a romed8-2t from asrock).. now I'm sporting a lsi sas 9205 controller.
I'm having somewhat issues with io (unzipping basically makes plex/emby unwatchable) .. could that be because of the age of the raid controller? and .. would it be worth it to just utilize the motherboard's io? Now this raises the issue that the oculink cables seem to be very hard to get in my country but will probably figure it out, just wondering if it's worth the effort
@mcwild11 11 місяців тому
I would love deeper dive into zfs and particularly synology implementation of BTRFS.
Thx, great video
@roysigurdkarlsbakk3842 2 роки тому ⁺⁴
Thanks for this one. I've been working with ZFS for a little more than a decade and some 12 years back, I started reading about btrfs as a ZFS replacement with all the goodies you mentioned (well, not all of them back then, but hell). Still - it's been 12 years and the RAID-[56] subsystem is still utterly broken. Perhaps wait another decade and it might be ok-ish.
PS: A colleague of mine said he used btrfs on top of md for this reason, not to get the full use of btrfs' checksumming, which you obviously won't, but you'll get snapshotting and so on. So I tried it - on top of 5x2TB drives on md RAID-10 for VM storage on my home server. It's not a particularly fast machine, but it doesn't do that much either. It was dead slow. After a while, I replaced it with XFS again and it was back on track. Btrfs is a good idea, but I guess it needs a lot of funding or at least a lot of work to get stable and useful.
PPS: I'm aware of (as you said) that zfs and btrfs will never be as fast as md and its friends, but this was just horribly slow - half of or perhaps one third of iops compared to xfs on the same md raid. So regardless of the iops and cpu needed for checksumming, I really doubt that is the root cause of this.
@wishusknight3009 2 роки тому
ZFS i found was much faster than BRTfs. And I use a ZIL on my truenas box for a large ZFS2 array. Speeds are quite good while still having sync writes. Without the zil its horrifically slow like you said though. 10 disks will write about 10MB a second with just sync writes, and at 240MB/s with the zil. Which is the limit of the zil itself and not the drives or network. Which consists of 2 Intel S3700 SSD's with PLP on a sata2 connection... A sata3 conection would be faster. And this is also using a large single file copy. Lots of small files are slower of course.
My system without a zil using non-sync writes was roughly 3 times as fast at around 650MB/s. However I need to walk on tiptoes with tightened butt cheeks all the time hoping the worst doesn't happen. I would have expected to peg out the 10g link but not sure where my bottleneck is.. I run pretty budget hardware so I have to make due. An internal DD showed way higher than 650, though I cant remember what it was.
If you are ok with lazywrites than performance of ZFS is fantastic without the need for a ZIL.... But best practice is to use a mirrored zil and have the best of both.. And actually mirroring the zil is not really needed either and is just overkill according to the documentation and many top supporters as well.
@roysigurdkarlsbakk3842 2 роки тому
@@wishusknight3009 you always have a ZIL ;) But I guess you mean an SLOG - a separate ZFS intent log placed on an SSD (or preferably a pair of those in a mirror)

Наступне

Автоматичне відтворення

So if Hardware RAID is dead... then what?