This is a really great video, I appreciate how you explained the basics first and then went into detail. May your kitty cats be merry and meowy!
Glad you enjoyed it!
Something to bear in mind when doing RAID5 or 6: the number of data drives should ideally be a power of two. The reason comes down to how the parity is calculated; it's simply more efficient when the number of data drives is a power of two. So for RAID5, 2D+P, 4D+P and 8D+P, and for RAID6, 4D+2P and 8D+2P, are the sensible options in an array. There is also a scheme where you divide your drives up into, say, 8GB chunks and create loads of RAID6 8D+2P arrays from chunks scattered over all the drives; all the arrays are then concatenated together to make one large volume. It goes by various names, but I don't think any free software does it to my knowledge. It gives really fast rebuilds from failed drives, as every drive you have participates in the rebuild. You also don't get any hotspots, so performance is on average much better.
Thanks for sharing your thoughts! Yes, I chose RAID-0 in this demonstration to keep things simple. There are definitely other factors to consider when using RAID levels that involve parity.
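To illustrate the power-of-two point above with a quick Python sketch (the 256 KiB strip size here is just an assumed example, not a figure from the video): with a power-of-two number of data drives, the full stripe works out to a power-of-two size, which divides evenly into the power-of-two I/O and filesystem block sizes most workloads use.

# Sketch: full-stripe size for a few RAID5/6 layouts, assuming a 256 KiB strip per data drive.
STRIP_KIB = 256  # assumed strip (chunk) size

layouts = {
    "RAID5 4D+1P": 4,
    "RAID5 5D+1P": 5,    # non-power-of-two data count for comparison
    "RAID5 8D+1P": 8,
    "RAID6 8D+2P": 8,
    "RAID6 10D+2P": 10,  # non-power-of-two data count for comparison
}

for name, data_drives in layouts.items():
    full_stripe = data_drives * STRIP_KIB
    power_of_two = (full_stripe & (full_stripe - 1)) == 0
    print(f"{name:>13}: full stripe {full_stripe:5d} KiB, power-of-two: {power_of_two}")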
Appreciate the time and effort you put into this presentation!
Thanks! Hope it was helpful! :-)
...and, if you have random parallel reads, your strip size needs to be bigger than your read size to minimize IOPS. (Awesome visual presentation btw, really really impressed with the setup and the effort!)
Thanks!
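A small sketch to put numbers on that (assumed read and strip sizes, not measurements from the video): once the strip size comfortably exceeds the read size, most random reads cost a single disk I/O instead of fanning out across several members.

# Sketch: how many strips (and so disks) one random read can touch, best vs worst case.
import math

def strips_touched(read_kib, strip_kib):
    best = math.ceil(read_kib / strip_kib)                    # read aligned to a strip boundary
    worst = (strip_kib - 1 + read_kib - 1) // strip_kib + 1   # read straddling strip boundaries
    return best, worst

for read_kib in (4, 16, 64, 256):
    for strip_kib in (64, 128, 1024):
        best, worst = strips_touched(read_kib, strip_kib)
        print(f"read={read_kib:3d}K strip={strip_kib:4d}K -> disks touched per read: {best}..{worst}")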
Awesome experiment, thank you!
I'm assuming your Supermicro 836 does not use a one-to-one set of connections from the initiator(s) to the backplane for the disks under test - that is, the number of initiator ports (narrow PHYs) used, and the matching number of backplane input ports, is smaller than the number of disks (16). (IMHO, obviously the number of initiator ports used should equal the number of backplane input ports used.)
If that's correct, here's my question: for the block sizes that incur a penalty relative to the chunk size, i.e. blocksize < chunksize (chunk=128K with bs=1K or bs=16K, and chunk=1M with bs=1K, bs=16K or bs=128K), what would happen if one-to-one connections, not multiplexed through the backplane's expander, were used from 16 initiator ports to 16 backplane ports for the 16 drives?
I can only hope I've expressed what I mean correctly.
This 836 has the BPN-SAS2-836EL1 backplane. However, unless the test saturates the 8x6Gbps link to the HBA, I don't think there's much difference compared to the "TQ" backplane version.
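Rough bandwidth arithmetic behind that reply (assumed figures: a 6Gbps SAS lane carries roughly 600 MB/s of payload after 8b/10b encoding, and ~200 MB/s is an assumed sequential rate per HDD):

# Sketch: is the 8-lane 6Gbps link between HBA and expander backplane the bottleneck?
LANES = 8
MB_PER_LANE = 600     # ~6 Gbps per lane after 8b/10b encoding (approx. MB/s)
N_DRIVES = 16
MB_PER_DRIVE = 200    # assumed sequential throughput per HDD (MB/s)

link_mb = LANES * MB_PER_LANE
drives_mb = N_DRIVES * MB_PER_DRIVE
print(f"HBA<->backplane link: ~{link_mb} MB/s, 16 drives combined: ~{drives_mb} MB/s")
print("expander link is the bottleneck" if drives_mb > link_mb else "drives saturate before the link does")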
Thanks for the video, some very useful information here.
Glad it was helpful!
Thanks for the video, gave it a thumbs up
Thanks for watching!
OMG!!!! Dancing Bear Stripper! Good times!
LOL.. just having some fun! :-) thanks for watching!
It would be really educational if you added some diagrams of how file allocation happens in today's storage systems. I mean, if I consider a single hard disk, the physical sector size, the logical sector size, and the filesystem cluster size all must be chosen optimally. Now let's add RAID to the equation. For example, is the RAID stripe size seen by the OS as the physical sector size? If I want to store a 1-byte, 1-KB, 1-MB, or 1-GB file, how many sectors, clusters, and RAID strips are filled with actual data? Also, how much storage will be wasted due to cluster size inefficiency? I hope my question is clear.
Thanks for the suggestion. I'll put this on my list of future videos.
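As a quick taste of the cluster-waste part of the question, here's a rough sketch assuming 4 KiB filesystem clusters on a 128 KiB RAID strip (metadata and small-file inlining ignored):

# Sketch: clusters and strips consumed, plus slack (wasted) space, for a few file sizes.
import math

CLUSTER = 4 * 1024      # assumed filesystem allocation unit (bytes)
STRIP = 128 * 1024      # assumed RAID strip size (bytes)

for size in (1, 1024, 1024**2, 1024**3):
    clusters = max(1, math.ceil(size / CLUSTER))
    strips = math.ceil(clusters * CLUSTER / STRIP)
    slack = clusters * CLUSTER - size
    print(f"{size:>10} B file -> {clusters:>6} clusters, {strips:>5} strips touched, {slack} B slack")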
I believe I saw a Dell SC200 Compellent in your rack. It would be great if you could do a video on that unit. I have one and would like my NUT server to be able to command a shutdown; also, the fans don't seem to throttle at all. I really like the unit, but I'm having trouble finding info on it.
Good eye. Yes, it is an SC200. It is very loud and there's no good way to control the fans. Otherwise it works great as a disk enclosure.
Nice video! Can you explain what happens to performance in different configurations? For example, a RAID 6 (14+2) vs. a RAID 6 volume made of 2x (6+2)?
That's a great question, but perhaps for a future video. Briefly, I'll just say that the number of "effective" spindles helps improve throughput, but IOPS is still limited since an entire stripe must be read/written for each I/O operation. Large sequential reads/writes benefit more as the strip size and stripe width grow.
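To make the 14+2 vs 2x(6+2) comparison concrete, here's a back-of-the-envelope sketch (assuming a 128 KiB strip size; real controller behaviour will vary):

# Sketch: full-stripe size and usable capacity for RAID6 14+2 vs two RAID6 6+2 arrays.
STRIP_KIB = 128   # assumed strip size per data drive

def raid6(data_drives, arrays=1, parity=2):
    full_stripe_kib = data_drives * STRIP_KIB        # data per full-stripe write, per array
    usable = data_drives / (data_drives + parity)    # capacity efficiency
    total_drives = arrays * (data_drives + parity)
    return total_drives, full_stripe_kib, usable

for label, (data, arrays) in {"RAID6 14+2": (14, 1), "2 x RAID6 6+2": (6, 2)}.items():
    drives, fs_kib, usable = raid6(data, arrays)
    print(f"{label:>13}: {drives} drives total, {fs_kib} KiB full stripe per array, {usable:.0%} usable")

Same 16 drives either way, but the single wide array gives more usable capacity, while the split arrays keep each full stripe smaller and a rebuild only involves the drives of one array.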
This was a nice video. I have a related question: how does the allocation unit size chosen during formatting affect these scenarios? What would be the ideal allocation size when formatting the virtual disk? I'm guessing it should be equal to or smaller than the strip size.
That's a really good question, because in an ideal situation the various data management layers should all align exactly, and should be sized for the transaction sizes most frequently used by the applications. When the layers are not aligned, a single I/O can amplify into multiple I/Os in an underlying layer. And of course, trying to match the applications, of which there could be many that do vastly different things when it comes to data I/O, is always challenging. But even if you went for the "average", the objective is to minimize the amount of I/O needed to complete the entire transaction stack. This is particularly important for magnetic storage, where IOPS capacity is limited, but is less of an issue for solid-state storage.
@@ArtofServer Okay, then I guess the ideal allocation size would be the size of the strip written to one of the disks, i.e. (total stripe size ÷ number of disks), or a factor of it. Like you pointed out, an allocation size equal to the strip size might only work when all layers are aware of each other and work in tandem to make the best use of the resources.
For sequential reads/writes, it could end up having little to no effect at all. But this is certainly interesting from the point of view of random reads/writes.
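Here's a tiny sketch of the alignment point in this thread (all sizes assumed): an allocation unit that divides evenly into the strip size never crosses a strip boundary, while one that doesn't can spill into a second strip and amplify a single logical write.

# Sketch: worst-case number of strips a single filesystem allocation unit spans.
def strips_spanned(start_kib, length_kib, strip_kib):
    first = start_kib // strip_kib
    last = (start_kib + length_kib - 1) // strip_kib
    return last - first + 1

STRIP_KIB = 128
for alloc_kib in (4, 32, 64, 128, 192):
    # allocation units sit on a grid of their own size, starting at offset 0
    starts = range(0, STRIP_KIB, alloc_kib)
    worst = max(strips_spanned(s, alloc_kib, STRIP_KIB) for s in starts)
    print(f"{alloc_kib:3d}K allocation unit on a {STRIP_KIB}K strip -> worst case {worst} strip(s) per write")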
Unless you are doing incredibly large files all the time, 128k or even 256k is the absolute biggest. For HDDs I will bounce between 64k and 128k depending on the use case. For file servers with tons of small files I will go with 64k. For backup servers, or file servers with a mix of or all large files, I will go with 128k. For SSD-based servers I stick with 128k... it's just simpler that way. I only use ZFS for my filesystems now for non-Windows servers, and I no longer use hardware RAID at all. I will put Linux (ZFS) or TrueNAS on the bare metal. If I intend to run VMs, then I will use Linux and run ZFS off that. For storage-only servers I use TrueNAS Core/Enterprise. The concepts are similar though. :)
Thanks for sharing your thoughts. The default chunk size (strip size) for Linux software RAID is 512KB; I chose 128KB to get a wide spread between the 128KB and 1MB configurations for the demonstration.
Please note that ZFS recordsize is not the same as strip/chunk size; the behavior of a ZFS record being written to a raidz vdev is not like traditional RAID (hardware or software).
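For anyone curious how the chunk size in the demonstration translates into which member disk gets hit, here's a simplified sketch of plain RAID-0 striping arithmetic (16 disks assumed; md internals ignored):

# Sketch: map a byte offset on a RAID-0 volume to (member disk, offset on that disk).
def raid0_location(offset, chunk, n_disks):
    chunk_index = offset // chunk
    disk = chunk_index % n_disks
    disk_offset = (chunk_index // n_disks) * chunk + (offset % chunk)
    return disk, disk_offset

N_DISKS = 16
for chunk_kib in (128, 1024):
    chunk = chunk_kib * 1024
    # how many member disks a 1 MiB sequential read starting at offset 0 touches
    touched = {raid0_location(off, chunk, N_DISKS)[0] for off in range(0, 1024 * 1024, 4096)}
    print(f"chunk={chunk_kib:4d}K: a 1 MiB sequential read touches {len(touched)} disk(s)")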
💯 Thanks so much! I'd love to take a 4TB x 4 stripe, partition it, and use the DAS for hosting VMs lol
Good luck!
@@ArtofServer lol base calculations and price are roadblocks lol
What is this chassis? Do you have any recommendations for a decent chassis with lots of hot-swappable drive bays? How are the red LEDs on the front panel activated? Can they be used to identify faulted or specific drives under Linux? Your rig is very cool... I'm finally outgrowing my old Antec 300 case in terms of 3.5" bays.
This machine is a Supermicro 836. I've made some videos about it on my channel so search around if you want to know more about it.
@Art of Server
Quick question - I have 10x 2TB HDDs ready for hardware-based RAID 10 (dedicated controller). What are the best settings for it, assuming all types of file sizes are present (videos, small 1KB TXT files, anything in between)?
That's a good question and what I was hoping to demonstrate in this video. You can't really optimize traditional RAID strip size for all use cases. You sort of have to pick where you want to optimize and move your strip size to cater to your most common use case. If you're going to be handling equal amounts of all types of I/O sizes, then I would just aim for the middle or defaults.
What's the best strip size for hardware RAID hosting VMs on ReFS?
Does it matter what kind of file sizes I have within the VHDX containers?
In general, a 2-way mirror will get you to about 99.99% reliability per pool per year (assuming 44-vdev mirrored zpools). But that means that in one out of 10,000 pools per year, you're going to lose the whole pool to a double-disk failure. Scheduled scrubs can reduce the risk somewhat, but never eliminate it.
I've researched some interesting statistics from our internal ZFS data supporting this assertion but am not at liberty to share them yet...
Not sure what this has to do with the video, but I find mirror vdevs to be wasteful in terms of storage efficiency. Good for IOPS perhaps, but if you're after IOPS these days you should just be using flash storage. If you're using magnetic storage, you should not expect high IOPS. If you're using flash, it seems too wasteful to throw half of it away for mirror redundancy.
@@ArtofServer Mirror vdevs are basically ZFS's software equivalent of RAID 10, which I'm sure you already know, but that's my favorite setup. Like hardware RAID, you get some form of redundancy running 2-way ZFS mirrors, but you also get scalability in both random and sequential IOPS, something you don't get with parity RAID, hardware or even ZFS I believe - there only sequential scales, random stays stagnant.
I use lz4 compression in ZFS, which gives me significant storage gains on data that isn't already heavily compressed like video. So while a mirror-vdev configuration only has 50% usable capacity on paper (technically slightly less, since ZFS adds checksums to every block - think T10-PI), I'm getting closer to 70%-75% effective capacity thanks to lz4, while getting sequential and random read/write gains and still keeping the redundancy.
I use Lustre as my parallel file system since it's open source and the best of its kind (it's used by more than half of the supercomputers in the Top500), and it lets me use ZFS as a backend with lz4 compression, chaining together multiple mirror-vdev pools across multiple systems (24x 4TB configurations across 3 systems). It's like three software RAID 10s wired together over an InfiniBand network in Linux, and I also use U.3 TLC NAND SSDs as a cache, since Lustre supports tiered storage configurations like the one I'm using.
Agreed, I feel they are safer than the often-used z* vdevs when properly set up. Each of my mirrors is one Exos and one WD Ultrastar - what are the chances of them failing at the same time? If I do lose a drive, pool performance seems entirely unaffected, and resilvers are quick and painless with no parity involved, so there's less stress on the remaining drive. I don't mind the hit to storage efficiency; the peace of mind and the IOPS are worth it.
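For what it's worth, here's a very rough sketch of where a figure like the 99.99%-per-pool number at the top of this thread can come from. All inputs are illustrative assumptions (about a 2% annual drive failure rate and a 24-hour resilver), not the internal data mentioned above:

# Sketch: yearly chance that some 2-way mirror vdev in a 44-vdev pool loses both
# disks before its resilver completes. Purely illustrative assumptions.
AFR = 0.02              # assumed annual failure rate per drive
RESILVER_DAYS = 1.0     # assumed resilver time after a failure
VDEVS = 44              # 2-way mirror vdevs per pool

p_first = 1 - (1 - AFR) ** 2                  # some drive in a given mirror fails this year
p_partner = AFR * (RESILVER_DAYS / 365.0)     # its partner also fails during the resilver
p_vdev_loss = p_first * p_partner
p_pool_loss = 1 - (1 - p_vdev_loss) ** VDEVS
print(f"approx. pool-loss probability per year: {p_pool_loss:.4%}")   # ~0.01%, i.e. ~1 in 10,000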
Hello, I like your wallpaper. Can you share it?
artofserver.com/downloads/wallpapers/aos_wallpaper.png
Thank you.
any metric on disk failures?
Not sure what you mean? This video is about RAID strip size...
Hmm, so the kitty likes the size of the sausage? lol
lmao
From my experience, a strip size of 64k or 128k is optimal for the majority of workloads.
Thanks for sharing!
Thank u
sub'd
Thanks! :-)
Larger font please
What device were you watching this on?
Thanks, thank you, and thanks again!
Warmest greetings from Baden-Württemberg,
27.08.'23
Thanks for your comment. I hope you find my videos helpful!
I don't remember for sure, but I think my strip size right now is probably the ZFS default, which I *think* is 128 KiB.
Please also bear in mind that the size of your files relative to the native physical sector/block size of your HDD (whether that's 512, 512e, 520, 528, or 4K native, a.k.a. 4Kn) will also make a difference to the performance of your file system.
If you have a wide spread of file sizes (ranging from really small files to really large files), trying to optimise for that is virtually impossible, as a configuration that would be good for really small files would be terrible for really large files and vice versa.
For that reason, I tend to stick with either a 64 KiB or 128 KiB strip size, because writing a lot of tiny files hurts performance much more than having a less-than-optimal strip size for writing large files.
(i.e. if the strip size is 64 KiB, it's not great for writing large files, but it isn't going to completely kill it either; whereas if you set the strip size too large when you're writing a lot of tiny files, your throughput can drop to single-digit MiB/s.)
Keep in mind that ZFS works differently than traditional RAID; the ZFS record size does not behave like the RAID strip size.
@@ArtofServer
Yeah. They introduced the concepts of ashift and record size, and I'm not really sure how those relate to the more "traditional" definitions of the various sizes in more traditional RAID implementations, per @2:01 in your video here.
Nevertheless, the optimisation of the performance is a function of all of that, plus the histogram, by size, of the data that you plan on putting onto your ZFS pool.
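Circling back to the "single-digit MiB/s with lots of tiny files" point earlier in this thread, the arithmetic is a simple IOPS bound. A sketch with assumed figures (~150 random write IOPS per HDD, 4 KiB files, 8 data drives):

# Sketch: tiny-file write throughput is bounded by IOPS, not by bandwidth.
IOPS_PER_DRIVE = 150    # assumed random write IOPS of one HDD
FILE_KIB = 4            # assumed tiny file size
N_DRIVES = 8            # assumed data drives

# if each tiny file costs roughly one I/O on one drive, aggregate throughput is:
throughput_mib_s = N_DRIVES * IOPS_PER_DRIVE * FILE_KIB / 1024
print(f"~{throughput_mib_s:.1f} MiB/s when writing {FILE_KIB} KiB files across {N_DRIVES} drives")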