Huh, I always thought it was "level 2 ARC" not "layer 2 ARC." Hearing layer 2 repeatedly makes me think it operates at the data link layer instead of the network layer...
A 1TB L2ARC will take up ~750MB of RAM. I would say that is well worth it for a disk volume. If you had under 12GB of RAM I would not. But I would just get a 1TB NVMe and use it!
Hey Rex, about that white plastic pin that holds the NVMe drives on Supermicro boards. You should push it further down in order to hold the NVMe horizontal and steady.
Technology improves and things change. I still see some people showing the install of FreeNAS/TrueNAS on a cheap USB key. As stated, it all depends on what the NAS is being used for. The only big reads on my system would be from my media pools, which have all my TV shows and movies.
I'm looking at putting together a system with 64GB of RAM, but everything I'm reading says don't go above five times that for the L2ARC. Any updated info on this?
Much of the old knowledge is not useful. It all depends on what you are storing. My 2TB of L2ARC takes 1GB of RAM, so you could easily do 20TB of L2ARC.
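For anyone wanting to sanity-check the RAM figures quoted in this thread, here is a rough sketch of the usual back-of-envelope estimate: every block cached in L2ARC needs a small header kept in RAM. The 96-byte header size and the 128K average block size are assumptions for illustration, not numbers from the comments above, and the real cost depends on your OpenZFS version and record sizes.

```python
def l2arc_header_ram_bytes(l2arc_bytes, avg_block_bytes=128 * 1024, header_bytes=96):
    """Estimate RAM used by the in-memory headers that index an L2ARC device.

    header_bytes and avg_block_bytes are assumed values; adjust for your system.
    """
    blocks = l2arc_bytes // avg_block_bytes  # how many blocks fit on the cache device
    return blocks * header_bytes             # one header per cached block


TB = 10**12
for size_tb in (1, 2, 20):
    ram = l2arc_header_ram_bytes(size_tb * TB)
    print(f"{size_tb} TB L2ARC -> ~{ram / 10**6:.0f} MB of RAM for headers")
```

Under those assumptions a 1 TB device comes out at roughly 730 MB, in the same ballpark as the ~750MB mentioned earlier. Larger average blocks shrink the number, and small blocks (for example VM zvols) inflate it considerably.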
My system has 512GB of RAM with 400GB set to ARC, a 2TB NVMe set to L2ARC, 1TB mirrored for metadata, and mirrored 512GB SATA SSDs for logs. My HDDs are rarely touched, with 200TB of storage about 55% full. My personal data doubles every 2 to 3 years, and my backup NAS is just my old HDDs from before upgrading.
NVMe/SATA SSD drives did change over the years. Write endurance for a single cell dropped a lot (SLC->MLC->TLC->QLC) while the overall size of the drives increased. So you need a lot more data written to a big drive for a single cell to be used X times, vs a lot less data written to a smaller drive for the same cell to be used the same X times. That's why, for example, the Intel DC S3710 400GB server-grade drive from 2016 has an 8300TBW warranty, while brand new consumer 1TB SSDs hover around a 600TBW warranty. Endurance of the flash aside, you will probably burn out the controller or other support components on the drive sooner than the flash chips themselves. Consumer-grade drives are not designed for 24/7 operation with a lot of data moving. For a homelab/media server with small daily data transfers it should be fine, especially if the drive is only used for L2ARC. For small business use, better get at least NAS-rated drives, because it's always the downtime that costs the most, not the hardware alone :)
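To put the two warranty figures above on the same scale, a quick sketch that converts TBW into "full drive writes" (how many times the whole capacity can be rewritten within warranty). The TBW and capacity numbers are the ones quoted in the comment; the function name is just for illustration.

```python
def full_drive_writes(tbw, capacity_tb):
    """Number of times the drive's whole capacity can be written within the TBW rating."""
    return tbw / capacity_tb


drives = {
    "Intel DC S3710 400GB (8300 TBW)": (8300, 0.4),
    "new 1TB SSD (600 TBW)": (600, 1.0),
}

for name, (tbw, capacity_tb) in drives.items():
    print(f"{name}: ~{full_drive_writes(tbw, capacity_tb):.0f} full drive writes")
```

The gap per cell is much bigger than the raw TBW numbers suggest, which is the point being made: the old enterprise drive is rated for tens of thousands of full rewrites, the newer drive for hundreds.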
Great video! I have multiple (3) large pools that benefit from dedicated L2ARC NVMes. I was toying with an idea: since the L2ARC is 'disposable' if something happens, is it worth building a striped NVMe pool of 2-4 NVMes and creating a partition for each of my pools? The idea being that each pool would benefit from insanely fast striped NVMe reads. Thoughts? I know it sounds a little janky, but it's just for a media server at home.
You want to shoot a fly with a cannon, my man :) For a home media server a simple L2ARC is more than enough: you will easily saturate a 1GbE or 2.5GbE NIC, even 10GbE, with a single L2ARC drive. To be frank, not having an L2ARC would be fine as well, since even when streaming 4K Blu-ray movies that haven't been recompressed you need about 100Mbps per client. So a decently configured pool would have no trouble sustaining that read. L2ARC and a striped pool in ZFS are two totally different things. Like you said, L2ARC is disposable, but a striped pool without any protection, running on consumer NVMe drives, is just asking for trouble, because you lose your whole pool with 1 drive failure. I think the only benefit you will see will be in synthetic benchmarks, not in real use for a home media server. Plus you would need multiple 10GbE clients pulling data at the same time to stress your server.
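The arithmetic behind that reply, as a small sketch: how many ~100Mbps streams fit on each link speed. The `usable_fraction` parameter is an assumption added here to allow for protocol overhead; at 1.0 it ignores overhead entirely.

```python
def max_streams(link_gbps, stream_mbps=100, usable_fraction=1.0):
    """How many concurrent streams of stream_mbps fit on a link of link_gbps."""
    usable_mbps = link_gbps * 1000 * usable_fraction
    return int(usable_mbps // stream_mbps)


for link in (1, 2.5, 10):
    print(f"{link} GbE: up to {max_streams(link)} streams at 100 Mbps each")
```

Even a single gigabit link carries around ten such streams, so for a household the NIC and a plain HDD pool are rarely the limit.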
Yes, L2ARC is only used for reads; ZFS does not have an option for a real write cache. However, as far as I understand, ZFS combines write requests to write them as efficiently as possible, so it's close to being a RAM write cache out of the box. So my guess would be that you will not have issues copying something to a ZFS pool really fast, but your only option to make it faster is to add more RAM or striped pools.
sup spacerex, I'm having a hard time finding which video of yours it was I watched but you talked about why I shouldn't get the DS2422+ because it only supports synology drives which are overpriced and they don't even have 20tb or 22tb yet. If I wanted to maximize the amount of how much storage I could have with as many 22tb drives as possible.. I think I recall you mentioning that the DS1821 was the best bet. I read somewhere today that if you don't use synology's drives, that even with something like DS1821+, this prevents you from being able to upgrade and scale it up to 18 drives. Is this true? Apologize for the lengthy question.
Synology doesn't keep their QVL's updated for very long, so you won't see those drives as supported, even though they likely will work just fine. Nowadays, a disk is a disk basically. There are other reasons to not go with Synology though, namely you're paying more for less capable hardware and a close to zero config out of the box experience. If that's what you want, buy a similarly specced qnap device and save yourself a little money.
I feel like I have enough RAM that L2ARC isn't going to give me a boost considering the small size of my pools. My pool is 12TB and my RAM cache is 140GB.
It really comes down to how you use your data. For example, a Plex server will get no help from L2ARC, as 99% of files are not going to be watched back to back. But something like a video editing file server, where you have 1TB projects you are working on, would benefit greatly from L2ARC.
I have yet to have a workload that benefited from L2ARC in the last 15 years. My main workload gets around 5-8k read IOPS, but only 4-10 of those IOPS actually hit the disks; everything else is from ARC. So when I have 1TB of L2ARC, only 2-4 of those 4-10 IOPS even hit L2ARC. Really pointless. My other use case is streaming, and those files are only ever read once, so caching doesn't matter at all. Sure, if you don't have enough RAM for your working set, it will help. I have rarely seen people that have a huge working set. The size of your disks doesn't matter.
Some users of ZFS have huge servers with 128 or 512GB or more of RAM, so they can hold that much in RAW video files when editing. And they tend to be a small group. L2ARC is still valid even if you are the only user or editor.
Just built a TrueNAS SCALE server with 512GB of registered ECC. There's a bug in TrueNAS SCALE where it only uses half the RAM. You have to add a little code at boot so it uses more.
Question: I have 2 x 2gb NVMe sticks that I am attaching via carrier boards to SlimSAS ports (the motherboard only has 1 M.2 slot, which I'm using for the OS drive, and I don't have any more PCIe slots left). Can I mirror them and then partition them, so I can use small parts of them for discrete caches (read & write) and the rest for SLOG? The primary use of this NAS is going to be VM hosting with a smidge of fileshare. Also, what is the suggested block size for a VM hosting scenario vs a fileshare scenario?
I have 60 HDDs running 4GB/s. I don't see how adding L2Arc will help except in access times, but spread out among 60 drives, you're gonna have a larger initial latency, but it's just as fast after that for sequential reads and writes. dRAID actually made this fast. 4 vdevs, 1 spare, 2 parity, 5 data, 15 children each. 1M recordsize. 128K record size slowed everything down, even my SSD mirrors. It's not even funny. Also, large numbers of mirrors were many times slower than these 4 dRAID vdevs. I think mirrors don't scale the more you have. My SSD array has 40 mirrors, and it's slower than my HDDs.
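For readers trying to picture that dRAID layout, a small sketch that builds the vdev spec string and estimates usable capacity. It reads the comment as 4 vdevs, each with 15 children, 5 data + 2 parity per redundancy group and 1 distributed spare; that reading, and the helper names, are assumptions. The capacity formula is the usual approximation and ignores padding and metadata overhead.

```python
def draid_spec(parity, data, children, spares):
    """Build a dRAID vdev spec in the draid<parity>:<data>d:<children>c:<spares>s form."""
    return f"draid{parity}:{data}d:{children}c:{spares}s"


def draid_usable_disks(parity, data, children, spares):
    """Approximate usable capacity of one dRAID vdev, in units of whole disks."""
    return (children - spares) * data / (data + parity)


vdevs = 4
layout = dict(parity=2, data=5, children=15, spares=1)

print(draid_spec(**layout))
usable = vdevs * draid_usable_disks(**layout)
print(f"~{usable:.0f} of {vdevs * layout['children']} disks' worth of usable space")
```

With these numbers each vdev yields about 10 disks of usable space, so roughly 40 of the 60 drives hold data, the rest going to parity and distributed spare capacity.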
40 SSD mirrors should absolutely be able to do 4GB/s. There is probably a misconfiguration / hardware bug somewhere. L2ARC would help with random IOPS on that HDD pool.
@@Saturn2888 random is when your data is not streaming. If you were streaming a 40GB video, then needed a different file totally unrelated from a different process (maybe a docker container or VM, but quite literally anything but the video) that's a random I/O. The more different things there are requesting reads or writes, the more random I/O you'll have. IOPS (Input/Output Operations per Second). Random is the opposite of sequential basically.
@@kdb424 I don't think that's quite right because there's a separate fio test for random read and write different from its sequential read and write test. After more testing, 16 threads rather than 1, I ended up writing 9GB/s sequentially on this pool. Crazy. Don't remember reads. Something high as well. I have two metadata SSDs, and I think they help with random reads and writes by knowing exactly where the files are located as my random write speed was still 1GB/s. Random reads were almost the same as sequential reads. I have the data if you want it. No L2Arc required. I think the video is important, but I can't see a need for my use case. Maybe if you only have 1 vdev of HDDs, it'd make more sense.
@@Saturn2888 Special vdevs are very useful as they cache metadata, and can optionally, depending on block size, store some of that data (not just the metadata) on SSD's as well. I don't run an L2ARC, nor do I feel that I need one, but it will depend massively on your workload and how your vdevs are set up if it will matter at all.
@@BoraHorzaGobuchul It's been well documented that most video streaming never comes close to touching L2ARC, unless your system is severely deficient in system RAM. And obviously photos are minuscule in size compared to editing video. So I stand by my original premise... L2ARC is awesome if you're editing video off your NAS.
@@Matlock69 ok, so photo work is likely to benefit more from an nvme metadata vdev, since photo workflow often deals with a large number of files if it's serious. However, that would require at least a 2-way mirror of Enterprise-grade ssds, and those aren't cheap...
In our studio we have a 512TB TrueNAS server (128GB RAM) split into 4 pools for VFX and 3D animation (mainly 32-bit multilayer EXR files). Each pool has a 4TB PCIe 4.0 SSD for L2ARC that can feed the 40GbE NIC.
The advantages are just mind-blowing. We will probably upgrade to 8TB NVMe in Q1 2024.
How did the upgrade go?
Just want to say thank you for this great video. Installing L2ARC sped up SMB performance drastically! I looked a lot in forums when building my TrueNAS server, and everyone said that you don't need L2ARC. But now, with L2ARC, opening large Pro Tools sessions (which contain a lot of small files) is more than four times faster!!
Thanks man!
NVMe L2ARC and a metadata special vdev really helped my SAN. Metadata special is something you should look deeper into, especially if you're running something like Adobe Bridge, AVID, or another 3rd-party media and metadata manager. It really helped load times with all my brushes etc. for AI/PS/LR and my Native Instruments VSTs. If you want to get crazy, build a server with a Xeon Gold that supports Optane PMEM / NVDIMMs and run them in disk mode; this 1TB or so will be your L2ARC. Add an NVMe metadata special vdev, ~4 drives in mirrors. You don't want this to die, since it holds all the metadata for your pool, so you need high endurance here. Then a pool of mirrored HDDs for archive, RAW, assets, and bulk. Configure your NLE's folders accordingly and use as fast a network interface as you possibly can. More modern SFP28 25G NICs are killer for this. The metadata special knows where the files are and cuts seeks on spinning rust. Then build a second pool with SATA SSDs in mirrors for ingest and scratch. You'll ingest to here, toss the bad stuff, then move it to your RAW/Bulk pool. You can totally edit off of it too. 24 cheap 2TB SATA SSDs will run you less than $2000.
this is so much better than the video you posted when you moved. you explain everything well
phenomenal video. ZFS has a lot of old rules-of-thumb that aren't super useful anymore. We need to be retesting this "old knowledge"
Your style of presentation is excellent and you absolutely demonstrated your subject matter expertise in what is a very complex and misunderstood component of ZFS. Well done!
NVMe L2ARC rocked my world, everything is so snappy
@SpaceRex, fantastic video. Thank you for that. Every time I looked for an answer on whether I should use L2ARC, I was hit by those negatives ;) I guess you are one of the few who say that L2ARC is awesome :)
L2ARC is awesome, but it is awesome in certain specific scenarios. Most "amateurish" users don't need it and don't even understand in which scenarios L2ARC is great.
Most people who say that L2ARC is useless or worse are testing a single-user scenario reading sequential data. In real life a NAS in a business never reads sequentially, because you have many requests at the same time on different files. Nowadays, with many businesses using 25/40GbE NICs on their servers, using PCIe 4.0 NVMe for L2ARC is not only awesome but a cheap way to drastically improve overall performance.
Great vid SpaceRex. Would love to see your L2ARC hit ratio and how much data has been served from it since you added it to your server. Would you mind sharing? You can get a summary from the CLI with arc_summary, and it's under L2ARC hit ratio. Thanks, Chris
Any news on this? I'd love to see just how effective your L2ARC has been since you added it to your system or perhaps it hasn't?
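If you'd rather compute the ratios yourself than read them off arc_summary, here is a minimal sketch that parses kstat-style arcstats text and works out the ARC and L2ARC hit ratios from the `hits`, `misses`, `l2_hits` and `l2_misses` counters. The sample numbers are made up for illustration; the helper names are hypothetical.

```python
def parse_arcstats(text):
    """Parse 'name type data' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only three-column rows whose last column is a plain integer.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def hit_ratio(hits, misses):
    """Fraction of lookups that were hits; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# Made-up sample in the same column layout as arcstats.
SAMPLE = """\
name                            type data
hits                            4    9500
misses                          4    500
l2_hits                         4    300
l2_misses                       4    200
"""

stats = parse_arcstats(SAMPLE)
print(f"ARC hit ratio:   {hit_ratio(stats['hits'], stats['misses']):.1%}")
print(f"L2ARC hit ratio: {hit_ratio(stats['l2_hits'], stats['l2_misses']):.1%}")
```

On a Linux-based system the real text can be read from `/proc/spl/kstat/zfs/arcstats`; on FreeBSD-based systems the same counters are exposed through sysctl instead, so the input step would differ there. Note the L2ARC ratio is measured only over reads that already missed ARC.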
Hit ratio is a misleading metric for home use. Most stuff you do at home is an atypical workload that can't be cached efficiently. Conventional metrics don't apply, nor do they help. So brute-forcing something like 5% of pool capacity as L2ARC covers most of the important stuff, following a Pareto distribution. This is why L2 is great for home use.
Home vs business makes no difference here at all. If your ARC hit ratio is above 95% then L2ARC will be useless and will most likely hurt your performance. Think about it: if 95% plus of your reads are coming from RAM, what's the issue? L2 only catches data evicted from ARC. Do the testing and see. I have.
@@chrisparkin4989 Think of it this way: what does ARC hit rate measure? Say I regularly read 200G and my memory is 32G. The drives are running and the ARC hit ratio is >90%. If I have a 2T L2ARC, most of the data is coming from ARC+L2ARC and the HDDs are barely running, if at all.
Getting 256GB of memory just isn't economical with NVMe this cheap. The L2ARC myth was valid 10 years ago, when memory was just better all the time.
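A toy model of the argument above, using the 200G working set, 32G of memory and 2T L2ARC from the comment. It assumes a fully warmed cache, that all of the memory is available as ARC, and uniform access across the working set. Real workloads are skewed, which is why a real ARC hit ratio can sit above 90% even when ARC holds a much smaller share of the data, so treat this as a capacity picture rather than a hit-ratio prediction.

```python
def tier_coverage(working_set_gb, arc_gb, l2arc_gb=0):
    """Share of a warmed working set that each tier can hold (uniform-access model)."""
    in_arc = min(arc_gb, working_set_gb)
    in_l2 = min(l2arc_gb, working_set_gb - in_arc)  # L2ARC holds what ARC could not
    on_disk_only = working_set_gb - in_arc - in_l2
    return {
        "arc": in_arc / working_set_gb,
        "l2arc": in_l2 / working_set_gb,
        "disk": on_disk_only / working_set_gb,
    }


for label, l2 in (("without L2ARC", 0), ("with 2T L2ARC", 2000)):
    shares = tier_coverage(200, 32, l2)
    pretty = ", ".join(f"{tier} {share:.0%}" for tier, share in shares.items())
    print(f"{label}: {pretty}")
```

In this model the disks still have to serve the part of the working set that doesn't fit in RAM unless an L2ARC absorbs it, which is the "HDDs are barely running" effect described.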
In our studio we have a 3D animation and VFX department, and we work mainly with raw footage and image sequences (.ext) concurrently on many workstations (10GbE NICs) and different projects. The server/NAS has 512GB of RAM, 384GB reserved for ARC, and 2 x 4TB PCIe 4.0 NVMe L2ARC (40GbE NIC).
On the morning of a normal day, if I force a flush, the ARC is full again in less than 1 hour. Because we work a lot with image sequences, the metadata occupies a huge part of the ARC. On a normal day the L2ARC can reach 45% occupancy. The hit ratio in ARC reaches 99.9% very fast. This is a specific scenario, and L2ARC is awesome for this kind of situation: the same huge amounts of data being pulled again and again.
At home, even though I have a TrueNAS with only 16GB of RAM, I don't have and don't need an L2ARC.
All great points, Will. Thanks so much for your time.
Awesome video Mister Rex! :)
Very good points you're making.
The L2ARC information from 10 years ago isn't relevant anymore. I've read that the newer OpenZFS versions use much less RAM to index L2ARC too, so it's indeed not a big deal anymore.
I would point out another case where L2ARC isn't useful. It was happening to me : I had 128 GB of ram and 20-40TB of media data for my media servers. Even when adding L2ARC, when the reads are too random L2ARC isn't that useful. The chance that people watch the movie that was just added or the same that another user watched was very limited on the 20-40 TB of movie files that was available.
I had a lot of reads on my open-zfs pools but they were so random that L2ARC didn't make much of a difference. Below 5% usage.
Yes, in something like this, where your workflows are truly random and the working data is far larger than your L2 size, you will not see a huge performance increase.
The biggest thing is that L2ARC is *persistent* across reboots. You kinda glossed over it, but, in terms of maintenance downtime, it's a must, because upon reboot, you have *nothing* in your ARC, so it's going to be very, very slow until it's used enough to fill the ARC, basically a full day of slowness for the office bees.
The other thing is, a COW filesystem is going to write slower than a journaled system. You can improve this a little with ZIL and metadata vdevs on SSD mirrors, and you can improve it *a lot* if your workload doesn't care about potentially losing 45 seconds' worth of writes, which is what you could lose, worst case, if you turn off sync writes. What L2ARC doesn't improve is write speeds though, and people don't really understand that. For some workloads it makes much more sense to have one instead of just building an SSD pool with those same drives.
Also, on SSD wear, who cares? They fail predictably, so you can replace them before they fail outright, and SSDs are CHEAP now. By the time you kill one, it'll be CHEAPER to buy something even better. Storage arrays in production last 3-5 years, so most of the time you'll be upgrading drives anyway before your SSDs wear out.
L2ARC is indexed by ARC. L2ARC needs to be warmed per power cycle - so there will be a rebuild time for L2ARC to be repopulated by ARC activity.
@williamcleek4922
L2arc can be set to persist through reboots. This can dramatically improve system performance on bootup while the ARC is still being populated.
@@execration_texts Also does this by default in truenas scale, but not Core.
Having it enabled for large pools will increase your boot time however.
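To check which behaviour your own box has, here is a small sketch that reads the `l2arc_rebuild_enabled` module parameter, which is what controls whether L2ARC contents are rebuilt after a reboot on OpenZFS 2.0 and later. The default path is the usual Linux location for ZFS module parameters; treating a missing file as "unknown" is a choice made for this example.

```python
from pathlib import Path


def l2arc_rebuild_enabled(params_dir="/sys/module/zfs/parameters"):
    """Return True/False for the l2arc_rebuild_enabled tunable, or None if unreadable."""
    path = Path(params_dir) / "l2arc_rebuild_enabled"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Not Linux, ZFS module not loaded, or an OpenZFS version without this tunable.
        return None


state = l2arc_rebuild_enabled()
print({True: "persistent L2ARC is enabled",
       False: "persistent L2ARC is disabled",
       None: "could not read the tunable on this system"}[state])
```

On FreeBSD-based systems the equivalent setting lives under sysctl rather than `/sys`, so this exact path won't exist there.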
You rock my friend! Great learning from you.
I use L2ARC on a $349 desktop! I have used ZFS on my desktop since Apr 2018, at that time a Phenom II X4 with 8GB DDR3, now a Ryzen 3 2200G with 16GB DDR4. My usage is completely on the other side of the ZFS scale. I have collected ~70 VMs and all my main apps run from 6 more or less specialized VMs (4x Linux and 2x Windows).
L1ARC: Those main VMs run from my 512GB NVMe SSD, resulting in Linux boot times between 6 and 12 seconds (Xubuntu and Ubuntu). I have only one pair of hands, so in general a VM runs from L1ARC after ~1 second, so L1ARC has a ~99% hit rate :)
L2ARC: Most of my VMs are stored on a 2TB HDD (192MB/s) and those VMs are cached by L2ARC, a 90GB partition of a SATA SSD (530MB/s). Throughput is 2.5x higher, but better still, the L2ARC has no HDD arm movements. Linux boot times from the L2ARC-cached HDD are between 15 and 25 seconds :) Note that after ~1 second that VM will also run from L1ARC again :) :) For this reason the L2ARC has a hit rate of
Could you share how exactly you built and partitioned your server? I am also looking for something similar.
@@TheGolenEgg I have no server, I run everything on one desktop. The host OS is a minimal install of Ubuntu 24.04 LTS; it boots from a 20GB ext4 partition, with a 4GB swap partition, on my NVMe. I run all my apps in my VMs. The 11 most used VMs run from the 3rd partition on the NVMe, a 488GB OpenZFS partition. My 2TB HDD has 2 partitions, each with a datapool. The first part of an HDD has higher throughput, so the first 1TB is a VM datapool/partition (say ~50 more VMs) and the second 1TB is for data, photos, music, videos, etc. Like I said, my SATA SSD is my cache drive and it has 4 partitions. L2ARC: 90GB for the VMs and 30GB for my data. ZIL/LOG: 5GB for the VMs and 3GB for the data.
I used storage that I already had; only the NVMe I bought new, in 2019. My 2TB HDD is a replacement for 2 ancient HDDs (500GB + 1TB), both with 10 power-on years, after they died within say 2 months of each other.
Last week I replaced the CPU and memory with a Ryzen 5 5600GT and 32GB of DDR4. The DDR was mainly because I got tired of estimating whether a VM would fit in memory without reducing the L1ARC memory cache too much. I replaced the CPU because I had the feeling that, especially during booting, the lz4 decompression took too much time on the 2nd slowest Ryzen ever, and that was true, because now with the 5600GT (3x faster than the 2200G) the Xubuntu VM boots in 4.5 seconds instead of 7 seconds. Both of course are nice boot times,
@@bertnijhof5413 Thanks for the reply. I am also looking to build my primary system where I can use VMs and store data. I do not need a server, just a resilient system with fault tolerance. My usage is data storage 50%, VMs 25%, A/V editing 25%.
Just been playing around with L2ARC on an NVMe-to-USB adapter, so maximum 1GB/sec speeds, and filling the cache up with data (after applying tweaks like turning off l2arc_noprefetch etc.), and it's really fun to watch pulling a file in through dd and seeing it come from the HDD, then you do it again and it's coming from the cache.
Speed isn't what I'm looking for, it's to get flash storage in between me and the HDDs, so the drives don't have to spin up for already-cached data. Like if I'm binge watching something, I'd rather it come from the SSD than have 4 enterprise drives burning power.
I see you have the SNIA ZFS powerpoint up. Good one.
About 17 min into the video it seemed the high spec server running TrueNAS only had about 1000MB/s read when reading from the drives with no L2ARC. And that server had more than 100 hard drives with multiple vdevs. Since ZFS natively writes to all those vdevs at once and therefore reads files from all those drives at the same time, why was this number so slow?
I am one of those people who most of the time do not recommend L2ARC for accelerating sequential reads. But if the large ZFS servers I sold could only get 1000MB/s from the hard drive pool, it would be much more likely that the L2ARC would help with sequential reads.
A lot of the old guard ZFS admins are still stuck in parity RAID days, which doesn't really perform all that well, not to mention the fault tolerance is much, much lower than it appears when using large disks. Mirrors are the only way to go with drives 8TB and larger if you plan on being able to resilver. That's not opinion, it's math, math that nobody bothers to do for some reason...
Also, the reason why they were only seeing about 1000MB/s is because it's not truly a sequential read, because there are multiple reads being requested simultaneously, which is the case in almost every environment. They also had 4x NVME L2ARC devices, which will essentially run at drive speed as threads increase. Modern NVME is good for an easy 1000MB/s per drive, so, 4 of them will be good for 4x that without issue, which is more or less what you're seeing in the example slide.
Huh, I always thought it was "level 2 ARC" not "layer 2 ARC." Hearing layer 2 repeatedly makes me think it operates at the data link layer instead of the network layer...
Good argument for l2arc
Would you recommend L2ARC for limited-RAM systems, like in cases of only 16GB? Should we go with the smallest available 128GB NVMe L2ARC?
A 1TB L2ARC will take up ~750MB of RAM. I would say that is well worth it for a disk volume. If you had under 12 gigs of RAM I would not. But I would just get a 1TB NVMe and use it!
No, it will take 50GB of RAM.
1GB of L2 is 50MB of RAM.
It depends on record size I think and how many blocks there are. 80 bits per sector I believe?🤔
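A rough Python sketch of why the estimates in this thread differ so much: the RAM cost scales with the number of cached blocks, so it depends heavily on the average block (record) size. The per-block header size used here (`HEADER_BYTES = 96`) is an assumption for illustration, not a number from the thread; the real value depends on the OpenZFS version.

```python
KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4

# ASSUMPTION: RAM cost per cached block, in bytes. Not taken from the thread;
# the real header size varies between OpenZFS versions.
HEADER_BYTES = 96


def l2arc_header_ram(l2arc_bytes, avg_block_bytes, header_bytes=HEADER_BYTES):
    """Estimate the RAM used by headers for a completely full L2ARC device."""
    blocks = l2arc_bytes // avg_block_bytes
    return blocks * header_bytes


for block_kib in (8, 128, 1024):
    ram = l2arc_header_ram(1 * TIB, block_kib * KIB)
    print(f"1 TiB L2ARC, {block_kib:>4} KiB blocks: {ram / MIB:8.0f} MiB of RAM")
```

Under that assumption, 128 KiB blocks work out to 768 MiB per TiB, in the same ballpark as the ~750MB figure quoted above, while 8 KiB blocks cost 16 times as much.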
Hey Rex, about that white plastic pin that holds the NVMe drives on Supermicro boards. You should push it further down in order to hold the NVMe horizontal and steady.
Technology improves and things change. I still see some people showing the install of FreeNAS/TrueNAS on a cheap USB key. As stated, it all depends on what the NAS is being used for. The only big reads on my system would be from my media pools that have all my TV shows and movies.
I'm looking at putting together a system with 64 gigs of RAM, but everything I'm reading says don't go above five times that for the L2ARC. Any updated info on this?
Much of the old knowledge is not useful.
It all depends on what you are storing. My 2TB of L2ARC takes 1 gig of RAM so you could easily do 20TB of L2arc
My system has 512GB of RAM, 400 set to ARC, with 2TB NVMe set to L2ARC, 1TB mirrored for metadata, and 512 mirrored SATA SSD for logs. My HDDs are rarely touched, with 200TB of storage about 55% full. My personal data doubles every 2 to 3 years; my backup NAS is just my old HD from before upgrading.
NVMe/SATA SSDs did change over the years. Write endurance for a single cell dropped a lot (SLC->MLC->TLC->QLC) while the overall size of the drives increased. So you need a lot more data written to a big drive for a single cell to be used X times, versus a lot less data written to a smaller drive for a cell to be used the same X times. That's why, for example, the Intel DC S3710 400GB server-grade drive from 2016 has an 8300TBW warranty, while brand new consumer 1TB SSDs hover around a 600TBW warranty.
Endurance of the flash aside, you will probably burn out the controller or other support components on the drive sooner than the flash chips themselves. Consumer-grade drives are not designed for 24/7 operation with a lot of data moving. For a homelab/media server with small daily data transfers it should be fine, especially if the drive is only used for L2ARC. For small business use, better get at least NAS-rated drives, because it's always the downtime that costs the most, not the hardware alone :)
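To put the two warranty figures above on the same scale, here is a small Python sketch that converts TBW into "full drive writes" (rated TBW divided by capacity). The drive numbers are the ones quoted in the comment; the function name is just illustrative.

```python
def full_drive_writes(tbw_terabytes, capacity_terabytes):
    """How many times the whole drive can be rewritten within its rated TBW."""
    return tbw_terabytes / capacity_terabytes


# (rated TBW in TB, capacity in TB), as quoted in the comment above
drives = {
    "Intel DC S3710 400GB (8300 TBW)": (8300, 0.4),
    "new 1TB SSD (~600 TBW)": (600, 1.0),
}

for name, (tbw, capacity) in drives.items():
    print(f"{name}: ~{full_drive_writes(tbw, capacity):,.0f} full drive writes")
```

By this measure the older server drive is rated for roughly 35 times as many rewrites of each cell as the newer 1TB drive.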
If you had enough L2ARC, then depending on file access patterns and luck, it could increase the lifespan of the HDDs in your pools.
It is not a completed job doing card moves on a server without dropping at least one screw.
Great video! I have multiple (3) large pools that benefit from dedicated L2ARC NVMes. I was toying with an idea: since the L2ARC is 'disposable' if something happens, is it worth building a striped NVMe pool of 2-4 NVMes and creating a partition for each of my pools? The idea being that each pool would benefit from insanely fast striped NVMe reads. Thoughts? I know it sounds a little janky, but it's just for a media server at home.
You want to shoot a fly with a cannon, my man :) For a home media server, a simple L2ARC is more than enough: you will easily saturate a 1GbE or 2.5GbE NIC, even 10GbE, with a single L2ARC drive. To be frank, not having an L2ARC would be fine as well, since even when streaming 4K Blu-ray movies that have not been recompressed you need about 100Mbps per client. So a decently configured pool would have no trouble sustaining that read.
L2ARC vs building a striped pool in ZFS: two totally different things. Like you said, L2ARC is disposable, but striped pools without any protection, running on consumer NVMe drives, are just asking for trouble, because you lose your whole pool with 1 drive failure.
I think the only benefit you will see will be in synthetic benchmarks, not in real use for a home media server. Plus you would need multiple 10GbE clients pulling data at the same time to stress your server.
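The bandwidth argument above is simple arithmetic; a quick Python sketch, using the 100Mbps-per-client figure from the comment and ignoring protocol overhead (an assumption that makes the results slightly optimistic):

```python
STREAM_MBPS = 100  # per-client bitrate quoted above for a 4K Blu-ray stream


def max_streams(link_gbps, stream_mbps=STREAM_MBPS):
    """Upper bound on simultaneous streams for a link, ignoring overhead."""
    return int(link_gbps * 1000 // stream_mbps)


for link_gbps in (1, 2.5, 10):
    print(f"{link_gbps:>4} GbE: up to {max_streams(link_gbps)} streams "
          f"at {STREAM_MBPS} Mbps each")
```

Even a 1GbE link has room for about ten such streams, and each stream is only 12.5MB/s of mostly sequential reads from the pool.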
Is L2ARC only used for reads? What if I want to copy an SD card really fast to offload footage?
Yes, L2ARC is only used for reads; ZFS does not have an option for a real write cache.
However, as far as I understand, ZFS combines write requests to write them as efficiently as possible, so it's close to being a RAM write cache out of the box.
So my guess would be that you will not have issues copying something to a ZFS pool really fast, but your only options to make it faster are to add more RAM or striped pools.
What the heck do you do with all that storage?
With 256GB of RAM, is 2TB enough, or is 4 better?
sup spacerex, I'm having a hard time finding which video of yours it was I watched but you talked about why I shouldn't get the DS2422+ because it only supports synology drives which are overpriced and they don't even have 20tb or 22tb yet. If I wanted to maximize the amount of how much storage I could have with as many 22tb drives as possible.. I think I recall you mentioning that the DS1821 was the best bet. I read somewhere today that if you don't use synology's drives, that even with something like DS1821+, this prevents you from being able to upgrade and scale it up to 18 drives. Is this true? Apologize for the lengthy question.
you can put any drives in the 1821+ and use all 18 drives!
Synology doesn't keep their QVL's updated for very long, so you won't see those drives as supported, even though they likely will work just fine. Nowadays, a disk is a disk basically.
There are other reasons to not go with Synology though, namely you're paying more for less capable hardware and a close to zero config out of the box experience. If that's what you want, buy a similarly specced qnap device and save yourself a little money.
ty for the video.... makes a lot of sense. Wonder why Tommy from Lawrence Systems disagrees about this.
In the video he did on it, he had the entire active data set fitting in ARC. This meant that the L2ARC was unnecessary.
@@SpaceRexWill I figured that’s what it was. Glad to see I’m starting to understand this stuff
I feel like I have enough RAM that L2ARC isn’t going to give me a boost considering the small size of my pools. My pool is 12TB and my RAM cache is 140GB.
It really comes down to how you use your data.
For example, a Plex server will get no help from L2ARC, as 99% of files are not going to be watched back to back.
But something like a video editing file server, where you have 1TB projects you are working on, would be greatly sped up by L2ARC.
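The "it depends on your working set" point can be illustrated with a toy two-tier cache simulation in Python. This is a deliberately simplified sketch: both tiers are plain LRU and blocks move to the second tier when evicted from the first, which is an assumption made for brevity. The real ARC balances MRU and MFU lists and feeds the L2ARC differently, so only the trend is meaningful, not the exact numbers.

```python
import random
from collections import OrderedDict


def simulate(working_set, arc_size, l2_size, accesses=20000, seed=0):
    """Count where random reads over `working_set` blocks are served from."""
    rng = random.Random(seed)
    arc, l2 = OrderedDict(), OrderedDict()  # toy LRU tiers, oldest entry first
    hits = {"arc": 0, "l2arc": 0, "disk": 0}
    for _ in range(accesses):
        block = rng.randrange(working_set)
        if block in arc:
            arc.move_to_end(block)
            hits["arc"] += 1
            continue
        hits["l2arc" if block in l2 else "disk"] += 1
        arc[block] = True  # a miss pulls the block into the first tier
        if len(arc) > arc_size:
            evicted, _ = arc.popitem(last=False)
            if l2_size > 0:
                l2[evicted] = True  # simplification: evictions go to tier 2
                l2.move_to_end(evicted)
                if len(l2) > l2_size:
                    l2.popitem(last=False)
    return hits


print("fits in ARC:      ", simulate(working_set=100, arc_size=200, l2_size=0))
print("too big, no L2ARC:", simulate(working_set=1000, arc_size=100, l2_size=0))
print("too big, L2ARC:   ", simulate(working_set=1000, arc_size=100, l2_size=1000))
```

When the working set fits in the first tier, the second tier never gets a hit; once the working set is larger than the first tier, the second tier absorbs most of what would otherwise be disk reads.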
Thanks I'll look into this
Great video. Next one about ZIL/Log ;-)
Will do one! For me I mostly work on Video production servers so No sync is totally fine. But will do some testing on the ZIL
@@SpaceRexWill great
And metadata
I have yet to have a workload that benefited from L2ARC in the last 15 years. My main workload gets around 5-8k read IOPS, but only 4-10 of those IOPS actually hit the disks; everything else comes from ARC. So when I have 1TB of L2ARC, only 2-4 of those 4-10 IOPS even hit L2ARC, which is really pointless. My other use case is streaming, and those files are only ever read once, so caching doesn't matter at all. Sure, if you don't have enough RAM for your working set, it will help. I have rarely seen people that have a huge working set. The size of your disks doesn't matter.
Most of the time, ZFS just boils down to doing math to make sure you are not going to shoot yourself in the foot. For funzies, here is some data on my ARC/L2ARC.
Using 2 Samsung 970 EVO 2TB, current power on time ~5 years, 5 months, TBW so far 730.7TB, or ~367.3GB/day.
ARC size (current): 93.3 % 477.7 GiB
Target size (adaptive): 93.4 % 478.4 GiB
Min size (hard limit): 4.7 % 24.0 GiB
Max size (high water): 21:1 512.0 GiB
Most Frequently Used (MFU) cache size: 69.7 % 324.0 GiB
Most Recently Used (MRU) cache size: 30.3 % 141.0 GiB
Metadata cache size (hard limit): 75.0 % 384.0 GiB
Metadata cache size (current): 6.3 % 24.1 GiB
Dnode cache size (hard limit): 10.0 % 38.4 GiB
Dnode cache size (current): 16.0 % 6.1 GiB
ARC hash breakdown:
Elements max: 9.8M
Elements current: 99.3 % 9.7M
Collisions: 2.8M
Chain max: 4
Chains: 334.9k
ARC misc:
Deleted: 12.1M
Mutex misses: 12.4k
Eviction skips: 427
Eviction skips due to L2 writes: 0
L2 cached evictions: 11.9 TiB
L2 eligible evictions: 132.0 GiB
L2 eligible MFU evictions: 83.8 % 110.5 GiB
L2 eligible MRU evictions: 16.2 % 21.4 GiB
L2 ineligible evictions: 115.8 GiB
ARC total accesses (hits + misses): 2.7G
Cache hit ratio: 99.1 % 2.7G
Cache miss ratio: 0.9 % 25.7M
Actual hit ratio (MFU + MRU hits): 99.1 % 2.7G
Data demand efficiency: 99.9 % 621.0M
Data prefetch efficiency: 1.1 % 12.6M
Cache hits by cache type:
Most frequently used (MFU): 97.3 % 2.6G
Most recently used (MRU): 2.7 % 72.8M
Most frequently used (MFU) ghost: 0.3 % 7.5M
Most recently used (MRU) ghost: < 0.1 % 399.1k
Cache hits by data type:
Demand data: 23.0 % 620.5M
Prefetch data: < 0.1 % 143.2k
Demand metadata: 77.0 % 2.1G
Prefetch metadata: < 0.1 % 796.2k
Cache misses by data type:
Demand data: 2.1 % 540.3k
Prefetch data: 48.5 % 12.5M
Demand metadata: 30.9 % 7.9M
Prefetch metadata: 18.5 % 4.8M
DMU prefetch efficiency: 234.1M
Hit ratio: 7.3 % 17.1M
Miss ratio: 92.7 % 217.0M
L2ARC status: HEALTHY
Low memory aborts: 0
Free on write: 258
R/W clashes: 0
Bad checksums: 0
I/O errors: 0
L2ARC size (adaptive): 3.7 TiB
Compressed: 98.8 % 3.6 TiB
Header size: < 0.1 % 306.4 MiB
MFU allocated size: 78.7 % 2.9 TiB
MRU allocated size: 21.3 % 791.1 GiB
Prefetch allocated size: < 0.1 % 140.7 MiB
Data (buffer content) allocated size: 99.9 % 3.6 TiB
Metadata (buffer content) allocated size: 0.1 % 4.9 GiB
L2ARC breakdown: 25.6M
Hit ratio: 31.1 % 7.9M
Miss ratio: 68.9 % 17.6M
Feeds: 320.6k
L2ARC writes:
Writes sent: 100 % 184.6k
L2ARC evicts:
Lock retries: 464
Upon reading: 0
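A few ratios can be derived from the output above. This Python sketch only re-uses numbers printed in the listing (already rounded there, so results are approximate); the variable names are illustrative.

```python
MIB_PER_TIB = 1024 * 1024

# Values copied from the arc_summary output and drive stats above.
l2_size_tib = 3.7        # L2ARC size (adaptive)
l2_header_mib = 306.4    # Header size
l2_lookups_m = 25.6      # L2ARC breakdown (total lookups, millions)
l2_hits_m = 7.9          # L2ARC hits (millions)
arc_accesses_m = 2700.0  # ARC total accesses ("2.7G", in millions)
tbw_tb = 730.7           # total written to the L2ARC drives so far
gb_per_day = 367.3       # stated daily write rate

header_mib_per_tib = l2_header_mib / l2_size_tib
header_fraction = l2_header_mib / (l2_size_tib * MIB_PER_TIB)
l2_hit_ratio = l2_hits_m / l2_lookups_m
l2_share_of_all_reads = l2_hits_m / arc_accesses_m
days_of_writes = tbw_tb * 1000 / gb_per_day

print(f"RAM for headers: {header_mib_per_tib:.1f} MiB per TiB of L2ARC "
      f"({header_fraction:.4%} of cached size)")
print(f"L2ARC hit ratio: {l2_hit_ratio:.1%} of ARC misses")
print(f"L2ARC share of all accesses: {l2_share_of_all_reads:.2%}")
print(f"Write history: {days_of_writes:.0f} days ({days_of_writes / 365.25:.1f} years)")
```

On this system the header cost comes out well under 100 MiB of RAM per TiB of L2ARC, and the L2ARC catches roughly a third of the reads that miss ARC.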
Some ZFS users have huge servers with 128 or 512GB or more of RAM, so they can hold that much in RAW video files when editing. And they tend to be a small group. L2ARC is still valid even if you are the only user or editor.
Absolutely! Especially with a video editing workflow, where your active dataset will be very similar for a few days as you are cutting it
Read cache not helpful in a low to no read environment? Makes sense.
Just built a truenas scale server with 512gb of registered ECC. There's a bug in truenas scale where it only uses half the ram. You have to add a little code for boot so it uses more.
It's not a bug and it will get fixed in later releases
Yeah it will be fixed in a few weeks with scale 24.04
I've been using mlc nvme's as a L2ARC with my ssd array for quite a while now. Well worth it. Especially for a Minecraft server.
Question: I have 2 x 2gb NVMe sticks that I am attaching via carrier boards to SlimSAS ports (motherboard only has 1 M.2 slot, which I'm using for the OS drive, and I don't have any more PCIe slots left). Can I mirror them and then partition them, so I can use small parts of them for discrete caches (read & write) and the rest for SLOG? The primary use of this NAS is going to be VM hosting with a smidge of fileshare. Also, what is the suggested block size for a VM hosting scenario vs a fileshare scenario?
those pesky screws :D i feel you!
I have 60 HDDs running 4GB/s. I don't see how adding L2Arc will help except in access times, but spread out among 60 drives, you're gonna have a larger initial latency, but it's just as fast after that for sequential reads and writes.
dRAID actually made this fast. 4 vdevs, 1 spare, 2 parity, 5 data, 15 children each. 1M recordsize. 128K record size slowed everything down, even my SSD mirrors. It's not even funny.
Also, large numbers of mirrors were many times slower than these 4 dRAID vdevs. I think mirrors don't scale the more you have. My SSD array has 40 mirrors, and it's slower than my HDDs.
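For reference, the dRAID layout described above can be written in the `draid<parity>:<data>d:<children>c:<spares>s` form that OpenZFS uses for dRAID vdev names. The Python sketch below builds that string and estimates usable capacity; reading the comment as four identical vdevs with 2 parity, 5 data, 15 children and 1 distributed spare each is an assumption, and the capacity figure ignores metadata and padding overhead.

```python
def draid_spec(parity, data, children, spares):
    """Build a dRAID vdev spec string such as 'draid2:5d:15c:1s'."""
    if data + parity > children - spares:
        raise ValueError("a redundancy group must fit in the non-spare children")
    return f"draid{parity}:{data}d:{children}c:{spares}s"


def usable_fraction(parity, data, children, spares):
    """Rough share of raw capacity left for data (ignores other overhead)."""
    return (data / (data + parity)) * ((children - spares) / children)


layout = dict(parity=2, data=5, children=15, spares=1)
vdevs = 4

print(draid_spec(**layout))
print(f"{vdevs * layout['children']} drives, "
      f"~{vdevs * layout['children'] * usable_fraction(**layout):.0f} drives' "
      f"worth of usable space")
```

With these numbers, the 60 drives yield about 40 drives' worth of data capacity, since each vdev gives up one drive's worth to the spare and two sevenths of the rest to parity.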
40 SSD mirrors should absolutely be able to do 4GB/s. There probably is a misconfiguration / hardware bug somewhere
L2ARC would help with random iops on that HDD pool
@@SpaceRexWill Random iops is the thing I don't understand. What causes them to occur? And caches only help with reads, not writes right?
@@Saturn2888 random is when your data is not streaming. If you were streaming a 40GB video, then needed a different file totally unrelated from a different process (maybe a docker container or VM, but quite literally anything but the video) that's a random I/O. The more different things there are requesting reads or writes, the more random I/O you'll have. IOPS (Input/Output Operations per Second). Random is the opposite of sequential basically.
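A small Python sketch of that idea: count how often a request does not start where the previous one ended, which is roughly when a hard drive has to seek. The block size, offsets and function name are made up for illustration. Two processes that each read their own file sequentially still produce a random pattern once their requests are interleaved.

```python
import random

BLOCK = 128 * 1024  # illustrative request size in bytes


def count_seeks(offsets, block=BLOCK):
    """Count requests that do not continue right after the previous one."""
    seeks = 0
    expected = None
    for offset in offsets:
        if expected is not None and offset != expected:
            seeks += 1
        expected = offset + block
    return seeks


n = 1000
sequential = [i * BLOCK for i in range(n)]  # one stream, front to back

shuffled = sequential[:]                    # same blocks, random order
random.Random(0).shuffle(shuffled)

stream_a = [i * BLOCK for i in range(n // 2)]           # e.g. a video file
stream_b = [10**12 + i * BLOCK for i in range(n // 2)]  # e.g. a VM disk
interleaved = [off for pair in zip(stream_a, stream_b) for off in pair]

for name, pattern in [("sequential", sequential),
                      ("shuffled", shuffled),
                      ("two interleaved streams", interleaved)]:
    print(f"{name:>24}: {count_seeks(pattern)} seeks in {len(pattern)} requests")
```

The single stream needs no seeks at all, while the interleaved case seeks on every request even though each individual reader is perfectly sequential.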
@@kdb424 I don't think that's quite right because there's a separate fio test for random read and write different from its sequential read and write test.
After more testing, 16 threads rather than 1, I ended up writing 9GB/s sequentially on this pool. Crazy. Don't remember reads. Something high as well.
I have two metadata SSDs, and I think they help with random reads and writes by knowing exactly where the files are located as my random write speed was still 1GB/s. Random reads were almost the same as sequential reads.
I have the data if you want it. No L2Arc required. I think the video is important, but I can't see a need for my use case. Maybe if you only have 1 vdev of HDDs, it'd make more sense.
@@Saturn2888 Special vdevs are very useful as they store the pool's metadata on SSDs, and can optionally, depending on block size, store some of the data itself (not just the metadata) on SSDs as well. I don't run an L2ARC, nor do I feel that I need one, but whether it matters at all will depend massively on your workload and how your vdevs are set up.
Do you have a tutorial showing how someone can connect Filezilla to synology?
good but should put "nvme" in video title..
This title should read "L2ARC is AWESOME on ZFS...if you're editing video." I can't think of another normal use case where it would be useful.
Watching video? Many people have their video libraries on their NASes.
Also, photo editing/viewing.
@@BoraHorzaGobuchul It's been well documented that most video streaming never comes close to touching L2ARC, unless your system is severely short on RAM. And obviously photos are minuscule in size compared to video being edited. So I stand by my original premise... L2ARC is awesome if you're editing video off your NAS.
@@Matlock69 OK, so photo work is likely to benefit more from an NVMe metadata vdev, since a serious photo workflow often deals with a large number of files. However, that would require at least a 2-way mirror of enterprise-grade SSDs, and those aren't cheap...
I want a Server like that!
I want a wife like his 😛?
ffs it’s level 2 ARC, not layer, and it shouldn’t take 7 mins of video to say it’s simply where ARC evictions are stored.
For some of us here who aren’t miserable elitists, the 7 minute explanation is helpful.
Gotta wonder why a person who already knows about it is watching the video 😊
dude talk normal