here is the scenario: 16GB ram 12TB total HDD( 3x 4TB stripe vdev) ; and there are many 3 GB iso files in that 12TB vdev. now if 8 people at one start downloading 8 different iso files(all 3 gb in size), how will truenas act if only 1. only ARC is enabled and if 2. arc + l2arc is enabled..
What an amazing video. I am a software engineer and started getting interested in hardware and building a homelab, I learned so much from you sir. Thanks a lot
Thank you for the explanation. I think that it might also be important to explicitly point out you will ALWAYS have a ZFS intent log (ZIL) and in the absence of a separate log (SLOG) device, it will use your main pool to store said ZIL. And ad simile, you will also ALWAYS have adaptive replacement cache (ARC) (which you can consider the RAM in a system your Level 1 (L1) ARC) and that if you add another separate, and dedicated device to supplement the RAM for said adaptive replacement cache, this is where and how you can have a level 2 adaptive replacement cache (L2ARC). In other words, you might be without a L2ARC if you don't add one (which, as shown, may or may not help with your reads), but you will ALWAYS have a ZIL, and the performance of your synchronous writes can depend a great deal of where the ZIL resides (e.g. on the main pool or on a separate log (SLOG) device).
Actually, it was mentioned in the video that the ZIL will take up a small space of the main storage pool unless there's a separate device. That implies that there is a ZIL log, even when it was not explicitly said that you always have one.
This is great! I have come back to this topic several times, and never dedicated enough time to research entirely as there is a lot of confusing stuff out there. But this explained it very well
In regards to L2ARC, I have a lot of my Steam games outsourced to a ZVOL (with 16KB blocksize and 16KB NTFS clusters) on my NAS, and over here the L2ARC seems to quite a decent of a job, given the sizes of data sets in games.
Yeah, same here. I only have 16GB of RAM in my server for TrueNAS+ZFS, and I also use the same box as the gaming rig, which eats the rest of the physical RAM. So the 100GB of L2ARC really did seem to massively speed up games with a lot of texture data and the like after they load in the first time. Before everyone says I should just get a second machine for gaming... This is the perfect combination of reusing old gear instead of generating eWaste, and taking up the least amount of space in a smaller home. There's probably also a power benefit pushing everything through one 80+ platinum PSU, than having 3-4 total systems doing individual draws.
So in summary, as much RAM as possible for better reads. And a couple of LOG SSD/NVME for faster writes to avoid having to use sync as "off" to avoid data loss
Thank you! I’ve seen so much back and forth on zil and arc and different opinions on which and what, I was completely lost. This video was sussinct and to the point, and actually showed with examples why.
Small Correction: A 1 tb drive zil would be wasted in the sense its oversized but not wasted for longevity. When using a SSD a larger drive would be able to handle the write more often before failure than the same drive in a lower capacity. You would want to balance cost/ failure and realistic how often your company upgrades anyway.
Super informative on ZFS cache and its performance impact. I did realize we could disable the sync setting as I had it as default sync always with inheritance. Disabled on NFS make a huge difference in performance for VMs sitting on NFS are port of a data store. But based on your input and this demo. The right thing to do is add a log cache and set the NFS dataset back to sync standard or always for safety. I'm curious to know how these features can effect iscsi performance as well. These different cache features look to come in to play depending on what type of loads and features you run on the true nas. Reading the reports and applying different cache types to help performance is a great feature. At a minimum its probably best to add log cache for sync standard as well as being careful in selecting which pools to keep sync disabled for non critical data.
The only reason of using the larger physical devices for ZIL is to increase longevity. Constant writes wearing out SSD very quickly and that's exactly what happens with ZIL devices, but modern SSDs are smart enough to spread the writes as much as possible meaning the larger the drive the more time it will get to work out. That is particularly important for m.2 NVMe ssd's because one have to completely stop the system to replace these drives if they start to fail.
I greatly appreciate all your technical explanations in all the videos I have seen so far. I have just a small tip for the recording: If your script was somehow closer to the camera, then the viewer would much more feel as if you were addressing him directly. Not sure if that is possible without a teleprompter ...
One of my favorite tricks (that I think you know already) is that Intel Optane makes an awesome ZIL device. Like you can probably get away with not having a mirror because Optane doesn't lie like other SSDs do. When it says a write is complete, it's complete.
Except H10 series devices have the Optane Name but are just regular flash with Optane cache (worst of both worlds for this use case). Also the 16GB drives don't have much endurance, get at least the 119GB Optane 800P or a 900 series + or obviously a DC Optane drive would be fine for that use.
I had a system using a single 16GB Optane SLOG, and after a sudden power loss the Optane drive ended up with a corrupt partition table. This caused the entire filesystem to be marked as bad. Had to rebuild it and restore from backup. Probably a rare event, but definitely not foolproof without mirroring.
Good explanation! I use OpenZFS on a 16GB Ryzen 3 2200G desktop and that is why I made some other choices. 1. I always work with partitions, so I need less disks and I save money. 2. I limit the L1ARC to 3GB. I want to avoid that on booting one of my many VMs, the system has to write updated records in the L1ARC to disk, to free memory for the VM. 3. I have the following 3 datapools: - one on my 512GB nvme-SSD - one striped datapool of two 500GB partitions on my 500GB and 1TB HDDs. That datapool is supported by a 90GB Cache and a 5GB Log. - one datapool at the end of the 1TB HDD.That datapool is supported by a 30GB Cache and a 3GB Log. The caches and logs fit nicely on my 128GB sata-SSD. The L1ARC hit rate is >=98% and the L2ARC hit rate is now 43%. The L2ARC is especially effective, when I boot a VM. In the striped datapool I run one dataset with my own personal stuff with copies=2 so that dataset is mirrored :)
Im not sure i understand the reason for point 2? ZFS would never write anything from ARC to disk would it? Everything thats in the ARC already exists on the disks, so it would simply just delete it from the ARC to free up space to give memory to VMs or apps
@@Nonstopie if the records are updated, they stay for 5 seconds in the ARC before they are written to he disk. The same is true for new records. If some process want to read the same record after say 2 seconds, you want to give it an up-to-date record.
@@williamp6800 Yes , like I said the most important dataset is mirrored (Raid-1) and what I did not mention, I have a backups on my laptop with a 2TB HDD and another one on my 2003 Pentium 4 HT (1C2T; 3.0GHz; 1.5GB DDR 400MHz) with 4 leftover HDDs in total 1.21TB (3.5" IDE 250+320GB & 2.5" SATA-1 320+320GB). It runs on FreeBSD 13.1 :) :) I have another backup of the family stuff (photos, videos; music, etc) on the 64GB SD Card of my phone. Together with the laptop I consider it my off-line backups.
Yet another great video! I thoroughly enjoy your videos, your profound expertise and (calm) way of presenting things make this an incredible experience. Thank you so much.
Well this is interesting. I just redid my main pool, added more storage. And I have three separate streams syncing data to the main zfs pool --- which is being run as a NFS server. It was writing at about 49M a second. Once I turned on forced syncs, and added LOG vdev the writes doubled to 97M a second.
@@waldolemmerI assume not so much ppl read documentation anymore, since you can you listen to UA-cam videos, but in that case ppl arent in the position to very what has been said, or consulting the documents for verification after consuming of online content. therefore theirs a lot of misunderstanding outhere.
If you have spinners, then instead of using ZIL or L2ARC, you should consider adding a "Special" or "Metadata" drive to your pool. Of course, it should be a mirror. It will store the metadata, and you can set it to store small writes, which essentially turns your pool into a hybrid drive, with both flash and magnetic media. This will increase your random 4K and IOPS performance for those files which would actually benefit. That is how you increase the performance of your pool for most workloads. Adding cache or intent doesn't, most of the time. It's strange that whenever this question come up, nobody talks about the actual solution people are trying to create, but misunderstanding. It's the other thing that you really want. It's that simple. I'm using a mirrored metadata vdev in my pool, with a set of 8 spinners in a striped mirror config, and I get line speeds at 10gbps all day long.
This could also work for an SSD array, if you use a couple NVMe drives to lower your latencies and increase performance. Although, if you're hosting some VMs off an iSCSI, then they might serve you better repurposed as a ZIL SLOG.
As a home server with few users, can i use a mirror NVMe of 2TB partitioned into SLOG, L2ARC and special vdev (metadata)? My home NAS does not have many SATA or NVMe slots (nor i have much money) to keep them as separate devices.
@@NhatLinhNguyen82 If you aren't using iSCSI to host VMs, you probably won't need SLOG. L2ARC isn't particularly useful for a small home environment, either. Use them as a Special vDev for metadata, and you can also set it up to store smaller files, too. That's the best way to increase those random 4K IOPS. Edit: If you don't mind living dangerously, and I wouldn't recommend this, but you could use one for each, Special Metadata and L2ARC. It's just not a good idea.
@@TheChadXperience909 Thank you for advice. I am just curious why only VM implementation would benefit from SLOG. I am planning to force sync write to get more data integrity during the large transfer of the photo library which is very valuable to me (my kids photos). Though writing is not often, SLOG would provide speed in such cases for sync write and piece of mind in off case power goes off. Sync write is not often in my case thus SLOG work tasks will not often compete with read tasks of special vdevs thus not problem sharing NVMe. I agree with L2ARC though. I can even do a 3 way mirror to give more pieces of mind again metadata loss with special vdev.
@@NhatLinhNguyen82 VM isn't the only thing to use synchronous writes, but is the most common. The only thing it protects against is data loss during power loss, or some kind of problem which interrupts the transfer. If you're worried about this, you could get a UPS.
i feel like the moral of this video is buy more ram, which works when youre billing corporate, but im just trying to make use of what i got in my basement i feel that speeding up hdds with ssds would be a more normal use case. it would be nice to see the same $$ in ram vs ssd vs nvme to speed up a hdd raidz2
Thanks for the video. I'd add the benefit of a l2arc goes up the worse your hdd based zpool ist fragmented. Also depends on usage pattern. Will you do a video about the metadata device benefits? Would love to hear your take on it.
@@LAWRENCESYSTEMS one area that wasn't really covered by L1T in that video that I was curious to see when I read your video's title is a metadata only *persistent* L2ARC, also how having an SSD/Optane based L2ARC changes benchmarks for a HDD based TrueNAS system. A metadata special device just seemed to high of a risk to me to be worth it (lose the special device and you lose the entire pool) where as losing a persistent metadata L2ARC just means losing the time it takes to rebuild the L2ARC.
Thanks for this video. Being new to DIY NAS, I am about to build my first NAS, I was wondering if I should add a nvme cache drive to accelerate reads, now I know the best thing to do is just get more memory instead. I was going to go with 16GB, now I will go to 32GB. Just using it for file/movie storage for my HTPC to read the movies from. Going to learn how to setup a 6x6TB array now with two drive redundancy in a RAIDZ2.
Anyway to setup the cache to the dram on a gpu with direct storage, there is a little bit of cpu usage with ZFS, but it not appear to be substantial. I would love to just install a very overly absurd gpu that would never be able to use it's full processing and power and especially all of the DRAM instead alongside installing a nvme. I am kinda on a pci-e lane budget - it has pcie switch and could add a riser, but running a server board in a desktop case is kinda awkward. A full 20 lanes of pcie gen 3, seems small but it is on a switch so I think that is good. Anyways thank you for a look into some of what to run to test the speeds with . My current Truenas setup which is a raid array with 8x1tb ssd on LSI raid card passed through from proxmox to truenas core at this time. I will not yet benefit much from a cache due to I am no where near the potential speed I can read and write on this machine while connecting over the 1000Mb connection Exactly at 116MB/s writes at this time. I do have a dual 2.5Gb card in both my workstation and the server and will likely pass through into Truenas, and setup a round-robin bonding from both sides, I get it does not speed up connections normally, but I am pretty sure if round-robin is set on both ends it will speed up a bit. Ultimately I will eventually set with sfp+ 10Gb. Copying a 700GB file over the network was a bear for speed yesterday.Why moves such large file is due to weird qcow2 file that when I --delete the snapshots, it does not get smaller on the filesystem, but when I examine it with qemu-info the size shows as decreased to the size it should be. Thus I backed it up. Truenas and ZFS scares me though, every power outage it says pool offline once it boots up and wants me to export, which is when it warns the pool will be removed. -- I said noway to that removing of the pool, and rebooted, and that fixes it now everytime, but I actually have to go into the GUI and manually reboot Truenas after every unexpected shutdown due to the pools do not come back online the first time around. Your mention here of how it works, makes me wonder if I turn on synchronous, then will that stop. For the ZFS pool offline, I am just rebooting for now until I really want to fix it, but would help to know the answer. The conversations on the web are just really surrounding pools that actually disappear or were accidentally removed, but nothing can find me that has at its' heart the terms "Pool offline after power outage requires reboot to load pools and is not lost" I honestly have considered adding a boot up check that verifies if pool was loaded, sets a flag for reboot count, and then reboots it with the condition the number of reboots does not exceed 2 or something like that. I am watching a lot of your stuff. There are a some individuals on UA-cam who are absolutely amazing and Lawrence, you are definitely one of them.
New subscriber here.. This was a great explanation of some of the cache-types that zfs offers! As a relative newbie to this zfs world I would be interested in your thoughts on dedupe caches. Can they be added and removed as per your examples here? If the dedupe cache is lost, then is that catastrophic? My use case is as an archive/backup system. Thanks again!
thank you for this video! it made my proxmox VM performace drastically incease. Is there other recommendations you have related to TrueNAs proxmox nfs ??
I'm probably looking under the wrong rock or my install is broken somehow, but as far as I can find the current version of TNScale (Bluefin) has no Gear icon and no three dot menu to remove anything from the pool once made or attached. So by following along you now can't remove the log from the pool once it's created. That's frustrating that stuff once in the GUI just apparently disappears or gets hidden? I thought the Scale gui at this point was close to being ready for some light work outs.
I don't understand how you can get 90MB/s with 3 Sata SSD in raidz1. How is that number even making sense? this is slower than one mechanical HDD. Can someone explain?
Im no ZFS expert but from my understanding ZFS decreases performance somewhat compared to most journalling filesystems, but in turn losing your data is very unlikely And regular raid1 with zfs is not optimal, because zfs loses some of the controls and checks. Raidz1 fixes that but adds another slowdown, I believe
I really wish that XI would just come out and say the quiet part out loud and tell people if they want fast writes, they need more, faster vdevs. So much energy is wasted by people asking and researching that as through zfs is some sort of special unicorn where you can get additional write speed by using some esoteric config.
So, am i wrong in thinking there is a way to add a 2TB ssd that will be the fast write drive. and then transfer that data to the spinning drives as they can handle it? I know you said 6.25GB ZIL, but if I want to transfer a 1.5TB folder, am I screwed, or is it possible to have that dedicated fast temporary write drive. read speeds are plenty fast for me at the moment
Thx for the video Tom. One thing I find odd though is that by using NVMe drives that is able to do north of 2GB/s of writing, you can't reach that with TrueNAS when used as a SLOG device. That puzzles me a lot and I would really like to understand why. The reason is: if you have a standard pool with mechanical drives in it and you want to accelerate or benefit for that shiny 10GE network you setup, you can't because that SLOG isn't performing at all. There must be something else going on that slows thoses writes on SSD because that's not normal in my book: 60MB/s is a far cry from 2000MB/s that an NMVe can do. Even SATAIII drives that can do 400MB+/s of sustained writing speed are bug down to that same kind of speed. What is really going on so have such slow speed on hyper fast disks?
Thank you so much for this. It really helped me to understand and also teach me how to test on my own system. Can you do one one just like it on metadata and dedupe? These types of setups are not as easy to test on my system and I am trying to understand the relationship between the two. Level1Techs has recently been discussing metadata, but they didn't do something like this. Really appreciate it, thank you.
Great video with a lot of information! One q about L2ARC though: I use a 1TB M.2 as L2 and 128GB ARC for my 10GbE storage server which is mostly for photo editing. By using vmtouch I force the most recent photos into L2 so that I can access it really fast. Isn't that another valid use case for L2ARC? Thanks Tom!
As I said in the video it really depends on your workload, but if you have a frequently accessed data that is larger than ARC the L2Arch may help provided it's faster than the data VDEVS.
Would this be what I would need to use if I wanted to do what spotify & YT music does when I listen to music. It's temporarily saves, but it doesn't take up any of my memory. But I am still able to listen even offline...
What is that "sync;fio...." command? I tried the fio command, it worked one time only. I think the sync is to repet that command. I want to run that test too.l
of course. that was also stated in the video. but when you want to have the safety of synced writes than the ssd cache will bring up the performance quite a bit. but never up on par with async writes speeds.
Hi all. I installed TN Scale on an nvme (256GB). Is there a way to use the spare space in the drive as a cache? also is there any use for that free space? thnx
I didn't hear you mention this aspect. The L2 Cache needs to be faster than the Pool Media or it is a waste. For instance, you wouldn't want a hdd as L2 Cache when your pool is comprised of ssd's.
Can you make a video on how to disable it on TrueNAS? It makes my share way slower than a traditional RAID setup. I ended up using Ubuntu Server on RAID, it's way faster on my NVMe RAID setup
So is True Nas Something that ZFS interfaces with on the storage pool side. Or could I implement ZFS and Storage Pools with something like my homeserver using a different OS. Basically I'm saying for redundancy do i need a True NAS "box" and that storage pool tab to implement ZFS
Not clear on your questions but ZFS is a file system and TrueNAS is a good tool to make managing ZFS and everything that connects to ZFS easier than running it all from the command line.
Can you make a video about HDD spindown and why to not use that? I'm struggling to understand what the extended energymanagement does (Level 1,64,127,128,192,254).
@@LAWRENCESYSTEMS Cheers, I now got some experience and having spindown active is really annoying if you want to access data fast but have to wait until the drives are ready.
This is an excellent video! Really made me think twice about some things. AND, got me wondering....where can I find an nvme SSD that is that small? Was it 16GB? Looking through Amazon and I can not find it. Could you please share where did you get it, or give us a link please? Thanks Tom!
Intel Optanes are best for this. if you want to use "normal" cheap consumer sata/nvme ssds keep in mind the lowest capacity ones are often much slower writing than the bigger ones because they use less memory channels internaly. so while a used 10 year old 32GB SSD will totally be enough in terms of capacity, it may write much slower than the 256GB version of the same drive.
What about storing only metadata on the L2ARC? Is that possible, and would that help the system out on reboot by keeping metadata persistent across reboots?
Is the worse case scenario of loosing arc (if you go against the recommendation and only use one device) that you loose the 5 seconds of data. Or can you loose data from the storage-pool as well?
Truenas asked if i wanted a 16gb swap on the boot device. I have a 256gb m.2 disk as a boot device. Should i use 16 for swap? I dont know what this means.
I put a 256 gb 860 evo in my server a while back as L2ARC, since i had it laying around. Should I keep it as L2? or make it a slog? Should I remove it completely? I was kinda confused when I saw a forum post with a guy who had poor performance and someone told him to get a slog 3x4 tb ironwolf 64 gb ram
Is it possible to create a Log partition on the boot pool or boot drives or does Log need to be physical drives , I've got 2x 128GB m.2 drives mirrored for boot pool, they were cheap. Alot of space on there un-used.
I know people always say to not worry about L2ARC or that its a performance loss to use it, but seeing as how if the data isnt in ARC it would have to pull it from the pool (most commonly spinning drives), how is using an L2 disk (like a 1tb nvme) a performance loss vs a spinning sata drive?
As I said in the video it really depends on your workload, but if you have a frequently accessed data that is larger than ARC the L2Arch may help provided it's faster than the data VDEVS.
I understand that ZFS uses RAM for caching. This might be great for data that you need to be highliy available but probably not for data like the backups of my desktop machine that I want to store on the TrueNAS server. Caching this kind of data inside the RAM seems to be borderline useless. I've just set up a fresh install and the first data that I put on the server was doing that backup of my windows machine. Now 35 Gb of the RAM in my Truenas machine are occupied with that data... Is there a way you could turn of RAM-caching for a dedicated dataset or any other option to manage what data is chosen for RAM-cashing?
@@LAWRENCESYSTEMS Ok, but can I somehow prioritize what data goes into RAM-cash if I have data that needs to be highly availible and other data that doesn't? Wouldn't it make mor sense to cash the data that I neet to access frequently?
I have a dumb question... I'm researching truenas for the near future. I want to put portainer using the docker inside truenas. Does the cache works for all the apps inside docker inside truenas? Sorry for my poor explanation, i don't speak english very good...
ZFS is a COW video
ua-cam.com/video/nlBXXdz0JKA/v-deo.html
CULT OF ZFS Shirts
lawrence-technology-services.creator-spring.com/listing/cult-of-zfs
Our TrueNAS Tutorials
lawrence.technology/truenas-tutorials/
Links to Lots of ZFS Articles
forums.lawrencesystems.com/t/freenas-truenas-zfs-pools-raidz-raidz2-raidz3-capacity-integrity-and-performance/3569
⏱ Timestamps ⏱
00:00 ZFS Write and Read Cachce
01:59 ZIL & LOG VDEV Write Cache
06:29 ZFS ARC & L2ARC Read Cache
09:11 TrueNAS Lab Write Cache Test
13:20 TrueNAS How to Setup LOG VDEV
16:12 TrueNAS Lab Read Cache Test
19:06 TrueNAS How to Setup CACHE VDEV
here is the scenario: 16GB ram 12TB total HDD( 3x 4TB stripe vdev) ; and there are many 3 GB iso files in that 12TB vdev. now if 8 people at one start downloading 8 different iso files(all 3 gb in size), how will truenas act if only 1. only ARC is enabled and if 2. arc + l2arc is enabled..
You’re one of the best (if not the best) Truenas UA-camr explaining the design tradeoff of the system.
What an amazing video. I am a software engineer and started getting interested in hardware and building a homelab, I learned so much from you sir. Thanks a lot
Great to hear!
Thank you for the explanation.
I think that it might also be important to explicitly point out you will ALWAYS have a ZFS intent log (ZIL) and in the absence of a separate log (SLOG) device, it will use your main pool to store said ZIL.
And ad simile, you will also ALWAYS have adaptive replacement cache (ARC) (which you can consider the RAM in a system your Level 1 (L1) ARC) and that if you add another separate, and dedicated device to supplement the RAM for said adaptive replacement cache, this is where and how you can have a level 2 adaptive replacement cache (L2ARC).
In other words, you might be without a L2ARC if you don't add one (which, as shown, may or may not help with your reads), but you will ALWAYS have a ZIL, and the performance of your synchronous writes can depend a great deal of where the ZIL resides (e.g. on the main pool or on a separate log (SLOG) device).
Actually, it was mentioned in the video that the ZIL will take up a small space of the main storage pool unless there's a separate device. That implies that there is a ZIL log, even when it was not explicitly said that you always have one.
@@smurface549
You are right.
I stand corrected.
Thank you for that.
This is great! I have come back to this topic several times, and never dedicated enough time to research entirely as there is a lot of confusing stuff out there. But this explained it very well
Me too, thanks Lawrence
I SERIOUSLY appreciate your work and especially advocacy for ZFS - I learnt a lot from you. Thank you
My pleasure!
@@LAWRENCESYSTEMS my next step: zfs from command line ;-)
This is a great video! I truly needed this video to better understand how these things work. Thanks Tom!!!
In regards to L2ARC, I have a lot of my Steam games outsourced to a ZVOL (with 16KB blocksize and 16KB NTFS clusters) on my NAS, and over here the L2ARC seems to quite a decent of a job, given the sizes of data sets in games.
Yeah, same here. I only have 16GB of RAM in my server for TrueNAS+ZFS, and I also use the same box as the gaming rig, which eats the rest of the physical RAM. So the 100GB of L2ARC really did seem to massively speed up games with a lot of texture data and the like after they load in the first time.
Before everyone says I should just get a second machine for gaming... This is the perfect combination of reusing old gear instead of generating eWaste, and taking up the least amount of space in a smaller home. There's probably also a power benefit pushing everything through one 80+ platinum PSU, than having 3-4 total systems doing individual draws.
So in summary, as much RAM as possible for better reads. And a couple of LOG SSD/NVME for faster writes to avoid having to use sync as "off" to avoid data loss
Thank you! I’ve seen so much back and forth on zil and arc and different opinions on which and what, I was completely lost. This video was sussinct and to the point, and actually showed with examples why.
i was so confused about these 2 features in the proxmox manual thank you for this great explanation/demo
I have learned so much from you over the years. I really appreciate your content. Thank you!
Small Correction: A 1 tb drive zil would be wasted in the sense its oversized but not wasted for longevity. When using a SSD a larger drive would be able to handle the write more often before failure than the same drive in a lower capacity. You would want to balance cost/ failure and realistic how often your company upgrades anyway.
Easy.
I would have a 2-3GB ZIL with 2-5 GB over provisioning.
100GB Optane DC4801X M.2s are relatively cheap now. The low latency and extremely high endurance is perfect for a SLOG.
Thanks for the explanations. Its always helpful to get a refresher. Running ZFS on a Mac home server. This was quite helpful
Super informative on ZFS cache and its performance impact. I did realize we could disable the sync setting as I had it as default sync always with inheritance. Disabled on NFS make a huge difference in performance for VMs sitting on NFS are port of a data store. But based on your input and this demo. The right thing to do is add a log cache and set the NFS dataset back to sync standard or always for safety. I'm curious to know how these features can effect iscsi performance as well. These different cache features look to come in to play depending on what type of loads and features you run on the true nas. Reading the reports and applying different cache types to help performance is a great feature. At a minimum its probably best to add log cache for sync standard as well as being careful in selecting which pools to keep sync disabled for non critical data.
The only reason of using the larger physical devices for ZIL is to increase longevity. Constant writes wearing out SSD very quickly and that's exactly what happens with ZIL devices, but modern SSDs are smart enough to spread the writes as much as possible meaning the larger the drive the more time it will get to work out. That is particularly important for m.2 NVMe ssd's because one have to completely stop the system to replace these drives if they start to fail.
I greatly appreciate all your technical explanations in all the videos I have seen so far. I have just a small tip for the recording: If your script was somehow closer to the camera, then the viewer would much more feel as if you were addressing him directly. Not sure if that is possible without a teleprompter ...
One of my favorite tricks (that I think you know already) is that Intel Optane makes an awesome ZIL device. Like you can probably get away with not having a mirror because Optane doesn't lie like other SSDs do. When it says a write is complete, it's complete.
It is probably better to use an RMS-300 or RMS-200 if you can find one. They are designed for that kind of thing.
It's also better to not run raidz1
Except H10 series devices have the Optane Name but are just regular flash with Optane cache (worst of both worlds for this use case). Also the 16GB drives don't have much endurance, get at least the 119GB Optane 800P or a 900 series + or obviously a DC Optane drive would be fine for that use.
Matters not if you have an SSD with a capacitor to finish the operation.
Or a simple UPS…
I had a system using a single 16GB Optane SLOG, and after a sudden power loss the Optane drive ended up with a corrupt partition table. This caused the entire filesystem to be marked as bad. Had to rebuild it and restore from backup. Probably a rare event, but definitely not foolproof without mirroring.
Great info for first-time NAS builders such as myself!
This is one of the best tech video I have ever watched. Amazing content!
Good explanation! I use OpenZFS on a 16GB Ryzen 3 2200G desktop and that is why I made some other choices.
1. I always work with partitions, so I need less disks and I save money.
2. I limit the L1ARC to 3GB. I want to avoid that on booting one of my many VMs, the system has to write updated records in the L1ARC to disk, to free memory for the VM.
3. I have the following 3 datapools:
- one on my 512GB nvme-SSD
- one striped datapool of two 500GB partitions on my 500GB and 1TB HDDs. That datapool is supported by a 90GB Cache and a 5GB Log.
- one datapool at the end of the 1TB HDD.That datapool is supported by a 30GB Cache and a 3GB Log.
The caches and logs fit nicely on my 128GB sata-SSD. The L1ARC hit rate is >=98% and the L2ARC hit rate is now 43%. The L2ARC is especially effective, when I boot a VM. In the striped datapool I run one dataset with my own personal stuff with copies=2 so that dataset is mirrored :)
Im not sure i understand the reason for point 2?
ZFS would never write anything from ARC to disk would it? Everything thats in the ARC already exists on the disks, so it would simply just delete it from the ARC to free up space to give memory to VMs or apps
@@Nonstopie if the records are updated, they stay for 5 seconds in the ARC before they are written to he disk. The same is true for new records.
If some process want to read the same record after say 2 seconds, you want to give it an up-to-date record.
So you don’t have redundancy on any of your storage?
@@williamp6800 Yes , like I said the most important dataset is mirrored (Raid-1) and what I did not mention, I have a backups on my laptop with a 2TB HDD and another one on my 2003 Pentium 4 HT (1C2T; 3.0GHz; 1.5GB DDR 400MHz) with 4 leftover HDDs in total 1.21TB (3.5" IDE 250+320GB & 2.5" SATA-1 320+320GB). It runs on FreeBSD 13.1 :) :)
I have another backup of the family stuff (photos, videos; music, etc) on the 64GB SD Card of my phone. Together with the laptop I consider it my off-line backups.
You're the best! got more informations from this video than the other 10 I tryied to understand👍👍👍
Glad it helped!
Loving the zfs content (and the shirt)
Yet another great video! I thoroughly enjoy your videos, your profound expertise and (calm) way of presenting things make this an incredible experience. Thank you so much.
excellent description! i'll be watching this video a few times to let the concepts sink in. LOL! thanks!
Somehow... Tom is in my mind. I was just working on ZFS today.
Great video as always. Thought I had a pretty good understanding of TrueNAS and ZFS, but learned a lot more after watching. Thanks
Nice timing as I'm deploying a truenas, now you just need to release a video on how to use rdman RoCE etc
Well this is interesting. I just redid my main pool, added more storage. And I have three separate streams syncing data to the main zfs pool --- which is being run as a NFS server. It was writing at about 49M a second. Once I turned on forced syncs, and added LOG vdev the writes doubled to 97M a second.
ZFS really is the file system with the most misinformation in the entire Internet
RTFM
@@blender_wiki If only people would to that ...
@@Felix-ve9hsIs the official documentation digestible for the average power user?
@@waldolemmerI assume not so much ppl read documentation anymore, since you can you listen to UA-cam videos, but in that case ppl arent in the position to very what has been said, or consulting the documents for verification after consuming of online content. therefore theirs a lot of misunderstanding outhere.
i just learned and use L2arc today quiet interesting
Great video, Tom. Thanks for sharing!
Great Explanation, and nice demo too! thanks again Tom.
Interesting information, clear and calm habit. You have a new subscriber.
Thank you for the wonderful detailed explanation.
The t-shirt is awesome. It's a cow, and cow = copy on write.... apt install cowsay; cowsay "ZFS is a cult with integrity"
If you have spinners, then instead of using ZIL or L2ARC, you should consider adding a "Special" or "Metadata" drive to your pool. Of course, it should be a mirror. It will store the metadata, and you can set it to store small writes, which essentially turns your pool into a hybrid drive, with both flash and magnetic media. This will increase your random 4K and IOPS performance for those files which would actually benefit. That is how you increase the performance of your pool for most workloads. Adding cache or intent doesn't, most of the time. It's strange that whenever this question come up, nobody talks about the actual solution people are trying to create, but misunderstanding. It's the other thing that you really want. It's that simple. I'm using a mirrored metadata vdev in my pool, with a set of 8 spinners in a striped mirror config, and I get line speeds at 10gbps all day long.
This could also work for an SSD array, if you use a couple NVMe drives to lower your latencies and increase performance. Although, if you're hosting some VMs off an iSCSI, then they might serve you better repurposed as a ZIL SLOG.
As a home server with few users, can i use a mirror NVMe of 2TB partitioned into SLOG, L2ARC and special vdev (metadata)? My home NAS does not have many SATA or NVMe slots (nor i have much money) to keep them as separate devices.
@@NhatLinhNguyen82 If you aren't using iSCSI to host VMs, you probably won't need SLOG. L2ARC isn't particularly useful for a small home environment, either. Use them as a Special vDev for metadata, and you can also set it up to store smaller files, too. That's the best way to increase those random 4K IOPS.
Edit: If you don't mind living dangerously, and I wouldn't recommend this, but you could use one for each, Special Metadata and L2ARC. It's just not a good idea.
@@TheChadXperience909 Thank you for advice. I am just curious why only VM implementation would benefit from SLOG. I am planning to force sync write to get more data integrity during the large transfer of the photo library which is very valuable to me (my kids photos). Though writing is not often, SLOG would provide speed in such cases for sync write and piece of mind in off case power goes off. Sync write is not often in my case thus SLOG work tasks will not often compete with read tasks of special vdevs thus not problem sharing NVMe. I agree with L2ARC though. I can even do a 3 way mirror to give more pieces of mind again metadata loss with special vdev.
@@NhatLinhNguyen82 VM isn't the only thing to use synchronous writes, but is the most common. The only thing it protects against is data loss during power loss, or some kind of problem which interrupts the transfer. If you're worried about this, you could get a UPS.
Very good video Tom!
i feel like the moral of this video is buy more ram, which works when youre billing corporate, but im just trying to make use of what i got in my basement
i feel that speeding up hdds with ssds would be a more normal use case. it would be nice to see the same $$ in ram vs ssd vs nvme to speed up a hdd raidz2
Thanks for the video. I'd add the benefit of a l2arc goes up the worse your hdd based zpool ist fragmented. Also depends on usage pattern.
Will you do a video about the metadata device benefits? Would love to hear your take on it.
Level1Techs did one ua-cam.com/video/QI4SnKAP6cQ/v-deo.html
@@LAWRENCESYSTEMS one area that wasn't really covered by L1T in that video that I was curious to see when I read your video's title is a metadata only *persistent* L2ARC, also how having an SSD/Optane based L2ARC changes benchmarks for a HDD based TrueNAS system.
A metadata special device just seemed to high of a risk to me to be worth it (lose the special device and you lose the entire pool) where as losing a persistent metadata L2ARC just means losing the time it takes to rebuild the L2ARC.
Great explanation
Thanks for the great video, really helpful.
I'm also a ZFS cult member. I would really like to see a video on which SSDs are best for ZIL.
The faster the better, ZIL only uses a small amount of the drive and it's all about speed.
What NVME drive was used on the demos?
It has been hard for me to find a suitable, affordable drive to use
Thanks for the explanation
I have a larger than required OS disk that is an NVME disk. Can a partition from that disk be allocated to be used as ZIL?
Thanks for this video. Being new to DIY NAS, I am about to build my first NAS, I was wondering if I should add a nvme cache drive to accelerate reads, now I know the best thing to do is just get more memory instead. I was going to go with 16GB, now I will go to 32GB. Just using it for file/movie storage for my HTPC to read the movies from. Going to learn how to setup a 6x6TB array now with two drive redundancy in a RAIDZ2.
I would not bother with an nvme read cache
Anyway to setup the cache to the dram on a gpu with direct storage, there is a little bit of cpu usage with ZFS, but it not appear to be substantial. I would love to just install a very overly absurd gpu that would never be able to use it's full processing and power and especially all of the DRAM instead alongside installing a nvme. I am kinda on a pci-e lane budget - it has pcie switch and could add a riser, but running a server board in a desktop case is kinda awkward. A full 20 lanes of pcie gen 3, seems small but it is on a switch so I think that is good.
Anyways thank you for a look into some of what to run to test the speeds with . My current Truenas setup which is a raid array with 8x1tb ssd on LSI raid card passed through from proxmox to truenas core at this time. I will not yet benefit much from a cache due to I am no where near the potential speed I can read and write on this machine while connecting over the 1000Mb connection Exactly at 116MB/s writes at this time. I do have a dual 2.5Gb card in both my workstation and the server and will likely pass through into Truenas, and setup a round-robin bonding from both sides, I get it does not speed up connections normally, but I am pretty sure if round-robin is set on both ends it will speed up a bit. Ultimately I will eventually set with sfp+ 10Gb. Copying a 700GB file over the network was a bear for speed yesterday.Why moves such large file is due to weird qcow2 file that when I --delete the snapshots, it does not get smaller on the filesystem, but when I examine it with qemu-info the size shows as decreased to the size it should be. Thus I backed it up.
Truenas and ZFS scares me though, every power outage it says pool offline once it boots up and wants me to export, which is when it warns the pool will be removed. -- I said noway to that removing of the pool, and rebooted, and that fixes it now everytime, but I actually have to go into the GUI and manually reboot Truenas after every unexpected shutdown due to the pools do not come back online the first time around. Your mention here of how it works, makes me wonder if I turn on synchronous, then will that stop.
For the ZFS pool offline, I am just rebooting for now until I really want to fix it, but would help to know the answer. The conversations on the web are just really surrounding pools that actually disappear or were accidentally removed, but nothing can find me that has at its' heart the terms "Pool offline after power outage requires reboot to load pools and is not lost" I honestly have considered adding a boot up check that verifies if pool was loaded, sets a flag for reboot count, and then reboots it with the condition the number of reboots does not exceed 2 or something like that.
I am watching a lot of your stuff. There are a some individuals on UA-cam who are absolutely amazing and Lawrence, you are definitely one of them.
New subscriber here.. This was a great explanation of some of the cache-types that zfs offers! As a relative newbie to this zfs world I would be interested in your thoughts on dedupe caches. Can they be added and removed as per your examples here? If the dedupe cache is lost, then is that catastrophic? My use case is as an archive/backup system. Thanks again!
First?! Also, I'm glad you enjoy content creating. Cuz I sure enjoy consuming it!
Thank you for lesson :)
Thanks for the video!
muchas gracias por la información, saludos desde chilee!
thank you for this video! it made my proxmox VM performace drastically incease. Is there other recommendations you have related to TrueNAs proxmox nfs ??
I don't understand the question.
Can we see the contents of the FIO script please?
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=1 --size=256M --numjobs=16 --time_based --runtime=15 --group_reporting
fio --name=randrw --ioengine=libaio --iodepth=1 --rw=randrw --bs=4k --direct=1 --size=256M --numjobs=16 --time_based --runtime=15 --group_reporting
So get more RAM?
yes
@@LAWRENCESYSTEMS thank you 🫡
Good vid, compliments!
Thanks
I'm probably looking under the wrong rock or my install is broken somehow, but as far as I can find the current version of TNScale (Bluefin) has no Gear icon and no three dot menu to remove anything from the pool once made or attached. So by following along you now can't remove the log from the pool once it's created. That's frustrating that stuff once in the GUI just apparently disappears or gets hidden? I thought the Scale gui at this point was close to being ready for some light work outs.
In the System->advanced page is an option called "LOG (write Cache) Overprovision Size ...". How does this differ from the vdev cache option?
Can we get a video on dedup and how to add/remove it and if losing a dedup drive is recoverable and how?
I don't understand how you can get 90MB/s with 3 Sata SSD in raidz1. How is that number even making sense? this is slower than one mechanical HDD. Can someone explain?
Im no ZFS expert but from my understanding ZFS decreases performance somewhat compared to most journalling filesystems, but in turn losing your data is very unlikely
And regular raid1 with zfs is not optimal, because zfs loses some of the controls and checks. Raidz1 fixes that but adds another slowdown, I believe
I really wish that XI would just come out and say the quiet part out loud and tell people if they want fast writes, they need more, faster vdevs. So much energy is wasted by people asking and researching that as through zfs is some sort of special unicorn where you can get additional write speed by using some esoteric config.
So, am i wrong in thinking there is a way to add a 2TB ssd that will be the fast write drive. and then transfer that data to the spinning drives as they can handle it? I know you said 6.25GB ZIL, but if I want to transfer a 1.5TB folder, am I screwed, or is it possible to have that dedicated fast temporary write drive. read speeds are plenty fast for me at the moment
Thx for the video Tom. One thing I find odd though is that by using NVMe drives that is able to do north of 2GB/s of writing, you can't reach that with TrueNAS when used as a SLOG device. That puzzles me a lot and I would really like to understand why. The reason is: if you have a standard pool with mechanical drives in it and you want to accelerate or benefit for that shiny 10GE network you setup, you can't because that SLOG isn't performing at all. There must be something else going on that slows thoses writes on SSD because that's not normal in my book: 60MB/s is a far cry from 2000MB/s that an NMVe can do. Even SATAIII drives that can do 400MB+/s of sustained writing speed are bug down to that same kind of speed. What is really going on so have such slow speed on hyper fast disks?
Just make sure you have beefy ZFS cache on your secondary NVMe drive, that should help.
Thank you so much for this. It really helped me to understand and also teach me how to test on my own system.
Can you do one one just like it on metadata and dedupe?
These types of setups are not as easy to test on my system and I am trying to understand the relationship between the two. Level1Techs has recently been discussing metadata, but they didn't do something like this. Really appreciate it, thank you.
Great video with a lot of information! One q about L2ARC though: I use a 1TB M.2 as L2 and 128GB ARC for my 10GbE storage server which is mostly for photo editing. By using vmtouch I force the most recent photos into L2 so that I can access it really fast. Isn't that another valid use case for L2ARC? Thanks Tom!
Want to know too. Follow
As I said in the video it really depends on your workload, but if you have a frequently accessed data that is larger than ARC the L2Arch may help provided it's faster than the data VDEVS.
Would this be what I would need to use if I wanted to do what spotify & YT music does when I listen to music. It's temporarily saves, but it doesn't take up any of my memory. But I am still able to listen even offline...
So you said it allows services to prioritize over cache but my Minecraft server lags quite often since cache is been full
What is that "sync;fio...." command? I tried the fio command, it worked one time only. I think the sync is to repet that command.
I want to run that test too.l
how does running a full nvme array change the cache needs, since 24 PCIe gen 4 drives is getting close to the speed of RAM?
With enough RAM, do I need a special metadata VDEV?
To Cache or not to cache. Here's the answer: @7:44 in a nut shell
what do you think about dividing 2x 1TB nvme into two partitions and using one part raid1 for LOG and the other part as L2ARC raid0?
I don't think you can do that.
I've done this and it works perfectly (on Linux). The nvme is fast enough to handle both tasks.
Makes me think, if 16GB is large enough for an L2ARC cache, why not use the Intel Optane 16/32GB NVMe drives?
Read cache as an SSD actually slows things down! Write LOG cache is not much better. Best just to have async writes.
of course. that was also stated in the video. but when you want to have the safety of synced writes than the ssd cache will bring up the performance quite a bit. but never up on par with async writes speeds.
Nice video! How do special metadata vdevs factor into this?
Level1Techs did one ua-cam.com/video/QI4SnKAP6cQ/v-deo.html
Hi all. I installed TN Scale on an nvme (256GB). Is there a way to use the spare space in the drive as a cache? also is there any use for that free space? thnx
I didn't hear you mention this aspect. The L2 Cache needs to be faster than the Pool Media or it is a waste. For instance, you wouldn't want a hdd as L2 Cache when your pool is comprised of ssd's.
Thought I did and yes it does need to be faster.
Can you make a video on how to disable it on TrueNAS? It makes my share way slower than a traditional RAID setup. I ended up using Ubuntu Server on RAID, it's way faster on my NVMe RAID setup
My ZFS does not use all my RAM. I am running Truenas Core and i have a ton of unused memory.
So is True Nas Something that ZFS interfaces with on the storage pool side. Or could I implement ZFS and Storage Pools with something like my homeserver using a different OS. Basically I'm saying for redundancy do i need a True NAS "box" and that storage pool tab to implement ZFS
Not clear on your questions but ZFS is a file system and TrueNAS is a good tool to make managing ZFS and everything that connects to ZFS easier than running it all from the command line.
Instant subscribe
Thank you
Can you make a video about HDD spindown and why to not use that? I'm struggling to understand what the extended energymanagement does (Level 1,64,127,128,192,254).
I don't use that feature, I keep them spinning.
@@LAWRENCESYSTEMS Cheers, I now got some experience and having spindown active is really annoying if you want to access data fast but have to wait until the drives are ready.
This is an excellent video! Really made me think twice about some things. AND, got me wondering....where can I find an nvme SSD that is that small? Was it 16GB? Looking through Amazon and I can not find it. Could you please share where did you get it, or give us a link please? Thanks Tom!
Intel Optane M10 16GB is NVMe and small. However its not that quick as it only uses two (I think) lanes
Intel Optanes are best for this. if you want to use "normal" cheap consumer sata/nvme ssds keep in mind the lowest capacity ones are often much slower writing than the bigger ones because they use less memory channels internaly. so while a used 10 year old 32GB SSD will totally be enough in terms of capacity, it may write much slower than the 256GB version of the same drive.
What about storing only metadata on the L2ARC? Is that possible, and would that help the system out on reboot by keeping metadata persistent across reboots?
Not at this time.
Is the worse case scenario of loosing arc (if you go against the recommendation and only use one device) that you loose the 5 seconds of data. Or can you loose data from the storage-pool as well?
No, data committed to the storage pool is safe.
Truenas asked if i wanted a 16gb swap on the boot device. I have a 256gb m.2 disk as a boot device. Should i use 16 for swap? I dont know what this means.
is the RAID(mirror) for write log managed by ZFS or we need separate RAID card ?
The SLOG is managed by ZFS.
I put a 256 gb 860 evo in my server a while back as L2ARC, since i had it laying around. Should I keep it as L2? or make it a slog? Should I remove it completely?
I was kinda confused when I saw a forum post with a guy who had poor performance and someone told him to get a slog
3x4 tb ironwolf 64 gb ram
Is it possible to create a Log partition on the boot pool or boot drives or does Log need to be physical drives , I've got 2x 128GB m.2 drives mirrored for boot pool, they were cheap. Alot of space on there un-used.
I have heard it's possible, not sure it's a great idea as it's not well supported.
I know people always say to not worry about L2ARC or that its a performance loss to use it, but seeing as how if the data isnt in ARC it would have to pull it from the pool (most commonly spinning drives), how is using an L2 disk (like a 1tb nvme) a performance loss vs a spinning sata drive?
As I said in the video it really depends on your workload, but if you have a frequently accessed data that is larger than ARC the L2Arch may help provided it's faster than the data VDEVS.
What nvme are you using that is only 16GB in size!?
I understand that ZFS uses RAM for caching. This might be great for data that you need to be highliy available but probably not for data like the backups of my desktop machine that I want to store on the TrueNAS server. Caching this kind of data inside the RAM seems to be borderline useless. I've just set up a fresh install and the first data that I put on the server was doing that backup of my windows machine. Now 35 Gb of the RAM in my Truenas machine are occupied with that data... Is there a way you could turn of RAM-caching for a dedicated dataset or any other option to manage what data is chosen for RAM-cashing?
It only uses memory you are not using.
@@LAWRENCESYSTEMS Ok, but can I somehow prioritize what data goes into RAM-cash if I have data that needs to be highly availible and other data that doesn't? Wouldn't it make mor sense to cash the data that I neet to access frequently?
Does having Raidz2 have effect on write speed vs Raidz1 when not using log vdev?
Yes, Z2 can be slower as it has more drives in each VDEV to write to.
How i can make a cashe on my ssd in liunix
Where did you get the shirt sir?
lawrence.video/swag
Is it okay that the system will use such a ssd drive at 100%?
It is not good for ssd disks to be filled to the brim.
I have a dumb question... I'm researching truenas for the near future. I want to put portainer using the docker inside truenas. Does the cache works for all the apps inside docker inside truenas? Sorry for my poor explanation, i don't speak english very good...
Caching is a function of ZFS so any applications and data stored on ZFS will benefit from the cache
@@LAWRENCESYSTEMS Thank you!
Would an L2arc cache make sense for a Plex/Jellyfin media server? I would guess no as you probably won’t play the same episode/movie over and over…