Thanks Tom for the video. Really helpful as usual. How did you create the partitions on the NVMe drives, and which partition type did you choose?
Building the ZFS pool creates the partitions.
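For anyone following along, here is a rough sketch of what that can look like; the pool name `tank` and the device names are placeholders, not necessarily what Tom used in the video:

```
# Hypothetical example: a 12-wide raidz2 pool built from whole NVMe disks.
# Prefer /dev/disk/by-id paths in practice so the pool survives device
# renumbering; plain /dev/nvmeXn1 names are used here only for brevity.
zpool create -o ashift=12 tank raidz2 \
  /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1  /dev/nvme3n1 \
  /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1  /dev/nvme7n1 \
  /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1

# Given whole disks, ZFS writes its own GPT label: a large data partition
# (-part1) and a small reserved partition (-part9), so no manual
# partitioning or partition-type choice is needed beforehand.
lsblk /dev/nvme0n1
```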
@Tom, I can see that XCP-ng in 2024 still does not allow a Secure Boot setup, an encrypted root, or any Tang/Clevis setup, by design. Do you know whether that is changing in 8.3? Or maybe encrypted ZFS for the root?
I am not aware of that being on the roadmap, and I don't really understand the use case. I use encryption for some of my VMs but not for any of my XCP-ng hosts.
It sounds like ZFS on XCP-ng works because the underlying operating system ships the basic tools, but that XCP-ng isn't (currently) designed to take advantage of it beyond pointing at a storage path in the normal *nix way, with no integrated management tools and no real leveraging of the ZFS technology. While ZFS as a technology is mature, this integration sounds like a hack or something "in testing" rather than something the XCP-ng folks actually want you to do, versus having a properly integrated ZFS management system (which, IMO, they should). The lack of a GUI for things like scrubbing (automated scrubbing doesn't appear to be covered in this tutorial at all, and it is PARAMOUNT to a healthy ZFS system), snapshots, permissions, and adding and modifying datasets makes this implementation (IMO) not ready for prime time.
As I said, we don't have clients using it in production for the reasons you mentioned.
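On the automated-scrub point raised above: if you run ZFS in the XCP-ng dom0 you do have to schedule scrubs yourself, but a plain cron entry is enough. A minimal sketch, assuming a pool named `tank` (dom0 is CentOS-based, so the paths below are typical but worth verifying on your host):

```
# /etc/cron.d/zfs-scrub -- hypothetical example, not shipped by XCP-ng.
# Scrub the pool every Sunday at 03:00.
0 3 * * 0  root  /usr/sbin/zpool scrub tank

# 'zpool status tank' afterwards shows when the last scrub ran and
# whether it found or repaired any errors.
```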
I'd love to see a discussion about the pros and cons of having VDIs locally on ZFS vs. on TrueNAS. You've mentioned already (and I see in earlier comments) that you're on your own to admin the ZFS pools, so that's a potential con, but are there other cons, and are there pros (aside from VM performance)?
I have a video on that ua-cam.com/video/xTo1F3LUhbE/v-deo.htmlsi=3jPadko3x-SH8LXh
Checking the math here to see if it makes sense:
It looks like you have twelve 1 TB NVMe SSDs.
In a raidz2 array, that should net you ~10 TB of storage, which is roughly what `zfs list` shows.
However, in XCP-ng, when you added the storage, it showed that you only have about 7.85 TiB of space available.
Any ideas as to why the reported size is different and an unexpected value?
Do you already have a VM stored there, or is something going on with the math inside XCP-ng (where it is showing less capacity than you are actually supposed to have)?
I suspect it's enforcing a 20%-ish free space rule plus some slack from ZFS itself (ZFS reserves roughly 1/32 of the pool, about 3%, as slop space). If you fill ZFS over 80% the performance tanks and you can get into lockups with "no free space available" errors even if there is space just because of fragmentation. Similar to btrfs btw. CoW filesystems really suffer if too full.
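For anyone who wants to check this on their own pool, watching capacity and free-space fragmentation, and enforcing some headroom, looks roughly like this (the pool and dataset names are placeholders):

```
# How full the pool is and how fragmented its remaining FREE space is.
zpool list -o name,size,allocated,free,capacity,fragmentation tank

# One common way to keep headroom: cap the dataset holding the VDIs so it
# can never eat the last ~20% of the pool. The value here is illustrative.
zfs set quota=8T tank/vm-storage
```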
It's because he's using enterprise drives. 1 TB enterprise drives are actually 960 GB, and the overhead makes it roughly 894 GB.
That's why you see enterprise drives listed at capacities such as 960 GB, 1.92 TB, and so on. It's all overprovisioning.
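For what it's worth, most of that 960 GB to roughly 894 GB gap is simply decimal versus binary units: a drive sold as 960 GB (960 × 10^9 bytes) is about 894 GiB, which is the figure most CLI tools report. A quick check, assuming exactly 960 × 10^9 bytes:

```
# 960 decimal gigabytes expressed in binary gibibytes (GiB).
echo '960 * 10^9 / 1024^3' | bc -l    # prints ~894.07
```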
@marcogenovesi8570
"If you fill ZFS over 80% the performance tanks and you can get into lockups with "no free space available" errors even if there is space just because of fragmentation."
Varies.
Fragmentation in ZFS, as you probably already know, refers to how fragmented the FREE space is, not how fragmented the OCCUPIED space is.
Both of my ZFS pools on my main Proxmox server were at 94% and 96% of capacity, respectively, until very recently.
You can STILL use it, but it gets VERY, VERY slow, because it's looking for free blocks to write the data to.
"Similar to btrfs btw. CoW filesystems really suffer if too full."
I would imagine that B-tree FS (btrfs) was developed along similar lines to ZFS, although ZFS was officially released out of the OpenSolaris project and mainlined into production with Solaris 10 6/06 (U2), roughly three years before btrfs was mainlined into Linux.
I am, in the background, working on a theoretical proof which aims to demonstrate, via a worked example, that a ZFS array that is no more than 80% full can still reach 100% fragmentation as a result of the copy-on-write nature of ZFS (i.e., even if you're under 80% used, you can still end up with relatively poor performance, because with enough repetitive writes CoW can fragment something that wasn't fragmented initially).
That's a work in progress.
@npham1198
"It’s because he’s using enterprise drives."
[citation needed]
Where is the source for this?
"1 tb enterprise are actually 960gb drives + the overhead makes it roughly 894gb."
That makes no sense.
894 GB * (12 - 2) (for raidz2) = 8,940 GB.
When Tom runs the `zfs list` command at 6'29", it clearly shows that the size of the pool is "10.5T".
Your math does not support what's reported according to `zfs list`.
@ewenchan1239 His livestream. He shows pictures. Also, in the CLI you can see his drives show up as 894.25 GB, which is in line with the hundreds of 960 GB drives we have in production.
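On the 10.5T versus ~7.85 TiB question, it may also be worth double-checking which command reported which number, because they measure different things: `zpool list` reports the raw pool size including parity, while `zfs list` reports estimated usable space after parity and ZFS's internal reservation. A quick comparison, with the pool name as a placeholder:

```
# Raw capacity of all devices, parity included.
zpool list -o name,size,allocated,free tank

# Usable space after raidz2 parity and reservations, which is closer
# to what XCP-ng shows for the SR.
zfs list -o name,used,avail,refer,mountpoint tank
```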
Don't use XCP-ng; it cannot export to OVA. I spent the last two weeks trying to convert an XVA to anything else without corruption in a production environment. It's a lost cause; I have to rebuild a new VM in a new environment.
When I have hit what seems to be a wall, I usually boot into an alternate Linux OS and then use dd to copy the HD image out.
I guess I don't know about XCP-ng directly, but if you are using Xen Orchestra there is an option to export as an OVA directly.
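For the dd route mentioned above, a rough sketch of one way to do it from a live ISO booted inside the guest; the device name, user, host, and paths are assumptions and will differ per setup:

```
# Hypothetical example: stream the guest's virtual disk out over SSH.
# /dev/xvda is the usual first-disk name inside a Xen/XCP-ng guest.
dd if=/dev/xvda bs=4M status=progress | gzip -c | \
  ssh user@backup-host 'cat > /backups/vm-disk.raw.gz'

# On the destination, unpack the raw image and convert it for the new
# hypervisor, e.g. to qcow2:
#   gunzip /backups/vm-disk.raw.gz
#   qemu-img convert -f raw -O qcow2 /backups/vm-disk.raw /backups/vm-disk.qcow2
```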