Architecture and Build - Building a ZFS Storage Server for a Mixed Workload

  • Published Jul 23, 2024
  • If you have been a fan of ours for many years, you will remember that we started out doing longer-form educational videos focusing on our products and services. These long-form videos are back!
    This "Architecture & Build" video series will showcase real-life storage builds and scenarios that are designed on a whiteboard and then built with hardware in our lab.
    For our second video in the series, Brett and Mitch architect a single-server ZFS solution that will perform best for mixed workloads. In the lab, they build this setup with a Storinator Hybrid using a mix of HDDs and SSDs and complete the administration using our internal Houston software platform.
    Chapters:
    00:00 - Introduction
    01:27 - Storage Requirements for Small to Medium-Sized Businesses
    02:05 - Introducing ZFS as a Solution
    02:32 - What is the required workload for this ZFS system?
    04:46 - The Hardware
    06:44 - Architecting a ZFS Storage Server for a Mixed Workload
    22:40 - Building a ZFS Storage Server for a Mixed Workload
    25:14 - Creating the RAIDZ2 Vdevs (storage pool)
    29:44 - Adding Additional Virtual Devices
    31:10 - Adding a Special Vdev
    33:38 - Creating/Configuring File Systems
    37:22 - Outro (Including a Look at ZFS Send & Receive)
    Check out our "Understanding the Basics ZFS" article: www.45drives.com/community/ar...
    Check out our bi-weekly tech tip video series: • Tuesday Tech Tips
    Visit our website: www.45drives.com/
    Check out our GitHub: github.com/45drives
    Read our Knowledgebase for technical articles: knowledgebase.45drives.com/
    Check out our blog: 45drives.blogspot.com/
    Single Server Buying Guide: www.45drives.com/products/net...
    Single Server Questionnaire: www.45drives.com/products/net...
    Ceph Clustering Buying Guide: www.45drives.com/solutions/di...
    Have a discussion on our subreddit: / 45drives
    #45drives #storinator #stornado #storageserver #serverstorage #storagenas #nasstorage #networkattachedstorage #petabytestorage #cluster #cephstorage #cephcluster #veeam #veeamrepository #proxmox #zfs #zfsstorage
  • Science & Technology

COMMENTS • 52

  • @wbrace2276
    @wbrace2276 2 years ago +14

    While I know this is not the point of the video, it honestly cracks me up watching two storage experts play the role of the customer, acting like they don't know what's going on. Pure comedy and I love it. Ranks right up there with Doug's love for his colored dry erase markers.

  • @tariq4846
    @tariq4846 2 months ago

    This is what I have been looking for for many days. I was searching various blogs and forums to find a way to use HDDs to host my Proxmox.

  • @sbagel95
    @sbagel95 2 months ago

    I love this video, you guys make learning ZFS very entertaining.

  • @kiefffrc2386
    @kiefffrc2386 2 years ago +7

    I'm 14 minutes in and I have to comment. Thank you for the knowledge in this video. As much as I know about ZFS, it's always nice to get a lesson from two brilliant guys that know it very well. I mostly run ZFS at home for my ESXi backend/all-in-one box. I wish I got to play with ZFS more in the wild, but I work at an MSP, so it's not really a solution they would sell, for their various reasons. OK, I'm gonna continue, but thanks again, gents.

  • @Pariah902
    @Pariah902 2 years ago +4

    Love this kind of format, thanks a lot!

  • @mdd1963
    @mdd1963 2 years ago +4

    this is some damn good info, thanks guys!!

  • @MichealG
    @MichealG 2 years ago +1

    Finally something other than Ceph, phew! Thank you 🙏

  • @nono_ct200
    @nono_ct200 2 years ago +2

    Great job, love it. Storinators are very nice 😊

  • @DaniloMussolini
    @DaniloMussolini 2 years ago

    Very nice info guys! Thanks for the video.

  • @DJRhinofart
    @DJRhinofart 7 months ago

    LOL You can totally hear the Canadian East Coast accent in there. Subbed, and loved it.

  • @jsaenzMusic
    @jsaenzMusic 4 months ago

    Curious why RAIDZ2 instead of a bunch of mirrored vdevs. Has your experience with a second drive going bad while the replacement resilvers happened that often? So thankful for the knowledge and experience you guys are sharing. I loved how this video tied all the special vdevs and ZFS components into a practical build!

  • @jdeee.mp3
    @jdeee.mp3 2 years ago +2

    Worth noting: if you add a special vdev, it's best to mirror it, or perhaps use two mirrors (striped, RAID10-style), because if that special vdev fails you will lose the entire pool!

    • @jdeee.mp3
      @jdeee.mp3 2 years ago

      And maybe because this is a new feature that doesn't have all the kinks worked out, I have found during testing that when the special vdev fills up, it will just start writing new data to the HDD vdev. And it will not rearrange that data in the future, e.g. to promote frequently used data or metadata to the special vdev. I found that setting the special vdev to only store metadata was the best use in light of this.
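      For reference, a minimal sketch of that metadata-only setup with the standard OpenZFS tools, assuming a hypothetical pool named "tank" and hypothetical device names; the special_small_blocks dataset property controls whether small data blocks are allowed onto the special vdev (0 means metadata only):

        # Add a mirrored special vdev (hypothetical NVMe device names)
        zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

        # Metadata only: no small data blocks are redirected to the special vdev
        zfs set special_small_blocks=0 tank

        # Or also send data blocks of 16K and smaller to the special vdev
        zfs set special_small_blocks=16K tank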

    • @mitcHELLOworld
      @mitcHELLOworld 2 years ago +3

      @@jdeee.mp3 We actually explain both of these things in the video :) That special vdev behaviour is actually by design. Since its first and most important use case is metadata, once it starts to fill up it will eventually cut off the small-block writes and send them back to the HDDs to ensure that it has enough space for the metadata.

    • @jdeee.mp3
      @jdeee.mp3 2 years ago +1

      @@mitcHELLOworld Ok good to know, thanks. Edit: Must not have been paying attention. Re-watching the video I did see that you mentioned it!

    • @mitcHELLOworld
      @mitcHELLOworld 2 years ago +1

      @@jdeee.mp3 Yeah, I believe the threshold is around 75%, but don't quote me on it. It can be tuned also! Thanks for commenting and watching!
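      For anyone who wants to chase that threshold down: in current OpenZFS it appears to be governed by the zfs_special_class_metadata_reserve_pct module parameter, which keeps a percentage of the special vdev (default 25%) reserved for metadata, so small-block writes fall back to the HDDs at roughly 75% full. A hedged sketch of checking and tuning it on Linux:

        # Current reservation, as a percentage of special vdev capacity
        cat /sys/module/zfs/parameters/zfs_special_class_metadata_reserve_pct

        # Reserve more headroom for metadata, e.g. 40% (small blocks then stop at ~60% full)
        echo 40 > /sys/module/zfs/parameters/zfs_special_class_metadata_reserve_pct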

  • @ws2940
    @ws2940 2 years ago +2

    I understand some of this and am yet further confused, but that is OK :P A couple of questions: Where does the OS that controls all of this live? Is it on the platter-and-spindle disks, on the SSDs, or on the NVMe devices? Also, how do you access the drives for replacement? Does the unit slide out with enough slack on the power/network cabling so you can take the top off and replace the faulty drive while the device is up? Or do you leave space above the device in the rack so that you can take the top off to access the drives? Or do you have to take the device offline to replace a faulty disk?

  • @vgshadow3112
    @vgshadow3112 1 year ago

    Amazing video about ZFS storage.
    I like that ZIL; it should have an NVRAM battery, I guess.

  • @ClifBridegum
    @ClifBridegum 5 months ago

    This is my exact scenario of needing a system for mixed use: a small SMB, roughly 15 users. The system will do file storage (SMB, mostly MS Office files and some images), host a few VMs (domain controller, DB server), and a database for client appointments. Needs are small enough that I can just go all flash, something like 8x 4TB SSDs. Do you think a SLOG or special cache would be beneficial in my case? If so, I guess NVMe SSD? I can do 10G or 25G NICs.

  • @bertnijhof5413
    @bertnijhof5413 8 months ago

    The LOG seems huge: 4 SSDs, typically for millionaires without any financial constraints. Since 2018 my LOG has been an SSD partition of 5GB, enough to cover all my writes for 5 seconds :) The interesting part of the video was about the special device for metadata and small files; maybe I will use my 30GB SSD partition for that purpose instead of using it as L2ARC :) I assume that, like on the HDD data pool, they will write all metadata twice.
    My ZFS system is a $349 desktop: Ryzen 3 2200G, 16GB DDR4, and a 512GB NVMe SSD (3400/2300MB/s) as the ZFS boot device and to store my main VMs. The other storage is 2 data pools:
    - one for VMs on the first 1TB partition of the HDD, with an L2ARC partition of 90GB and that LOG of 5GB, and
    - the other for my data on the last 1TB partition of the HDD, with an L2ARC partition of 30GB and a LOG of 3GB.
    All L2ARCs and LOGs together are on the 128GB of my SATA SSD.
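    A rough sizing rule behind that "5 seconds of writes" figure: the SLOG only has to absorb the sync writes that arrive between transaction-group commits, which by default happen every zfs_txg_timeout seconds (5). A hedged back-of-envelope check on Linux:

      # Transaction-group commit interval in seconds (default 5)
      cat /sys/module/zfs/parameters/zfs_txg_timeout

      # 10 GbE ingest is roughly 1250 MB/s; double it for margin
      echo "$(( 1250 * 5 * 2 )) MB of SLOG is ample for a fully saturated 10 GbE link"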

  • @jamesbland8919
    @jamesbland8919 2 years ago

    Great video guys!
    I'm actually building out a Q30 for mixed use now and found this very helpful. My build-out is very similar, except that we have historically kept our database data in the VM stack. Are you presenting the DB storage as iSCSI LUNs to your VMs?
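    Not answered in this thread, but for reference: one way to present database storage to VMs as an iSCSI LUN is a ZVOL exported through the Linux LIO target. A hedged sketch, assuming a hypothetical pool "tank" and a placeholder IQN (ACL setup omitted):

      # ZVOL sized for the database; 16K volblocksize is a common match for DB page sizes
      zfs create -V 500G -o volblocksize=16K tank/dbvol

      # Export the ZVOL (it appears under /dev/zvol/) via targetcli
      targetcli /backstores/block create name=dbvol dev=/dev/zvol/tank/dbvol
      targetcli /iscsi create iqn.2024-07.com.example:dbvol
      targetcli /iscsi/iqn.2024-07.com.example:dbvol/tpg1/luns create /backstores/block/dbvol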

  • @richardbennett4365
    @richardbennett4365 1 year ago

    So, so informative and HELPFUL, guys.
    I am in the beginning stages of setting up a home NAS, and I'll be using TrueNAS CORE, too.
    A couple of questions, please. First, what I have in terms of hardware: 4x 1TB M.2 NVMe drives, 2 SATA III SSDs, and a mirrored zpool of 2x 6TB NAS-grade HDDs (WD Red Plus) already hosting the data for an existing Nextcloud server I'm running on another machine.
    1) Can one have the ZIL (SLOG) and the ARC (or L2ARC) on the same SATA III SSD, but with separate partitions for each, like sdx1 and sdx2?
    2) What's the difference between SCALE's apps and CORE's plugins? Does the Nextcloud plugin on CORE start up and run a Docker instance of Nextcloud, or the snap version of Nextcloud, or a plain server not in Docker but in its own VM, or something else like a jail on the boot disk where the TrueNAS CORE OS is running?
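    On question 1: yes, a SLOG and an L2ARC can live on separate partitions of the same SSD, though a single non-mirrored SLOG device is a small risk for in-flight sync writes. A minimal sketch with hypothetical pool and partition names:

      # SLOG on one partition, L2ARC on another, of the same SATA SSD
      zpool add tank log /dev/sdx1
      zpool add tank cache /dev/sdx2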

  • @Solkre82
    @Solkre82 2 years ago +1

    I love this stuff, I wish I could work on it professionally. I'd end up spending thousands for a home setup like this to run my Plex and like 1 VM... sigh

  • @ackwood-it
    @ackwood-it 1 year ago

    Hi guys,
    I don't quite understand why you only see the recordsize when setting up the vdevs. I haven't seen anything about a block size (ashift) setting. What are these for the different workloads such as databases, VMs, and shares? Or what are the recommendations for this?
    Thanks
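    For reference (not from the video): ashift is fixed per vdev at creation time, while recordsize (file systems) and volblocksize (ZVOLs) are set per dataset afterwards. A hedged sketch of commonly suggested values, assuming a hypothetical pool "tank" and device names:

      # ashift=12 (4K sectors) suits most modern HDDs and SSDs; chosen at vdev creation
      zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd sde sdf

      # Databases: match the DB page size, e.g. 16K
      zfs create -o recordsize=16K tank/db

      # VM disks as ZVOLs use volblocksize instead of recordsize
      zfs create -V 200G -o volblocksize=64K tank/vm-disk1

      # General file shares: large records favour streaming throughput
      zfs create -o recordsize=1M tank/share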

  • @dylanbob9313
    @dylanbob9313 2 years ago +1

    Sick Khamzat shirt

  • @gurkancekic9057
    @gurkancekic9057 4 months ago

    Thank you for the lab presentation of your system. My question is how the system in the presentation would be coupled to a host as a datastore: fabric, iSCSI, or just NFS? Thank you.

    • @gurkancekic9057
      @gurkancekic9057 4 months ago

      Pardon, I found the answer myself on the replay; Michelle has already mentioned that it can be hooked up to Proxmox via iSCSI.

  • @pivot3india
    @pivot3india 1 year ago

    Can we do high availability for ZFS (iSCSI) using Ceph?

  • @phychmasher
    @phychmasher 2 years ago

    How durable are those SATA SSDs? Would it have made sense to have 2 of them be the SLOG and the other two be your DDT? I'm thinking that datastore could make great use of dedupe.

    • @45Drives
      @45Drives 2 years ago +2

      The Micron 5300s we use are definitely robust and reliable, certainly enough to be used as a SLOG or special vdev. In general, though, a SLOG will only be useful in very specific circumstances, generally when your workload involves a lot of sync writes (databases, VM hosting); often, adding one will not provide any significant performance benefit.
      As for your DDT/special vdev, you can certainly add SSDs as these devices, but you'll generally get more benefit out of using NVMe for these purposes. As a note, we recommend against dedupe, as the storage-efficiency benefit is usually fairly small but the performance impact can be substantial. You will also want solid resiliency on a special vdev; we'd recommend at least a 3-way mirror rather than 2-way. If your special vdev fails, so does the rest of your pool.
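      In command form, that 3-way-mirrored special vdev would look roughly like this (hypothetical pool and NVMe device names):

        # If the special vdev is lost, the whole pool is lost, hence the 3-way mirror
        zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1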

  • @nickway_
    @nickway_ 1 year ago

    When you added your SLOG @30:38 and selected the 4 SSDs as a mirror, did it create a 4-way mirror, or stripe+mirror?

    • @45Drives
      @45Drives 1 year ago +1

      It created a single 4-drive mirror. Great catch. You probably don't need to go that crazy with a SLOG; you would get better IOPS by creating the first 2 disks as a mirror and then going back and adding another LOG vdev with the other 2 disks. This would result in two 2-drive mirror vdevs in your SLOG.
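      In command form, that pair of 2-drive LOG mirrors would look roughly like this (hypothetical pool and device names); writes then stripe across the two mirrors:

        zpool add tank log mirror /dev/sda /dev/sdb
        zpool add tank log mirror /dev/sdc /dev/sdd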

  • @tigerfan525
    @tigerfan525 1 year ago

    "Is that eight?"... nope, add another line and now its 7. In all seriousness, great video.

  • @chrismoore9997
    @chrismoore9997 2 years ago +2

    What OS is your NAS using? I don't recognize the menu system.

    • @mitcHELLOworld
      @mitcHELLOworld 2 years ago +2

      The OS we are using is actually just Rocky Linux, but the menu system is our own Houston UI! It is totally free and open source and easy to install if you'd like to try it. We officially support Ubuntu and Rocky... Might be some small hoops to jump through for some other Debian or RHEL derivatives :)
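      Houston sits on top of Cockpit, so a hedged sketch of the base install on Rocky Linux would be the standard Cockpit setup below, with the Houston modules then added from the repositories linked at github.com/45drives:

        # Cockpit base; the web UI listens on https://<host>:9090
        dnf install -y cockpit
        systemctl enable --now cockpit.socket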

    • @ws2940
      @ws2940 2 years ago +1

      @@mitcHELLOworld thank you for the answer. My bad, I asked this question a few comments up.

  • @bonneywatson9223
    @bonneywatson9223 1 year ago

    Amen 🙏

  • @JonnyDudemeister
    @JonnyDudemeister 2 years ago

    Could you do an architecture and build video on a dual-controller ZFS storage server? I would really like to see a setup with higher availability than recovering from a replica, but without having to build a 3+ node Ceph cluster.

    • @45Drives
      @45Drives 2 years ago +1

      We'll add it to the queue

  • @nono_ct200
    @nono_ct200 2 years ago

    Dear 45Drives team, regarding mixed workloads: I'm currently planning a Storinator with Houston UI for SMB file shares for an Active Directory, in combination with Proxmox Backup Server on a separate drive array, but everything within the same Storinator. Maybe your team has some experience with this config and is able to share the key configurations with me/us. I've tried installing Ubuntu Server 20.04 with Houston UI and adding PBS, but this did not work; from the other side it didn't work either (installing PBS and adding Houston UI). Maybe you have had this use case before.

    • @45Drives
      @45Drives 2 years ago +2

      Proxmox Backup Server runs on Debian; I can't imagine it would run on Ubuntu without significant modification, if it's even possible. You could install Proxmox Backup Server off their official ISO, then install Cockpit and all of our Houston modules on top. We do not officially package for Debian, but the Ubuntu packages may work with minimal modifications.
      Generally this is not a use case we're familiar with, as having backup infrastructure and file-sharing infrastructure on the same device is not our recommendation. Backup infrastructure should always be on its own gear.
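      A hedged sketch of that layering on a stock PBS host (PBS is Debian-based and Cockpit is in the Debian repositories); the Houston modules would then be installed from the 45Drives packages, possibly with tweaks since Debian is not officially supported:

        apt update
        apt install -y cockpit
        systemctl enable --now cockpit.socket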

    • @nono_ct200
      @nono_ct200 2 years ago +1

      @@45Drives ok thanks for the clarification 👌

  • @gngui
    @gngui 1 year ago +1

    Do I need the special VDEV if I have enough RAM?

    • @45Drives
      @45Drives 1 year ago +1

      Hey Gerald,
      Appreciate the question!
      If you have lots of RAM, you can probably get away with no special vdev. However, you will want to watch the ARC hit rate: if it is up north of 90%, then your ARC is serving most of the I/O requests, and a special vdev will only help with the other 10%, the colder data on the zpool.
      The special vdev is only going to serve metadata faster. So unless faster listing times and/or very fast searches are crucial to your workflow, keeping your pool simple, with lots of RAM and regular vdevs only, is a great choice.
      If you check out our video on NVMe special devices ( ua-cam.com/video/0aM1iZJkOaA/v-deo.html ), you will see that even a regular HDD pool can serve metadata workloads pretty darn fast with just spinners and a good ARC.
      Hope this answer clears things up for you.
      Thanks again!
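      A quick sketch of checking that ARC hit rate on Linux with the stock OpenZFS tooling:

        # Live ARC statistics sampled every 5 seconds (arcstat.py on older packages)
        arcstat 5

        # Or compute the cumulative hit rate from the kernel counters
        awk '/^hits /{h=$3} /^misses /{m=$3} END {printf "ARC hit rate: %.1f%%\n", 100*h/(h+m)}' \
            /proc/spl/kstat/zfs/arcstats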

  • @richardbennett4365
    @richardbennett4365 1 year ago

    The most assurance on ZFS would be a mirrored drive setup.
    It is NOT copy-on-write! Don't have a COW.
    ZFS is redirect-on-write, like the B-tree file system (Btrfs).

  • @Alee20300
    @Alee20300 3 months ago

    L2ARC is not required?

  • @helderfilho4724
    @helderfilho4724 7 days ago

    19:24 sounds like AccuBattery (:

  • @RamaOlama
    @RamaOlama 3 months ago +1

    You'll honestly lose your jobs in 5 years, when U.3/E1 NVMe drives get a lot faster. ZFS is already a bottleneck on something like 8x Micron 7450 MAX.
    We have CPUs now with 128 PCIe 5.0 lanes; that makes for a big number of ultra-fast SSDs for almost all companies. If ZFS/Ceph becomes a bottleneck, no one is going to use it anymore.
    Cheers

    • @sbagel95
      @sbagel95 1 month ago

      Can you explain this more?

    • @RamaOlama
      @RamaOlama 1 month ago

      @@sbagel95 There is not much to explain. ZVOLs (the ZFS block devices you use to serve ZFS over iSCSI) are 20 times slower than every alternative that exists. You can share the whole pool as it is over whatever you like; for simplicity let's take iSCSI, and it's still at least 2x slower than any alternative.
      To put it differently, ZFS is currently the slowest file and block storage for NVMe drives, simply because of the extreme ZIL and cache overhead.
      For spinning drives, which ZFS was developed for, it is a superior filesystem.
      To explain this in detail, a YouTube comment wouldn't be enough.
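      For anyone who wants to test that claim rather than take it on faith, a hedged sketch of an apples-to-apples fio run against a ZVOL and against a raw NVMe namespace (hypothetical device paths; the raw-device run is destructive):

        # 4K random writes against a ZVOL
        fio --name=zvol-test --filename=/dev/zvol/tank/bench --rw=randwrite --bs=4k \
            --iodepth=32 --numjobs=4 --direct=1 --ioengine=libaio \
            --runtime=60 --time_based --group_reporting

        # Same workload against a raw NVMe namespace for comparison (wipes that device!)
        fio --name=raw-test --filename=/dev/nvme0n1 --rw=randwrite --bs=4k \
            --iodepth=32 --numjobs=4 --direct=1 --ioengine=libaio \
            --runtime=60 --time_based --group_reporting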