How to set up a cluster with CephAdm

  • Published 26 Aug 2024
  • We explore using the cephadm tooling to bootstrap and configure a small cluster with three drives and multiple hosts. We also discuss how cephadm administers different resources and shares them between hosts.
    Write up:
    danielpersson....
    Unlock unlimited opportunities with 50% off your first month of Coursera Plus
    imp.i384100.ne...
    Join the channel to get access to more perks:
    / @danielpersson
    Or visit my blog at:
    danielpersson.dev
    Outro music: Sanaas Scylla
    #ceph #cephadm #install

COMMENTS • 88

  • @user-mq3hj3dt9f
    @user-mq3hj3dt9f 2 years ago +1

    You're a wizard! Your level is what I strive for! Tell me, will there be a video about deploying OpenStack using Kolla Ansible?

    • @DanielPersson
      @DanielPersson  2 years ago +2

      Hi Михаил
      Well, not directly at the moment. An all-in-one install of only Kolla requires a machine with 32 GB of memory. Sadly, I don't have those compute resources to spare at the moment. But this week's video will be an install video of OpenStack using Kolla Ansible wrapped with Kayobe.
      I'm still learning about OpenStack so there might be more videos about it in the future.
      Thank you for watching my videos.
      Best regards
      Daniel

  • @LampJustin
    @LampJustin 2 years ago +2

    Yeah finally ^^ Thank you very much!

    • @LampJustin
      @LampJustin 2 years ago +2

      The takeaway is fair. I love using labels, filtering etc., but that's personal preference. The thing is, if you run big clusters you almost always have standardisation, and if you always add nodes or disks according to that, then filters add a real benefit: even someone who has no access to the cluster, but is allowed to add and remove a failed disk, can heal the cluster with zero touch. And that's what Ceph's all about. Also, the Ceph upgrades with cephadm are really sleek!
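
      (For illustration, a drive-group spec of the kind described above might look roughly like the following. The label, service id, file name and device filter are made up for this sketch; adjust them to your own standardisation.)

        # osd-spec.yml -- hypothetical spec: use any unused spinning disk on hosts labelled "osd"
        service_type: osd
        service_id: default_hdd
        placement:
          label: osd
        spec:
          data_devices:
            rotational: 1

        # Hand the spec to the orchestrator; disks added later that match the filter
        # are turned into OSDs automatically.
        ceph orch apply -i osd-spec.yml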

    • @subhobroto
      @subhobroto 2 years ago +1

      @@LampJustin cephadm makes life easy!

    • @LampJustin
      @LampJustin 2 years ago +2

      @@subhobroto Indeed!

  • @subhobroto
    @subhobroto 2 years ago +2

    As the next video, would it be interesting to configure bi-directional synchronization between two Ceph clusters, now that it's so easy to set up Ceph clusters 😁?

    • @DanielPersson
      @DanielPersson  2 years ago +1

      Hi Subhobroto
      I'll look into it, might not be the next video.
      Thank you for watching my videos.
      Best regards
      Daniel

    • @4n7s
      @4n7s 2 years ago +2

      I'd also be interested to know how one could make multiple clusters work as one to store data.

  • @subhobroto
    @subhobroto 2 years ago +1

    This is amazing! Thank you!

  • @samuelajisafe6058
    @samuelajisafe6058 3 months ago +1

    Hello Daniel,
    I set up Ceph using cephadm and I am able to log in to the UI. However, when trying to add another node, I supplied all the required information like the hostname and IP address, but it's been over an hour and the node has still not joined the cluster. How can I troubleshoot this?
    NOTE: I added the public SSH key to the second node.

    • @samuelajisafe6058
      @samuelajisafe6058 3 months ago +1

      I also just discovered that I can't set any options in Create OSD.

    • @DanielPersson
      @DanielPersson  2 months ago

      Hi Samuel.
      Well, joining the cluster depends on whether your main machine can connect as root to the other host. If it can't, it will not work. There are so many reasons this might be failing that I can't troubleshoot it here, but if you start over, it might be something simple you overlooked the first time.
      I hope this helps. Thank you for watching my videos.
      Best regards
      Daniel
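
      (A rough checklist for this situation, assuming the defaults cephadm uses; "node2" and the IP below are placeholders.)

        # 1. Make sure the second node trusts the key cephadm actually uses (not your personal key)
        ceph cephadm get-pub-key > ~/ceph.pub
        ssh-copy-id -f -i ~/ceph.pub root@node2

        # 2. (Re-)add the host and check that it shows up
        ceph orch host add node2 192.168.1.12
        ceph orch host ls

        # 3. See what cephadm itself is complaining about
        ceph log last cephadm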

  • @afz902k
    @afz902k 1 year ago +2

    Hey Daniel, if I already have a couple ceph clusters running, is it feasible to install cephadm on top of them just as a visualization tool? I already have graphs so I don't need that necessarily, and I don't need all the actual administration it does, but I do like its dashboard

    • @DanielPersson
      @DanielPersson  1 year ago +1

      Hi Fergus.
      I would not recommend migrating, but I have made a video about the flow:
      ua-cam.com/video/ZaIy11MY6tQ/v-deo.html
      I hope this helps.
      Best regards
      Daniel

    • @afz902k
      @afz902k 1 year ago +1

      @@DanielPersson thanks! I went and had a watch there too. Overall, nice to know :)

  • @skawashkar
    @skawashkar 4 months ago +1

    One quick question: I don't see ceph.client.admin.keyring and ceph.conf on the other nodes. Should I initiate the bootstrap on the other nodes too?

    • @DanielPersson
      @DanielPersson  4 months ago

      Hi Sujith
      That is a great observation. Sadly, no. When you set up a cluster with cephadm you create a system with access on the first node; the rest of the system runs in containers, and the other nodes don't have the same access.
      There is an image you can deploy for client tasks, or you can use the first node to contact the rest of the system.
      If you want all of them to have similar access, you need to copy the files over and install the ceph-common package.
      I hope this helps. Thank you for watching my videos.
      Best regards
      Daniel
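
      (Concretely, the options mentioned above look something like this; "node2" is a placeholder, and the _admin label shortcut depends on your cephadm version.)

        # On the bootstrap node: open a containerized client shell...
        sudo cephadm shell -- ceph -s

        # ...or copy the config and admin keyring to another node and install the CLI there
        scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring root@node2:/etc/ceph/
        ssh root@node2 'apt install -y ceph-common && ceph -s'

        # Newer cephadm releases can distribute these files for you via the special _admin host label
        ceph orch host label add node2 _admin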

    • @skawashkar
      @skawashkar 4 months ago +1

      @@DanielPersson Thank you very much, Daniel. I have one more question about iSCSI, which I posted on your video about iSCSI block configuration. Any change in iscsi-gateway.cfg causes the rbd-gateway-api service to fail, and even if I revert to the previous configuration it never starts. Also, is it possible to specify multiple pools with different replication types and create iSCSI targets from those pools? Because if I add one more pool to iscsi-gateway.cfg, the rbd-gateway-api service fails and never starts.

    • @DanielPersson
      @DanielPersson  4 months ago

      Hi Sujith.
      Adding more pools and having more RBD images should work just fine. Some companies have thousands of RBD images.
      Best regards
      Daniel

  • @prajinprakash4585
    @prajinprakash4585 1 year ago +1

    For me it's not listing the devices. I only added 1 host, so I now have 2 hosts in total. But I'm unable to add an OSD; it's not listing devices as in this video.

    • @DanielPersson
      @DanielPersson  1 year ago

      Hi Prajin
      You need to have a working cluster before you can add OSDs. Generally it requires three hosts with monitors before it can reach a quorum, so that you can add storage devices. I've built a single-host system for testing, but it's a hassle.
      I hope this helps.
      Thank you for watching my videos.
      Best regards
      Daniel
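
      (Some quick checks before trying to create OSDs; run them inside "cephadm shell" or anywhere the admin keyring is available. The output will of course differ per cluster.)

        ceph -s               # is the cluster healthy, and how many mons are in quorum?
        ceph orch host ls     # are all hosts actually joined?
        ceph orch device ls   # which disks does the orchestrator consider available?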

  • @xiaobaowen2392
    @xiaobaowen2392 2 years ago +1

    amazing

  • @4n7s
    @4n7s 2 years ago +1

    I installed cephadm using sudo apt install -y cephadm; is this practically the same thing?

    • @DanielPersson
      @DanielPersson  2 years ago +1

      Hi Ledimestari
      Yes, the only difference is that the other way works on more distributions. Then again, the package install for Docker was Debian-centric.
      Thank you for watching my videos
      Best regards
      Daniel
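
      (For reference, the distribution-agnostic route is the standalone cephadm script; the URL below follows the pattern documented around the Quincy release, so check the docs for your version.)

        curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
        chmod +x cephadm
        ./cephadm add-repo --release quincy   # optionally add the matching package repository
        ./cephadm install                     # installs cephadm itself as a package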

  • @dwieztro6748
    @dwieztro6748 11 months ago +1

    Hi. What happens if admin1 crashes and needs to be reinstalled from scratch?

    • @DanielPersson
      @DanielPersson  11 months ago

      Hi Dwie
      Well, all monitors should be able to handle the traffic, and you should be able to replace it with another host. If you have multiple hosts installed and one crashes, then another one might take over the role of monitor.
      I'm more familiar with the manual path, where you just create another monitor, add it, and remove the old one; that way you can gradually replace them without actually restarting.
      I hope this helps. Thank you for watching my videos.
      Best regards
      Daniel
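
      (In cephadm terms, the replacement flow sketched above might look like this once the failed machine is reinstalled; the hostname and IP are placeholders.)

        ceph orch host add admin1 192.168.1.10   # re-add the rebuilt host to the cluster
        ceph orch apply mon 3                    # keep three monitors; the surviving mons still hold the cluster map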

  • @IT-Entrepreneur
    @IT-Entrepreneur 1 year ago +1

    What hardware did you use for your Ceph cluster? Raspberries?

    • @DanielPersson
      @DanielPersson  1 year ago

      Hi
      No, I started to install on Raspberry Pis, but the ARM packages aren't that new. I built some of my own, but getting Ceph to run on Raspberry Pis with constrained memory and PCIe lanes seemed to be a big challenge. My cluster at home is running on a couple of mini PCs.
      This video is a bit old
      ua-cam.com/video/B6XXOVcLhzA/v-deo.html
      But I'll go through the early nodes and their capabilities.
      I hope this helps.
      Thank you for watching my videos.
      Best regards
      Daniel

  • @sklise1
    @sklise1 2 years ago +1

    FYI, I'm getting an error when I follow kalaspuffar/cephadm-install.md: "Package docker-ce is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or
    is only available from another source."

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Steve
      Thank you for watching my videos.
      Install instructions are always being updated. You can find recent instructions on Docker's homepage: docs.docker.com/get-docker/
      Best regards
      Daniel

  • @d.s.dahiya3408
    @d.s.dahiya3408 1 year ago +1

    hello Daniel
    Thanks for this detailed video.
    Did you ever try migrating an OSD from one node to another without backfilling or reallocating data, so that the device is automatically recognized by the Ceph cluster on the other node? Is there a way to do that?

    • @DanielPersson
      @DanielPersson  1 year ago +1

      Hi D.S
      The OSD metadata is stored in the LVM volume, so you should be able to move a drive from one node to another and just activate it.
      I have a video about failure recovery.
      ua-cam.com/video/ofxWIaCPO6c/v-deo.html
      I hope this helps. Thank you for watching my videos.
      Best regards
      Daniel
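
      (A sketch of what "just activate it" can look like on the receiving node; the host name is a placeholder, and the cephadm variant should be checked against the docs for your release.)

        # On a bare-metal node: scan local disks for OSD metadata and start what is found
        ceph-volume lvm list
        ceph-volume lvm activate --all

        # On a cephadm-managed node, the orchestrator can adopt existing OSDs on a host
        ceph cephadm osd activate node2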

    • @d.s.dahiya3408
      @d.s.dahiya3408 1 year ago +1

      @@DanielPersson Thanks for the reply. I will try it, but in your case you discarded the old server, whereas my server is part of the cluster from which I am moving one drive to another node that is also part of the cluster. I'll give it a try, and if you have more suggestions, they are welcome.

  • @sandeeppatil5925
    @sandeeppatil5925 1 year ago +1

    Is there no automated way to install Ceph, say via Ansible?

    • @DanielPersson
      @DanielPersson  1 year ago

      Hi Sandeep
      Yes, there is a GitHub project I used back in the day. I don't know if it's still maintained, though.
      Best regards
      Daniel

  • @sukarn001
    @sukarn001 2 years ago +1

    When I run the cephadm bootstrap command I get the error "Verifying IP 192.x.x.x port 3300 ...
    ERROR: [Errno 99] Cannot assign requested address". Please help.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Sukarn
      Thank you for watching my videos.
      Please double-check that you've supplied the right IP address for the current host and that you are root on the machine, so you are allowed to start new services. If that doesn't work, clean up and start over.
      I hope this helps.
      Best regards
      Daniel

    • @tonyley8546
      @tonyley8546 1 year ago +1

      You need to bootstrap from the local host (the IP needs to be an address of the local host itself).
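
      (In other words, pick an address that actually exists on the machine you are bootstrapping; the IP below is a placeholder.)

        ip -4 addr show                            # list the addresses configured on this host
        cephadm bootstrap --mon-ip 192.168.1.11    # must be one of the addresses listed above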

  • @G16MTC
    @G16MTC 2 years ago +1

    Question for you. When I go to the Dashboard Create OSD step, the [Add] button is grayed out (No Devices Available). Any idea what is wrong? Do I need to fdisk the drives? Thanks in advance.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Chuck
      That is strange. If you have drives connected to your hosts that aren't used or formatted, they should be available to add to the cluster. As I mentioned in the video, this whole GUI seems a bit beta, so there might just be a bug that they need to resolve.
      If so, then feel free to open a bug report so they can rectify it. I think this tool might be a good one in a year or two when they have solved all the issues and iterated on the GUI some.
      Thank you for watching my videos.
      Best regards
      Daniel

    • @aniketdc
      @aniketdc 2 years ago +1

      @@DanielPersson I had the same issue. I attached another disk, ran fdisk, and mounted it, but it did not help.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Aniket
      Well that is to be expected. I would rather run fdisk and remove all partitions so you have a clean disk. Then your odds of success would be greater. But then again it could be a bug.
      Best regards
      Daniel

    • @G16MTC
      @G16MTC 2 years ago +1

      Turns out Ceph does not like you adding drives that have existing partitions. I used fdisk to delete the partitions, and Ceph was able to see the drives.
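
      (Two common ways to get a previously partitioned disk back to an "available" state; the host and device names are placeholders, and both commands destroy whatever is on the disk.)

        ceph orch device zap node2 /dev/sdb --force   # let the orchestrator wipe it
        wipefs --all /dev/sdb                         # or clear the signatures directly on the host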

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Chuck
      Probably a bug. Works fine if you do it manually.
      Best regards
      Daniel

  • @TerraMagnus
    @TerraMagnus 2 years ago +1

    I had hopes this might work on the latest Raspbian / Raspberry Pi architecture, but I'm sad to report it fails at installing the cephadm package, which is not available.

    • @TerraMagnus
      @TerraMagnus 2 years ago +1

      I'm happy to say I didn't give up there. I downloaded the cephadm script and managed to get my Raspberry Pi cluster up. It's been the backing store for my Docker Swarm for over a week now, and literally my only regret is that I didn't invest in larger SSDs (4x Raspberry Pi 4 with 1x 480 GB Crucial SSD each as OSD). But I have 4x 1 TB NVMe and 4x 6 TB HDD just "lying around", so I've got work to do to figure out some cheap ways to host them, given that more Raspberry Pis are unobtainable right now.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi TerraMagnus
      Thank you for watching my videos.
      Great to hear that you got it to work. I'm currently running one node on a Raspberry Pi and it works with smaller drives. I've found a set of "slow" 2.5-inch HDDs that are 5 TB in size. They work great in a Raspberry Pi storage node.
      I have more Raspberry Pi hardware on the way, so there will be a video on the topic in the future where I'll try to get it running here. I might reach out with a question if I can't figure it out :)
      Best regards
      Daniel

  • @eibrahimov
    @eibrahimov 2 years ago +1

    Best video about cephadm. Thank you a lot. I have tried this step by step and all is fine, but I set the mon service count to 3 for my hosts and it only runs 1 mon service, on cephadm-1; it doesn't start the other mon services. Are there any tips for this? Thanks for the best video about deploying Ceph.

    • @DanielPersson
      @DanielPersson  2 years ago +1

      Hi Edgar
      Thank you for watching my videos
      Have you tagged each host with the mon label? I know I had some issues in the beginning when I forgot to apply tags to direct placement.
      I hope this helps.
      Best regards
      Daniel
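
      (For example, the label-based placement described here might be set up like this; the host names follow the cephadm-1 naming used in the thread and are otherwise placeholders.)

        ceph orch host label add cephadm-1 mon
        ceph orch host label add cephadm-2 mon
        ceph orch host label add cephadm-3 mon
        ceph orch apply mon --placement="label:mon"

        # or simply request a count and let the orchestrator pick hosts
        ceph orch apply mon 3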

    • @eibrahimov
      @eibrahimov 2 years ago +1

      @@DanielPersson ✌ Yep, I added labels manually to each host and set the unmanaged option ..., 😉 and now they start running.
      Thank you for the fast reply.

    • @RAHULMAHESHWARIdexter
      @RAHULMAHESHWARIdexter 2 years ago +1

      @@DanielPersson Hi Daniel, I added tags for each host, but there is still only 1 mon service, on cephadm-1. Under Daemons for cephadmn2 it shows "Deployed mon.cephadmn2 on host 'cephadmn2'" as running, but only 1 monitor appears in the dashboard. Please help.

    • @RAHULMAHESHWARIdexter
      @RAHULMAHESHWARIdexter 2 years ago +1

      @@eibrahimov How did you set manual labels on each host, and where did you set the unmanaged option? I am unable to find this option.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Rahul
      Thank you for watching my videos. It might take some time before things are deployed, then again there might be some other reason that some services aren't deployed. Perhaps there is a lack of resources.
      Best regards
      Daniel

  • @KostaBukov
    @KostaBukov 2 years ago +1

    Hello Daniel, great cephadm video. Can I ask you one question: we want to install Ceph PoS on bare-metal HPE Synergy servers. We have the HW configuration ready - 3 HPE Synergy 480 Gen10 Compute Module servers (each server with 2x Intel Xeon-Gold 6252 CPUs (2.1 GHz/24-core), 192 GB RAM, 2x 300 GB HDD for the OS, RHEL 8.6 already installed) and we have DAS (direct-attached storage) with 18 x 1.6 TB SSD drives inside. I attached 6 x 1.6 TB SSDs from the DAS to each of the 3 servers (as JBOD). Now I can see these 6 SSDs as 12 devices because the DAS storage has two paths (for redundancy) to the disks (sda, sdb, sdc, sdd, sde, sdf, sdg, sdh, sdi, sdj, sdk, sdl). My question is: shall I configure multipath from the RHEL 8.6 OS (for example sda+sdb = md0), or will cephadm handle the multipath by itself?

    • @DanielPersson
      @DanielPersson  2 years ago +1

      Hi Kosta.
      Thank you for watching my videos.
      I've never worked with DAS storage before, so I think it's more of a question for someone with more knowledge about that solution. When it comes to OSD, my recommendation would always be to keep it as close to the hardware as possible. But how to do that with multipath is not something I've researched so far.
      I see that you have gotten a lot of responses on the mailing list, so I'll add the link here as a reference if someone else has the same question.
      lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/6MB5PXPER2E7ZYQ6KQF75SWPIYGA76CP/
      Best regards
      Daniel

    • @KostaBukov
      @KostaBukov 2 years ago +1

      @@DanielPersson Hello Daniel, thanks for your response. You are right, the ceph-users community helped me with answers about multipath, so I managed to install a 3-node Ceph cluster. Kudos to your great videos, which I followed: I installed the latest Ceph release, Quincy, from scratch on RHEL 8.6 and configured an RBD disk to test the performance with the Bonnie++ tool.

  • @jozefrebjak6209
    @jozefrebjak6209 2 years ago +1

    Nice video. We are now at the stage of deploying Ceph in production. As a first step, our goal is to connect it to vSphere, and we haven't decided whether we will do it with NFS or iSCSI. Can you make a video about these topics? We are especially interested in the NFS ingress service. We need to migrate all data from a SAN to Ceph and then connect Ceph to a new OpenStack cluster.

    • @DanielPersson
      @DanielPersson  2 years ago +1

      Hi Jozef
      I thought NFS was almost deprecated but I might be wrong there.
      As for connection between OpenStack and Ceph that is an ongoing research topic for me but when a video will be done is hard to predict.
      Thank you for watching my videos
      Best regards
      Daniel

    • @jozefrebjak6209
      @jozefrebjak6209 2 years ago +1

      You can also deploy the ingress service to get one VIP across hosts with keepalived and haproxy, but it's only possible with the CLI; there is no option to do it from the dashboard. First you need to deploy the NFS service with the CLI on a different port than the standard 2049 (if I'm not wrong), and then deploy the ingress service for NFS on port 2049.
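
      (A rough shape of that CLI-only flow. The IDs, ports and the VIP below are placeholders, and the exact spec fields and the --port option should be checked against the docs for your Ceph release.)

        # NFS cluster on a non-default backend port
        ceph nfs cluster create mynfs "label:nfs" --port 12049

        # ingress-nfs.yml -- keepalived + haproxy in front of it, listening on 2049
        service_type: ingress
        service_id: nfs.mynfs
        placement:
          label: nfs
        spec:
          backend_service: nfs.mynfs
          frontend_port: 2049
          monitor_port: 9049
          virtual_ip: 10.0.0.100/24

        ceph orch apply -i ingress-nfs.yml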

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Jozef
      That might be the case. I'm just starting my research again. But I thought that there were different solutions to deploy a cluster with Ceph connected as a part of the storage solution with Cinder.
      Best regards
      Daniel

    • @jozefrebjak6209
      @jozefrebjak6209 2 years ago +1

      @@DanielPersson After some testing of NFS behind a VIP, we decided to go with iSCSI for vSphere. Now we have block storage with iSCSI gateways and it works really nicely. With NFS behind a VIP we had a problem creating one datastore and mounting it on all hosts in the cluster, as we do now with NAS. If we mount a Ceph NFS datastore on all hosts in the cluster, every host creates a new datastore, like Host 1 -> datastore-ceph, Host 2 -> datastore-ceph(1), which is strange. What about making a video about benchmarking Ceph? I would really appreciate it.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Jozef.
      I don't know about the NFS implementation, but I guess each host could get their own data area for backups or similar. But I think that might be a question for the user mailing list.
      When it comes to benchmarking Ceph, I would love to do that if I had a, let's say, raspberry pi cluster for external testing. However, as it is now, I have focused on having a home production cluster where I save my backups and testing is done in virtual machines. And to get the correct output from a benchmark, you need a controlled environment to test against, which I don't have.
      But perhaps in the future, when the channel grows and gains me more income, I might use some of it to buy hardware for testing, and then I could benchmark a bunch of stuff and perhaps even run some machine learning tasks in a better environment.
      Best regards
      Daniel

  • @ismaellayth
    @ismaellayth 2 years ago

    I got "mgr is not available" error after executing "cephadm bootstrap --mon-ip " command.
    What is the problem? :(

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi.
      It's not easy to know what the problem is. It takes a while for the servers to come up, but other than that there could be multiple factors that can lead to the manager not being accessible.
      I would check the Ceph logging directory and see if you can find any message explaining why the service isn't starting.
      Thank you for watching my videos.
      Best regards
      Daniel
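
      (A few places to look when bootstrap reports "mgr is not available"; cephadm names its systemd units and log directories after the cluster fsid, hence the pattern below.)

        cephadm ls                                          # which daemons cephadm created on this host
        ls /var/log/ceph/                                   # per-fsid log directory written by cephadm
        journalctl -u 'ceph-*@mgr*' --since "1 hour ago"    # systemd journal for the containerized mgr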

  • @ismaellayth
    @ismaellayth 2 years ago +1

    Hello Daniel,
    First, I would like to thank you for your great content.
    I have faced an issue with cephadm being unable to bring some OSDs up after a host reboot.
    My Ceph cluster version is Pacific 16.2.10, deployed on Ubuntu 20.04 LTS servers with Docker version 20.10.12.
    I tried restarting/redeploying the OSDs using the dashboard service and also restarting their service with systemctl.
    After 6 hours of trying, I couldn't bring them up, so I deleted/recreated them to get HEALTH_OK again.
    I suspect that there is a permission error with the ceph user managing the Docker containers.
    This is my current ceph user entry in /etc/passwd:
    ceph:x:64045:64045:Ceph storage service:/var/lib/ceph:/usr/sbin/nologin
    I searched on it and some folks suggest changing it to
    ceph:x:167:167:Ceph storage service:/var/lib/ceph:/usr/sbin/nologin
    Do you recommend that?
    Regards

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi اسماعيل ليث فهمي •
      Thank you for watching my videos.
      No, I would not recommend changing the UserID and GroupID in your passwd file.
      First, directories on disk will use these IDs to figure out who should be able to access them, and a mismatch could be really confusing. Moreover, you have the group specified in the group file with that ID which will make things even more confusing.
      There might be a way to change the IDs for a user or group from some tooling, but if you want them to be different, you need to go through your drive and figure out which resources are referencing these IDs and update all of them which might be doable but seems a bit cumbersome if not needed.
      I hope this helps
      Best regards
      Daniel

    • @ismaellayth
      @ismaellayth 2 years ago +1

      @@DanielPersson
      Hello Daniel,
      Is 64045 the default UserID and GroupID for the ceph user on a cephadm cluster?
      And what should I do if I face other OSDs refusing to come up?
      Thank you

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi اسماعيل ليث فهمي •
      No, not to my knowledge. I've read a lot of documentation and there is no mention of any specific ID as a default value. It might be something that usually happens when you install a system, but those values could change depending on other services that you've installed on your system.
      The approach you followed is a valid way of resetting an OSD. It all depends on the information you are able to get. If you can fetch the logs of the services and figure out why the service refuses to start, then you can take action on that. Maybe cephadm will obscure that information from you, but you should be able to find the information you need if you can manage a dockerized cluster.
      This topic is a bit interesting to me so there might be a video on this topic in the future.
      Best regards
      Daniel

  • @hendranatasaputra2826
    @hendranatasaputra2826 2 years ago +1

    After a reboot of node2, 1 OSD under node2 can't come up. Its status is IN, but how do I bring it back to UP?
    systemctl status ceph-osd@0 is not working.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Hendranata
      Each OSD has an ID starting at 0, so maybe the second node has OSD 1. If that is the case, I would try `systemctl status ceph-osd@1` instead. If this doesn't work, look into the directory `/var/lib/ceph/osd`; there you should find the data dirs for your OSDs. They will be named [cluster_name]-[osd_id].
      I hope this helps. Thank you for watching my videos.
      Best regards
      Daniel

    • @hendranatasaputra2826
      @hendranatasaputra2826 2 years ago +1

      @@DanielPersson Yes, it is osd.2.
      I have performed "systemctl status ceph-osd@2" and get the error: Unit ceph-osd@2.service could not be found.
      We even tried osd@1, osd@2 and osd@3, and they all give the same error.
      After some research, I am able to check the status using this command:
      systemctl status ceph-a990f8b2-f5e5-11ec-8449-0800271c1c67@2.service
      but I'm not able to perform the start action. Any idea?

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Hendranata
      As I said in the video this solution is probably not ready for production as there still are a lot of questions that need answering when you want to run this in a managed form. But usually, in normal environments, you should have log files in `/var/log/ceph` where you can read what is going wrong during startup. I'm not sure if the OSD docker environment will mount this directory.
      They have some documentation on how to watch and monitor your cephadm cluster
      docs.ceph.com/en/quincy/cephadm/operations/
      This might help. Another thing you can try is to stop the service from the GUI and then try restarting it.
      I hope this helps.
      Best regards
      Daniel
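
      (For a cephadm-managed OSD, these handles are often more convenient than raw systemctl; the daemon id osd.2 is taken from the thread above.)

        ceph orch ps                      # what cephadm thinks is running, and on which host
        ceph orch daemon restart osd.2    # restart the daemon through the orchestrator
        cephadm logs --name osd.2         # journal output of the containerized OSD, run on its host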

    • @hendranatasaputra2826
      @hendranatasaputra2826 2 years ago +1

      @@DanielPersson So using cephadm is actually not a good way to deploy a production cluster right now? I appreciate your responses.

    • @DanielPersson
      @DanielPersson  2 years ago

      Hi Hendranata
      The opinion I have is a bit subjective. I'm currently running Pacific at home and ran Octopus at work. After a lot of testing, I upgraded our work cluster to Pacific.
      I would not dream of upgrading to Quincy at work for at least a year. But I will try it at home before that. Having an outage in production that I can't debug and solve quickly would be really costly. If I lose my cluster at home with all my YouTube backups, I would be sad, but I'd survive. That's what external backups are for.
      At the moment, cephadm is a little bit over 2 years old. So that is how long the cephadm concept has been available. And they have made great strides to improve it. And if you have a really large cluster, it might be a good solution. But in that case, you can probably lose a couple of OSDs easily and just throw them away and rebuild them without impacting operation.
      And in these large clusters, you probably have hundreds of machines, and it would be untenable to handle all of them manually. I'm not sure how large your cluster is. But my home cluster is just 4 machines, so in that case, I can just upgrade the system manually and keep track of the process. I will have full access to all logs and processes. And I have even been able to restore an OSD after a critical system upgrade that went horribly wrong (ua-cam.com/video/ofxWIaCPO6c/v-deo.html). So I have a different set of priorities.
      At work, we have a more extensive cluster with a lot more OSDs, but maintenance is still not an unbearable task when I need to upgrade or move OSDs around or fix disks.
      Cephadm probably has options where you can access the running docker containers to get logs and resolve issues that arise, but I don't need it as I run bare-metal instead. Our servers need the processing and memory to run the services, and I don't want the overhead of dockerizing the services. I know that the overhead is minimal, but there is one.
      So the option is up to you if you want to run a manual cluster or if you want to run with cephadm. This video was more of an introduction to talking about what cephadm is and how to install it. Not a comprehensive guide to working with Ceph.
      I have a 5 parts series of installing Ceph bare-metal, not because it's that much harder but because I think it's important to understand the system before using it.
      With this, I'm not saying I will not run cephadm in production in 2025 or 2035. It might be stable and understood by me then, and our cluster might have grown to a size where we require it. But my stance on smaller clusters is that you don't need the complexity of an automatic deploying system.
      I hope this helps.
      Best regards
      Daniel

  • @jimallen8238
    @jimallen8238 1 year ago +2

    Thank you for talking us through this setup. While this was helpful, it was hard for me to understand and follow along. I think the primary reason was you explained “how” but very little of “what” you were doing or “why”.

    • @DanielPersson
      @DanielPersson  1 year ago

      Hi Jim
      You are right; when it comes to automatic processes it can be complicated to cover all the bases, and this video was a revisit of the topic, so I might have skipped over some things that were covered earlier. Maybe the first video can help?
      ua-cam.com/video/ENsfa8WB6EI/v-deo.html
      Best regards
      Daniel

    • @jimallen8238
      @jimallen8238 1 year ago +1

      @@DanielPersson Thank you.