Manual install of Ceph part 1 - Cluster backbone
- Published Sep 12, 2024
- We will walk through the process of manually installing a Ceph cluster. This gives you a better overview of how Ceph works, and we will discuss different concepts used when running a cluster.
Write up:
danielpersson....
Join the channel to get access to more perks:
/ @danielpersson
Or visit my blog at:
danielpersson.dev
Outro music: Sanaas Scylla
#ceph #manual #backbone
Great tutorials about Ceph. Curious about the next part! I will definitely try to set up a little cluster. :-)
This is massively helpful! Thank you so much for putting this together along with the gist. Super appreciated!
Ceph is awesome, and will help bring 7B- and 12B-parameter AI models to home computing without the use of quantization, along with the Pi 5 and Kubernetes.
Really helpful..!! Thank you for illustrating it in such a great manner.
Great tutorials about ceph. Thank you
Also, I have another question. Why do you create the monmap and mon keyring in the /tmp directory? They will be removed from the system after a reboot. Isn't it necessary for them to remain on the system for the future?
Hi
No, the whole point is to have them be temporary. They will be stored inside the cluster as an integral part of it, and if you ever need to add more monitors you can ask the cluster for them, but that is an operation you plan and do seldom.
At work I had to replace the IP addresses of the monitors, an operation described in the documentation as something you just don't do, but there is a procedure for it: add new monitors and remove the old ones, keeping the cluster in quorum over the whole procedure. I have a video for that as well.
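For reference, the temporary monmap built in /tmp at bootstrap is only an input file; the authoritative copy lives inside the cluster and can be re-exported at any time. A sketch, assuming a working cluster and an admin keyring on the host:

```shell
# Ask the monitors in quorum for the current, authoritative monitor map.
ceph mon getmap -o /tmp/monmap
# Inspect it: epoch, fsid, and the address of every monitor.
monmaptool --print /tmp/monmap
```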
Thank you for watching my videos.
Best regards
Daniel
Great tutorials! I love it. Can you make a tutorial on how to customize the KV engine of BlueStore (for example, using RocksDB built from modified source code) or other storage engines?
Hi 栋 黄
Thank you for watching my videos.
I need to do some research on the subject to answer that, but I will put it in my idea bank to look into it in the future.
Best regards
Daniel
I have a Ceph cluster set up on AWS EC2 and am able to mount RBDs using the private IPs of the instances running monitors, but not using the public IPs. What am I missing?
Hi Kamil
Well, there are a couple of port requirements. First off, you need to expose your monitors so that a connection can be made, but the data is streamed directly from the OSDs, so ports in the 6800-7300 range also need to be visible as public resources. The exact ports can be found in the Ceph documentation.
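As a rough sketch of the firewall side (port numbers are the Ceph defaults; ufw is just one example, and on EC2 the equivalent would be security group rules on the public interface):

```shell
ufw allow 3300/tcp       # monitor, msgr2 protocol
ufw allow 6789/tcp       # monitor, legacy msgr1 protocol
ufw allow 6800:7300/tcp  # default OSD/MGR daemon port range
```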
I hope this helps.
Best regards
Daniel
Hey Daniel, how are you doing? Great video! I just followed this tutorial today with Ceph version 17 (Quincy). I noticed one issue: when I load the dashboard, I get messages in the browser that the main.js and style.css files are not of the correct size. It seems ceph-mgr is timing out after 20 seconds. Can we update this timeout value somewhere, to give the browser more time to download the complete JS and CSS files?
Warm regards,
Guus
Hi Guus.
I think that might be complicated. Maybe you could change the timeout defaults in your browser, or rebuild the manager if that is required.
But the JS files are not large at all, so there is probably something else going on. Have you tried a reinstall?
Thank you for watching my videos.
Best regards
Daniel
@DanielPersson I think it might be related to the fact that I'm currently using some hotel WiFi...
Your videos are great!
Warm regards,
Guus
I tried this on a new (fresh installed) debian11 system as root:
"apt install ceph ceph-common" --> "packages have unmet dependencies"
How can this be? 🤔
Hi Against.
Great question. I've never installed them without adding the repositories to get the latest packages. But I guess they haven't packaged a working version. Debian usually packages a really old version of Ceph, say 14 or something, and that version is not built for Bullseye. The first working version is Pacific, I believe.
I hope this helps.
Best regards
Daniel
Hey, did you solve it? Daniel's comment didn't really help me. Could you tell me how you did it? It would be a great help.
Hi Lukas
Do you still have problems with this? It's a bit strange. I've installed Ceph multiple times on newly installed Debian systems following this gist:
gist.github.com/kalaspuffar/53d0e828e96482d3ee1f8c88b0f9ea5d
If there is still an issue, could you tell me which dependencies it says are unmet?
Thank you for watching my videos.
Best regards
Daniel
Hi again Lucas
Another thing is to ensure that you have specified the right version. The script will add the repository for buster:
deb https://download.ceph.com/debian-pacific/ buster main
But if you are running a new Debian 11 you want to change that to bullseye, as that is the codename of that release. If you want the latest Ceph you can change pacific to quincy.
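The swap can be sketched with sed on a local copy of the file (the file name here is made up; on a real system it would be the Ceph entry under /etc/apt/sources.list.d/):

```shell
# The line the script writes, targeting buster and pacific.
echo "deb https://download.ceph.com/debian-pacific/ buster main" > ceph.list
# Swap the Debian codename and the Ceph release in place.
sed -i 's/buster/bullseye/; s/pacific/quincy/' ceph.list
cat ceph.list
```

Follow with an apt update so the new repository is picked up.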
I hope this helps.
Best regards
Daniel
Thank you for sharing, that was great. There is a question here. If I want to add node4 to this cluster, should I simply add its hostname and IP address to /etc/ceph/ceph.conf and update the monmap with monmaptool like this: "monmaptool --add {node4-id} {node4-ip} --fsid {cluster uuid} /tmp/monmap", then follow the same steps as we did for node2 and node3 in the "Setting up more nodes" section?
Is that enough to add a new node to an existing cluster?
Hi
Yes and no. You should not just add another monitor to your cluster; that would not help you. Ceph needs an uneven number of monitors: in most cases three, five for larger clusters, and for huge ones you can have seven.
The way to expand a cluster is to add more OSD services, and other connecting services. The more OSDs you have, the more storage space you can allocate to different workloads. I run a medium-sized cluster at work with three monitors and half a petabyte of data. At home I run a small one with 40TB on three monitors. I run 6 OSDs at home, one for each drive.
You might need MDS if you want a file system, or RGW if you run object storage workloads. There are more videos in this series explaining these concepts.
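A sketch of the usual expansion step on a new host (the device name /dev/sdb is hypothetical; ceph-volume handles preparing and activating the OSD):

```shell
# Turn a blank drive into an OSD and register it with the cluster.
ceph-volume lvm create --data /dev/sdb
# Confirm the new OSD shows up in the CRUSH tree.
ceph osd tree
```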
Thank you for watching my videos.
Best regards
Daniel
What do you do at 11:39 to stop the ceph command from hanging?
I'm having exactly the same problem, but you didn't explain how you fixed it!
Hi Jack
Thank you for watching my videos.
I explain the change I made to get it working in the next segment of the video.
Mainly, I removed the port information for the two available communication protocols and only specified the IP of each monitor host. Other than that it worked.
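A sketch of that edit on a throwaway copy of the config (IPs and file name are made up; on a real node the file is /etc/ceph/ceph.conf):

```shell
# mon_host entries with explicit ports, as in the hanging configuration.
cat > ceph.conf <<'EOF'
[global]
mon_host = 192.168.1.11:6789, 192.168.1.12:6789, 192.168.1.13:6789
EOF
# Drop the port suffixes so clients negotiate the msgr v1/v2 ports themselves.
sed -i 's/:6789//g' ceph.conf
grep mon_host ceph.conf
```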
I hope this helps.
Best regards
Daniel
same problem here, can you explain?
Hi Marcello
Have you tried to remove the port information in the configuration file?
Thank you for watching my videos.
Best regards
Daniel
Hi Daniel, I am trying to install a Ceph cluster by following the above method, but it failed and I am stuck on systemctl start ceph-mon@mon1, which is not working. I am using Debian 12 and the Ceph Reef version, and followed your guidance in this tutorial. Can you help me with this?
Hi Ankit.
I would search for an answer in /var/log/ceph; there must be a log file with more information. To my knowledge this tutorial should work on Bookworm and Reef, as not much has changed between the versions.
I hope this helps. Thank you for watching my videos.
Best regards
Daniel
Thanks, Daniel, for answering. I was able to install a Ceph cluster on Ubuntu 22.04 LTS and it seemed to be working, but I rebooted to check the whole cluster and now it did not come up. In /var/log/cephadm I see this:
cephadm ['shell', '--fsid', '8b67526a-c99b-11ee-9579-080027252ebc', '-c', '/etc/ceph/ceph.conf', '-k', '/etc/ceph/ceph.client.admin.keyring']
2024-02-12 12:00:41,853 7f0131f83b80 ERROR ERROR: No container engine binary found (podman or docker). Try run `apt/dnf/yum/zypper install `
2024-02-12 12:05:46,734 7f99b562fb80 DEBUG --------------------------------------------------------------------------------
cephadm ['shell']
2024-02-12 12:05:46,734 7f99b562fb80 ERROR ERROR: No container engine binary found (podman or docker). Try run `apt/dnf/yum/zypper install `
2024-02-14 06:03:33,557 7ff5c697eb80 DEBUG --------------------------------------------------------------------------------
cephadm ['shell', '--fsid', '8b67526a-c99b-11ee-9579-080027252ebc', '-c', '/etc/ceph/ceph.conf', '-k', '/etc/ceph/ceph.client.admin.keyring']
2024-02-14 06:03:33,563 7ff5c697eb80 ERROR ERROR: No container engine binary found (podman or docker). Try run `apt/dnf/yum/zypper install `
But the containers are running:
podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fb74a2ce7122 quay.io/ceph/ceph-grafana:9.4.7 /bin/bash 24 minutes ago Up 24 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-grafana-mon1
971ae5a8ed1c quay.io/ceph/ceph@sha256:a4e86c750cc11a8c93453ef5682acfa543e3ca08410efefa30f520b54f41831f -n client.crash.m... 24 minutes ago Up 24 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-crash-mon1
c04600a1e9af quay.io/prometheus/node-exporter:v1.5.0 --no-collector.ti... 24 minutes ago Up 24 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-node-exporter-mon1
e6d228e1e221 quay.io/ceph/ceph:v18 -n mon.mon1 -f --... 24 minutes ago Up 24 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-mon-mon1
25bdeb241e89 quay.io/prometheus/alertmanager:v0.25.0 --cluster.listen-... 24 minutes ago Up 24 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-alertmanager-mon1
41aacb98bcc4 quay.io/prometheus/prometheus:v2.43.0 --config.file=/et... 24 minutes ago Up 24 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-prometheus-mon1
3ce8ae65a28f quay.io/ceph/ceph@sha256:a4e86c750cc11a8c93453ef5682acfa543e3ca08410efefa30f520b54f41831f -n client.ceph-ex... 3 minutes ago Up 3 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-ceph-exporter-mon1
0c1bcde43176 quay.io/ceph/ceph:v18 -n mgr.mon1.lkpmm... 3 minutes ago Up 3 minutes ago ceph-8b67526a-c99b-11ee-9579-080027252ebc-mgr-mon1-lkpmmv
I am confused and have no clue where or what to look at.
Hi Ankit.
Really hard to say without the logs. If you install the Ceph packages so that they write local logs, you can read them in /var/log/ceph and see what is going wrong. Otherwise you need to attach to the containers in order to find the log files.
I've done a lot of videos on the topic for more viewing.
ua-cam.com/video/4TLEg_OIU8M/v-deo.html
ua-cam.com/video/ENsfa8WB6EI/v-deo.html
ua-cam.com/video/ZaIy11MY6tQ/v-deo.html
Best regards
Daniel
Thanks! One question though, why are you not using cephadm? Just curious
Hi Modzilla
Only tried it once, and I felt like I didn't have enough control. It installed services on random hosts, and an upgrade hung partway through.
I've planned a series of cephadm videos in the future, when I've played with it a bit more. A classical installation with packages built for my system works well and is what I'm currently running at home and in production.
I understand that new features are fun to try and generate hype, but when it comes to critical systems I use what has been proven to work.
Best regards
Daniel
@DanielPersson thanks for clearing that up! I would consider myself a novice when it comes to Ceph, but deploying my cluster with cephadm was so damn easy. Since I have a Kubernetes background, it just made sense to me. I believe you need to tag your hosts, and you control where the services get deployed by telling it to only deploy on nodes with those tags.
Which version of Debian is being used?
Hi Srinivas
In this tutorial I probably used Bullseye. All older videos use Buster, and I've not started using Bookworm yet.
Best regards
Daniel
@DanielPersson oh ok, thank you so much sir for taking the time to reply.
What can happen on Ceph that needs user interaction? What should I do so I don't freak out when self-healing isn't working?
I'm actually thinking about using Ceph or MinIO, but I'm not sure yet which fits better.
Hi Neo
Thank you for watching my videos.
First of all, each OSD has an object store that you can query for all the information, and the monitors keep track of the available OSD hosts and which ones hold which placement groups.
Self-healing is something that can be a bit scary, but over time it will handle most issues.
I had some problems upgrading older versions when I used cephadm, so I stick to old and well-proven packages.
Things that have required interaction from me have mostly been changing pool affinities, moving data to different OSD replication domains, and changing memory requirements.
We had problems early on with MDS and memory. It keeps a really large file index and requires a lot of memory.
We also introduced a caching pool so we could use different media for recently used and archival storage.
We have had a disk crash, and replacing it was pretty straightforward. The cluster kept working the whole time and stabilized after a week or so.
I lost a whole host on my local cluster because I was too eager to upgrade. But I was able to recover from that too. I have a video about it.
I hope this helps.
Best regards
Daniel
ceph -s gives a timeout error; could you help me solve this problem?
Hi Mostafa
I would start over configuring your monitors. If you can't run the ceph status command, it's usually because you don't have monitors in quorum. That is usually because ceph.conf has the wrong names or IPs for the monitors, or you gave the wrong information when creating the monitor map.
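For reference, a minimal [global] section for a three-monitor setup, written to a local file for illustration (the fsid, host names, and IPs are placeholders). Each name in mon_initial_members must match the id passed to monmaptool --add, and each address must match the one recorded in the monmap:

```shell
cat > ceph.conf <<'EOF'
[global]
fsid = 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
mon_initial_members = node1, node2, node3
mon_host = 192.168.1.11, 192.168.1.12, 192.168.1.13
EOF
# Sanity check: both monitor settings are present.
grep -c '^mon' ceph.conf
```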
I hope this helps. Thank you for watching my videos.
Best regards
Daniel
Can you make a tutorial on how to set up our own FTP server?
Hi aminaye
Well, I could look into it; the setup differs a bit depending on the operating system. And there are a lot of different protocols you might want to support, so describing them all can be a bit challenging.
Thank you for watching my videos.
Best regards
Daniel