Oops, errata: to install Grafana on Fedora, use dnf install grafana grafana-pcp -y
I apparently edited that out by accident
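For anyone following along, here is a minimal sketch of what comes after that install. The service names are the standard PCP and Grafana systemd units, but treat this as an assumption about a typical setup rather than the exact steps from the video:

```sh
# After dnf install grafana grafana-pcp -y, the PCP services need to be
# running before Grafana has anything to talk to (standard unit names):
sudo systemctl enable --now pmcd pmlogger pmproxy   # collection, archiving, and the REST API
sudo systemctl enable --now grafana-server          # then enable the PCP plugin inside Grafana
```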
Really nice 👍 and happy new year, DJ.
Wishing you a happy new year. Waiting for persistent data in Grafana 🌹
Man, PCP - that's a name I haven't heard in a long time. Reminds me of working at SGI in the '90s.
Very interesting video. I run Ubuntu as my host OS and have 6 VMs to run my apps in (4 Ubuntu flavors and 2 Windows releases: XP and 11). For monitoring I use Conky, one instance on the host and one in each VM. PCP and Grafana look interesting, and this week I might install them on my second spare host.
In Conky I had to do a lot of scripting to get full support for OpenZFS dataset/pool sizes and throughput. The host collects the data and shares it with the VMs through VirtualBox shared folders. So for me it is important to know how well OpenZFS is supported, since all my data and all my 70 VMs are stored on OpenZFS. Only the OSes themselves (host and all VMs) run on ext4 or NTFS.
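For anyone curious what that host-side scripting looks like, here is a minimal sketch. The pool name and shared-folder mount point are assumptions, not the commenter's actual setup:

```sh
#!/bin/sh
# Sketch: collect OpenZFS pool/dataset usage on the host and drop it where
# the guests can read it. POOL and OUT are hypothetical names.
POOL=tank
OUT=/media/shared/zfs-stats.txt   # a VirtualBox shared folder mounted in each VM

# -H gives script-friendly, tab-separated output with no headers
zpool list -H -o name,size,allocated,free,capacity "$POOL" > "$OUT"
zfs list -H -o name,used,avail,refer -r "$POOL" >> "$OUT"
```

Inside each VM, a Conky execi object reading that file (e.g. ${execi 30 cat /media/shared/zfs-stats.txt}) can then render the stats without the guest needing any ZFS tools at all.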
In real-world production usage it's better to reuse as many of the metrics the system is already collecting (via sar on Linux, for instance) and to collect yourself only the ones you cannot get elsewhere, simply because you don't want to end up in a situation where the monitoring software itself puts visible load on the monitored system.
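As a concrete illustration of that reuse: assuming the sysstat service is already writing its daily archives, sadf can export them for whatever dashboard you like. The archive path below is the Debian/Ubuntu default and may differ on your distro:

```sh
# Sketch: export today's existing sar archive as delimiter-separated records
# (CPU, memory, and network device stats) instead of sampling them again.
sadf -d /var/log/sysstat/sa"$(date +%d)" -- -u -r -n DEV > today-metrics.csv
```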
Also, having metrics collected and stored is the absolute minimum requirement for any production-ready system, and happily there are plenty of options for that already, including a number of open-source ones. But when it comes to more advanced usage, it's good to have some automated analytics on your side: any kind of report that can join different metric time series together while knowing what to search for.
It's good to see that Grafana has started to get some of that, but I'd say we're only at the beginning of this journey.
The ideal monitoring software should be able not only to show you current and historical values, but also to suggest what kind of reconfiguration you should consider, based on (a) historical load and (b) projected future load.
I am planning to review and compare those other open-source systems in this series; it's kind of hard to compare something when this is the first video :D. Yes, collecting metrics is one thing; actually using them for capacity planning/capacity management would be a course in itself. I agree with the recommended course of action, however I am still old-fashioned enough to do my own thinking. As Spock said, "Computers make excellent servants, but I have no wish to serve under them."
@@CyberGizmo I completely get your point. When there's a chance, I also love to think for myself. But when you're dealing with a fleet of thousands of servers it's a whole different topic. Sometimes you just need some automation in place that will diagnose things for you: not to replace the human, but to help them make the final decision.
@@alx8439 I completely understand. Fortunately my days of managing that many servers are over, and now I can play with just my meager 18.
I was not aware of PCP; I'll definitely need to have a look. I've been getting into Grafana lately as part of a dual effort to get better with Podman, and I have been using InfluxDB. My hope is to eventually use it to monitor a good number of customers for various things.

The one big downside to the Grafana ecosystem is that it seems built more for development/analysis than for general monitoring use cases. Each database seems to have at least some of its own alerting features, and while Grafana's alerting capabilities are getting better, neither has any pre-configured alerting for specific metrics. By comparison, I have one customer that uses a commercial tool called LogicMonitor; its data sources have default alerting for all sorts of things, far more than a single person could ever create themselves. Endpoints are automatically or manually put into a hierarchical folder structure, and alerting and other configuration is inherited rather than being applied manually to each host. So of course it does the basics (disk space / resource utilization alerts), but also more esoteric stuff like specific errors inside Windows event logs.

That said, LM is too expensive for most of my clients, and the most important stuff to alert on isn't that hard, so if I can at least make the front and back end of this easy enough to be a drop-in thing, it may still be worth using it that way. The rational thing to do (for a Windows-focused client base) would be to just use something like NinjaRMM, but that wouldn't be nearly as fun XD
Agreed. I had to start somewhere, and I thought I would choose one at random; PCP came up first. Zabbix is on the list, and so is the way I do it, using Elasticsearch and Kibana. People keep telling me how great Grafana is, but as you say, it seems to be more in development than a complete reporting system.
I installed and started the pmcd service. However, on Ubuntu Grafana is snap-only; I installed the snap and it worked. But the PCP plug-in is missing, because there is no grafana-pcp package at all: no snap, no deb, not even a PPA. I would probably have to install Grafana and grafana-pcp directly from GitHub. I don't like that; it's too much hassle to maintain.
I'll think about it and save it for another day, look at the next videos, and decide later.
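One possible workaround, assuming a deb/rpm install of Grafana rather than the snap: Grafana's own CLI can fetch the PCP plugin from the plugin catalog. The plugin ID below is the one published by Performance Co-Pilot, but double-check it on grafana.com before relying on this:

```sh
# Sketch: install the PCP app plugin via grafana-cli, then restart Grafana.
sudo grafana-cli plugins install performancecopilot-pcp-app
sudo systemctl restart grafana-server
```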
I installed it on Ubuntu using this link: grafana.com/docs/grafana/latest/setup-grafana/installation/debian/
Not a fan of snaps; there's always some complication with other packages.
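For reference, that page boils down to roughly the following. Commands paraphrased from the Grafana docs, so check the page itself for the current version:

```sh
# Add Grafana's signing key and apt repository, then install the deb
# (avoids the snap entirely).
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana
```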
Hmm, this seems more for amateurs. For my 500+ servers, even with Ansible it's a lot of work.
Zabbix + SNMP is the way to go in my case, of course.
Did I miss a lecture about it?
Anyway, thanks DJ!
Not yet, Eugene. I am just getting started on discussing this area and Zabbix is on the list
@@CyberGizmo That's my hope! Thanks for all you're doing!
I'm guessing these are mostly dedicated servers? For my money, Zabbix doesn't cut it in more dynamic virtualized environments.
@@user-pc4i8ege55 Actually, most of them are virtual. But the main goal (in my view) is not to "collect as much data as possible" but to build a few metrics based on performance indicators, visualize them, and make sure your information chain is sustainable.
Slicing and dicing huge amounts of raw data is just as critical as collecting it.
(So you may give me your money ;-)