Hi Anton, how do you configure Prometheus to handle a cluster with 4,000 to 7,000 pods? With our current config and remote write set up, Prometheus starts using 45GB+ of RAM. Can you please suggest the best way to optimize it?
Two ways: 1. Manual sharding allows you to have multiple Prometheus instances querying different targets. 2. Prometheus has a sharding feature, but it's in beta; you can take a look.
@@AntonPutra Thanks. 1. So it would be 2 Prometheus instances of ~22GB RAM each, but total usage of 44GB is still expected, right? One more thing about putting the datasource in Grafana: do I need to create 2 datasources (1. Prometheus, 2. Querier), or is there a way to get all the data from Prometheus, the Querier, and the Store Gateway with a single datasource?
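For reference, the manual sharding mentioned in this thread is commonly done with hashmod relabeling, so each Prometheus instance keeps a disjoint subset of targets. A hedged sketch (the job name and shard count are illustrative, not from the video):

```yaml
# Shard 0 of 2: every Prometheus replica runs the same config but keeps
# only the targets whose address hashes to its own shard number.
scrape_configs:
  - job_name: kubernetes-pods   # illustrative job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__address__]
        modulus: 2              # total number of shards
        target_label: __tmp_shard
        action: hashmod
      - source_labels: [__tmp_shard]
        regex: "0"              # this instance keeps shard 0; use "1" on the other
        action: keep
```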
Thanks Anton, will you be doing a video on an HA Thanos Receive setup? I can get it working with a single Receive pod, but when I set up multiple I start getting internal server errors etc.
I cover it in this tutorial; here is an example - github.com/antonputra/tutorials/blob/main/lessons/163/hashring.yaml (41:25, Configure Thanos Receiver Sharding)
@@AntonPutra Thanks for that, the issue I see is when one of the receivers goes down for whatever reason the metrics it has are lost and Prometheus stops remote writing and throwing 500s because it cannot communicate with a host from the hashring. Does that make sense?
@@nickcarlton4604 Yes, but when the receiver recovers, Prometheus will write all of the missing metrics (if the receiver was only down for a few hours). You can set up replication if you want - github.com/antonputra/tutorials/blob/main/lessons/163/receiver-1/statefulset.yaml#L53. I'm not sure if HA is possible in this sense. You're right, it's receiver sharding for scalability and has nothing to do with HA (high availability when one replica goes down). I know that HA mode on the Prometheus side is possible by running multiple independent instances and using external labels for deduplication (they may or may not hit different receiver shards...)
Thanks, I guess it would be more a case of: if I had 2 receivers and a hashring and a receiver goes down, how can I still remote write to the remaining one? Or would it be better to have two receivers without a hashring, let them work independently behind a ClusterIP service, but ship the data off to object storage faster (every few minutes) so it can then be consumed from the store gateway?
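For context, the replication setting referenced in this thread is a receiver flag; with a replication factor of 3 (and at least 3 receivers in the hashring), writes still reach quorum when one shard is down. A rough sketch of the container args (service names and paths are assumptions):

```yaml
args:
  - receive
  - --tsdb.path=/data
  - --receive.hashrings-file=/etc/thanos/hashring.json   # shared hashring config
  - --receive.local-endpoint=thanos-receiver-0.thanos-receiver:10901
  - --receive.replication-factor=3   # quorum of 2, tolerates one receiver being down
```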
Question: do we have any Helm charts or an Operator for the Receiver stack installation, or are CRDs the only way for now? I checked the official Bitnami chart, which supports the sidecar approach, but nothing on the Receiver. Any ideas?
Hey Anton, followed your video and implemented the Thanos remote write method. All the pods are running without errors, but I am still not able to see any metrics in the Thanos Querier :( Can you point out what I could be missing here?
Based on my experience, self-managed Thanos is up to 100x cheaper. cAdvisor and node exporters create lots of metrics with high cardinality (lots of label combinations), which can be very expensive with managed services like Managed Prometheus, Datadog, SignalFx, you name it.
@@AntonPutra Hey, can you do a video on OpenTelemetry auto-instrumentation in a Kubernetes cluster, for an application running in a pod where I need the response time of its endpoints? I'm a beginner and I don't know how to approach the implementation.
@@sagarhm2237 Sure, in the future. I already used it in one of my benchmark videos, but I don't remember which one exactly. However, it is not as mature as Prometheus. On the other hand, it's used by many commercial monitoring agents.
🔴 - To support my channel, I’d like to offer Mentorship/On-the-Job Support/Consulting - me@antonputra.com
When it comes, please let us know @AntonPutra
Hey Anton, Is your mentorship still available? (And yes, I'd be happy to pay :) )
Producing these had to take quite some time ... thanks for doing them, learning a ton from these prometheus videos you recently posted 10/10
Thank you, I'm glad that you find it useful!
Spasibo (thank you), Anton! I was googling how to secure remote write with Thanos Remote Write/Prometheus Agent and you just released this video 3 days ago.
Pozhaluysta (you're welcome); the source code is in the description
Amazing video and lecture... keep going!
thank you!!
Perfect compilation of each use case and detailed explanation. I haven't seen such a video on Prometheus and Thanos before. Thank you very much Really Helpful 👍
Hi Anton, thank you for the informative and educational content! I look forward to your next video about preparing a Production ready Thanos setup using Amazon EKS Cluster remote write endpoint and deploying stateless Prometheus in Agent mode and moving local alerts to a Centralized Thanos.
Thank you, will do
Thanks Anton! Helpful as always!
This is super helpful, thank you a LOT
welcome!
As usual another great tutorial!
N.B. I hope you will present a video about VictoriaMetrics. VM is more performant and has a great architecture.
Thanks, will do!
super informative and useful tutorial, I can't wait for the production setup!
Thanks, well it's very similar but implementations would vary depending on the cloud you use.
Thanks anton for all efforts,
But we are still waiting for a remote Prometheus/cluster setup monitored through Grafana via Thanos.
It would be great help
Again thanks 🎉❤
Thanks will do eventually =)
Well done !!!
Thank you for your job !
Thanks =)
Hi @Anton, why do we have 2 receivers? Is it to distribute the load?
Also, can you clarify this:
1. Let's say we have more than 2 Prometheus instances and we have 1 Thanos; should we use multiple receivers, i.e. more than 3 or 4?
2. Is it wise to assign 1 receiver to 1 Prometheus, meaning receiver 1 -> prometheus 1, receiver 2 -> prometheus 2, etc.?
we'll use an S3 bucket to store prometheus metrics.
:)
Hi, thank you for the video, very clear as always, cannot wait to see the next video for Thanos in production, I hope soon.
Are you gonna use the prometheus agent in the next video?
Not sure when I'll create a follow-up video, because I don't see a lot of interest in it here.
I have one video on the Prometheus agent (ua-cam.com/video/VyYrThINCjg/v-deo.html); setting up remote write is the same as with any other Prometheus.
@@AntonPutra Amazing job! I'm waiting for Thanos in production. I tried to find a tutorial on how to do it, but with no luck. I'm new to the Kubernetes and Operators world and I don't know where to change the Prometheus configuration to create this agent, remote-write solution. Please help :)
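For anyone stuck at the same spot: with the Prometheus Operator, remote write is configured on the Prometheus custom resource rather than in prometheus.yml directly. A minimal hedged sketch, with a hypothetical receiver service URL (Thanos Receive listens for remote write on port 19291 by default):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: staging            # hypothetical name
  namespace: monitoring
spec:
  # forward all scraped samples to a Thanos receiver (URL is an assumption)
  remoteWrite:
    - url: http://thanos-receiver.monitoring.svc.cluster.local:19291/api/v1/receive
```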
Hi Anton, great topic. Thank you so much!!!
I can see "#1" on the cover; have you planned any further tutorials on that topic?
Do we have any chance for a Federated Prometheus arch tutorial?
Thank you. However, I believe that Prometheus federation may no longer be widely used. Is my understanding incorrect?
@@AntonPutra yeah, I think you're right. Maybe Thanos fixes that issue by collecting metrics from different Prometheus instances.
Is that correct?
@@pier_x0 yes, it's the main use case for Thanos: to give you a global view
Hello Anton, good video. A query: when are you planning to release the Thanos production setup?
I'm thinking about it, since a lot of people have asked. Probably soon, but all the pieces for that are in this video. You pretty much just need to use public load balancers and, of course, configure IAM for S3.
@@AntonPutra Oh, that's great. Also, I had a query: using remote write, I'm storing data in Thanos. Now, at the Grafana level, what changes should be made so that tenant-A can see metrics from clusters running in tenant-A and tenant-B can see metrics from clusters running in tenant-B?
@@nitishchauhan7377 you can use a tenant label on those metrics
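One hedged way to attach that tenant label: Thanos Receive reads the tenant from the THANOS-TENANT request header by default, so each cluster's remote write can identify itself and Grafana queries can then filter on the resulting tenant label. The URL and tenant names below are illustrative:

```yaml
remote_write:
  - url: http://thanos-receiver.example.com:19291/api/v1/receive  # hypothetical endpoint
    headers:
      THANOS-TENANT: tenant-a   # becomes a tenant label you can filter on in Grafana
```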
Super helpful video @AntonPutra. I have replicated this setup for my project, deployed on a k8s v1.26.5 cluster. The only difference is that I am using higher versions of Prometheus (v2.48.1) and prometheus-operator (0.71.2). I also upgraded the CRDs to match the operator. The issue is that I am not able to fetch any active Prometheus targets. I've checked all the configurations; the deployed pods are up and running fine. The ServiceMonitor has the correct labels (same as your source code). Any help appreciated! Best regards.
Forgot to add: I am checking the targets in the local Prometheus UI :) There are no error logs either, neither in the operator pod nor in the prometheus-staging pod. I see one expected log missing (compared with your video):
level=info msg="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/prometheus/0 msg="Using pod service account via in-cluster config"
You got us used to doing everything in Terraform; were you tired? 😂 It's a joke, thanks for sharing
It's better to do it by hand for learning :)
Awesome video. Thanks a lot for sharing this. Could you please share the production ready multicluster monitoring setup in AKS?
What components will be required in the centralized cluster and in the other individual clusters? Any document would be a great help.
sure - thanos.io/v0.6/thanos/storage.md/#azure-configuration
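For reference, the Azure variant of the Thanos object storage config looks roughly like this (all values are placeholders):

```yaml
type: AZURE
config:
  storage_account: mystorageaccount      # placeholder
  storage_account_key: "<account-key>"   # placeholder; prefer mounting from a Kubernetes Secret
  container: thanos-metrics              # placeholder
```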
great tutorial, thanks!
Thank you!
you are genius
haha
Hi, good video. You talked about a video on how to configure the Prometheus agent, IRSA with an S3 bucket, and an AWS ALB for exposing the remote-write service. Where are those videos? Thanks
I haven't recorded them yet, sorry.
awesome video
Thanks, took me a while to record =)
Awesome video! I was trying to follow along and got stuck at a few points. First, when Thanos was disabled and 2 pods of prometheus-staging-0 were created, I expected to be able to see targets when prometheus-operated was port-forwarded on port 9090 to my localhost, but I didn't. Could you please let me know why? Second, after adding the Thanos sidecars without mutual TLS and port-forwarding the Querier on 9090, I did see the sidecar in my Stores, but I wasn't able to see any Prometheus metrics as you showed. Please guide.
Thanks, there are a lot of labels that must match in order for the target to show up. I have another video on my channel covering Prometheus Operator, it may help. For the second question, if you don't have any targets, you won't have any metrics in Prometheus.
Thanks for your tutorial. I would like to know: in a real production scenario, are there any ways to expose MinIO securely without port forwarding?
Sure, Minio is a distributed web server that mostly uses HTTP and websockets. Use an ingress or simply a load balancer to expose Minio.
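A hedged sketch of exposing MinIO's API (it defaults to port 9000) through an Ingress with TLS; the hostname, namespace, and secret names are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minio
  namespace: minio
spec:
  tls:
    - hosts: [minio.example.com]   # placeholder hostname
      secretName: minio-tls        # TLS cert, e.g. issued by cert-manager
  rules:
    - host: minio.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: minio
                port:
                  number: 9000     # MinIO API port
```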
Nice tutorial. What's the recommended way to install Thanos at a production-grade level? Do we have a respective Helm chart from the Thanos community?
It doesn't really matter how you deploy it for production, whether using plain YAML, Kustomize, or Helm. Just make sure you use the same approach throughout your entire infrastructure.
Thanks!
Thanks for the support @denisrazumnyi6456!
Nice video. I cloned your repo but didn't see the YAML files you were using.
Sorry for the delay. All of the YAML files are here: github.com/antonputra/tutorials/tree/main/lessons/163.
Nice video Anton. Do you have anything on the Thanos Receive distributor and Thanos Receive configuration, or is the one in this video the same thing? For me, remote write does not work when the distributor is enabled via the Helm chart. Any help is highly appreciated.
Hi, when are you planning to make a video about Thanos and Prometheus in agent mode, please? Thank you very much for your answer. The current video was great; I would also like to see more about Prometheus agent mode and Thanos.
Thanks, it's very similar, and I have examples somewhere in my GitHub repo. You just need to provide a couple of additional flags and you get a stateless (well, almost stateless) Prometheus.
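The "couple of additional flags" are essentially the agent feature flag plus a remote write target; local querying, rules, and alerting are disabled in this mode. A rough sketch of the container args (the paths are assumptions):

```yaml
args:
  - --enable-feature=agent             # run Prometheus in agent mode
  - --config.file=/etc/prometheus/prometheus.yml
  - --storage.agent.path=/data         # small WAL buffer only, not a full TSDB
```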
Hi @AntonPutra, thanks for the video. Did you get a chance to prepare the production-ready Thanos setup using an Amazon EKS cluster remote write endpoint, deploying stateless Prometheus in agent mode and moving local alerts to a centralized Thanos?
Not yet, but I'll definitely create one soon
@@AntonPutra I would love to see it done in AWS! I'm trying to continue off what I've learned from this video. Do I now need to expose the Querier from each cluster to Grafana? I'm so grateful for the content you put out. MANY thanks and cups of coffee for you, friend!
@@xXKingofDiamondsXx I may refresh it soon, maybe with Thanos, Cortex, or even Grafana Mimir. I haven't decided yet which one to cover first.
Great tutorial as always. When is the production setup coming, please?
Thanks, it's not much different: just use external DNS and more shards =)
Hello Anton, thanks for the video. Can you please give the steps for AWS EKS cluster monitoring with a Prometheus/Thanos setup?
Well, you need to use IRSA to grant the Thanos storage gateway read/write access to the S3 bucket. The rest of it is more or less the same. Are there any specific issues?
Hi Anton. Thanks a million times for the awesome series. I have a question.
Let's destroy the MinIO and set the "tsdb.retention" value to something like "180d" in the receivers. Is this bad practice? I don't want to use any object storage (S3, MinIO, etc.).
I am asking this question for the Remote Write approach.
No, it's fine and it can eliminate the bottleneck. Just make sure you set up sharding from day one. Also, if you decide to use Thanos Ruler, it's more reliable without S3 and the store gateway.
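The no-object-storage variant discussed here would roughly mean running the receivers with a long local retention and simply omitting the objstore config (values are illustrative):

```yaml
args:
  - receive
  - --tsdb.path=/data
  - --tsdb.retention=180d   # keep 180 days locally instead of shipping to S3/MinIO
  # note: no --objstore.config-file, so blocks stay on the receiver's PVC
```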
I completely understand you. Thanks for your detailed explanation Anton 😊
By the way, why might I need to install the thanos-ruler? I don't fully understand this component.
@@m18unet For example, if you run stateless Prometheus in 'agent' mode, no data is stored locally. This means you cannot run Alertmanager locally as well. The Ruler is used to run alerts on the global state, which also has some drawbacks. You can read more about this in the official documentation.
Hello Anton Putra,
When can we expect the video on production ready Thanos setup?
I don't know yet; it's very similar. You just need to use IRSA to configure permissions for S3. Do you have any specific use case in mind, such as 1 cloud, 2 clouds, or perhaps 10 environments?
@@AntonPutra
Thanks for the reply,
Planning to set it up in multiple AKS clusters where the receivers will be behind an ingress. Anyway, let me set it up.
Please complete the series about devops interview questions
ok
We need a setup for metrics between two EKS clusters with Prometheus/Thanos HA, stored in S3... we have been waiting for this for 4 months
Sorry, I didn't get a question.
Hey Anton, I'm really curious to know this: in your videos you usually organize files with numbers in front of them. The question might be stupid, but I'll ask: do you use this only for the tutorials, or do you have a reason behind it, and do you follow this in your production code as well?
Hi Rafael, no, it's just for the tutorials, so the files stay in order. Don't use numbers in your production code; it would look like beginner's code =)
@@AntonPutra oh ok, thanks for the answer, I'm a beginner though haha 😂 in the DevOps world at least.
Hi Anton, Could you please do a video on how to use Prometheus in agent mode and forward metrics to Thanos Receive with scaling on both Agent and Receive side?
I have one video on prometheus agent (ua-cam.com/video/VyYrThINCjg/v-deo.html), set up remote write is the same as with other prometheus. Just use a code snippet from this video.
Very nice tutorial... so what's the difference between Thanos and Grafana Mimir?
Thanks. I haven't used it yet, but I'll definitely test it and create a tutorial, maybe a direct comparison, soon.
@@AntonPutra Thanks a lot. One more thing: as per the official docs, it looks like Thanos remote write is not recommended for a single tenant. If so, do you know the reason behind that?
@@Kavinnathcse Prometheus generally does not recommend using push methods except when necessary. Based on my experience managing 20+ environments and almost 100 Kubernetes clusters, remote read is a bit slow since each time you open Grafana, Thanos needs to query the remote environment. To improve user experience and speed up queries, we decided to use remote write.
Hi, I am doing this on-prem and I don't have any storage I can use except PVCs. Is there any way I can use Thanos for this? If so, what would the sidecar config be?
Hi Anton, thank you for sharing your videos. They have been incredibly helpful and have expanded my knowledge.
I followed your tutorial and successfully deployed Thanos with remote write Prometheus. Everything appears to be functioning correctly, except that I am unable to view any targets in the Thanos UI. Could you suggest a reason for this issue?
btw, I used helm charts instead of manifests, which are much easier
Furthermore, I have another question regarding best practices for the Alert Manager. Should I transfer all alerts to the ruler or should I deploy the Alert Manager with each Prometheus instance?
Thank you in advance for your assistance.
Thanks! Well, first of all, you should not see any "targets" in the Thanos UI; you should only see "Stores". Targets should be visible in the local Prometheus UI.
Second, Thanos recommends using local Prometheus and Alertmanagers because they do not rely on external components for queries to succeed.
In my opinion, if Thanos is well monitored and any issues can be immediately identified, then using a global Ruler is acceptable.
Thanks for the tutorial, Anton! When will the next video be released?
It appears to be a very niche topic. Do you think there is interest in continuing and providing a production-ready setup?
@@AntonPutra I think so. I could not find much helpful content about Thanos implementation on UA-cam and you are always providing great tutoring.
On top of that, the only other tutorial I watched also talked about a second video that was never released
@@gabrielportela6544 okay, but I still need to finish my terraform series.
I'd also love to see the video ❤
Thank you so much for your work
Hi Anton Putra, thanks for this video, very useful. If I decide to use the OpenShift cluster's native Prometheus/Thanos Query as a datasource for an external Grafana that sits on a network outside the OpenShift one, do I first have to open the network flow?
Thanks for this video and for sharing your knowledge. I just have a question: you mentioned that remote write is your favorite solution, and I really like how it works, but when I was reading about remote write, the documentation says it's recommended only in cases where pushing metrics is the only option. What do you think about that? I don't see any problem using remote write, but maybe I'm missing something here (I'm a newbie at this).
I've been running Thanos at scale (10+ environments) for a couple of years. Based on my personal experience, remote read almost always lags, so I found that in order to provide a better user experience for developers and other DevOps team members, it's better to use remote write.
Thank you for your videos, very insightful.
I have a question: I have multiple clusters, each with a Prometheus deployed to it, and a monitoring cluster with a centralized Thanos. The Prometheus instances are in agent mode with remote write enabled. Do I need to deploy a receiver for each Prometheus? (I'm using the Thanos Helm chart, and kube-prometheus-stack for Prometheus.)
And if so, does each receiver need a querier, or can I connect all of them to the same querier? Each one is going to save to a different bucket; does the same need to happen for the store gateway? And I'd appreciate any suggestions to help me architect this setup.
Thank you again
Sure, you can deploy a single receiver (it can be sharded) and point all your Prometheus agents at it. You also need at least one querier to query that receiver for short-term metrics, and a store gateway if you use an object store for long-term storage.
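On the agent side, pointing at a shared receiver is just a `remote_write` entry. A minimal sketch, assuming the receiver exposes its remote-write endpoint on the conventional port 19291 (the service name and namespace below are placeholders):

```yaml
# prometheus.yml for an agent-mode Prometheus (illustrative names)
global:
  external_labels:
    cluster: dev   # identifies which cluster the metrics came from,
                   # so one receiver can serve many clusters
remote_write:
  - url: http://thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive
```

With distinct `external_labels` per cluster, you can still filter by origin in the querier even though everything lands in one receiver.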
But how is the receiver going to know which bucket to upload to if it's a single one and all four of my Prometheus instances write metrics to it? For example, the Prometheus on dev should be uploaded to a dev bucket, and so on. @@AntonPutra
Looking forward to the follow-up video on production endpoints! Is that hard to set up?
No, it's pretty much the same. By 'production', I meant that I was going to show how to use cloud-specific components, such as IRSA, to set up permissions to access an S3 bucket, etc.
@@AntonPutra Got it. To allow multi-cluster Thanos, is it as simple as making the receiver endpoint external? Would exposing it through an ingress and figuring out how to use cert-manager fix this? Any gotchas or resources? Thanks again, this tutorial was AMAZING
@@leeren_ Yes, you need public DNS for each receiver endpoint, but those endpoints use custom protocols and ports, so you need a dedicated load balancer, or you can use Nginx's TCP service support to share one.
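For the Nginx option, ingress-nginx can proxy raw TCP through its `tcp-services` ConfigMap, which lets one load balancer front the receiver's remote-write port. A sketch (the namespace, service name, and port are assumptions, not values from the video):

```yaml
# ingress-nginx tcp-services ConfigMap: map an external port on the
# shared load balancer to the receiver service's remote-write port
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "19291": "monitoring/thanos-receive:19291"
```

The ingress controller must also be started with `--tcp-services-configmap` pointing at this ConfigMap, and the corresponding port opened on its Service.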
@@AntonPutra For AWS, do you recommend using ACM or just cert-manager?
@@leeren_ Either way, you don't need a public certificate. You need to create a CA, and you can import it into cert-manager to automatically renew your private certificates. The same applies to ACM.
@AntonPutra: Where can I get these YAMLs?
I found them in the official repository a while ago and made a few updates. Are there any concerns?
Hi Anton, how do I configure Prometheus to handle a cluster with 4,000 to 7,000 pods? With the current config and remote write set up, Prometheus starts using 45GB+ of RAM. Can you please tell me the best way to optimize it?
Two ways:
1. Manual sharding: run multiple Prometheus instances, each scraping a different subset of targets.
2. The Prometheus Operator has a sharding feature, but it's in beta; you can take a look.
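Manual sharding is usually done with `hashmod` relabeling, where each Prometheus keeps only the targets whose address hashes into its shard. A minimal sketch for shard 0 of 2 (the job name is illustrative):

```yaml
# scrape config for shard 0 of a 2-way manually sharded setup
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # hash each target address into one of 2 buckets...
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_shard
        action: hashmod
      # ...and keep only bucket 0 on this instance;
      # the second instance uses regex "1" instead
      - source_labels: [__tmp_shard]
        regex: "0"
        action: keep
```

Each shard then scrapes roughly half the pods, cutting per-instance memory accordingly.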
@@AntonPutra Thanks. 1. So it would be 2 Prometheus instances at 22GB RAM each, and a total usage of 44GB is expected, right? One more thing about data sources in Grafana: do I need to create 2 data sources, 1. Prometheus and 2. Querier? Is there a way to get all the data from Prometheus, the Querier, and the Store Gateway with a single data source?
Hi, I am not able to find the video for the Thanos setup on EKS.
It's very similar to this one, but I haven't released it yet.
I think Grafana Mimir is better than Thanos at scale and easier to scale out.
I've never used it before, but I'll give it a shot.
Thanks Anton, will you be doing a video on an HA Thanos Receive setup? I can get it working with a single Receive pod, but when I set up multiple, I start getting internal server errors, etc.
I cover it in this tutorial. Here is an example - github.com/antonputra/tutorials/blob/main/lessons/163/hashring.yaml
41:25 Configure Thanos Receiver Sharding
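For context, a hashring file like the one linked above lists the receiver replicas that share the incoming write load. A trimmed sketch (the endpoint names are illustrative, not copied from the repo):

```yaml
# ConfigMap holding the hashring consumed by every receiver via
# --receive.hashrings-file; all listed endpoints split the writes
apiVersion: v1
kind: ConfigMap
metadata:
  name: hashring
  namespace: monitoring
data:
  hashring.json: |
    [
      {
        "endpoints": [
          "receiver-0.receiver.monitoring.svc.cluster.local:10907",
          "receiver-1.receiver.monitoring.svc.cluster.local:10907"
        ]
      }
    ]
```

Each receiver forwards any series it isn't responsible for to the owning endpoint in the ring, which is why every replica needs to reach the others.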
@@AntonPutra Thanks for that. The issue I see is that when one of the receivers goes down for whatever reason, the metrics it has are lost, and Prometheus stops remote writing and throws 500s because it cannot communicate with a host from the hashring. Does that make sense?
@@nickcarlton4604 Yes, but when the receiver recovers, Prometheus will write all the missing metrics (if the receiver was down for a few hours). You can set replication if you want - github.com/antonputra/tutorials/blob/main/lessons/163/receiver-1/statefulset.yaml#L53
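The replication flag mentioned above sits alongside the hashring file in the receiver's container args. A trimmed sketch (names and paths may differ from the linked manifest):

```yaml
# excerpt of a thanos receive container spec: with a replication
# factor of 2, every series is written to 2 receivers in the ring,
# so losing one replica doesn't drop incoming samples
args:
  - receive
  - --receive.replication-factor=2
  - --receive.hashrings-file=/etc/thanos/hashring.json
  - --receive.local-endpoint=thanos-receiver-0.thanos-receiver:10907
```

Note that a replication factor of N only tolerates failures while more than N/2 replicas for a series stay reachable, so it mitigates, rather than fully removes, the single-receiver-down problem.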
I'm not sure if HA is possible in this sense. You're right, it's receiver sharding for scalability and has nothing to do with HA (high availability when one replica goes down). I know that HA mode in Prometheus is possible by running multiple independent instances and using external labels for deduplication (they may or may not use different receiver shards...)
Thanks. I guess it would be more a case of: if I had 2 receivers and a hashring, when a receiver goes down, how can I still remote write to the remaining one? Or would it be better to have two receivers without a hashring, let them work independently via a ClusterIP service, but ship the data off to object storage faster (every few minutes) so it can then be consumed from the store gateway?
@@nickcarlton4604 Not sure about that. I use 5 shards in prod and don't have any issues. Please try it and let me know if it works.
Question: do we have any Helm charts or an Operator for installing the Receiver stack, or are CRDs the only way for now?
I checked the official Bitnami chart, which supports the sidecar approach, but there is nothing on the Receiver.
Any ideas?
I don't think so, there is one for just prometheus - github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
Hi Anton, got a question: how is the Thanos Querier connected to the Prometheus sidecar? What values should be passed in the querier's stores list?
It uses the gRPC protocol. You pass each sidecar's gRPC address via `--endpoint` flags:
thanos query \
  --http-address "0.0.0.0:9090" \
  --query.replica-label "replica" \
  --endpoint "<sidecar-host>:<grpc-port>" \
  --endpoint "<sidecar-host>:<grpc-port>"
Hey Anton,
I followed your video and implemented the Thanos remote-write method. All the pods are running without errors, but I am still not able to see any metrics in the Thanos Querier :( Can you point out what I could be missing here?
I would suggest starting with the minimal setup and trying to inspect every component for any errors.
Hi Anton,
why don't you use helm to install prometheus?
You can use Helm, there's nothing wrong with it.
Hi sir, can you make a database benchmark (MongoDB vs MySQL vs PostgreSQL)?
Sure, I was already working on one between MySQL and Postgres.
Do you think there is interest in using Thanos when managed Prometheus is available?
Based on my experience, self-managed Thanos is up to 100x cheaper. cAdvisor and node exporters create lots of metrics with high cardinality (lots of label combinations), which can be very expensive with managed services like Managed Prometheus, Datadog, SignalFx, you name it.
Make videos on Cortex
In the works, along with VictoriaMetrics.
@@AntonPutra Appreciate it. If possible, please make it a detailed, long video.
@@rahulchowdhury279 sure
Still waiting @Anton
Which app are you using to edit?
Adobe Suite
@@AntonPutra Hey bro, can you do a video on OpenTelemetry auto-instrumentation in a Kubernetes cluster, for an application running in a pod? I need the response time of the endpoints in that application. I'm a beginner and I'm not sure how to approach the implementation.
@@sagarhm2237 Sure, in the future. I already used it in one of my benchmark videos, but I don't remember which one exactly. However, it is not as mature as Prometheus. On the other hand, it's used by many commercial monitoring agents.
@@AntonPutra Thanks, I'm a fresher.