Yeaaah! Thank you very much! I've been waiting for this video for a couple of years!
thank you so much
you are an asset to the devops community
Great, great video. One way to trigger a workflow is via the Argo CD notification service. When a new deployment sync happens, the notification service sends a message containing the actual new version to a webhook. The question is whether the notification service is fully reliable. It's up to your receiving service to do whatever it needs to do. I'm thinking out loud here, but I have built some other triggers based on the notification service messages.
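For illustration, a hedged sketch of what that could look like in the argocd-notifications-cm ConfigMap; the webhook service name, URL, token reference, and payload fields are assumptions, not something from the video:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Hypothetical webhook service; the URL and the $webhook-token secret reference are placeholders.
  service.webhook.post-deploy: |
    url: https://tests.example.com/hooks/deployed
    headers:
      - name: Authorization
        value: Bearer $webhook-token
  # Fire only when the sync succeeded and the app reports healthy.
  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
      send: [app-deployed]
  # Include the app name and the synced revision so the receiver knows which version arrived.
  template.app-deployed: |
    webhook:
      post-deploy:
        method: POST
        body: |
          {"app": "{{.app.metadata.name}}", "revision": "{{.app.status.sync.revision}}"}
```

The Application then subscribes to the trigger, typically with an annotation along the lines of `notifications.argoproj.io/subscribe.on-deployed.post-deploy: ""`.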
Thanks for another great video 🎉 This is definitely a good approach, and it marries perfectly with event-based architectures, which I like and use a lot myself. I'll give it a try.
always learn a lot thanks
Great video, Viktor
We are solving this problem by having a loop in the CI/CD pipeline that calls the Argo CD API and checks whether all changes are deployed and healthy (which, by the way, is a quality gate in itself), and then we trigger the Argo Workflows tests by calling the Argo Workflows API from the CI/CD pipeline.
Thank you for the great content. The GitOps approach complicated the delivery pipeline. Even with the ReplicaSet-creation approach, we need to check whether the new pods are OK or crashing. Not sure whether we can also use Argo CD Application (CRD) alerts or hooks to trigger the pipeline.
Great video as usual, Viktor! I have another idea to solve this problem: start an argocd sync and then an argocd wait from the pipeline. That should work too, right? I might get an error in the first step because a sync may already be in progress, but that's okay. What do you think?
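For reference, a minimal GitHub-Actions-style sketch of that sync-and-wait idea; the Argo CD server address, app name, credentials, and timeout are placeholders, and it assumes the runner has the argocd CLI and can reach the Argo CD API (which, as the reply below points out, gives up part of the pull model):

```yaml
name: deploy-and-test
on:
  workflow_dispatch: {}
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Sync and wait for health
        run: |
          argocd login argocd.example.com --grpc-web \
            --username admin --password "${{ secrets.ARGOCD_PASSWORD }}"
          # A sync may already be in progress; ignore that error, as the comment suggests.
          argocd app sync my-app || true
          # Block until the app is both Synced and Healthy, or fail after 5 minutes.
          argocd app wait my-app --sync --health --timeout 300
      - name: Run functional tests
        run: echo "trigger or run the functional tests here"
```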
One of the main benefits of GitOps is that it is more secure. Since Argo CD pulls the info, there is no need for anything or anyone to have direct access to the cluster. If you use the argocd CLI, you're removing that advantage. Another issue is that you are blocking the pipeline run and wasting resources.
I'm thinking of using Argo Events as the first step in a workflow pipeline that is triggered when a new cluster (which is basically a K8s Secret) is created, and then starts deploying apps and managing the sequence between them.
I used ArgoCD PostSync hooks for something similar.
When we started with Flux, we had the same problem. Our pipeline renders a Helm chart with helm template and pushes it to a GitOps repo. Then the same pipeline sends a webhook to Flux to trigger reconciliation. Flux updates the commit status after that, and the pipeline checks it for a maximum of 5 minutes. Then we continue with Selenium and ZAP, and then do the same for the production stage. I like having this within a single stream of events controlled by a pipeline, because otherwise you never know where you currently are in the process, and if it fails somewhere you'll have to start searching. Not a good approach when you have loads of pipelines to manage.
Could you please do an episode on how to commit secrets to Git using Kustomize-SOPS and a KMS key?
I do not think SOPS is a good solution when working in Kubernetes. If they add an operator, that would change. Until then, I strongly recommend External Secrets Operator.
@@DevOpsToolkit How about using Argo CD in between to decrypt the Secret objects using KMS? Would you consider that a good solution?
@donnieashok8799 The SOPS plugin for Argo CD is a joke (as most plugins are). The real solution is to extend Kubernetes with a CRD and a controller, but SOPS does not seem to be interested in Kubernetes.
To be clear, I am not saying that you shouldn't use SOPS, only that I discarded it for my use case, so I am not the right person to make a video about it.
Is there a way to send Argo CD notifications to an Argo Events event source (exposed on an ALB), so that a sensor can trigger a post-deployment workflow?
You can configure Argo Events to watch Kubernetes objects and trigger something when they are created, updated, or deleted.
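For reference, a minimal sketch of such a resource event source watching ReplicaSet creation; the names, namespace, and label filter are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: replicaset-events
  namespace: argo-events
spec:
  resource:
    replicaset-created:
      namespace: production        # namespace to watch (assumed)
      group: apps
      version: v1
      resource: replicasets
      eventTypes:
        - ADD                      # only fire when a new ReplicaSet appears
      filter:
        labels:
          - key: app.kubernetes.io/name
            value: my-app          # placeholder label so only this app triggers events
```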
@@DevOpsToolkit Makes sense, I'm thinking of a valid workflow. Thank you.
Love it, thanks for sharing it.
Hi, great video. How would you compare this approach to CD with the Kargo tool? It also solves the promotion problem in GitOps, not only decoupling and post-deployment steps...
Back when I checked it the last time, it was doing only the promotion, and not post-sync operations. I need to check it again.
Here it goes: ua-cam.com/video/RoY7Qu51zwU/v-deo.html
Hi, one issue with this solution is that if you change, for example, resources or an environment variable, the tests will be rerun on the same image. If the functional tests aren't too costly, this might not be a big problem. Another possible solution could be to use Shell Operator, which would allow writing the validation in shell to check if the change being made to the replicaset is the image. Note: I've never used Shell Operator, so this approach is 100% theoretical.
I'm guessing that you would change an environment variable as a way to change the functionality of the app in some form or another. If that's the case, running post-deployment tasks makes sense. Even when that is not the case, those are probably rare cases, so running the tasks is not much of an overhead (if it happens rarely).
@@DevOpsToolkit Yes, you're probably right. However, have you considered using Shell Operator for these tasks? Have you tested it? Does it make sense to use it in this scenario?
I don't use it much myself, so I cannot say whether it is good or not. What I can say is that I'm a huge fan of using operators in general (no matter how you create them).
Is there a way to see the jobs triggered by Argo Events in the Argo CD UI?
If by jobs you mean Argo Workflows jobs (executions), the answer is no.
There is a major thing I'm thinking of: it takes time to scale and for pods to be ready, and it also takes time for the previous ReplicaSet to scale down after deploying a new release.
So functional tests in that case could run against the previous release, not the new release, which is not what was intended.
It depends on what you're testing. In my case it's from the start of the rollout. I want to ensure not only that it works after it's been rolled out, but also during the rollout.
@@DevOpsToolkit Is there a way to do that? I mean functional testing only on new deployments. I see 2 options:
1. Waiting for the new release to be fully deployed, maybe testing in the middle and testing after the deployment is done, i.e., old replicas = 0, new replicas = 100%.
2. A specific endpoint/ingress/gateway for that exact release to be tested?
@mohamedsamet4539 I haven't tried that since, by the time the tests that run during the rollout are done, the rollout is done as well. If you need to test only after it's rolled out, your best bet is to create your own operator that can trigger events.
@@DevOpsToolkit That's a good idea for a small project. I want to implement it.
If it's possible, can you tell me in general how to do that?
I think of it like this: an operator that continuously watches the status of the ReplicaSets/Deployments/Pods, shows the status of a new "deployment" in the CRD, and triggers an event after the "deployment" is done. The "deployment" is considered to be done when all the pods of the new ReplicaSet are ready and the old ReplicaSets are scaled to 0.
@mohamedsamet4539 Something like that should work. If you're familiar with Go, you can do it with Kubebuilder or one of the many tools that create controllers. Alternatively, it might be easier to add what's missing directly to Argo Events.
Thank you for the video! While the challenge was well explained, I'm concerned that basing events on replica count might trigger tests on every scaling event. This doesn't seem like the optimal solution for handling this challenge.
I'm triggering it (in that video) on ReplicaSets which is not perfect, but still okay in most of the cases (at least when using Deployments). That being said, I don't think that Argo Events is the right tool, but that it is the best we have (among many bad options).
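A hedged sketch of a sensor that reacts to such a ReplicaSet event (from an event source like the one sketched further up in these comments) and calls out to whatever starts the tests; the URL, dependency names, and payload mapping are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: run-functional-tests
  namespace: argo-events
spec:
  dependencies:
    - name: new-replicaset
      eventSourceName: replicaset-events   # the event source sketched earlier
      eventName: replicaset-created
  triggers:
    - template:
        name: notify-pipeline
        http:
          # Placeholder endpoint; in practice this would be whatever starts the tests
          # (a CI webhook, the GitHub API, an Argo Workflows submission, ...).
          url: https://ci.example.com/hooks/run-functional-tests
          method: POST
          payload:
            - src:
                dependencyName: new-replicaset
                dataKey: body.metadata.name   # pass the new ReplicaSet's name in the request body
              dest: replicaset
```

Authentication headers would normally be added as well; they are left out to keep the sketch short.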
thanks
Maybe I'm too old-fashioned, but I just use "flux reconcile --with-source" in the CI deploy job after pushing changes to Git. If this command fails to succeed within the specified timeout, the script calls "flux events" to show what's wrong and exits with code 1 (to mark the job as failed). This way developers can promptly see whether their changes lead to non-working code, and also what specifically was broken, with just one click in the CI pipelines panel.
In an event-based architecture, connecting notifications and logs to specific Git commits and tags is still a challenge.
(By the way, Argo CD has "argocd app sync" and "argocd app wait" commands serving the same purpose.)
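A minimal GitLab-CI-style sketch of that deploy job, assuming the runner has the flux CLI and cluster access; the Kustomization name ("apps") and the timeout are placeholders:

```yaml
verify-deploy:
  stage: deploy
  script:
    - |
      flux reconcile kustomization apps --with-source --timeout 5m || {
        # Reconciliation did not become ready in time: print recent events and fail the job.
        flux events --for Kustomization/apps
        exit 1
      }
```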
I should have said in the video that one of my requirements (and one of the main reasons for using GitOps) is to deny any tool (including pipelines) access to the cluster. That's why the GitOps pull model is dominant.
> connecting notifications and logs to specific git commits and tags still is a challenge.
That's exactly why we are currently doing the same as you, just with Argo CD. We use the argocd CLI and have it wait until stable, after which we run e2e tests in triggered pipelines in GitLab. The feedback loop for our developers is much better: actually seeing that the commit I made a couple of minutes ago led to the tests failing.
It's not 100% airtight; sometimes, if multiple people deploy different parts at the same time, you can have pipelines failing because someone else made a mistake. But that's still preferable, because it means I don't go testing my newly implemented feature just to find out someone else broke something else in the meantime.
@@DevOpsToolkit Yes, this restriction makes the task significantly harder. However, if checking a health endpoint in the remote cluster is allowed, you can deploy in each cluster a simple proxy that exports the state of the GitOps-managed k8s objects (health, version), with some security measures like IP allowlists and auth, if needed. This way you get a unified method to check deployment status independently of application specifics.
@sergeyp2932 As long as that health check outputs versions, that should work. It's less efficient, but it can be a solution.
The ReplicaSet is created, but in production that means the pods are just starting to roll out; it can be minutes before the rollout is completed. That is not the time to run the functional tests against the service.
In my case that is okay since my tests are meant to start when the rollout starts. I want to ensure that users are not experiencing any issues both during and after the rollout.
@@DevOpsToolkit That is why we have canary deployments in Argo Rollouts. In that case you can run your Argo Workflows tests from within k8s (or your GitHub Action from a GitHub runner that runs inside your k8s cluster) directly against the "X-Preview" Service version of the app. That will make sure you test the new version.
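For context, a hedged sketch of a Rollout where a dedicated Service points only at the new version (canaryService in the canary strategy, or previewService in blue-green); the names and image are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app                       # placeholder
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: ghcr.io/my-org/my-app:1.2.3   # placeholder image
  strategy:
    canary:
      canaryService: my-app-canary   # Service that selects only the new version's pods
      stableService: my-app-stable   # Service that keeps selecting the stable version
      steps:
        - setWeight: 20
        - pause: {}                  # run the functional tests against my-app-canary here
```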
Viktor, is it possible to use Argo Events to trigger Jenkins pipelines? I have installed Argo CD on my Kubernetes cluster and am using it; do I need to install any additional plugin to be able to use Argo Events?
Yes. You can send an HTTP request to Jenkins.
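For example, a hedged sketch of a sensor hitting Jenkins' remote build API; the Jenkins URL, job name, and the credentials Secret are all placeholders, and it reuses an event source like the one sketched earlier:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: trigger-jenkins
  namespace: argo-events
spec:
  dependencies:
    - name: new-replicaset
      eventSourceName: replicaset-events     # event source sketched earlier in the comments
      eventName: replicaset-created
  triggers:
    - template:
        name: jenkins-build
        http:
          # Jenkins remote build API; URL, job name, and credentials are placeholders.
          url: https://jenkins.example.com/job/functional-tests/buildWithParameters
          method: POST
          basicAuth:
            username:
              name: jenkins-credentials      # assumed Secret holding a Jenkins user and API token
              key: username
            password:
              name: jenkins-credentials
              key: token
```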
@@DevOpsToolkit Thank you very much 🙏
Hmmm, can I trigger the event only once the Argo CD app has successfully synced and is healthy?
It's a bit tricky since Argo Events does not handle statuses well. It's doable...
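One hedged way to approximate it is to watch the Argo CD Application objects themselves and filter on their status fields; the paths, values, and names below are assumptions, and it will fire on every matching status update rather than exactly once per deployment, which is part of the trickiness mentioned above:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: argocd-apps
  namespace: argo-events
spec:
  resource:
    app-updated:
      namespace: argocd
      group: argoproj.io
      version: v1alpha1
      resource: applications
      eventTypes:
        - UPDATE
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: app-healthy
  namespace: argo-events
spec:
  dependencies:
    - name: app
      eventSourceName: argocd-apps
      eventName: app-updated
      filters:
        data:
          # Only let through updates where the app reports Synced and Healthy.
          - path: body.status.sync.status
            type: string
            value: ["Synced"]
          - path: body.status.health.status
            type: string
            value: ["Healthy"]
  triggers:
    - template:
        name: log-it
        log: {}      # placeholder trigger; swap in an HTTP call or a workflow submission
```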
Argo events 🎉🎉🎉🎉
OR..... Use git branches or tags to model release state.
Merge to the release branch, test, and build the container. If this is successful, the pipeline merges to the git branch 'dev_ready_to_test'.
Another pipeline triggers on 'dev_ready_to_test'. It deploys and runs functional tests and, if they pass, it merges to 'prod_ready_to_deploy'.
Add steps as needed for your release process.
ALL of this is tracked within the system of record, not some external event bus.
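A hedged sketch of what the second pipeline in that flow might look like as a GitHub Actions workflow; the branch names come from the comment above, everything else (script, bot identity) is a placeholder:

```yaml
name: test-and-promote
on:
  push:
    branches: [dev_ready_to_test]        # branch merged by the previous pipeline
permissions:
  contents: write                        # needed to push the promotion merge
jobs:
  functional-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0                 # full history so the promotion branch can be merged
      - name: Deploy and run functional tests
        run: ./scripts/run-functional-tests.sh     # placeholder script
      - name: Promote to prod_ready_to_deploy
        if: success()
        run: |
          git config user.name "ci-bot"                # placeholder identity
          git config user.email "ci-bot@example.com"
          git checkout prod_ready_to_deploy
          git merge --no-ff "$GITHUB_SHA"
          git push origin prod_ready_to_deploy
```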
How will you know that a new release is operational before running tests?
@@DevOpsToolkit Does Argo Events do this? It triggers on ReplicaSet creation but has no notion of the rollout actually being successful and healthy, has it?
However, the ReplicaSet trick is a very nice one! Maybe Kargo could be a solution for detecting healthiness?
With ReplicaSets you at least know that a rollout of a new release started. You could do it in more complicated ways to be sure that it is fully rolled out, but I did not explore those in this video.
Here it goes: ua-cam.com/video/RoY7Qu51zwU/v-deo.html
Why trigger a pipeline in GitHub when you can trigger a Workflow in Argo Workflows without ever leaving Kubernetes :P
That's true. It could also be Tekton or Jenkins or even Kubernetes Jobs. I used Actions mostly because I use them more often than other pipelines, but the logic is the same no matter what your choice is.
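For completeness, a hedged sketch of the receiving side in GitHub Actions, triggered by a POST to the repository's /dispatches API (from a sensor, a notification service, or plain curl); the event type, payload field, and script are assumptions:

```yaml
name: functional-tests
on:
  repository_dispatch:
    types: [app-deployed]      # assumed event_type sent by whatever fires the dispatch
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run functional tests against the new release
        # client_payload is whatever the sender included, e.g. the new ReplicaSet name or image tag.
        run: ./scripts/functional-tests.sh "${{ github.event.client_payload.replicaset }}"   # placeholder script
```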
Hmmm, no…..
Interesting approach. What I would do is add a new stage in the main pipeline, just after the stage where the pipeline updates the Git repo with the new image tag. In that new stage, I would (continuously) send a request to the healthcheck endpoint of the application (until it is healthy), and when it is healthy, I would trigger a new pipeline called "functional tests". That way, there is no need to install more stuff into my cluster (like Argo Events), and I have one place to check what happens after the deployment of a new version of my application (the main pipeline stages).
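A minimal sketch of that waiting stage, shown as a job fragment of the existing pipeline; the health URL, the NEW_VERSION variable, and the JSON shape are assumptions based on the replies below:

```yaml
# Fragment of the main workflow; assumes a /health endpoint that returns
# {"version": "...", "healthy": true} as described in the replies below.
wait-for-new-version:
  runs-on: ubuntu-latest
  env:
    NEW_VERSION: ${{ github.sha }}                              # or the image tag pushed to the GitOps repo
    HEALTH_URL: https://my-app.internal.example.com/health      # placeholder internal URL
  steps:
    - name: Wait until the new version reports healthy
      run: |
        for i in $(seq 1 60); do
          if curl -fsS "$HEALTH_URL" |
             jq -e --arg v "$NEW_VERSION" '.healthy == true and .version == $v' > /dev/null; then
            exit 0
          fi
          sleep 10
        done
        echo "Timed out waiting for $NEW_VERSION to become healthy" >&2
        exit 1
    - name: Trigger the "functional tests" pipeline
      run: echo "dispatch the functional-tests workflow here"
```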
Does that health check return the version of the app so that you know that it is not the previous release sending you the check responses?
@DevOpsToolkit Good question! The healthcheck returns JSON, and the current version of the deployed app is included in the JSON data. This data also contains the "healthy: true" key/value pair, but only when the app is really healthy (e.g., it can connect to the DB). The healthcheck endpoint is only visible from inside the network (not exposed to external users).
@@ioannisgko When you say inside the network: it should still be accessible from your pipeline runners/agents, which are outside your cluster the way you phrased it. Having all the developers build such custom endpoints in the application and exposing them with a separate ingress seems a lot more challenging than an event-based option if you are using Argo.
Also, it seems your pipeline is wasting compute waiting for things to be ready. I am not sure about the scale here, but wouldn't you rather invest the time of that blocking wait loop in building/running another pipeline?
@dhananjaimvpai We do not take this approach; nothing custom is built by the developers. The healthcheck endpoint is prevented from being exposed to the real users of production just by a simple setting in the firewall by the Operations team. The healthcheck is accessible from all pipelines by default.
To me that adds unnecessary complexity and increases the pipeline duration (read it as a cost increase) by a variable amount of time, potentially very long depending on the app's startup time. In the very worst case, it could cause the pipeline to hang waiting should the app behave incorrectly.