GitOps Broke CI/CD! Here's How to Fix It With Argo Events

  • Published Feb 5, 2025

COMMENTS • 73

  • @tim5967
    @tim5967 5 months ago +2

    Yeaaah! Thank you very much! I've been waiting for this video for a couple of years!

  • @aviadhaham
    @aviadhaham 6 months ago +3

    thank you so much
    you are an asset to the devops community

  • @supera74
    @supera74 6 months ago +1

    Great, great video. One way to trigger a workflow is via the Argo CD notification service. When a new deployment sync happens, the notification service sends a message containing the actual new version to a webhook. The question is whether the notification service is fully reliable. It's up to your receiving service to do whatever it needs to do. I'm thinking out loud here, but I have done some other triggers based on the notification service messages.
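
    A rough sketch of what that notification setup could look like in the argocd-notifications ConfigMap. The webhook URL is made up, and the exact trigger condition is an assumption, not something from the video:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Hypothetical receiver; anything that accepts a POST works.
  service.webhook.post-deploy: |
    url: https://hooks.example.com/deployed
  # Fire only after a successful sync of a healthy app.
  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
      send: [app-deployed]
  # Include the new revision so the receiver knows which version was deployed.
  template.app-deployed: |
    webhook:
      post-deploy:
        method: POST
        body: |
          {"app": "{{.app.metadata.name}}", "revision": "{{.app.status.sync.revision}}"}
```

    Whether this is "fully reliable", as the comment asks, still depends on the receiving service handling retries and duplicates.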

  • @IvanRizzante
    @IvanRizzante 6 months ago +1

    Thanks for another great video 🎉 This is definitely a good approach and it marries perfectly with event-based architectures, which I like and use a lot myself. I'll give it a try.

  • @gackerman99
    @gackerman99 5 months ago +1

    Always learn a lot, thanks

  • @tomasferrari92
    @tomasferrari92 6 months ago +1

    Great video Viktor

  • @hugolopes5604
    @hugolopes5604 6 months ago +1

    We are solving this problem by having a loop in the CI/CD pipeline that calls the Argo CD API and checks whether all changes are deployed and healthy (which, by the way, is a quality gate in itself), and then we trigger the Argo Workflows tests by calling the Argo Workflows API from the CI/CD.
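
    The loop described above could be sketched roughly like this. `get_app_health` and `trigger_workflow` are hypothetical stand-ins for the real Argo CD and Argo Workflows API calls, not actual client functions:

```python
import time

def wait_until_healthy(get_app_health, timeout=600, interval=10):
    """Poll a status-returning callable until it reports 'Healthy' or we time out.

    get_app_health: zero-argument callable returning a status string, e.g. a
    thin wrapper around the Argo CD application API for a given app.
    Returns True if the app became healthy within the timeout, else False.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_app_health() == "Healthy":
            return True
        time.sleep(interval)
    return False

# Usage sketch: gate the functional tests on the deployment being healthy.
# if wait_until_healthy(lambda: argocd_health("my-app")):
#     trigger_workflow("functional-tests")  # hypothetical Argo Workflows call
```

    The quality-gate aspect comes for free: a False return can simply fail the pipeline stage.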

  • @MohamedAnouarKhediri
    @MohamedAnouarKhediri 6 months ago +1

    Thank you for the great content. The GitOps approach complicated the delivery pipeline. Even with the create-ReplicaSet approach, we need to check whether the new pods are OK or crashing. Not sure if we can also use Argo CD Application (CRD) alerts or hooks to trigger the pipeline.

  • @frederikhaveras1742
    @frederikhaveras1742 6 months ago +1

    Great video as usual, Viktor! I have another idea to solve this problem: start an argocd sync, and then argocd wait, from the pipeline. That should work too, right? The first step might return an error because a sync may already be in progress, but that's okay. What do you think?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      One of the main benefits of GitOps is that it is more secure. Since Argo CD pulls information, nothing and no one needs direct access to the cluster. If you use the argocd CLI, you're removing that advantage. Another issue is that you are blocking the pipeline run and wasting resources.

  • @Luther_Luffeigh
    @Luther_Luffeigh 6 months ago +3

    I'm thinking of using Argo Events as the first step in a workflow pipeline, triggered when a new cluster (which is basically a K8s Secret) is created, and then start deploying apps and managing the sequence between them.

  • @veresij_
    @veresij_ 6 months ago +1

    I used ArgoCD PostSync hooks for something similar.

  • @rainybubblesfan
    @rainybubblesfan 6 months ago +1

    When we started with Flux, we had the same problem. Our pipeline renders a Helm chart with helm template and pushes it to a GitOps repo. Then the same pipeline sends a webhook to Flux to trigger reconciliation. Flux updates the commit status after that, and the pipeline checks it for a maximum of 5 minutes. Then we continue with Selenium and ZAP, and then the same for the production stage. I like having this within a single stream of events controlled by a pipeline, because otherwise you never know where you currently are in the process, and if it fails somewhere you'll have to start searching. Not a good approach when you have loads of pipelines to manage.

  • @donnieashok8799
    @donnieashok8799 6 months ago +1

    Could you please do an episode on how to commit secrets to Git using Kustomize-sops and a KMS key?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      I do not think SOPS is a good solution when working in Kubernetes. If they add an operator, that would change. Until then, I strongly recommend External Secrets Operator.

    • @donnieashok8799
      @donnieashok8799 6 months ago +1

      @@DevOpsToolkit How about using Argo CD in between to decrypt the Secret objects using KMS? Would you consider that as a good solution?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      @donnieashok8799 The SOPS plugin for Argo CD is a joke (as most plugins are). The real solution is to extend Kubernetes with a CRD and a controller, but SOPS does not seem to be interested in Kubernetes.
      To be clear, I am not saying that you shouldn't use SOPS, only that I discarded it for my use, so I am not the right person to make a video about it.

  • @SR-jg3bp
    @SR-jg3bp 2 months ago +1

    Is there a way to send Argo CD notifications to an Argo Events event source (exposed on an ALB), so that a sensor can trigger a post-deployment workflow?

    • @DevOpsToolkit
      @DevOpsToolkit  2 months ago +1

      You can configure Argo Events to watch Kubernetes objects and trigger something when they are created, updated, or deleted.
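
      As a sketch, the two Argo Events pieces for that could look roughly like this. All names, the namespace, and the trigger URL are made up for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: app-replicasets
spec:
  resource:
    new-replicaset:
      namespace: production
      group: apps
      version: v1
      resource: replicasets
      eventTypes:
        - ADD          # fire when a new ReplicaSet appears, i.e. a rollout starts
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: post-deploy-tests
spec:
  dependencies:
    - name: rs
      eventSourceName: app-replicasets
      eventName: new-replicaset
  triggers:
    - template:
        name: run-functional-tests
        http:
          url: https://ci.example.com/hooks/functional-tests   # hypothetical receiver
          method: POST
```

      The same Sensor could instead create an Argo Workflow directly; the HTTP trigger is just the most generic option.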

    • @SR-jg3bp
      @SR-jg3bp 2 months ago +1

      @@DevOpsToolkit Makes sense, thinking of a valid workflow. Thank you.

  • @javisartdesign
    @javisartdesign 6 months ago +1

    Love it, thanks for sharing it

  • @lgatial
    @lgatial 6 months ago +1

    Hi, great video. How would you compare this approach to CD with the Kargo tool? It also solves the promotion problem in GitOps, not only decoupling and post-deployment steps...

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      Back when I checked it the last time, it was doing only the promotion, and not post-sync operations. I need to check it again.

    • @DevOpsToolkit
      @DevOpsToolkit  2 months ago

      Here it goes: ua-cam.com/video/RoY7Qu51zwU/v-deo.html

  • @mreparaz
    @mreparaz 6 months ago +1

    Hi, one issue with this solution is that if you change, for example, resources or an environment variable, the tests will be rerun on the same image. If the functional tests aren't too costly, this might not be a big problem. Another possible solution could be to use Shell Operator, which would allow writing the validation in shell to check whether the change being made to the ReplicaSet is the image. Note: I've never used Shell Operator, so this approach is 100% theoretical.

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      I'm guessing that you would change an environment variable as a way to change the functionality of the app in some form or another. If that's the case, running post-deployment tasks makes sense. Even when that is not the case, those are probably rare cases, so running the tasks is not much of an overhead (if it happens rarely).

    • @mreparaz
      @mreparaz 6 months ago +1

      @@DevOpsToolkit Yes, you're probably right. However, have you considered using Shell Operator for these tasks? Have you tested it? Does it make sense to use it in this scenario?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      I don't use it much myself, so I cannot say whether it is good or not. What I can say is that I'm a huge fan of using operators in general (no matter how you create them).

  • @kandreasyan
    @kandreasyan 3 months ago +1

    Is there a way to see, on the Argo CD UI, the jobs triggered by Argo Events?

    • @DevOpsToolkit
      @DevOpsToolkit  3 months ago

      If by jobs you mean Argo Workflows jobs (executions), the answer is no.

  • @mohamedsamet4539
    @mohamedsamet4539 6 months ago +1

    There is a major thing I'm thinking of: it takes time to scale and for pods to be ready, and it also takes time for the previous ReplicaSet to scale down after deploying a new release.
    So the functional tests could, in that case, run against the previous release, not the new release, which is not what was intended.

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      It depends on what you're testing. In my case it's from the start of the rollout. I want to ensure not only that it works after it's been rolled out, but also during the rollout.

    • @mohamedsamet4539
      @mohamedsamet4539 6 months ago +1

      @@DevOpsToolkit Is there a way to do that? I mean functional testing only on new deployments. I see two options:
      1. Waiting for the new release to be fully deployed (maybe test in the middle, and test after the deployment is done, i.e. old replicas = 0, new replicas = 100%).
      2. A specific endpoint/ingress/gateway for that exact release to be tested?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      @mohamedsamet4539 I haven't tried that, since by the time the rolling-out tests are done, the rollout is done as well. If you need to test only after it's rolled out, your best bet is to create your own operator which can trigger events.

    • @mohamedsamet4539
      @mohamedsamet4539 6 months ago +1

      @@DevOpsToolkit That's a good idea for a small project. I want to implement it.
      If possible, can you tell me in general how to do that?
      I think of it like this: an operator that continuously watches the status of the replicas/deployments/pods, shows the status of a new "deployment" in the CRD, and triggers an event after the "deployment" is done. The "deployment" is considered done when all the pods of the new ReplicaSet are ready and the old ReplicaSets are scaled to 0.
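
      The "done" condition described above boils down to a simple predicate over ReplicaSet counts. A sketch of the check such an operator could run on every watch event; the field names are simplified placeholders, not the real apps/v1 status fields:

```python
def rollout_complete(new_rs, old_rs_list):
    """Return True when the new ReplicaSet is fully ready and all old ones are gone.

    new_rs: dict with 'desired' and 'ready' replica counts.
    old_rs_list: list of dicts, each with a 'replicas' count for a previous ReplicaSet.
    """
    # The new ReplicaSet must exist with all desired pods ready...
    new_done = new_rs["desired"] > 0 and new_rs["ready"] == new_rs["desired"]
    # ...and every old ReplicaSet must be scaled to zero.
    old_done = all(rs["replicas"] == 0 for rs in old_rs_list)
    return new_done and old_done
```

      When the predicate flips to True, the operator would update the CRD status and emit the event.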

    • @DevOpsToolkit
      @DevOpsToolkit  6 місяців тому +1

      @mohamedsamet4539 something like that should work. If you're familiar with go you can do it with kubebuilder or one of many tools that create controllers. Alternatively, it might be easier to add what's missing directly to argo events.

  • @RonnySharaby
    @RonnySharaby 1 month ago +1

    Thank you for the video! While the challenge was well explained, I'm concerned that basing events on replica count might trigger tests on every scaling event. This doesn't seem like the optimal solution for handling this challenge.

    • @DevOpsToolkit
      @DevOpsToolkit  1 month ago

      I'm triggering it (in that video) on ReplicaSets which is not perfect, but still okay in most of the cases (at least when using Deployments). That being said, I don't think that Argo Events is the right tool, but that it is the best we have (among many bad options).

  • @jserpapinto
    @jserpapinto 6 months ago +1

    thanks

  • @sergeyp2932
    @sergeyp2932 6 months ago +1

    Maybe I'm too old-fashioned, but I just use "flux reconcile --with-source" in the CI deploy job after pushing changes to Git. If this command fails to succeed within the specified timeout, the script calls "flux events" to show what's wrong, and exits with code 1 (to mark the job as failed). This way developers can promptly see whether their changes lead to non-working code, and also what specifically was broken, with just one click in the CI pipelines panel.
    In an event-based architecture, connecting notifications and logs to specific Git commits and tags is still a challenge.
    (By the way, Argo CD has "argocd app sync" and "argocd app wait" commands serving the same purpose.)

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      I should have said in the video that one of my requirements (and one of the main reasons for using GitOps) is to deny all tools (including pipelines) access to the cluster. That's why the GitOps pull model is dominant.

    • @tlanfer
      @tlanfer 6 months ago +2

      > connecting notifications and logs to specific git commits and tags still is a challenge.
      That's exactly why we are currently doing the same as you, just with Argo CD. We use the argocd CLI and have it wait until stable, after which we run e2e tests in triggered pipelines in GitLab. The feedback loop for our developers is much better, actually seeing that the commit I made a couple of minutes ago led to the tests failing.
      It's not 100% airtight; sometimes, if multiple people deploy different parts at the same time, you can have pipelines failing because someone else made a mistake. But that's still preferable, because it means I don't go testing my newly implemented feature just to find out someone else broke something in the meantime.

    • @sergeyp2932
      @sergeyp2932 6 months ago +1

      @@DevOpsToolkit Yes, this restriction makes the task significantly harder. However, if checking a health endpoint in the remote cluster is allowed, you can deploy in each cluster a simple proxy that exposes the state of the GitOps k8s objects (health, version), with some security measures like IP allowlists and auth if needed. This way you get a unified method to check deployment status, independent of application specifics.

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      @sergeyp2932 As long as that health check outputs versions, that should work. It's less efficient, but it can be a solution.

  • @haimari871
    @haimari871 6 months ago +1

    The ReplicaSet is created, but in production that means the pods are just starting to roll out; it can be minutes before the rollout is completed. That is not the time to run the functional tests against the service.

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      In my case that is okay, since my tests are meant to start when the rollout starts. I want to ensure that users are not experiencing any issues, both during and after the rollout.

    • @haimari871
      @haimari871 6 months ago

      @@DevOpsToolkit That is why we have canary deployments in Argo Rollouts. In that case you can run your Argo Workflows tests from within k8s (or your GitHub Action from a GitHub runner that runs inside your k8s cluster) directly against the "X-Preview" svc version of the app. That will make sure you test the new version.

  • @suporteking
    @suporteking 6 months ago +1

    Viktor, is it possible to use Argo Events to trigger Jenkins pipelines? I have Argo CD installed on my Kubernetes cluster; do I need to install any additional plugin to be able to use Argo Events?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      Yes. You can send an HTTP request to Jenkins.

    • @suporteking
      @suporteking 6 months ago +1

      @@DevOpsToolkit Thank you very much 🙏

  • @zoop2174
    @zoop2174 6 months ago +1

    Hmmm, can I trigger the event only once the Argo CD app is successfully synced and healthy?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      It's a bit tricky, since Argo Events does not handle statuses well. It's doable...

  • @Daveooooooooooo0
    @Daveooooooooooo0 2 months ago +1

    Argo events 🎉🎉🎉🎉

  • @HeathAlexander
    @HeathAlexander 6 months ago +3

    OR..... use Git branches or tags to model the release state.
    Merge to the release branch, test, and build the container. If this is successful, the pipeline merges to the 'dev_ready_to_test' branch.
    Another pipeline triggers on 'dev_ready_to_test'. It deploys and runs the functional tests, and if they pass, it merges to 'prod_ready_to_deploy'.
    Add steps as needed for your release process.
    ALL of this is tracked within the system of record, not some external event bus.

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      How will you know that a new release is operational before running tests?

    • @DerJoe92
      @DerJoe92 6 months ago +1

      @@DevOpsToolkit Does Argo Events do this? It triggers on ReplicaSet creation, but it has no notion of the rollout actually being successful and healthy, has it?
      However, the ReplicaSet trick is a very nice one! Maybe Kargo could be a solution for detecting healthiness?

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      With ReplicaSets you at least know that the rollout of a new release has started. You could do it in more complicated ways to be sure that it is fully rolled out, but I did not explore those in this video.

    • @DevOpsToolkit
      @DevOpsToolkit  2 months ago

      Here it goes: ua-cam.com/video/RoY7Qu51zwU/v-deo.html

  • @pavelpikat8950
    @pavelpikat8950 6 months ago +1

    Why trigger a pipeline in GitHub when you can trigger a Workflow in Argo Workflows without ever leaving Kubernetes :P

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago

      That's true. It could also be Tekton, or Jenkins, or even Kubernetes Jobs. I used Actions mostly because I use them more often than other pipelines, but the logic is the same no matter what your choice is.

  • @InstaKane
    @InstaKane 6 months ago +1

    Hmmm, no…..

  • @ioannisgko
    @ioannisgko 6 months ago +5

    Interesting approach. What I would do is add a new stage in the main pipeline, just after the stage where the pipeline updates the Git repo with the new image tag. In that new stage, I would (continuously) send a request to the healthcheck endpoint of the application until it is healthy, and when it is, I would trigger a new pipeline called "functional tests". That way, there is no need to install more stuff in my cluster (like Argo Events), and I have one place to check what happens after the deployment of a new version of my application (the main pipeline stages).

    • @DevOpsToolkit
      @DevOpsToolkit  6 months ago +1

      Does that health check return the version of the app, so that you know it is not the previous release sending you the check responses?

    • @ioannisgko
      @ioannisgko 6 months ago +2

      @DevOpsToolkit Good question! The healthcheck returns JSON, and the current version of the deployed app is included in the data. The data also contains the "healthy: true" key/value pair only when the app is really healthy (e.g. it can connect to the db). The healthcheck endpoint is only visible from inside the network (not exposed to external users).
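
      A sketch of that version-aware gate, with the HTTP call stubbed out as a callable so nothing here depends on a real endpoint; `fetch_health` is a hypothetical wrapper around the JSON healthcheck described above:

```python
import time

def wait_for_version(fetch_health, expected_version, timeout=300, interval=5):
    """Poll a healthcheck until it reports healthy *and* the expected version.

    fetch_health: zero-argument callable returning a dict such as
    {"healthy": True, "version": "1.2.3"}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_health()
        # Both conditions matter: a healthy answer from the *old* release
        # must not let the functional tests start early.
        if status.get("healthy") and status.get("version") == expected_version:
            return True
        time.sleep(interval)
    return False
```

      Checking the version alongside the healthy flag is what answers the objection above about the previous release still responding.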

    • @dhananjaimvpai
      @dhananjaimvpai 6 months ago +2

      @@ioannisgko When you say inside the network, it should still be accessible from your pipeline runners/agents, which are outside your cluster, the way you phrased it. Having all the developers build such custom endpoints into the application and exposing them with a separate ingress seems a lot more challenging than an event-based option if you are using Argo.
      Also, it seems your pipeline is wasting compute while waiting for things to be ready. I'm not sure about the scale here, but wouldn't you rather invest the time of that blocking wait loop in building/running another pipeline?

    • @ioannisgko
      @ioannisgko 6 місяців тому +1

      @dhananjaimvpai we do not take this approach, nothing custom is built by the developers. The healthcheck endpoint is prevented from being exposed to the real users of production just by a simple setting in the firewall by the Operations team. The healthcheck is accessible from all pipelines by default.

    • @IvanRizzante
      @IvanRizzante 6 months ago +3

      To me that adds unnecessary complexity and increases the pipeline duration (read: cost) by a variable amount of time, potentially very long depending on the app's startup time. In the very worst case, it could cause the pipeline to hang waiting, should the app behave incorrectly.