Site Reliability Engineering at Google • Christof Leng • GOTO 2018

  • Published Oct 3, 2024

COMMENTS • 10

  • @Sousleek
    @Sousleek 5 years ago +10

    1.25x FTW

    • @missingdays1
      @missingdays1 3 years ago +1

      Dude talks at 1.25x like a normal person talks at 1x

  • @Stephendenham
    @Stephendenham 2 years ago +1

    Very surprised at 30:03 to hear him say the alert should be as precise as possible. My understanding of Google's SRE practice was to only alert on what users care about - "can they log in, can they add something to their cart?"
    ua-cam.com/video/CGldVD5wR-g/v-deo.html
    It seems as though the speaker is muddying good alerts with good monitoring at that point in the talk.
    Why is this important? Because making alerts super specific means hundreds of alerts, and then it's too difficult to avoid alert fatigue. Avoiding this fatigue keeps alerts meaningful. I don't need my alert notification to tell me which specific part of the system failed, just that I need to open my monitoring dashboard; with good monitoring, it will be relatively clear what went wrong.

    • @ChristofLeng
      @ChristofLeng 1 month ago +1

      User focus and precision go together very well. An alert shouldn't be just "users see an increased rate of HTTP 500s" but "the checkout page sees a high rate of HTTP 500s, specifically when served from Cloud region $foo". This gives you a lot more to work with, saving you critical time when trying to understand what you need to do to fix the issue the users are experiencing.
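
      A minimal sketch of the idea in this reply, not taken from the talk: the alert still fires on a user-visible symptom (HTTP 5xx on a page), but the error rate is computed per page and serving region so the message says where the problem is. The function name `precise_alerts`, the record layout `(page, region, status)`, the 5% threshold, and the minimum sample size are all assumptions made for illustration.

```python
from collections import defaultdict


def precise_alerts(requests, threshold=0.05, min_requests=20):
    """Return one alert message per (page, region) whose HTTP 5xx rate exceeds threshold.

    `requests` is an iterable of (page, region, status) tuples (hypothetical layout).
    Groups with fewer than `min_requests` samples are skipped to avoid noisy alerts.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for page, region, status in requests:
        key = (page, region)
        totals[key] += 1
        if 500 <= status <= 599:
            errors[key] += 1

    alerts = []
    for key in sorted(totals):
        if totals[key] < min_requests:
            continue
        rate = errors[key] / totals[key]
        if rate > threshold:
            page, region = key
            alerts.append(
                f"{page} sees a high rate of HTTP 5xx ({rate:.0%}) "
                f"when served from {region}"
            )
    return alerts


# Made-up sample traffic: only /checkout in region-a is failing.
sample = (
    [("/checkout", "region-a", 500)] * 10
    + [("/checkout", "region-a", 200)] * 30
    + [("/checkout", "region-b", 200)] * 40
    + [("/cart", "region-a", 200)] * 40
)
alerts = precise_alerts(sample)
for message in alerts:
    print(message)
```

      With this sample data only one alert is produced, so precision here does not mean hundreds of separately configured alerts: it is a single symptom-based rule whose output carries the page and region it fired for.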

  • @daradadachanji5670
    @daradadachanji5670 3 years ago +1

    "Gmail 500" lmfao

  • @pablomena585
    @pablomena585 6 years ago +2

    Services Reliability engineer, nice twist!

  • @rainybubblesfan
    @rainybubblesfan 6 years ago +1

    Doesn't sound like DevOps to me, but just traditional silos of Dev and Ops...

    • @yclian5056
      @yclian5056 5 years ago +1

      SRE rolls SWEs into the team for a long enough period of time to train them in SRE capabilities. How's that a silo?

    • @jakub.anderwald
      @jakub.anderwald 5 years ago +3

      DevOps doesn't require getting rid of separate organizations. It's a culture, not an org scheme and not a role. How you implement that culture is up to you; here we can see Google's approach. You might have a different one.

    • @brunovalerio3188
      @brunovalerio3188 5 years ago +2

      class SRE implements DevOps! ;)