How GitHub's Database Self-Destructed in 43 Seconds

  • Published 29 Jun 2024
  • A brief maintenance accident takes a turn for the worse as GitHub's database automatically fails over and breaks the website.
    Sources:
    github.blog/2018-10-30-oct21-...
    github.blog/2016-12-08-orches...
    github.blog/2018-06-20-mysql-...
    news.ycombinator.com/item?id=...
    / github_major_service_o...
    hub.packtpub.com/github-down-...
    github.blog/2017-10-12-evolut...
    Chapters:
    0:00 Part 1: Intro
    1:25 Part 2: GitHub's database explained
    3:40 Part 3: The 43 seconds
    5:04 Part 4: Fail back or not?
    6:54 Part 5: Recovery process
    10:32 Part 6: Aftermath
    Notes:
    - Funnily enough, in this blog post from 4 months prior to the incident (github.blog/2018-06-20-mysql-...) they specifically explained how cross-data-center failovers could be carried out successfully
    Music:
    - Hitman by Kevin MacLeod
    - Blue Mood by Robert Munzinger
    - Pixelland by Kevin MacLeod
    - Dumb as a Box by Dan Lebowitz
  • Science & Technology

COMMENTS • 818

  • @sollybunn
    @sollybunn 1 year ago +5907

    "We can't delete user data, we aren't gitlab"
    This video is a goldmine

  • @MaxwellHay
    @MaxwellHay 1 year ago +6474

    The assumption that 50% of total github users are active is too optimistic

    • @Backtrack3332
      @Backtrack3332 1 year ago +572

      Yea, I'm guessing 2% max

    • @FiksIIanzO
      @FiksIIanzO 1 year ago +577

      It's good to grossly overestimate potential issues

    • @KaidenBird
      @KaidenBird 1 year ago +221

      As someone who hasn't pushed in weeks, that hurts, but is too true.

    • @lightning_11
      @lightning_11 1 year ago +21

      @@Backtrack3332 That's still a lot, though!

    • @RMDragon3
      @RMDragon3 1 year ago +160

      Yeah, those assumptions seem very off to me. I feel like less than 50% of GitHub users are active daily, between abandoned accounts and people who rarely use it. On top of that, a significant percentage of users will be students or personal projects that don't really have a monetary impact. Also, most users likely didn't lose anywhere near 2 hours, especially because the website wasn't fully down for anywhere close to those 24 hours. I'm sure it didn't work great during that time, but it was usable. If it happened to me, I would likely test for 5 minutes, check with colleagues and just work locally, testing every hour or so. Some people may have been affected more, but 2 hours of lost productivity seems way too high to me. With that in mind, the estimate would likely be a few orders of magnitude lower.

  • @RichieYT
    @RichieYT 1 year ago +4143

    These problems always occur during routine maintenance. That's why I don't do any maintenance whatsoever and my systems have never experienced downtime (although I've never checked)

    • @nicholasfinch4087
      @nicholasfinch4087 1 year ago +685

      can't have a problem if you don't see a problem

    • @kurdtpage
      @kurdtpage 1 year ago +92

      This is the way

    • @zsoltsz2323
      @zsoltsz2323 1 year ago +155

      Even Chernobyl was routine maintenance.

    • @PieJee1
      @PieJee1 1 year ago +26

      That makes your system full of security exploits, since security issues never get patched either. You'll also face a huge problem if you're ever forced to update from versions that are too old

    • @elle9834
      @elle9834 1 year ago +69

      Out of sight out of mind

  • @Justin-jm2fd
    @Justin-jm2fd 1 year ago +3244

    As a former bitbucket employee I can confirm we have disaster recovery plans for a lunar data center outage

    • @KangJangkrik
      @KangJangkrik 1 year ago +10

      Now what?

    • @fatrobin72
      @fatrobin72 1 year ago +67

      Last I checked it was a disaster plan, there was no recovery...

    • @DaveParr
      @DaveParr 1 year ago +12

      I'd assume you would use IPFS.

    • @jaythecoderx4623
      @jaythecoderx4623 1 year ago +7

      @@DaveParr Those have a lot of latency tho, don't they?

    • @siliconcassettes3369
      @siliconcassettes3369 1 year ago +19

      As a time traveller from the future I can confirm the recovery plans are insufficient and the situation becomes irrecoverable

  • @axelboberg
    @axelboberg 1 year ago +2344

    Interplanetary failovers are a struggle, not gonna lie.

    • @__dm__
      @__dm__ 1 year ago +47

      IPFS is (was?) a project designed with interplanetary, high-latency connections in mind, with Merkle DAG data structures for, well, unstructured object data.
      It got adopted by the crypto crowd because of memes, and idk where it's going

    • @philip3963
      @philip3963 1 year ago +27

      @@__dm__ I work with IT solutions and I swear I've seen IPFS support in the industry before, just can't remember where

    • @ExEBoss
      @ExEBoss 1 year ago +1

      @@philip3963 *Cloudflare* says they have support for it.

    • @muhammadyusoffjamaluddin
      @muhammadyusoffjamaluddin 1 year ago +5

      PHP Devs: YOU THINK SOO??????

    • @LinhNguyen-zg9kn
      @LinhNguyen-zg9kn 1 year ago

      bruh they had the option to roll back 40 mins of writes on the promoted db and sync both dbs. They pretty much fucked themselves in the ass tbh

  • @kalebbruwer
    @kalebbruwer 1 year ago +531

    It's bold to assume that
    a) 50% of Github users are active on any given day
    b) Their time is worth an average of $50/hr
    c) Not syncing with remote for one day would affect the average user

    • @mews75
      @mews75 4 months ago +4

      That's what i was thinking lol

    • @opfipip3711
      @opfipip3711 4 months ago +9

      yeah, one of the great things about git is that it is trivial to set up a new remote and even no problem to code for weeks without an internet connection at all. I'd say GitHub could only be up ~20% of the time without that having a strong (financial) impact on most of the projects hosted there. Would piss off lots of devs, tho.
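      A minimal sketch of what that looks like in practice (the remote name and URL below are made-up placeholders, not anything from the video):

          # keep committing locally while the hosted remote is unreachable
          git commit -am "work continues offline"

          # when you need a remote again, any reachable git server will do
          git remote add backup git@example.com:me/project.git
          git push backup --all --tags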

    • @Ignacio_DB
      @Ignacio_DB 3 months ago +2

      I'm no IT guy, but 40 mins of lost data is a better sacrifice than hours of slow time. Couldn't they just freeze the west db, see what was different, transfer it, and boom, everything's solved?

    • @mennoltvanalten7260
      @mennoltvanalten7260 2 months ago +2

      I push maybe 3 times a week... but I'm basically using GitHub as a backup for some personal projects. So long as my computer survives I can handle not pushing for a few days

    • @iheartlreoy8134
      @iheartlreoy8134 1 month ago +3

      don’t you just hate when your andromeda integration service fails causing all writes made after the American civil war to be lost

  • @ericlizama8552
    @ericlizama8552 1 year ago +606

    Honestly I'm impressed that Bitbucket was able to lower the Earth-Mars latency down to 60 milliseconds.

    • @Fenhum
      @Fenhum 1 year ago +81

      they must've found a cheap way to build those Einstein-Rosen bridges, ey?

    • @wesleyeberly228
      @wesleyeberly228 1 year ago +14

      @@Fenhum something akin to hyper pulse relays from BattleTech

    • @shippo72
      @shippo72 9 months ago

      @@mikicerise6250 Ansible is instantaneous, no matter the distance. It even allows you to communicate both upstream and downstream of your current dimensional position.

    • @AR-yd2nd
      @AR-yd2nd 4 months ago +3

      Faster than light bitbucket

    • @runforitman
      @runforitman 24 days ago

      Those wormhole generators give you cancer you know

  • @CoryKing
    @CoryKing 1 year ago +275

    I worked at a website that handles millions of write transactions per day across like 7 global data centers. We were starting to think of a way to drop into a “read only” mode in the event something like this happened. Then we wouldn’t need to paw through the mess of uncommitted transactions…
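    A rough sketch of the database side of that idea, assuming plain MySQL servers like the ones in the video (hostnames are illustrative; a real site would also need application-level handling of rejected writes):

        # freeze writes on the affected server but keep serving reads
        mysql -h db-primary -e "SET GLOBAL super_read_only = ON;"

        # later, once things are reconciled, re-enable writes
        mysql -h db-primary -e "SET GLOBAL super_read_only = OFF; SET GLOBAL read_only = OFF;"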

    • @KF-zb6gi
      @KF-zb6gi 1 year ago +20

      that actually sounds good

    • @xpusostomos
      @xpusostomos 7 months ago

      @@KF-zb6gi sure it's good ... if this is the rare website where it even makes sense to be read-only

    • @GeorgeTsiros
      @GeorgeTsiros 5 months ago +2

      when you say millions of transactions per day, is there something difficult about these? I mean, even if you do 100 million per day, that's on the order of 1k transactions per second, that's reasonable, yes?

    • @xpusostomos
      @xpusostomos 5 months ago +18

      @@GeorgeTsiros the difficult part, if you watched the video, is reconciling conflicting changes

  • @manzenshaaegis8783
    @manzenshaaegis8783 1 year ago +435

    This is one of those things that in hindsight, it is so easy to see how they set themselves up for failure. But I bet you a lot of brilliant people looked at this and still did not see the issue until it (inevitably) blew up. It do be like that sometimes...

    • @simonsomething2620
      @simonsomething2620 1 year ago +25

      probably more along the lines of politics and "we'll do it later"

    • @christianbarnay2499
      @christianbarnay2499 1 year ago +68

      I know at least one org that can't have that kind of failure. Their standard operating procedure is to actually force the primary switch on a regular basis. Every 2 or 3 months they power off all primary servers and check that all secondaries have promoted and are now fully operating as primaries with no data loss. Then they restart the old primaries, which become the new secondaries. It covers all possible kinds of failures of the primaries. This is also used for the upgrade procedure. Whenever you need to upgrade a server, you upgrade the secondary first, do some offline tests, then promote it to primary, keep the old primary/new secondary ready with the old version for a few days in case a rollback is needed, and finally update it.
      The first time I saw that choice of making the failover procedure an integral part of normal operations, I thought it was genius. When you have an incident, you don't need to panic and look up exceptional procedures you are not familiar with. You just change the schedule of the regular routine. And if needed you can do forensics on the system you just took offline while users keep working, unaffected by the incident.
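      A rough sketch of such a drill using plain MySQL 8 replication commands (hostnames invented; credentials, GTID options, lag checks and traffic cutover omitted, so this is an outline rather than anyone's actual runbook):

          # 1. freeze writes on the current primary
          mysql -h db-primary -e "SET GLOBAL super_read_only = ON;"

          # 2. once the secondary has caught up, promote it
          mysql -h db-secondary -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL super_read_only = OFF; SET GLOBAL read_only = OFF;"

          # 3. demote the old primary: make it replicate from the new one
          mysql -h db-primary -e "CHANGE REPLICATION SOURCE TO SOURCE_HOST='db-secondary'; START REPLICA;"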

    • @travcollier
      @travcollier 1 year ago +29

      @@christianbarnay2499 Good idea.
      Of course, it is also expensive AF. Robustness always costs short term efficiency.

    • @smugfaced
      @smugfaced 1 year ago

      it really do be

    • @checker297
      @checker297 1 year ago +10

      @@christianbarnay2499 everyone can have this kind of failure, it's just a matter of degree. It isn't in normal situations that you get pressured as an engineer; it's when shit is on fire and all your plans relied on something you assumed would keep working because of its robustness, and suddenly you're forced to pull a rabbit out of your arse.

  • @riddixdan5572
    @riddixdan5572 1 year ago +750

    What a goldmine of a channel. I'm here with you all, witnessing the birth of a great channel

  • @0tiii
    @0tiii 11 months ago +21

    dude almost sounds like fireship

  • @edhahaz
    @edhahaz 1 year ago +1882

    imagine being github and being unable to... MERGE two databases

    • @littleloner1159
      @littleloner1159 1 year ago +134

      It's GitHub
      Didn't they delete their whole code like twice?

    • @joelpww
      @joelpww 1 year ago +305

      ​@@littleloner1159 might be thinking of gitlab

    • @casev799
      @casev799 1 year ago +17

      Yeah, but you'd expect them to learn at some point. They have their whole library of users that could help too....

    • @ko-Daegu
      @ko-Daegu 1 year ago +174

      @@casev799 typical YT reply, everything is easy in their eyes yet they accomplished nothing

    • @Paulo27
      @Paulo27 1 year ago +103

      git push --force -----FORCE ----------FOOOOOOOOORCEEEEEPLEEEEEAAAAASSSSEEEEEE

  • @dybdab
    @dybdab 1 year ago +408

    One of the greatest "history" channels on YouTube, love the content.

    • @l-l
      @l-l 1 year ago +1

      Absolutely

    • @namansoood
      @namansoood 1 year ago +6

      Internet Historian: 👀

  • @Geolaminar
    @Geolaminar 1 year ago +51

    Well, it could have been worse. The automated lunar relay launch could have been misconfigured such that it did not alert US STRATCOM, and therefore appeared to be a ballistic missile launch against a domestic target, which immediately would lead to global thermonuclear war due to improper database failover configuration.

    • @MrLastlived
      @MrLastlived 11 months ago +10

      I swear to god if all of humanity gets wiped out over a stupid accident and not because of a grand painstaking political catastrophe I'ma be real disappointed in hell.

    • @mattheholic2
      @mattheholic2 11 months ago +12

      ​@@MrLastlivedThat was close to happening multiple times over the course of history. It's a miracle we haven't already done that.

  • @JohnAlbertRigali
    @JohnAlbertRigali 6 months ago +26

    Considering the scope of the GitHub disaster, it seems to me that recovery within 30 hours is very impressive. I've had to engineer recoveries from much smaller disasters and every one of them took me at least 48 hours, if I remember correctly.

  • @thebeber2546
    @thebeber2546 1 year ago +119

    The ending was hilarious. Great video overall.

  • @ccthomas
    @ccthomas 1 year ago +51

    When the east coast database recovered and started accepting writes again from applications, they dodged the very common bullet of those apps pushing work at the database as fast as they can and overwhelming it, causing a second wave of outage. In this case, it looks like the controls over the work rate (whether implicit in the nature and scale of the apps, or an explicit mechanism) were sufficient to prevent that.

  • @rajarshichattopadhyay1728
    @rajarshichattopadhyay1728 11 months ago +16

    I love how in the last 30 sec, Kevin was not only able to explain how an interplanetary network would work but also how a random command would blow everything up in exactly 30 sec 😆

  • @LolWutMikehSM
    @LolWutMikehSM 1 year ago +39

    That interplanetary loop was good

  • @Hopgop1
    @Hopgop1 1 year ago +127

    I love these videos, I work in IT but for a much smaller national company, really interesting to learn some lessons from, plus the editing and storytelling makes it very entertaining.

  • @IroAppe
    @IroAppe 1 year ago +58

    This was definitely not a failure. I've seen other videos where "they did everything wrong they could". In this case, under the circumstances, they did exactly what they had to do. Except for the few who argued for prioritizing uptime over data consistency, which is a no-no. It's good that the right engineers prevailed. A laggy service is just so much better than a full collapse or a massive inconsistency nightmare that would plague customers everywhere for weeks. I get that they're paid for uptime and fluidity of the service, but in a case that is equivalent to a survival situation, you have to prioritize. Worrying about a "laggy service" on the east coast is then equivalent to complaining about the lack of ice cream in an apocalypse scenario.
    In fact, I see this as a huge win! How many times have hasty measures, taken without much thought to treat the superficial symptoms as fast as possible while they were merely an extension of the underlying real problem, led to a full-scale disaster? Here, for once, there were people thinking critically before doing something, and treating the core of the problem.

    • @Penfolduk001
      @Penfolduk001 11 months ago +2

      The worry here was that they had to spend the time coming up with the plan to respond.
      Whilst I realise you can't plan for every contingency, cross-hub failure like this should have already been considered and planned for. From the video this doesn't appear to have been the case.
      Guess they were lucky the initial fault didn't last more than 43 seconds.

    • @xpusostomos
      @xpusostomos 7 months ago +1

      Nobody was arguing for inconsistency. The argument was getting back up fast vs losing 40 minutes of changes

    • @leaffinite3828
      @leaffinite3828 7 months ago

      @xpusostomos losing 40 minutes of changes is, I think, the inconsistency in question

    • @xpusostomos
      @xpusostomos 7 months ago

      @@leaffinite3828 that's not a data inconsistency

    • @leaffinite3828
      @leaffinite3828 7 months ago +2

      @@xpusostomos why don't you define the term then, get us on even ground

  • @kuroodo_
    @kuroodo_ 1 year ago +31

    The explosion at the end threw me into tears lol

  • @fairlyfactual451
    @fairlyfactual451 1 year ago +50

    This is why you should always practice regional failovers of your cloud architecture and make doing so a mandatory company event (or even a random one).

    • @alexischicoine2072
      @alexischicoine2072 11 months ago

      My company practices that once a year I believe. I had a senior colleague take part in it.

  • @radiosification
    @radiosification 1 year ago +3

    I love these incident analysis videos. Please keep making more!

  • @XxBuzzkill77xX
    @XxBuzzkill77xX 1 year ago +21

    This content is incredible! Really has me thinking about some of my architecture and how to think about planning infrastructure going forward, keep up the awesome work!

  • @hchris96
    @hchris96 1 year ago +25

    Thank you! This was perfect. I love this. And the amount of explosions is tasteful and not overdone

  • @majesticcok
    @majesticcok 1 year ago +27

    I love these videos, but as a DevOps Engineer I get anxious if I watch too many in a short period of time :)

  • @jermunitz3020
    @jermunitz3020 1 year ago

    Nice editing Kevin. Really looking forward to the next one.

  • @gleep23
    @gleep23 1 year ago +2

    I like how you turned this technical issue into an enjoyable story. Great storytelling skill.

  • @jure.
    @jure. 1 year ago +14

    I love your videos so much. They're so informative, interesting, well-made and even funny. Keep it up!

  • @acoolnameemm
    @acoolnameemm 10 months ago +2

    This video is full of explosions and memes but in a tempered manner and it hits all the nerves in my brain. I need more videos like this.

  • @CoryKing
    @CoryKing 1 year ago

    These videos are hilarious! I look forward to more!
    It's like the Darknet Diaries podcast but different and super funny.
    Good stuff! I watched all of these and am disappointed there isn't more to binge watch. I hope you keep this format, this is an excellent concept for a YouTube channel!

  • @rigell2764
    @rigell2764 1 year ago +18

    These graphics make me laugh. 1, 2, 4, 5, red among us guy, purple among us guy, pizza, 8 ball 😂. Also the Ace Attorney part was great.

  • @IceTank
    @IceTank 1 year ago +9

    The editing is on point. Very nice video.

  • @kiro_f
    @kiro_f 1 year ago +2

    Can't wait for another video, just kinda wanna go on a binge watch of them but there aren't that many, hopefully in the future though :)

  • @CubemasterXD
    @CubemasterXD 1 year ago +2

    these videos are so underrated
    the (visual) humor keeps getting better and better

  • @vikaskrishnan4018
    @vikaskrishnan4018 1 year ago +1

    I loved the whole breakdown of the issue Github faced, but it's the last 30 seconds of the video that gained you a Sub!
    Keep up the crisp K.I.S.S explanation and subtle humour combined with the accurate images and editing!

  • @kanal7523
    @kanal7523 1 year ago +1

    I love the animations and goofiness, pls never stop making these videos

  • @mr_darkeye
    @mr_darkeye 1 year ago +8

    always nice to see a new video from you

  • @LemonGingerHoney
    @LemonGingerHoney 1 year ago +10

    I felt their pain.
    What a fantastic job on the recovery and post mortem.

  • @VaraNiN
    @VaraNiN 1 year ago

    This channel gonna be big soon with these high quality vids and the algorithm starting to push em

  • @TheNivk1994
    @TheNivk1994 1 year ago +61

    Please…. More of these videos of software disasters! Facebook outage etc. !! As a developer myself, it’s somehow calming that such big players fall into these „oh shit….“ situations too! ❤️

    • @simonsomething2620
      @simonsomething2620 1 year ago +3

      They're all humans and none of them conjure magic tricks. Usually using the same jazz us mortals are :D

  • @nickdaboss03
    @nickdaboss03 1 year ago

    Loving these new documentary type videos!

  • @darthollie
    @darthollie 7 months ago +5

    This video is waaaay longer than 43 seconds

  • @miklov
    @miklov 1 year ago

    Fascinating. Love the bit at the end too! Thank you.

  • @MaNameizJeff
    @MaNameizJeff 11 months ago +1

    I am loving your videos so much. You describe exactly how these internet exploits are done in the most entertaining way. Even someone who only knows the basics like myself can follow along and understand.

  • @ForcefighterX2
    @ForcefighterX2 1 year ago

    2nd video from your channel. Realized it's awesome. You've got a new subscriber, bro!

  • @elatedemu
    @elatedemu 9 months ago

    Your visuals are probably the best and most entertaining I've ever seen

  • @Crocsx058
    @Crocsx058 1 year ago

    Man, your videos are so good, and it's so cool to see other companies' post mortems and the causes so well explained. Thanks

  • @eantropix
    @eantropix 11 months ago +2

    Bro backing up data to Mars sounds so unbelievably awesome and impractical at the same time, I love it

  • @druidshmooid
    @druidshmooid 1 year ago

    Loving the videos. Great content. Keep it coming.

  • @TheShnitzel
    @TheShnitzel 1 year ago

    Another great video!
    Keep up the awesome work!

  • @arcaneblackwood3602
    @arcaneblackwood3602 1 year ago

    The humor in this video is 120%. We need news actors like you in this world.

  • @AdroSlice
    @AdroSlice 1 year ago

    That last part is gold. Thank you so much.

  • @ironized
    @ironized 1 year ago

    Found this video today, please keep these up. I work in business resilience/crisis management and find this very helpful

  • @fir3cl4w
    @fir3cl4w 1 year ago +9

    Love the Ace Attorney bit, keep up the good work ❤

    • @benbrist
      @benbrist 1 year ago +1

      "We're not GitLab" had me in stitches

  • @henkfinkers3931
    @henkfinkers3931 1 year ago +12

    I absolutely love this channel.

  •  1 year ago

    I just found out about your channel, amazingly well put together videos

  • @Froschkoenig751
    @Froschkoenig751 1 year ago +1

    Love the humor mixed with the animations and actually insightful content - you got a new subscriber with that video!

  • @kim15742
    @kim15742 1 year ago

    You are now one of my very favourite youtubers! Great videos

  • @PolskaChild
    @PolskaChild 11 months ago

    Everything about the video was great lmao. The humor, the animations, and not stupidly complicated.

  • @beakt
    @beakt 1 year ago

    Your background music and sound effects are very clever.

  • @Epausti
    @Epausti 1 year ago

    Love your stuff! Your channel will blow up

  • @kriterer
    @kriterer 1 year ago +3

    $50 an hour is a wild overstatement

  • @EdwardChan.999
    @EdwardChan.999 1 year ago +8

    I hate dealing with databases, but watching your database stories is a pleasure 👍🏻

  • @owenschwartz
    @owenschwartz 1 year ago +1

    Absolutely loving these videos.

  • @JxH
    @JxH 1 year ago +10

    We do have to admire the self-confidence of the system designers. They plunged right in, built a highly complex system, blissfully unaware of their own naïveté. Failure control is about 30x more complex than they had assumed.

  • @juleswinnfield1437
    @juleswinnfield1437 1 year ago

    This was such a cool video, always great when you learn things and don't realise it!

  • @teamwolfyta6511
    @teamwolfyta6511 1 year ago +1

    That Bitbucket joke was the funniest thing I've heard in coding terms, Keep up the awesome stuff mate! 🤣

  • @shitshow_1
    @shitshow_1 1 year ago +1

    I've always wondered about inconsistency across divergent timelines and how engineers would handle it. Great video 👍

  • @jeffreyz4632
    @jeffreyz4632 1 year ago

    Love ur database videos, keep it up

  • @Gabriel-kl6bt
    @Gabriel-kl6bt 1 year ago +1

    The thought of being amidst these people recovering from this kind of chaos gives me a stomachache.

  • @arthurritt3047
    @arthurritt3047 1 year ago

    You made it so easy to understand man you're good

  • @x4exr
    @x4exr 1 month ago

    This video is packed with humor. It's so nonchalant, and that's funny to catch 😹 I enjoyed watching this video!!

  • @Pixelhurricane
    @Pixelhurricane 1 year ago

    your joke at the end about the martian servers had me in tears, too real

  • @opensauce04
    @opensauce04 1 year ago +1

    I am absolutely loving these videos

  • @chewcodes
    @chewcodes 1 year ago

    another excellent video, thank you mr fang

  • @JAMBUILDER08
    @JAMBUILDER08 1 year ago

    This is a great example of what to do after a major IT issue, which is make plans to handle such a situation better and more easily should it occur again.

  • @ghostiulian1
    @ghostiulian1 9 months ago +2

    They handled it pretty well, to be honest

  • @AccurateBurn
    @AccurateBurn 1 year ago

    Explosions!?!?!? Another banger dude, so entertaining. This is so funny: we've got HA, but also failover is not a supported architecture.

  • @Speak4Yourself2
    @Speak4Yourself2 1 year ago

    Brilliant video. Thanks a lot!

  • @morswinpsiopsiol667
    @morswinpsiopsiol667 1 year ago

    I love your content, man, keep it up, you are awesome! ^^

  • @christianbarnay2499
    @christianbarnay2499 1 year ago +35

    Github is designed at its core to allow for loss of connectivity anywhere in the network. In this event they completely failed at handling the exact type of issue their system was designed to overcome seamlessly.
    As mentioned in the video this should have resulted in a 43s downtime for the vast majority of clients. And only a handful of clients having to reconcile data by hand between the west and east coast centers.
    The major problem is they clearly never tested the primary database loss scenario. They would have identified that they needed to replicate not only the database but the entire infrastructure to the west coast so it could still work during an east coast downtime. Or deactivate cross-country failover.
    The second problem is they one-sidedly decided they had to reconcile all user data by themselves. Client data belongs to clients. You should never alter client data without full information and consent. Deciding to manually roll back and back up east coast commits was altering client data and a big no-no.
    The right course of action should be:
    1. Inform clients that there is a potential discrepancy between servers and you are building a list of affected projects,
    2. Let the system reconcile projects that have no issue at all (no commits during the downtime or only west coast commits that can be pushed to the east with a fast-forward) and inform those clients that everything is fine for them and the system is back to normal operations,
    3. Tell clients that need manual reconciliation that you propose the following plan: keep the branch with the most recent commit as is, and rename the conflicting branch as _ so they will have both accessible in the same repo and can reconcile their data as it suits them (see the sketch after this list). And ask them to reply with their approval of the plan or a proposition for an alternate plan before some reasonable deadline. And give them contact info if they need help and/or advice.
    That way instead of going all in manipulating all clients data, they would only need a small taskforce ready to help those that actually need it.
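    As a sketch of what step 3 could look like mechanically, assuming the diverged east-coast history is reachable as a second remote (the remote and branch names here are invented for illustration):

        # fetch the diverged line of history from the other data center's copy
        git fetch east-coast

        # keep the newer line as 'main', expose the other one under a new name
        git branch main_east_conflict east-coast/main
        git push origin main_east_conflict

        # project owners can now inspect and reconcile the two lines themselves
        git log --oneline main..main_east_conflict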

    • @eekee6034
      @eekee6034 1 year ago

      *Git* is designed to allow for loss of connectivity. Git*hub* was designed by the kind of crazies who jump on open source bandwagons.

    • @samuellourenco1050
      @samuellourenco1050 1 year ago +3

      One question about your point 3: how do you reconcile two divergent branches?

    • @christianbarnay2499
      @christianbarnay2499 1 year ago +2

      @@samuellourenco1050 There are tons of ways to do it.
      Simplest is git merge with manual resolution of conflicts.
      Most tedious is creating a new branch at the diverging point and cherry-picking from each side, then destroying both incomplete branches and renaming the new branch to the original name (sketched below).
      The right strategy is up to each client depending on the state of their data and their own standards for repo cleanliness.
      Some will want to remove all traces of the incident. Others will consider it's part of the project life and should stay visible in the history.
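      A compressed sketch of both options, reusing the made-up branch names from the sketch above:

          # option 1: plain merge with manual conflict resolution
          git checkout main
          git merge main_east_conflict    # resolve conflicts, then commit

          # option 2: rebuild from the fork point and cherry-pick each side
          FORK=$(git merge-base main main_east_conflict)
          git checkout -b reconciled "$FORK"
          git cherry-pick "$FORK"..main                  # replay one side
          git cherry-pick "$FORK"..main_east_conflict    # then the other, resolving conflicts as they appear
          git branch -M reconciled main                  # make the rebuilt branch the new main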

    • @JohnSmith-fz1ih
      @JohnSmith-fz1ih 11 months ago +5

      Where did you get the notion that they altered client data? My understanding from watching this video is that they rolled back to a consistent state, then restored the two lots of data that ended up split over the two data centres. The result being all data restored.
      I'm not certain what users with data spread across both the east coast and west coast servers experienced. But your post reads to me as "I watched a 12-minute summary and now I think I know better than the staff that worked with the product every day".

    • @christianbarnay2499
      @christianbarnay2499 11 months ago

      @@JohnSmith-fz1ih In a history tool like GIT, client data is not limited to the content of the latest commit. Client data is the entire tree with all branches, commit dates, comments and commit order.
      Dealing with conflicting data is an important decision. And the way you want the data to appear and be accessible after the resolution is a decision by the project owner. Each project owner will have a different approach on the way they want to deal with such a situation. And GIT allows for all those approaches. The Github team making a single universal decision for all projects is barring project owners from making their own decision on the matter.
      What I say doesn't come from just watching a 12 minute video. It comes from using GIT on a daily basis, including a few occasions in which I migrated entire projects from old tech repos like CVS or SVN to GIT.
      And on some of those occasions I had to retrieve commits that were split over several repos and reconcile them using dates and comments. With the help of some low level GIT commands I could easily automate that process.
      That's why I am fully confident that GIT has all the tools needed to allow the Github team to automatically rename conflicting branches, regroup everything in the master repo, replicate to all mirrors, and then let project owners do the merge the way they want instead of forcing their own single decision for everyone.
      The main benefits of GIT over all other versioning systems are its high resilience to conflicts and the possibility for project admins to do absolutely everything with their repo on any PC and push the result to the central repo. This incident was the perfect occasion to highlight those features and display complete transparency by rapidly giving control of the 2 branches of their repos to project owners.

  • @Ngethe_M
    @Ngethe_M 8 months ago

    great content, found this channel and immediately subscribed

  • @rewazilol
    @rewazilol 1 year ago

    Subscribed. This was amazing.

  • @Justin-jm2fd
    @Justin-jm2fd 1 year ago

    Awesome video for any site reliability engineer

  • @JohnnyMcMenamin
    @JohnnyMcMenamin 1 year ago

    First time viewer here and recent subscriber. I enjoy your style of video editing and presentation.

  • @Markyroson
    @Markyroson 7 months ago

    I love the "until next time" segment at the end lolol

  • @villan7h
    @villan7h 11 months ago

    These videos are too good!!

  • @violetwtf
    @violetwtf 1 year ago

    you are my favorite channel, i love what you do

  • @isaiahsmith178
    @isaiahsmith178 1 year ago

    Dude your channel is so good.

  • @matthewschuster4600
    @matthewschuster4600 1 year ago

    That last 30 seconds or whatever just earned you a sub. Lmao.

  • @TheOneAndOnlyMart
    @TheOneAndOnlyMart 11 months ago

    love your animation style

  • @whynotanyting
    @whynotanyting 1 year ago +1

    "For instance, how am I gonna stop some big mean Mother-Hubber from tearin' me a structurally superfluous data center?"

  • @kubajurka
    @kubajurka 11 months ago +1

    I understood virtually nothing but still found the video absolutely exhilarating.

  • @sauwurabh
    @sauwurabh 10 months ago

    Kevin this is some good shizzz, watched the GitLab video first then this one and subscribed.

  • @alwaysthinking614
    @alwaysthinking614 1 year ago

    great work bro awesome video

  • @Joelitop
    @Joelitop 3 months ago

    This is great content, keep it up, you made my day brighter ❤

  • @cristerronaldo_sewy
    @cristerronaldo_sewy 1 year ago

    Great video. Just want to say, you sound so similar to Fireship LOL

  • @MHX11
    @MHX11 1 year ago +3

    Your jokes are top tier and never fail to make me laugh!