Is Google Spanner the BEST Database? | Systems Design Interview 0 to 1 with Ex-Google SWE

  • Published Dec 13, 2024

COMMENTS • 21

  • @NeetCode
    @NeetCode 1 year ago +19

    I'm surprised how underrated Spanner is outside of Google. A lot of DBs like Cockroach and PlanetScale are gaining traction, but both of those DBs borrow A LOT from Google tech.
    PlanetScale - uses Vitess (created at Google)
    CockroachDB - founders are ex-Googlers

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +6

      I haven't heard of PlanetScale! I know Cockroach is extremely similar to Spanner, but it tries to do the same thing without GPS clocks, via some logic similar to version vectors.
      If only I knew about this 8 years ago I could have founded a 500 million dollar company by now!

    • @_____case
      @_____case 1 year ago +8

      Spanner is underrated because it's proprietary, super expensive, and honestly overkill. Most teams do not need its capabilities.
      That being said, using Spanner has been such a pleasure. I literally never have to worry about it going down or appearing inconsistent.

    • @NeetCode
      @NeetCode 1 year ago +2

      Yeah, it makes sense. The main companies that would find Spanner useful are Google's competitors anyway.

  • @vladlazar94
    @vladlazar94 1 year ago +11

    The importance of this database architecture cannot be overstated. It effectively removes a hell of a lot of the complexity associated with eventually consistent distributed systems. Think large-scale application development using old, familiar tools: SQL, ACID transactions, strong schema invariants, etc. Yes, Spanner might be unattractive now (price, non-standard hardware, vendor lock-in), but I believe that over the next decade, Spanner offspring such as Cockroach will shift cloud-native development towards the relational model. The benefits of strong consistency are simply huge. Engineers who have developed around the lack of distributed transactions and strong consistency know how painful these systems are to test, debug and maintain. The experience and skill barrier for working effectively with non-relational databases is frequently underestimated in our industry, and it leads to a lot of wasted time and eroded customer trust. If a user interaction leads to a SINGLE database transaction that either succeeds or fails, with no inconsistent states in between, then that work is DONE and we can confidently deploy it. Spanner allows my team to develop our app as if it were a hobby project running on a single server machine. It's crushingly cheap in terms of developer time. Slowly and under the radar, this might actually be one of the most profound and disruptive technologies for the years to come. It's not flashy, but enormously impactful.

  • @recursion.
    @recursion. 1 year ago +6

    Jordan, you are truly extraordinary, a man of immense stature. Your relentless pursuit of power, your unapologetic drive for success - it leaves me in awe. You are a force to be reckoned with, indeed. 🤤

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +2

      Aw man, I appreciate the kind words. Same to you sir, and keep up the hard work!

  • @tysonliu2833
    @tysonliu2833 5 months ago +3

    Here is my understanding: we want to preserve causality, but we can't simply rely on the clock in each DC because of clock drift. For example, A1 happens before A2, but T1 could be larger than T2 if the drift between the two DCs is large. Spanner uses something called TrueTime: when we are writing A2, we call TrueTime to get the range the actual time falls in, say [x, y], where T2 > x and T2 < […] > T1, which preserves causality.
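The commit-wait idea this comment is describing can be sketched as follows. This is a minimal simulation, not Spanner's code: `SimulatedTrueTime`, `TTInterval`, `commit`, and the `epsilon` uncertainty bound are all hypothetical names, and "waiting" is modelled by advancing a fake clock. The assumption is that the writer takes the upper bound of the interval as its commit timestamp and then waits until the lower bound has moved past it, so any later write is guaranteed a larger timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is guaranteed to be >= earliest
    latest: float    # true time is guaranteed to be <= latest


class SimulatedTrueTime:
    """Hypothetical stand-in for TrueTime: a clock with bounded uncertainty."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.true_now = 0.0  # simulated "real" time, advanced manually

    def now(self) -> TTInterval:
        return TTInterval(self.true_now - self.epsilon,
                          self.true_now + self.epsilon)

    def advance(self, dt: float) -> None:
        self.true_now += dt


def commit(tt: SimulatedTrueTime) -> float:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    ts = tt.now().latest
    # Commit wait: do not acknowledge until every clock agrees ts has passed.
    while tt.now().earliest <= ts:
        tt.advance(0.001)  # simulated waiting
    return ts


tt = SimulatedTrueTime(epsilon=0.005)
t1 = commit(tt)  # write A1
t2 = commit(tt)  # write A2 starts only after A1 was acknowledged
print(t1 < t2)
```

The cost of this scheme is the wait itself, which is roughly twice the uncertainty bound per write, so a tighter clock bound translates directly into lower write latency.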

  • @quantum-ng8bs
    @quantum-ng8bs 1 year ago +3

    love your channel man! meaty content from somebody who knows their stuff instead of clickbait.

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +2

      Thanks man! Ngl when I saw the word meaty didn't think this was where the comment was going

    • @quantum-ng8bs
      @quantum-ng8bs 1 year ago +2

      @@jordanhasnolife5163 hahahaha I should have led with an obligatory "no homo"

  • @Rohit-hs8wp
    @Rohit-hs8wp 1 year ago +2

    The typical solution you mentioned for the distributed read is prone to fail because we are grabbing the predicate lock on 2 different nodes. It is possible that after transaction T1 releases the lock on Node 1, and before it grabs the lock on Node 2, T2 and T3 grab locks on Node 1 and Node 2. When T1 finally grabs the lock on Node 2, it sees comment 3 and says WTF!
    It can only be valid if we grab the lock on all nodes first, then read, and then release. Since we are grabbing the lock on all the different nodes first, and not on a single node, is it not 2PL?
    Please comment on where I'm wrong. Thank you for the great series, man.

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +3

      The locks have to be held for the entirety of the transaction, not just locally. So T1 is holding that lock on the first partition until he's done reading the second partition.
      So you're correct, it's a distributed 2 phase locking that effectively requires two phase commit, which makes it even more expensive and worse.
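To make the reply above concrete, here is a rough single-process sketch of a read that holds its locks across every partition until it has finished reading. It is an illustration only: `Partition`, `read_all`, and `try_write` are hypothetical names, one coarse lock per partition stands in for the predicate locks discussed in the thread, and the two-phase commit needed across real machines is not modelled.

```python
import threading


class Partition:
    """Hypothetical partition: one coarse lock standing in for predicate locks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lock = threading.Lock()
        self.comments: list[str] = []


def read_all(partitions: list[Partition]) -> dict[str, list[str]]:
    """Growing phase: take every lock. Shrinking phase: release only at the end."""
    held: list[Partition] = []
    try:
        # Acquiring in a fixed order avoids deadlock between concurrent readers.
        for p in sorted(partitions, key=lambda part: part.name):
            p.lock.acquire()
            held.append(p)
        # Every lock is held here, so no writer can slip in between partitions.
        return {p.name: list(p.comments) for p in held}
    finally:
        for p in reversed(held):
            p.lock.release()


def try_write(partition: Partition, comment: str) -> bool:
    """A writer that gives up instead of blocking if a reader holds the lock."""
    if not partition.lock.acquire(blocking=False):
        return False
    try:
        partition.comments.append(comment)
        return True
    finally:
        partition.lock.release()


node1, node2 = Partition("node1"), Partition("node2")
try_write(node1, "comment 1")
try_write(node2, "comment 2")
print(read_all([node1, node2]))
```

The point of the sketch is the ordering: nothing is released until the last partition has been read, which is exactly why this approach gets expensive as the number of partitions grows.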

  • @gaurangpateriya4879
    @gaurangpateriya4879 6 months ago +1

    Hi, I am not entirely sure how Spanner solves the "reading comments" problem that you mentioned earlier in this video. Would you be able to help me understand that?

    • @jordanhasnolife5163
      @jordanhasnolife5163 6 months ago +2

      We basically just want to achieve causally consistent snapshots over many nodes, regardless of the replication topology. Spanner allows us to do this, because we can just use a timestamp - there's no need to do any coordination to get some monotonically increasing transaction number every time that we write, nor do we have to lock every single node that we want to read from to ensure causal consistency.
      We just pass in one timestamp, and then everything works.
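A small sketch of the "pass in one timestamp" read described above, under the assumption that each node keeps multiple timestamped versions of a value and that commit timestamps already respect causality. `Node`, `read_at`, and `snapshot_read` are hypothetical names for illustration, not Spanner APIs.

```python
from typing import Optional


class Node:
    """Hypothetical node storing every version of a key with its commit timestamp."""

    def __init__(self) -> None:
        self.versions: dict[str, list[tuple[int, str]]] = {}

    def write(self, key: str, value: str, commit_ts: int) -> None:
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read_at(self, key: str, ts: int) -> Optional[str]:
        """Return the newest version committed at or before ts, if any."""
        latest: Optional[tuple[int, str]] = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts <= ts and (latest is None or commit_ts > latest[0]):
                latest = (commit_ts, value)
        return None if latest is None else latest[1]


def snapshot_read(nodes_by_key: dict[str, Node], ts: int) -> dict[str, Optional[str]]:
    """Read every key at the same timestamp, with no locks and no coordination."""
    return {key: node.read_at(key, ts) for key, node in nodes_by_key.items()}


posts, replies = Node(), Node()
posts.write("post", "original comment", commit_ts=10)
replies.write("reply", "reply to the comment", commit_ts=20)  # causally later

layout = {"post": posts, "reply": replies}
print(snapshot_read(layout, ts=15))
print(snapshot_read(layout, ts=25))
```

Because the reply's timestamp is larger than that of the comment it responds to, any single read timestamp that includes the reply also includes the original comment, so a reader never sees the effect without the cause.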

  • @donotreportmebro
    @donotreportmebro 1 year ago +1

    I'd assume any decent datacenter would have a GPS clock and all network switches synchronized to PTP time.

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +1

      I think this is less of an easy proposition than you might think!

  • @brandonwl
    @brandonwl 1 year ago +2

    Would it be better to use SSI rather than 2PL?

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +1

      I think it depends on the use case. Optimistic locking is good when you don't have many conflicts; pessimistic locking is better otherwise.