The *Right* Way to do Batch Job Data Joins | Systems Design Interview 0 to 1 with Ex-Google SWE

  • Published Dec 12, 2024

COMMENTS • 20

  • @kunalsinghal3558
    @kunalsinghal3558 8 months ago +3

    "10k subscriber mark which I should hit and probably I don't know one to two years from now" . Congrats boi 28.7k subs done. Miles to go ahead.

  • @recursion.
    @recursion. 1 year ago +4

    Hey man, would you be interested in making a video about your learning journey with the technologies, languages, frameworks and other system design topics? Just wondering what a massive chad sigma 10x programmer journey would look like...

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +1

      Ah well, 10k should be coming up soon... Perhaps then :)

    • @recursion.
      @recursion. 1 year ago +2

      @@jordanhasnolife5163 Are you asking me to buy 300 subscribers? [insert "bombastic side eye dog pic"]

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +3

      @@recursion. I'm certainly not asking you not to! (*cute anime girl expression*)

    • @recursion.
      @recursion. 1 year ago +1

      @@jordanhasnolife5163 I don't know what you want (300 sub otw🤤)

    • @recursion.
      @recursion. 1 year ago

      @@jordanhasnolife5163 80 less subs 😩😩 (please make my wish come true tho)

  • @ReadTheUnderstory
    @ReadTheUnderstory 9 months ago +1

    Just curious, where did you source your information for this particular video? DDIA doesn't do a super clear job explaining how the different joins work imo, and other online sources I've looked at are equally vague.

    • @jordanhasnolife5163
      @jordanhasnolife5163 9 months ago +4

      I worked pretty much exclusively on big data pipelines for a bit haha. But besides that, most of these lessons from DDIA you can think through yourself. When would it help me to perform a sort merge join? If my data isn't already sorted, what would be the penalty there?
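For readers following along, here is a minimal sketch of a sort-merge join on two small in-memory lists of `(key, value)` pairs. It is an illustration, not code from the video: the function name and the sample `users` and `orders` data are hypothetical. The two `sorted` calls are the penalty the reply hints at when the inputs are not already in key order.

```python
from itertools import groupby
from operator import itemgetter


def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs on key."""
    # If the inputs are not already sorted by key, this is the extra cost:
    # O(n log n) per side before any joining can start.
    left = sorted(left, key=itemgetter(0))
    right = sorted(right, key=itemgetter(0))

    # Group consecutive rows that share a key on each side.
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))

    out = []
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    # Walk both sorted sides once, always advancing the side with the smaller key.
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            rk, rg = next(right_groups, (None, None))
        else:
            # Matching key: emit every left/right combination for it.
            right_vals = [v for _, v in rg]
            for _, lv in lg:
                for rv in right_vals:
                    out.append((lk, lv, rv))
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))
    return out


# Hypothetical sample data, deliberately unsorted.
users = [(2, "bob"), (1, "alice"), (3, "carol")]
orders = [(3, "book"), (1, "pen"), (1, "ink"), (4, "lamp")]
joined = sort_merge_join(users, orders)
```

If both sides arrive already sorted by key, the two `sorted` calls can be dropped and the join is a single linear pass.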

  • @htm332
    @htm332 1 year ago +3

    Solid video my guy. What resources do you use to learn this stuff?

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago +2

      DDIA, random YouTube videos/websites, my own experience

    • @htm332
      @htm332 1 year ago +1

      @@jordanhasnolife5163 care to share any of those YT channels/sites?

  • @sahilguleria6976
    @sahilguleria6976 3 months ago +1

    Hello Sir, I’ve got to be honest, sort-then-shuffle is still shuffling my brain. I’m not seeing how sorting helps here. From what I understand, after sorting, we hash the key to decide its partition and then add the key-value pair. But wouldn’t the merge process be the same even if we skipped the sorting foreplay? Or am I missing some magical sorting powers here?
    Or are we merging right after the sort, and then these sorted, merged values are merged again at the partition?

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago

      Think about the time complexity of sorting merged lists versus sorting unmerged lists. In the shuffle phase we're still sending the data in sorted form to each reducer.
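One way to read the reply above, as a rough sketch: each mapper sorts its own output per partition once, so a reducer only has to do a k-way merge of runs that are already sorted (roughly O(n log k) for k runs) instead of a fresh O(n log n) sort of everything it receives. All names here (`NUM_REDUCERS`, `partition_of`, `map_side`, `reduce_side`) and the toy partitioner are assumptions for illustration, not any real framework's API.

```python
import heapq
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical cluster size


def partition_of(key):
    # Toy deterministic partitioner; a real system would hash the key.
    return sum(key.encode()) % NUM_REDUCERS


def map_side(records):
    """Split one mapper's output by partition and sort each piece once."""
    runs = defaultdict(list)
    for key, value in records:
        runs[partition_of(key)].append((key, value))
    return {p: sorted(run) for p, run in runs.items()}


def reduce_side(sorted_runs):
    """k-way merge of already-sorted runs; no full re-sort is needed."""
    return list(heapq.merge(*sorted_runs))


# Two hypothetical mappers, each emitting unsorted (key, value) pairs.
mapper_outputs = [
    map_side([("b", 1), ("a", 2), ("d", 3)]),
    map_side([("c", 4), ("a", 5)]),
]

# Shuffle: each reducer collects the sorted run for its partition from every mapper.
merged_by_reducer = {
    p: reduce_side([m[p] for m in mapper_outputs if p in m])
    for p in range(NUM_REDUCERS)
}
```

Skipping the map-side sort would still produce the same final order, but the reducer would have to sort its whole input itself rather than merge a handful of sorted runs.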

  • @sahilguleria6976
    @sahilguleria6976 3 months ago +1

    What does it mean that the merging can be done entirely on disk? My little mind just can’t seem to comprehend it!

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago

      You don't need to load the whole dataset into memory. Just one entry at a time from each dataset that you're merging.
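A small sketch of what "merging on disk" can look like, assuming the inputs are text files whose lines are already sorted. The helper name `merge_sorted_files` and the sample file contents are hypothetical. Python's `heapq.merge` is lazy and file objects yield one line at a time, so only the current line from each input needs to be in memory (apart from the interpreter's own read buffers).

```python
import heapq
import os
import tempfile


def merge_sorted_files(input_paths, output_path):
    """Merge text files whose lines are already sorted into one sorted file."""
    files = [open(p, "r", encoding="utf-8") for p in input_paths]
    try:
        with open(output_path, "w", encoding="utf-8") as out:
            # heapq.merge pulls the next line from whichever file currently
            # has the smallest one; nothing is loaded wholesale.
            for line in heapq.merge(*files):
                out.write(line)
    finally:
        for f in files:
            f.close()


# Tiny demonstration with two hypothetical sorted files.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "run1.txt")
    second = os.path.join(tmp, "run2.txt")
    result = os.path.join(tmp, "merged.txt")
    with open(first, "w", encoding="utf-8") as f:
        f.write("a\nc\ne\n")
    with open(second, "w", encoding="utf-8") as f:
        f.write("b\nd\n")
    merge_sorted_files([first, second], result)
    with open(result, "r", encoding="utf-8") as f:
        merged_text = f.read()
```

The same loop works unchanged on inputs far larger than RAM, since memory use depends on the number of files being merged, not on their size.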

  • @user-se9zv8hq9r
    @user-se9zv8hq9r 1 year ago +3

    so whens the onlyfans starting