033 Shuffle and Sort in Hadoop

  • Published 25 Aug 2024

COMMENTS • 16

  • @its_joel7324 • 2 years ago

    Thank you very much for this.

  • @rytmf • 3 years ago

    Great explanation. Thank you.

  • @judesoosai8648 • 6 years ago +1

    I understand that the merging of files on the reducer side happens in multiple rounds, with a maximum of 10 files in each round (configurable, known as the merge factor). The final merge happens in reducer memory, and the number of files in the final round is kept equal to the merge factor (default 10). To achieve this, the merge logic groups the files accordingly.
    With 40 files it goes like this:
    merge 4 files -> 1 file (round 1)
    merge 10 files -> 1 file (round 2)
    merge 10 files -> 1 file (round 3)
    merge 10 files -> 1 file (round 4)
    At this point we have 4 merged files and 6 unmerged files (10 in total).
    In round 5, these 10 files are merged in reducer memory.
    However, I am not clear how this logic makes the disk I/O efficient. (See the merge-pass sketch below the comments.)

  • @mohammadsadaquat3624 • 8 years ago

    Very nice explanation. Keep posting new content. Thanks.

  • @nsb5467 • 8 years ago +2

    Hi, can you explain why using three files for the first reducer split increases disk I/O efficiency?

  • @akashgaikwad6847 • 7 years ago +1

    How is disk I/O efficiency increased by merging the first 3 files into one and then processing the rest in batches of ten?
    The files have already been moved over the network, so how does this increase I/O efficiency? How is the example given at the end related? Please elaborate.

  • @JMK2928 • 2 years ago

    Are there any notes?

  • @mahendarkusuma • 7 years ago

    Very good presentation. Can you please tell me which tool you are using to generate the simulations?

  • @sunnyjain4774 • 6 years ago +1

    I already read this in Hadoop: The Definitive Guide. Can you explain how partitioning takes place during a spill? Thanks. (See the partitioner sketch below the comments.)

  • @charleygrossman8368 • 8 years ago

    Hello, I have a question.
    Regarding the sort phase, would you consider the theoretical sort (the first one), with three even splits, to be a bucket sort? And for the actual sort (the second one) that is implemented, why does it begin with three partitions, then 10, 10, and finally the remaining 7 files?
    Thank you, sir.

  • @sonalisharma9654 • 6 years ago

    Very helpful

  • @VibeWithSingh • 8 years ago

    Nice explanation, though I didn't understand the last splitting part. Still, kudos. :)

  • @kirantvbk • 6 years ago

    When files spill over to disk, the data gets partitioned and sorted. Does it need to read the data back into memory, sort it, and write it back out? Or does the sort happen on disk?

  • @shaikhmohammedatif2391 • 3 years ago

    Have you made another channel?

  • @spirridd • 5 years ago

    This video is impossible to understand.
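
For reference, here is a minimal sketch of the merge-pass grouping that @judesoosai8648 describes above (40 map outputs, merge factor 10). It is written in Java, but it is only an illustration of the pass-factor idea, not Hadoop's actual Merger code; the class and method names are made up for this sketch.

// Plans the reducer-side merge rounds for a given number of files and merge factor.
public class MergePlanSketch {

    // Files to merge in this pass. For the first pass, merge just enough files
    // ((n - 1) % (factor - 1) + 1) so that every later pass can take a full
    // `factor` files and exactly `factor` streams remain for the final merge.
    static int passFactor(int factor, int passNo, int remaining) {
        if (passNo > 1 || remaining <= factor || factor == 1) {
            return Math.min(factor, remaining);
        }
        int mod = (remaining - 1) % (factor - 1);
        return (mod == 0) ? factor : mod + 1;
    }

    public static void main(String[] args) {
        int files = 40;   // map outputs fetched by one reducer (example from the comment)
        int factor = 10;  // merge factor (mapreduce.task.io.sort.factor, default 10)

        int pass = 1;
        while (files > factor) {
            int toMerge = passFactor(factor, pass, files);
            files = files - toMerge + 1;  // merged files are replaced by one file on disk
            System.out.printf("round %d: merge %d files -> 1 file (%d files remain)%n",
                              pass, toMerge, files);
            pass++;
        }
        // Final round: the remaining streams (exactly `factor` of them) are fed
        // straight into the reduce function rather than merged into one more file.
        System.out.printf("round %d: final merge of %d streams feeds the reducer%n",
                          pass, files);
    }
}

Running this prints the same schedule as the comment (4, 10, 10, 10, then a final 10-way merge). As far as I understand, merging only 4 files in the first round is the minimum needed to leave exactly the merge factor for the last round; that last round then streams its inputs directly into reduce instead of writing one more fully merged file, which keeps the amount of data written to and re-read from disk as small as possible.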
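
On @sunnyjain4774's question about partitioning during a spill: as each map output record is collected, it is assigned a partition number (one partition per reducer) and written into the in-memory buffer; when the buffer spills, the records are sorted by partition and, within each partition, by key, then written to the spill file partition by partition. Below is a small illustration of the default rule, Hadoop's HashPartitioner. The demo class, key values, and reducer count are made up for the example, and it needs the Hadoop client libraries on the classpath.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

// Shows how map output keys are assigned to reduce partitions before the spill.
public class PartitionDemo {
    public static void main(String[] args) {
        HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
        int numReduceTasks = 3; // one partition per reducer

        for (String word : new String[] {"shuffle", "sort", "merge", "spill"}) {
            // HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
            int partition = partitioner.getPartition(
                    new Text(word), new IntWritable(1), numReduceTasks);
            System.out.println(word + " -> partition " + partition);
        }
    }
}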