I watched it again and have a question. Does each partition reside in a single file in the mapper output? That would make it possible for the reducer to copy the partitions containing its keys from the different mapper outputs. If not, there must be some way to store/mark the offset of each partition within the spilled files?
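To illustrate the question above: in Hadoop, each spill file contains all partitions back to back, and a companion index records where each partition starts, so a reducer can fetch only its own slice. The sketch below is a toy illustration of that idea (the function names and format are hypothetical, not Hadoop's actual classes):

```python
# Toy sketch: one spill file holds every partition contiguously, and an
# index maps partition id -> (byte offset, length) so a reducer can read
# just its partition. This is an illustration, not Hadoop's on-disk format.
import io

def write_spill(records_by_partition):
    """Serialize partitions into one byte stream and build an offset index."""
    buf = io.BytesIO()
    index = {}  # partition id -> (offset, length)
    for pid in sorted(records_by_partition):
        start = buf.tell()
        for key, value in sorted(records_by_partition[pid]):
            buf.write(f"{key}\t{value}\n".encode())
        index[pid] = (start, buf.tell() - start)
    return buf.getvalue(), index

def read_partition(spill_bytes, index, pid):
    """A reducer reads only its partition using the recorded offset."""
    offset, length = index[pid]
    return spill_bytes[offset:offset + length].decode()

spill, idx = write_spill({0: [("a", 1)], 1: [("b", 2), ("c", 3)]})
print(read_partition(spill, idx, 1))  # only partition 1's records
```

This mirrors why no per-partition files are needed: the offsets in the index are enough for each reducer to pull exactly its partition from every spill file.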
Hi Pramod, I must say your explanations in each of the MapReduce videos are excellent and by far the best I have come across so far... Thank you very much. I have one point of confusion, stated below, which I'd appreciate if you could elaborate on: You mentioned that while spilling the map outputs to disk (when the in-memory buffer becomes 80% full), several spill files are created because an existing file in HDFS cannot be edited. But the map outputs are actually stored in the local file system of the data node where the map task ran (as you mentioned earlier). So what is the problem with appending to the same file, given that we're writing to the local file system, not HDFS? I would appreciate it if you could provide an explanation. Thanks in advance.
It's an excellent session. Thank you very much. I am new to Hadoop and have a question. You say that "the combiner in the mapper will also decrease the network transfer", but as I understand it, the mapper always writes to the local file system, so where does network latency come into play here?
When there is a shuffle, the mapper output is transferred across different nodes. Since the combiner reduces the data, less data is transferred across the network.
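The effect of the reply above can be shown with a tiny word-count sketch (toy code, not Hadoop's API): a map-side combiner collapses duplicate keys locally, so far fewer records have to cross the network in the shuffle.

```python
# Toy illustration: the combiner sums counts per key on the map side,
# shrinking what the shuffle has to move to the reducers.
from collections import Counter

def combine(pairs):
    """Map-side combiner: sum counts per key before the shuffle."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())

raw = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]  # raw map output
combined = combine(raw)  # [("cat", 1), ("the", 3)]
print(len(raw), "records without combiner,", len(combined), "with it")
```

Four records shrink to two here; on real data with many repeated keys, the savings in shuffled bytes can be substantial.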
In the video, at one point you say that Hadoop creates multiple spill files because an HDFS file can't be edited, which is wrong. The map task writes to the local file system, not to HDFS. The separate spill files are created for performance.
Hi Pramod, I have a query about the video from 7:07 to 7:57. You mentioned that all spill files (which are partitioned, sorted, and combined) are written to the local file system as they are generated while the map task is running. Are they merged into one file in the local file system after the whole map task is completed? How are the partition, sort, and combine operations performed on data that is in the local file system? That involves a lot of input/output operations, as you mentioned.
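On the merge question above: since each spill file is already sorted by key, the final single output file can be produced with a streaming k-way merge rather than a full re-sort, which keeps the extra I/O modest. A minimal sketch of that idea (assumed behavior, not Hadoop source):

```python
# Sketch of a map-side merge: each spill is already sorted by key, so a
# k-way merge produces one globally key-sorted stream in a single pass.
import heapq

spill_1 = [("apple", 1), ("cat", 2)]
spill_2 = [("banana", 1), ("cat", 1)]
spill_3 = [("apple", 3)]

merged = list(heapq.merge(spill_1, spill_2, spill_3, key=lambda kv: kv[0]))
print(merged)  # all records, sorted by key across the three spills
```

Each spill is read sequentially exactly once during the merge, which is why spilling in sorted runs and merging later is cheaper than repeatedly rewriting one growing file.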
I want to express my great thanks. It is detailed, easy to understand, cover every aspect. It deserves more "views" and "likes"
Very well explained. The illustration was so simple to understand. I think the author covered all the aspects. Thank you.
You deserve way more views on your YouTube channel; the explanations are so good compared to other videos.
more content at pixipanda.com
this is what I was looking for months. Thank you!
Auggie Williams hehe
Excellent presentation, in depth coverage ! !
Excellent!!! Each step nicely explained.
superrrrrrrrrrrrrrr hit session, sir. Cleared all my concepts.
very helpful.. keep it up....
Thank you. Now doing Scala and Spark
The files are spilled on HDFS. Eventually one file will be created, and that will be copied to the local file system...
Does the combiner class execute in the map phase only?
Because the logs show a single execution of the combiner... Please clarify with respect to the logs. Thanks.
Thanooj Kalathuru The combiner can be executed in both the map and reduce phases.
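The reply above is why a combiner must be associative and commutative: the framework is free to run it zero, one, or many times (per spill on the map side, and again while merging on the reduce side) without changing the result. A toy demonstration (illustrative code, not Hadoop's API):

```python
# Toy check: applying the combiner once over everything gives the same
# result as applying it per spill and then again over the combined output,
# which is what lets Hadoop run it in both the map and reduce phases.
from collections import Counter

def combine(pairs):
    totals = Counter()
    for k, v in pairs:
        totals[k] += v
    return sorted(totals.items())

spill_a = [("x", 1), ("y", 1), ("x", 1)]
spill_b = [("x", 1), ("y", 1)]

once = combine(spill_a + spill_b)                     # combiner run once
twice = combine(combine(spill_a) + combine(spill_b))  # per spill, then again
print(once == twice)  # prints True: same final counts either way
```

This is also why the log in the question above can legitimately show the combiner running once, or several times: the count depends on how many spills and merge rounds occurred, not on a fixed rule.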