One of the best Hadoop presentation.Thanks a lot !
Good information. I liked the overlay of slides over the video. I wouldn't think that would work but it does. Sound is excellent except for questions from some members of the audience and when the speaker turns his back.
Thx.. Best MapReduce tutorial I have ever watched..
Ultimate video for a Hadoop overview.. a must-watch.
Awesome video, loved the presentation and the ease with which it's presented
Karthik, in Hadoop the replication is for data redundancy. It also provides the map/reduce framework with multiple places to schedule mappers, right? i.e. with default replication of 3, a mapper for a given data block can be scheduled on any of the 3 different machines where that block is located. As for how Hadoop does block splits, it basically splits at the block size, regardless of natural record boundaries. The record readers in the map phase know how to retrieve records that were split.
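To make the block-split point above concrete, here is a plain-Python simulation (not Hadoop code; all names are illustrative) of how fixed-size block splits can cut a newline-delimited record in half, and how a record reader can still hand each mapper whole records:

```python
BLOCK_SIZE = 16  # tiny block size, just for demonstration

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw bytes at exact block boundaries, ignoring record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def read_records(blocks, block_index):
    """Read the newline-delimited records 'owned' by one block.

    Mimics the record-reader idea: skip a leading partial record
    (it belongs to the previous block's reader) and read past the
    block's end to finish the last record that starts inside it.
    """
    data = b"".join(blocks)
    start = block_index * BLOCK_SIZE
    end = start + len(blocks[block_index])
    # Skip the partial first line unless we are at the very beginning.
    if start > 0:
        start = data.index(b"\n", start - 1) + 1
    records = []
    pos = start
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            nl = len(data)
        records.append(data[pos:nl].decode())
        pos = nl + 1
    return records

data = b"alpha record\nbeta record\ngamma record\n"
blocks = split_into_blocks(data)
# Every record comes back exactly once across all block readers,
# even though "beta record" straddles the first block boundary:
all_records = [r for i in range(len(blocks)) for r in read_records(blocks, i)]
print(all_records)  # ['alpha record', 'beta record', 'gamma record']
```

The trick mirrors what Hadoop's line-oriented record reader does: each reader skips a leading partial record and reads past its block's end to finish the last record it started, so no record is processed twice or lost.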
What a presentation. Very nice. Thanks for sharing.
Karthik, in general the fact that the data is replicated 3 times doesn't affect performance, since map/reduce processes each block only once in the map phase. But generally yes, the more data you have, and thus the more data which must be scanned by the mappers, the longer your map/reduce job will take to run. However, performance depends on many factors such as the size of your cluster, how busy the cluster is at the moment, etc.
thank you so much, great overview of Hadoop
Thanks for such an informative demo!
I have a couple of questions:
1. The data itself is very big (for example, Google processes 20 PB of data per day). In Hadoop we replicate the data 3 times, so that becomes 60 PB. Won't that affect processing performance? I'm new to this, so if my understanding is wrong please correct me!
2. Can you please give me an example of how unstructured data is split into blocks and stored, and how it is queried?
Thanks
The description now includes a link to the code samples on GitHub
super good vid !!! many thx !!!
is the code available?
For some reason I am having a hard time pasting the actual URL and getting it to work properly (it keeps expanding into a bunch of hex characters). If you go to github.com/sleberknight and choose the project called basic-hadoop-examples, that should get you there.
The sample code is available on GitHub at github.com/sleberknight/basic-hadoop-examples
It's been more than 3 years since this video was uploaded, but in the MapReduce WordCount program, line 31 is unnecessary: 'word' is never used. The code would work just fine without that line!
mjshaheed You're right. Some guy in the audience noticed it as well @30:30 :)
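For anyone following along without the video, the word-count logic being discussed boils down to the following. This is a plain-Python simulation of the map, shuffle, and reduce phases, not the actual Java program from the talk:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for one word.
    return (word, sum(counts))

def word_count(lines):
    # Shuffle/sort: group mapper output by key, as the framework would.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(w, c) for w, c in sorted(grouped.items()))

print(word_count(["the quick brown fox", "the lazy dog"]))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real Hadoop job the grouping step is done by the framework between the map and reduce phases; only the mapper and reducer are user code.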
Awesome :)
Would have been nice if you could patch some of the poor audio spots. But nice presentation!
thank you for this video. it is very informative.
The page is showing a 404.
thanks!
thanks yo
super like
You can configure single-node Hadoop using the blog below:
hadoopcorner.blogspot.in/
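For reference, a single-node (pseudo-distributed) setup mostly comes down to two small config files. The snippet below is illustrative, based on the standard Hadoop single-node setup guide; exact ports and defaults can vary by version, so check the docs for yours:

```xml
<!-- core-site.xml: point the default filesystem at a local HDFS instance -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor 1, since there is only one node -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```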