Hadoop Architecture | HDFS Architecture | HDFS Tutorial | Hadoop Tutorial | Edureka

  • Published Aug 25, 2024

COMMENTS • 147

  • @edurekaIN
    @edurekaIN  6 years ago +6

    Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Hadoop Training and Certification Curriculum, Visit our Website: bit.ly/2Ozdh1I

  • @srikrishnarr6553
    @srikrishnarr6553 4 years ago +10

    Speaking with authority while ensuring the audience has understood... that's too good

  • @adrianmarin3967
    @adrianmarin3967 4 years ago +4

    Easy to understand. Thank you! Great job keeping everyone in the audience paying attention.

  • @aecosta1981
    @aecosta1981 5 years ago +2

    Thank you so much for helping us to understand the fantastic world of Hadoop.

    • @edurekaIN
      @edurekaIN  5 years ago

      Thanks for watching the video! We are glad that our video was helpful. Cheers!

  • @mahen5782
    @mahen5782 6 months ago

    Truly helpful with live examples... keep up the good job!!

    • @edurekaIN
      @edurekaIN  6 months ago

      Glad it was helpful!

  • @study-channel6301
    @study-channel6301 4 years ago +2

    This is exactly what I've been looking for. Thank you! :)

    • @edurekaIN
      @edurekaIN  4 years ago +2

      Thanks for being a part of our community! Cheers!

  • @snehalgandham
    @snehalgandham 6 years ago +2

    Awesome Tutorial. The architecture has been explained precisely. Thanks.

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey Snehal, we are glad that you found our lectures useful. Do subscribe and stay connected with us. Cheers :)

  • @girish90
    @girish90 3 years ago +1

    Thanks, this is a great tutorial!

  • @mariei7445
    @mariei7445 2 years ago

    You are the best. Made it so easy to understand. Best!

    • @edurekaIN
      @edurekaIN  2 years ago

      Hey :) Thank you so much for your sweet words :) It really means a lot! Glad to know that our content/courses are helping you learn better :) Our team is striving hard to give you the best content. Keep learning with us - Team Edureka :) Don't forget to like the video and share it with as many people as possible :) Do subscribe to the channel :)

  • @abhishekkaushik5614
    @abhishekkaushik5614 5 years ago +1

    Worth 58:14 minutes... Nice one

  • @sameergpta
    @sameergpta 6 years ago +3

    Very simple and extremely informative session.

    • @edurekaIN
      @edurekaIN  6 years ago

      Thank you for watching our videos. Do subscribe to our YouTube channel and stay updated with our content. Cheers :)

  • @anuradhag4717
    @anuradhag4717 2 years ago +1

    Awesome tutorial!

  • @mca-hod7476
    @mca-hod7476 6 years ago +1

    It is very useful... Thanks

    • @edurekaIN
      @edurekaIN  6 years ago

      Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)

  • @vatsala1388
    @vatsala1388 3 years ago

    Very helpful, easy to understand!!

  • @shanisankar5345
    @shanisankar5345 2 years ago

    Very, very helpful and easy to understand. 🙏 Thank you for such a wonderful presentation

    • @edurekaIN
      @edurekaIN  2 years ago

      Thank you so much : ) We are glad to be a part of your learning journey. Do subscribe to the channel for more updates : ) Hit the bell icon to never miss an update from our channel : )

  • @venkateswarlub6365
    @venkateswarlub6365 1 year ago +1

    Excellent session, sir. Very useful for me. Thank you.

    • @edurekaIN
      @edurekaIN  1 year ago

      You are welcome 😃 Glad it was helpful!!

  • @devsatheesh7287
    @devsatheesh7287 7 years ago +1

    Very good explanation, I found it very useful for my studies.

  • @shakeer6808
    @shakeer6808 3 years ago

    Very simple and amazing explanation. The figures especially made it very clear for us, and the questions in between made us realize whether we had understood or not.
    Thank you, sir

  • @Giridhar1534
    @Giridhar1534 2 years ago +1

    Why is HDFS write-once, read-many in its architecture?

  • @duven2089
    @duven2089 6 years ago +1

    Very well explained.😃

  • @nandinip1657
    @nandinip1657 6 years ago +1

    Please explain more about the NameNode and DataNode.

  • @sitaluk21
    @sitaluk21 6 years ago +1

    Very well organized & explained

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey Seeta! Thank you for appreciating our work. Do subscribe, like and share to stay connected with us. Cheers :)

  • @pavankumar-nm9yu
    @pavankumar-nm9yu 2 years ago

    Awesome explanation of the HDFS architecture.

    • @edurekaIN
      @edurekaIN  2 years ago

      Thank you for your time in giving feedback :) We are glad that you are learning from our videos! Stay connected with our channel :)

  • @manjunathckadani7732
    @manjunathckadani7732 6 years ago +1

    Good Explanation

  • @barbobrien9318
    @barbobrien9318 4 years ago

    Love the graphics; essential to learning.

  • @svdfxd
    @svdfxd 7 years ago +1

    Simply Awesome !!!

  • @amitd16
    @amitd16 6 years ago +1

    Thank you so much... very well explained

    • @edurekaIN
      @edurekaIN  6 years ago

      Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)

  • @1982sangeetha
    @1982sangeetha 7 years ago +1

    Very nice explanation!!
    Just a quick question on the HDFS multi-block write mechanism explained around the 40th minute.
    Here the 1st and 2nd copies of block B are being written into the same rack [Rack 5].
    The 2nd copy of block B was supposed to be in a different rack, right? The 2nd and 3rd copies can be in the same rack, but not the 1st and 2nd.

    • @edurekaIN
      @edurekaIN  7 years ago

      +sang, thanks for checking out our tutorial! We're glad you found it useful. You are right: the copy of block B should first be written to Rack 1 DataNode 3, and then to Rack 3 DataNode 9. Cheers!

  • @babus2241
    @babus2241 5 years ago

    Good explanation
    Good job

  • @satyabetha637
    @satyabetha637 5 years ago

    Excellent introduction and very absorbing

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey Satya, we are glad you loved the video. Do subscribe and hit the bell icon to never miss an update from us in the future. Cheers!

  • @ramjadhav6942
    @ramjadhav6942 5 years ago

    many thanks...

  • @RaviYadav-nj8zh
    @RaviYadav-nj8zh 3 years ago

    Amazing session 👍👍 loved it ❤️
    Hare Krishna ♥️🙏

  • @anishasingh8055
    @anishasingh8055 7 years ago +1

    Thank you, it was very useful

    • @gopireddytalatala9772
      @gopireddytalatala9772 7 years ago

      Anisha Singh... the way he teaches makes it very easy for everyone to understand... are you learning Hadoop?

  • @amitbukshet4160
    @amitbukshet4160 7 years ago +1

    Very good explanation. thanks.

    • @edurekaIN
      @edurekaIN  7 years ago +1

      Hey Amit, thanks for your wonderful feedback. We thought you might be interested in learning through Hadoop use cases. You can check out the videos here: ua-cam.com/play/PL9ooVrP1hQOGh5sIXY_E6JE4zknuzxleF.html. Hope this helps. Cheers!

  • @suryag7597
    @suryag7597 6 years ago +1

    Great Session

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey Surya, thank you for watching our video. We are glad to know that you liked our tutorial. Do subscribe and stay connected with us. Cheers :)

  • @sanjeetkumar7646
    @sanjeetkumar7646 6 years ago

    Wonderful explanation. It was very helpful for me.

  • @mubashshirrizvi3024
    @mubashshirrizvi3024 7 years ago +1

    Very well explained... Thank you...
    I have a few questions:
    1) Is it mandatory to have an equal number of nodes in all the racks?
    2) In the write mechanism, why is the first copy created on DN1 and not on DN4/DN6?
    Please answer

    • @edurekaIN
      @edurekaIN  7 years ago +2

      +Mubashshir Rizvi, thanks for the wonderful feedback! Here are the answers to your queries:
      1 Ans: No, we can define the number of nodes as per our requirement.
      2 Ans: It is not mandatory that the first copy be created on DN1. The selection of the DN is solely dependent upon the Hadoop system. In this video, for demonstration purposes, it is mentioned that the data is copied to DN1, but in real time the Hadoop system handles this internally.
      Hope this helps. Cheers!

  • @azadbulla3556
    @azadbulla3556 6 years ago +1

    Great!!!

  • @devendranehra9840
    @devendranehra9840 5 years ago

    Very good

  • @jayasharma7616
    @jayasharma7616 7 years ago +1

    I have gone through the videos and all of them are very useful. I have a doubt here: a rack means different machines at one physical location, connected to each other. As you said, a rack has data nodes. Then would it be correct to say that the different computers in a rack are the data nodes?

    • @edurekaIN
      @edurekaIN  7 years ago +2

      +Jaya Sharma, thanks for checking out our tutorial! We're glad you found it useful.
      Yes, that is correct. A rack is like a container that holds DataNodes, and a DataNode is nothing but a computing machine that stores the actual data. When a very big file arrives at a rack, its data is distributed among the DataNodes in blocks, and it can later be recollected as a single unit (a merged output from all the DataNodes that kept parts of the data). These things are managed by the Hadoop framework: in what amounts the data should be divided among the DataNodes, and which rack will store the data for a certain region or group, because in real-world scenarios there can be multiple racks storing DataNodes for different regions or groups.
      Hope this helps. Cheers!

  • @jaymishra102
    @jaymishra102 6 years ago +1

    Why does the client node seek permission/status from the DataNodes before performing a write operation (to see whether they are ready or not)? The NameNode must already have the status of each DataNode; only then would it send the node numbers, right? Kindly brief.

    • @edurekaIN
      @edurekaIN  6 years ago +1

      The DataNode sends a heartbeat to the NameNode periodically, i.e. at a set interval. It might happen that during this interval a DataNode crashes, which the NameNode won't yet be aware of. Also, the NameNode is the master node and needs to be available all the time. Hence, to lessen the load on the NameNode, the read/write operation is taken care of by the DataNodes.
      Hope this helps :)
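      As a rough sketch of where this interval lives (assuming default property names and a running cluster): the heartbeat period is set in hdfs-site.xml, and dfsadmin shows which DataNodes the NameNode currently considers live or dead.
      <property>
        <name>dfs.heartbeat.interval</name>
        <value>3</value>   <!-- seconds between DataNode heartbeats (the default) -->
      </property>
      # List live/dead DataNodes as currently seen by the NameNode
      hdfs dfsadmin -report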

  • @chetanpaithane6560
    @chetanpaithane6560 7 years ago

    Very nice explanation. How does HDFS manage metadata on the name node? A quick explanation would certainly help.

    • @edurekaIN
      @edurekaIN  7 years ago +1

      Hey Chetan, thanks for checking out our tutorial! We're glad you liked it. Here's the answer to your query:
      The HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating this. Similarly, changing the replication factor of a file causes a new record to be inserted into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog. The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system too.
      The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint.
      Hope this helps. Cheers!

    • @chetanpaithane6560
      @chetanpaithane6560 7 years ago

      Thanks for the reply. My question was a bit different, though. Let me elaborate with the example of reiserfs.
      1. If one wants to create a file or directory in reiserfs, the reiserfs btree code creates an inode.
      2. At the time of writing the inode to disk (stat data is the on-disk representation of the inode), the stat data item is inserted into the B+ tree.
      3. A dirent is inserted into the parent directory.
      4. Whenever a lookup for the file happens, the B+ tree is searched using a key-value pair to retrieve the information.
      =======
      My question:
      How does HDFS manage metadata of files/directories on the name node? An explanation would be helpful.
      Thanks,
      Chetan

    • @edurekaIN
      @edurekaIN  7 years ago +1

      Hey Chetan, maybe this would help.
      Persistence of HDFS metadata broadly breaks down into 2 categories of files:
      1) fsimage - An fsimage file contains the complete state of the file system at a point in time. Every file system modification is assigned a unique, monotonically increasing transaction ID. An fsimage file represents the file system state after all modifications up to a specific transaction ID.
      2) Edits - An edits file is a log that lists each file system change (file creation, deletion or modification) that was made after the most recent fsimage.
      * Checkpointing is the process of merging the content of the most recent fsimage with all edits applied after that fsimage is merged in order to create a new fsimage. Checkpointing is triggered automatically by configuration policies or manually by HDFS administration commands.
      Here is an example of an HDFS metadata directory taken from a NameNode. This shows the output of running the tree command on the metadata directory, which is configured by setting dfs.namenode.name.dir in hdfs-site.xml.
      data/dfs/name
      ├── current
      │ ├── VERSION
      │ ├── edits_0000000000000000001-0000000000000000007
      │ ├── edits_0000000000000000008-0000000000000000015
      │ ├── edits_0000000000000000016-0000000000000000022
      │ ├── edits_0000000000000000023-0000000000000000029
      │ ├── edits_0000000000000000030-0000000000000000030
      │ ├── edits_0000000000000000031-0000000000000000031
      │ ├── edits_inprogress_0000000000000000032
      │ ├── fsimage_0000000000000000030
      │ ├── fsimage_0000000000000000030.md5
      │ ├── fsimage_0000000000000000031
      │ ├── fsimage_0000000000000000031.md5
      │ └── seen_txid
      └── in_use.lock
      In this example, the same directory has been used for both fsimage and edits. Alternatively, configuration options are available that allow separating fsimage and edits into different directories. Each file within this directory serves a specific purpose in the overall scheme of metadata persistence:
      • VERSION - Text file that contains:
      • layoutVersion - The version of the HDFS metadata format. When we add new features that require changing the metadata format, we change this number. An HDFS upgrade is required when the current HDFS software uses a layout version newer than what is currently tracked here.
      • namespaceID/clusterID/blockpoolID - These are unique identifiers of an HDFS cluster. The identifiers are used to prevent DataNodes from registering accidentally with an incorrect NameNode that is part of a different cluster. These identifiers also are particularly important in a federated deployment. Within a federated deployment, there are multiple NameNodes working independently. Each NameNode serves a unique portion of the namespace (namespaceID) and manages a unique set of blocks (blockpoolID). The clusterID ties the whole cluster together as a single logical unit. It’s the same across all nodes in the cluster.
      • storageType - This is either NAME_NODE or JOURNAL_NODE. Metadata on a JournalNode in an HA deployment is discussed later.
      • ctime - Creation time of file system state. This field is updated during HDFS upgrades.
      • edits_<start transaction ID>-<end transaction ID> - These are finalized (unmodifiable) edit log segments. Each of these files contains all of the edit log transactions in the range defined by the file name.
      • edits_inprogress_<start transaction ID> - This is the current edit log in progress. All transactions starting from that transaction ID are in this file, and all new incoming transactions will get appended to this file. HDFS pre-allocates space in this file in 1 MB chunks for efficiency, and then fills it with incoming transactions. You’ll probably see this file’s size as a multiple of 1 MB. When HDFS finalizes the log segment, it truncates the unused portion of the space that doesn’t contain any transactions, so the finalized file’s space will shrink down.
      • fsimage_<end transaction ID> - This contains the complete metadata image up through that transaction ID.
      • seen_txid - This contains the last transaction ID of the last checkpoint (merge of edits into a fsimage) or edit log roll (finalization of current edits_inprogress and creation of a new one). Note that this is not the last transaction ID accepted by the NameNode. The file is not updated on every transaction, only on a checkpoint or an edit log roll. The purpose of this file is to try to identify if edits are missing during startup. It’s possible to configure the NameNode to use separate directories for fsimage and edits files. If the edits directory accidentally gets deleted, then all transactions since the last checkpoint would go away, and the NameNode would start up using just fsimage at an old state. To guard against this, NameNode startup also checks seen_txid to verify that it can load transactions at least up through that number. It aborts startup if it can’t.
      • in_use.lock - This is a lock file held by the NameNode process, used to prevent multiple NameNode processes from starting up and concurrently modifying the directory.
      Hope this helps. Cheers!
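      If you want to poke at these files yourself (a sketch; the file names are taken from the example listing above), Hadoop ships offline viewers for both formats:
      # Dump an fsimage to XML with the Offline Image Viewer
      hdfs oiv -p XML -i fsimage_0000000000000000031 -o fsimage.xml
      # Dump a finalized edit log segment to XML with the Offline Edits Viewer
      hdfs oev -i edits_0000000000000000030-0000000000000000030 -o edits.xml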

    • @chetanpaithane6560
      @chetanpaithane6560 7 years ago +1

      Thanks for the information.

  • @kaushalsingh601
    @kaushalsingh601 6 years ago +1

    awesome explanation😊

    • @edurekaIN
      @edurekaIN  6 years ago

      Thank you, Kaushal! Do subscribe, like and share to stay connected with us. Cheers :)

  • @ganeshsundar1484
    @ganeshsundar1484 7 years ago

    Good explanation. I have a question:
    Who is going to create the blocks?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Ganesh, thanks for checking out our tutorial! We're glad you liked it.
      While storing data in HDFS, the file is divided into data blocks (of the size you specify in the dfs.block.size property) by the HDFS client, and the NameNode then decides across which DataNodes in HDFS those blocks are stored.
      Hope this helps. Cheers!
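      For reference, a sketch of the corresponding hdfs-site.xml entry (dfs.blocksize is the Hadoop 2.x name of the property; older releases used dfs.block.size):
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>   <!-- 128 MB, the Hadoop 2.x default -->
      </property>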

  • @naveenreddy5064
    @naveenreddy5064 7 years ago +1

    Good explanation.

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Naveen, thanks for checking out our tutorial! We're glad you found it useful. Here's another video that we thought you might like: ua-cam.com/video/tu7nCsHImbI/v-deo.html.
      Do subscribe to our channel to stay posted on upcoming tutorials. Cheers!

  • @ankitjain2416
    @ankitjain2416 6 years ago +1

    Hi,
    It's very well explained.
    But I have a small query. As per the HDFS architecture, we create three replicas of a block: the 1st copy is stored on one rack, and the other two are together on a different rack. You already explained that by saving them on the same rack we save network bandwidth, but my question is: why are we creating 2 copies there? If there is any issue with that rack, then either both would be inaccessible or both would be accessible. In that case, aren't we occupying/consuming more disk space, keeping big data into consideration? What's the purpose of the third copy?
    Please reply.
    Thanks

    • @edurekaIN
      @edurekaIN  6 years ago

      This is done to prevent data loss and provide more fault tolerance. As you mentioned, in case a rack fails, data can be retrieved from the third replica residing in a different rack. Also, HDFS periodically checks for under-replicated and corrupt blocks and adds more replicas if required to ensure that the configured replication factor is maintained. The same is done for corrupted blocks.
      Hope this helps :)
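      To see this in practice (a sketch, assuming a running cluster; the path is a placeholder), fsck reports per-file block locations and flags under-replicated or corrupt blocks:
      hdfs fsck /user/data -files -blocks -locations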

  • @BSS2UA9901S
    @BSS2UA9901S 6 years ago +1

    Nice tutorial :)

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey , thank you for watching our video. We are glad to know that you liked our tutorial. Do subscribe and stay connected with us. Cheers :)

  • @sandykoolz
    @sandykoolz 7 years ago +1

    Thanks for the good explanation, Vineet. I have a question: is it configurable to write the replicas in parallel? Writing the replicas to the racks sequentially takes more time, and the name node has to wait for the ack from the last replica commit.

    • @edurekaIN
      @edurekaIN  6 years ago

      No, it is not configurable, as the whole process is guided through a pipeline. Also, while waiting for the acknowledgment message, the NameNode can serve other client requests.

  • @1983akj
    @1983akj 4 years ago

    Hi, thanks for the very informative video. I have a question here: why are we creating 2 replicas of the same block in a single rack? Wouldn't the second one be redundant? If the rack is not available, there is no point in having 2 copies in the same rack.

    • @edurekaIN
      @edurekaIN  4 years ago

      Yes, you are correct. If that rack fails, then both the copies will not be available. For the purpose of the video, we just did that but in real life, it is suggested to have it on different racks.

  • @kishorekumar2769
    @kishorekumar2769 6 years ago +1

    What is the difference between hadoop dfs -ls / and hdfs dfs -ls /

    • @edurekaIN
      @edurekaIN  6 years ago

      hadoop fs
      Here fs denotes a generic file system, which can point to any file system such as the local file system, HDFS, WebHDFS, S3 FS, etc.
      hadoop dfs / hdfs dfs
      Here dfs points to the Distributed File System, and it is specific to HDFS. You can use it to execute operations on HDFS. hadoop dfs is now deprecated, and you have to use hdfs dfs instead.
      Hope this helps :)
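      A few illustrative invocations (a sketch; the paths and the s3a bucket are placeholders):
      hadoop fs -ls file:///tmp          # generic shell: works against the local file system
      hadoop fs -ls s3a://my-bucket/     # ...or any other supported file system
      hdfs dfs -ls /                     # HDFS-specific; replaces the deprecated hadoop dfs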

  • @laxmikantdhond3284
    @laxmikantdhond3284 7 years ago +1

    What is the need for 3 replicas, as we are copying 2 replicas into the same rack? Can you please explain?

    • @edurekaIN
      @edurekaIN  6 years ago

      It is done to provide more fault tolerance. Also, in general, DataNodes are likely to fail more often than a whole rack. Besides this, having two replicas in the same rack helps to improve network performance because, in general, there is greater network bandwidth between machines in the same rack than between machines residing in different racks.
      Hope this answers your query! :)

  • @phillybruce
    @phillybruce 7 years ago +3

    A 43 min job on one machine takes exactly 4.3 min on 10 machines: don't the lower level of parallelism in the reduce phase, the overhead of the master name server, and the fact that the data nodes may not have equal slices of the data make this an approximation?

    • @edurekaIN
      @edurekaIN  6 years ago +1

      Yes Bruce, you are absolutely correct. It is just an approximation, meant to convey the benefits of parallelism.
      Hope this helps :)
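      A hedged back-of-the-envelope form of that approximation, with s the serial fraction of the job and N the number of machines (Amdahl-style, ignoring skew):
      T_N ≈ s·T_1 + (1 − s)·T_1 / N
      With T_1 = 43 min, s = 0 and N = 10 this gives the idealized 4.3 min; any serial work (the reduce phase, NameNode coordination, uneven splits) makes s > 0 and the real time larger.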

  • @pushpendrasharma91
    @pushpendrasharma91 6 years ago

    What is the difference between MapReduce and YARN? And why do we need YARN?

    • @edurekaIN
      @edurekaIN  6 years ago

      Hey, sorry for the delay. YARN handles resource allocation in Hadoop, while MapReduce is a programming model for processing big data using a parallel & distributed algorithm on a cluster.
      Hope this helps!

  • @RohitRoy-ji9kv
    @RohitRoy-ji9kv 3 years ago

    sweet

  • @shreeprakashagrahari9762
    @shreeprakashagrahari9762 6 years ago +1

    Hi,
    If we put a file from the local machine into HDFS, will it still create 3 replicas of each block of the file across the racks?

    • @edurekaIN
      @edurekaIN  6 years ago

      Yes,
      According to the default replication factor, whatever content you put in HDFS will be replicated and stored in different racks.
      Hope this helps :)
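      As an illustration (a sketch; the file and target path are placeholders), the replication factor can also be overridden per copy at put time:
      hdfs dfs -put localfile.txt /user/hadoop/                        # uses the configured default (3)
      hdfs dfs -D dfs.replication=2 -put localfile.txt /user/hadoop/   # override just for this file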

  • @pranjitbharali6605
    @pranjitbharali6605 7 years ago +1

    What if the number of replicas were decided dynamically?

    • @edurekaIN
      @edurekaIN  6 years ago

      No, the number of replicas is not decided dynamically. By default it is specified in hdfs-site.xml.
      But you can also explicitly set the replication factor for an individual file.
      Hope this helps :)
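      For reference (a sketch of the relevant hdfs-site.xml entry, plus the shell command for existing files):
      <property>
        <name>dfs.replication</name>
        <value>3</value>   <!-- default number of replicas per block -->
      </property>
      # Change the replication factor of a file already in HDFS (-w waits until done)
      hdfs dfs -setrep -w 2 /user/hadoop/localfile.txt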

  • @truthsjourney3994
    @truthsjourney3994 7 years ago +1

    Awesome

  • @subhamkumargupta2712
    @subhamkumargupta2712 5 years ago

    What is the difference between the commands hadoop fs -ls and hadoop dfs -ls?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey Subham, the File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as the local FS, HFTP FS, S3 FS, and others. So with fs, an operation can work from/to the local file system or HDFS, whereas dfs relates only to HDFS.
      Hope this helps!

  • @gundaanil5001
    @gundaanil5001 6 years ago +1

    Who creates the racks? And how do we do the rack configuration?

    • @edurekaIN
      @edurekaIN  6 years ago

      Rack configuration is done by the cluster administrator. For more information about rack awareness configuration, refer this link: hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/RackAwareness.html
      Hope this helps :)
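      A minimal sketch of that wiring (the property name is standard; the script path and its address-to-rack mapping are placeholders): the administrator points core-site.xml at a topology script that maps each DataNode address to a rack id.
      <property>
        <name>net.topology.script.file.name</name>
        <value>/etc/hadoop/conf/topology.sh</value>
      </property>
      #!/bin/bash
      # topology.sh: print a rack path such as /rack1 for every host/IP argument
      for node in "$@"; do
        case "$node" in
          10.0.1.*) echo "/rack1" ;;
          10.0.2.*) echo "/rack2" ;;
          *)        echo "/default-rack" ;;
        esac
      done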

  • @sivagurusubarmaniyan1434
    @sivagurusubarmaniyan1434 6 years ago +1

    Good explanation! Btw, at minute 43:45 it was mentioned that reading different data blocks from datanodes within the same rack can reduce the usage of network bandwidth. Just wondering how the usage of network bandwidth gets reduced. Correct me if I'm wrong: if data is read from different racks, it can reduce the load on the switches and increase their performance.

    • @edurekaIN
      @edurekaIN  5 years ago +1

      Hey Sivaguru, You are right. If you read data from different data nodes residing on different racks, the load on rack switches gets distributed and it actually contributes to performance tuning. Hope this helps!

  • @gummadavellisaiavinash6480
    @gummadavellisaiavinash6480 7 years ago

    Thanks for the information.
    Please tell us which playlist we should select for Hadoop big data:
    1. Hadoop Training Videos (26 videos)
    2. Big Data Hadoop Tutorial Videos (46 videos)
    Please reply...
    Both are from #edureka

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Avinash, in the Hadoop Training Videos playlist (26 videos) we only have the latest videos. The other playlist, Big Data Hadoop Tutorial Videos (46 videos), has all the big data Hadoop videos by Edureka, so I would suggest you follow that playlist.
      Hope this helps. Cheers!

    • @gummadavellisaiavinash6480
      @gummadavellisaiavinash6480 7 years ago

      ***** Thanks

  • @gummadavellisaiavinash6480
    @gummadavellisaiavinash6480 7 years ago

    I want to learn hadoop.

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Avinash, thanks for checking out our tutorial!
      Our instructor-led Big Data Hadoop Certification Training will help you learn Hadoop; you can check out the details of the training here: www.edureka.co/big-data-and-hadoop
      Hope this helps. Cheers!

  • @prateeksingh3636
    @prateeksingh3636 7 years ago

    Hi, it's so great. I have a little doubt:
    At the time of block writing, the client writes the first block to a data node, the other replicas are created automatically, and the client gets the feedback. But for this process the client must have all the blocks of that file up-front, so who is going to create these blocks? Is it the client itself? And at what point in time do the blocks get created and collected at the client?
    Who is going to maintain the sequence of the blocks in case the file needs to be re-collected?
    Because as a user I will give only a BIG file as input. Appreciate your help.

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Prateek, thanks for checking out our tutorial.
      You can check out this blog for a detailed explanation on storing file in Hadoop environment. It will give you all the info you need. www.edureka.co/blog/apache-hadoop-hdfs-architecture/
      Hope this helps. Cheers!

  • @r4hu1gunner
    @r4hu1gunner 5 years ago

    Sir, in the HDFS multi-block write pipeline, why is block B getting copied twice to rack 5? The first copy was written to rack 5; shouldn't it be the second and third copies that are replicated/copied to the same rack?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey Rahul, the replication factor is basically the number of times we replicate every single data block. In Hadoop the replication factor is 3 by default, and replication in Hadoop is not a drawback; in fact, it makes Hadoop effective and efficient by providing fault tolerance.
      There is flexibility to change the replication factor in Hadoop, i.e. it can be reduced to 2 (less than 3) or increased (more than 3). However, a replication factor of 3 is considered ideal, because:
      If one of your nodes goes down, you still have fault tolerance with 2 nodes, and your critical data is safely stored on those two nodes.
      You also have ample time to send an alert to the name node and recover the failed node's replicas onto a new node.
      And in the meantime, if the 2nd node also fails unexpectedly, you still have one active node with your critical data to process.
      Hence a replication factor of 3 is considered the best fit; less than that could be challenging during data recovery, and a higher number is cost-prone.
      Hope this helps!

  • @aparnasen4095
    @aparnasen4095 7 years ago

    Very nice explanation indeed!!! Thanks a lot. I still have a doubt from the video: in the HDFS multi-block write mechanism, it is shown that the first replica of BLK 2 is created in the same rack (Rack 5), while earlier it was explained that replicas should be created in different racks. How far is this correct? Please clear my doubt. Thanks in advance...

    • @edurekaIN
      @edurekaIN  7 years ago +1

      Hey Aparna, thanks for the wonderful feedback! We're glad you liked our tutorial.
      With regard to your query, the block placement policy is something that can be customised, but the default block placement algorithm works fine. It states that if the client is itself on a data node, the first replica is stored on that machine; the second replica is stored on a different rack; and the third replica is stored on the same rack as the second, but on a different node. If the replication factor is more than 3, the further replicas are placed more or less randomly (not truly randomly: load balancing and network bandwidth have to be taken into account). The above is the best-case scenario; if there isn't enough space on the local machine for placing the first replica, Hadoop will try to store the data in the same rack on a node which has a lot of free space. So it depends upon more factors than what is told in the video.
      Hope this helps. Cheers!

    • @aparnasen4095
      @aparnasen4095 7 years ago +1

      edureka! Thanks a lot again for replying to my query... please keep up the good work, and wishing you all the very best!!!

  • @komalkale6205
    @komalkale6205 7 years ago +1

    What is a Hadoop cluster?

    • @edurekaIN
      @edurekaIN  6 years ago

      In talking about Hadoop clusters, first we need to define two terms: cluster and node. A cluster is a collection of nodes. A node is a process running on a virtual or physical machine or in a container. We say process because a machine could be running other programs besides Hadoop.
      There are two types of cluster setup for Hadoop:
      1. Single Node Cluster (normal setup)
      2. Multi Node Cluster

  • @lakshmidurga1406
    @lakshmidurga1406 7 years ago +1

    Great

  • @sarangsirsikar3443
    @sarangsirsikar3443 7 years ago

    Should the block size be in multiples of 64 MB only?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Sarang, thanks for checking out our tutorial.
      Not strictly. The default block size is 128 MB in Hadoop 2.x (YARN) and 64 MB in Hadoop 1.x; these power-of-two sizes are the convention, but the block size is configurable and does not have to be a multiple of 64 MB.
      Hope this helps. Cheers!
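      As an illustration (a sketch; paths are placeholders), the block size can also be chosen per file at write time; the value only has to be a multiple of the 512-byte checksum chunk:
      # Store one file with a 256 MB block size instead of the cluster default
      hdfs dfs -D dfs.blocksize=268435456 -put bigfile.dat /data/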

  • @graphe.l5911
    @graphe.l5911 6 years ago

    Suppose we are maintaining 3 copies of data (copy 1 is in rack A; copies 2 and 3 are in rack B), and suppose rack B fails due to some network problem. Hadoop can access the data from rack A, which is fine. But my doubt is: before we fix rack B, if rack A also fails, how do we get the data? Do we have any mechanism for maintaining the replication factor of 3 if some copies fail? That is, does it recreate those 2 copies from the rack A copy, to maintain a replication factor of 3, before we fix the problem with rack B?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey, failure of a complete rack is very rare. Generally individual nodes within the racks fail, and yes, it is possible for all 3 nodes across the two racks where the data block is present to fail. You need to know how critical the data is and change the replication factor accordingly.
      If one of the DataNodes fails, the NameNode quickly starts re-replicating all the blocks that were present on that DataNode.
      Hope this helps!

  • @teju7907
    @teju7907 5 years ago

    Hi team,
    1) Does one DataNode mean 1 CPU/RAM? Please give me an answer.
    2) Where are the RACKs configured? I.e., how many RACKs are going to be created?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey Chenna, A small Hadoop cluster includes a single master and multiple worker nodes. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. Though it is possible to have data-only worker nodes and compute-only worker nodes, a slave or worker node acts as both a DataNode and TaskTracker.
      Hope this helps!

  • @kajapraneetha2885
    @kajapraneetha2885 5 years ago

    If I install Hadoop on my laptop, it will run on top of the OS. How will HDFS run on top of it? Where is the namenode created? How will the hard disk be partitioned? How will the CPU be distributed for processing among data nodes?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey, www.edureka.co/blog/interview-questions/hadoop-interview-questions-hadoop-cluster/. Please take a look at this blog. Cheers!

  • @vinothinijawahar6938
    @vinothinijawahar6938 5 years ago

    What is the core switch here?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey, A great deal of chatter takes place between the master nodes and slave nodes in a Hadoop cluster that is essential in keeping the cluster running, so enterprise-class switches are definitely recommended. These core switches handle massive amounts of traffic, so 40GbE is a necessity.

  • @sriharshagudi6769
    @sriharshagudi6769 6 years ago

    If a 128 MB block is stored with only 50 MB or 100 MB, what will happen to the remaining storage space in the block? Will it be used by another file, or will it be wasted?

    • @edurekaIN
      @edurekaIN  5 years ago

      Hey Sriharsha, HDFS does not pre-allocate the full block on disk, so a block holding only 50 MB consumes just 50 MB of underlying storage, and the remaining space is not wasted. The fixed block size mainly matters for parallel processing: it lets enormous amounts of data be processed in smaller blocks simultaneously, ultimately saving a lot of time and resulting in better efficiency.
      Hope this helps!

  • @vaibhavkumar3351
    @vaibhavkumar3351 7 years ago

    Hi Team,
    Greetings!!!
    Please let me know if there is any upcoming batch with the instructor in the video. I need to join ASAP.
    Thanks...

    • @edurekaIN
      @edurekaIN  7 years ago +1

      Hey Vaibhav, thanks for checking out our tutorial and for your interest. While we do not have any upcoming batches led by this instructor, we have upcoming batches by other top-rated instructors who have trained hundreds of professionals. You can check out the batch dates here: www.edureka.co/big-data-and-hadoop. If you would like to take a look at the sample class recordings of the other instructors, please share your contact details with us here (we will not publish the comment) or inbox us on FB and we will send you the links. Alternatively, you can also call us at +91 88808 62004 . Hope this helps. Cheers!

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Vaibhav, we have shared your contact details with the relevant team. You can expect to hear from them very soon. Since this instructor does not have any batches coming up, they will share sample class recordings for instructors who have upcoming batches. You can take a look and decide. :) Please feel free to get in touch if you have any questions. Hope this helps. Cheers!

  • @prateekjaiswal7230
    @prateekjaiswal7230 7 years ago

    Sir, please explain every topic in depth...
    I am not able to understand any topic of Hadoop.

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Prateek, thanks for checking out our tutorial!
      We suggest that you start with this tutorial ua-cam.com/video/zez2Tv-bcXY/v-deo.html and work your way down this playlist: ua-cam.com/play/PL9ooVrP1hQOFrYxqxb0NJCdCABPZNo0pD.html.
      You can also sign up for our structured instructor-led training to get support and doubt clearance: www.edureka.co/big-data-and-hadoop.
      Hope this helps. Cheers!

  • @Manishkumar-zj9zw
    @Manishkumar-zj9zw 7 years ago

    Are the videos in proper sequence?

    • @edurekaIN
      @edurekaIN  7 years ago +1

      Hey Manish, thanks for checking out our tutorials.
      For Hadoop Developer training, you can follow this playlist: ua-cam.com/play/PL9ooVrP1hQOEmUPq5vhWfLYJH_b9jFBbR.html. You can skip videos #3, 4 and 5, as there may be repetition of concepts.
      For a structured training programme that includes practicals, 24X7 support and lifetime access to learning material, please check out our course here: www.edureka.co/big-data-and-hadoop.
      Hope this helps. Cheers!

  • @omarayman5478
    @omarayman5478 7 years ago

    How can I download the Hadoop software, or where can I find it? Thanks in advance.

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Omar, thanks for checking out our tutorial! Kindly use the link below to download the Hadoop software.
      www-eu.apache.org/dist/hadoop/common/
      Cheers!
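      A minimal sketch of the usual steps afterwards (the 2.7.3 version is just an example; pick a current release from that directory listing):
      wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
      tar -xzf hadoop-2.7.3.tar.gz
      cd hadoop-2.7.3 && bin/hadoop version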

  • @412sahil
    @412sahil 7 years ago +1

    What is the secondary namenode?

    • @edurekaIN
      @edurekaIN  6 years ago

      The Secondary NameNode is a helper to the primary NameNode, but it is not a replacement for it. The Secondary NameNode takes responsibility for merging the edit logs with the fsimage from the NameNode:
      1. It gets the edit logs from the NameNode at regular intervals and applies them to the fsimage.
      2. Once it has a new fsimage, it copies it back to the NameNode.
      3. The NameNode will use this fsimage on the next restart, which reduces the startup time.
      Hope this helps :)
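      For completeness (a sketch; requires HDFS admin privileges), a checkpoint can also be forced by hand by saving the namespace while the NameNode is in safe mode:
      hdfs dfsadmin -safemode enter
      hdfs dfsadmin -saveNamespace     # merge edits into a fresh fsimage on disk
      hdfs dfsadmin -safemode leave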

  • @indrapadmaja
    @indrapadmaja 7 years ago

    How can I install Hadoop on Windows 7?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Indrakanth, thanks for checking out our tutorial.
      Hadoop is not supported for direct installation on a Windows machine, so please install a virtual machine running CentOS (a Linux operating system) and install Hadoop inside that CentOS VM.
      Please go through the blog below, which has the detailed steps for installing Hadoop on CentOS.
      www.edureka.co/blog/install-hadoop-single-node-hadoop-cluster
      Hope this helps. Cheers!

  • @ubaidmukati1532
    @ubaidmukati1532 7 years ago

    What if the name node fails?

    • @edurekaIN
      @edurekaIN  7 years ago

      +Ubaid Mukati, thanks for checking out our tutorial! If the NameNode process or machine fails, then the entire cluster will not be available until either the NameNode is rebooted or it is assigned and started on another machine. Any restarted NameNode is not available until it gets heartbeat messages from the data nodes with the block locations for all of the files on the data nodes. This can take hours for large clusters, which results in decreased availability when there is an unexpected outage.
      The single NameNode contains the metadata about all of the file blocks stored in HDFS. This metadata is a registry of which file blocks make up each HDFS file. Without this registry, there is no way to know which blocks belong to which HDFS files. The location of file blocks is sent to the NameNode through heartbeat messages from the DataNodes.
      In the event of a NameNode failure, since there are normally no HDFS file blocks stored on the NameNode, there would be no loss of the file blocks that make up HDFS files. As mentioned, the NameNode contains a registry of all of the blocks in HDFS. This information is located in an image file called fsimage and in an edit log that keeps track of all changes to the files on the system. If these files are lost or corrupted, then there will be no record of which blocks are in which HDFS file, resulting in data loss for the entire cluster. Hadoop does have built-in mechanisms, and also some administration practices, to protect against this case.
      Hope this helps. Cheers!
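      One of those built-in protections (a sketch; the NFS path is a placeholder) is simply writing the metadata to more than one directory, including a remote mount, via hdfs-site.xml; each listed directory receives a full copy of the fsimage and edit log:
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
      </property>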

  • @prachiagrawalcipher
    @prachiagrawalcipher 7 years ago

    How does DataNode1 know about DataNode4?

    • @edurekaIN
      @edurekaIN  7 years ago

      Hey Prachi, thanks for checking out our tutorial! When a block is written, the NameNode gives the client an ordered list of the target DataNodes, and the client passes that pipeline list along with the data.
      So DataNode1 learns about the next DataNode in the write pipeline from this list and forwards the data to it.
      Hope this helps. Cheers!

  • @mohammedabdulbari3460
    @mohammedabdulbari3460 7 years ago

    Hi Edureka, I want to take the Hadoop course that you guys are offering. Is there an email address I can use to contact you?

    • @edurekaIN
      @edurekaIN  7 years ago

      +Mohammed Abdul Bari, thanks for checking out our tutorial and for your interest. We can definitely help you there. You can get in touch with us at +91 88808 62004 or simply write to us at sales@edureka.co. You can even register online here: www.edureka.co/big-data-and-hadoop. Alternatively, you can share your contact details with us (we will not make the comment public) and we will get in touch with you. Hope this helps. Cheers!

  • @saibadrish1248
    @saibadrish1248 7 years ago +4

    Thanks for the detailed explanation!! Really appreciated!