Hey bro! Very nice to see that you have started on performance tuning. Please continue the series with other performance tuning topics like HBase, YARN, Kafka, etc.
Sure 👍 thanks 🙏
Very easily explained. Thank you so much
Straight to the mind! Keep it up, buddy.
The following are the Hive optimization techniques for Hive performance tuning:
Tez execution engine in Hive
Usage of a suitable file format in Hive
Hive partitioning
Bucketing in Hive
Vectorization in Hive
Cost-based optimization in Hive
Hive indexing
De-normalizing data
Compress map/reduce output
Avoid small files
One more I want to add is the SMB join (sort-merge-bucket join), which can be used in place of the MSJ, i.e., the map-side join, when both tables are bucketed and sorted on the join key.
Correct 🙂💯
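For anyone who wants to try the techniques from the list above in a session, here is a rough sketch of the related settings. The property names are standard Hive ones as far as I know, but defaults and availability differ between Hive versions, so treat the values as a starting point. The table `sales_orc` and its columns are made up for illustration only.

```sql
-- Run on the Tez execution engine instead of classic MapReduce.
SET hive.execution.engine=tez;
-- Process rows in batches (works best with ORC tables).
SET hive.vectorized.execution.enabled=true;
-- Enable cost-based optimization; it relies on table and column statistics.
SET hive.cbo.enable=true;
-- Compress intermediate and final job output.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
-- Let Hive convert eligible joins to sort-merge-bucket (SMB) joins.
SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

-- Hypothetical table: partitioned, bucketed, sorted, and stored as ORC.
CREATE TABLE sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_month STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Collect statistics so the cost-based optimizer has something to work with.
ANALYZE TABLE sales_orc PARTITION (order_month) COMPUTE STATISTICS;
```

An SMB join only applies when both tables are bucketed and sorted on the join key with compatible bucket counts, so the bucketing clause matters as much as the settings.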
The purpose of partitioning and indexing is one and the same, i.e., a similar concept. Correct me if I am wrong.
The purpose is the same, but the concepts are different. I will try to create another video. You can check my video on partitioning in the playlist.
Very much useful Ankush 🙂
Is indexing available in the latest version of Hive? Please let me know.
By indexing, are you talking about bloom filters? If not, can you please let me know how to create...
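On the two questions above: as far as I know, the classic Hive index feature (`CREATE INDEX`) was removed in Hive 3.0, and the usual substitutes are ORC's built-in min/max indexes plus optional bloom filters. A minimal sketch of a bloom filter on an ORC table follows, assuming a hypothetical table `customers_orc` with made-up columns.

```sql
-- Hypothetical ORC table with bloom filters on the lookup columns.
CREATE TABLE customers_orc (
  customer_id BIGINT,
  email       STRING,
  city        STRING
)
STORED AS ORC
TBLPROPERTIES (
  -- Columns to build bloom filters for (comma-separated).
  'orc.bloom.filter.columns' = 'customer_id,email',
  -- Target false-positive probability for the filters.
  'orc.bloom.filter.fpp' = '0.05'
);

-- Equality predicates on the filtered columns let the ORC reader
-- skip stripes and row groups that cannot contain the value.
SELECT city
FROM customers_orc
WHERE customer_id = 12345;
```

Bloom filters help with point lookups on high-cardinality columns; partitioning is still the better fit for columns you filter on by range or by a small set of values.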
Sir, will you please give me an answer to this? "What approach should we take to load thousands of small 1 KB files using Hive: do we load them one by one, or should we merge them together and load them at once, and how do we do this?"
To control the number of files inserted into Hive tables, we can set the number of mappers/reducers to 1, depending on the need, so that the final output is always a single file. Otherwise, one of the settings below should be enabled to merge the job output when the files are smaller than the block size.
hive.merge.mapfiles -- Merge small files at the end of a map-only job.
hive.merge.mapredfiles -- Merge small files at the end of a map-reduce job.
hive.merge.size.per.task -- Size of merged files at the end of the job.
hive.merge.smallfiles.avgsize -- When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
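Putting the settings above together, one possible way to load the small files all at once could look like the sketch below. The table names and the HDFS path are hypothetical, and the sizes are only example values (in bytes). `hive.merge.tezfiles` is the equivalent merge switch for jobs running on Tez.

```sql
-- Merge small output files at the end of the job.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.tezfiles=true;
-- Target size of each merged file, in bytes.
SET hive.merge.size.per.task=256000000;
-- Trigger the merge step when the average output file is smaller than this.
SET hive.merge.smallfiles.avgsize=16000000;

-- Hypothetical staging table pointing at the directory that holds
-- the thousands of small files; nothing is copied at this point.
CREATE EXTERNAL TABLE raw_events_staging (line STRING)
STORED AS TEXTFILE
LOCATION '/data/landing/raw_events';

-- Hypothetical target table in a columnar format.
CREATE TABLE raw_events (line STRING)
STORED AS ORC;

-- One job reads all the small files and writes a few large ones.
INSERT OVERWRITE TABLE raw_events
SELECT line FROM raw_events_staging;
```

So the files are not loaded one by one: the external table exposes them together, and the single insert rewrites them into larger files.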
Thanks for this.
Does partitioning with indexing increase PySpark query performance? Or should I use only partitioning?
Really a nice video
Could you please share videos on SCD in Hive, and SCD revert too?
Can you explain the concepts with a real-time example?
Thanks
Good video, but console examples would have been more helpful.
Can you please elaborate on these concepts with examples?
👍💯
Scenario-based: You get data on the first of every month. This data is stored as a partitioned table in Hive. Suppose you get data in the middle of the month, on any date. Provide a logical scenario to delete the previous partition and create a new partition with the latest date.
If you don't want to keep old partitions, you can use INSERT OVERWRITE.
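A rough sketch of that scenario, assuming a hypothetical table `monthly_sales` partitioned by a string column `load_date` and a hypothetical staging table holding the newly arrived data. Note that INSERT OVERWRITE on a static partition only replaces the data in the partition you name, so the old partition still has to be dropped explicitly.

```sql
-- Drop the partition that was loaded on the first of the month.
ALTER TABLE monthly_sales
  DROP IF EXISTS PARTITION (load_date = '2024-01-01');

-- Write the mid-month data into a new partition for the latest date.
-- If the partition already exists, its contents are replaced.
INSERT OVERWRITE TABLE monthly_sales
  PARTITION (load_date = '2024-01-15')
SELECT order_id, amount
FROM monthly_sales_staging;
```

The dates and column names here are placeholders; in a real job they would typically come from script variables.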
First open the computer; who needs theory?