Hive Optimization Techniques With Examples

  • Published 18 Oct 2024

COMMENTS • 26

  • @mdshahalam3010 4 years ago +3

    Hey bro! Very nice to see that you have started on performance tuning. Please continue the series with other performance tuning topics such as HBase, YARN, Kafka, etc.

  • @piyushgaikwad4205 2 years ago

    Very easily explained. Thank you so much

  • @sagarmohare1584 3 years ago

    Straight to the mind! Keep it up, buddy.

  • @jaishuagarwal2210 3 years ago +3

    The following are Hive optimization techniques for performance tuning:
    Tez execution engine in Hive
    Use of a suitable file format in Hive
    Hive partitioning
    Bucketing in Hive
    Vectorization in Hive
    Cost-based optimization in Hive
    Hive indexing
    De-normalizing data
    Compressing map/reduce output
    Avoiding small files
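
    As a rough illustration of several items in the list above, here is a minimal HiveQL sketch. The property names are standard Hive settings, but the values are only illustrative, and the table `sales_orc` and its columns are hypothetical. Whether each setting helps depends on the Hive version and the cluster.

    ```sql
    -- Run queries on Tez instead of MapReduce.
    SET hive.execution.engine=tez;
    -- Process rows in batches (vectorization works best with ORC input).
    SET hive.vectorized.execution.enabled=true;
    -- Let the cost-based optimizer plan queries from table statistics.
    SET hive.cbo.enable=true;
    -- Compress intermediate map output and final job output.
    SET hive.exec.compress.intermediate=true;
    SET hive.exec.compress.output=true;

    -- Hypothetical table: columnar file format, partitioned and bucketed.
    CREATE TABLE sales_orc (
      order_id    BIGINT,
      customer_id BIGINT,
      amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC;

    -- The cost-based optimizer relies on statistics, so collect them after loading data.
    ANALYZE TABLE sales_orc PARTITION (order_date) COMPUTE STATISTICS;
    ```

    The bucket count of 16 is an arbitrary choice for the sketch; in practice it is usually sized from the data volume per partition.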

    • @apurvakulkarni9409 2 years ago +2

      One more I want to add is the SMB join (sort-merge-bucket join), which is a replacement for MSJ, i.e. the map-side join.
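
      A sort-merge-bucket join of the kind mentioned here could be set up roughly as follows. This is a sketch under assumptions: the tables `orders_b` and `customers_b` are hypothetical, both are bucketed and sorted on the join key with the same bucket count, and the exact settings needed can vary by Hive version and execution engine.

      ```sql
      -- Allow Hive to convert eligible joins into sort-merge-bucket joins.
      SET hive.auto.convert.sortmerge.join=true;
      SET hive.optimize.bucketmapjoin=true;
      SET hive.optimize.bucketmapjoin.sortedmerge=true;

      -- Both hypothetical tables are bucketed AND sorted on the join key.
      CREATE TABLE orders_b (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
      )
      CLUSTERED BY (customer_id) SORTED BY (customer_id ASC) INTO 16 BUCKETS
      STORED AS ORC;

      CREATE TABLE customers_b (
        customer_id BIGINT,
        name        STRING
      )
      CLUSTERED BY (customer_id) SORTED BY (customer_id ASC) INTO 16 BUCKETS
      STORED AS ORC;

      -- Matching buckets can be merged directly, without a full shuffle.
      SELECT c.name, SUM(o.amount) AS total_amount
      FROM orders_b o
      JOIN customers_b c ON o.customer_id = c.customer_id
      GROUP BY c.name;
      ```

      Running `EXPLAIN` on the query is one way to check whether the plan actually uses a sorted merge bucket join.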

    • @learnomate 2 years ago

      Correct 🙂💯

  • @naveenkumarmaddala7830 4 years ago +2

    The purpose of partitioning and indexing is one and the same, i.e. they are similar concepts. Correct me if I am wrong.

    • @learnomate 4 years ago +2

      The purpose is the same, but the concepts are different. I will try to create another video. You can check my video on partitioning in the playlist.

  • @Sarojpradeep 2 years ago

    Very useful, Ankush 🙂

  • @manikandanvenkatachalam5604 4 years ago +1

    Is indexing available in the latest version of Hive? Please let me know.

  • @keyurthakor6495 4 years ago

    By indexing, are you talking about bloom filters? If not, can you please let me know how to create...

  • @deepikakumari5369 4 years ago +1

    Sir, will you please give me an answer to this? "What approach should we take to load thousands of small 1 KB files using Hive? Do we load them one by one, or should we merge them together and load them at once, and how do we do this?"

    • @karthikgolagani6844 3 years ago +1

      To control the number of files inserted into Hive tables, we can set the number of mappers/reducers to 1, depending on the need, so that the final output is always a single file. Otherwise, one of the settings below should be enabled to merge the reducer output when its size is less than a block size.
      hive.merge.mapfiles -- Merge small files at the end of a map-only job.
      hive.merge.mapredfiles -- Merge small files at the end of a map-reduce job.
      hive.merge.size.per.task -- Size of merged files at the end of the job.
      hive.merge.smallfiles.avgsize -- When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
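
      Put together, the merge settings above might be used like this when loading many small files. This is only a sketch: the size values are illustrative, and the tables `events_staging` (assumed to sit on top of the directory of small files) and `events_merged` are hypothetical.

      ```sql
      -- Merge small output files at the end of map-only and map-reduce jobs.
      SET hive.merge.mapfiles=true;
      SET hive.merge.mapredfiles=true;
      -- Equivalent switch when the job runs on Tez.
      SET hive.merge.tezfiles=true;
      -- Target size of each merged file, in bytes (illustrative value).
      SET hive.merge.size.per.task=256000000;
      -- Trigger the extra merge step when the average output file is smaller than this.
      SET hive.merge.smallfiles.avgsize=128000000;

      -- Rewrite the small files from the staging table into fewer, larger files.
      INSERT OVERWRITE TABLE events_merged
      SELECT * FROM events_staging;
      ```

      The idea is to load the small files once into a staging table and let the rewrite produce the larger files, rather than loading them one by one into the final table.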

  • @naveenkumar-tb1de 4 years ago +1

    Thanks for this.

  • @vishvjitmane7651 4 years ago

    Does partitioning combined with indexing increase PySpark query performance, or should I use only partitioning?

  • @jaishuagarwal2210 3 years ago

    Really a nice video

  • @vru5696 3 years ago

    Could you please share videos on SCD in Hive, and on SCD revert too?

  • @praveenadurai2784 4 years ago +1

    Can you explain the concepts with a real-time example?

  • @papachoudhary5482 4 years ago +1

    Thanks

  • @databiceps 4 years ago

    Good video, but console examples would have been more helpful.

  • @naveenkumarmaddala7830 4 years ago +2

    Can you please elaborate on these concepts with examples?

  • @himanish2006 4 years ago

    Scenario-based: You get data on the first of every month. This data is stored as a partitioned table in Hive. Suppose you get data in the middle of the month, on any date. Provide a logical scenario to delete the previous partition and create a new partition with the latest date.

    • @karthikgolagani6844 3 years ago

      If you don't want to keep the old partitions, you can use INSERT OVERWRITE.
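
      For the scenario in the question, that suggestion could look roughly like the sketch below. The tables `monthly_data` and `monthly_staging`, the partition column `load_date`, and the dates are all hypothetical. Note that INSERT OVERWRITE with a static partition replaces only the partition it names, so the older partition is dropped in a separate statement here.

      ```sql
      -- Write the mid-month data into a new partition for the latest date.
      -- If that partition already exists, its contents are replaced.
      INSERT OVERWRITE TABLE monthly_data PARTITION (load_date='2024-10-15')
      SELECT id, payload
      FROM monthly_staging;

      -- Remove the previous partition that was loaded on the first of the month.
      ALTER TABLE monthly_data DROP IF EXISTS PARTITION (load_date='2024-10-01');
      ```

      Loading the new partition before dropping the old one means the table is never left without data if the load fails.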

  • @littlesingham1300 3 years ago

    First open the computer; who needs theory?