What is Apache Spark? Learn Apache Spark in 15 Minutes

  • Published 21 Jan 2025

COMMENTS • 48

  • @5t1300 · 13 hours ago

    Great job breaking this down with clear examples

  • @MohanKrishna-yi9cc · 4 months ago +9

    OMG, I even tried many Udemy courses to understand this, and none of the tutors explained it this clearly... I am loving it... Sir, please start a full Databricks course to help us. Please... 🙏

    • @mr.ktalkstech · 3 months ago +1

      Thank you so much :) Sure, will do :)

  • @slicein · 1 month ago +1

    Fantastic. Thank you for such an easy and efficient explanation. The restaurant example is apt for Spark. Great work👌👍🙏❤❤

  • @kiranzsai · 6 days ago

    Excellent!! Thank you for the explanation!! That's why I subscribed and became a member of your channel!!

  • @dprakash1793 · 5 months ago +3

    This is what I was looking for, well explained. Thank you.

  • @manikandan-fq5sh · 4 months ago +2

    Simply great explanation of the Spark architecture and how it is connected, step by step; it connects all the dots in Spark.

    • @mr.ktalkstech · 3 months ago

      Thank you so much :)

    • @manikandan-fq5sh · 3 months ago

      @@mr.ktalkstech Looking forward to further Spark concepts; it would be great if you made a full course.

  • @emily-bose · 19 days ago

    Excellent, and one of the best explanations of Spark architecture...

  • @dvg_ck · 3 months ago

    Simple and brilliant analogy Mr K

  • @shyammaths5705 · 4 months ago

    This is such a simple and clear explanation that it made me share it with my friends.
    Keep making videos; your efforts are having a great impact on our lives.

  • @rakeshverma6867 · 5 months ago

    Simplest and excellent explanation Mr K.

  • @062nanthagopalm6 · 5 months ago

    Wow! Just mind blowing brother💥💥!! Looking for more DE fundamentals videos ✨♥️👌

  • @sharaniyaswaminathan8760 · 5 months ago

    Excellent! Thank you for explaining this.

  • @benim1917 · 5 months ago

    Clear and well explained

  • @smderoller · 5 months ago

    Very well explained!!!

  • @Bijuthtt · 5 months ago

    Awesome explanation bro.

  • @digitalabi · 2 months ago

    I appreciate your explanation; it has clarified the topic for me. Thank you. 🙏🏼
    However, I have one question: if the CSV file is split in two, how will one worker detect duplicates in another worker's portion of the data?

  • @Abhinavkumar-kt8gj · 5 months ago

    Excellent!

  • @seethaba · 3 months ago

    Great primer @Mr. K! Thanks. Quick question: how does the driver program create task partitions for the plan? For example, if there are duplicates across two worker nodes, wouldn't the count be misrepresented if it simply adds 4500 and 5500? Does this get handled automatically, or do we have to control the partitioning logic?

    • @Bhavik_P89 · 3 months ago

      It depends on the number of partitions of the files. You can also control the number of tasks by configuring how many partitions are used after each shuffling transformation, with the code below:
      spark.conf.set("spark.sql.shuffle.partitions", num_partitions)
      The number of tasks always depends on the number of partitions.
      Your question is that each worker node may hold duplicates, and a count operation would just sum the per-worker results, right?
      Answer: after getting the result from each worker node, the driver program aggregates them again and then gives the final result.
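The question and reply above can be sketched in plain Python (not actual Spark code; the names and sample values are invented for illustration). Each "worker" counts its own partition and the driver sums the partial results; a plain row count never double-counts, but a *distinct* count would be wrong if each worker deduplicated only its own split, which is why Spark shuffles the data before deduplicating:

```python
# Worker 1's and worker 2's splits of the CSV (made-up sample data).
partition_a = ["alice", "bob", "carol", "bob"]
partition_b = ["bob", "dave", "alice"]

# count(): each worker counts its rows, the driver sums the partials.
partial_counts = [len(partition_a), len(partition_b)]
total_rows = sum(partial_counts)  # 4 + 3 = 7, correct with no shuffle needed

# distinct().count(): deduplicating per worker and summing is wrong,
# because "alice" and "bob" appear on both workers.
wrong_distinct = len(set(partition_a)) + len(set(partition_b))  # 3 + 3 = 6

# Spark instead shuffles equal values onto the same node, deduplicates,
# then counts; the union below stands in for that shuffle step.
true_distinct = len(set(partition_a) | set(partition_b))  # 4
```

So a simple `count()` is safe to compute as "add 4500 and 5500", while duplicate-sensitive operations trigger an extra shuffle stage that the driver plans automatically.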

  • @selvakumarr.k.8660 · 5 months ago

    Useful presentation

  • @dogzrgood · 3 months ago

    Great explanation. Do you have a full pyspark tutorial?

  • @shabeerkhan379 · 4 months ago

    Really good

  • @AlexFosterAI · 2 months ago

    Hey man, it may be worth checking out LakeSail's PySail, built on Rust. Supposedly 4x faster with 90% less hardware cost according to their latest benchmarks, and it can migrate existing Python code. Might be cool to make a vid on!
    Love your content!

  • @neeraj_dama · 5 months ago

    thanks for this

  • @zakeerp · 5 months ago +1

    Hi, what tools are used to create this type of video? Please help.

    • @mr.ktalkstech · 5 months ago +1

      Final Cut Pro, CapCut, PowerPoint, and After Effects.

    • @zakeerp · 4 months ago

      @@mr.ktalkstech thank you for the info

  • @mgdesire9255 · 5 months ago

    Waiting for your PySpark playlist :)

  • @satish1012 · 2 months ago

    This is my understanding:
    - Apache Spark falls under the compute category.
    - It's related to MapReduce but is faster due to in-memory processing.
    - Spark can read large datasets from object stores like S3 or Azure Blob Storage.
    - It dynamically scales compute resources, similar to autoscaling and Kubernetes orchestration.
    - It processes the data to deliver analytics, ML models, or other results efficiently.
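The "faster due to in-memory processing" point in the summary above can be sketched with a toy cost model in plain Python (the cost numbers are invented for illustration, not benchmarks): MapReduce-style jobs write and re-read intermediate data from storage between stages, while Spark can read the dataset once, cache it, and iterate over the in-memory copy.

```python
# Invented relative costs: one full read from storage vs one in-memory pass.
DISK_COST = 100
MEM_COST = 1
ITERATIONS = 10  # e.g. an iterative ML training loop

# MapReduce-style: every iteration re-reads its input from storage.
mapreduce_cost = ITERATIONS * (DISK_COST + MEM_COST)   # 10 * 101 = 1010

# Spark-style: read once, cache in memory, then iterate over the cached copy.
spark_cost = DISK_COST + ITERATIONS * MEM_COST         # 100 + 10 = 110
```

The one-time read dominates the Spark-style total, which is why the gap grows with the number of iterations; single-pass jobs see much less benefit from caching.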

  • @PatelTushya · 2 months ago

    Respect++

  • @kirankarthikeyan4940 · 5 months ago

    Will this same topic be covered on the other channel (Mr.K Talks Tech Tamil)?

  • @RijyosInfo-wn6wt · 1 month ago

    RESPECT++++++++

  • @neuera9556 · 4 months ago

    You did not talk about RDDs.