Apache Spark Transformation and Actions

  • Published 24 Oct 2024

COMMENTS • 32

  • @IsaiahShadE
    @IsaiahShadE 3 years ago +2

    Probably the only person who tells you facts and reality in the data science community.

  • @HaridasJanjire
    @HaridasJanjire 4 years ago +2

    Very well done. Very helpful for learning Apache Spark with a real end-to-end business case.

  • @ayeshababar-fl4ev
    @ayeshababar-fl4ev 9 months ago

    Very elaborate and well-explained! Can you please share the code and notebook?

  • @sudippandit1
    @sudippandit1 3 years ago +1

    Excellent presentation sir!!

  • @mateen161
    @mateen161 4 years ago +1

    Thanks Srivatsan...Nice explanation!

  • @KishoreKumar-yx4nw
    @KishoreKumar-yx4nw 4 years ago +1

    Thanks Srinivasan for the wonderful explanation

  • @IsaiahShadE
    @IsaiahShadE 3 years ago +1

    Sir you are an Inspiration.

  • @mukeshkesavan4852
    @mukeshkesavan4852 2 years ago

    Thanks a ton! You made Spark easy. Please make a video on how to optimize Spark code and handle data skew.

  • @designwithpicmaker2785
    @designwithpicmaker2785 4 years ago +1

    Thank you, bro. Thanks for this wonderful video.

  • @taliacohen7872
    @taliacohen7872 2 years ago

    Amazing video thank you!!!!

  • @viBeotamil
    @viBeotamil 3 years ago +1

    Amazing video sir.

  • @ranjanirajamani7565
    @ranjanirajamani7565 4 years ago +3

    Thank you, Sir. My learning curve with Spark has turned exponential since I started watching your videos. It has been a rich learning experience, and I have been practicing in parallel. I have a question about DataFrames in PySpark. When I create the column "bad_loan" using withColumn and when (for the various cases of loan_status), the column doesn't get created in the table, though I can see it in the DataFrame. When I try to access this column with a select statement, I get an error. Can you please shed some light on this?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago

      Thanks, Ranjani. Did you assign the result to a DataFrame and use that DataFrame to save? In my video I think I saved the old DataFrame object and not the one with the new columns assigned. Can you please check that?

    • @ranjanirajamani7565
      @ranjanirajamani7565 4 years ago

      @AIEngineeringLife Thank you for the response, Sir. I was able to resolve this issue. It was related to the way the when function has to be used.

  • @saurabhjain1626
    @saurabhjain1626 4 years ago +2

    Thank you for the wonderful video. I have a question: you mentioned that sortWithinPartitions should be used to avoid an expensive transformation when you know that the particular data is in one partition. How would you know that? I am assuming that is only possible when you partition the data based on the values of that particular column.

  • @naveenreddythirugudu
    @naveenreddythirugudu 3 years ago

    Best video 👍

  • @nagarajuch2412
    @nagarajuch2412 4 years ago +1

    The videos are all very informative.
    Is there any way to sort on more than one attribute, e.g. Country ascending and Date descending?

    • @nagarajuch2412
      @nagarajuch2412 4 years ago +2

      Ans: orderBy(col("City").asc(),col("Date").desc())

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago

      @nagarajuch2412 You got the answer :) It is covered in one of my data engineering videos as well.

  • @AkshayKumar-xo2sk
    @AkshayKumar-xo2sk 3 years ago

    @AIEngineering - Thanks a lot for your video. May I check whether all your Spark video code is based on Python? You don't use Scala/Java? Can whatever we do in Scala/Java also be done using Python?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      All of my videos use PySpark, so Python is what I have used, but the same can easily be done in Scala as well.

    • @AkshayKumar-xo2sk
      @AkshayKumar-xo2sk 3 years ago

      @AIEngineeringLife - Do you think the CCA175 Cloudera certification for Apache Spark and Hadoop developers is a good one to attempt for someone working as a Data Engineer? Do you recommend any other certifications? And can the certification be done using PySpark as well? Your help is highly appreciated.

  • @kkckvr
    @kkckvr 4 years ago +1

    Thanks a lot

  • @rajeevrajeev5244
    @rajeevrajeev5244 3 years ago

    Do you have this Databricks page somewhere in git?

  • @deepakparamesh8292
    @deepakparamesh8292 4 years ago

    Very nice explanation, sir. Could you please upload the code?

    • @AIEngineeringLife
      @AIEngineeringLife 4 years ago +1

      Deepak, the Spark videos are not yet in my git repo. It will take time to get them there. Below is my repo, which has the code for other videos at this time:
      github.com/srivatsan88/UA-camLI

  • @kketanbhaalerao
    @kketanbhaalerao 3 years ago +1

    Please provide your GitHub link, and also provide the corona data and Twitter data.

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago +2

      You can find all the code here: github.com/srivatsan88/Mastering-Apache-Spark

  • @Cricketpracticevideoarchive
    @Cricketpracticevideoarchive 4 years ago +1

    Grateful for this series
    Day 3 : colab.research.google.com/drive/1yTDcFFcUAynSXqZxjmu6UJ8bFAkEgnqV?usp=sharing&authuser=1#scrollTo=O9naSW-WLWR5