10 PySpark Product-Based Interview Questions

  • Published 29 Dec 2024

COMMENTS • 23

  • @abhishekn786
    @abhishekn786 5 months ago +5

    Thanks for the video. Please continue these PySpark interview videos.
    Thanks again.

  • @Sandeep-bl9ji
    @Sandeep-bl9ji 11 months ago +1

    Really a nice explanation with clear visuals... Thanks a lot... Please post more videos on PySpark.

  • @ashishlimaye2408
    @ashishlimaye2408 10 months ago

    Great questions!

  • @vishnuvardhan9082
    @vishnuvardhan9082 11 months ago +1

    Loving your channel more every day

  • @BigData-fu6jd
    @BigData-fu6jd 3 months ago

    Really helpful... Looking forward to more videos :)

    • @thedatatech
      @thedatatech 3 months ago +1

      Glad to hear that

    • @BigData-fu6jd
      @BigData-fu6jd 3 months ago

      @thedatatech Please upload more videos, and link topic-related videos in the description.

  • @sudarshanthota3369
    @sudarshanthota3369 6 months ago

    awesome video

  • @Vanmathi-e5o
    @Vanmathi-e5o 5 months ago +2

    Only one video on the PySpark playlist...
    Please post more!!

  • @businessskills98
    @businessskills98 11 months ago +1

    New subscriber added

  • @akhilchandaka4053
    @akhilchandaka4053 4 months ago

    Hi bro,
    You always teach amazing stuff.
    Great work 😊
    I have a request:
    Can you please share a syllabus or some kind of preparation strategy for Spark?

  • @Ameem-rw4ir
    @Ameem-rw4ir 8 months ago

    Bro, thanks for your effort in sharing interview-based real-time questions and answers. Can you please share interview questions and answers on real-time streaming (Kafka and PySpark)?

  • @amit4rou
    @amit4rou 5 months ago +2

    Doubt in the 1st question:
    The delimiters in the data are "," "\t" "|",
    so why did you use ",|\t|\|"?
    Please explain.

    • @divit00
      @divit00 4 months ago +1

      split takes a regex; the pattern ",|\t|\|" means , OR \t OR | (the pipe is escaped as \| because a bare | is the regex alternation operator).
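
      A minimal runnable sketch of that point (the toy row and app name are made up for illustration, not taken from the video):

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import split, col

      spark = SparkSession.builder.appName("SplitRegexDemo").getOrCreate()

      # one toy row mixing all three delimiters
      df = spark.createDataFrame([("a,b\tc|d",)], ["value"])

      # ",|\t|\\|" is a single regex: comma OR tab OR literal pipe;
      # the pipe must be escaped because | is the regex alternation operator
      df.select(split(col("value"), ",|\t|\\|").alias("parts")).show(truncate=False)
      # +------------+
      # |parts       |
      # +------------+
      # |[a, b, c, d]|
      # +------------+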

  • @boreddykesavareddy669
    @boreddykesavareddy669 8 months ago

    Instead of a left anti join, we can use except.
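
    A minimal sketch of both approaches (the data and app name are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("AntiVsExcept").getOrCreate()

    # hypothetical data for illustration
    df1 = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "val"])
    df2 = spark.createDataFrame([(2, "b")], ["id", "val"])

    # left anti join: keeps df1 rows with no match in df2 on the join key only
    df1.join(df2, on="id", how="left_anti").show()

    # subtract (SQL EXCEPT): set difference comparing entire rows, with duplicates removed
    df1.subtract(df2).show()

    One caveat: the two are not always interchangeable. The anti join compares only the join keys, while subtract compares whole rows, so rows that differ only in non-key columns can produce different results.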

  • @chandrarahul1990
    @chandrarahul1990 4 months ago

    I feel we can solve the 7th question using the window function row_number() as well.
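
    If the 7th question is a top-N-per-group problem (an assumption, since the question itself isn't quoted here), a minimal row_number() sketch would look like this:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, row_number
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("RowNumberDemo").getOrCreate()

    # hypothetical data: find the highest-paid employee per department
    df = spark.createDataFrame(
        [("eng", "amy", 90), ("eng", "bob", 80), ("hr", "cat", 70)],
        ["dept", "name", "sal"],
    )

    # number rows within each department, highest salary first, then keep row 1
    w = Window.partitionBy("dept").orderBy(col("sal").desc())
    df.withColumn("rn", row_number().over(w)).filter(col("rn") == 1).drop("rn").show()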

  • @saravanakumar-r1s
    @saravanakumar-r1s 11 months ago

    In an interview, if we solve the problems in SQL using Spark SQL, will it be okay?

    • @Rakesh-q7m8r
      @Rakesh-q7m8r 10 months ago

      It depends; sometimes the interviewer specifically asks you not to use SQL and to use the DataFrame APIs instead.

    • @selva30989
      @selva30989 8 months ago

      Some interviewers will ask you to share the code in both Spark SQL and PySpark.
      This way they can assess your PySpark and SQL knowledge in a single scenario-based question.
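
      A minimal sketch of answering one scenario both ways (made-up data; the temp-view name is arbitrary):

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col

      spark = SparkSession.builder.appName("SqlVsDf").getOrCreate()

      # hypothetical sample data
      df = spark.createDataFrame([(1, 5000), (2, 3000)], ["empid", "sal"])

      # DataFrame API version
      df.filter(col("sal") > 4000).show()

      # equivalent Spark SQL version
      df.createOrReplaceTempView("emp")
      spark.sql("SELECT * FROM emp WHERE sal > 4000").show()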

  • @hyderali-wl3yi
    @hyderali-wl3yi 7 months ago +1

    Bro, thanks for your inputs. The data below is in a file, and I'm having a bit of trouble applying your one-line string approach to what I have: multiple rows with multiple delimiters. Can you please help me handle this in PySpark?
    empid,fname|lname@sal#deptid
    1,mohan|kumar@5000#100
    2,karna|varadan@3489#101
    3,kavitha|gandan@6000#102
    Expected output
    empid,fname,lname,sal,deptid
    1,mohan,kumar,5000,100
    2,karna,varadan,3489,101
    3,kavitha,gandan,6000,102

    • @piyushramkar9404
      @piyushramkar9404 7 months ago

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import split, col

      spark = SparkSession.builder.appName("MyApp").getOrCreate()

      # the header row yields two columns: "empid" and the combined "fname|lname@sal#deptid"
      df = spark.read.format("csv") \
          .option("header", "true") \
          .option("inferSchema", "true") \
          .option("delimiter", ",") \
          .load("D:/data/employees.csv")

      # peel the combined column apart one delimiter at a time
      # (the pipe is escaped as \\| because split takes a regex)
      exp_op = df.withColumn("fname", split(col("fname|lname@sal#deptid"), "\\|").getItem(0)) \
          .withColumn("lname", split(split(col("fname|lname@sal#deptid"), "\\|").getItem(1), "@").getItem(0)) \
          .withColumn("sal", split(split(col("fname|lname@sal#deptid"), "@").getItem(1), "#").getItem(0)) \
          .withColumn("deptid", split(col("fname|lname@sal#deptid"), "#").getItem(1)) \
          .select("empid", "fname", "lname", "sal", "deptid")
      exp_op.show()

      # output
      +-----+-------+-------+----+------+
      |empid|  fname|  lname| sal|deptid|
      +-----+-------+-------+----+------+
      |    1|  mohan|  kumar|5000|   100|
      |    2|  karna|varadan|3489|   101|
      |    3|kavitha| gandan|6000|   102|
      +-----+-------+-------+----+------+
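
      A shorter variant of the same answer (an untested sketch reusing df and the imports above): since split takes a regex, the combined column can be split once on an alternation of all three delimiters, which ties back to the regex question earlier in the thread.

      # split once on "\\||@|#", i.e. literal | OR @ OR #
      parts = split(col("fname|lname@sal#deptid"), "\\||@|#")
      df.select(
          "empid",
          parts.getItem(0).alias("fname"),
          parts.getItem(1).alias("lname"),
          parts.getItem(2).alias("sal"),
          parts.getItem(3).alias("deptid"),
      ).show()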