One of the best interview series. Thank you, Sumit sir.
glad to know that you liked it.
One of the best explanations so far on YouTube. I wish I could afford your course :(
Need more PySpark interview solutions like this 😊
Best selection of questions and very good explanation.
Thanks a lot, Sumit! I am a senior data engineer with 5 years of experience, but since we mostly don't work with DataFrames or PySpark, I am not able to do these simple things.
You are doing a great job posting these❤
Very useful, informative video that gives more confidence to big data aspirants. Thanks, Sumit.
00:03 Recently asked PySpark coding questions
02:37 Writing and executing PySpark pseudo code
05:21 Creating a Spark DataFrame from input and performing group-by aggregation
08:04 Using aggregation functions and collect_list in PySpark
11:15 Spark SQL solution for creating a DataFrame and running queries
14:18 Understanding the DataFrame reader API for reading JSON and the usage of the explode function
17:11 Creating a Spark DataFrame and performing operations on it
19:44 Converting string to date and performing group by in a PySpark DataFrame
22:32 Finding the average stock value using PySpark
25:38 Practice more on DataFrames for interviews
28:15 Practice more to gain confidence in writing correct PySpark syntax
Thank you, sir, for the best explanation. Can you please come up with more examples?
Sir... we need more. Please continue this playlist.
Hi Sumit, about the last question on aggregation and the max average of a stock: there should be a time along with the date, because stock prices actually change at different times during the day.
Then we need to convert it into yyyy-MM-dd format to get the day-specific stock prices, take their average, and then the max of the averages. Just thought of sharing. The overall implementation would still be the same :) cheers
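A minimal sketch of this approach, assuming a hypothetical DataFrame with stock, trade_time, and price columns (these names are illustrative, not from the video):

from pyspark.sql.functions import to_date, avg, max as max_

# Hypothetical input: each stock has multiple prices per day at different times
stocks = spark.createDataFrame(
    [('AAPL', '2024-01-02 09:30:00', 185.0),
     ('AAPL', '2024-01-02 15:45:00', 187.0),
     ('AAPL', '2024-01-03 10:00:00', 182.0)],
    "stock string, trade_time string, price double")

# Truncate the timestamp to a yyyy-MM-dd date, then average per stock per day
daily_avg = (stocks
    .withColumn('trade_date', to_date('trade_time', 'yyyy-MM-dd HH:mm:ss'))
    .groupBy('stock', 'trade_date')
    .agg(avg('price').alias('avg_price')))

# Max of the per-day averages for each stock
daily_avg.groupBy('stock').agg(max_('avg_price').alias('max_avg_price')).show()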
It would be great if you put the questions in a comment, so others can try without looking at the solution first.
Thank you sir😄
Best explanation sir thanks
I am happy to hear this
We can apply distinct() too, I guess, to avoid duplicate values in the df.
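For example (an illustrative sketch, assuming an existing SparkSession named spark):

df = spark.createDataFrame([(1, 'a'), (1, 'a'), (2, 'b')], "id int, val string")
df.distinct().show()                 # drops fully duplicate rows
df.dropDuplicates(['id']).show()     # dedupes on a chosen subset of columns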
Superb
Much needed sir.....!!!
Sujoy, I am sure you will enjoy watching this.
Thanks Sumit, please make more videos like this.
definitely
Nice explanation sir, kindly post scenario based questions
yes for sure
Thank you, Sir, greatly explained. It would be good if you could also post the data/schemas in the description box for us to query and do hands-on practice. Thanks! :)
Hi Sumit,
Could you please create a video explaining end-to-end pipelines on AWS Databricks, along with their orchestration?
What about the remaining 10 PySpark questions? You said we would be covering them in the next video, but you still haven't uploaded it on YouTube. When will you upload it? We are waiting for the remaining 10 PySpark questions.
Thank you ❤
Amazing sir
Nikhil, I am sure you will find it useful.
Hi Sir, can we not write Spark SQL in an interview, since there is no difference in performance?
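For reference, a sketch of how the same group-by/collect_set aggregation could be expressed in Spark SQL (using made-up sample data like the Q2 snippet below, not the video's exact dataset):

df = spark.createDataFrame(
    [('a', 'aa', 1), ('a', 'aa', 2), ('b', 'bb', 5)],
    "col1 string, col2 string, col3 int")
df.createOrReplaceTempView("t")

# Same aggregation as the DataFrame API version, written in Spark SQL
spark.sql("""
    SELECT col1, col2, collect_set(col3) AS col3_values
    FROM t
    GROUP BY col1, col2
""").show()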
This is great!
thank you Umesh
Nice video
thank you
In question number 2, do we not need to remove duplicates at the end? Can you please clarify this for me?
Hello sir, how can I run PySpark code online? Are you also using an online utility to run the PySpark code shown in this video? Could you please share the source? It would be very helpful.
Sir, please create a coding interview playlist.
Q2.

from pyspark.sql.functions import col, collect_set

# Sample input: (col1, col2, col3)
data = [('a', 'aa', 1),
        ('a', 'aa', 2),
        ('b', 'bb', 5),
        ('b', 'bb', 3),
        ('b', 'bb', 4)]
data_schema = "col1 string, col2 string, col3 int"

df_data = spark.createDataFrame(data=data, schema=data_schema)
df_data.display()

# Group by col1/col2 and collect the col3 values;
# collect_set also removes duplicates (use collect_list to keep them)
result = (df_data.groupBy(col('col1'), col('col2'))
          .agg(collect_set(col('col3')).alias('col3_values')))
result.display()