pysparkpulse
EXL interview question for DE for 3-4 yrs exp. #EXL #interview #dataengineering #dataengineer #mnc
In this video we discuss the latest questions asked in the EXL data engineer interview.
Q1 (SQL): List the films together with the leading star for all 1962 films.
Link : sqlzoo.net/wiki/More_JOIN_operations
Database Information: sqlzoo.net/wiki/More_details_about_the_database.
Q2: PySpark
Is there a significant relationship between the age of a customer and their primary spending category?
To explore this, we have three tables:
Customer Table:
Fields: cust_id, cust_age, cust_name, cust_income
Credit Card Transactions Table:
Fields: transaction_id (Primary Key), cust_id, date, mcc (Merchant Category Code - 4 digit code), amount
Merchant Table:
Fields: mcc, mcc_desc (Merchant Category Description)
We aim to determine whether there is a correlation between customer age and the merchant categories where they predominantly spend; a hedged PySpark sketch follows the sample data below.
Data:
from pyspark.sql import Row

# Sample data for the Customer table
customer_data = [
Row(cust_id=1, cust_age=25, cust_name='Alice', cust_income=50000),
Row(cust_id=2, cust_age=30, cust_name='Bob', cust_income=60000),
Row(cust_id=3, cust_age=45, cust_name='Charlie', cust_income=70000),
Row(cust_id=4, cust_age=30, cust_name='David', cust_income=80000),
Row(cust_id=5, cust_age=55, cust_name='Eve', cust_income=90000)
]
# Sample data for the Credit Card Transactions table
transactions_data = [
Row(transaction_id=1, cust_id=1, date='2024-01-01', mcc=4295, amount=150),
Row(transaction_id=2, cust_id=1, date='2024-01-05', mcc=6348, amount=200),
Row(transaction_id=3, cust_id=2, date='2024-01-02', mcc=4295, amount=300),
Row(transaction_id=4, cust_id=2, date='2024-01-06', mcc=4295, amount=100),
Row(transaction_id=5, cust_id=3, date='2024-01-03', mcc=6348, amount=250),
Row(transaction_id=6, cust_id=4, date='2024-01-04', mcc=6348, amount=300),
Row(transaction_id=7, cust_id=4, date='2024-01-07', mcc=4295, amount=400),
Row(transaction_id=8, cust_id=5, date='2024-01-08', mcc=6348, amount=500)
]
# Sample data for the Merchant table
merchant_data = [
Row(mcc=4295, mcc_desc='airline'),
Row(mcc=6348, mcc_desc='hotel')
]
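A minimal PySpark sketch of one way to approach Q2, assuming an active spark session and the sample rows above: join the three tables, keep each customer's top spending category by total amount, then summarize customer ages per category (a formal correlation test would additionally need ages and categories encoded numerically).

from pyspark.sql import functions as F
from pyspark.sql.window import Window

customer_df = spark.createDataFrame(customer_data)
transactions_df = spark.createDataFrame(transactions_data)
merchant_df = spark.createDataFrame(merchant_data)

# Total spend per customer per merchant category
spend = (transactions_df
         .join(merchant_df, "mcc")
         .groupBy("cust_id", "mcc_desc")
         .agg(F.sum("amount").alias("total_spend")))

# Keep each customer's single biggest spending category
w = Window.partitionBy("cust_id").orderBy(F.col("total_spend").desc())
top_category = (spend.withColumn("rn", F.row_number().over(w))
                     .filter("rn = 1")
                     .drop("rn"))

# Attach customer age and summarize ages per primary spending category
(top_category.join(customer_df, "cust_id")
             .groupBy("mcc_desc")
             .agg(F.avg("cust_age").alias("avg_age"),
                  F.min("cust_age").alias("min_age"),
                  F.max("cust_age").alias("max_age"),
                  F.count("*").alias("customers"))
             .show())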
#EXL #mnc #dataengineering #bigdata #bigdataengineer #interviewquestions #interview
Views: 210

Videos

MNC asked Interview Question #mnc #accenture #ey #cognizant #pwc #deloitte #DE #interview
Views: 271 · a month ago
This video discusses the following question: You are working on a large dataset that contains transaction records for an e-commerce platform. The dataset is stored in a distributed file system. You need to perform a series of transformations and aggregations to generate a summary report showing the top 10 products by total sales revenue for each category. Transaction DF: transaction_id,product_i...
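A rough PySpark sketch of the approach, assuming an active spark session and a hypothetical transaction_df with columns product_id, category, and revenue (the real schema is truncated above):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Hypothetical column names; adjust to the actual transaction schema.
per_product = (transaction_df
               .groupBy("category", "product_id")
               .agg(F.sum("revenue").alias("total_revenue")))

w = Window.partitionBy("category").orderBy(F.col("total_revenue").desc())
top10_per_category = (per_product
                      .withColumn("rank", F.dense_rank().over(w))
                      .filter(F.col("rank") <= 10))
top10_per_category.show()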
Capgemini DE interview Questions for 3-4 years of exp. #capgemini #dataengineer #interview
Views: 790 · a month ago
Capgemini interview questions: Question 1: Write a Spark script to find the employee with the highest salary in each department. data = [ (1, "John", 1, 5000), (2, "Jane", 1, 6000), (3, "Doe", 2, 5500), (4, "Alice", 2, 7000), (5, "Bob", 3, 4500), (6, "Eve", 3, 6200) ] columns = ["employee_id", "employee_name", "department_id", "salary"] employeeDF = spark.createDataFrame(data, columns) Question 2: given a ...
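One possible solution to Question 1, using the data above and assuming an active spark session (row_number keeps exactly one top earner per department):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

data = [(1, "John", 1, 5000), (2, "Jane", 1, 6000), (3, "Doe", 2, 5500),
        (4, "Alice", 2, 7000), (5, "Bob", 3, 4500), (6, "Eve", 3, 6200)]
columns = ["employee_id", "employee_name", "department_id", "salary"]
employeeDF = spark.createDataFrame(data, columns)

# Rank employees within each department by salary and keep the top one
w = Window.partitionBy("department_id").orderBy(F.col("salary").desc())
(employeeDF.withColumn("rn", F.row_number().over(w))
           .filter("rn = 1")
           .drop("rn")
           .show())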
Deloitte data engineer interview Questions for 2-3 years of exp. #deloitte #dataengineer #interview
Views: 1.6K · 2 months ago
Deloitte Interview Questions: Question 1: You are working on a data analysis project where you need to analyze a dataset containing information about various countries. Your task is to 1) calculate the GDP per capita of these countries, rounded to the nearest integer, 2) identify the countries with the minimum and maximum GDP per capita. The GDP per capita should be calculated as (GDP / Populat...
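A hedged sketch of one way to answer this, assuming an active spark session and a hypothetical countries_df with columns country, gdp, and population:

from pyspark.sql import functions as F

# GDP per capita rounded to the nearest integer
gdp_pc = countries_df.withColumn(
    "gdp_per_capita", F.round(F.col("gdp") / F.col("population")).cast("long"))

# Countries holding the minimum and maximum GDP per capita
extremes = gdp_pc.agg(F.min("gdp_per_capita").alias("lo"),
                      F.max("gdp_per_capita").alias("hi")).first()
gdp_pc.filter(F.col("gdp_per_capita").isin(extremes["lo"], extremes["hi"])).show()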
Questions asked in Tiger Analytics TO DE - Part 1|| Pyspark || Data Engineer #pyspark #dataengineer
Views: 708 · 3 months ago
Implement a solution to automatically assign ages where the eldest adult in the family is paired with the youngest child, ensuring proper guardianship within the family structure. Sample Dataset: from pyspark.sql.functions import col, dense_rank, row_number data = [("John", "Adult", 35), ("Alice", "Adult", 30), ("Bob", "Child", 10), ("Charlie", "Child", 8), ("ben", "Child", 12), ("David", "Adul...
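A possible sketch of the pairing logic on the untruncated rows above, assuming an active spark session and ignoring the family grouping that the full problem also involves:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

data = [("John", "Adult", 35), ("Alice", "Adult", 30), ("Bob", "Child", 10),
        ("Charlie", "Child", 8), ("ben", "Child", 12)]
df = spark.createDataFrame(data, ["name", "member_type", "age"])

# Rank adults from eldest to youngest and children from youngest to eldest
adults = df.filter("member_type = 'Adult'").withColumn(
    "rnk", F.row_number().over(Window.orderBy(F.col("age").desc())))
children = df.filter("member_type = 'Child'").withColumn(
    "rnk", F.row_number().over(Window.orderBy(F.col("age").asc())))

# Matching ranks pairs the eldest adult with the youngest child, and so on
(adults.alias("a").join(children.alias("c"), "rnk")
       .select(F.col("a.name").alias("guardian"), F.col("a.age").alias("guardian_age"),
               F.col("c.name").alias("child"), F.col("c.age").alias("child_age"))
       .show())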
Databricks Snowflake ETL Project || Sample Project for Data Engineer || #databricks #snowflake #etl
Views: 643 · 4 months ago
In this video I have demonstrated a basic ETL project. Source: a CSV file in DBFS, with a Databricks notebook for ingestion and transformation. Target: Snowflake. Go through this video and try to understand what an ETL project looks like. Notebook URL: databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5527220668214488/1108029397787502/7860462939115290/latest.h...
Questions asked in DELOITTE TO DE - Part 1|| Pyspark || Data Engineer #pyspark #dataengineer
Views: 596 · 4 months ago
In this video I have discussed the questions below. Q1: What is the difference between managed and external tables in Hive? Q2: Explain row-based and column-based file formats. Q3: Difference between OLAP and OLTP. Q4: How does Spark perform shuffle operations? Q5: What is df.explain() and its significance? Q6: Write a SQL query to print the respective manager name of employee from employe...
Questions asked in KPMG TO DE - Part 2|| Pyspark || Data Engineer #pyspark #dataengineer
Views: 635 · 4 months ago
In this video I have discussed the questions below. Q11: What is the difference between map and flatMap? Q12: What is the Catalyst Optimizer? Q13: What are broadcast variables and accumulators in PySpark? Q14: Explain a scenario where we want to reduce the number of partitions but still prefer repartition. Q15: What is bucketing in Spark? To download the slides: www.linkedin.com/posts/priyam-jain-0946ab199...
10 Asked Questions in KPMG TO DE - Part 1|| Pyspark || Data Engineer #pyspark #dataengineer
Views: 1.2K · 4 months ago
In this video I have discussed the questions below. 1. Which one is better to use, Hadoop or Spark? 2. What is the difference between Spark transformations and Spark actions? 3. What distinguishes PySpark, Databricks, and Elastic MapReduce? 4. What is lazy transformation? 5. What are RDDs and DataFrames? 6. What is a Spark partition? 7. What is shuffling in PySpark? 8. What is a DAG in Spark? 9. How...
Data Engineering Interview Experience: Expected Questions & Answers #dataengineering #pyspark
Views: 717 · 4 months ago
In this video, we discuss the common questions often asked in Data Engineering interviews, ranging from fundamental concepts to practical scenarios: Round 1 and Round 2 expected interview questions guidelines. Whether you're gearing up for your next interview or simply looking to expand your knowledge in data engineering, this video serves as a valuable resource to sharpen your skills and ace t...
Q 19: Amazon pyspark Interview Question | #faang | startascratch #pyspark | #amazon #interview
Views: 343 · 4 months ago
Check out the platform startascratch, which is useful for practicing PySpark questions. Question: Write a query that will identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users. Check out this video and do let me know your doubts; we can connect on link...
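A hedged PySpark version of the idea (the platform itself expects SQL), assuming a hypothetical purchases_df with columns user_id and created_at:

from pyspark.sql import functions as F

a = purchases_df.alias("a")
b = purchases_df.alias("b")

# A user is "returning active" if some later purchase falls within 7 days of an earlier one
returning_users = (a.join(
        b,
        (F.col("a.user_id") == F.col("b.user_id")) &
        (F.col("b.created_at") > F.col("a.created_at")) &
        (F.datediff(F.col("b.created_at"), F.col("a.created_at")) <= 7))
    .select(F.col("a.user_id"))
    .distinct())
returning_users.show()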
Question 18:EXL Self Join Interview Question | EXL | Data Engineer #pyspark | #EXL #interview #mnc
Views: 973 · 5 months ago
Find the managers who have at least 5 other employees reporting to them. data = [ (1, 'John Doe', 'IT', 5), (2, 'Jane Smith', 'IT', 5), (3, 'Alice Johnson', 'HR', 4), (4, 'Bob Anderson', 'HR', 4), (5, 'Charlie Brown', 'IT', 6), (6, 'David Lee', 'IT', 7), (7, 'Emma White', 'HR', 8), (8, 'Frank Thomas', 'HR', 2), (9, 'Grace Kelly', 'HR', 2), (10, 'Henry Ford', 'HR', 2), (11, 'Ivy Chen', 'IT', 6), (12, 'Jac...
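A minimal sketch of one approach, assuming the rows above are loaded into an emp_df with (assumed) column names Id, Name, Department, ManagerId and an active spark session:

from pyspark.sql import functions as F

# Count direct reportees per manager and keep managers with at least 5
reportees = (emp_df.groupBy("ManagerId")
                   .agg(F.count("*").alias("no_of_reportees"))
                   .filter("no_of_reportees >= 5"))

# Join back to the employee table to pick up the manager's name
(reportees.join(emp_df, reportees.ManagerId == emp_df.Id)
          .select("Name", "no_of_reportees")
          .show())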
Question 17: Nagarro DE interview questions part3 | data engineer | #pyspark #nagarro #bigdata
Views: 542 · 6 months ago
Calculate the average price for each medication category. Relevant columns: medications (medication_id, medication_name, category_id) medication_prices (medication_id, price) medications_data = [ (1, "Medication A", 1), (2, "Medication B", 1), (3, "Medication C", 2), (4, "Medication D", 2), (5, "Medication E", 3) ] medication_prices_data = [ (1, 10.50), (2, 20.75), (3, 15.25), (4, 25.00), (5, 1...
Question 16: Nagarro DE interview questions part2 | data engineer | #pyspark #nagarro #bigdata
Views: 867 · 6 months ago
In this video I have discussed an interview question asked in the Nagarro interview for data engineers. Find the medications that were prescribed by at least three different doctors. Relevant DF: df1 = medications (medication_id, medication_name), df2 = prescriptions (prescription_id, doctor_id, medication_id) medications_data = [ (1, "Medication A"), (2, "Medication B"), (3, "Medication C"), (4, "...
Question 15: Nagarro DE interview questions part1 | data engineer | #pyspark #nagarro #bigdata
Views: 1.8K · 7 months ago
In this video I have discussed an interview question asked in the Nagarro interview for data engineers. List the airlines that operate flights to all available destinations. airlines_data = [ (1, "Airline A"), (2, "Airline B"), (3, "Airline C"), ] flights_data = [ (1, 1, 101), (2, 1, 102), (3, 2, 101), (4, 2, 103), (5, 3, 101), (6, 3, 102), (7, 3, 103) ] airlines_df = spark.createDataFrame(airlines...
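A possible sketch, assuming an active spark session and hypothetical column names for the two DataFrames (the createDataFrame call is truncated above): an airline qualifies when its distinct destination count equals the total number of distinct destinations.

from pyspark.sql import functions as F

airlines_data = [(1, "Airline A"), (2, "Airline B"), (3, "Airline C")]
flights_data = [(1, 1, 101), (2, 1, 102), (3, 2, 101), (4, 2, 103),
                (5, 3, 101), (6, 3, 102), (7, 3, 103)]
airlines_df = spark.createDataFrame(airlines_data, ["airline_id", "airline_name"])
flights_df = spark.createDataFrame(flights_data, ["flight_id", "airline_id", "destination_id"])

# Total number of distinct destinations served by anyone
total_destinations = flights_df.select("destination_id").distinct().count()

# Airlines whose own distinct destination count matches the total
(flights_df.groupBy("airline_id")
           .agg(F.countDistinct("destination_id").alias("dest_count"))
           .filter(F.col("dest_count") == total_destinations)
           .join(airlines_df, "airline_id")
           .select("airline_name")
           .show())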
Question 14: Interview question for data engineers #json #pyspark #databricks #azure
Views: 360 · 7 months ago
Spark memory management | OOM in executors | Interview questions #pyspark #interview
Views: 784 · 7 months ago
Most asked interview question in big data engineer interview | OOM in spark part 1 | #pyspark
Views: 604 · 7 months ago
Question 13: KPMG Interview Questions part 2| data engineers | groupBy #pyspark #KPMG #big4
Views: 609 · 7 months ago
Question 12: KPMG Interview Questions part 1| data engineers | Unpivot #pyspark #KPMG #big4
Views: 1K · 7 months ago
Partitioning vs Bucketing | Interview Question | PySpark #pyspark #bigdata #pwc #interview
Views: 3.1K · 7 months ago
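For the partitioning vs bucketing video above, a small write-path illustration, assuming a hypothetical DataFrame df with country and customer_id columns and hypothetical output locations:

# Partitioning: output is laid out as one directory per distinct value of the partition column
(df.write.mode("overwrite")
   .partitionBy("country")
   .parquet("/tmp/sales_partitioned"))

# Bucketing: rows are hashed into a fixed number of buckets on the bucket column;
# bucketed writes must go through saveAsTable
(df.write.mode("overwrite")
   .bucketBy(8, "customer_id")
   .sortBy("customer_id")
   .saveAsTable("sales_bucketed"))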
Question 11: PWC Interview Questions part 2| data engineers | #pyspark #bigdata #pwc #interview
Views: 802 · 7 months ago
Question 10: PWC Interview Questions | data engineers | #pyspark #bigdata #pwc #interview
Views: 4.1K · 7 months ago
Question 9: Deloitte Interview Questions | data engineers | #pyspark #bigdata #deloitte #interview
Views: 3.2K · 7 months ago
Question 8: #Interview questions on Word count of complex Dataset in pyspark #big4 #mnc
Views: 416 · 7 months ago
Question 7: #Interview questions on #groupby #collect_list #dataframe in pyspark #interviewquestions
Views: 444 · 7 months ago
How to save data in #pyspark || Different formats and options to save data #interviewquestions.
Views: 345 · 8 months ago
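For the save-formats video above, a short sketch of common write options, assuming a hypothetical DataFrame df and output paths:

# CSV with a header row, overwriting any previous output
df.write.mode("overwrite").option("header", True).csv("/tmp/out_csv")

# Parquet, a columnar format compressed by default
df.write.mode("overwrite").parquet("/tmp/out_parquet")

# JSON via the generic format/save API, appending to existing output
df.write.mode("append").format("json").save("/tmp/out_json")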
Question 6: #Interview questions on #joins #groupby in pyspark #insurance #aggregates
Views: 387 · 8 months ago
Question 5: #Interview questions on #windowfunctions in pyspark #insurance #withcolumn
Views: 361 · 8 months ago
Question 4: #Interview questions on #pyspark including #joins #groupby #when #bigdata
Views: 431 · 8 months ago

COMMENTS

  • @subedi04 · a month ago

    can you also share SQL from snowflake side to create those tables

  • @sathyamoorthy2362 · a month ago

    For question 2, it doesn't mention calculating the sum for the previous day and the current day; I believe it should be the sum per policy type, not for specific days. Please check. Also, if you have the avg, why do we need the sum, since sum = 2 * avg?

  • @tanmaykapil5362 · a month ago

    Expecting many more interview questions from you; I have enjoyed every session of yours so far.

    • @pysparkpulse · a month ago

      Sure, I am trying to gather questions and make videos on them. Glad you liked it.

  • @ourgourmetkitchen1774 · 2 months ago

    great video, really helpful

  • @dolstoi4206 · 2 months ago

    thanks for the video, but where are the map_key and map_value functions?

    • @pysparkpulse · 2 months ago

      Will create another video on it thanks for highlighting

  • @uandmahesh6096 · 2 months ago

    Hi bro, I like your videos a lot as they are always very informative. I have one doubt: can we answer this question in SQL in interviews?

  • @dineshughade6741 · 2 months ago

    Could you share that PPT with us? That would be great.

    • @pysparkpulse · 2 months ago

      Please check the below link www.linkedin.com/feed/update/urn:li:activity:7160308705211645952?updateEntityUrn=urn%3Ali%3Afs_feedUpdate%3A%28V2%2Curn%3Ali%3Aactivity%3A7160308705211645952%29

  • @ShubhamRai06 · 2 months ago

    from pyspark.sql.functions import avg

    medications_data = [
        (1, "Medication A", 1), (2, "Medication B", 1), (3, "Medication C", 2),
        (4, "Medication D", 2), (5, "Medication E", 3)
    ]
    medication_prices_data = [(1, 10.50), (2, 20.75), (3, 15.25), (4, 25.00), (5, 12.50)]

    medications_df = spark.createDataFrame(medications_data, ["medication_id", "medication_name", "category_id"])
    medication_prices_df = spark.createDataFrame(medication_prices_data, ["medication_id", "price"])

    joined_df = medications_df.join(
        medication_prices_df,
        medications_df.medication_id == medication_prices_df.medication_id
    ).select(
        medications_df.medication_name,
        medications_df.category_id,
        medication_prices_df.medication_id,
        medication_prices_df.price
    )

    result_df = joined_df.groupBy("category_id").agg(avg("price").alias("Avg Price"))
    result_df.show()

  • @ShubhamRai06 · 2 months ago

    # Find the medications that were prescribed by at least three different doctors.
    # Relevant DFs:
    # df1 = medications (medication_id, medication_name)
    # df2 = prescriptions (prescription_id, doctor_id, medication_id)

    medications_data = [
        (1, "Medication A"), (2, "Medication B"), (3, "Medication C"),
        (4, "Medication D"), (5, "Medication E")
    ]
    prescriptions_data = [
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2), (6, 3, 2),
        (7, 1, 3), (8, 2, 4), (9, 3, 4), (10, 4, 5), (11, 5, 5), (12, 6, 5)
    ]
    medications_df = spark.createDataFrame(medications_data, ["medication_id", "medication_name"])
    prescriptions_df = spark.createDataFrame(prescriptions_data, ["prescription_id", "doctor_id", "medication_id"])

    most_prescribed_df = prescriptions_df.groupBy("medication_id").count().filter("count >= 3")
    most_prescribed_df.cache().count()
    medications_df.cache().count()

    final_df = most_prescribed_df.join(
        medications_df,
        medications_df.medication_id == most_prescribed_df.medication_id,
        "inner"
    ).drop(most_prescribed_df.medication_id)
    final_df.show()

  • @mr.chicomalo4003 · 2 months ago

    QUESTION 1
    ----------
    print('INPUT DATAFRAME')
    data = [(1, "Alice", "123 Main Street,New York,USA"),
            (2, "Bob", "456 Oak Avenue,San Francisco,USA"),
            (3, "Carol", "789 Pine Road,Los Angeles,USA"),
            (4, "David", "321 Elm Lane,Chicago,USA"),
            (5, "Emily", "654 Maple Drive,Miami,USA")]
    schema = ("emp_id", "emp_name", "emp_add")
    df = spark.createDataFrame(data, schema)
    df.show(truncate=False)

    print('OUTPUT DATAFRAME')
    from pyspark.sql.functions import split
    df1 = df.withColumn("street_name", split("emp_add", ",")[0]) \
            .withColumn("city_name", split("emp_add", ",")[1]) \
            .withColumn("country_name", split("emp_add", ",")[2]) \
            .drop("emp_add")
    df1.show(truncate=False)

    print('Output using the getItem() function')
    df2 = df.withColumn("street_name", split("emp_add", ",").getItem(0)) \
            .withColumn("city_name", split("emp_add", ",").getItem(1)) \
            .withColumn("country_name", split("emp_add", ",").getItem(2)) \
            .drop("emp_add")
    df2.show(truncate=False)

    THEORY
    ------
    In PySpark, the getItem() function retrieves an element from an array or a value from a map column within a DataFrame. The split() function returns an array column, which can be further processed with array functions or exploded into multiple rows; each element of the array corresponds to a substring produced by the split. The limit argument of split() caps the maximum number of splits, which is useful when you want to limit the number of resulting substrings.

    QUESTION 2
    ----------
    from pyspark.sql.functions import row_number, col
    from pyspark.sql.window import Window

    print('INPUT DATASET')
    data = [(1, 'Math', 90), (1, 'Science', 93), (1, 'History', 85),
            (2, 'Math', 85), (2, 'Science', 79), (2, 'History', 96),
            (3, 'Math', 95), (3, 'Science', 87), (3, 'History', 77),
            (4, 'Math', 78), (4, 'Science', 91), (4, 'History', 90),
            (5, 'Math', 92), (5, 'Science', 84), (5, 'History', 88)]
    schema = ("id", "subject", "Marks")
    df = spark.createDataFrame(data, schema)
    df.show()

    windowSpec = Window.partitionBy("subject").orderBy(col("Marks").desc())
    df1 = df.withColumn("row_number", row_number().over(windowSpec))
    print("OUTPUT AFTER APPLYING ROW_NUMBER() WINDOW FUNCTION")
    df1.show(truncate=False)
    print("OUTPUT OF STUDENT ID WHO SCORED TOP IN EACH SUBJECT")
    df1.filter(df1.row_number == 1).select("id", "subject").show()

    THEORY
    ------
    PySpark window ranking functions: row_number() gives a sequential row number starting from 1 within each window partition. rank() provides a rank within a window partition and leaves gaps when there are ties. dense_rank() ranks rows within a window partition without gaps; it is otherwise similar to rank(). lag() is the same as the LAG function in SQL: it retrieves a column value from a previous row in the partition at a specified offset, which is helpful for comparative analysis or for calculating differences between consecutive rows. lead() is the same as the LEAD function in SQL: similar to lag(), it retrieves the column value from a following row in the partition at a specified offset, which helps in accessing subsequent row values for comparison or predictive analysis.

  • @rockroll28 · 3 months ago

    Good information. Constructive criticism: you were explaining too fast. The chart explanation could be one video and the practical part another; that way, two videos of 10 to 12 minutes each would have been helpful. Best of luck 👍🏻

    • @pysparkpulse · 3 months ago

      Thank you for your feedback will keep this in mind ☺️

  • @abhishekmalvadkar206 · 3 months ago

    very well explained 👏

  • @nidhisingh5674 · 3 months ago

    You're doing a great job👏

  • @bolisettisaisatwik2198 · 4 months ago

    We can also order by date and then filter the latest date.

  • @Paruu16 · 4 months ago

    Bro, please also put the data in the description; it will be time-saving while practicing.

    • @pysparkpulse · 4 months ago

      Yes sure bro, I did this in the later videos.

  • @Paruu16 · 4 months ago

    I used this for the full address, however the column shows up with null values: df = df.withColumn("FullAdress", concat(col("city") + col("Country")))

    • @pysparkpulse · 4 months ago

      Hi @paruu16, please don't use + there; pass the columns to concat as separate, comma-separated arguments. With +, Spark attempts numeric addition on the strings, which is why the column comes out null.
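      A small sketch of the corrected call, keeping the same (hypothetical) column names as in the comment above:

      from pyspark.sql.functions import concat, concat_ws, col

      df = df.withColumn("FullAddress", concat(col("city"), col("Country")))
      # or, with a separator between the parts:
      df = df.withColumn("FullAddress", concat_ws(", ", col("city"), col("Country")))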

  • @Paruu16 · 4 months ago

    Bro, can you please also post the data required for these questions in the description.

  • @Manojkumar__ · 4 months ago

    Please try to improve the sound quality.. 😢

    • @pysparkpulse · 4 months ago

      Sure, I will do this, thanks for your feedback.

  • @eternalsilvers5961 · 4 months ago

    If execution memory and storage memory both support spilling data to disk, why do OOM issues occur?

    • @pysparkpulse · 4 months ago

      There can be multiple reasons for this even if spill to disk is possible. Some tasks require a certain amount of execution memory, and if it is not available it may lead to OOM. Or, if we cache some larger data, that may also lead to OOM.

  • @shivamchandan50 · 4 months ago

    Please make a video on debugging in PySpark.

  • @prabhatgupta6415 · 4 months ago

    What tech is being used in your project?

    • @pysparkpulse · 4 months ago

      Currently I am planning to do it with Databricks Community Edition and Snowflake.

    • @prabhatgupta6415 · 4 months ago

      @@pysparkpulse No, I am asking about your new company.

  • @sundipperumallapalli · 4 months ago

    So unionByName is used when the columns are interchanged or swapped between the given DataFrames, isn't it, Sir? So that it will align the columns by name.

  • @0adarsh101 · 4 months ago

    keep uploading videos like this.

    • @pysparkpulse · 4 months ago

      Sure thank you for your support 😊

  • @princyjain9323 · 4 months ago

    Very helpful lecture sir thank you so much ❤

  • @AgamJain-vq2ub · 4 months ago

    Doing good job beta 👏🏻😊

  • @avinash7003 · 4 months ago

    Can you do one real-time project?

    • @pysparkpulse · 4 months ago

      Yes, I was also thinking about the same; will create an end-to-end project in Databricks Community Edition.

    • @avinash7003 · 4 months ago

      @@pysparkpulse Please make it at least 2 hours, and the project should be unique compared to other YouTubers.

    • @pysparkpulse · 4 months ago

      Sure definitely it will be unique

  • @0adarsh101 · 4 months ago

    Do you know any good DE projects available on YouTube or Udemy which we can do?

    • @pysparkpulse · 4 months ago

      On Udemy there are many projects; you can pick any according to the cloud of your choice.

    • @0adarsh101 · 4 months ago

      @@pysparkpulse thanks

  • @0adarsh101 · 4 months ago

    You haven't faced any Python coding questions, but to be on the safer side, if we need to prepare for Python coding questions, which Python topics would you suggest?

    • @pysparkpulse · 4 months ago

      For Python as a DE you should be aware of lists, dictionaries, set functions and OOP.

    • @0adarsh101 · 4 months ago

      @@pysparkpulse thanks

  • @souravdas-kt7gg · 4 months ago

    Interview Questions are good

  • @prabhatgupta6415 · 5 months ago

    Hey Priyam! Explain the Thoughtworks process. How did you clear it? Regarding the notice period, how should one approach it, and how did you approach it? Thanks, and appreciation for all the content.

    • @pysparkpulse · 5 months ago

      @prabhatgupta6415 Thank you for your appreciation. There will be 4 rounds in total. Will create a detailed video on the interview procedure.

    • @prabhatgupta6415 · 5 months ago

      @@pysparkpulse Do mention how you tackled the notice period. I have the same years of experience as you and need some guidance on that. Thanks 🤗

    • @pysparkpulse · 5 months ago

      @@prabhatgupta6415 Sure

    • @mahir14_ · 4 months ago

      Can you make a GitHub repo of all the questions, or store them somewhere, so it's easier to get them in one place and practice?

    • @pysparkpulse · 4 months ago

      Sure will do this

  • @prabhatgupta6415 · 5 months ago

    Explain your interview process.

  • @shwetadawkhar4815 · 5 months ago

    How many rounds of interview will be there for Data Engineers?

    • @pysparkpulse · 5 months ago

      @shwetadawkhar4815 Will be creating a detailed video on the interview process and how to crack DE interviews.

  • @aadil8409 · 5 months ago

    Can you please share the PPT also? It will be very helpful for last-minute revision.

  • @rawat7203 · 6 months ago

    Hi Sir,

    PySpark:
    from pyspark.sql.functions import col, count

    joinDf = df.alias('a').join(df.alias('b'), col('a.ManagerId') == col('b.Id'))
    final_df = (joinDf.groupBy('b.Id', 'b.Name')
                      .agg(count(col('a.Id')).alias('No_of_reportees'))
                      .filter(col('No_of_reportees') >= 3))
    final_df.show()

    %sql
    with cte as (
        select ManagerId, count(*) as no_of_reports
        from office
        group by ManagerId
        having no_of_reports >= 5)
    select o.Name as Emp_Name, c.no_of_reports
    from cte c
    join office o on c.ManagerId = o.Id

    -- Emp_Name    no_of_reports
    -- Jane Smith  6
    -- David Lee   5

    • @pysparkpulse · 5 months ago

      Great work keep going 💯

  • @rawat7203 · 6 months ago

    %sql
    with result as (
        select medication_id, count(distinct(doctor_id)) as count_distinct_doc
        from prescriptions
        group by medication_id
        having count_distinct_doc >= 3)
    select m.medication_name
    from result r
    join medications m on r.medication_id = m.medication_id

  • @rawat7203 · 6 months ago

    Thank you Sir

  • @rawat7203 · 6 months ago

    Thank you Sir

    • @pysparkpulse · 6 months ago

      Thank you for your appreciation 😊

  • @rawat7203 · 6 months ago

    Thank you Sir, keep doing the great work

  • @swarajrandhvan9057 · 6 months ago

    Nice Explanation! Thank you!

  • @rawat7203 · 6 months ago

    Thank you Sir, very nice Qs

  • @rawat7203 · 6 months ago

    Sir, please provide the data.

  • @rawat7203 · 6 months ago

    Thank you Sir

  • @vishaldeshatwad8690 · 6 months ago

    Thank you, and please make a video on how to explain a project in the interview; it will really help.

  • @rawat7203 · 6 months ago

    from pyspark.sql.functions import col, when

    df_bonus = df.withColumn(
        "bonus",
        when((col('department') == 'Sales') & (col('salary') > 50000), 0.10 * col('salary'))
        .otherwise(
            when((col('department') == 'Marketing') & (col('salary') > 60000), 0.10 * col('salary'))
            .otherwise(0.5 * col('salary'))
        )
    )

    high_salary_employees = df.filter(col('salary') > 70000).orderBy(col('salary').desc()).limit(5)
    high_salary_employees.show()

  • @rawat7203 · 6 months ago

    Hi Sir, can you please add the schema and the data in future videos .. Thanks

    • @pysparkpulse · 6 months ago

      Hi Rawat, yes I am adding these; you can check my recent videos.

  • @ganeshmane9012 · 6 months ago

    PostgreSQL solution:

    select temp.medication_name, count(temp.doctor_id) as dc
    from (select md.medication_id, medication_name, doctor_id
          from medications_data md
          inner join prescriptions_data pd on md.medication_id = pd.medication_id) temp
    group by temp.medication_name
    having count(temp.doctor_id) >= 3

  • @prajju8114 · 6 months ago

    Do they ask such easy questions? 😂

    • @pysparkpulse · 6 months ago

      Yes, based on your luck and the interviewer's mood too 😅

  • @krishna-mx1dx · 6 months ago

    Very helpful, thanks for sharing :)

  • @gouravjangid1315 · 7 months ago

    Can you share the other 2 questions here? I'd like to solve them.

    • @PriyamJain-qp8js · 6 months ago

      ua-cam.com/video/qImZnrgpzZY/v-deo.html

    • @pysparkpulse · 6 months ago

      Hi Gourav, I uploaded both the videos, please check.

  • @rawat7203 · 7 months ago

    Sir, please provide the code for the initial DataFrame.