Trending Big Data Interview Question - Number of Partitions in your Spark Dataframe

  • Published 20 Aug 2024

COMMENTS • 38

  • @Anonymous-fe2ep
    @Anonymous-fe2ep 11 months ago +3

    Hello Sir, I was asked the following questions for an AWS Developer role. Please make a video on this. Thanks.
    Q1. We have *sensitive data* coming in from a source and an API. Help me design a pipeline to bring in the data, clean and transform it, and park it.
    Q2. So where does PySpark come into play in this?
    Q3. Which libraries will you need to import to run the above Glue job?
    Q4. What are shared variables in PySpark?
    Q5. How do you optimize Glue jobs?
    Q6. How do you protect sensitive data in your data?
    Q7. How do you identify sensitive information in your data?
    Q8. How do you provision an S3 bucket?
    Q9. How do I check if a file has been changed or deleted?
    Q10. How do I protect my file containing sensitive data stored in S3?
    Q11. How does KMS work?
    Q12. Do you know S3 Glacier?
    Q13. Have you worked on S3 Glacier?

  • @DEwithDhairy
    @DEwithDhairy 7 months ago +1

    PySpark Scenario Based Interview Question And Answers:
    ua-cam.com/play/PLqGLh1jt697zXpQy8WyyDr194qoCLNg_0.html&si=Ddhve6jjcy0ZvaLV

  • @arunsundar3739
    @arunsundar3739 4 months ago

    I was curious why Spark handles smaller files differently, and I also had a fixed view that the partition size is 128 MB at all times. That view of mine is debunked now. Beautifully explained, thank you very much, sir :)
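
    A minimal PySpark sketch of how to verify this yourself; the path is hypothetical, and the settings named are standard Spark configuration keys:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partition-check").getOrCreate()

    # spark.sql.files.maxPartitionBytes (default 128 MB) is only an upper bound
    # on the size of a read partition; several small files can be packed into
    # one partition, so the final count also depends on the file sizes,
    # spark.sql.files.openCostInBytes and the default parallelism.
    print(spark.conf.get("spark.sql.files.maxPartitionBytes"))

    # Inspect the partitions Spark actually created for a (hypothetical) folder.
    df = spark.read.parquet("/data/events/")
    print(df.rdd.getNumPartitions())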

  • @Asyouwish145
    @Asyouwish145 1 month ago

    I loved your presentation and understood more about configuration from just one video ❤

  • @himanshupatidar9413
    @himanshupatidar9413 1 year ago +1

    Thanks for the simplified explanation. Please make the next video on deciding the configuration for our jobs, e.g., which is the better config: i) 10 executors with 4 cores and 4 GB RAM each, or ii) 5 executors with 8 cores and 8 GB RAM each? There is no proper explanation of this concept anywhere.
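
    A hedged sketch of how option (i) from the comment above could be requested when building the session; the numbers come from the comment and are not a recommendation:

    from pyspark.sql import SparkSession

    # Option (i) from the comment: 10 executors with 4 cores and 4 GB RAM each.
    # These values are the commenter's example, not a recommendation; which
    # layout is better depends on the workload, per-task memory needs and GC.
    spark = (
        SparkSession.builder
        .appName("executor-sizing-example")
        .config("spark.executor.instances", "10")
        .config("spark.executor.cores", "4")
        .config("spark.executor.memory", "4g")
        .getOrCreate()
    )

    # Roughly instances x cores on YARN/Kubernetes.
    print(spark.sparkContext.defaultParallelism)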

  • @tarunpothala6856
    @tarunpothala6856 1 year ago +1

    Sir,
    Great to see such scenarios explained so clearly. We would love to watch some interview questions on Databricks. Kindly post them.

  • @soumikdutta77
    @soumikdutta77 1 year ago +1

    Insightful and informative concept, thank you Sir for clarifying it with such ease ✅

  • @kavyasri6654
    @kavyasri6654 1 year ago

    Thank you, Sir. Also, please continue the advanced SQL playlist. I have completed both the basic and advanced playlists; they are very helpful.

  • @kirtisingh7698
    @kirtisingh7698 1 year ago

    Thank you Sir for explaining the answers with a scenario. It's really helpful.

  • @Rajesherukulla
    @Rajesherukulla 1 year ago

    Was literally waiting for your video series... Congrats on a great start, Sumit sir.

  • @siddheshkankal7567
    @siddheshkankal7567 1 year ago

    Thank you so much for the great, detailed explanation. Could you discuss more of what interviewers often ask: how large a dataset you have worked on, what cluster configuration you would choose for it and how you decide, what an optimized solution would look like, and what kind of data and sizes are involved? Also looking forward to an upcoming video on Spark optimization techniques.

  • @sonurohini6764
    @sonurohini6764 5 months ago

    Good explanation, sir. Please make a video on more possible scenario-based questions like this.

  • @bharanidharanm2653
    @bharanidharanm2653 2 months ago

    The 3rd scenario is not clear. Are we updating any configuration setting to avoid the small files problem?

  • @arpittapwal4651
    @arpittapwal4651 1 year ago

    Great explanation as always. Thank you, Sumit sir, waiting for many more such videos in the future 😊

  • @user-pp4pu8kp7v
    @user-pp4pu8kp7v 5 months ago

    Please continue the series

  • @AbhishekVerma-hx8bq
    @AbhishekVerma-hx8bq 1 year ago

    Excellent explanation, highly informative!

  • @RohanKumar-mh3pt
    @RohanKumar-mh3pt 1 year ago

    Very insightful. Please cover more Spark internals scenario-based questions.

  • @Momlifeindia
    @Momlifeindia 1 year ago

    Well explained as always. I was asked the same question in one of the interviews.

  • @deepakpatil4419
    @deepakpatil4419 4 months ago

    Hi Sir, thank you for the explanation.
    I have a situation: I am executing a Databricks pipeline through Airflow. In one of the tasks, I write the data from a DataFrame to a path (as Parquet files). The write operation is supposed to create the path on a daily basis and write the data into it. The path is being created, but when I check the count after writing, it shows zero. It doesn't give any error either, so it's really difficult to identify the issue.
    However, when I reprocess the same task, it writes the data.

  • @eyecaptur
    @eyecaptur 1 year ago

    Great explanation sir as always

  • @25683687
    @25683687 1 year ago

    Really very well explained!

  • @virajjadhav6579
    @virajjadhav6579 1 year ago

    Thank you Sir, the start of the series is great. Do we have to explain each answer with scenarios?

  • @anandattagasam7037
    @anandattagasam7037 1 year ago +1

    Hi sir, I wanted to confirm: are you saying the number of partitions depends on the CPU cores? Like you said, for 1 GB of data there would be 8 partitions, and then due to the parallelism it would be 4 partitions, correct? Please correct me if I am wrong.
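
    A small sketch (with a hypothetical path) for checking how the available cores feed into the initial partition count:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parallelism-check").getOrCreate()

    # Total cores available to the application; this default parallelism is one
    # of the inputs Spark uses when deciding how many partitions to create for
    # a file-based DataFrame read.
    print(spark.sparkContext.defaultParallelism)

    # Compare with what a ~1 GB file actually produces (hypothetical path).
    df = spark.read.csv("/data/one_gb_file.csv", header=True)
    print(df.rdd.getNumPartitions())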

  • @Ronak-Data-Engineer
    @Ronak-Data-Engineer 1 year ago

    Very well explained

  • @sufiyaanbhura6343
    @sufiyaanbhura6343 1 year ago

    Thank you sir!

  • @suvabratagiri9978
    @suvabratagiri9978 4 months ago

    Where is the next part?

  • @ritumdutta2438
    @ritumdutta2438 1 year ago

    A very interesting start to an exciting series :) ... appreciate all your effort. Just wanted to confirm one thing: in the case of RDDs, the partition size is always 128 MB, right (what you explained applies to DataFrames/higher-level APIs)?

    • @sumitmittal07
      @sumitmittal07 1 year ago +1

      That's correct. In the case of an RDD it depends on the block size of the underlying filesystem; in the case of HDFS it will be 128 MB.
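
      A minimal sketch contrasting the RDD and DataFrame readers under these assumptions; the HDFS path is hypothetical and a default 128 MB block size is assumed:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("rdd-vs-df-partitions").getOrCreate()
      sc = spark.sparkContext

      # RDD reads follow the input splits of the underlying filesystem, so on a
      # default HDFS setup a ~1 GB text file gives roughly 8 partitions of ~128 MB.
      rdd = sc.textFile("hdfs:///data/one_gb_file.txt")
      print(rdd.getNumPartitions())

      # The DataFrame reader instead uses spark.sql.files.maxPartitionBytes,
      # openCostInBytes and the default parallelism, so its count can differ.
      df = spark.read.text("hdfs:///data/one_gb_file.txt")
      print(df.rdd.getNumPartitions())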

    • @localmartian9047
      @localmartian9047 5 months ago

      @@sumitmittal07 And in the case of an object store/S3, will it be the default parallelism or the number of splits in the source file from S3?

  • @vusalbabashov8242
    @vusalbabashov8242 11 months ago

    In the example I have, df.rdd.getNumPartitions() returns 200, which seems to be the default. I have 160 cores available in the cluster. How should we understand this in light of what you say in the video? I feel like this part is missing. Also, when should we use spark.conf.set("spark.sql.shuffle.partitions", "auto")?

    • @rohitshingare5352
      @rohitshingare5352 7 months ago

      In the context of the video he only explained the initial stage of partitioning; in your case the data gets shuffled, which is why it created 200 partitions by default.
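
      A small sketch illustrating this reply; the AQE settings are standard Spark options, while treating the "auto" value as Databricks-specific is my understanding rather than something stated in the video:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("shuffle-partitions").getOrCreate()

      # Any wide transformation (groupBy, join, ...) repartitions the data into
      # spark.sql.shuffle.partitions pieces -- 200 by default, regardless of
      # how many cores the cluster has.
      print(spark.conf.get("spark.sql.shuffle.partitions"))

      # With Adaptive Query Execution, Spark can coalesce those shuffle
      # partitions at runtime based on the actual data size.
      spark.conf.set("spark.sql.adaptive.enabled", "true")
      spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

      # Setting spark.sql.shuffle.partitions to "auto" is, as far as I know, a
      # Databricks-specific option; on open-source Spark rely on the AQE
      # settings above instead.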

    • @dipeshchaudhary2188
      @dipeshchaudhary2188 3 months ago

      As per my understanding, 160 tasks will be performed in parallel and the remaining 40 tasks will wait in the queue. Those 40 tasks will run when any of the 160 cores become available again.

  • @sameersam4476
    @sameersam4476 1 year ago

    Sir, I have watched your complete SQL playlist. Can I face the SQL interview now?