Wonderful explanation... in only 20 minutes... God bless you, Simon!
I got a lot of information from this video, which helped me pass the certification today. Thank you!
Wahey! Congrats on passing!
Is it $200 for one attempt?
@@sundarkris1320 Correct, as of today. If you do not pass the exam, you will have to pay $200 again to retake it.
Can you help me pass this exam?
Hi Siva.
Could you please help clarify the validity period of this certification? When I look at public badges issued to people, they show an expiration date two years from the issue date, yet a few YouTube videos say it never expires and is instead tied to a specific version of Spark.
Could you please help with this? I can't seem to find clarification anywhere.
Thank you so much for the book recommendation. I would also highly recommend the same book, along with making your own notes from it. It took me three weeks of preparation to pass the exam.
Thank you so much 🙏🏻
Thanks a lot for this video. I am taking the exam on Wednesday. Keep your fingers crossed for me! :)
I nailed it. If you follow this advice, you will surely pass.
Thanks so much for this video! I read through "The Definitive Guide" and felt OK, but not super confident. I watched this (and some of your other videos) in the week leading up to the exam, and I just passed!
Woohoo! Congrats on passing - glad our videos helped!
@@AdvancingAnalytics
Could you please help clarify the validity period of this certification? Public badges show an expiration date two years from the issue date, yet a few YouTube videos say it never expires and is instead tied to a specific version of Spark.
Could you please help with this? I can't seem to find clarification anywhere.
Thanks! Great video! I loved the Spark architecture explanation (4:19).
Learning Spark with David Guetta. Tomorrow is my assessment, I hope I pass 🍀
Passed!
Best explanation I've ever seen
Superb explanation with so much clarity. I haven't seen anything like this in any other tutorial.
Thanks for posting it. We need more from you 👌👌👏👏
I will recommend this channel to my whole office team.
The exam does not require an external webcam when taken on a laptop. This video gave me some good points for exam day. Appreciate the work being done here👍🏻
Ah cool - it was stated in the instructions when I originally took it; guess they've relaxed as the world has gone more remote :)
This video helped me a lot to take the exam, thank youuu!!!
Good morning!
Could you explain in more detail how you decide on the ideal number of shuffle partitions?
Awesome pictorial explanation of the physical architecture. The explanation of slots and how they relate to tasks was super enlightening. Thank you very much!!! :)
Nice simple explanation to help map out my certification journey. Thanks!
This is great. Thank you so much for posting such helpful information!
Too good! Now I have enough confidence to take the exam. Thank you.
How can I access the notebook shown in the demo?
Thanks, I didn't even notice that there is a PDF of the Spark docs to use in the exam!
Dewei Zhai, Databricks also recently published the actual PDF version of the Spark docs you see in the exam here: www.webassessor.com/zz/DATABRICKS/Python_v2.html
It helped me a lot with preparing for the Spark 3.0 certification, thanks!
Any tips on practice material besides the Definitive Guide and the official docs?
Any inputs on resources to help prepare for the Databricks Professional Data Engineer certification? Genuinely appreciate the input!!
Thanks for the informative video. I am preparing for the Spark Scala certification and felt the Python API docs are much better than the Scala ones, as they have a lot more information and examples.
Excellent
Thank you man :D
I've been looking for an explanation of slots for a long time. Please save me!
So if I do a join, then a filter, then a group, is that one job where I have to shuffle?
Hey! You will have one Spark job, but that job will have multiple stages. Each new stage boundary means there is a shuffle.
So a join/filter/group chain could have two shuffles: one if the join is wide and one if the group is wide. You would have one job and three stages in this case. Hope that makes sense!
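To make that concrete, here's a minimal PySpark sketch (the DataFrames and column names are just made up for illustration). Running explain() shows the Exchange operators in the plan - each Exchange is a shuffle, i.e. a stage boundary:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Two tiny made-up DataFrames, just to show the plan shape
orders = spark.createDataFrame([(1, 10.0), (2, 5.0)], ["cust_id", "amount"])
customers = spark.createDataFrame([(1, "UK"), (2, "US")], ["cust_id", "country"])

result = (orders
          .join(customers, "cust_id")        # wide if both sides are large (tiny tables may be broadcast instead)
          .filter(F.col("amount") > 1.0)     # narrow: no shuffle
          .groupBy("country")                # wide: shuffles by the grouping key
          .agg(F.sum("amount").alias("total")))

result.explain()   # each Exchange operator in the plan is a shuffle / stage boundary
result.show()      # the action that actually triggers the job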
Please share the exam code number for Spark 3.0.
I don't believe it has a code - it's a certification backed by Databricks, not Microsoft. I had a skim of the website, my purchase, the exam certificate etc. and they all refer to it as "Databricks Certified Associate Developer for Apache Spark 3.0" - no code to be found!
academy.databricks.com/exam/databricks-certified-associate-developer
Excellent! How about the low-level APIs? RDDs? Are there questions about those? Thank you.
Can't go into actual questions, but the exam is focused on the DataFrame API, so there's no driver for low-level API commands. Understanding how Spark stores data as RDDs & how different DataFrame transformations affect the underlying RDDs should put you in the right place!
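For anyone who wants to see that relationship, a rough sketch (purely illustrative, not exam material) - a DataFrame exposes its underlying RDD, so you can watch how a wide transformation changes the partitioning behind the scenes:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(0, 1_000_000)            # a DataFrame, backed by an RDD underneath
print(df.rdd.getNumPartitions())          # partitioning of the underlying RDD

wide = df.groupBy((df.id % 10).alias("bucket")).count()
print(wide.rdd.getNumPartitions())        # after a wide transformation the partition count follows the shuffle settings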
@@AdvancingAnalytics thank you for your time and answer :)
@@AdvancingAnalytics I want to share with you that I passed the exam!! =D Thank you for all your videos about Databricks - they helped me a lot to complete my learning!
@@adrianajimenez523 woohoo! That's great to hear, congratulations! Glad the videos helped :)
Simon
Hi guys, can you please let me know if there were questions on Delta Lake? I will be taking the exam in less than two weeks. I was planning to take the 2.4 exam first and then 3.0; the only difference between them, syllabus-wise, is Delta Lake.
Is it suitable for a fresher who doesn't know anything about Spark, or do we need prior experience before taking the exam?
Can you use Ctrl+F or some other search functionality on the PDF provided?
Not at the time - had to get really good at scrolling :D - that said, the PySpark docs have changed quite a bit since this video, so not sure if the exam format has been updated to keep up!
This was a fantastic video - thank you so much for sharing this content! Subscribed!
Thanks for subscribing. I am glad it helped.
Hello, do you know about this other certification?
Databricks Certified Professional Data Engineer
Hi!
Great content on your channel.
I was wondering if you could make a certificate comparison of the Associate Developer and the Associate Data Engineer (not the professional DE) in terms of what materials one should add to prepare for the Associate DE exam.
Cheers!
Edit: Would be nice to see your thoughts on the Professional DE cert as well :)
Good suggestion - I've not dug into the various new certifications since making this video, so it's probably worth revisiting now there's such a range out there. I should also probably run through the Professional Data Engineer cert at some point too! :D
Simon
@@AdvancingAnalytics That would be great. Your material is always very helpful!
This applies only to the Developer Associate, correct? Could you please share details for the Developer Professional?
Ooh - I hadn't even seen that the "Certified Professional Data Engineer" cert was introduced! I haven't taken the exam; if/when I do, I'll make a video!
Simon
Is the exam available only online, or are there test centres where we can take it?
Can an executor span across multiple worker nodes?
Let's say during spark-submit I asked for 4 executors and 4 cores, and the cluster has 8 nodes (2 cores each) - would the "logical" executor theoretically be spread across nodes, or would each executor be granted only 2 cores?
I don't believe an executor can span across machines/nodes. Lots of managed Spark platforms assume a single executor per node, as there's not much benefit in splitting a node across multiple executors.
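As a rough illustration (assuming a cluster manager such as YARN or Kubernetes, where these standard configs apply), you size executors to fit a single node - so on 2-core nodes each executor can get at most 2 cores:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("executor-sizing-sketch")
         .config("spark.executor.instances", "4")   # number of executors requested
         .config("spark.executor.cores", "2")       # cores per executor - capped by what a single node offers
         .getOrCreate())

print(spark.sparkContext.defaultParallelism)        # roughly executors x cores = total slots available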
Hey, great content! Quick question: Did you have any questions on Spark MLlib that required understanding of the actual algorithms or.. at all? Thanks for the info!
Nope, there's no requirement for knowing the data science libraries, pure spark engineering!
Could you please suggest how, or where, to practice the format of this test, so I can be prepared for managing my time?
Hola! I've not seen any practice tests, although there may be some around! As for actual practice/preparation - Databricks have a free community edition, it's a single-node public cluster, but great for practicing:
databricks.com/try-databricks
Is browsing documentation allowed? They provide a PDF, so I am wondering whether you're allowed to search that same document in a browser... Thanks for this video, lots of information!
I don't recall there being a search mechanism - everything is embedded in the testing program. Better to just be familiar with the docs and good at scrolling! :)
@@AdvancingAnalytics Hi! Just a quick question... How is the PDF version of the documentation organized? Is it divided into modules, with each module listing its classes, methods and attributes... or...? Any tips? :)
@@veraclmartins Did you get an answer?
What if I know Spark with Scala and not PySpark? Does the exam take this into account?
Yep, there are two different flavours of the exam, one for Scala and one for PySpark. From what we've heard, the Scala one is slightly harder as the documentation is a little harder to navigate, but if you're familiar with the Scala docs it'll be fine!
Simon
@@AdvancingAnalytics thanks much Simon
Thank you for making this video. I have 2 questions:
1) Will there be questions with more than one correct option?
2) Is there negative marking for incorrect answers?
I honestly cannot recall if there are questions with multiple correct answers - hopefully someone else can help!
There are no negative marks for incorrect answers.
Hello Sanjeev. There is only one correct answer per question.
@@headindata Thank you sir for responding :)
Hi, great content. It gives a good idea of the difficulty level of the exam. Does the exam contain questions on streaming?
No questions on streaming
There are different levels of the exam.
Can you clarify whether a single task can run on multiple slots? Or is it that every task should be granular enough to run on a single slot?
Hey - a single task can only run on one slot. That means a task cannot spread across multiple workers (which makes sense, as it's data held in memory).
So the size of your RDD blocks / tasks affects how neatly you can utilise the available slots across your workers. Too chunky and they don't spread evenly; too small and there's an overhead to accessing each task and things slow down. It's a tricky balance :)
Simon
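A small sketch of how you might check that balance in practice (the numbers are just illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

slots = sc.defaultParallelism            # approximate number of slots (cores) available
df = spark.range(0, 10_000_000)
print(df.rdd.getNumPartitions())         # partitions = tasks in the next stage

# Rule of thumb: keep the task count a clean multiple of the slots,
# so each wave of tasks keeps every slot busy without tiny stragglers.
df = df.repartition(slots * 4)
print(df.rdd.getNumPartitions())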
Hi, which course do I need to select to get the Databricks Spark 3.0 certificate?
Hey - there's a specific "Associate Developer For Apache Spark 3.0" course - academy.databricks.com/exam/databricks-certified-associate-developer
Any suggestions on how to practice? Understanding the concepts is one thing, but until you have practiced on some sample questions or problem statements, it's a bit tough to build the confidence to sit the exam.
Hey, sorry - missed this during the break. The best way to practice is to spin up the Databricks community edition - it's a free learning environment! The Databricks docs have a ton of example notebooks that you can import & work through the code with.
After that, pick up a personal project & work it through in anger. I'm definitely a "don't learn it till I try it out myself" person!
Simon
Hi Simon,
Are you aware of any full-length practice exams for the Databricks certification? I would like to take one of those mock exams before diving in.
Thanks
cool, let's get it done.
Good luck!
This is a great video! I have a question: since this exam only tests the DataFrame API, should we go through all the PySpark functions, or are just the DataFrame and SQL functions required? Thanks!! Expecting more videos like this from you. :)
Thanks for this video. I have a question:
Which is the best certification for Spark? Which would you recommend, and why?
Hey - the only Spark cert I'm really aware of is the Databricks Certified Associate Developer one. You've got a choice of Scala & Python, but it's generally a good overview of the tool and digs into understanding of the engine/architecture etc. - academy.databricks.com/exam/databricks-certified-associate-developer
@@AdvancingAnalytics thank you
If there are 8 cores available in total across the worker nodes and the Spark default of 200 shuffle partitions is used, what happens?
How does 200 make sense when only 8 slots are available? Please explain. Thanks.
The 200 tasks are allocated across the workers, and the slots will chunk through the tasks (so each of the 8 slots will likely process 25 tasks). So as a rule of thumb, you generally want the shuffle partition count to be a clean multiple of the number of cores.
But yeah, it's likely that the 200 default isn't right for a cluster of that size. The modern Spark engine (Spark 3.0 / Databricks Runtime 7+) uses a few techniques to override the default during query execution and actually pick an appropriate number of shuffle partitions :)
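For reference, a minimal sketch of the relevant settings (the output path is just a placeholder) - you can either pin spark.sql.shuffle.partitions to match your core count, or let Adaptive Query Execution coalesce the shuffle partitions at runtime:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.shuffle.partitions", "8")                       # static default, matched to the 8 cores
         .config("spark.sql.adaptive.enabled", "true")                      # Spark 3.0+ AQE re-plans at runtime
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")   # merges small shuffle partitions
         .getOrCreate())

df = spark.range(0, 1_000_000)
counts = df.groupBy((df.id % 100).alias("bucket")).count()
counts.write.mode("overwrite").parquet("/tmp/shuffle_demo")   # run an action, then check the Spark UI for the actual task count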
Is the laptop internal camera acceptable for this exam?
They list an external camera as a requirement - they ask you to set it up so they can see you from the side, including your screen. Internal laptop cam might disqualify you, not sure!
As far as I know the internal camera is acceptable.
How much time is needed to prepare for this certification?
Depends on your level of spark experience! If you've been using spark most days for a year or so, you'll get by with a day or two of refreshing & cramming. If you're new to spark, it could take a couple of weeks of research, learning & revising. It's very hard to say!
I would argue that some of the architecture questions in the exam are quite tricky, even if you have been working with Spark for a while. So, differing from Simon, I would say that you need at least a week of review, even if you have been using Spark for a while.
First
When you lean back, the audio quality gets worse.