Many thanks Piotr. Explanations are super clear & straight to the point!
Thanks!
I really like your style of teaching. You first present us with a problem like "Not every consumer is able to connect to ADLSg2 and read Delta format", and then you explain how to solve it using the tools and features that are required for the exam. It's phenomenal!
Glad you liked it!
Great explanation! I appreciate how you relate every topic to production. Thank you very much
My pleasure!
Really good videos and explanations... they couldn't be any clearer!! Thanks for your efforts
Glad you like them!
Thanks Piotr for this great video on dedicated SQL pool and its distribution methods!! I've subscribed to your channel as a Data Engineer to learn more about data engineering and enhance my knowledge!!
Great! Welcome aboard!
Hello Piotr, thank you for the excellent explanation. I am preparing for DP-203, I took the Microsoft practice exams, and I saw several questions about partitions and sharding. I am looking forward to that chapter.
Thanks Jack. Episode about partitioning in Synapse Dedicated SQL Pool will be recorded quite soon.
@@TybulOnAzure Hello Piotr, I would like some help with this question on partitioning, if possible: "You have an Azure Synapse Analytics dedicated SQL pool. You plan to create a fact table named Table1 that will contain a clustered columnstore index. You need to optimize data compression and query performance for Table1. What is the minimum number of rows that Table1 should contain before you create partitions? A. 100,000 B. 600,000 C. 1 million D. 60 million." Most people (including me) go with D, but I also saw many people choosing C, and on many websites the editor's answer is even A. I'd appreciate it if you could provide some insight on this, as my exam is approaching soon!
@@LongshengZhao Due to Candidate Agreement (learn.microsoft.com/en-us/credentials/support/certification-exam-candidate-agreement) I'm not discussing any exam questions.
As for partitioning - today I'm recording an episode about it and it will be available early next week for "Data Engineer" members of my channel. Remaining viewers will be able to watch it in two weeks.
38:53 that "better for you to get the correct answer" look xD
Did you know the answer? ;)
Wow, that's really similar to Teradata, but publicly accessible!
In Teradata, you can have more nodes, but the distribution methods are similar. I guess there is also an EXPLAIN statement that makes the SQL pool describe how it is going to run the query (all these CCS and shuffles), based on internal statistics?
And a query log with metadata for every query run, which can be used to calculate compute and storage skews?
Do you have hints like "do the hash join, I insist" here?
Can you force the db to perform statistics recalc?
Thanks for mentioning Teradata - I've never used it and I didn't have a clue that it is so similar.
And yes, there are query hints in dedicated SQL pool (they are not supported in serverless pool, though). You can also update statistics: learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics
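The question above mentions calculating storage skew. As a rough sketch, assuming you already have row counts per distribution (in dedicated SQL pool, `DBCC PDW_SHOWSPACEUSED` reports rows per distribution for a table), one common way to express skew is how far the largest distribution sits above the average. The `skew_percent` helper and the sample counts below are hypothetical, and other skew formulas are also in use.

```python
def skew_percent(rows_per_distribution: list[int]) -> float:
    """Skew as (max - avg) / max * 100; 0 means a perfectly even spread.

    This is one common formulation (an assumption), not an engine-defined metric.
    """
    if not rows_per_distribution:
        raise ValueError("need at least one distribution")
    largest = max(rows_per_distribution)
    if largest == 0:
        return 0.0  # empty table: nothing to be skewed
    average = sum(rows_per_distribution) / len(rows_per_distribution)
    return (largest - average) / largest * 100


# Hypothetical row counts for a table spread over 60 distributions
even = [1_000] * 60
skewed = [1_000] * 59 + [60_000]  # one "hot" distribution

print(f"even:   {skew_percent(even):.1f}%")
print(f"skewed: {skew_percent(skewed):.1f}%")
```

A value close to 0 means the work is spread evenly; a high value means one distribution (and the compute node that owns it) does most of the work while the others wait.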
Hi Piotr,
Regarding hash distribution, should we avoid columns that contain a lot of duplicates? For example, an "is latest" True/False flag?
Yes, columns like True/False or male/female are pretty bad candidates for a hash distribution column.
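To see why, here is a toy simulation assuming a stand-in hash (MD5 modulo 60). Synapse uses its own internal hash function, so the bucket numbers here are not what the engine would produce, but the effect is the same: every row with the same value lands in the same distribution, so a True/False column can fill at most 2 of the 60 distributions, while a high-cardinality ID spreads across all of them. The `is_latest` flag, the IDs and the row counts are hypothetical.

```python
import hashlib
from collections import Counter
from collections.abc import Iterable

DISTRIBUTIONS = 60  # hash-distributed tables in dedicated SQL pool use 60 distributions


def distribution_for(value: object) -> int:
    # Stand-in hash: MD5 of the value's text form (NOT Synapse's internal function).
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def rows_per_distribution(values: Iterable[object]) -> Counter:
    # Count how many rows each distribution would receive.
    return Counter(distribution_for(v) for v in values)


ROWS = 120_000
flags = [i % 2 == 0 for i in range(ROWS)]  # hypothetical is_latest column
ids = range(ROWS)                          # hypothetical high-cardinality key

print("flag column:", len(rows_per_distribution(flags)), "non-empty distributions")
print("id column:  ", len(rows_per_distribution(ids)), "non-empty distributions")
```

With the flag column, at least 58 distributions stay empty, so most of the pool sits idle and the one or two busy distributions hold all the data.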
Does Z-ORDER in Delta Lake do a similar distribution to HASH (apart from applying a hash function)?
I might cover Z-ORDER in another episode after I finish the DP-203 series.
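As a rough illustration of how the two differ: the general idea behind Z-ordering is a Morton code, which interleaves the bits of several columns so that rows close together in all of those columns end up close together in sort order. A hash does the opposite and deliberately scatters neighbouring values. Delta Lake's actual `OPTIMIZE ... ZORDER BY` implementation is more involved; this sketch only shows the bit-interleaving idea, and `morton_code` is a hypothetical helper.

```python
def morton_code(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative ints: x in even positions, y in odd."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # i-th bit of x -> position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)  # i-th bit of y -> position 2i + 1
    return code


# Sort a 4x4 grid of (x, y) values by their Z-order key
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: morton_code(*p))
print(z_sorted)
```

Each run of four consecutive items in `z_sorted` is a 2x2 block of the grid, so a filter on a small range of both columns touches a contiguous stretch of the sorted data. That locality is what lets file-level min/max statistics skip files, and it is exactly what hashing throws away.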
Do you think we will somehow be forced into data movement?
For example, take an orders table that holds our customers' orders, and suppose we use the customer ID column for hash distribution (to distribute our data evenly). If we run a T-SQL query that groups or filters by product category or date, it will need to retrieve data from other nodes. Or am I wrong?
Yes, that could happen. The reality is that your data distribution method won't be able to satisfy every possible query, so it's best to focus on optimizing the most important and frequent ones.
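A toy model of the orders example from the question, assuming a simple `customer_id % 60` in place of the engine's real hash and hypothetical order data: grouping by the distribution column can be answered inside each distribution, while grouping by product category needs rows from many distributions, so the engine would typically have to move data (e.g. a shuffle) before the final aggregation.

```python
from collections import defaultdict

DISTRIBUTIONS = 60


def distribution_for(customer_id: int) -> int:
    # Hypothetical stand-in for the engine's hash of the distribution column.
    return customer_id % DISTRIBUTIONS


# Hypothetical orders: (order_id, customer_id, product_category)
categories = ["Books", "Toys", "Games"]
orders = [(i, i % 500, categories[i % 3]) for i in range(3_000)]

customers_where = defaultdict(set)   # customer_id -> distributions holding its rows
categories_where = defaultdict(set)  # category -> distributions holding its rows
for _, customer_id, category in orders:
    d = distribution_for(customer_id)
    customers_where[customer_id].add(d)
    categories_where[category].add(d)

# GROUP BY customer_id: each customer's rows sit in exactly one distribution,
# so every distribution can aggregate its own customers locally.
print("max distributions per customer:", max(len(ds) for ds in customers_where.values()))

# GROUP BY product_category: each category is spread over many distributions,
# so rows must be brought together before the final aggregation.
print({c: len(ds) for c, ds in categories_where.items()})
```

This is why the usual advice is to pick the distribution column that serves the most important joins and aggregations, accepting some data movement for the rest.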
Hi Piotr,
Loving this course! I'm currently on episode 06 and it's helping me understand things much more easily than I thought it would.
I'm preparing to take my DP-203 exam.
I've already passed AZ-900. I'm currently a business intelligence analyst aiming to move up to data engineer, and I've been working with Oracle SQL Developer on ETLs for regulatory reporting return creation.
I was wondering: if I follow all the episodes in your playlist, will I have gained enough knowledge to take the exam and pass, or should I learn more content that you haven't made videos on, on a site like Udemy?
Which site did you use to go over the course and exam prep questions?
Thank you😁
Hi, based on the feedback I received from other students - yes, it is possible to pass the exam based on my playlist. However, I strongly recommend practicing the stuff I'm talking about and visiting the DP-203 page on MS Learn: learn.microsoft.com/en-us/credentials/certifications/azure-data-engineer/?practice-assessment-type=certification
@@TybulOnAzure Thank you so much for the response, Piotr! Feels like a response from a celebrity :D haha, I'm joking. I will continue studying using your content and let you know how the exam goes! :D I will also join the membership because your explanations are the best on YouTube! Thank you, sir
I'm just wondering... It seems to me that you said in one of the episodes that there are problems with Azure Synapse Analytics and Microsoft will not necessarily support it. Is it still worth learning or maybe concentrate on Fabric for example?
Microsoft is supporting it and will continue to support it, as many customers have built their solutions using Synapse Analytics. On the other hand, we shouldn't expect many new features to be added to it.
If I were you, I would focus on Fabric (unless you have an existing project where you use Synapse Analytics).
@@TybulOnAzure But I guess most of the concepts (if not all of them) that you talk about here are useful in data engineering workflows and, by proxy, in Fabric, so it's still worth every minute of my time to watch this series
Thank you man
Any time
38:53 This look gave me chills XDDDDD
:)
thanks
You're welcome!