DP-203: 43 - Azure Synapse Dedicated SQL Pool - Architecture and overview

  • Published 1 Dec 2024

COMMENTS • 36

  • @heljava
    @heljava 21 days ago +1

    Many thanks Piotr. Explanations are super clear & straight to the point!

  • @TheMapleSight
    @TheMapleSight 4 months ago +3

    I really like your style of teaching. You first present us with a problem like "Not every consumer is able to connect to ADLSg2 and read Delta format" and then you explain how to solve it using the tools and features that are required on the exam. It's phenomenal!

  • @swathi8273
    @swathi8273 3 months ago +2

    Great explanation! I appreciate how you reference every topic to relate with production. Thank you very much

  •  A month ago

    Really good videos and explanations... they couldn't be clearer!! Thanks for your efforts

  • @prabhuraghupathi9131
    @prabhuraghupathi9131 5 months ago +1

    Thanks Piotr for this great video on dedicated SQL pool and its distribution methods! As a Data Engineer, I have subscribed to your channel to learn more about data engineering and enhance my knowledge!

  • @jacklarrytairo5562
    @jacklarrytairo5562 5 months ago +1

    Hello Piotr, thank you for the excellent explanation. I am preparing for DP-203, I took the Microsoft test exams and I saw several questions about partitions and sharding. I am looking forward to that chapter.

    • @TybulOnAzure
      @TybulOnAzure 5 months ago

      Thanks Jack. Episode about partitioning in Synapse Dedicated SQL Pool will be recorded quite soon.

    • @LongshengZhao
      @LongshengZhao 4 months ago

      @TybulOnAzure Hello Piotr, I would like some help with this question on partitioning if possible: You have an Azure Synapse Analytics dedicated SQL pool. You plan to create a fact table named Table1 that will contain a clustered columnstore index. You need to optimize data compression and query performance for Table1. What is the minimum number of rows that Table1 should contain before you create partitions? A. 100,000 B. 600,000 C. 1 million D. 60 million. Most people (including me) go with D, but I also saw many people choosing C; moreover, on many websites the editor's answer is even A. I'd appreciate it if you could provide some insight on this, as my exam is approaching soon!

    • @TybulOnAzure
      @TybulOnAzure 4 months ago

      @LongshengZhao Due to the Candidate Agreement (learn.microsoft.com/en-us/credentials/support/certification-exam-candidate-agreement) I'm not discussing any exam questions.
      As for partitioning - today I'm recording an episode about it and it will be available early next week for "Data Engineer" members of my channel. Remaining viewers will be able to watch it in two weeks.

  • @SAJO91
    @SAJO91 20 days ago

    38:53 that "better for you to get the correct answer" look xD

  • @dmitryzvorikin
    @dmitryzvorikin 2 months ago

    Wow, that's really similar to Teradata, but publicly accessible!
    In Teradata, you can have more nodes, but distribution methods are similar. Guess there is also an EXPLAIN statement which tells the SQL pool to describe how it is going to run the query, all these CCS and shuffles, based on internal statistics.
    And a query log with metadata for every executed query, which can be used to calculate compute and storage skews.
    Do you have hints like "do the hash join, I insist" here?
    Can you force the db to perform statistics recalc?

    • @TybulOnAzure
      @TybulOnAzure 2 months ago

      Thanks for mentioning Teradata - I've never used it and I didn't have a clue that it is so similar.
      And yes, there are query hints in dedicated SQL pool (they are not supported in serverless pool, though). You can also update statistics: learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics

  • @mdpdurawix1834
    @mdpdurawix1834 22 days ago

    Hi Piotr,
    Regarding Hash, should we avoid using columns which contain a lot of duplicates? For example, an "is latest" column with True/False values?

    • @TybulOnAzure
      @TybulOnAzure 22 days ago +1

      Yes, columns like True/False or male/female are pretty bad candidates for hash function.
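
A minimal sketch of why a True/False column makes a poor hash key. It assumes MD5 as a stand-in for the pool's internal hash function (which is not the real one), and the `is_latest` and `customer_id` columns and row counts are hypothetical. The only fixed fact used is that a dedicated SQL pool spreads a table over 60 distributions.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool spreads every table over 60 distributions.
DISTRIBUTIONS = 60


def distribution_for(value) -> int:
    """Pick a distribution using MD5 as a stand-in hash (not the pool's real function)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def rows_per_distribution(values) -> Counter:
    """Count how many rows land on each distribution for a given key column."""
    return Counter(distribution_for(v) for v in values)


ROWS = 120_000
# Hypothetical columns: a flag that is True for 10% of rows, and 20,000 distinct customers.
is_latest = [i % 10 == 0 for i in range(ROWS)]
customer_id = [i % 20_000 for i in range(ROWS)]

flag_layout = rows_per_distribution(is_latest)
id_layout = rows_per_distribution(customer_id)

# Two distinct values can reach at most two distributions; the rest stay empty.
print("distributions used by is_latest:", len(flag_layout))
print("largest distribution by is_latest:", max(flag_layout.values()))
# Many distinct values spread the rows far more evenly.
print("distributions used by customer_id:", len(id_layout))
print("largest distribution by customer_id:", max(id_layout.values()))
```

With only two distinct values, all rows pile up on at most two distributions, so most of the compute sits idle while one or two distributions do all the work.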

  • @swathi8273
    @swathi8273 3 months ago

    Does Z-ORDER in Delta Lake do a similar distribution to HASH (except for applying a hash function)?

    • @TybulOnAzure
      @TybulOnAzure 3 months ago +1

      I might cover Z-ORDER in another episode after I finish the DP-203 series.

  • @SAJO91
    @SAJO91 19 days ago

    Do you think we will be forced into data movement somehow?
    For example, take an orders table that holds our customers' orders. Suppose we use the customer ID column for hash distribution (to distribute our data evenly). If we run a T-SQL query that groups or filters by product category or date, it will need to retrieve data from other nodes. Or am I wrong?

    • @TybulOnAzure
      @TybulOnAzure 19 days ago +1

      Yes, that could happen. The reality is that your data distribution method won't be able to satisfy every possible query, so it's best to focus on optimizing the most important and frequent ones.
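
The scenario in this exchange can be sketched as a small simulation. It is a rough model under stated assumptions, not how the engine is implemented: MD5 stands in for the pool's hash function, and the `orders` rows, column names, and categories are hypothetical. It counts how many of the 60 distributions hold rows for each group.

```python
import hashlib
from collections import defaultdict

# A dedicated SQL pool spreads every table over 60 distributions.
DISTRIBUTIONS = 60


def distribution_for(value) -> int:
    """Pick a distribution using MD5 as a stand-in hash (not the pool's real function)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


# Hypothetical orders table: 1,000 customers, each ordering from every category.
CATEGORIES = ["books", "games", "music", "tools"]
orders = [
    {"customer_id": i % 1_000, "category": CATEGORIES[(i // 1_000) % len(CATEGORIES)]}
    for i in range(20_000)
]


def distributions_per_group(rows, group_key, distribution_key="customer_id"):
    """For each group value, count the distributions that hold at least one of its rows."""
    placement = defaultdict(set)
    for row in rows:
        placement[row[group_key]].add(distribution_for(row[distribution_key]))
    return {group: len(dists) for group, dists in placement.items()}


by_customer = distributions_per_group(orders, "customer_id")
by_category = distributions_per_group(orders, "category")

# Grouping by the distribution column: every group sits on exactly one distribution.
print("max distributions per customer:", max(by_customer.values()))
# Grouping by another column: each group is scattered over many distributions.
print("min distributions per category:", min(by_category.values()))
```

Grouping by `customer_id` can be answered locally on each distribution, while grouping by `category` needs rows or partial results from many distributions to be brought together, which is the data movement described above.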

  • @ryleyalexander8097
    @ryleyalexander8097 4 months ago

    Hi Piotr,
    Loving this course. I'm currently on episode 06 and it's making things much easier to understand than I thought they would be.
    I'm preparing to take my DP-203 exam.
    I've already passed the AZ-900. I'm currently a business intelligence analyst aiming to move up to data engineer, and I've been working with Oracle SQL Developer on ETLs for regulatory reporting return creation.
    I was wondering: if I follow all the episodes in your playlist, will I have gained enough knowledge to take the exam and pass, or should I learn more content that you haven't made videos on, on a site like Udemy?
    Which site did you use to go over the course and exam prep questions?
    Thank you😁

    • @TybulOnAzure
      @TybulOnAzure 4 months ago +1

      Hi, based on the feedback I received from other students - yes, it is possible to pass the exam based on my playlist. However, I strongly recommend practicing the stuff I'm talking about and visiting the DP-203 page on MS Learn: learn.microsoft.com/en-us/credentials/certifications/azure-data-engineer/?practice-assessment-type=certification

    • @ryleyalexander8097
      @ryleyalexander8097 4 months ago

      @TybulOnAzure Thank you so much for the response Piotr! Feels like a response from a celebrity :D haha, I'm joking. I will continue studying with your content and let you know how the exam goes! :D I will also join the membership because your explanations are the best on YouTube! Thank you, sir

  • @TheMapleSight
    @TheMapleSight 4 months ago

    I'm just wondering... It seems to me that you said in one of the episodes that there are problems with Azure Synapse Analytics and Microsoft will not necessarily support it. Is it still worth learning or maybe concentrate on Fabric for example?

    • @TybulOnAzure
      @TybulOnAzure 4 months ago

      Microsoft is supporting it and will support it as many customers built their solutions using Synapse Analytics. On the other hand, we should rather not expect many new features added to it.
      If I were you, I would focus on Fabric (unless you have an existing project where you use Synapse Analytics).

    • @TheMapleSight
      @TheMapleSight 4 months ago

      @TybulOnAzure But I guess most of the concepts (if not all of them) that you talk about here are useful in a Data Engineering workflow and, by proxy, in Fabric, so it's still worth every minute of my time to watch this series

  • @mission_possible
    @mission_possible 4 months ago

    Thank you man

  • @TheMapleSight
    @TheMapleSight 4 months ago

    38:53 This look gave me chills XDDDDD

  • @pst659
    @pst659 4 months ago

    thanks