Azure Synapse Analytics | Data Distribution Strategy and Best Practices

Arshad Ali - Aas Trailblazers

Published Sep 12, 2024

COMMENTS • 46

@orxanbabashov 7 months ago
This is the first time I ever subscribed to a channel as well. Huge thanks!!!!
@VK-ln9vk 1 year ago
I wish there were 100,000 LIKE buttons. THE BEST VIDEO on Azure Synapse distribution. I clearly understood the distributions thanks to the demo. Thank you so much 🙏
@VirtusRex48 1 year ago
One of the best Synapse videos out there; highly recommend!!!
@hoanglieuit 1 year ago
This is the first time I ever subscribed to a channel.
@vinayak6685 2 years ago ⁺¹
Really happy to find this video. Loved the practical demo on how the distributions happened. Subscribed (500th subscriber 😁). Waiting for more such awesome content 🤩
@Zaf567 2 years ago
Have watched many videos related to this but yours is awesome.
@goelnikhils 1 year ago
What hard work in creating this video. Very good content
@husnabanu4370 1 year ago
Wow, such a detailed explanation; all the visuals and query examples make it so easy to understand...
@donanuradha2162 3 years ago ⁺¹
Very well explained how data is distributed in Synapse SQL DW
@ArshadAliAasTrailblazers 3 years ago
Thanks Anuradha, I am happy it was helpful for you!
@vaibhavvaidya1442 3 years ago
Never saw an explanation like this on Azure Synapse. Amazing :)
@ArshadAliAasTrailblazers 2 years ago
Thanks Vaibhav for your kind words, glad it was helpful!
@julianromero3359 1 year ago
Amazing explanation, thanks; the concepts are very clear and practical to understand. I hope to find more content from you. 🤗
@Farisito 11 months ago
Thank you a lot ALI, very useful in my case
@gvgnaidu6526 2 years ago
Amazing explanation and nice representation of all the aspects. Thank you so much Arshad
@SQLTalk 2 years ago
This is a very well done and helpful video. Thank you for making it.
@danielveraec 2 years ago
Thanks for sharing this knowledge. Really helpful!!
@jubershikalgar4205 2 years ago
Thank you very much for this video.
It was very helpful and I learnt a lot about Synapse.
@MohammedKhan-np7dn 3 years ago
Looking forward to the next session
@ArshadAliAasTrailblazers 2 years ago
Thanks Mohammed, I just posted a video on CI/CD and am planning to post a few more in the next couple of weeks.
@peaceneeded 2 years ago
Simply amazing explanation!
@MohammedKhan-np7dn 3 years ago
Thank you for explaining the concepts in detail.
@ArshadAliAasTrailblazers 2 years ago
You are welcome!
@MohammedKhan-np7dn 3 years ago
Very good session for understanding the concepts in Synapse Analytics
@ArshadAliAasTrailblazers 2 years ago
Thanks Mohammed for your kind words, glad it was helpful!
@abc_987 24 days ago
JUST GOLD
@vivekvishal2500 2 years ago ⁺¹
Great Sir 👌
@kuldeepgawande9550 3 years ago
Excellent explanation. Thank you.
@ArshadAliAasTrailblazers 2 years ago
You are welcome!
@upendarjakkula2561 2 years ago
Extraordinary 👌
@user-yj9rv7us4x 1 year ago ⁺¹
👍🏻👍🏻👍🏻
@shuaibpantnagar 2 years ago
Azure Synapse is explained very nicely, especially the SQL pool. I have a question here. Both Synapse and Azure Databricks have a Spark engine. How would I choose between them for my project work?
@amittyagi9171 2 years ago
Thank you so much. You are amazing.
@TiffanyMorris123 3 years ago ⁺¹
Thanks for this video! Question: you touched quickly on creating statistics in Synapse prior to running queries, based on the query patterns. In my case I have a large group of users, from admins to analysts to developers, and I cannot predict the types of queries they will run. Is there a best practice that I can pass on to the users when planning to create the stats before running their queries? Do you plan future tutorials on this topic? Thanks!
@ArshadAliAasTrailblazers 2 years ago
Thanks Tiffany! While creating stats in advance is a proactive way to optimize performance, the engine also learns from queries submitted for the first time and optimizes performance for later submissions when the AUTO_CREATE_STATISTICS setting is ON (which it is by default). You can find more details about it here: docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-statistics
To shorten statistics maintenance time, be selective about which columns have statistics, or need the most frequent updating. For example, you might want to update date columns where new values may be added daily. Focus on having statistics for columns involved in joins, columns used in the WHERE clause, and columns found in GROUP BY. docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool#maintain-statistics
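As a rough illustration of the advice above, the following Python sketch builds one single-column CREATE STATISTICS statement for each column that appears in joins, WHERE clauses, or GROUP BY. The helper name `statistics_statements`, the `stats_<table>_<column>` naming pattern, and the table `dbo.FactSales` with its columns are all hypothetical and not taken from the video; only the general `CREATE STATISTICS name ON table (column);` form is standard T-SQL.

```python
def statistics_statements(table, join_columns=(), where_columns=(), group_by_columns=()):
    """Return one CREATE STATISTICS statement per distinct column.

    The naming pattern stats_<table>_<column> is an assumption of this sketch.
    """
    # Keep the first occurrence of each column, preserving order.
    columns = []
    for column in (*join_columns, *where_columns, *group_by_columns):
        if column not in columns:
            columns.append(column)

    table_name = table.split(".")[-1]  # drop the schema prefix for the stats name
    return [
        f"CREATE STATISTICS stats_{table_name}_{column} ON {table} ({column});"
        for column in columns
    ]


# Hypothetical fact table and query pattern.
statements = statistics_statements(
    "dbo.FactSales",
    join_columns=["ProductKey"],
    where_columns=["OrderDate"],
    group_by_columns=["ProductKey"],
)
for statement in statements:
    print(statement)
```

Columns that show up in more than one role, like ProductKey here, get a single statement, which keeps the set of statistics to maintain small.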
@HGoIchetan09 3 years ago
Excellent explanation.. Thanks..
@ArshadAliAasTrailblazers 2 years ago
You are welcome
@samuelrocha9079 2 years ago
Thank you for the video, one of the best I have ever watched in terms of learning data.
Just a quick question: for a round-robin table, you said the data will be shuffled when you run a query that groups by ProductKey, and the distribution will be organized by that field. What if, after that, I execute the same query but group by a different field? Will the shuffle happen again, with the distribution done by the other field I'm grouping on?
@SushilChauhan 1 year ago
Yes.
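To make the answer above concrete, here is a small Python simulation of a query-time shuffle over a round-robin layout. It is only a sketch under stated assumptions: CRC32 stands in for the engine's internal hash function, which is not described in the thread, and the rows, the `Region` column, and the helper names are hypothetical. Each query redistributes a working copy of the rows by its own grouping column, while the stored round-robin layout stays as it was.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # the fixed count discussed in the thread


def target_distribution(value) -> int:
    """Stand-in hash (assumption): equal values always map to the same distribution."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_DISTRIBUTIONS


def shuffle_and_sum(stored, group_column, measure_column):
    """Simulate GROUP BY on a round-robin table.

    stored maps distribution id -> list of row dicts and is never modified.
    Returns (totals per group key, number of rows that changed distribution).
    """
    # Step 1: shuffle a working copy so each key's rows share one distribution.
    shuffled = defaultdict(list)
    moved = 0
    for dist_id, rows in stored.items():
        for row in rows:
            target = target_distribution(row[group_column])
            if target != dist_id:
                moved += 1
            shuffled[target].append(row)

    # Step 2: aggregate locally; no key spans two distributions after the shuffle.
    totals = {}
    for rows in shuffled.values():
        for row in rows:
            key = row[group_column]
            totals[key] = totals.get(key, 0) + row[measure_column]
    return totals, moved


rows = [
    {"ProductKey": 1, "Region": "East", "Amount": 10},
    {"ProductKey": 2, "Region": "West", "Amount": 20},
    {"ProductKey": 1, "Region": "West", "Amount": 5},
    {"ProductKey": 3, "Region": "East", "Amount": 7},
]
stored = {i: [row] for i, row in enumerate(rows)}  # one row per distribution

by_product, _ = shuffle_and_sum(stored, "ProductKey", "Amount")
by_region, _ = shuffle_and_sum(stored, "Region", "Amount")  # second, separate shuffle
print(by_product, by_region)
```

Grouping by a different column triggers a new shuffle keyed on that column; nothing from the first query's shuffle is reused in this model.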
@user-yj9rv7us4x 1 year ago ⁺¹
👍👍👍👍
@sumitrauniyar7347 2 years ago
How does replicated distribution work when we have 1 compute node?
@Mohammad.aarif_222 5 months ago
Where do I need to store the files in Blob Storage?
@SSingh-lr2ue 3 years ago
Thank you for the clear explanation. However, I am not clear about where the 60 buckets or 60 distributions get stored. Is it in Azure Storage? In short, I don't get the purpose of, or the difference between, Azure Storage and the SQL Database instance attached to the compute node. Could you please explain more about it?
@ArshadAliAasTrailblazers 2 years ago
For developers, I think the important thing to consider is how it scales out. For example, if you have 2 nodes, each node will have 30 distributions attached to it; likewise, if you have 4 nodes, each node will have 15 distributions. By scaling out from 2 to 4 nodes, each node will now have roughly half of the data (assuming there is no data skew) and will take roughly half the time to complete processing. docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits#service-levels
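The arithmetic in the reply above can be sketched in a few lines of Python. This assumes, as the reply does, that the 60 distributions are spread evenly over the compute nodes and that there is no data skew; the function names are hypothetical, and rejecting node counts that do not divide 60 is a simplification of this sketch.

```python
TOTAL_DISTRIBUTIONS = 60  # fixed number of distributions, per the thread


def distributions_per_node(compute_nodes: int) -> int:
    """Distributions attached to each node, assuming an even split."""
    if compute_nodes <= 0 or TOTAL_DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("this sketch only handles node counts that divide 60 evenly")
    return TOTAL_DISTRIBUTIONS // compute_nodes


def relative_processing_time(compute_nodes: int, baseline_nodes: int) -> float:
    """Rough per-node workload relative to a baseline, assuming no skew."""
    return distributions_per_node(compute_nodes) / distributions_per_node(baseline_nodes)


for nodes in (1, 2, 4):
    print(nodes, "node(s):", distributions_per_node(nodes), "distributions each")
print("2 -> 4 nodes, relative time:", relative_processing_time(4, baseline_nodes=2))
```

With skewed data the slowest distribution dominates, so the real speed-up can be well below this idealized ratio.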
@BabatundeAdeleye-mw5ce 10 months ago
The 60 distributions are stored in the SQL database instance in the SQL pool. Data from Azure Storage is spread across the distributions in different patterns, depending on the distribution type defined on the SQL pool table when it was created. The SQL engine then reads the data from the distributions as instructed by your query, which may or may not require moving data around before it executes the aggregate function and sends the output to the control node, which in turn sends it to the user for viewing.
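The distribution patterns mentioned above can be compared with a short Python simulation. Treat it as a sketch, not the engine's behavior: CRC32 is a stand-in for the internal hash function, round-robin is modeled as a simple sequential cycle, and the sample rows and helper names are hypothetical.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # fixed count mentioned in the thread


def hash_target(value) -> int:
    """Stand-in for the engine's hash (assumption): same value, same distribution."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_DISTRIBUTIONS


def place_hash(rows, column):
    """Hash-distributed table: the chosen column decides where each row lives."""
    return [hash_target(row[column]) for row in rows]


def place_round_robin(rows):
    """Round-robin table: rows are spread evenly, ignoring their content."""
    return [i % NUM_DISTRIBUTIONS for i, _ in enumerate(rows)]


def place_replicated(rows, compute_nodes):
    """Replicated table: every compute node holds a full copy."""
    return {node: list(rows) for node in range(compute_nodes)}


# 120 rows where ProductKey 1 dominates, to show skew under hash distribution.
rows = [{"ProductKey": 1 if i < 100 else i} for i in range(120)]

hash_counts = Counter(place_hash(rows, "ProductKey"))
round_robin_counts = Counter(place_round_robin(rows))
copies = place_replicated(rows, compute_nodes=2)

print("largest hash distribution:", max(hash_counts.values()))
print("largest round-robin distribution:", max(round_robin_counts.values()))
print("rows per node when replicated:", [len(c) for c in copies.values()])
```

A skewed key piles rows onto a single hash distribution, round-robin keeps the counts level but ignores the key, and replication trades storage for not having to move the table at query time.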
@Mohammad.aarif_222 5 months ago
How do I make an external table?
