#13 | Micro Partitions & Data Clustering In Snowflake | Snowflake Hands-on Tutorial

Data Engineering Simplified

Published on 10 Nov 2024

COMMENTS • 126

@jacksonjulian5968 2 years ago ⁺¹¹
Thank you for your selfless service. This video series will surely be the master guide for everyone who is going to learn Snowflake in the future.
@DataEngineering 2 years ago
Thank you @Jackson Julian 🙏 for watching my video; your words of appreciation really mean a lot to me.
@pkphuloria 2 years ago ⁺²
Best Data Engineering channel on YouTube for Snowflake. Excellent!
@DataEngineering 2 years ago
Thank you 🙏 @pradeep phuloria for watching my video; your words of appreciation really mean a lot to me.
⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡
I have already published other knowledge series and Snowflake certification videos; if you are interested, you can refer to them.
🌐 Snowflake Complete Guide Playlist ➥ bit.ly/3iNTVGI
🌐 SnowPro Guide ➥ bit.ly/35S7Rcb
🌐 Snowflake SQL Series Playlist ➥ bit.ly/3AH6kCq
🌐 SnowPro Question Dump (300 questions) ➥ bit.ly/2ZLQm9E
⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡
@wernerzirkel2753 11 months ago ⁺¹
Thanks for sharing these videos. I had been searching for a while for Snowflake training videos and even considered buying a training course. But this series is perfect: condensed, sales-free information, 100% what I need to understand this technology. I will recommend it to my colleagues.
@DataEngineering 11 months ago
Glad you liked it, and thanks for sharing it with your colleagues and your community.
And yes, I know many of us are not fully aware of the Snowpark Python API, which lets you manage Snowflake more programmatically. You can watch my paid content (data + code available); many folks don't know the power of Snowpark, and these 2 videos will help you broaden your knowledge.
This content is available at a discounted price for a limited time (one for JSON and one for CSV). It can automatically create DDL and DML and also run the COPY command.
1. www.udemy.com/course/snowpark-python-ingest-json-data-automatically-in-snowflake/?couponCode=DIWALI50
2. www.udemy.com/course/automatic-data-ingestion-using-snowflake-snowpark-python-api/?couponCode=DIPAWALI35
@saltydog996 9 months ago ⁺¹
I had difficulty understanding the concepts in the documentation. This tutorial saved me. Thank you.
@DataEngineering 9 months ago
Glad it was helpful!
@sujathapullareddy7726 1 year ago ⁺¹
Thank you very much! I understand micro-partitioning very well now. So well explained!
@DataEngineering 1 year ago
Glad it helped!
@alexsanders7803 2 years ago ⁺¹
Thank you for the detailed explanation.
@DataEngineering 2 years ago ⁺¹
Glad it was helpful!
@davidsun5648 1 year ago ⁺²
Thanks for the exceptional presentation. One quick question: where are micro-partitions physically stored? I think it should be cloud storage (e.g., Azure Blob), right? Thanks in advance.
@karthikeyanudayakumar9553 2 months ago
Excellent explanation 🎇
@thamilkadavarayar6707 2 years ago ⁺¹
Another good introduction to micro-partitioning... keep up the hard work of putting all this information together for us. THANK YOU.
@DataEngineering 2 years ago
My pleasure!
@SowmyaAS-x7t 1 month ago
Excellent explanation.
@DataEngineering 1 month ago
Glad you think so!
@mdwasim-cv2ig 1 year ago
Thanks for such an easy explanation. I am watching all your videos because I am going to appear for the certification exam. Please guide me.
@maheshbabu6925 2 years ago ⁺¹
Nice explanation with real-time examples.
@DataEngineering 2 years ago
Glad you liked it.
Thank you 🙏 @Mahesh Babu for watching my video; your words of appreciation really mean a lot to me.
@sun15220 2 years ago ⁺¹
Thank you for the videos. Very well-structured curriculum.
@DataEngineering 2 years ago
Glad you like them!
@gchillam 2 years ago ⁺²
Excellent presentation of micro-partitions.
However, one thing to note is that the max size of a micro-partition is 16 MB, not 500 MB. This is easy to confirm: take the micro-partition count from SYSTEM$CLUSTERING_INFORMATION for a given table, pull the size of the table from the TABLES view in INFORMATION_SCHEMA, then divide the table size by the number of micro-partitions to get an average micro-partition size. If you look at lots of tables, you will find that most tables' average micro-partition size is usually smaller than 16 MB.
@DataEngineering 2 years ago ⁺¹
Excellent observation... I never thought in that direction. As per the Snowflake documentation, "Each micro-partition contains between 50 MB and 500 MB of uncompressed data", and I am not sure whether compressed vs. uncompressed size is playing a role here.
Thanks again for your feedback; your words of appreciation really mean a lot to me.
@pratikparbhane8677 2 years ago ⁺²
For a compressed partition it's 16 MB, and uncompressed it's 50-500 MB.
@DataEngineering 2 years ago
@@pratikparbhane8677 Snowflake does not say whether compressed data will be 16 MB, more, or less. From the exam point of view, it should be between 50 and 500 MB as per the documentation. We can perform some trial and error and infer whether it is 8 MB, 16 MB, etc. I hope this clarifies.
@gchillam 2 years ago
@@DataEngineering Well, having worked with Snowflake since its beginning, I can say the MAX size of a micro-partition is 16 MB. That used to be on the exam. Regardless of whether it still is on the exam, it is very important to understand how micro-partitions really work. I specialize in optimizing Snowflake for very large implementations, and understanding how well the micro-partitions are getting packed is a very big deal for performance. I bring this up because it is so often overlooked and misunderstood, yet it is the foundation for performance.
@s4dgo4t 2 years ago ⁺²
@@gchillam It's clearly stated in the SF docs that it is 16 MB "compressed" and 50-500 MB in uncompressed form.
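A minimal Python sketch of the averaging check described in this thread. It assumes you have already run `SELECT SYSTEM$CLUSTERING_INFORMATION('<table>', '(<column>)')` and read the table's `BYTES` value from `INFORMATION_SCHEMA.TABLES`; the function name and the sample numbers are hypothetical, not real output. Because `BYTES` reflects compressed storage, the result is an average *compressed* partition size, which is consistent with the thread's conclusion that ~16 MB refers to compressed data and 50-500 MB to uncompressed data.

```python
import json

def avg_partition_size_mb(clustering_info_json: str, table_bytes: int) -> float:
    """Average compressed micro-partition size in MB (1 MB = 1024 * 1024 bytes).

    clustering_info_json: JSON text returned by SYSTEM$CLUSTERING_INFORMATION.
    table_bytes: the BYTES value for the table from INFORMATION_SCHEMA.TABLES.
    """
    info = json.loads(clustering_info_json)
    partitions = info["total_partition_count"]
    if partitions == 0:
        raise ValueError("table has no micro-partitions")
    return table_bytes / partitions / (1024 * 1024)

# Hypothetical values for illustration only.
sample = '{"cluster_by_keys": "LINEAR(o_orderdate)", "total_partition_count": 10}'
print(round(avg_partition_size_mb(sample, 150 * 1024 * 1024), 2))  # 15.0
```

Dividing total bytes by partition count only gives a mean; individual partitions can be larger or smaller.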
@vinothkannaramsingh8224 27 days ago
So, will data in an external/internal stage have micro-partitions, or only once it is loaded into a table?
@Niteshkumar-pq5on 1 year ago
Thanks for the very informative and useful videos on Snowflake. I am unable to access the SQL code for the hands-on. Could you please help?
@krist17860 1 year ago
Hi, excellent tutorial, thank you. I am trying to access the SQL scripts but get a 401. Is this intended?
@YDENTERMAINT 1 year ago
Hi, excellent tutorial, thank you. I am trying to access the SQL scripts but get a 401. Is this intended?
@brajamajumder4325 5 months ago
Thanks for sharing, great insight.
@RajanieshKaushikk 2 years ago
YOU are TOOOOO good... Thanks a lot for this wonderful video!!
@DataEngineering 2 years ago
You are welcome.
@chaitanyakrishna5873 2 years ago
Thank you for all of your great efforts. Extraordinary explanation; nowhere else have I found such an in-depth explanation. Very soon I am going to take the SnowPro certification.
@DataEngineering 2 years ago
You are most welcome.
@shivamrai162 1 year ago
I am not able to download any notes from your site. Kindly update it. I need to revisit every time because I keep forgetting concepts, and I need something to refer to. Thank you. Great content.
@amitjaiswal781 1 year ago
Amazing session, thank you.
@DataEngineering 1 year ago
Thank you so much!
@raghuram6264 2 years ago ⁺²
Very good effort, sir. So helpful in real-time projects. Are you providing any training classes?
@DataEngineering 2 years ago
No, I don't provide any training; I share my knowledge and experience via this channel to help the Snowflake community.
Thank you 🙏 for watching my video; your words of appreciation really mean a lot to me.
@RAMANKUMARKHARCHE 11 months ago
Thank you so much. I got the idea of micro-partitioning, and I have subscribed to your channel as well. 😀 Just one other thing: can you share the sample database structure with the data used in the video? That would be super helpful. 🙏
@abdullahsiddique7787 2 years ago ⁺¹
If a cluster key is not defined explicitly, Snowflake creates micro-partitions in the order in which the data arrives for loading. I don't think it automatically orders by DOB in this example; it will create micro-partitions in the order the data is loaded. You can do a quick insert with that dataset and check.
@DataEngineering 2 years ago ⁺¹
You are right. I just took simple demo data, and what you say is right. Meanwhile, let me check the actual behaviour you explained.
@abdullahsiddique7787 2 years ago
@@DataEngineering Sure.
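To illustrate the point above, here is a toy Python model (not Snowflake's actual algorithm) that cuts rows into fixed-size partitions strictly in arrival order and records each partition's min/max, the way per-partition range metadata works conceptually. The 100-rows-per-partition size and the DOB data are hypothetical. Loading rows already sorted by DOB yields non-overlapping ranges; loading the same rows shuffled yields heavily overlapping ones, so in this model the ordering comes from the load, not from the engine.

```python
import random
from datetime import date, timedelta

def build_partitions(values, rows_per_partition):
    """Cut `values` into fixed-size chunks strictly in arrival order and
    return each chunk's (min, max), like per-partition range metadata."""
    chunks = (values[i:i + rows_per_partition]
              for i in range(0, len(values), rows_per_partition))
    return [(min(c), max(c)) for c in chunks]

def count_overlapping_pairs(ranges):
    """Count pairs of partitions whose [min, max] ranges intersect."""
    count = 0
    for i, (lo1, hi1) in enumerate(ranges):
        for lo2, hi2 in ranges[i + 1:]:
            if lo1 <= hi2 and lo2 <= hi1:
                count += 1
    return count

# Hypothetical data: 1,000 distinct dates of birth, 100 rows per partition.
dobs = [date(1980, 1, 1) + timedelta(days=d) for d in range(1000)]

in_dob_order = build_partitions(dobs, 100)          # file already sorted by DOB
shuffled = dobs[:]
random.Random(42).shuffle(shuffled)
in_random_order = build_partitions(shuffled, 100)   # same rows, random order

print(count_overlapping_pairs(in_dob_order))     # 0
print(count_overlapping_pairs(in_random_order))  # > 0: ranges overlap heavily
```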
@kumarsingh1741 1 year ago
Can you please post the site from which you got the data you inserted into the tables?
@vedantshirodkar 1 year ago
Great video, sir.
I have one doubt. How much time would Snowflake take to implement manual/custom clustering, since it involves reshuffling the data?
Is it applied instantly?
Is there any other way to get the status of a manually clustered table?
@DataEngineering 1 year ago
There is no such concept as manual reshuffling of data.
@raviagarwal9834 1 year ago
Hello, I am looking for online training in Snowflake along with AWS. Let me know if you conduct any.
@shayankabasi160 2 years ago ⁺¹
Awesome course, the best public Snowflake course on YouTube. @Data Engineering Simplified, could you please advise when Chapter 14 (Time Travel) will be published?
@DataEngineering 2 years ago ⁺¹
Thank you 🙏 for watching my video; your words of appreciation really mean a lot to me. It is coming within the next 7 days, @Shayan.
@DataEngineering 2 years ago ⁺¹
@Shayan Kabasi, the Time Travel video is out now (ua-cam.com/video/9k8ADXunhIk/v-deo.html)
@subodhagrawal4087 9 months ago
Amazing, you are teaching like a paid course.
@DataEngineering 9 months ago
Feel free to review my Udemy courses:
www.udemy.com/user/data-engineering-simplified/
@PriyaSharma-ci2kh 2 years ago
Could you please explain this line: "Micro-partitions can overlap in their range of values, which, combined with their uniformly small size, helps prevent skew"? How does it prevent skew?
@DataEngineering 2 years ago
Snowflake does not reveal how micro-partitions work internally or how it performs all the management work to optimize storage.
If you have gone through my cloud services layer chapter (Ch-3, ua-cam.com/video/IocdgUB94KQ/v-deo.html), that layer holds all the statistics and decides how to organize the micro-partitions, with or without overlap, and makes sure the data is not skewed.
But thanks for the note; I will do some additional research and add more detail in a separate video in the future.
@ameyabapat9090 2 years ago ⁺¹
In the case of CDC/delta (10:00 to 13:00), once they create a new partition, would they need to update the older partition, which might contain data that was updated during CDC and has become stale? Is that correct?
@DataEngineering 2 years ago
Micro-partitions are immutable: once one is created, it cannot be changed. If a lot of updates or deletes are happening, Snowflake queries (existing ones that were running fast) get slower.
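The immutability described in this reply can be pictured as copy-on-write. The Python sketch below is a simplified model under stated assumptions (a hypothetical `MicroPartition` type and `rewrite_rows` helper, tiny two-row partitions); it is not how Snowflake stores data internally. Any partition holding a changed row is rewritten as a new partition, untouched partitions are reused, and old versions are set aside rather than edited. Deletes follow the same pattern, which is one way to see why heavy update/delete churn produces many rewritten partitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroPartition:  # hypothetical model type, not a Snowflake API
    pid: int
    rows: tuple  # frozen dataclass + tuple: never edited in place

def rewrite_rows(partitions, next_pid, predicate, change):
    """Copy-on-write: every partition containing a row matching `predicate`
    is replaced by a new partition; `change` returns the new row, or None to
    delete it. Returns (active, retired, next_pid)."""
    active, retired = [], []
    for p in partitions:
        if not any(predicate(r) for r in p.rows):
            active.append(p)  # untouched partition is reused as-is
            continue
        new_rows = []
        for r in p.rows:
            new_r = change(r) if predicate(r) else r
            if new_r is not None:
                new_rows.append(new_r)
        if new_rows:  # a partition emptied by deletes is simply not rewritten
            active.append(MicroPartition(next_pid, tuple(new_rows)))
            next_pid += 1
        retired.append(p)  # old version is set aside, not mutated
    return active, retired, next_pid

parts = [MicroPartition(1, (("o1", "OPEN"), ("o2", "OPEN"))),
         MicroPartition(2, (("o3", "OPEN"),))]

# Like: UPDATE ... SET status = 'SHIPPED' WHERE order_id = 'o2'
active, retired, nxt = rewrite_rows(parts, 3, lambda r: r[0] == "o2",
                                    lambda r: (r[0], "SHIPPED"))
print([p.pid for p in active], [p.pid for p in retired])  # [3, 2] [1]

# Like: DELETE ... WHERE order_id = 'o3'
active, retired, nxt = rewrite_rows(active, nxt, lambda r: r[0] == "o3",
                                    lambda r: None)
print([p.pid for p in active], [p.pid for p in retired])  # [3] [2]
```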
@varunjain2645 2 years ago
Thank you (dhanyavad), brother! It has been very helpful in understanding fine concepts... WILL CONNECT ONCE I CRACK THE INTERVIEW.
@DataEngineering 2 years ago
All the best for your exam.
@MexiqueInc 6 months ago
Hello, I am currently on a mission to finish all the videos in 2024. Please include all the data used for the procedures. Following along with all the videos in the new UI is a challenge by itself, so I would appreciate it if I could find the files in the description. Thanks for your awesome work.
@saraneegupta7426 1 year ago
What is the difference between scanning data by row and by column? What advantage does a columnar DB have?
@rameshbaburamachandran2431 1 year ago
In the real-time clustering depth calculation, I couldn't find how you get an average depth of 3.69. Could you please give the details?
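Snowflake does not publish the exact formula behind `average_depth` in `SYSTEM$CLUSTERING_INFORMATION`, so the Python below is only a simplified approximation for intuition; it will not necessarily reproduce the 3.69 shown in the video. It treats each micro-partition as a `(min, max)` range on the clustering column: *overlaps* counts the other partitions whose ranges intersect a given partition, and *depth* at a value counts how many partitions contain that value, so a depth of 1.0 means no overlap at all. The function name and the ranges are hypothetical.

```python
def overlap_stats(ranges):
    """Simplified clustering metrics for (min, max) partition ranges.

    Returns (avg_overlaps, avg_depth):
    - avg_overlaps: mean number of *other* partitions each partition overlaps.
    - avg_depth: mean, over every distinct min/max boundary value, of how many
      partitions contain that value (1.0 means no overlap at all).
    """
    if not ranges:
        raise ValueError("need at least one partition")
    overlaps = [
        sum(1 for j, (lo2, hi2) in enumerate(ranges)
            if j != i and lo1 <= hi2 and lo2 <= hi1)
        for i, (lo1, hi1) in enumerate(ranges)
    ]
    points = sorted({v for r in ranges for v in r})
    depths = [sum(1 for lo, hi in ranges if lo <= v <= hi) for v in points]
    return sum(overlaps) / len(ranges), sum(depths) / len(depths)

# Hypothetical ranges on an integer column.
well_clustered = [(1, 10), (11, 20), (21, 30)]
overlapping = [(1, 20), (5, 25), (10, 30)]
print(overlap_stats(well_clustered))  # (0.0, 1.0)
print(overlap_stats(overlapping))     # (2.0, 2.0)
```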
@majakmasti3592 2 years ago ⁺¹
Sir, if you teach in Hindi, I'm sure you will reach 1M subscribers shortly, because your teaching style is good.
@DataEngineering 2 years ago ⁺²
Thanks for your note... this is the 3rd request for Hindi videos. I can also make videos in Hindi.
I generally explain in very simple English; what do you find challenging about my videos being in English? (Share your thoughts on my Instagram account instagram.com/learn_dataengineering/)
Almost everything is technical terms anyway... explaining it in Hindi would come out nearly the same... so what challenge does the audience have with English? Please help with your thoughts so I can start making videos in Hindi too.
@majakmasti3592 2 years ago
@@DataEngineering Deep understanding will only come in Hindi, Sir. Most people in India understand things deeply in Hindi.
There should be no language restriction on learning.
That is what I believe.
@DataEngineering 2 years ago ⁺¹
@@majakmasti3592 Makes sense. I have started planning... I will think it through and release videos soon. Thanks again for your note.
@SenthilPrakashM-d7b 1 year ago
If micro-partitions are immutable, how does updating/deleting an existing record in a micro-partition work? After a deletion, will the space in the micro-partition be reused?
@vedantshirodkar 1 year ago
Deletion would require recreating the entire partition that contains that record.
@kothandans8 2 years ago
Very clearly explained and very useful. Can you please explain the Lambda functions used in the Snowflake batch data ingestion process?
@DataEngineering 2 years ago
Thanks for watching my video.
Snowflake does not have Lambda functions; they are available in AWS, not in Snowflake. Snowflake has tasks, which will be covered in Ch-18, to be published soon.
@kothandans8 2 years ago
@@DataEngineering Thank you!
@swarnalathabanala1665 2 years ago
If CDC doesn't update existing partitions, how will delta consolidation work? Could you explain with a clear example?
@DataEngineering 2 years ago
Micro-partitions have no direct link with CDC. If there is a lot of churn, be it CDC or any other update, you can run a process to re-adjust the partitions, especially if query performance degrades.
If you would like to understand CDC behavior, Snowflake has a much nicer way to implement it:
➥ Chapter-17 Snowflake Streams & Change Data Capture 🌐 ua-cam.com/video/DXI0GDSwE_E/v-deo.html
➥ Chapter-19 ETL (Data Pipeline) in Snowflake 🌐 ua-cam.com/video/9FejjGVZrPg/v-deo.html
@dineshkumarnagarajan200 2 years ago
Wow!! Thanks for the detailed explanation. I truly appreciate the hard work put into this...
Can you comment on why I can't access the website or the SQL scripts? It says 404 forbidden.
@DataEngineering 2 years ago
Let me check; I will fix the website issue.
Thank you 🙏 for watching my video; your words of appreciation really mean a lot to me.
@b123786 2 years ago
Is there any video that covers bytes sent over the network? Or is that an indication that you need a bigger warehouse?
@DataEngineering 2 years ago
So far I have not touched on that topic. I have made a note of it and will see if I can make a video around it.
Thanks for sharing with me and the community. I would like to know what challenges you have, so I can suggest more.
@b123786 2 years ago
@@DataEngineering My query was scanning through a large dataset (billions of rows) on a 2XL cluster, and the recommendation was to use a 4XL since a large amount of data was read from remote storage. Would you say that is accurate?
@bidyadharapanda552 1 year ago
When we add clustering key columns, it takes more time and scans a larger data size. So how is it a benefit?
@DataEngineering 1 year ago
Clustering narrows down the scanning: it picks only those partitions that contain the specific data set and avoids a full scan.
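A small Python sketch of the pruning idea in this reply, assuming each micro-partition carries `(min, max)` metadata for the clustering column; the function name and all values are made up for illustration. A range filter only needs the partitions whose ranges intersect it. Well-clustered data keeps that set small, while overlapping ranges force every partition to be read. The benefit only applies to queries that filter on the clustering column.

```python
def prune(partition_meta, lo, hi):
    """Return ids of partitions whose [min, max] range could hold values in
    [lo, hi]; every other partition can be skipped without reading it."""
    return [pid for pid, (pmin, pmax) in partition_meta.items()
            if pmin <= hi and lo <= pmax]

# Hypothetical min/max metadata for four partitions on the same column.
clustered = {1: (1, 100), 2: (101, 200), 3: (201, 300), 4: (301, 400)}
unclustered = {1: (3, 398), 2: (1, 400), 3: (12, 377), 4: (7, 390)}

# Like: WHERE col BETWEEN 150 AND 160
print(prune(clustered, 150, 160))    # [2]
print(prune(unclustered, 150, 160))  # [1, 2, 3, 4]
```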
@vvkwadhwa 2 years ago ⁺¹
Too good a tutorial; your knowledge and your way of explaining are awesome. I request you to group all of your videos into a single path, because details on a few more concepts are still needed. Please email or share details ASAP.
Thanks a lot...
@DataEngineering 2 years ago
Thank you 🙏 for watching my video; your words of appreciation really mean a lot to me.
The playlist already has all 14 published chapters as of today. You can check it:
🌐 Snowflake Complete Guide Playlist ➥ bit.ly/3iNTVGI
@LucianMoldovanu 2 years ago
Hi. Wasn't the query time expected to be lower when you queried the third table, the one clustered by order priority? That was not the case: the time (as well as the bytes scanned) was actually the highest for the third table.
@DataEngineering 2 years ago ⁺¹
Will check and come back.
@ameyabapat9090 2 years ago
Does reclustering recreate the micro-partitions as per the latest data?
@DataEngineering 2 years ago
Latest data creates new micro-partitions; it does not change the existing micro-partitions.
@auravivianaduartehernandez820 9 months ago
Hello @DataEngineering, excellent content; however, it would be good if you provided updated links, as the current ones are broken.
@wernerzirkel2753 11 months ago
One question: at 27:18 you use the command "alter table t2_order_dt suspend recluster". So I see that Snowflake reclusters automatically by default, but at the same time micro-partitions are immutable. Does it create new micro-partitions and delete the old ones?
@DataEngineering 11 months ago ⁺¹
Yes, it does.
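As a rough picture of the "create new, retire old" answer above, the Python toy below reads the rows of some overlapping partitions, sorts them by the clustering key, and writes them into brand-new partitions, leaving the originals untouched and reporting them as retired. The helper name and partition size are hypothetical; Snowflake's Automatic Clustering runs incrementally in the background, and its internals are not documented at this level of detail.

```python
def recluster(partitions, key, rows_per_partition, next_pid):
    """Toy reclustering. `partitions` maps pid -> tuple of rows. All rows are
    sorted by `key` and written to new pids; the input dict is not modified.
    Returns (new_partitions, retired_pids)."""
    rows = sorted((r for part in partitions.values() for r in part), key=key)
    new_partitions = {}
    for i in range(0, len(rows), rows_per_partition):
        new_partitions[next_pid] = tuple(rows[i:i + rows_per_partition])
        next_pid += 1
    return new_partitions, sorted(partitions)

old = {1: (5, 1, 9), 2: (2, 8, 4)}  # overlapping ranges: 1..9 and 2..8
new, retired = recluster(old, key=lambda r: r, rows_per_partition=3, next_pid=3)
print(new)      # {3: (1, 2, 4), 4: (5, 8, 9)}
print(retired)  # [1, 2]
```

After the rewrite, the new partitions cover 1..4 and 5..9, so their ranges no longer overlap.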
@sgnaneswari2902 2 years ago ⁺¹
Hi,
I am loading data from S3 into Snowflake using the COPY command, but even on the first run it reports "copy executed with 0 files processed".
@DataEngineering 2 years ago
Thanks for your query, @S gnaneswari.
You can check the copy history or pipe history for any error. There could be two reasons: first, the file was already loaded, and when you try to load the same file again, Snowflake does not load duplicate data; second, there might be an error, which you can validate through the information schema tables.
Do let me know if this solves the problem, or watch Ch-10 (ua-cam.com/video/PNK49SJvXjE/v-deo.html).
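The first reason in this reply comes from COPY's load metadata: Snowflake remembers which staged files a table has already loaded (for 64 days) and skips them unless the command uses `FORCE = TRUE`. The Python below is only a simplified model of that skip logic; keying on file name plus an ETag-like checksum is an assumption made for illustration, and the file names are hypothetical. If every staged file is already in the history, nothing is loaded, which matches a "0 files processed" result.

```python
def files_to_load(staged, load_history, force=False):
    """staged: {file_name: etag}; load_history: set of (file_name, etag)
    already loaded into the target table. Returns the file names COPY would
    load under this simplified model."""
    if force:  # analogous to FORCE = TRUE: reload everything, even duplicates
        return sorted(staged)
    return sorted(name for name, etag in staged.items()
                  if (name, etag) not in load_history)

staged = {"orders_1.csv": "etag-a", "orders_2.csv": "etag-b"}
history = {("orders_1.csv", "etag-a"), ("orders_2.csv", "etag-b")}

print(files_to_load(staged, history))              # []
print(files_to_load(staged, history, force=True))  # ['orders_1.csv', 'orders_2.csv']
```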
@gokukanishka 2 years ago
For auto-clustered tables, when we don't provide a CLUSTER BY clause, how do we know which column Snowflake chooses to partition on?
@DataEngineering 2 years ago ⁺¹
There is no supporting documentation available to answer this question. I will search and try to find out.
@livethecolourfulworld 1 year ago
Hello Sir, I am Vikram Singh from Delhi. I watch your Snowflake videos. I want to take the Snowflake certification exam, and I have been working as a .NET developer for the last 15 years. Please advise whether changing my profile is the right decision or not.
@DataEngineering 1 year ago
If you don't have any experience in the data engineering domain and would like to move into the data engineering space, then alongside Snowflake, also gain knowledge of data engineering in general.
You can watch all the playlists and get an idea of whether this makes sense for you. It is hard to say in black and white that changing profile from tech X to Snowflake will make a person successful. But it is true that Snowflake has a much brighter future than .NET.
@livethecolourfulworld 1 year ago
Thanks
@angelnadar1209 2 years ago
Snowflake automatically selects a cluster key for a table, so on what basis does Snowflake select the cluster key?
@DataEngineering 2 years ago
When data is loaded, the Snowflake cloud services layer analyzes the data and selects the best approach to store it. Since it is not open-source technology, I did not find any documentation about it.
@vinothkannaramsingh8224 27 days ago
1. In this video, are you saying that NO clustering works more efficiently than manually specifying a clustering key? What's the takeaway?
@vishnuramjatin4898 1 year ago
How is this different from indexing?
@DataEngineering 1 year ago
Indexing is a very different concept from micro-partitioning. But note that micro-partition files' metadata is stored in the cloud services layer (similar to an index).
@navneetkumar-rr4up 2 years ago
Sir, please help with this:
1. We have micro-partitions (automatic in SF), so why do we need a clustering key?
2. If we change the clustering key on a table, the MPs will not change (they are immutable), so why change the clustering key?
3. When you did not provide a cluster key for the first table, automatic clustering was OFF but MPs still worked, so again, why do we need a clustering key?
I hope you reply soon; my test is on Wednesday.
@DataEngineering 2 years ago
Hi Navneet, here are my answers:
1. A micro-partition is a kind of file format (like Parquet or ORC), and whether or not you have a cluster key enabled, internal storage in Snowflake is via micro-partitions.
2. If we change the cluster key, new micro-partitions will be written and the old ones will be destroyed, and that is an expensive operation.
3. The clustering key controls how rows are grouped across micro-partitions (which values end up stored together). If you would like to understand it better, focus on the overlap concept in my video and it will become clear.
The exam will not go that deep; I suggest trying my micro-partition question paper.
ua-cam.com/video/QMUIW8OtI6c/v-deo.html
If you have any further queries, drop a note again.
Wish you all the best for your exam.
@bisram123 2 years ago ⁺¹
It is very nice, Sir, thank you. How did you create this tree-view chart? Any software for that? __/\__
@DataEngineering 2 years ago
Thank you 🙏 for watching my video; your words of appreciation really mean a lot to me. It is done with a mind-mapping tool, generally used for brainstorming.
@livenotlive 1 year ago
@@DataEngineering Why is the question list private?
@DataEngineering 1 year ago
@@livenotlive Please check this new playlist: ua-cam.com/play/PLba2xJ7yxHB5X2CMe7qZZu-V4LxNE1HbF.html
@lakshmikanththopuri5823 2 years ago ⁺¹
Can we load PDF files into Snowflake?
@DataEngineering 2 years ago
The question is "Why would you load PDFs into Snowflake?" Do you want to use Snowflake as storage or as a database?
Technically, if the PDF is less than 8 MB, you can store it as binary. Watch episode 7 for more detail on data types: ua-cam.com/video/Pi3z1NyBd-Y/v-deo.html
@spacesolutions9656 2 years ago
Can you please share the SQL scripts?
@DataEngineering 2 years ago
I will release the SQL scripts soon via my website.
@lakshmiprabha2449 2 years ago
May I know why the date in the first query is changed (2020-01-02) whereas the dates in the other 2 queries are the same (2020-01-01)? Will that not retrieve different rows? Video at 23 min 59 sec.
@DataEngineering 2 years ago
It is a recording issue; when you go to the query history, it shows the correct dates. But I will check once again and confirm.
Thanks for your observation and for sharing it with all of us.
@saraneegupta7426 1 year ago
When data gets deleted, what happens to the MP?
@abhijeet2218 1 year ago
Unable to download anything; please make it available.
@rajeshraxz9571 2 years ago
The SQL script shows an error; we can't access it.
Please help with that.
@DataEngineering 2 years ago
Let me check; there is some issue with my website. I will fix it.
@mohammedvahid5099 1 year ago
The voice is very low...
@DataEngineering 1 year ago
Strange, nobody has ever complained about the low volume... can you check on a different device?
@nityanandkore 2 years ago ⁺¹
This is the best playlist. I am eagerly waiting for all the remaining videos. Kudos to you! By any chance, can you share your LinkedIn profile so we can connect?
@DataEngineering 2 years ago
Will upload soon.
@jayanthkaja 2 years ago
Thank you, it really helped us understand the Snowflake platform. Once again, thank you for your selfless service...
