Hello,
I am preparing for the DP-203 and your channel is simply magical.
You explain complex concepts very simply. I really like your method with the whiteboard and the hand drawings.
Thank you very much for this quality content on your channel.
I know there's a lot of preparation work behind the final video. 😊🙏
Great content on how to structure/organize our data in Raw layer!!
Hi Piotr,
It would be nice to see a video about how to verify data quality in the different layers before it reaches the end user.
Great video like always, please keep it up!
As a "Data Engineer" member of my channel, you’ll have the special privilege of suggesting topics for new videos and voting on them. If you have a topic in mind, I’d love for you to join as a member. I’ll be setting up the first poll once I complete the DP-203 course.
Wonderful
Hi Tybul,
The content that you are delivering is awesome! Can you also please make a video on data partitioning, its types, and implementation?
What do you have in mind?
You are explaining the complex concepts in a nice way, so I thought it would be great to hear about the partitioning concept from you, because I found it somewhat confusing when I started learning it by myself.
I agree with Christian.
Hi Tybul. I am training to become a data engineer on Azure and I was planning on joining the club at the "Junior section". However, I could not find what I was looking for.
For a fee, would you be able to conduct interviews for real job scenarios? Would that be something you would consider making part of your service package?
Your tutorials are great and they give me confidence, great work!
Due to YouTube's membership policy, I can't offer 1:1 meetings. However, I'm thinking about introducing a new membership tier that would include a monthly group call. In these sessions, we could cover different topics, brainstorm ideas, do live training or interviews, consult, or just have a casual chat. Please note, though, it would be a group setting.
@TybulOnAzure Thanks for replying. That group setting would be a good start.
Can you please explain the medallion architecture?
It is mentioned in future episodes.
Hello Tybul, this course is very good. It was what I wanted to complement my data architecture master's. I'm still not clear on how to load the same database every day without repeating the same data over and over again, with increasing daily cost. Can you give a real example of how to face and solve this problem?
Sure. Basically you would write your data extraction SQL queries in an incremental way.
Take a look here (learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview) for more details.
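The incremental approach in the linked tutorial boils down to a watermark pattern: remember the highest change timestamp you have already copied, and on each run extract only rows beyond it. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for the source database (the `sales_orders` table and its columns are hypothetical placeholders, not from the course):

```python
import sqlite3

# In-memory stand-in for the source database with a sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_orders (id INTEGER, modified_at TEXT)")
conn.executemany(
    "INSERT INTO sales_orders VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-05"), (3, "2024-01-10")],
)

def extract_incremental(conn, watermark):
    """Return only rows changed after the last stored watermark."""
    rows = conn.execute(
        "SELECT id, modified_at FROM sales_orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    # The new watermark is the latest change timestamp seen in this run;
    # persist it (e.g. in a control table) for the next pipeline run.
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark

# First run picks up everything after the stored watermark.
rows, wm = extract_incremental(conn, "2024-01-03")
print(rows)  # [(2, '2024-01-05'), (3, '2024-01-10')]
print(wm)    # 2024-01-10
```

In ADF the same idea is implemented with a Lookup activity reading the old watermark and a Copy activity whose source query filters on it, as the tutorial shows.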
@@TybulOnAzure Thanks Tybul
15:58 You mention that we can process all data from scratch. Is it also possible to easily process data from a certain point? For example, all data from the last 2 weeks.
It is possible to process only a subset of data - I mention this in the "Dynamic ADF" episode.
Hello Sir,
The question I will ask may not be relevant to the topic of the video. Is there a specific reason to partition our Sales Orders dataset by the ingestion date?
Yes - just to know when a given set of data was ingested from the source.
Is the raw layer also called 'staging'? I think that term is used for the silver layer.
You can call it however you want, e.g. staging, raw or bronze. The important thing is to make everyone aware of what it means and what kind of data it stores.
I talked more about data lake zones in the 30th episode.
🤙 Thanks
Hi Tybul. Nice explanation. I have a query regarding PII. Can we anonymise the PII in the raw data itself, or do we anonymise the PII during transformations?
It depends on your requirements and what your legal team says, e.g. you might not be able to store PII data in the raw layer at all. Then what? I can see three basic options:
1. Don't ingest PII data at all (if possible).
2. Get rid of PII data on the fly before writing the data to the raw layer.
3. Add an additional zone (raw-PII) with tight security measures, dump your raw data there, then read from it, get rid of the PII data and save the outcome in the regular raw layer. Optionally, set up automatic removal of files from the raw-PII layer after a few days or so.
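Option 2 above (scrubbing PII on the fly) can be sketched as a small pseudonymization step applied to each record before it lands in raw. This is only an illustration: the record layout, field names and salt are made up, and in practice the salt would live in a secret store (e.g. Key Vault), not in code:

```python
import hashlib

# Hypothetical record as it arrives from the source system.
record = {"order_id": 42, "customer_email": "jane@example.com", "amount": 99.5}

# Fields the legal team has classified as PII (assumed for this example).
PII_FIELDS = {"customer_email"}

def pseudonymize(record, pii_fields=PII_FIELDS, salt="demo-salt"):
    """Replace PII values with a salted SHA-256 hash before writing to raw."""
    cleaned = dict(record)
    for field in pii_fields:
        if field in cleaned:
            digest = hashlib.sha256(
                (salt + str(cleaned[field])).encode()
            ).hexdigest()
            cleaned[field] = digest
    return cleaned

safe = pseudonymize(record)
# Non-PII fields are untouched; the e-mail is now an irreversible hash,
# which still lets you join/count by customer without exposing identity.
```

Hashing (rather than deleting) keeps the column usable as a join key across tables; if even that is too much for your legal team, drop the field entirely.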
@TybulOnAzure Thanks for the detailed explanation.