DP-203: 09 - Data lake structure - Raw layer

  • Published 30 Nov 2024

COMMENTS • 27

  • @christianraouldjatio1738
    @christianraouldjatio1738 1 year ago +16

    Hello,
    I am preparing for the DP-203 and your channel is simply magical.
    You explain complex concepts very simply. I really like your method with the whiteboard and the hand drawings.
    Thank you very much for the quality content on your channel.
    I know there's a lot of preparation work behind each final video.😊🙏

  • @prabhuraghupathi9131
    @prabhuraghupathi9131 8 months ago

    Great content on how to structure/organize our data in Raw layer!!

  • @mdpdurawix1834
    @mdpdurawix1834 3 months ago

    Hi Piotr,
    It would be nice to see a video on how to verify the quality of the data in the different layers before it reaches the end user.
    Great video as always, please keep it up!

    • @TybulOnAzure
      @TybulOnAzure  3 months ago +1

      As a "Data Engineer" member of my channel, you’ll have the special privilege of suggesting topics for new videos and voting on them. If you have a topic in mind, I’d love for you to join as a member. I’ll be setting up the first poll once I complete the DP-203 course.

  • @PamTiwari
    @PamTiwari 2 months ago

    Wonderful

  • @listen_learn_earn
    @listen_learn_earn 3 months ago

    Hi Tybul,
    The content you are delivering is awesome! Could you also please make a video on data partitioning, its types, and how to implement it?

    • @TybulOnAzure
      @TybulOnAzure  3 months ago

      What do you have in mind?

    • @listen_learn_earn
      @listen_learn_earn 3 months ago

      You explain complex concepts in a nice way, so I thought it would be great to hear the partitioning concept from you. I found it somewhat confusing when I started learning it by myself.

  • @zouhair8161
    @zouhair8161 11 months ago

    I agree with Christian.

  • @amataratsu006-xs6hv
    @amataratsu006-xs6hv 2 months ago

    Hi Tybul. I am training to become a data engineer on Azure and I was planning to join the "Junior section" membership. However, I could not find what I was looking for.
    For a fee, would you be able to do interviews based on real job scenarios? Would you consider making that part of your service package?
    Your tutorials are great and they give me confidence. Great work!

    • @TybulOnAzure
      @TybulOnAzure  1 month ago +1

      Due to YouTube's membership policy, I can't offer 1:1 meetings. However, I'm thinking about introducing a new membership tier that would include a monthly group call. In these sessions, we could cover different topics, brainstorm ideas, do live training or interviews, consult, or just have a casual chat. Please note, though, it would be a group setting.

    • @amataratsu006-xs6hv
      @amataratsu006-xs6hv 1 month ago

      @@TybulOnAzure Thanks for replying. That group setting would be a good start.

  • @smithapisharath9610
    @smithapisharath9610 3 months ago

    Can you please explain the medallion architecture?

    • @TybulOnAzure
      @TybulOnAzure  3 months ago

      It is mentioned in future episodes.

  • @tecain
    @tecain 6 months ago

    Hello Tybul, this course is very good. It was what I wanted to complement my data architecture master's program. I'm still not clear on how to load from the same database every day without copying the same data over and over again, which increases the daily cost. Can you give a real example of how to approach and solve this problem?

    • @TybulOnAzure
      @TybulOnAzure  6 months ago +1

      Sure. Basically, you would write your data extraction SQL queries incrementally, so that each run picks up only the rows that are new or changed since the previous run.
      Take a look here (learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview) for more details.
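A minimal sketch of one approach from that tutorial, the high-watermark pattern, assuming the source table has a reliable last-modified column. SQLite in memory is used only so the example runs on its own; the table names `sales_orders` and `watermark` and the column `last_modified` are hypothetical, not taken from the video. Each run extracts only the rows between the stored watermark and the current maximum, then stores that maximum for the next run.

```python
import sqlite3


def incremental_extract(conn: sqlite3.Connection) -> list:
    """Return only the rows changed since the previous run, then advance the watermark."""
    cur = conn.cursor()
    # Old watermark: the highest last_modified value copied by the previous run.
    (old_wm,) = cur.execute(
        "SELECT watermark_value FROM watermark WHERE table_name = 'sales_orders'"
    ).fetchone()
    # New watermark: the current maximum last_modified value in the source table.
    (new_wm,) = cur.execute("SELECT MAX(last_modified) FROM sales_orders").fetchone()
    if new_wm is None:  # empty source table, nothing to copy
        return []
    # Delta query: only rows in the (old_wm, new_wm] window are extracted.
    rows = cur.execute(
        "SELECT order_id, amount, last_modified FROM sales_orders "
        "WHERE last_modified > ? AND last_modified <= ? ORDER BY order_id",
        (old_wm, new_wm),
    ).fetchall()
    # In a real pipeline the rows would be written to the raw layer here,
    # and the watermark advanced only after that write succeeds.
    cur.execute(
        "UPDATE watermark SET watermark_value = ? WHERE table_name = 'sales_orders'",
        (new_wm,),
    )
    conn.commit()
    return rows


# Usage: an in-memory database standing in for the source system (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_orders (order_id INTEGER, amount REAL, last_modified TEXT);
    CREATE TABLE watermark (table_name TEXT, watermark_value TEXT);
    INSERT INTO watermark VALUES ('sales_orders', '1900-01-01T00:00:00');
    INSERT INTO sales_orders VALUES
        (1, 10.0, '2024-11-01T08:00:00'),
        (2, 20.0, '2024-11-02T09:00:00');
    """
)
first = incremental_extract(conn)   # both existing orders
second = incremental_extract(conn)  # nothing changed since the last run
conn.execute("INSERT INTO sales_orders VALUES (3, 30.0, '2024-11-03T10:00:00')")
third = incremental_extract(conn)   # only the new order
```

The second run copies nothing and the third copies only order 3, which is the point: the daily cost tracks the volume of changes rather than the size of the whole table.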

    • @tecain
      @tecain 6 months ago

      @@TybulOnAzure Thanks Tybul

  • @qaz56q
    @qaz56q 6 months ago

    15:58 You mention that we can process all data from scratch. Is it also possible to easily process data from a certain point? For example, all data from the last 2 weeks.

    • @TybulOnAzure
      @TybulOnAzure  6 months ago +1

      Yes, it is possible to process only a subset of the data. I cover this in the "Dynamic ADF" episode.
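The "Dynamic ADF" episode's own approach isn't reproduced here. As a rough sketch, assuming the raw data sits in folders partitioned by ingestion date under a hypothetical `raw/<source>/<entity>/<yyyy>/<MM>/<dd>` layout, "all data from the last 2 weeks" reduces to listing the 14 matching folders and handing only those to the pipeline instead of reading the whole history.

```python
from datetime import date, timedelta


def partition_paths(source: str, entity: str, start: date, end: date) -> list:
    """List the ingestion-date folders between start and end (inclusive)."""
    paths = []
    day = start
    while day <= end:
        # Hypothetical layout: raw/<source>/<entity>/<yyyy>/<MM>/<dd>
        paths.append(f"raw/{source}/{entity}/{day:%Y/%m/%d}")
        day += timedelta(days=1)
    return paths


def last_n_days(source: str, entity: str, n: int, today: date) -> list:
    """Folders for the last n days, including today."""
    return partition_paths(source, entity, today - timedelta(days=n - 1), today)


# Reprocess the last two weeks instead of the whole history.
paths = last_n_days("erp", "sales_orders", 14, date(2024, 11, 30))
print(paths[0], paths[-1])
# raw/erp/sales_orders/2024/11/17 raw/erp/sales_orders/2024/11/30
```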

  • @yakupbilen7612
    @yakupbilen7612 5 months ago

    Hello Sir,
    This question may not be directly related to the topic of the video. Is there a specific reason to partition our Sales Orders dataset by ingestion date?

    • @TybulOnAzure
      @TybulOnAzure  5 months ago

      Yes: so that we know when a given set of data was ingested from the source.
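A small sketch of what that buys you, assuming a hypothetical `raw/<source>/<entity>/<yyyy>/<MM>/<dd>/` layout and a hypothetical file-naming scheme (neither is taken from the video): the writer puts the ingestion timestamp into the folder and file name, and anyone reading the lake later can recover when a file was ingested from its path alone, without opening it.

```python
import re
from datetime import date, datetime


def raw_file_path(source: str, entity: str, ingested_at: datetime, ext: str = "parquet") -> str:
    """Build a raw-layer path partitioned by ingestion date (hypothetical layout)."""
    return (
        f"raw/{source}/{entity}/"
        f"{ingested_at:%Y/%m/%d}/"
        f"{entity}_{ingested_at:%Y%m%dT%H%M%S}.{ext}"
    )


# Matches the /yyyy/MM/dd/ folder part of the path.
_DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def ingestion_date(path: str) -> date:
    """Recover the ingestion date from the folder part of a raw-layer path."""
    match = _DATE_IN_PATH.search(path)
    if match is None:
        raise ValueError(f"no ingestion-date partition in {path!r}")
    return date(*map(int, match.groups()))


p = raw_file_path("erp", "sales_orders", datetime(2024, 11, 30, 8, 15))
print(p)                  # raw/erp/sales_orders/2024/11/30/sales_orders_20241130T081500.parquet
print(ingestion_date(p))  # 2024-11-30
```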

  • @TheMapleSight
    @TheMapleSight 6 months ago

    Is the raw layer also called 'staging'? I thought that term was used for the silver layer.

    • @TybulOnAzure
      @TybulOnAzure  6 months ago +1

      You can call it whatever you want, e.g. staging, raw or bronze. The important thing is to make everyone aware of what it means and what kind of data it stores.
      I talked more about data lake zones in the 30th episode.

  • @LATAMDataEngineer
    @LATAMDataEngineer 7 months ago

    🤙 Thanks

  • @chgeetanjali7919
    @chgeetanjali7919 8 months ago

    Hi Tybul. Nice explanation. I have a question about PII. Can we anonymise the PII in the raw data itself, or should we anonymise it during transformations?

    • @TybulOnAzure
      @TybulOnAzure  8 months ago

      It depends on your requirements and what your legal team says, e.g. you might not be allowed to store PII data in the raw layer at all. Then what? I can see three basic options:
      1. Don't ingest PII data at all (if possible).
      2. Get rid of PII data on the fly before writing the data to the raw layer.
      3. Add an additional zone (raw-PII) with tight security measures, dump your raw data there, then read from it, get rid of the PII data and save the outcome in the regular raw layer. Optionally, set up automatic removal of files from the raw-PII layer after a few days or so.
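Option 2 can be sketched as a filter applied to each record before it is written. This is a minimal illustration, assuming records arrive as Python dicts and the PII columns are known up front; `PII_FIELDS` and all field names are hypothetical. The same `strip_pii` step would also fit option 3, between reading from the raw-PII zone and writing to the regular raw layer.

```python
import json
from typing import Iterable, Iterator

# Hypothetical PII columns; the real list should come from your legal/governance team.
PII_FIELDS = frozenset({"customer_name", "email", "phone"})


def strip_pii(records: Iterable[dict], pii_fields: frozenset = PII_FIELDS) -> Iterator[dict]:
    """Drop PII fields from each record before it is written to the raw layer."""
    for record in records:
        yield {key: value for key, value in record.items() if key not in pii_fields}


def to_json_lines(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


source_rows = [
    {"order_id": 1, "customer_name": "Ann", "email": "ann@example.com", "amount": 10.0},
    {"order_id": 2, "customer_name": "Bob", "phone": "555-0100", "amount": 20.0},
]
clean_rows = list(strip_pii(source_rows))
print(to_json_lines(clean_rows))
# {"amount": 10.0, "order_id": 1}
# {"amount": 20.0, "order_id": 2}
```

Dropping fields is the bluntest way to "get rid of" PII; whether masking or pseudonymising would be acceptable instead is exactly the kind of question the legal team has to answer.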

    • @chgeetanjali7919
      @chgeetanjali7919 8 months ago

      @@TybulOnAzure Thanks for the detailed explanation.