3.06 Mastering Common Silver and Gold zone transformations with PySpark in Microsoft Fabric

  • Published 16 Aug 2024
  • Microsoft Fabric For B...
    This video explores common transformation techniques for the Silver and Gold zones of the Medallion architecture. I explain data enrichment and type conversion transformations and demonstrate how to use PySpark APIs and methods to perform them.
    I also demonstrate how to process historical data from the Bronze layer using Window functions. Next, I explain core Kimball dimensional modelling concepts and demonstrate how they can be implemented using PySpark methods.
    Finally, I demonstrate creating aggregates.
    You can download the related demo notebook from here: github.com/faz...
    Chapters:
    00:00- Introduction
    02:21- Preview
    06:19- Lakehouse historical data storage strategy
    09:00- Demo start- preparing data
    10:24- Creating shortcuts to Bronze tables
    11:24- Notebook demo- reading data from shortcuts
    12:30- Inspecting data frame schema
    13:48- Data Type conversion transformations
    16:05- Ordering data
    20:00- Handling historical data using Window functions
    24:25- Data enrichment transformations
    25:45- Using regular expressions to parse text data
    26:40- Generating time dimension
    30:45- Dimensional modelling concepts
    32:12- Slowly changing dimensions (SCD)
    33:05- SCD Type-2 dimensions
    34:54- Surrogate keys
    35:32- Relationships between facts and dimensions
    37:00- Generating surrogate keys using monotonically_increasing_id function
    38:00- Distributed computing and Spark partitions
    41:31- Reducing data frame partition count
    43:02- How to link Fact and Dimension tables
    47:14- Incremental write into destination tables
    49:02- Using MERGE INTO query for destination write
    50:50- Aggregation transformations
    Please subscribe: / @fazizov
    Official Documentation:
    learn.microsof...
    learn.microsof...
    sparkbyexample...
    www.kimballgro...
    spark.apache.o...
    Hashtags:
    #datafactory, #microsoft,#microsoftfabric ,#azure, #dataengineering,#cloudcomputing, #dataanalytics, #lakehouse, #azuretutorial, #azuretraining, #datapipeline, #dataextraction , #dataintegration, #datatransfer, #dataflow, #spark, #deltalake, #synapse, #synapsedataenginering, #demo, #datalake, #transformation, #ingested, #datawarehouse, #dataintegration, #azuredatabricks ,#databricks, #bigdata, #bigdatatechnologies, #pyspark, #sparksql, #notebook ,#transformationvideo, #bronze, #medallion, #kimball, #dimensions , #modeling, #facts, #silver, #gold, #historical data, #dimensional

COMMENTS • 6

  • @joseluiscorreasalazar5670 1 month ago +1

    Thank you very much! This is one of the best tutorials on Fabric Lakehouses out there.

    • @fazizov 1 month ago +1

      Thanks for watching!

  • @kevthebandit 6 months ago +5

    Thanks for breaking this down!

    • @fazizov 6 months ago

      Thanks for the feedback!

  • @digitalevidenceofthings 6 months ago +1

    This is incredible, exactly what I needed to see to ensure I'm on the right track. Thank you for taking the time to do this video!

    • @fazizov 6 months ago +1

      Glad it was helpful, thanks!