Your data is in the Lakehouse, but now what? | Microsoft Fabric (Public Preview)

  • Published Jul 4, 2024
  • You've got your data into OneLake and a Lakehouse, but now what? What can you do with that data after you've landed it in Microsoft Fabric? Justyna walks us through the different areas where you can leverage your data throughout Fabric, from data warehouses all the way to Power BI!
    What is Data engineering in Microsoft Fabric?
    learn.microsoft.com/fabric/da...
    What is Spark compute in Microsoft Fabric?
    learn.microsoft.com/fabric/da...
    Develop, execute, and manage Microsoft Fabric notebooks
    learn.microsoft.com/fabric/da...
    OneLake shortcuts
    learn.microsoft.com/fabric/on...
    Justyna Lucznik
    / justynalucznik
    / justyna-lucznik
    📢 Become a member: guyinacu.be/membership
    *******************
    Want to take your Power BI skills to the next level? We have training courses available to help you with your journey.
    🎓 Guy in a Cube courses: guyinacu.be/courses
    *******************
    LET'S CONNECT!
    *******************
    -- / guyinacube
    -- / awsaxton
    -- / patrickdba
    -- / guyinacube
    -- / guyinacube
    -- guyinacube.com
    **Gear**
    🛠 Check out my Tools page - guyinacube.com/tools/
    #MicrosoftFabric #Lakehouse #GuyInACube
  • Science & Technology

COMMENTS • 42

  • @powerranger3357
    @powerranger3357 1 year ago +12

    Great video. Would like to see a walkthrough of the process of loading those initial Fact/Dim tables that are being connected to in the video. The statement was made that no ETL/data movement needs to be completed (and while that's true for the BI developer), I feel like it's not an accurate statement when looking at the end-to-end process.

    • @adolfojsocorro
      @adolfojsocorro 1 year ago +4

      I also think this is a confusing statement made in several videos and docs. The data has to somehow initially get to the lakehouse and subsequently refreshed on some schedule. I don't think Fabric eliminates those ETL processes.

    • @GuyInACube
      @GuyInACube  1 year ago +2

      The comment about no ETL/data movement is true after you get it into OneLake. Then that one copy of the data can be reused across the different engines, as was shown in this video. We also did a video about how to create your first lakehouse. ua-cam.com/video/SFta1_70T_U/v-deo.html.
      I also recommend going through the end to end tutorials that are in the Fabric documentation.

  • @prasadparab5638
    @prasadparab5638 1 year ago +2

    Awesome.... great features... Thanks a lot for sharing this information 🙏👏👏👍

    • @GuyInACube
      @GuyInACube  1 year ago

      Most welcome! Thanks for watching 👊

  • @rohtashgoyal
    @rohtashgoyal 7 months ago

    Liked the option of connecting to the lake in DirectQuery mode and getting the performance of import mode.

  • @dogsborodave
    @dogsborodave 1 year ago

    Hey! I remember Justyna from a breakout session at Build! Love GIAC that much more.

  • @hannesw.8297
    @hannesw.8297 1 year ago +4

    How does Fabric accomplish the same speed as a "classic" PBI import model with the new Direct Lake mode?
    Doesn't this require the whole lakehouse to be in memory upfront?
    And, of course, thanks for the video!

    • @juniorholder1230
      @juniorholder1230 1 year ago

      EXACTLY what I'm not seeing being answered.

    • @GuyInACube
      @GuyInACube  1 year ago

      Data still needs to be in memory for Direct Lake datasets; however, the engine tries to load just the columns, and not full tables, based on the query patterns. But at some point you could bump into limits based on your SKU and how much memory is available. Still work to be done on better paging and whatnot. In general, on my end, I still think of Direct Lake datasets as import datasets from a Power BI perspective. It's just way faster to load up the data, and you don't have to refresh or make copies of the data.

  • @tuyetvyvu4638
    @tuyetvyvu4638 1 year ago +1

    Did anyone have the same problem? I created a lakehouse with a Delta table, then created a report from the default dataset. However, when I ran a dataflow, the data appeared in the lakehouse but my default dataset did not refresh.

  • @eziola
    @eziola 1 year ago

    What was the process to create a relational data model with custom columns and measures in OneLake? I normally create the data model, columns, relationships, and measures in Power BI Desktop. How is a data model created using only OneLake for Power BI to connect to?

  • @ivankhoo93
    @ivankhoo93 1 year ago +6

    Hi, great video on the lakehouse. Have a question: what about the experience of initially loading the e.g. 3.5 bn rows into the lakehouse?
    Would assume that loading 3.5 bn rows of data into the lakehouse would take some time, although we can see that the experience of using that data from the lakehouse is rather smooth, which is great.

    • @GuyInACube
      @GuyInACube  1 year ago +1

      The loading of the data is covered in the creating your first lakehouse video - ua-cam.com/video/SFta1_70T_U/v-deo.html, as well as the OneLake video - ua-cam.com/video/wEcRTSNhtLg/v-deo.html.
      There are different ways to get data into OneLake - Data Factory Pipelines, Dataflows Gen2, shortcuts to existing storage accounts, etc...

  • @hilldiver
    @hilldiver 1 year ago +1

    Coming from the Power BI side of things one point that's not so clear is how the OneLake "one copy" concept works with the lakehouse medallion architecture. As an example I have a dataflow gen2 which loads Contoso from a local SQL server to a lakehouse with no transformations, this is the bronze/raw data layer. If I then want to do some transformations, e.g. merging product category and subcategory into product table to help with star schema, how do I do this? Shortcuts don't help as far as I can see, as they're just a virtualisation of what exists. According to the introduction page of the end-to-end lakehouse tutorial it says "Create a lakehouse. It includes an optional section to implement the medallion architecture that is the bronze, silver, and gold layers." but this is the only mention I've been able to find?

    • @culpritdesign
      @culpritdesign 1 year ago

      I think you can use Spark, or stored procedures (or Databricks if you're going that route), to shape your data into dimensional models. I think that's why they showed the Spark notebooks, even if they did not explain it very explicitly. You can see at 2:33 there is a stored procedure section in the Lakehouse. I am learning this now too. I am used to using stored procedures but I want to learn Spark and Databricks.
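A minimal pure-Python sketch of the denormalization being discussed: merging category and subcategory attributes into the product table so the product dimension is star-schema ready. In a Fabric notebook this would typically be Spark DataFrame joins reading from and writing to Delta tables; the table and column names here are hypothetical, loosely styled after a Contoso-like model.

```python
def build_product_dim(products, subcategories, categories):
    """Flatten category and subcategory lookups into the product rows,
    producing a single denormalized product dimension (silver/gold layer)."""
    sub_by_key = {s["SubcategoryKey"]: s for s in subcategories}
    cat_by_key = {c["CategoryKey"]: c for c in categories}
    dim = []
    for p in products:
        sub = sub_by_key[p["SubcategoryKey"]]
        cat = cat_by_key[sub["CategoryKey"]]
        dim.append({
            "ProductKey": p["ProductKey"],
            "Product": p["Name"],
            "Subcategory": sub["Name"],
            "Category": cat["Name"],
        })
    return dim

# Toy bronze-layer rows; in a real lakehouse these would come from Delta tables.
products = [{"ProductKey": 1, "Name": "Road Bike", "SubcategoryKey": 10}]
subcategories = [{"SubcategoryKey": 10, "Name": "Road Bikes", "CategoryKey": 100}]
categories = [{"CategoryKey": 100, "Name": "Bikes"}]

dim = build_product_dim(products, subcategories, categories)
print(dim[0])
# {'ProductKey': 1, 'Product': 'Road Bike', 'Subcategory': 'Road Bikes', 'Category': 'Bikes'}
```

The Spark equivalent would be two joins (products to subcategories on SubcategoryKey, then to categories on CategoryKey) with the result written back as a new Delta table the semantic model can use.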

  • @shahzadkheros
    @shahzadkheros 10 months ago

    great video

  • @lercosta01
    @lercosta01 1 year ago +1

    Pretty awesome!

    • @GuyInACube
      @GuyInACube  1 year ago

      We appreciate that! Thanks for watching 👊

  • @TheGreyFatCat
    @TheGreyFatCat 1 year ago

    Impressive response time using Direct Lake with that substantial row count. Will the Fabric capacity SKU impact this? What SKU was being shown in this demo?

    • @GuyInACube
      @GuyInACube  1 year ago +1

      Even with a low SKU like an F2, you will see fast response times. Where your mileage will vary is around how much compute capacity you have. So with the lower SKUs, you will bump up against the limit faster and then start to encounter throttling. Although, right now, the new workloads don't count towards your Capacity Unit limit until August 1, 2023. So, you won't get throttled. This is a good time to test what it looks like to gauge how much capacity you may need.

  • @The0nlySplash
    @The0nlySplash 2 months ago

    Our all-Microsoft company just switched to a Databricks lakehouse, and now all of a sudden MS is offering their own product. Damn, this looks good.

  • @curiousjoe395
    @curiousjoe395 3 months ago

    How do I get a DW table to show up in the Lakehouse endpoint, please?

  • @scottbradley1194
    @scottbradley1194 11 months ago

    You mention connecting Excel to a lakehouse. What are the options to do this other than a Power BI dataset?

  • @geoleighyers
    @geoleighyers 7 months ago

    Q: Yeah, somehow when I try to update my data in the Lakehouse, the tables in Power BI do not refresh. Is there a limit on Spark queries in the Microsoft Fabric trial?

  • @angmathew4377
    @angmathew4377 1 year ago +1

    Is the data stored in SQL Server, or is it file storage like CSV, etc., with the files queried for Power BI reports?

    • @GuyInACube
      @GuyInACube  1 year ago

      Check out the OneLake video we did with Josh Caplan - ua-cam.com/video/wEcRTSNhtLg/v-deo.html
      Data is stored in Delta Parquet files.
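For intuition, a Delta table is essentially a folder of Parquet data files plus a `_delta_log` subfolder of JSON commit files (the transaction log). A simplified stdlib-only sketch that summarizes such a folder; the layout shown is the common non-partitioned case, and any path you pass would be your own:

```python
import pathlib

def describe_delta_table(table_path):
    """List the Parquet data files and the JSON transaction-log commits
    that together make up a (non-partitioned) Delta table directory."""
    root = pathlib.Path(table_path)
    return {
        "data_files": sorted(p.name for p in root.glob("*.parquet")),
        "log_commits": sorted(p.name for p in (root / "_delta_log").glob("*.json")),
    }
```

Pointing this at a lakehouse table folder would show why "one copy" works: Spark, the SQL endpoint, and Direct Lake all read these same files rather than each keeping a private copy.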

  • @rushankpatil
    @rushankpatil 1 year ago +1

    How does it affect the CPU usage in Power BI Premium with 3 billion rows?

    • @GuyInACube
      @GuyInACube  1 year ago +1

      It depends. As always, it depends on data structure, data types, query patterns, etc... The best way to know how it will work with your data is to test it. Right now is also a great time, as the new workloads aren't counting towards your Capacity Unit limit until August 1, 2023, so you won't be throttled as a result.

  • @gabrielmenezes3967
    @gabrielmenezes3967 1 year ago +2

    How do we handle security on DirectLake datasets?

    • @stevefox7469
      @stevefox7469 1 year ago

      Good question, e.g. Row-Level Security.

    • @GuyInACube
      @GuyInACube  1 year ago +1

      One Security is really going to give you the ability to handle that at the OneLake level, which will carry through to the different engines. Still waiting to hear news on when that will be available. That will be the answer with regards to Row-Level Security.

  • @francisjohn6638
    @francisjohn6638 1 year ago +1

    This is cool :)

  • @ahsankhawaja474
    @ahsankhawaja474 1 year ago +1

    Without RLS it's really just a POC. RLS will force this from Direct Lake mode into DirectQuery mode, so we have to wait till One Security is out.

    • @GuyInACube
      @GuyInACube  1 year ago

      I'm really looking forward to One Security. Also, I know the team is looking at how to optimize Direct Lake based on feedback so make sure you get that in at aka.ms/fabricideas if you have thoughts.

    • @Enidehalas
      @Enidehalas 1 year ago +1

      So you are saying the Fabric preview does not support RLS at the moment? Also, I have never heard about One Security, nor found anything relevant on Google; where can I find more info?

  • @adolfojsocorro
    @adolfojsocorro 1 year ago +1

    Isn't the instantaneous nature of Direct Lake similar to today's direct connections to datasets?

    • @GuyInACube
      @GuyInACube  1 year ago

      No. It's different architecturally. Think of this still like an imported dataset, but the storage engine is loading the data from the Delta Parquet files within OneLake. And it's really fast at doing that.

    • @adolfojsocorro
      @adolfojsocorro 1 year ago

      @@GuyInACube to the report itself, isn't it the same? It's just a connection to a dataset that, to it, is transparently updated. I love that it's faster, but to the report, functionally speaking, it's the same.

    • @Enidehalas
      @Enidehalas 1 year ago +1

      @@adolfojsocorro Unlike DirectQuery, you don't lose some functionality (DAX functions, parts of RLS, etc.)

  • @roberttyler2861
    @roberttyler2861 11 months ago +1

    The friction this will bring between Engineering and Analytics teams... oh god...