Advancing Fabric - A Quick Microsoft Fabric Tour

  • Published 8 Jan 2025

COMMENTS • 18

  • @ianstats97
    @ianstats97 1 year ago +3

    I really like all your videos. I am wondering how CI/CD will work for data engineering in the new Fabric workspace.

  • @EngineerNick
    @EngineerNick 1 year ago +1

    Thank you for the tour :)

  • @vjraitila
    @vjraitila 1 year ago +1

    I agree that it is a bit confusing that "managed" tables are not also visible on the files side under the lakehouse workload. Does it also mean that you cannot refer to tables by path in your notebooks? And the same for tables in a warehouse?

  • @HiYurd
    @HiYurd 1 year ago +1

    Great video. Really like seeing a demo. So how are the Synapse "Data Pipelines" in Fabric different than the Data Factory components in Fabric?

    • @chasedoe2594
      @chasedoe2594 1 year ago

      It is the same thing. When you click on the pipelines, it just takes you to the Data Factory view. There no longer seems to be any concept of an ADF workspace. After you save, it appears as a pipeline in the Fabric workspace, which also includes other stuff like DWH, LH, Spark, etc.

  • @amitnavgire8113
    @amitnavgire8113 1 year ago +1

    When I sign up for Fabric, it redirects me to Power BI and I don't see these options like Data Factory etc.

  • @crouch.g
    @crouch.g 1 year ago

    How is performance/cost managed?
    There are no choices for Spark pools etc.
    In one way this is great, but in another, how will Fabric manage that in the background?

    • @jordanfox470
      @jordanfox470 1 year ago

      You can choose Spark pools at the workspace level: XSmall, Small, Medium, Large, etc. Core availability is managed at the SKU level for Fabric across a 24-hour period though, so it's going to be super confusing and honestly extremely expensive if every workspace is spinning up its own Spark pools and warehouses.

  • @chasedoe2594
    @chasedoe2594 1 year ago +1

    I am still confused about the compute sizing for both Spark and Data Warehouse compute.
    Fabric seems to be backed by "per capacity" compute, so does this mean I need to reserve CPU capacity ahead of using Spark or data warehouse compute?
    Because normally in Synapse / Databricks we can provision whatever we need without any reservation. I hope I have misunderstood, because if it requires a reservation at the Fabric level, then who the hell will pay for a 24x7 capacity reservation for 4 hours a day of Spark utilization?

    • @AdvancingAnalytics
      @AdvancingAnalytics 1 year ago

      I agree, it's a confusing model! You pay for a capacity, and that capacity can be used for any of the Fabric workloads. If you get the smallest capacity, an F2, that gives you basically 4 CPUs of power. This is averaged over 24 hours, so you could run a Spark job using 12 cores for a couple of hours on that F2 capacity, and that averages out well within your capacity. I think it's going to take a while before people figure out how best to work: do you take one big capacity and try to fit all of your different workloads inside it, or several smaller capacities so you don't end up throttling other important workloads?
      There are some hints at further things coming, such as surge protection, so you can put some guardrails around workloads to keep some of your capacity for other activities.
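
The averaging in the reply above can be checked with a rough back-of-the-envelope sketch. It assumes the budget is simply cores multiplied by a 24-hour window, as the reply describes; the function names are hypothetical, and Fabric's actual smoothing and throttling rules are likely more involved than this.

```python
# Rough sketch of the "averaged over 24 hours" idea from the reply.
# Assumption: budget = capacity cores x window hours. This is NOT
# Fabric's real smoothing or throttling logic, only the arithmetic.

def core_hours(cores: float, hours: float) -> float:
    """Core-hours consumed (or available) for a given core count and duration."""
    return cores * hours


def fits_in_daily_budget(capacity_cores: float, job_cores: float,
                         job_hours: float, window_hours: float = 24.0):
    """Return (fits, share_of_budget_used) under the simple averaging model."""
    budget = core_hours(capacity_cores, window_hours)
    used = core_hours(job_cores, job_hours)
    return used <= budget, used / budget


# The example from the reply: roughly 4 cores of capacity (F2),
# one Spark job using 12 cores for 2 hours.
ok, share = fits_in_daily_budget(capacity_cores=4, job_cores=12, job_hours=2)
print(ok, share)  # True 0.25
```

Under this simplified model the 12-core, 2-hour job uses 24 of the 96 available core-hours, which is why it "averages out well within" the capacity; the same job running for 10 hours would not fit.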

    • @crouch.g
      @crouch.g 1 year ago

      @AdvancingAnalytics There is no way that I can see in Fabric of choosing 12 cores for a couple of hours. Am I missing something?

    • @chasedoe2594
      @chasedoe2594 1 year ago

      @AdvancingAnalytics Really? What happens if there is a surge in data volume, like month-end processing? Does that mean the execution will simply not finish because my capacity reservation is not enough?
      Or do I need to do Synapse Dedicated SQL Pool style scaling, where I have to call an Azure API to scale up my capacity?
      Data Factory in Fabric is just the worst Data Factory ever. The only upside of ADF is that a DE can orchestrate jobs across various Azure resources without handling any API integration. Now they have taken out all the Azure activities and require AWS-style integration, where everything just goes through AWS Lambda. Connection management has switched to the "Power BI Data source" (now just called Data), which has a horrible UI. So what's the point of using Data Factory? And the connection to Fabric cannot be created manually, so when it is not created automatically... Data Factory just cannot load data into the Lakehouse/Warehouse and there is nothing I can do about it. And yes, we can do that API integration; we currently run a custom-made CI/CD for Power BI reports based on the Power BI API, and after Build 2023, MS slightly updated some of its APIs and my Power BI CI/CD just broke. That is the M/A cost of manually doing API integration with any system.
      And MS is selling this product as a "single workspace for everything", but I speculate that enterprises will need to split workspaces for the data transformation workload (Spark + DF) and the query workload (ad hoc query + BI) anyway in order to manage compute resource allocation. So it will not be a single workspace anyway. Moreover, managing capacity on Power BI Premium is already a dedicated job in itself. Now that same capacity will be shared with 4 more services. From my POV, this is just unnecessary complexity for the sake of so-called "simplicity and single compute and workspace."
      The Synapse warehouse update is welcome, but I was also expecting a serverless option for the Dedicated SQL Pool, since AWS Redshift and GBQ already have that option on the table. Rather than giving a serverless pricing model, MS is now all in on RI-style pricing, which is even worse for engineers who have to manage usage surges. Yes, there might be workarounds to manage such issues, but on AWS and GCP we just don't need to manage that.
      Synapse is being pushed out of Azure in the long run, without serverless pricing. Data Factory might become just a crippled orchestrator. And managing Spark compute infrastructure might get even more complicated. The only upside I can see is that Power BI will leverage lake storage directly. Since we would have to refactor/migrate out of Azure to O365... I think we might re-evaluate our cloud vendor entirely. I'm really not sure there is enough upside to being on Fabric. I can see a use case for Fabric as a single presentation layer. But apart from that, other PaaS products seem to be a better option, IMHO. And there is still no good option for a serverless data warehouse.
      Sorry for the very long rant.

  • @amitnavgire8113
    @amitnavgire8113 1 year ago

    The copy activity has so few connectors, and there are so few activities in Data Factory... Fabric is basically a scaled-down version of Synapse.

    • @chasedoe2594
      @chasedoe2594 1 year ago

      Yeah, I still say this is just another repackage-and-rebrand BS from MS. Yes, they have a new Synapse that works with Delta, but that's it; the rest just makes the Synapse workspace worse and even harder to manage.

  • @NeumsFor9
    @NeumsFor9 1 year ago

    I love how vendors create a market requiring people to rearrange their mental image map to do essentially the same things you've been doing.....but really what's the new business value? Where is THAT?!

  • @AnnChu-tb4hp
    @AnnChu-tb4hp 1 year ago

    Are they copying Databricks?