Azure Synapse Analytics - Parquet, Partitions & PowerBI with SQL On Demand!

  • Published 5 Oct 2024

COMMENTS • 35

  •  4 years ago +3

    From a non-native English speaker, I'd love for you to get a decent microphone to improve your audio. Other than that, thanks for letting the world know how great Synapse is.

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago +2

      Thanks Gerardo, we are working on it. Thanks for watching and for the feedback.

    • @platanin2003
      @platanin2003 3 years ago

      You can also turn on CC in Spanish and voilà! You can watch whatever you want on YouTube, translated for you.

  • @vap72a25
    @vap72a25 3 years ago

    Another great video... now I want to see if Dedicated pools work the same on Delta.

  • @falgunoza111
    @falgunoza111 4 years ago +1

    Another great video, Simon. Do any of your videos explain Delta tables in detail? I am struggling to find good material on it.

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago +2

      Hey! Some of the earlier Synapse videos focus on comparing Delta between Synapse and Databricks, this covers some of the major functionality (merging, vacuum, optimise etc). I don't think I have a pure Delta overview knocking around - if that would be useful, I can add it to the list!
      Simon

    • @falgunoza111
      @falgunoza111 4 years ago

      @@AdvancingAnalytics It'll be extremely helpful. I'm not sure of the best way to keep the data up to date. If we get a file with value 1, and tomorrow a new file with updated value 2, do we keep the overwritten file, or will the Delta table keep both records, etc.?
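
      The upsert scenario asked about above is what Delta's MERGE handles: reads of the table return only the latest value, while superseded rows survive as table history (for time travel) until vacuumed. A minimal sketch in Delta SQL — the table and column names (target, updates, id, value) are hypothetical:

      ```sql
      -- Upsert the new day's file (loaded as 'updates') into the Delta table.
      -- Rows with a matching id are updated in place; new ids are inserted.
      MERGE INTO target AS t
      USING updates AS u
        ON t.id = u.id
      WHEN MATCHED THEN
        UPDATE SET t.value = u.value
      WHEN NOT MATCHED THEN
        INSERT (id, value) VALUES (u.id, u.value);
      ```

      After the merge, querying target shows only value 2 for that id; value 1 is reachable only through the table's version history.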

  • @NealAmin
    @NealAmin 2 years ago

    Superbly useful. Thank you

  • @jugnu1234
    @jugnu1234 4 years ago +3

    Very nice video Simon. Once SQL on-demand starts supporting the Delta format, it will be much easier to directly expose merged/enriched data from the data lake (via Hive/DirectQuery) instead of loading it into Azure SQL DW first (in most cases). What do you think?

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago +3

      Absolutely - once Delta support is in, we can do scalable processing in spark, land it properly in Delta tables, then query directly from SQL On Demand without having to move the data anywhere. There's a cost balance to work out, but it certainly opens up a lot of potential solutions that minimise data movement. Also - SQL OD on top of Delta tables that are ingesting a real-time stream, worth investigating when enabled!
      Simon

    • @leoafurlongiv
      @leoafurlongiv 4 years ago +2

      @@AdvancingAnalytics It seems like the vanilla Spark pools are noticeably slower compared to Databricks. What have you seen so far?

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago +1

      Hey Leo - apologies, missed this one originally!
      So I've not done any deep performance comparisons (feels a little unfair given it's still early in the preview for Synapse!) but yeah, I've generally found that Synapse pools are quicker to spin up than Databricks, but they seem to take a little longer to execute. We don't have quite the same diagnostic tools to dig into it, but I'll make a note that a like-for-like performance showdown could be interesting!
      Simon

  • @courtneyh1533
    @courtneyh1533 3 years ago

    Hi Simon, thank you for creating these videos. Thanks to your content I have been able to confidently use Azure Synapse Analytics. I was wondering if you knew of a way to interface with an Azure Synapse Analytics Spark cluster and its databases/tables through SQL Server Management Studio?

    • @AdvancingAnalytics
      @AdvancingAnalytics  3 years ago

      Hey Chaed! The only way to do it currently is to use Serverless SQL as an intermediary. If the tables you create in Spark are parquet, they will be visible (via a metadata replica) to the Serverless SQL side which can be queried from Management Studio. If it's other types of hive table (Delta, Avro etc) then that won't work unfortunately!
      Simon
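
      To illustrate the reply above: after connecting Management Studio to the workspace's Serverless SQL endpoint, a Spark-created parquet table appears as an ordinary database object. A sketch with hypothetical names:

      ```sql
      -- 'sparkdb' is a Spark database replicated (metadata-only) into
      -- Serverless SQL; 'taxitrips' is a parquet-backed table created
      -- in a Spark notebook. Delta/Avro hive tables will not appear.
      SELECT TOP 10 *
      FROM sparkdb.dbo.taxitrips;
      ```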

  • @johndavies4758
    @johndavies4758 2 years ago

    Great video Simon, keep them coming. For some reason there is a lot of echo on the video, as if you don't have enough soft furnishings or sound-deadening material in your studio.

  • @mzhukovs
    @mzhukovs 3 years ago

    Great stuff

  • @zycbrasil2618
    @zycbrasil2618 3 years ago

    Hi Simon. How do I retrieve (include) the column pickupMonth when reading from partitions?
    Is there an option other than option("basePath", path)?
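
    In Spark, reading the parent folder (rather than a single partition path) lets partition discovery add pickupMonth automatically. On the Serverless SQL side covered in the video, the filepath() function recovers partition values from the folder names. A sketch, assuming a hypothetical storage path:

    ```sql
    -- filepath(1) returns whatever matched the first wildcard in the
    -- BULK path, i.e. the pickupMonth partition value for each row.
    SELECT r.filepath(1) AS pickupMonth, *
    FROM OPENROWSET(
        BULK 'https://mystorage.dfs.core.windows.net/taxi/pickupMonth=*/*.parquet',
        FORMAT = 'PARQUET'
    ) AS r;
    ```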

  • @SonPham-zy2zp
    @SonPham-zy2zp A year ago

    Hi, is it possible to query your taxi view created in SQL on demand from a Spark notebook? It does not seem to work for me. Do you have any idea why?

  • @axelvulsteke1444
    @axelvulsteke1444 4 years ago

    Hi Simon! Your videos are really interesting! I've been using Databricks and Delta for a couple of years now, but the Delta tables really need to be readable by SQL-on-demand. Do you have any idea when this will be possible?

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago +1

      Hey, thanks for watching! I agree, it's still a big missing gap that delta is not readable by SQL On-Demand. I'm hoping it's a feature they manage to implement before Synapse Workspace goes Generally Available, but I don't have any timescales I can share on that front!

    • @axelvulsteke1444
      @axelvulsteke1444 4 years ago

      @@AdvancingAnalytics Thanks for your fast answer. I assume that MS is also aware of this missing gap! Keep up the good work.

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago

      @@axelvulsteke1444 Yep, they're definitely aware!

  • @vinayrana4664
    @vinayrana4664 3 years ago

    Will it cost me money if I add Power BI to my Synapse workspace?

  • @josephcarrier4900
    @josephcarrier4900 2 years ago

    Can you save the results from SQL scripts to the lake, or do you have to export them to a local device?

    • @AdvancingAnalytics
      @AdvancingAnalytics  2 years ago

      You'd need to write it to a table to save the query results. You can do that as a CREATE EXTERNAL TABLE AS SELECT command. Docs here: docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-cetas
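
      A minimal CETAS sketch following that reply. The names here are hypothetical, and the data source and file format must already exist (created via CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL FILE FORMAT):

      ```sql
      -- Persists the query results back into the lake as parquet files
      -- and registers an external table over them in one statement.
      CREATE EXTERNAL TABLE dbo.taxi_summary
      WITH (
          LOCATION = 'output/taxi_summary/',
          DATA_SOURCE = my_lake,
          FILE_FORMAT = parquet_file_format
      )
      AS
      SELECT pickupMonth, COUNT(*) AS trips
      FROM dbo.taxi
      GROUP BY pickupMonth;
      ```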

  • @jordanfox470
    @jordanfox470 4 years ago

    Is there a way to write a pandas data frame back to the data lake?

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago

      Yep, you can use spark.createDataFrame(pandasdf) to take a pandas dataframe "pandasdf" and convert it to a spark dataframe, which you can then write out as usual.
      If you're dealing with huge dataframes, this might be fairly inefficient, so you'd want to switch and use Koalas (spark-friendly pandas) or just dataframes directly!
      Simon

    • @jordanfox470
      @jordanfox470 4 years ago

      @@AdvancingAnalytics I'm fetching data from a REST API and using json_normalize, and I wasn't getting the same results when I tried to use Spark's explode.

  • @YT-yt-yt-3
    @YT-yt-yt-3 2 years ago

    Good content on your channel, but the audio quality is poor (in most of your videos), making it hard to follow, especially for a non-native English audience.

    • @AdvancingAnalytics
      @AdvancingAnalytics  2 years ago +1

      All the vids from the first 4-5 months are pretty bad sound quality. If you check the latest vids, they should be a lot better & the subtitles a lot more accurate!