As a non-native English speaker, I'd love for you to get a decent microphone to improve your audio. Other than that, thanks for letting the world know how great Synapse is.
Thanks Gerardo, we are working on it. Thanks for watching and for the feedback.
You can also turn on closed captions in Spanish and voilà! You can watch whatever you want on UA-cam translated for you.
Another great video. Now I want to see if Dedicated pools work the same way on Delta.
Another great video Simon. Do any of your videos explain Delta tables in detail? I'm struggling to find good material on them.
Hey! Some of the earlier Synapse videos focus on comparing Delta between Synapse and Databricks; they cover some of the major functionality (merging, vacuum, optimise, etc.). I don't think I have a pure Delta overview knocking around - if that would be useful, I can add it to the list!
Simon
@AdvancingAnalytics It'd be extremely helpful. I can't work out the best way to keep the data current. If we get a file with value 1 today and a new file with updated value 2 tomorrow, do we keep an overwritten file, or will the Delta table keep both records?
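To sketch what Delta does with that "value 1 today, value 2 tomorrow" case: a MERGE (upsert) makes the current table version hold only the latest value per key, while the superseded data stays in older versions (readable via time travel) until VACUUM removes it. The toy pure-Python model below illustrates just those semantics; it is not real Delta code (in Spark you'd use `DeltaTable.merge` or `MERGE INTO`):

```python
# Toy model of Delta upsert semantics: each MERGE produces a new table
# "version"; old versions stay readable (time travel) until vacuumed.
class ToyDeltaTable:
    def __init__(self):
        self.versions = []  # list of {key: value} snapshots

    def merge(self, updates):
        # New snapshot = previous snapshot with updates applied (upsert).
        current = dict(self.versions[-1]) if self.versions else {}
        current.update(updates)
        self.versions.append(current)

    def read(self, version=None):
        # Latest version by default; older versions via "time travel".
        return self.versions[-1 if version is None else version]

    def vacuum(self):
        # Keep only the latest snapshot, like VACUUM removing old files.
        self.versions = self.versions[-1:]


t = ToyDeltaTable()
t.merge({"file1": 1})      # day 1: value 1 arrives
t.merge({"file1": 2})      # day 2: updated value 2 arrives
print(t.read())            # {'file1': 2}  - queries see only the latest
print(t.read(version=0))   # {'file1': 1}  - history kept until vacuum
t.vacuum()                 # old version gone; only the latest remains
```

So: normal reads only ever see the latest value, but both records exist in history until you vacuum.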
Superbly useful. Thank you
Very nice video Simon. Once SQL on-demand starts supporting the Delta format, it will be much easier to directly expose merged/enriched data from the data lake (via Hive/DirectQuery) instead of loading it into Azure SQL DW first (in most cases). What do you think?
Absolutely - once Delta support is in, we can do scalable processing in spark, land it properly in Delta tables, then query directly from SQL On Demand without having to move the data anywhere. There's a cost balance to work out, but it certainly opens up a lot of potential solutions that minimise data movement. Also - SQL OD on top of Delta tables that are ingesting a real-time stream, worth investigating when enabled!
Simon
@AdvancingAnalytics It seems like the vanilla Spark pools are noticeably slower compared to Databricks. What have you seen so far?
Hey Leo - apologies, missed this one originally!
So I've not done any deep performance comparisons (feels a little unfair given it's still early in the preview for Synapse!) but yeah, I've generally found that Synapse pools are quicker to spin up than Databricks, but they seem to take a little longer to execute. We don't have quite the same diagnostic tools to dig into it, but I'll make a note that a like-for-like performance showdown could be interesting!
Simon
Hi Simon, thank you for creating these videos. Thanks to your content I have been able to confidently use Azure Synapse Analytics. I was wondering if you knew of a way to interface with an Azure Synapse Analytics Spark cluster and its databases/tables through SQL Server Management Studio?
Hey Chaed! The only way to do it currently is to use Serverless SQL as an intermediary. If the tables you create in Spark are parquet, they will be visible (via a metadata replica) to the Serverless SQL side, which can be queried from Management Studio. If they're other types of Hive table (Delta, Avro, etc.) then that won't work, unfortunately!
Simon
Great video Simon, keep them coming. For some reason there is a lot of echo on the video, as if you don't have enough soft furnishings or sound-deadening material in your studio.
Great stuff
Hi Simon. How do I retrieve (include) the column pickupMonth when reading from partitions? Is there an option other than option("basePath", path)?
Hi, is it possible to query your taxi view created in SQL On-Demand from a Spark notebook? It doesn't seem to work for me. Do you have any idea why?
Hi Simon! Your videos are really interesting! I've been using Databricks and Delta for a couple of years now, but Delta tables really need to be readable by SQL On-Demand. Do you have any idea when this will be possible?
Hey, thanks for watching! I agree, Delta not being readable by SQL On-Demand is still a big gap. I'm hoping it's a feature they manage to implement before Synapse Workspaces go Generally Available, but I don't have any timescales I can share on that front!
@AdvancingAnalytics Thanks for your fast answer. I assume MS is also aware of this gap! Keep up the good work.
@axelvulsteke1444 Yep, they're definitely aware!
Will it cost me money if I add Power BI to my Synapse workspace?
Can you save the results from SQL scripts to the lake, or do you have to export them to a local device?
You'd need to write it to a table to save the query results. You can do that as a CREATE EXTERNAL TABLE AS SELECT command. Docs here: docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-cetas
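For illustration, a minimal CETAS sketch along the lines of those docs - note the table, view, data source, and file format names here are made up, and the statement assumes an external data source and parquet file format have already been created:

```sql
-- CETAS: run the SELECT and persist the results as parquet files in the lake,
-- registering an external table over them at the same time.
CREATE EXTERNAL TABLE dbo.TaxiSummary
WITH (
    LOCATION = 'curated/taxi-summary/',   -- folder in the lake for the output files
    DATA_SOURCE = MyLakeDataSource,       -- pre-created external data source
    FILE_FORMAT = MyParquetFormat         -- pre-created parquet file format
)
AS
SELECT pickupMonth, COUNT(*) AS trips
FROM dbo.TaxiView
GROUP BY pickupMonth;
```

After that runs, the results live in the lake and dbo.TaxiSummary can be queried like any other table.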
Is there a way to write a pandas data frame back to the data lake?
Yep, you can use spark.createDataFrame(pandasdf) to take a pandas dataframe "pandasdf" and convert it to a spark dataframe, which you can then write out as usual.
If you're dealing with huge dataframes, this might be fairly inefficient, so you'd want to switch and use Koalas (spark-friendly pandas) or just dataframes directly!
Simon
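A rough sketch of that round trip - the spark calls in the comments and the abfss:// path are illustrative (they only run inside a Spark session), so the runnable part below uses a local temp directory and CSV as a stand-in for writing to the lake:

```python
import os
import tempfile

import pandas as pd

# A small pandas dataframe standing in for your API results.
pdf = pd.DataFrame({"tripId": [1, 2], "fare": [10.5, 7.25]})

# In a Synapse Spark notebook you would do roughly (names illustrative):
#   sdf = spark.createDataFrame(pdf)
#   sdf.write.mode("overwrite").parquet(
#       "abfss://lake@account.dfs.core.windows.net/curated/trips/")
# Outside Spark, pandas can also write files directly (to_parquet needs
# pyarrow); here a local temp dir stands in for the lake path:
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "trips.csv")
pdf.to_csv(path, index=False)
print(os.path.exists(path))  # True
```

For small dataframes either route is fine; for big ones, converting to a Spark dataframe (or using Koalas) keeps the write distributed.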
@AdvancingAnalytics I'm fetching data from a REST API and using json_normalize, and I wasn't getting the same results when I tried to use Spark's explode.
Good content on your channel; however, the audio quality is poor in most of your videos, making it hard to follow, especially for a non-native English audience.
All the vids from the first 4-5 months are pretty bad sound quality. If you check the latest vids, they should be a lot better & the subtitles a lot more accurate!