So… this may be a stupid question, but will we be able to create a shortcut to an Azure SQL table instead of having to pipeline-copy the data and keep it updated?
This dude is awesome. Data by day, BJJ by night.
Awesome... Thanks for explaining it in detail 👏🙏
Most welcome! Thanks for watching 👊
Beautiful. I am waiting for my organization to decide to implement MS Fabric.
Thank you for this detailed tutorial! Keep up the great work. Quick question: what video editing tool do you use to overlay the blue boxes and commentary?
Thanks for sharing! Is there a performance impact when using shortcuts? My understanding is that data is imported into the lakehouse as Delta tables, but with shortcuts the data remains in its original format. Maybe a video about it?
I believe it uses the same in-memory processing under the covers, which means the data is streamed into memory and processed with the data in place at the source, but some verification would be good.
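For what it's worth, from the engine's perspective a shortcut shows up as just another table or folder in the lakehouse, so you can poke at it yourself. A minimal sketch from a Fabric Spark notebook, where `sales_from_s3` is a hypothetical shortcut name:

```python
# Minimal sketch: querying a shortcut from a Fabric Spark notebook.
# "sales_from_s3" is a hypothetical shortcut created under the lakehouse's
# Tables section and pointing at data in S3; `spark` is the session the
# Fabric notebook provides out of the box.
df = spark.read.table("sales_from_s3")

# The engine reads the underlying files and streams them into memory for
# processing, the same way it would for data physically stored in OneLake.
df.groupBy("Region").count().show()
```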
Love this! So inspiring and clear on how this new element works!
Also, how can I get my hands on a "guy in a cube" sticker for my laptop? ❤
Thank you for this. I have a question though: when we use a shortcut, say to Amazon S3, and start using that data in Fabric for analysis, does that data never leave S3 at all, even to be processed in memory on the Fabric end? I was comparing this to connecting an external HDD to a PC and accessing data; I was under the impression that the data gets into the PC's memory. If it's the same concept here, are we looking at egress charges? Finally, how will shortcuts work with high volumes of data?
Lakehouse links are an interesting feature to use; they look like a database link for cloud endpoints.
Q: I'm new to this field, and I'm curious: can I use this tool to store historical data as part of a data migration and build my own data storage? Or are there any tools that would be suitable instead of using a SharePoint folder? Thanks to GIAC for sharing a great video!
What about schema changes in my source data? Say I make a shortcut to some data somewhere in OneLake and build a sophisticated setup of joins and merges, and then one of the data owners decides to change the schema of their data. Will this immediately break my setup? How can we prevent that from happening? How does Fabric help mitigate this?
Thank You !
Hi guys, when I create a pipeline, is it possible to "copy" using a data gateway like I do in Power BI?
How can we create a linked server for a Fabric lakehouse in SSMS?
What is the difference between OneLake and a lakehouse?
Reflex? - Looks interesting.
One thing that's not so clear to me is how the OneLake "one copy" concept works with the lakehouse medallion architecture. As an example, I have a landing lakehouse where I use a Dataflow Gen2 to load the Contoso DB from a local SQL Server with no transformations; this is the bronze raw-data layer. If I then want to do some transformations, e.g. merging product category and subcategory into the product table to help with the star schema, how do I do this? Shortcuts don't help as far as I can see, as they're just a virtualisation of what already exists. So I'd need to create a pipeline copy job with the transformations and land the result in a new silver lakehouse, but then I've copied the data, which goes against the OneLake concept? Am I missing something fundamental here?
It's probably up to your design patterns. You could use Azure Data Factory, Spark notebooks, lakehouse stored procedures, or Dataflows.
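To make the notebook route concrete, here is a minimal PySpark sketch of a bronze-to-silver step like the one described above. All lakehouse, table, and column names are hypothetical placeholders, and it assumes the bronze Contoso tables are already registered as Delta tables in the attached lakehouse:

```python
from pyspark.sql import functions as F

# Read the raw Contoso tables from the bronze lakehouse.
# `spark` is the session the Fabric notebook provides; the lakehouse and
# table names below are hypothetical placeholders.
product     = spark.read.table("bronze_lakehouse.product")
subcategory = spark.read.table("bronze_lakehouse.product_subcategory")
category    = spark.read.table("bronze_lakehouse.product_category")

# Denormalise category and subcategory onto the product table
# to build a star-schema product dimension.
dim_product = (
    product
    .join(subcategory, on="ProductSubcategoryKey", how="left")
    .join(category, on="ProductCategoryKey", how="left")
    .select(
        "ProductKey",
        "ProductName",
        F.col("SubcategoryName").alias("ProductSubcategory"),
        F.col("CategoryName").alias("ProductCategory"),
    )
)

# Land the result as a Delta table in the silver lakehouse.
dim_product.write.mode("overwrite").format("delta").saveAsTable("silver_lakehouse.dim_product")
```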
One copy means that you don't have the same table in different formats or data stores. You don't have to copy a Delta table from the data lake to the data warehouse or import the data of your gold layer into Power BI.
So you avoid redundant copies of the same tables.
Nevertheless, you still want to build a medallion architecture on the data lake for flexibility and scaling.
How does Synapse differ from Fabric? Why didn't the team build on the Synapse platform and just add the new features there?
I believe Fabric is somewhat of a consolidation of the tools: warehousing to handle structured data and the lakehouse for unstructured data, with Power BI and Data Factory thrown in for good measure.
One of the main differences is that Synapse is a PaaS offering while Fabric is a SaaS offering aiming to provide a unified analytics solution.
So it makes more sense to bring Synapse into the Power BI offering rather than the other way around.
When it comes to user experience, there are far more users of Power BI than of Synapse, so many more people feel at home in the Power BI environment.
Awesome video and first
Didn't we already have a hundred different ways to store structured and unstructured data? What does a lakehouse give us that, say, NTFS didn't?
Correct me if I'm wrong, but I think the general idea is creating shortcuts to all your sources (OneDrive, cloud databases, ADLS, data lake, data warehouse, or whatever), so you don't have to perform ETL or API calls to physically transfer or replicate the data somewhere else.
That sounds convenient, doesn't it? Especially when you own the biggest AI bot in the world; they get access to all the data in the world.
It's not so much about storing the data as about the ability to work with the data, govern the data, and get insights out of the data.
Wow