Note that doing this approach could result in the underlying table being queried multiple times on your data source to satisfy the multiple Power Query queries.
Could you create one query of the source in Power Query and then "refer" to it multiple times rather than copy? Would this reduce the number of times the source is queried? I've done this many times for consistency's sake, but am not sure whether it still results in the source being hit multiple times.
I usually then disable load of the "source" query so it does not become a table in my data model.
@@danneubauer6474 That won't help. Each query is evaluated in isolation. Also, for a cleaner reference to just a single column use MyTable[[MyColumn]]. That way you get a one-column table and don't have to convert a list to a table, as you would with MyTable[MyColumn].
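In M, the difference looks like this (a minimal sketch; the table name and values are made up):

let
    MyTable = #table({"MyColumn"}, {{"Delta"}, {"United"}, {"Delta"}}),
    ColumnAsList  = MyTable[MyColumn],      // returns a list of values
    ColumnAsTable = MyTable[[MyColumn]],    // returns a one-column table
    DistinctValues = Table.Distinct(ColumnAsTable)  // no list-to-table conversion needed
in
    DistinctValues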
From what I can tell, the only way to truly avoid this is with Premium capacity or Premium Per User dataflows, which do allow true query chaining, with the results of each query stored to disk for use as a source by downstream queries.
Rather than creating a new item "Not Supplied", I use "" or "". That way those items appear first in an ascending sorted list or a slicer. Plus they stand out when looking at many rows in a table. Now that it is clear they exist in my data, I can take steps to address them.
Thanks for sharing that John! nice trick to get them to the top.
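For anyone wanting to try that, a minimal sketch of the replace step in M; the placeholder text and the assumption that the blanks come through as nulls are mine, so use whatever value sorts first for you:

let
    Source = #table({"AirlineName"}, {{"Delta"}, {null}, {"United"}}),
    // Swap missing names for a placeholder so they sort to the top of a slicer
    Filled = Table.ReplaceValue(Source, null, "(blank)", Replacer.ReplaceValue, {"AirlineName"})
in
    Filled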
Dude I can't tell you how much your videos have helped me. I inherited a mess of a database in my new position and had no one to really learn from. You rock at teaching
That's awesome to hear. Thanks for watching! 👊
What I like is how the star schema changes how you look at your data. It organizes your thoughts. Segments your perspective.
So thank you for this video. There was not too much talking, only the amount needed to support the steps. I love how you broke the task down into easy-to-follow steps and explained why it was done that way. 🙂
Dude, you just saved my life! I was looking for it and all the results I got were like "how to replace ID with the name", and I wanted the opposite! You just got a subscriber from Brazil! Great content! Cheers!
Excellent video! Exactly what I needed with no unnecessary filler. As a budding data engineer, this was a huge help! You are both a scholar and a gentleman! :-D
Thanks for confirming I wasn't that nuts when doing exactly this. The merge step can take a LONG time for bigger tables though. Nice video!
Indeed
That is why I doubt whether this technique is really useful. I see huge cons (long merging in PQ) and tiny pros (a little more usability).
If we speak about a small model, there should not be any noticeable difference in performance, but if we consider a big one, the merging in PQ would be painful.
So why should we do all of that and pay more than we receive? Or when does this approach really matter and help?
@@hukumka2601 I have the same struggle currently, as I need to decide which approach to take for generating relationship keys (creating integers vs concatenated keys). I like the integer approach; however, it slows down refresh time significantly. Luckily I have a Premium PBI capacity, so I am considering moving most of the data transformations to dataflows.
I’m just learning Power BI and your videos are so helpful and fun to watch! Thank you so much!
Hi Patrick! I had to create IDs and I did a very similar process, but instead of 'right click -> add as a new query', I duplicated the entire table ('right click on main table -> duplicate') and from there performed exactly the same steps as you. What is the difference? Thanks for creating such great videos!
Thanks Patrick, when comparing this method with using DAX to combine values, is there a performance difference?
I can’t thank you enough for this video! I think this will solve our problem at work! 🙌🏻🙌🏻
Awesome! 👊
First of all, mapping tables are cool:)
When your end user is someone who knows how to use PBI, this method may come in handy to clean up the main table. However, most people at the end of the chain probably just know the front end, and the only things they will change are the filters. Therefore there is no need to create an artificial (in this particular example) mapping table.
Nevertheless great video, I really enjoy your content
I get what you say. But will this improve performance or the size of the data? Would that be a valid reason to create this kind of table?
@@impala4641 I find star schemas useful for speeding up report performance; however, when you need to build your star schema from your fact table, this can really reduce refresh performance, so you need to balance these two points. If you're scheduling refreshes you might be able to offload the refresh cost to off-peak hours. Win win.
@@impala4641 it’ll reduce the RAM required to hold the data model, and also make some DAX calculations easier. Power BI is designed to use Star Schemas.
Thanks dude. I didn't know about this method which doesn't use the "duplicate function". Much easier!
Love it! 👊
You can't even imagine how much I learn watching your videos, a thousand thanks for your great job
Your instructions are certainly stepping stones towards becoming "a big deal", keep 'em comin'
BAM! 👊 Thanks Alexius!
Hey Patrick, your videos are just awesome. Thank you so much for such easy to understand and accurate explanations ! Great Job
Appreciate that! Thanks for watching 👊
This is great! Thank you so much! You guys make this fun to learn. Keep up the good work!
Thanks, I actually used this yesterday and your steps worked like a champ, YA HIGH FIVE! Thanks for explaining what trim does too, that tool is very helpful. Going to put my gloves back on, clean clean clean data haha good thing I have a janitorial degree from the Corp. haha
Awesome!! I am going to try this method! Thank you for walking through it.
Nice video. I like these quick and useful data-wrangling type videos. Please keep them up.
I have been searching for you. Great video🥰🥰 thanks a lot
Patrick, very cool video. Normally you receive multiple tables and have to do something with them; now you give an example of receiving one big (= wide) table. Thank you for this interesting example.
Again, I came back here to refresh on the ideas you shared, Patrick. It really helped me a lot with my stuff! How about using this method with an Import-mode connection when the data is updated. Will the other table also be updated together with your keys? Thanks a lot man!
Brilliant way to generate look up tables. Thank you
Thanks man, one hour looking for this
Thanks! Excellent advice at 6:45!
That is seriously good stuff! I've been thinking about something similar and now I have the ultimate solution to make this work. My only more-burning question at the moment is how do I get one of those Power BI coffee mugs... lusting after that!
This is incredibly helpful. Thank you so much!
Happy to help. Thanks for watching! 👊
Thanks, Patrick. Good stuff as always!
Thank you! 👊
I use this method all the time, works very well! Cool to see you guys use the same methods!
What's the best practice for troubleshooting the data once you've broken everything out?
For example, if you need to sift through that fact table by airline name, it becomes rather tedious to go back and forth between the tables matching keys. Worse yet, if you have multiple dimensions that are filtering the fact table, it can be difficult to identify the proper keys to look through the fact table.
If the source is a relational database, this could be done in the database, but in this situation, the source is a CSV or other file, so that type of out-of-Power BI querying is not possible.
Thanks!
I usually create a table visual in report space with the columns I need, and just add some slicers for the dimensions I care about QCing. Then I browse the data in report space rather than in query space.
This was so helpful! Now I’m trying to add more columns from my two fact tables to the new tables 😅 without my PK’s yet.
Yep. All tallies with what I do ! Thanks for confirming !
Thanks for watching! 👊
Good Stuff Patrick!!
Appreciate that Nelson! 👊
Come back to the UK Nelson 😊
Wolfstar eheh I will eventually! But for now I’m enjoying this February’s - almost summer weather - in Lisbon :)
Really useful tutorial for messy data. Thanks!
The most beautiful part is that it makes that column disappear from original table!
Hi Patrick, thanks for the video.
I have one question: why don't we use AirlineName directly as the "Key"? We could skip the merge step and it should be faster, shouldn't it? Or am I missing something?
Yeah it was just the example that was used. Definitely different ways you can do it.
Guy in a Cube thanks for confirming! 🤗
Hi Patrick, I love your explanation very much. I'm actually a beginner, please help me with the below: I want to look up one particular product in another table, but that product was booked by two different customers and finally sold to one customer. How do I create a relationship for this from one table to the other?
Nice video, do we have other methods to remove many-to-many relationships in Power BI?
This video helped me a lot, thanks. I was getting low match percentages when merging tables.
Great video! If you get additional data, let’s say, with a new airline, will the refresh process take care of everything? Meaning add the ID to your airline table?
Just what I needed. Thank you
Great one Patrick. I'd add one more step to yours and hide the airline ID in the transactions table.
This was so helpful! Now I’m trying to add more columns from my two fact tables to the new tables 😅 without my PK’s yet and having some difficulty 😢
Thanks for this, now I know how to narrow my fact tables down
I do this also, but instead of a merge I do a transform with my buffered table. So if I have multiple columns, it's one step. I usually do:
TblID[ID]{List.PositionOf(TblID[Element], [ThingToReplace])}
I don't know if the merges would be faster.
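Spelled out in context, that pattern might look something like this (a rough sketch; the table and column names here are made up):

let
    // Buffer the lookup table once so it isn't re-evaluated for every row
    TblID = Table.Buffer(#table({"ID", "Element"}, {{1, "Delta"}, {2, "United"}})),
    Fact  = #table({"Airline"}, {{"United"}, {"Delta"}}),
    // Look up each airline's position in the buffered table and pull its ID
    WithKey = Table.AddColumn(Fact, "AirlineID",
        each TblID[ID]{List.PositionOf(TblID[Element], [Airline])}, Int64.Type)
in
    WithKey

One thing to watch: a value missing from the lookup table errors here (List.PositionOf returns -1), whereas a merge just leaves the unmatched rows null.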
Does having the relationship on ID give better performance than just having it on AirlineName in the airline helper table? Is the performance gain worth the performance overhead you mention for generating the keys? I would (until I saw this video) just have made the link on airline name....
Thanks a ton GuyInACube! Super useful video here :)
Great job, Patrick. This is helpful. I am going to use this technique in my dataflows so that it doesn't slow down the refresh. My question is about CamelCase. I heard (from the Tabular Editor Best Practices Analyzer) that CamelCase is not best practice. Why do people say that and what do you think?
Great video, I do this when I want to split up a column that has multiple values, such as a tags column that would have a list of tags delimited by semi-colon. That way the user can select a single tag and see all matching rows that have that tag.
Question: Why duplicate instead of reference if you are doing multiple columns?
Hi Zoe, as far as I know, you cannot merge referenced queries, only duplicated ones
Thank you Luis! That makes sense
This was perfect. Thank you!!
Cool video, thanks! Do you have a video about caveats of joining on strings? Tnx!
We do not. We should definitely do something about strings. Lots of things to consider.
Came looking for exactly this. Great stuff!!
looking for what?
Surrogate key?
Thank you so much ! This helped me a lot.
Maybe a newbie question, but still. I come from the SAP BW world. How do I ensure a new index is automatically created and a new entry automatically added to this Airline dimension table when a new unique airline name appears in the source data (Excel, CSV, table, ...)?
Can you still use this method if the incoming values for Airline are constantly changing? (e.g., new airlines are regularly being added to your original table)
This video helped me a LOT, thank you so much
Great video Patrick. Such a clean way to create lookup table and join. I have 2 related questions.
1. If I need to join 2 tables on multiple columns (e.g. composite keys), do I create a lookup table with those columns from the 1 side of the 1:N relationship?
2. If I need to join on BETWEEN clause, e.g. table1.date between table2.startdt and table2.enddt, what would be the best approach?
Thanks in advance.
@ANIRBAN PAL, did you solve your two challenges?
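In case it still helps, a hedged M sketch for both questions; every table and column name below is made up, and it assumes queries named Fact, DimCostCenter and DimDates already exist:

let
    // 1) Composite-key merge: pass a list of key columns on each side
    Merged = Table.NestedJoin(Fact, {"CostCenter", "Period"},
                              DimCostCenter, {"CostCenter", "Period"},
                              "Dim", JoinKind.LeftOuter),
    // 2) A BETWEEN-style lookup usually needs a row filter rather than a merge
    WithRange = Table.AddColumn(Merged, "DimDateMatch",
        (f) => Table.SelectRows(DimDates, (d) => f[Date] >= d[StartDt] and f[Date] <= d[EndDt]))
in
    WithRange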
How does this affect performance?
Your report should be faster when interacting with visuals. This is because joins are more performant when using integer data types instead of text strings (especially for bigger models with millions of records). On the negative side, it can make your dataset refreshes slower because of the additional steps needed to create these keys.
I wish that I had watched this video last week! I did this in a much more manual way.
Would be REALLY cool if power query had an automated way of doing this. Right click on column -> "Create Unique Dim Table" and it does this all automatically.
Best explanation ever.
Thanks for watching! 👊
Will a relationship between two integers in PowerBI perform faster than a relationship between two strings? In SQL I would say yes, but for Power BI - I don't know.
Love every your video, huge help to me. Thank you so much!
Thank you so much for the video.
How do I connect to a dynamic folder (the file name changes day to day)? Extracting the data from the new file on refresh keeps failing.
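One common workaround, sketched under the assumption that the daily files land in a single folder and you always want the newest one (the path is a placeholder):

let
    Files  = Folder.Files("C:\Data\DailyDrop"),
    // Sort by last-modified date and take the newest file's contents
    Sorted = Table.Sort(Files, {{"Date modified", Order.Descending}}),
    Newest = Sorted{0}[Content],
    Data   = Csv.Document(Newest)
in
    Data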
Does it work if new rows get added to the dimension table and the fact table? Will the new IDs automatically get mapped?
You must be a mind-reading Jedi. I needed this, as I am currently doing a similar approach through much more convoluted methods of duplicating tables and removing columns to get down to a basic ID table. Your method will save me much time, and my mind and emotional state are very appreciative.

I have a question about how I can get my company's IT department to give me access to view the relationships they have built through our data warehouse. Currently they have many, many tables with similar or identical names for different columns and attributes. It's making me go through a process of trying out different combinations to figure out where they have pulled the data from and what relationships they have built between the two. I relate it to shooting a target with an arrow, in the dark, blindfolded, and with one arm tied behind my back. I'm not lazy, I just want to be efficient and not waste my time with the guessing-game approach I find myself in.
ask them. Develop a relationship with someone in ICT.
Thanks for the video. It was really helpful.
Thank you for giving good information
Hey Patrick, great video! Does the Airline query update the names when new ones are added in your ERP system?
Great video, Patrick!
Great video, thanks for all the help
Hi Patrick, if a new airline is added to the original table, will it be auto-added to the new query?
Thanks
John
Hello all, is there a method of automating the process of merging two tabular models? I am using the manual method in BISM Normalizer.
Hi Patrick! Love the video! Just one question that I couldn't find an answer to: does matching on IDs rather than the airline name improve the performance of the model? Thanks a lot
How do you add the ID back into the fact table if you want to avoid merging (for query load time reasons)?
Yooow! Hi Patrick. You asked if I would do this a different way. Yes I would.
Up to 5:21 I do it just like you. But after that, I would add a custom step with "Table.Buffer()", and I wouldn't "add as new query". I would make a Reference, rename this new query "Airlines", remove the other columns, and do the same as you do up to 6:41.
Ok, ok, calm down, you are thinking "But Daniel, this would make 'a cyclic reference' and this won't work".
So, to resolve this, I make a new Reference to the 1st query, rename it "fData" (or something) and then Merge Queries with "Airlines".
To finish, I would hide my 1st table from my data model.
So let me justify: I would do all this work because if I find out that I forgot to make a replace, I would only need to add that new step in one query. The way you did it, if my file changes just a little bit, I would need to change it twice.
Do you think this would gain or lose processing time? (sorry for possible spelling errors, I don't speak English very well)
Oh, and maybe I would add a new step to Capitalize Each Word with Text.Proper.
@@biexbr I still can't understand what Table.Buffer() does. When I update the list with a new row (i.e. a new airline), the lookup does not update, so after merging those rows become null.
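A rough sketch of the reference-plus-buffer setup described above, with a placeholder file path and column names. One caveat worth stating: Table.Buffer only caches the table within a single query's evaluation; it does not stop separate queries from going back to the source.

// Query "Staging" (load disabled):
let
    Source   = Table.PromoteHeaders(Csv.Document(File.Contents("C:\Data\Claims.csv"))),
    Buffered = Table.Buffer(Source)
in
    Buffered

// Query "Airlines" (created as a Reference to Staging):
let
    Airlines = Table.Distinct(Table.SelectColumns(Staging, {"AirlineName"})),
    WithID   = Table.AddIndexColumn(Airlines, "AirlineID", 1, 1, Int64.Type)
in
    WithID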
How would this work with multiple columns? For example: cost center, cost center mapping and period.
Hey Patrick, I want to use a parameter to filter the Top N output of a matrix using a parameter slicer. Could you please show me how? Thanks
Cool T-Shirt Patrick !!
Thanks. 👊
I really liked the way you subtly took performance into consideration (looking for an int). Is there a visual form of execution plan? What would be the equivalent of a SQL execution plan to use with Power BI? Does one exist?
Very helpful - thank you!
This is a beautiful and relevant video Patrick. I've often found myself thinking about this business case where the dimension data is long text strings and doing joins on such dimensions is fraught with uncertainty at the best of times.

There is one use case I find myself thinking about which the video does not address, so I'm pinging to understand how you have thought about this. Imagine the flat file had different values for "TWA", "Transworld Airlines", "Trans World Airlines". This technique would create a different custom key for each of these entries - in reality, however, these should point to the same key. Therefore, using this technique in Power Query will not cover this particular use case. In my head, the only way to do this is through manual intervention, where the key is inserted through a manual scan of the table to ensure that "TWA", "Transworld Airlines" and "Trans World Airlines" all point to the same key.

Short question - is there a way to reject this "lazy" technique and become more "efficient"??!!!?!
I'm sure you've already found your answer, but creating a transformation table would solve that problem. Power Query's documentation shows how to do that.
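For reference, a hedged sketch of what that could look like using a fuzzy merge with a transformation table; the tables here are made up, and the "From"/"To" column names are the ones Power Query's fuzzy matching expects:

let
    Fact     = #table({"AirlineName"}, {{"TWA"}, {"Trans World Airlines"}, {"Transworld Airlines"}}),
    Airlines = #table({"AirlineName", "AirlineID"}, {{"Trans World Airlines", 1}}),
    // "From"/"To" pairs tell the fuzzy match which spellings mean the same thing
    Synonyms = #table({"From", "To"},
        {{"TWA", "Trans World Airlines"}, {"Transworld Airlines", "Trans World Airlines"}}),
    Merged   = Table.FuzzyNestedJoin(Fact, {"AirlineName"}, Airlines, {"AirlineName"},
        "Airline", JoinKind.LeftOuter, [IgnoreCase = true, TransformationTable = Synonyms])
in
    Merged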
What if we have to consider multiple columns for this approach?
Also, what if we have to use a table as a global filter across different pages built from different tables, i.e. a star schema?
Hey Patrick... what about joining on alphanumeric keys with the type set to "any"?
thanks this helped a lot!
Hi Patrick, can you please do a video on aggregations? I have created an aggregated table using DAX and want to create a dynamic filter for a column not included in the aggregation table.
Spectacular!! Thank you very much!
Appreciate it, thanks! 👊
Very dope video. It really helped me
Is there a way to automate this? I have 40 tables I need to break out of the flat file.
Thanks. In my real world, with no clean data warehouse, the data modelling and joining is the biggest hurdle to using Power BI.
This is beautiful
Excellent, just saved me hours of work!!
That's awesome! 👊
The problem is that with any big dataset this method will make the data refresh terribly slow.
Hey Patrick, what if new airlines get added to the flat table? Will it update the index table? Or do we have to do all this process again?
All the IDs get regenerated with each refresh so new values shouldn't present a problem. Because of this you definitely don't want to take a dependency on ID values in your reports since "US Air" could = 1 today but = 2 tomorrow. In dimensional modeling parlance these are "Surrogate Keys" and should never be exposed to users. It is best practice to hide surrogate ID columns in the model.
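For anyone curious what that regeneration looks like, here is a rough sketch of the pattern in M; the table and column names are placeholders rather than the exact steps from the video:

let
    Fact       = #table({"AirlineName", "Claims"}, {{"Delta", 10}, {"United", 4}, {"Delta", 7}}),
    Airlines   = Table.Distinct(Table.SelectColumns(Fact, {"AirlineName"})),
    // The index is assigned in whatever order the rows arrive, so the same airline
    // can get a different ID on the next refresh - hide it and never report on it
    AirlineDim = Table.AddIndexColumn(Airlines, "AirlineID", 1, 1, Int64.Type),
    // Merge the key back onto the fact table and drop the text column
    Merged     = Table.NestedJoin(Fact, {"AirlineName"}, AirlineDim, {"AirlineName"}, "Dim", JoinKind.LeftOuter),
    WithKey    = Table.ExpandTableColumn(Merged, "Dim", {"AirlineID"}),
    FactOut    = Table.RemoveColumns(WithKey, {"AirlineName"})
in
    FactOut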
@@krishorrocks639 Thanks for commenting on this. I’m new in this field and looking for further clarification.
If I do want to depend on an ID from refresh to refresh, how is this typically done?
What happens if with time a new airline name appears in the source file? Will it be added automatically to the "Airline" table? Or will it result in an airline name without a related key?
Added automatically. PBI will import the data then follow the transformations, one of which will create the new key.
What's the difference between this method and using a reference table? (ie right click / duplicate - then remove all other columns etc..?) - Great vid... as always !
From a data source perspective there isn't any difference between reference and duplicate. Both result in two independent queries that will query all the way back to the data source. When you reference, changes in the referenced query will affect the referencing query. Duplicate is just a one time copy/paste of query steps into a new query.
What if I want to make two columns as my primary key, I mean, instead of doing just Airline Name as a Primary Key, I want Primary Key and Claim Site, both of them as my primary key?
Patrick, I am using your solution but I am facing performance problems when I join (inner join) both tables by the text key column (approx. 1,000,000 rows). Thanks
Nice one Patrick! Quick question: assuming I am using this method with a DW, what happens if new data (an airline, in your case) gets added? Would the index capture the new rows? Cheers.
I second this question :)
@@Elkhamasi your DW should already have an index. But, if you reference the query rather than duplicate, it should be dynamic.