* CORRECTIONS *
At 12:00, a couple of the columns on the Subtask table are oddly named. "Task_Assignment_Datetime" and "Task_Type_Code" should really have been called "Subtask_Assignment_Datetime" and "Subtask_Type_Code", respectively. In addition, one of the assignment datetimes (November 12th at 2:04pm) is unrealistic, as we'd expect it to be later than the creation datetime of the parent Task.
The return of the king (2022)
As a database administrator, I approve of your explanations. I work with tons of data, and I wish every database were normalized and had solid loads for denormalized reporting databases. But in the real world that's rare. When we get feeds from outside and then need to report on them, the process is always riddled with issues. Unfortunately, having 10 different data sources outside of your control and then trying to join them for reporting is why database administration is still a solid job.
Hi, I also aspire to be a database administrator. I currently work in admin for a bank, and I'm in my second year of a BSc IT. I decided to take on this journey because I realized how important IT is in the banking sector and how most processes are being automated. My main goal with pursuing this BSc IT degree is to become a database administrator, and my question to you sir, if you don't mind answering, is: how did you get there? What qualifications did you have to acquire, for example, and what experience or path did you take? I'm 29 years old and I just need clarity on how to get to that position as quickly as possible. My aim is to finish my BSc IT hopefully in record time (3 years), then do a CompTIA data certification to specialize in data, and after that hope for the best in terms of getting the relevant experience.
@snakhokonkebuthelezi2959 To be honest, I got the Database Admin job by working in a support position that required help from the DBA team regularly. I made friends with the team and their boss and asked to be a part of it. He asked me some questions you can't find the answers to on Google, and told me I had some number of hours to think about them and answer. I answered them close enough, so he said I could join the team. Even to this day that boss was the smartest DBA I have ever met; fortunately I learned from him and quickly became an extremely knowledgeable DBA myself, even more so than most of the team.

Often in life, it's about getting into a position where you can talk to a supervisor of the position you want. If you are a reliable, smart person, the best thing to do is get to be friends with the boss you want. I have done exactly this at least 5 times in my life. Who you know often pulls just as much weight as your resume.

I have been a DBA for 15 years. I have worked with DB2, Oracle, SQL Server and many other smaller systems, and learned Unix/Linux along the way. If you can find a support job that works with a DBA team, you can get to know the team and eventually probably be transferred. You can get a job directly, but you will probably have a harder time because so many employers demand experience now. If you can back up, restore, migrate data, copy and alter tables, reindex and change the file storage of the database, as well as write SQL, you can probably get most jobs. There are more in-depth things like mirroring, replication, and so many others, but not every company uses such things.
👍
Hi, is there any way I can reach out to you for advice? I am in 11th grade and I do I.T. Databases are part of our curriculum and I have an interest in database administration. Do you have an Instagram account?
As a junior developer who didn’t have enough knowledge about databases, I stumbled upon this video because I was curious about ‘What is Denormalization?’ Your video introduced me to many new concepts that I now need to contemplate and learn. Thank you very much.
Just wanted to say how much I appreciate your content. You have a really amazing ability to explain complex topics with just the right amount of depth. I love that you use real enough examples that aren't overly contrived. I hope whatever you are doing these days you are being compensated well for and I hope to see you back creating educational content some day.
One project, we deliberately did a 'denormalizing' thing. We had data from instruments that took readings every 5 minutes and recorded this raw data to a table. Then, we had to get an accumulated calculation that basically summed these readings for an entire month, quarter, and year (well, it was more complex than that, but that's the general idea). These 'sums' had to be accessed often for various reasons. So we made a table of 'cached sums' where we saved the results of the 'expensive, time-consuming' calculation. If there was no table entry for a particular instrument, we ran the 'slow' calculation that queried every instrument reading for the past year, saved the result in the 'cached sums' table and returned the result. Next time we needed that particular sum, it would be quickly returned from the 'cached sums' table.
So the 'cached sums' data was, technically speaking, redundant data (the original readings being the 'source data'). We did consider that editing the raw readings causes the 'cached sums' to be invalidated, but we compromised.
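A rough SQL sketch of the cache-aside pattern the commenter describes; the table and column names here are invented for illustration, not taken from their actual schema:

-- Raw source data: one reading per instrument every 5 minutes
CREATE TABLE instrument_reading (
    instrument_id INT           NOT NULL,
    reading_time  TIMESTAMP     NOT NULL,
    reading_value DECIMAL(12,4) NOT NULL,
    PRIMARY KEY (instrument_id, reading_time)
);

-- Redundant 'cached sums' table holding the results of the expensive aggregation
CREATE TABLE cached_sum (
    instrument_id INT           NOT NULL,
    period_start  DATE          NOT NULL,
    period_end    DATE          NOT NULL,
    total_value   DECIMAL(16,4) NOT NULL,
    PRIMARY KEY (instrument_id, period_start, period_end)
);

-- On a cache miss, run the slow aggregation once and save the result for reuse
INSERT INTO cached_sum (instrument_id, period_start, period_end, total_value)
SELECT instrument_id, '2023-01-01', '2023-12-31', SUM(reading_value)
FROM instrument_reading
WHERE instrument_id = 42            -- some particular instrument
  AND reading_time >= '2023-01-01'
  AND reading_time <  '2024-01-01'
GROUP BY instrument_id;

Any later edit to the raw readings then has to delete or recompute the matching cached_sum row, which is exactly the trade-off the commenter mentions.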
That was a very nice example with the 'UnitPrice'. Ultimately it's about knowing your domain and distinguishing what Uncle Bob calls 'critical business rules' from something rather temporary and not so 'critical'.
You have THE BEST database videos. Like... Ever.
Thanks!
I know I'm watching this a year later, but DUDE! Great video content! I hope you make many, many more videos!
Love the content! Helped me get a job. Great explanations and practical examples! Please upload more - I think this channel could be huge over the long run!
I misunderstood the topic; I thought it was about 'reverse-engineering' a given (older) database. But it was great to watch. Vocabulary is everything! Thank you so much for your efforts!!! ❤️❤️❤️
This is one of the simplest explanations of a complicated topic. Good work, Decomplexify! Living up to the name.
I should be paying my tuition to you instead, but all I can thank you with in this instance is a thank you and a like button. Thank you.
Your channel is solid and informative! Please keep making content!
Amazing explanation covering all the denormalization facets. Congrats
Thank you so much, your content is nothing less than gold to me for my learning curiosity.
Great video - thanks! It would be great to see a video on how to load denormalized data into a normalized DB.
Please keep making videos! They are so good!
Excellent explanation! I'd like to see a video on "Views" in relational databases.
Always happy to see new content from you!
Superb examples and explanation in practical terms!
Hi @Decomplexify, would you cover how relational databases are redesigned into non-relational ones? This is typically done during system/product migration to cloud.
Do you mean moving access to dataverse and SharePoint?
And PowerPages etc?
@@ThomasBaxterSoutar I'm presuming stuff like NoSQL databases such as Couchbase, MongoDB, and Redis.
Thank you for your amazing videos !
Excellent explanation
Thanks for making this. I think I requested this on your normalization video - appreciate it!
Thank you SO MUCH for such neat, organized and really SIMPLE to understand explanations! Not just this but I learned a lot from most of your videos. Keep up the great work and thanks a lot (Y)
great video. Suggestion for a new video: indexes. What they are, how they work, and their importance to efficient queries.
This Dude is a Fantastic Teacher.
All of his Videos I've loved.
Learnt so much in 5 videos...wow!
Off to implement it in Microsoft Access in Office 365.
You gotta download Access though... it doesn't run in the cloud.
PowerApps you create will though...
Use your learning ;)
Can't believe this series of tutorials has such low uptake!
I've been trying to untangle a real-world problem described here (huge user base) recently...
You provided my solution.
I can't implement it, but I understand it.
I'm the victim of multiple legacy RDBMSes
where no one has the SOURCE CODE...
only the FRONT END...
Frustrating but solvable.
Reverse engineering?
Just in time for my exam, thank you
New decomplexify video!! Great stuff
I love these videos from Decomplexify!
I'd like to see some videos on various data model types and when to use them on structured or unstructured data (e.g. relational, dimensional, object, etc.)
I found the discussion on denormalization for performance very interesting and sad at the same time. It's more of a social/marketing issue to me than a technical one. I mean, we try to optimize queries (which completely contradicts the very reason for which they were envisioned by Codd) instead of demanding from vendors that better optimization happen in the physical (or what you called the Processing) layer. Since a relational query language is a description of the expected form/structure of the data and not the steps to get there, it should not matter if the language lets you express it in multiple ways: if these are equivalent, we should expect the same performance. But I do know that in practice we as consumers tend to have no choice other than to adapt and stay bound to vendors, while finding our own hacks for improving SQL performance from the logical layer, leading us to unwanted things such as denormalization for performance.
Yes, I entirely agree with that!
Of course it's helpful. Great content thank you.
wow thanks my friend
Hi, could you please create a video on Data Modelling. Your videos are so awesome!!
“Successful people make money. It’s not that people who make money become successful, but that successful people attract money. They bring success to what they do.” - Wayne Dyer..
In any case, I'm of the opinion that if you are modelling data using the relational data model and databases that are SQL-based (and therefore do not support the relational data model properly), even if for whatever reason one decides to denormalize a relation, one should be completely aware of the decision and its implications. Ideally, the deviations of the flawed model from the ideal one should be documented and the reasons given, so that at least it becomes easier to interpret the data and build data products on top of it.
100% this is the way. My approach is always strive for 5th normal form first, and then work backwards and denormalize from there if I see a benefit. It's common in high-read/low-write tables. I document my reasoning and test any DML procedures - it then becomes important to enforce that the proven, transactional procedures are the only interface that developers have for DML in order to avoid corruption.
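For anyone wondering what "the proven, transactional procedures are the only interface developers have for DML" can look like in practice, here is a minimal sketch in SQL Server-style T-SQL; the table, procedure, and role names are invented for illustration:

-- Developers may read the denormalized summary table but never write to it directly
GRANT SELECT ON dbo.Order_Summary TO app_developers;
DENY INSERT, UPDATE, DELETE ON dbo.Order_Summary TO app_developers;
GO

-- All writes go through a tested procedure that keeps the denormalized total
-- in sync with the normalized order lines inside a single transaction
CREATE PROCEDURE dbo.usp_Add_Order_Line
    @OrderId INT, @ItemId INT, @Qty INT, @UnitPrice DECIMAL(10,2)
AS
BEGIN
    BEGIN TRANSACTION;
        INSERT INTO dbo.Order_Line (Order_Id, Item_Id, Qty, Unit_Price)
        VALUES (@OrderId, @ItemId, @Qty, @UnitPrice);

        UPDATE dbo.Order_Summary
        SET Order_Total = Order_Total + (@Qty * @UnitPrice)
        WHERE Order_Id = @OrderId;
    COMMIT TRANSACTION;
END;
GO

GRANT EXECUTE ON dbo.usp_Add_Order_Line TO app_developers;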
Very happy about that video!
Very helpful video
Thanks for the good content.
I have my issues with the first example.
To me this is a table that documents things that happened in the real world as they happened.
That often changes the context.
The price is no longer a property of the item; it actually refers to the transaction itself. Sure, there is still a price table with an item-to-price relationship, but the price actually paid is dependent on the transaction, even if it is derived in some way from the item price. There can be a typing mistake, a calculation error, a discount out of goodwill… none of which can be reproduced from other tables with joins.
Do you get my point?
You're welcome! Regarding the first example, I think I covered this at 5:54, where it says we can imagine changes being made to the conventions around pricing, e.g. the conventions change in such a way as to support volume discounts, tiered pricing based on membership level, etc. The denormalized design anticipates the possibility of these kinds of changes, and continues to work if such changes get introduced.
@@decomplexify OK, I get your point. But I was thinking: if the relationship is to the transaction (the PK) and not to the item, isn't it in 3NF?
I'm not 100% sure, because the item price is surely the source, but in my opinion it's no longer related from the moment it becomes about real-world documentation.
And it's not about a future decision on how pricing works. To me it changes from the moment it becomes documentation of what happened.
When there is a calculation mistake and I actually paid $10 instead of 2 × $5.01 = $10.02,
then I have to log the actual booked transaction values, not the theoretically correct values obtained by joining…
Do you get what I‘m about?
@@thorstenl.4928 Sure, I understand exactly what you're saying, and when it comes to many transaction tables that I've encountered in real life, I would agree with you. In many cases, a transaction table constitutes your only source of truth about "what really happened". But that's not true in all cases.
For the purposes of my example from the video, I'm going to stipulate that "what really happens" is that when a player decides to purchase a certain quantity of a certain type of item, the system looks up the item type in the Item_Type_Daily_Prices table where Date = today, so as to obtain the Unit Price. The system then checks to see whether the player has enough game currency to go through with the purchase. If so, the system debits the appropriate amount of game currency from the player's balance, and adds the purchased items to the player's inventory. All of this constitutes "what really happened".
Only once all this has been done does the system write out an Item_Purchase_Transaction, for informational purposes.
If the Item_Purchase_Transaction is written out wrongly, for example if it's written out with a Unit Price that differs from the price that was actually used as the basis for debiting the player's balance, then, with regard to the price that really was charged, the Item_Purchase_Transaction will fail to reflect "what really happened".
This can be mitigated by normalizing Item_Purchase_Transaction, i.e. removing the Unit_Price column. If you don't explicitly record the Unit_Price on the transaction, then the transaction can never be in conflict with the unit price that really was charged.
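A sketch of that normalized arrangement in SQL; beyond Unit_Price, Item_Type, Date and the two table names mentioned above, the column names are assumptions made for illustration:

-- Normalized: the transaction row carries no Unit_Price of its own
CREATE TABLE Item_Purchase_Transaction (
    Transaction_Id INT         NOT NULL PRIMARY KEY,
    Player_Id      INT         NOT NULL,
    Item_Type      VARCHAR(30) NOT NULL,
    Quantity       INT         NOT NULL,
    Purchase_Date  DATE        NOT NULL
);

-- The price that was actually charged is recovered by joining back to the daily prices
SELECT t.Transaction_Id,
       t.Quantity,
       p.Unit_Price,
       t.Quantity * p.Unit_Price AS Amount_Charged
FROM Item_Purchase_Transaction t
JOIN Item_Type_Daily_Prices p
  ON p.Item_Type = t.Item_Type
 AND p.Date = t.Purchase_Date;

With nothing stored on the transaction itself, there is nothing that could ever contradict the price that was really used to debit the player's balance.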
@@decomplexify I get you.
Thanks for your effort. ;-)
Great Channel by the way. Keep going…
Keep it up mate
Aargh, I hate it when Data Warehousing is called Data Mart! Nevertheless, a fantastic video. I was thinking the whole time that dimensional modelling was a perfect example of denormalisation.
Great video!
Thanks!
The fact tables in dimensional models are in 3NF. Only the dimensions are in 2NF, or 3NF if snowflaked.
do you have a video on subqueries?
can you make one video on indexing as well
Banger video, i love it!
Whoa ... I no sooner watch your normalization video and get suitably impressed than I am confronted with de-normalization. Does that put me back to the point I was at before I watched the normalization video? ;-)
He is back
Is sorting a table denormalization? It seems to violate the first normal form requirement that rows not be ordered. It also seems like it would improve read performance at the cost of update, insert, and delete performance. Wouldn't a sorted index be a better option than a sorted table though?
So, just to be clear, in properly relational database platforms, there is no such thing as "row order" within a table. And so there's no such thing as a sorted table. A table belongs to what I was calling the Logic Layer: a table is a logical abstraction that corresponds roughly to a mathematical object called a relation. And like a relation, it doesn't have a row order; it's simply a "set" of rows.
But the physical implementation of the table is a different story. It belongs to what I was calling the Processing Layer. And it certainly can consist of sorted records. (Clustered indexes, etc.) Because this ordering of records is in the Processing Layer and doesn't show through to the Logic Layer, it doesn't violate First Normal Form.
In the Processing Layer we're concerned with performance and efficiency, and we can make whatever tweaks are necessary to that end, but none of those tweaks will affect how the table (the logical abstraction within the Logic Layer) looks. Performance is a big topic and yes, some decisions will improve read performance at the expense of other operations, but really it depends on the specifics of the queries that are going to be run against the database.
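A small illustration of that split, in SQL Server syntax; the table and index names are hypothetical:

-- Logic Layer: declaring the table says nothing about row order; it's just a set of rows
CREATE TABLE Player_Login (
    Player_Id  INT      NOT NULL,
    Login_Time DATETIME NOT NULL,
    CONSTRAINT PK_Player_Login PRIMARY KEY NONCLUSTERED (Player_Id, Login_Time)
);

-- Processing Layer: physically store the records sorted by login time.
-- A storage/performance decision only; the table still has no logical row order,
-- so First Normal Form is untouched.
CREATE CLUSTERED INDEX IX_Player_Login_Time ON Player_Login (Login_Time);

-- Any ordering a query needs must still be requested explicitly
SELECT Player_Id, Login_Time
FROM Player_Login
ORDER BY Login_Time;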
@@decomplexify To paraphrase what you wrote with my limited academic understanding:
The rows of a table aren't stored in computer memory addresses in any specific order. Tables are a temporary construct of the rows after reading from memory where the data is most likely fragmented and only ordered by chance. Row order in a table emerges through processing and arranging of the rows as they are read from memory, using an ORDER BY clause for example. Thus, no table is required to ensure rows are unordered, rows are already unordered as a result of the way typical computer storage media operates?
@@user-fk8zw5js2p Not quite, no! You need to get away from the idea that physical implementation ("computer memory addresses", "typical computer storage media") matters in any way. For the moment, just forget about computer memory addresses and typical computer storage media. They're irrelevant to what a table is, and irrelevant to what properties tables have (like the property of having unordered rows).
If there are four people in a family - Jack, Jill, Cindy, and Dan - then we can represent the whole family as a set: {Jack, Jill, Cindy, Dan}. Of course, I could just as easily have expressed the set as {Dan, Cindy, Jill, Jack}: it expresses exactly the same thing as {Jack, Jill, Cindy, Dan}. When you describe a set, it doesn't matter what order you list the elements in.
Same with a table. I can have a table with one column: FAMILY_MEMBER. The rows are going to be:
Jill
Dan
Jack
Cindy
I listed the rows for you in an arbitrary order; I could just as easily have listed them in a different order. A table is a set - a set of rows. And sets have no order to their elements.
In describing the properties of relational database tables like I've just done, I haven't said anything about computer storage or memory addresses. Computer storage and memory addresses simply aren't relevant to the topic.
@@decomplexify That makes sense. The rows of a table can be ordered any way anybody wants, any number of times, but that doesn't make them stop being a set.
I'm guessing that with a modern DBMS, creating/altering a table so that it's not in first normal form requires extra effort, if the system even allows it. I understand that it's important to know what 1NF is and how it relates to the other normal forms. But it seems to me that most of the requirements for 1NF are built into the DBMS, so I'm a little disappointed that there isn't anything I would have to do to get a table to 1NF, other than making sure values are atomic.
@@user-fk8zw5js2p One important thing that would violate 1NF, despite being permitted by all RDBMS platforms, is creating a table without a primary key on it. I go into the 1NF criteria in detail in my Learn Database Normalization video.
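To make that concrete, here's the kind of thing every RDBMS will happily accept even though (on the view taken in the video) it violates 1NF, together with the fix; the table is hypothetical:

-- Accepted by the DBMS, but with no key nothing prevents exact duplicate rows
CREATE TABLE player_score (
    player_name VARCHAR(50) NOT NULL,
    score       INT         NOT NULL
);

-- Bringing it in line with 1NF: identify a key and declare it
ALTER TABLE player_score
    ADD CONSTRAINT pk_player_score PRIMARY KEY (player_name);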
keep going bro
@Decomplexify How would this example translate into a NoSQL design? I personally would use a graph DB and a high-level language to manage the data. I always use two copies of the same graph, so the original DB graph data doesn't get manipulated. Since I'm a beginner working on small projects, it is easy to manage, but I'm not sure how to manage it for really large graphs.
Fun fact...
All modern RDBMSs (Oracle, SQL Server, SAP HANA) have graph execution engines...
NoSQL / non-relational databases only make sense for data lakes... where you have unstructured data...
Otherwise, an RDBMS with in-memory columnstore tables is the modern option.
Thanks!
Bro came bro conquered, bro left
I clicked "Like" not for content, only for the headline.
Would the so-called advantage of a denormalized relation (which I understand in this case amounts to being robust against possible future states of the data model) persist if the original data model were actually in C. J. Date's 6NF? I mean, the price is initially dependent on the date, but in the future it could be dependent on a timestamp, and later on a timestamp plus a state of the user, given one or multiple business rules. So denormalization sounds like you are actually trying to normalize against future changes to the data model, and in a way I believe that's the reason to normalize up to 6NF (I know the discussion is not so simple if you also read the work of David McGoveran). But I also understand that if you are actually collecting the data, you want it normalized even if in the future you have to change your model. On the other hand, maybe if you are maintaining a data warehouse, since you are not writing, updating or deleting, having denormalized relations could be a bit less dangerous, although I personally think it makes the interpretation of the data (I mean its meaning or semantics) harder, especially considering that proper documentation of the data model and its relation to the data collection mechanisms and the reality it tries to represent is often nonexistent. Anyway, nice discussion, and thanks for the video!
Hi Victor, glad you liked the video, and thanks for your question. In the normalized design shown in the video, the Item_Type_Daily_Prices table actually is in 6NF. So 6NF isn't really helping us here with the issue I've raised, i.e. that we'd have to change our model if the conventions around pricing changed. In many respects, 6NF ensures flexibility. But not in this particular respect!
@@decomplexify I see. I've just finished reading C. J. Date's book "Database Design and Relational Theory: Normal Forms and All That Jazz", so now it's much clearer to me that normalisation and orthogonality can only reduce redundancy, not entirely eliminate it. The issue is much more complex than I initially thought. Even the definition of redundancy given might not be ideal and seems to need further research. For instance, the use of quantifiers in predicates as an element in the existence of redundancy in a design is still a bit confusing in my mind. I really wish there were a study group on this, but oddly people are in general resistant to these discussions and just dismiss them as not being practical, similar to the reasons people go the easy "non-structured" ways and say that the relational model does not perform well.
At 4:40 - you said that there is no source for the data. Why can't you just make one? For example, something like this:
SELECT
Purchase_Date,
Item_Type,
MAX(Unit_Price) AS Price,
COUNT(DISTINCT(Unit_Price)) AS duplicateCheck
FROM @Item_Purchase_Transaction
GROUP BY Purchase_Date, Item_Type
HAVING COUNT(DISTINCT(Unit_Price)) = 1
Assuming you want to keep the functionality of one price per item/day combination, this query gives you a source you can use to populate a normalized table. You could also remove the 'HAVING' clause - the historical data is already wrong, so it might as well be wrong in a way that suits a normalized approach.
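For instance, wrapping that query in an INSERT would load the normalized price table; assuming (as a hypothetical) that the price table has the columns Date, Item_Type and Unit_Price, and querying the base table rather than the table variable, something like:

-- Build the one-price-per-item-per-day table from the historical transactions
INSERT INTO Item_Type_Daily_Prices (Date, Item_Type, Unit_Price)
SELECT Purchase_Date,
       Item_Type,
       MAX(Unit_Price)
FROM Item_Purchase_Transaction
GROUP BY Purchase_Date, Item_Type
HAVING COUNT(DISTINCT Unit_Price) = 1;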
Absolutely, you could do what you're describing here. The question is: should you?
From my experience, as the receiving system, trying to "clean up" anomalous data from the source system often creates more problems than it solves. In a lot of cases, it's better to faithfully reflect what came through from the source system, even at the risk of faithfully reflecting unclean data.
You can see how this applies to Item Purchase Transactions. If your goal is to ensure that the receiving system's database is fully normalized, you're going to have to design the receiving system's Item_Purchase_Transaction table in such a way that it omits the Unit_Price column. The idea here is that Unit_Price will instead be looked up in our Item_Type_Daily_Prices table, which is populated by the SQL you included in your comment.
How will this play out? Most likely, everything's going to work fine for a while. It's very plausible that the source system might adhere 100% to the rule that every Item Type has just one Unit Price on a given day. Those prices will duly make their way into our Item_Type_Daily_Prices table. Everything will work as it should. The source system will continue adhering to this rule for many months - until one day, it stops adhering to the rule!
Realistically, this could happen if the business users of the source system decide one day that the rule no longer suits them: they want to be more flexible in setting their prices. It's at this point - the point at which they abandon the rule - that the receiving system's Item_Type_Daily_Prices table becomes useless. COUNT(DISTINCT(Unit_Price)) > 1 all over the place and you end up with lots of transactions for which you've failed to capture the Unit Price (or if, as you mentioned at the end of your comment, you've omitted the HAVING clause, then you end up with lots of transactions for which you are recording the WRONG Unit Price, which is a very serious matter). You have a mess in the receiving system. And yet it's at exactly this point that the source system's Item_Purchase_Transaction table can be said to be normalized! (As there is no longer any rule about every Item Type having just one Unit Price on a given day.)
This is what I mean about normalization in the receiving system potentially creating more problems than it solves. It is much better to treat the source's Item_Purchase_Transaction table as gospel, and to be wary about the "rule" as to item type daily prices - as, even if it applies today, this rule could easily be abandoned tomorrow.
@@decomplexify Thanks for the prompt and detailed response. I think I understand where I went wrong here - I've assumed that our database imports data from the old database once, and then runs independently. If I read your response correctly, your intention is that the source and destination databases run concurrently? In which case, yes, I can see that the tradeoff of correctness vs normalization definitely favors the former.
If you don't mind, I'll keep the comment up specifically as a response to the claim "We wouldn't have any source from which to load it".
@@DDEGStakeholderManagement No problem - yes, what I've been describing is an interface from a source database to a target database. The data in the source database is constantly changing. Typically these types of interfaces bring the data across from source to target on a regular basis (like once a day) - so-called batch processing - or else they bring it across in real time or near real time. Either way, the goal is to make the target database reflect the latest state (or at least the state as at last night or whatever) of the data in the source database.
This all falls into the category of making different computer systems talk to each other. One example of this would be interfacing data from an end-user system (or multiple end-user systems) into a Data Warehouse, which has the job of gathering the data together and putting it into a form that's convenient for reporting off - e.g. tracking sales over time, things like that.