Want to improve your database design skills? Get my Database Design project Guides here (diagrams, explanations, and SQL scripts): www.databasestar.com/dbdesign/?
Done Thanks for making these keep it up! Mistakes to avoid when designing db: 1. Use a separate field as primary key (not a business relevant field or a field that you already have as an attribute because that might change, or not be unique). You can still enforce constraints on other fields such as making them unique without using them as pk 2. Don’t store redundant data (store date of birth only instead of age and date of birth) and calculate the age from dates 3. Maintain referential integrity by implementing constraints (primary key, foreign key, unique, not null, check) 4. Optional columns should be moved to another table (for example customer phones, home phone, work phone etc) should be moved out of the customer table and into a customer_phone table which references customer id and then stores different phones for each customer with different phone types. Phone typed can have their own table and then the customer_phone can reference the phone_type_id
You should also add #5, which is about choosing the correct datatype for the data. For example date in 'datetime' instead of text, and name in 'varchar(200)' instead of 'varchar(2000)'.
Excellent video. Let me offer another potential problem to avoid (in my opinion). Postgresql allows you to name tables and columns (and perhaps other identifiers) using reserved words and words that are used in other contexts. So: -- If you name your table: integer (non-reserved word), Postgresql will want the name in quotes when you refer to it -- If you name your column: inner (reserved word), Postgresql will want the name in quotes when you refer to it I'm guessing you stand with me on avoiding this situation at all costs. Not the least because it can be confusing a year down the road when you have to maintain your code. Do not stop making your excellent videos and blog posts, please!
Hi, I have been working with spare parts databases for years, and I agree with you that you can not use the Model or part number as a key when we started to create the database we made the same mistake until we found that a spare part from one company had the same number as a bearing form another company,. Do you know how many items had the number 9 as a part number? for example, a toaster from one company had the same number as another toaster from the same company, what a mess! And as you said, never store a dynamic value like age in a database. or print the spares count like PART XXXXXX U-BOLT, COUNT 371. That should only be a number in the database with a setting for reorder point..
Nicely done! I remember learning these principles years ago when I read Ralph Kimball's books on data warehousing. They still ring true - the books and the principles. ✅
I think number one is brilliant! I have seen many changes over my life, like phone numbers getting longer, so it's really an intelligent approach in my opinion. Thank you for another great video!
Great insights. Thank you. #6 might have few issues, specially when you doing search and now you have join so many tables. I think it comes under EAV pattern, which has many issue like complex queries, performance, limited indexing etc. for me complex query was issue when you try to build a query for search with many filters.
Thank you so much for this informative presentation. I am a DNP student, totally new to information technology. My professor sent us the link. These videos are making my class whole lot interesting. I do not hate the course anymore. Thanks to your videos.
Thanks for the comment and glad you like the video! Good to hear you don't hate the course anymore! Do you have any other questions or issues on databases, that I could make videos about?
Your lesson is full of good tips for users to appreciate using a relational database program like Microsoft Access more effectively and with pride and joy.
Amazing video. a question about number #6: does it make sense to have a users table and a user_phones table, and instead of having a separate table for phone types, we define an enum column in user_phones table for type of the phone?
Good question. You could have an ENUM for the phone number type, that could work. I recommend using separate lookup tables as they are easier to maintain and you can add extra information to them if needed. I've got a video here on enum vs check constraint vs lookup table: ua-cam.com/video/CkRbgDD7jNM/v-deo.html
Great video thank you! As for storing phone numbers, you make good points but I don't want to worry about several tables storing all these different numbers. I would likely design it so either a customer/client table has 4-5 options for phone numbers and accept the possible blanks where they show up, or make 1 phone number table with 6+ options for numbers and a notes field for any additional instructions. If you're scaling up and dealing with thousands of phone numbers (like within a large company or org) then your method would likely be better.
Age can be set as a non-persistent calculated column. Then it's value is populated at run-time. But that's more of a data warehouse thing than 3NF design.
I have a question. I'm working on a project that needs to store a set of relationships and there can be any number of sets. They want to make multiple drafts (sets) and then make one of them live. So in effect they are all drafts but only one can be live. I thought about making a table just to hold the one "live" set. That table would always have either zero records or one record. I've never seen anybody do that, not sure if that is a good idea. The reason for considering that is because they use the word 'draft' and my goal is to keep the terminology in the real world as close as possible to the database schema to make conversations easier. So I want to avoid having a table "draft" where one has a status of "live".
Good question. That design (a separate table for just the live record) could work, and it could be enforced with a unique constraint on a specific column. However, having the records in one table with a single "live" record could also work as well. If you have records in a single table, you could ensure there is only one live record by storing the ID of that row in another table as a foreign key. Hope that helps.
Thanks for your Wonderful Video of sharing Expertise! As a newbie I'm planning a database but not sure how to reflect the age problem (#2) in Design.. bonus question: how to achieve it in Diagram that for example I take a variable from a config File to do my query? Like if I say calculate the Due date that always is constant (but based on Config Variable x=2 Years) How to draw the diagram for this Matter? Thank you
Glad it was helpful! I think it depends on if you have the config value in your database or in a separate file. If it's in a separate file, you can read this value when you perform the Update or Insert statement. If it's in the database, you would also have to consider that when you Insert or Update the value. I think the design would be the same - your Due Date would be in the table for your record.
At 9:00, how do we handle the case where the customer may have multiple phone types and we want to store all of them? From the diagram shown, I’m under the impression that the customer table contains their customer_id, which allows us to look up the phone_number and phone_type_id in the customer_phone_number table. But imagine we want to add a business_number as well, doesn’t “phone_type_id” make it so that there is only 1 type of number associated with that customer_id? Or are we saying the primary key is the phone number and the customer id is a foreign key? Some clarification here would help. Thanks!
The customer_phone_number table will allow you to have multiple rows for a single customer, so you can store a phone number for the "business" phone_type_id as well as the "personal" phone_type_id. You could also have two records for the same customer and same phone_type_id, which means two business phone numbers for the same customer. There's no primary key on this design, and no unique constraint on the columns. You could (and I probably should have) add a primary key column to this table, something like customer_phone_number_id, but this won't prevent a customer having two phone numbers of the same type. If you want to prevent that, you can add a unique constraint (or a PK) on the (customer_id, phone_type_id) columns.
Could you please make a video on how someone might do an entry level ETL project? Or perhaps if you could offer any insight here. Let us assume that someone "knows" enough SQL and Python to move forward into data engineering, how would this person develop an interesting entry level ETL pipeline project to put on their resume/portfolio? What are the necessary tools at a fundamental level? I know for batch processing I could use something like Apache Airflow.. but that is about it. I really want to start an ETL project but there are surprisingly not that many clear resources on that topic as of yet. Thanks for the videos.
HI, great video, quick question. Why #6 you create 3 tables to solve the problem? why not simply have a phone number table and customer table and one of the columns in the phone table can be the type instead of having a separate table just for the type? isn;t the type table not necessary? I would like to hear your thoughts
Good question! You could do it in 2 tables: one for customer, and one for phone number. You could have a type column in the phone number table that would accept the values you want, such as "Mobile", "Work", "Home", "Work Mobile". However, if there is a defined list, the only way to ensure that the right values can be selected here would be to add a Check Constraint to the column, and then if you're designing an application to edit this data, you would need to define the list in the user interface. If it's a defined list like this, I usually suggest creating a separate table with the possible values (phone_type in this example). This way you can use the FK relationship to ensure the values are only in the list, and can edit the list easily.
Redudant Data vs. Query Performance let's say you would have to a bunch of joins in order to retrieve the data. Is storing the value redundantly a solution in this case?
Yeah that is an option. But having joins isn't necessarily a performance issue. If the tables are small, and if indexes are added, then the query can still perform well.
Can you LATER change varchar(200) into varchar(300) in your API code and database, but still keeping the stored old variables in database, without wiping them when you make the change in length?
@@DatabaseStar example. For Invoice Table you set a default primary key of invoice_id that is autoincrementing and then set an invoice_code that is also autoincremented as the primary key is autoincremented. Id 1 invoice_code 125001 Id 2 invoice_code 125002 ....e.tc
@@DatabaseStar Oracle has a free database modeller, so has SQL server(sql management studio). and with MySQL, it is one built in. And then you have the expensive ones like Toad. But I am sure that thee are many others, like the ones for sqlite
About the phone number, you don’t usually care about too many phones about someone, unless you’re designing a database for a phonebook, usually two fields on the customer for phone more than suffice, make your queries simpler(no joins needed) and with a slightly better performance(that explain can tell you over thousands of records), so putting as dismembering the phone number as multiple tables, as a globally good advice you’re potentially adding unneeded complexity. When you’re modeling a database you need to know the requirements. A generally true good advice is KISS(Keep it Simple Stupid)
That's a good point. However the reason to have a separate phone number is because you can often have multiple phone numbers for a person, and they can have different types (mobile, work mobile, work, home, business, etc). It's not a good design to just have phone1 and phone2 because you don't know what type they are and you're restricted to 2 phoen numbers. Having a lookup table for phone number shouldn't slow down performance, as it won't be a large table, and with indexes the lookups will be quick.
@@DatabaseStar I once worked for a company how did a CRM for pharma sales reps and was following in the "keep it simple" path with several fields. You'll end up with four or five fields for phone numbers and still run into situations where it's not enough. They finally re-did the phone number in a more normalized way and things got just simpler.
Want to improve your database design skills? Get my Database Design project Guides here (diagrams, explanations, and SQL scripts): www.databasestar.com/dbdesign/?
Done Thanks for making these keep it up!
Mistakes to avoid when designing db:
1. Use a separate field as primary key (not a business relevant field or a field that you already have as an attribute because that might change, or not be unique). You can still enforce constraints on other fields such as making them unique without using them as pk
2. Don’t store redundant data (store date of birth only instead of age and date of birth) and calculate the age from dates
3. Maintain referential integrity by implementing constraints (primary key, foreign key, unique, not null, check)
4. Optional columns should be moved to another table (for example customer phones, home phone, work phone etc) should be moved out of the customer table and into a customer_phone table which references customer id and then stores different phones for each customer with different phone types. Phone typed can have their own table and then the customer_phone can reference the phone_type_id
Thanks!
You should also add #5, which is about choosing the correct datatype for the data. For example date in 'datetime' instead of text, and name in 'varchar(200)' instead of 'varchar(2000)'.
thank thank you for not just referring us to some terrible blog for every answer. Very thorough and thoughtful. Great information and educational.
Glad you liked it! I try to make my videos as useful as possible.
Excellent video. Let me offer another potential problem to avoid (in my opinion).
Postgresql allows you to name tables and columns (and perhaps other identifiers) using reserved words and words that are used in other contexts. So:
-- If you name your table: integer (non-reserved word), Postgresql will want the name in quotes when you refer to it
-- If you name your column: inner (reserved word), Postgresql will want the name in quotes when you refer to it
I'm guessing you stand with me on avoiding this situation at all costs. Not the least because it can be confusing a year down the road when you have to maintain your code.
Do not stop making your excellent videos and blog posts, please!
Thanks for the tip! I agree, we should avoid using reserved words for table names, even if the database lets us - for the reasons you mentioned.
Hi, I have been working with spare parts databases for years, and I agree with you that you can not use the Model or part number as a key when we started to create the database we made the same mistake until we found that a spare part from one company had the same number as a bearing form another company,. Do you know how many items had the number 9 as a part number? for example, a toaster from one company had the same number as another toaster from the same company, what a mess!
And as you said, never store a dynamic value like age in a database. or print the spares count like PART XXXXXX U-BOLT, COUNT 371. That should only be a number in the database with a setting for reorder point..
Thanks for sharing! Yes that's a good example of why not to use that as a primary key.
They are all great tips. A good one to remember was Mistake 6. Storing optional types of data in different columns. Thank you for the information!!
Glad it was helpful!
I have been doing database designs for many years - your information is spot on. Thank you for putting this out.
Glad it was helpful!
Nicely done! I remember learning these principles years ago when I read Ralph Kimball's books on data warehousing. They still ring true - the books and the principles. ✅
Thanks! I’ve read one of those books and yeah it’s still quite valid
I think number one is brilliant! I have seen many changes over my life, like phone numbers getting longer, so it's really an intelligent approach in my opinion. Thank you for another great video!
Thanks for the comment, glad you liked the video!
Great insights. Thank you. #6 might have few issues, specially when you doing search and now you have join so many tables. I think it comes under EAV pattern, which has many issue like complex queries, performance, limited indexing etc. for me complex query was issue when you try to build a query for search with many filters.
Good points! I'm glad you like the video.
Thank you so much for this informative presentation. I am a DNP student, totally new to information technology. My professor sent us the link. These videos are making my class whole lot interesting. I do not hate the course anymore. Thanks to your videos.
Thanks for the comment and glad you like the video! Good to hear you don't hate the course anymore! Do you have any other questions or issues on databases, that I could make videos about?
Your lesson is full of good tips for users to appreciate using a relational database program like Microsoft Access more effectively and with pride and joy.
Thanks, I’m glad you find the videos useful.
I'm guilty of step 6; always trying to use columns instead of creating new tables.
Yeah it’s a good one to remember!
Amazing video. a question about number #6: does it make sense to have a users table and a user_phones table, and instead of having a separate table for phone types, we define an enum column in user_phones table for type of the phone?
Good question. You could have an ENUM for the phone number type, that could work. I recommend using separate lookup tables as they are easier to maintain and you can add extra information to them if needed. I've got a video here on enum vs check constraint vs lookup table: ua-cam.com/video/CkRbgDD7jNM/v-deo.html
Great video thank you! As for storing phone numbers, you make good points but I don't want to worry about several tables storing all these different numbers. I would likely design it so either a customer/client table has 4-5 options for phone numbers and accept the possible blanks where they show up, or make 1 phone number table with 6+ options for numbers and a notes field for any additional instructions. If you're scaling up and dealing with thousands of phone numbers (like within a large company or org) then your method would likely be better.
Thanks! That's a good point, and it's good you've made a conscious decision about your preferred approach and are happy with the impacts.
Age can be set as a non-persistent calculated column. Then it's value is populated at run-time. But that's more of a data warehouse thing than 3NF design.
Oh that’s good to know!
Thank you for the good information. I can say that information number 6 was very useful for me. 😀
Glad it was helpful!
I have a question. I'm working on a project that needs to store a set of relationships and there can be any number of sets. They want to make multiple drafts (sets) and then make one of them live. So in effect they are all drafts but only one can be live. I thought about making a table just to hold the one "live" set. That table would always have either zero records or one record. I've never seen anybody do that, not sure if that is a good idea. The reason for considering that is because they use the word 'draft' and my goal is to keep the terminology in the real world as close as possible to the database schema to make conversations easier. So I want to avoid having a table "draft" where one has a status of "live".
Good question. That design (a separate table for just the live record) could work, and it could be enforced with a unique constraint on a specific column.
However, having the records in one table with a single "live" record could also work as well.
If you have records in a single table, you could ensure there is only one live record by storing the ID of that row in another table as a foreign key.
Hope that helps.
@@DatabaseStar Thank you.
Thanks for your Wonderful Video of sharing Expertise! As a newbie I'm planning a database but not sure how to reflect the age problem (#2) in Design.. bonus question: how to achieve it in Diagram that for example I take a variable from a config File to do my query? Like if I say calculate the Due date that always is constant (but based on Config Variable x=2 Years)
How to draw the diagram for this Matter?
Thank you
Glad it was helpful!
I think it depends on if you have the config value in your database or in a separate file.
If it's in a separate file, you can read this value when you perform the Update or Insert statement.
If it's in the database, you would also have to consider that when you Insert or Update the value.
I think the design would be the same - your Due Date would be in the table for your record.
About number 1, can I use composite key of a business id number and customer name instead to make it more unique?
You could do that, but you may have the same issues if the business ID number changes.
This was amazing. Databases are also amazing and powerful tools!
Thanks! Glad you liked it.
Very useful info, thanks! Especially the tip about optional columns 👍
Thanks! Glad it was helpful.
At 9:00, how do we handle the case where the customer may have multiple phone types and we want to store all of them?
From the diagram shown, I’m under the impression that the customer table contains their customer_id, which allows us to look up the phone_number and phone_type_id in the customer_phone_number table.
But imagine we want to add a business_number as well, doesn’t “phone_type_id” make it so that there is only 1 type of number associated with that customer_id?
Or are we saying the primary key is the phone number and the customer id is a foreign key?
Some clarification here would help. Thanks!
The customer_phone_number table will allow you to have multiple rows for a single customer, so you can store a phone number for the "business" phone_type_id as well as the "personal" phone_type_id.
You could also have two records for the same customer and same phone_type_id, which means two business phone numbers for the same customer. There's no primary key on this design, and no unique constraint on the columns.
You could (and I probably should have) add a primary key column to this table, something like customer_phone_number_id, but this won't prevent a customer having two phone numbers of the same type.
If you want to prevent that, you can add a unique constraint (or a PK) on the (customer_id, phone_type_id) columns.
Thanks for sharing all of these mistakes
No worries!
This is a very informative and high quality channel! Thank you!
Thanks, glad you find it useful!
Every second of this video is worth it, I learned a lot 🙏🙏
Thanks!
Could you please make a video on how someone might do an entry level ETL project?
Or perhaps if you could offer any insight here.
Let us assume that someone "knows" enough SQL and Python to move forward into data engineering, how would this person develop an interesting entry level ETL pipeline project to put on their resume/portfolio?
What are the necessary tools at a fundamental level? I know for batch processing I could use something like Apache Airflow.. but that is about it.
I really want to start an ETL project but there are surprisingly not that many clear resources on that topic as of yet.
Thanks for the videos.
Good question. I haven't looked into ETL projects but I can do a video on that soon.
Thank you, The video was very informative especially your example on how to store multiple phone numbers.
Thanks, I’m glad you liked it!
DB structure for a city guide? includes product orders and room booking
What do you mean by a city guide?
Best Channel on UA-cam, new subscriber, keep going 😍
Thanks! Glad you like my channel!
HI, great video, quick question. Why #6 you create 3 tables to solve the problem? why not simply have a phone number table and customer table and one of the columns in the phone table can be the type instead of having a separate table just for the type? isn;t the type table not necessary? I would like to hear your thoughts
Good question! You could do it in 2 tables: one for customer, and one for phone number. You could have a type column in the phone number table that would accept the values you want, such as "Mobile", "Work", "Home", "Work Mobile".
However, if there is a defined list, the only way to ensure that the right values can be selected here would be to add a Check Constraint to the column, and then if you're designing an application to edit this data, you would need to define the list in the user interface.
If it's a defined list like this, I usually suggest creating a separate table with the possible values (phone_type in this example). This way you can use the FK relationship to ensure the values are only in the list, and can edit the list easily.
Redudant Data vs. Query Performance
let's say you would have to a bunch of joins in order to retrieve the data. Is storing the value redundantly a solution in this case?
Yeah that is an option. But having joins isn't necessarily a performance issue. If the tables are small, and if indexes are added, then the query can still perform well.
I came for database tidbits... I stayed for the soothing accent =]
Thanks!
Thank you. That was very informative and useful
Glad it was helpful!
Fantastic video, I learnt alot. Thanks
Glad it was helpful!
Can you LATER change varchar(200) into varchar(300) in your API code and database, but still keeping the stored old variables in database, without wiping them when you make the change in length?
Yes, you can do this. You can run an Alter Table statement to increase the size of a column and the data should be preserved.
Excellent video. Thank you.
Glad you liked it!
can't thank you enough mister .. i really appreciate it 😞❤
You're welcome!
What if you make the primary key as an auto incrementing code number that’s an integer and controlled by another key that’s unique
Making the primary key an auto generating number is a good idea and pretty common. What do you mean by controlled by another?
@@DatabaseStar example. For Invoice Table you set a default primary key of invoice_id that is autoincrementing and then set an invoice_code that is also autoincremented as the primary key is autoincremented.
Id 1 invoice_code 125001
Id 2 invoice_code 125002
....e.tc
Excellent video.
Glad you liked it!
Awesome video and full of so much useful information. Thanks for sharing this 👏👍
Glad you like it!
Thank you
You're welcome
and all tips were equally helpful!
Glad you think so!
Thanks
You’re welcome
Thanks. these videos are so useful
Glad you like them!
Great video, thanks for making it
Thanks, glad you liked it!
Love you for this, thank you.
You're so welcome!
Great!
Thanks!
I thought names with spaces had to be in brackets? or is that only alias names that have spaces?
Yeah names with spaces need to be in brackets or single quotes, off the top of my head it may depend on the database, but the concept is the same.
@@DatabaseStar oh ok, I'm still new. Thanks for the reply!
Doesn't #6 move you to a snowflake design?
Yes, it would be a more snowflake design or data warehouse design. This would work for that, but maybe not for a transactional database.
Very good video! 👏👏👏
Thanks!
thank you
You're welcome
Which soft ware tool do I use to create a Data Base design
I use LucidChart but you can use many different tools. I've got a list on my post here: www.databasestar.com/data-modeling-tools/
@@DatabaseStar Oracle has a free database modeller, so has SQL server(sql management studio). and with MySQL, it is one built in. And then you have the expensive ones like Toad. But I am sure that thee are many others, like the ones for sqlite
I love this channel
Thanks!
About the phone number, you don’t usually care about too many phones about someone, unless you’re designing a database for a phonebook, usually two fields on the customer for phone more than suffice, make your queries simpler(no joins needed) and with a slightly better performance(that explain can tell you over thousands of records), so putting as dismembering the phone number as multiple tables, as a globally good advice you’re potentially adding unneeded complexity.
When you’re modeling a database you need to know the requirements.
A generally true good advice is KISS(Keep it Simple Stupid)
That's a good point. However the reason to have a separate phone number is because you can often have multiple phone numbers for a person, and they can have different types (mobile, work mobile, work, home, business, etc). It's not a good design to just have phone1 and phone2 because you don't know what type they are and you're restricted to 2 phoen numbers.
Having a lookup table for phone number shouldn't slow down performance, as it won't be a large table, and with indexes the lookups will be quick.
@@DatabaseStar I once worked for a company how did a CRM for pharma sales reps and was following in the "keep it simple" path with several fields. You'll end up with four or five fields for phone numbers and still run into situations where it's not enough. They finally re-did the phone number in a more normalized way and things got just simpler.
so number 6 is basically making a proper join table rather than having a messy ass many to many ?
Yeah, that's right!
Genius!
Thanks!
good video
Thanks!
It is very sorry that there us only 4 comments in 13 years fir such a informative video presentation.
Glad you like it! But the video is not 13 years old, it’s only a few weeks old.
Im curious! What kind of accent/dialect is this?
I’m from Australia so it’s an Australian accent!
u should have watcehd how to avoid 7 of the most common yt video making mistakes... :(
Oh why is that?