One of the best SQL project tutorials I have seen so far especially because of the way the tutor explained the different steps followed by great and useful examples 👌 I learned a lot 🙏
Glad you're enjoying our courses! We already have several Python guided projects available on our website at mavenanalytics.io/guided-projects. I would recommend checking out the AirBnB Listing Analysis in Python -- it's a basic project and it's free!
Hi Matthew - Thanks for the note! I reran the SQL query and got 20 again, and I created a PivotTable in Excel and got 23 just like you. When I looked into the discrepancy, I found that the three additional order IDs were 740, 3292 and 4623. I looked at those orders individually, and for each order, the final item ID was NULL. This is something that SQL was ignoring and that Excel was counting as an item. As for which one is correct, I think it could go both ways. This is a data set that we created, but in reality, rows with a NULL item ID could either mean that the item ID is missing but the item was actually purchased, or that the row was created in error and the item wasn't actually purchased. I do think that the former is more likely though! You bring up a great point here that the different tools return different results based on how they handle NULL values. Thanks for catching that and I'm glad the discussion can be documented here in the comments.
Object 2 Q4. How many items were ordered within this date range SELECT COUNT(item_id) from order_details: output: 12097 vs SELECT COUNT(*) FROM order_details; output: 12234 I feel this should be explained further.🤔
Great question! With COUNT(item_id), the query will return the number of non-NULL item_ids in the table. With COUNT(*), the query will return the number of rows in the table. To view the rows that are captured in your second query and not your first query, you can run: SELECT * FROM order_details WHERE item_id IS NULL; You can see that some rows have order_id values, but are missing item_id values. To see an example of this, check out order 161: SELECT * FROM order_details where order_id = 161; I assumed that even when item_id was NULL, because it was captured as a row with a valid order_id, it still counted as an item. If you assume that NULL item_ids shouldn't be counted at all (which is also valid), then you can also choose to ignore those ids. In short, either query can be correct based on your particular assumptions.
Glad to hear that! Here's where you can find all our guided projects, including the baby name analysis one: mavenanalytics.io/guided-projects#projects-list And if you're just looking for the data, you can find it here: mavenanalytics.io/data-playground?search=baby%20names Happy learning!
I think you got this one wrong here 15:21 / 35:04 you were suppose to tell the number ot items that were sold. total number of items which were available at the restaurant in that range. This is thr query I think should have been used. select count(distinct item_id) from order_details;
Based on the question that you're answering, I would say both solutions are correct. In the video, my goal was to find the number of items ordered within the date range (customer A orders 2 burgers, customer B orders 1 burger = 3 items) which is why I did a SELECT COUNT(*) FROM order_details; If your goal is to find the number of unique menu items that were sold at the restaurant within the date range (customer A orders 2 burgers and 1 pasta dish = 2 menu items), your solution of SELECT COUNT(DISTINCT item_id) FROM order_details; would correctly answer that question. Thanks for bringing this up!
@@kundankumar5520 Yep, that's correct. If you take a look at the rows with NULL values by doing a: SELECT * FROM order_details WHERE item_id IS NULL; you'll see that there are many rows where the item_id is NULL, but the rest of the row is filled with values. My assumption here would be that the order did happen, but for some reason the item_id didn't come through. In this case, it's up to you to decide how to handle the NULL values. If your goal is to do a deep dive into the item_ids, you may decide just to ignore the NULL values. In this case, my goal was to find the total number of items ordered, and because those rows with item_ids still seem valid, I chose to include the NULL values.
I'm having trouble joining the data sets. It seems like when I loaded the data into SQL, it made the item_id a STRING and the menu_item_id an INTEGER. Could that be the problem?
@MsGinnieG Yep! You have a couple options at this point. You can either reload the data sets into SQL and make sure both ID columns have the same data type, or when you're joining the tables, you can use the CAST function. For example, SELECT * FROM table1 JOIN table2 ON CAST(table1.item_id AS INT) = table2.menu_item_id. Good luck!
Here's the CREATE statement for the order_details table, where you can see the data types and the item_id field allows for NULL values. CREATE TABLE order_details ( order_details_id SMALLINT NOT NULL, order_id SMALLINT NOT NULL, order_date DATE, order_time TIME, item_id SMALLINT, PRIMARY KEY (order_details_id) );
Correct, you can also look at the item_name instead of the item_id to get the same results. Personally, I like to count IDs rather than names in case there are misspellings. Also, in this case, by using item_id, you can just look at the order_details table. If you use item_name, you would have to join the order_details table with the menu_items table, which takes more computing power. So in the end, while you could use item_name, I would recommend using item_id, which is the simpler option.
hi all, In the second objective last question where we need to find the number of order_ids with more than 12 orders, I am getting 23 as result. Kindly help
Thanks for your question! That is also a valid answer depending on your assumptions. -- Query 1 Solution: 20 SELECT COUNT(*) FROM (SELECT order_id, COUNT(item_id) AS num_items FROM order_details GROUP BY order_id HAVING num_items > 12) AS num_orders; -- Query 2 Solution: 23 SELECT COUNT(*) FROM (SELECT order_id, COUNT(*) AS num_items FROM order_details GROUP BY order_id HAVING num_items > 12) AS num_orders; For query 1, because we're doing a COUNT(item_id), we're counting only non-NULL item_ids. If our assumption is that only non-NULL item_ids are valid, then this is the correct answer. For query 2, because we're doing a COUNT(*), we're counting NULL item_ids as well. This can also be valid. In some rows of this order_details table, the order_id has a value, but the item_id is NULL. If our assumption is that these NULL item_ids should count as items ordered, then this is the correct answer.
In the order details table, the order id provides the unique ID for each order that was placed. For example, order_id 2 represents a single order that was placed at 11:57 that included 5 menu items.
@@Alice-at-Maven So doesn't mean that the total orders made within the date range should equal the total number of single orders placed? If so, shouldn't the answer be 439? Thanks!
@@kylewvs1 You are correct in that the total orders should equal the total number of single orders. However, the answer should still be 5370. If you run the SELECT * FROM order_details; within MySQL Workbench, it'll output the first 1000 rows and the last order_id is 439. However, the full table is larger than that. You can test it out yourself by running SELECT * FROM order_details ORDER BY order_id DESC; and you'll see that the order_ids go up to 5370. Hope that clarifies things!
have to say that every buck I spent in Udemy for Maven Analytics courses is worth it. appreciate these practical guidance. thanks a lot.
You're very welcome!
True that man
Easy to follow and felt guided throughout though I did attempt every question. Amazing content Maven as always!
One of the best SQL project tutorials I have seen so far especially because of the way the tutor explained the different steps followed by great and useful examples 👌
I learned a lot 🙏
Glad you enjoyed it!
Great explanations!!
This type of content is fantastic! Great practice!
Thank you so much for this guided project.😃
You're very welcome!
Really enjoyed this project and completing it!
Glad to hear that and nice work completing the project!
That was an awesome class.
Glad you enjoyed it!
Nice project thank you for providing me nice content.
When will you guys launch a guided project for python analytics . Your courses are awesome
Glad you're enjoying our courses!
We already have several Python guided projects available on our website at mavenanalytics.io/guided-projects.
I would recommend checking out the AirBnB Listing Analysis in Python -- it's a basic project and it's free!
Here's our most recent Python guided project on UA-cam: ua-cam.com/video/fwOU0lfreu8/v-deo.html
I just ran the numbers in EXCEL and POWER BI for Questions - How many orders had more than 12 items - I got 23 not 20.
Hi Matthew - Thanks for the note! I reran the SQL query and got 20 again, and I created a PivotTable in Excel and got 23 just like you.
When I looked into the discrepancy, I found that the three additional order IDs were 740, 3292 and 4623. I looked at those orders individually, and for each order, the final item ID was NULL. This is something that SQL was ignoring and that Excel was counting as an item.
As for which one is correct, I think it could go both ways. This is a data set that we created, but in reality, rows with a NULL item ID could either mean that the item ID is missing but the item was actually purchased, or that the row was created in error and the item wasn't actually purchased. I do think that the former is more likely though!
You bring up a great point here that the different tools return different results based on how they handle NULL values. Thanks for catching that and I'm glad the discussion can be documented here in the comments.
Object 2 Q4. How many items were ordered within this date range
SELECT COUNT(item_id)
from order_details:
output: 12097
vs
SELECT COUNT(*)
FROM order_details;
output: 12234
I feel this should be explained further.🤔
I was JUST looking at this! I believe the top query is the correct format because there are a few null values entered under the item_id column.
Great question!
With COUNT(item_id), the query will return the number of non-NULL item_ids in the table. With COUNT(*), the query will return the number of rows in the table.
To view the rows that are captured in your second query and not your first query, you can run:
SELECT *
FROM order_details
WHERE item_id IS NULL;
You can see that some rows have order_id values, but are missing item_id values. To see an example of this, check out order 161:
SELECT * FROM order_details where order_id = 161;
I assumed that even when item_id was NULL, because it was captured as a row with a valid order_id, it still counted as an item. If you assume that NULL item_ids shouldn't be counted at all (which is also valid), then you can also choose to ignore those ids.
In short, either query can be correct based on your particular assumptions.
Thankyou so much!! loved your style!! easy to follow. Where can i find maven baby name analysis project??
Glad to hear that! Here's where you can find all our guided projects, including the baby name analysis one: mavenanalytics.io/guided-projects#projects-list
And if you're just looking for the data, you can find it here: mavenanalytics.io/data-playground?search=baby%20names
Happy learning!
I think you got this one wrong here 15:21 / 35:04
you were suppose to tell the number ot items that were sold. total number of items which were available at the restaurant in that range. This is thr query I think should have been used.
select count(distinct item_id)
from order_details;
Based on the question that you're answering, I would say both solutions are correct.
In the video, my goal was to find the number of items ordered within the date range (customer A orders 2 burgers, customer B orders 1 burger = 3 items) which is why I did a
SELECT COUNT(*) FROM order_details;
If your goal is to find the number of unique menu items that were sold at the restaurant within the date range (customer A orders 2 burgers and 1 pasta dish = 2 menu items), your solution of
SELECT COUNT(DISTINCT item_id) FROM order_details;
would correctly answer that question. Thanks for bringing this up!
@@Alice-at-Maven Then you are counting the null value as well
@@kundankumar5520 Yep, that's correct. If you take a look at the rows with NULL values by doing a:
SELECT *
FROM order_details
WHERE item_id IS NULL;
you'll see that there are many rows where the item_id is NULL, but the rest of the row is filled with values. My assumption here would be that the order did happen, but for some reason the item_id didn't come through.
In this case, it's up to you to decide how to handle the NULL values. If your goal is to do a deep dive into the item_ids, you may decide just to ignore the NULL values. In this case, my goal was to find the total number of items ordered, and because those rows with item_ids still seem valid, I chose to include the NULL values.
I'm having trouble joining the data sets. It seems like when I loaded the data into SQL, it made the item_id a STRING and the menu_item_id an INTEGER. Could that be the problem?
@MsGinnieG Yep! You have a couple options at this point. You can either reload the data sets into SQL and make sure both ID columns have the same data type, or when you're joining the tables, you can use the CAST function. For example, SELECT * FROM table1 JOIN table2 ON CAST(table1.item_id AS INT) = table2.menu_item_id. Good luck!
I can't get the order_details.csv to import into pgadmin, not sure what datatypes to use, and the NULLs are preventing it. Anyone have any tips?
Here's the CREATE statement for the order_details table, where you can see the data types and the item_id field allows for NULL values.
CREATE TABLE order_details (
order_details_id SMALLINT NOT NULL,
order_id SMALLINT NOT NULL,
order_date DATE,
order_time TIME,
item_id SMALLINT,
PRIMARY KEY (order_details_id)
);
@@Alice-at-Maven That was fast, thanks!
HI,
In objective 2 where we need to find the most ordered items I counted the item name and I am getting the same results,Is it correct or not ?
Correct, you can also look at the item_name instead of the item_id to get the same results.
Personally, I like to count IDs rather than names in case there are misspellings. Also, in this case, by using item_id, you can just look at the order_details table. If you use item_name, you would have to join the order_details table with the menu_items table, which takes more computing power.
So in the end, while you could use item_name, I would recommend using item_id, which is the simpler option.
hi all,
In the second objective last question where we need to find the number of order_ids with more than 12 orders, I am getting 23 as result. Kindly help
Thanks for your question! That is also a valid answer depending on your assumptions.
-- Query 1 Solution: 20
SELECT COUNT(*) FROM
(SELECT order_id, COUNT(item_id) AS num_items
FROM order_details
GROUP BY order_id
HAVING num_items > 12) AS num_orders;
-- Query 2 Solution: 23
SELECT COUNT(*) FROM
(SELECT order_id, COUNT(*) AS num_items
FROM order_details
GROUP BY order_id
HAVING num_items > 12) AS num_orders;
For query 1, because we're doing a COUNT(item_id), we're counting only non-NULL item_ids. If our assumption is that only non-NULL item_ids are valid, then this is the correct answer.
For query 2, because we're doing a COUNT(*), we're counting NULL item_ids as well. This can also be valid. In some rows of this order_details table, the order_id has a value, but the item_id is NULL. If our assumption is that these NULL item_ids should count as items ordered, then this is the correct answer.
do anyone have the code link
I don't understand what does Column Order_ID mean in order_details table can someone explain what's that ?
In the order details table, the order id provides the unique ID for each order that was placed. For example, order_id 2 represents a single order that was placed at 11:57 that included 5 menu items.
@@Alice-at-Maven So doesn't mean that the total orders made within the date range should equal the total number of single orders placed? If so, shouldn't the answer be 439? Thanks!
@@kylewvs1 You are correct in that the total orders should equal the total number of single orders. However, the answer should still be 5370.
If you run the SELECT * FROM order_details; within MySQL Workbench, it'll output the first 1000 rows and the last order_id is 439. However, the full table is larger than that.
You can test it out yourself by running SELECT * FROM order_details ORDER BY order_id DESC; and you'll see that the order_ids go up to 5370. Hope that clarifies things!
@@Alice-at-Maven Yes it does - Thanks!
The vocal fry is very irritating, but trying to finish the project here. Thanks for the material.