stuck so badly with this capstone😶😶
Well done, great work, clearly communicated. My congratulations. I am busy with the same dataset in R and I am having a great time playing with over 5 million observations.
I am concerned that unusual data points (observations) were not dealt with, such as durations of over 24 hours, far beyond what a micromobility service would expect. The dataset doesn't disaggregate the casual riders by pass type or rate, and the same goes for gender, so there is no point in bringing those into play.
That long bar chart comparing the months (a time variable) would give a better sense of the variation as a line chart or scatterplot.
Last but not least, your analysis focuses more on the counts of trips or rides than on their duration. From a business point of view, while the number of customers matters, what matters most is how much they pay, and in this case the proxy variable to measure and analyze that would be the trip duration (the longer you ride, the more you pay). Look: there are approximately 500 observations with a riding duration of less than 1 minute, and counting them distorts the real picture and insights compared with analyzing the time spent. My feeling is that it would be impossible to move from one station to another in such a short time (the stations are scattered), and it would also be pointless to pay for a service like that for just a few seconds.
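For anyone who wants to try the duration idea outside R, here is a minimal sketch in Python. It assumes the trip files have `started_at`, `ended_at`, and `member_casual` columns with timestamps like `2023-01-01 08:00:00`; those names, the format, and the 1 minute / 24 hour cutoffs are assumptions taken from this discussion, so check them against your own files. The sample rows are made up purely to show the call.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed timestamp format and cutoffs; adjust to match your own files.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MIN_RIDE = timedelta(minutes=1)  # shorter rides are treated as noise
MAX_RIDE = timedelta(hours=24)   # longer rides are treated as outliers


def summarize_durations(rows):
    """Return {rider_type: (ride_count, total_minutes, mean_minutes)}.

    Rides shorter than MIN_RIDE or longer than MAX_RIDE are skipped,
    which also drops rows where the end time precedes the start time.
    """
    counts = defaultdict(int)
    minutes = defaultdict(float)
    for row in rows:
        start = datetime.strptime(row["started_at"], TIME_FORMAT)
        end = datetime.strptime(row["ended_at"], TIME_FORMAT)
        duration = end - start
        if duration < MIN_RIDE or duration > MAX_RIDE:
            continue  # implausible ride, leave it out of the summary
        rider = row["member_casual"]
        counts[rider] += 1
        minutes[rider] += duration.total_seconds() / 60
    return {r: (counts[r], minutes[r], minutes[r] / counts[r]) for r in counts}


# Made-up sample rows: one 20-second ride and one 48-hour ride get filtered out.
SAMPLE = """started_at,ended_at,member_casual
2023-01-01 08:00:00,2023-01-01 08:00:20,casual
2023-01-01 09:00:00,2023-01-01 09:30:00,casual
2023-01-01 10:00:00,2023-01-01 10:10:00,member
2023-01-01 11:00:00,2023-01-03 11:00:00,member
"""

summary = summarize_durations(csv.DictReader(io.StringIO(SAMPLE)))
for rider, (count, total, mean) in sorted(summary.items()):
    print(rider, count, round(total, 1), round(mean, 1))
```

To run it on a real file, pass `csv.DictReader(open(path, newline=""))` instead of the sample. Because rows are read one at a time, a full month never has to fit in a spreadsheet.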
I absolutely agree with you, especially with the last point. Every business is focused on how much it is making in relation to the number of clients it has, and on possible ways to improve, which I believe is the reason for this analysis. Hence, analyzing the time duration would have given an idea of how much is earned, as the longer the ride, the more the client is likely to pay.
How is your analysis in R coming along? I am just starting mine in R, and I am a total newbie. I will earnestly need your help with some questions. Can I email you, please?
@@edmondnathan7611 I'm about to work on this case study, but I'm lost and confused and cannot figure out where to begin. How do we connect to share ideas on how to work on this case study? Thank you.
you are so good at this.
Thank you very much, you did an amazing job and explained it very well. I hope you got a call from Google, Amazon, or another big tech company, because honestly you deserve it lol.
Where did you get your information regarding the annual benefits? Perhaps the case study has changed since you took the course, but I could not find any information regarding annual members getting unlimited 45 min rides in the documents provided by Google.
Hi, I am stuck with this dataset. My problem is, how do I combine it all together? Like all the months in just one spreadsheet. Thank you 😢
It's a bad idea to combine everything into a single spreadsheet. Excel can't handle millions of rows (a worksheet is capped at 1,048,576 rows). An alternative is to record a macro while cleaning one dataset, then apply it to the other datasets.
I have been stuck on this case study for over a year. Every time I start, I get stuck.
How do you combine the data for 12 months!? Excel and SQL don't let me do it, because the data is huge. Following the case study document isn't helping: it talks about opening Excel files and CSVs separately (there are only CSVs), storing them in one Excel file, then doing the data cleaning, then uploading to a SQL db (btw MySQL takes CSV files). Dude, the data is large! We can't do it!
So now I have downloaded the 12 months of data as CSV. How do I proceed!!?? Someone please help....
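One possible way to proceed without opening anything in Excel is to stream the monthly CSVs into a single combined CSV, writing the header only once. This is a minimal sketch in Python, assuming all twelve files share the same columns; the function names and the folder pattern in the comment are hypothetical, and the two tiny "months" at the bottom are made-up data just to show the call.

```python
import csv
import glob
import io


def merge_csv(sources, destination):
    """Copy rows from each open CSV source into destination.

    The header is written once; a file whose header differs raises
    ValueError. Returns the number of data rows written.
    """
    writer = csv.writer(destination)
    header = None
    written = 0
    for source in sources:
        reader = csv.reader(source)
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty file, nothing to copy
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError("column mismatch between monthly files")
        for row in reader:
            writer.writerow(row)
            written += 1
    return written


def open_each(paths):
    """Yield one open file at a time so only a single month is open at once."""
    for path in paths:
        with open(path, newline="") as handle:
            yield handle


def merge_files(pattern, out_path):
    """Merge files matching a pattern (hypothetical example: 'trips/*.csv').

    Write out_path outside the input folder so it is not picked up as input.
    """
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w", newline="") as out:
        return merge_csv(open_each(paths), out)


# Made-up in-memory "months", only to demonstrate the merge.
jan = io.StringIO("ride_id,member_casual\nA1,member\nA2,casual\n")
feb = io.StringIO("ride_id,member_casual\nB1,casual\n")
combined = io.StringIO()
total_rows = merge_csv([jan, feb], combined)
print(total_rows)
```

The combined file will still be too big to open in Excel, but it can be loaded into SQL or R, or summarized row by row.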
How do I connect with you?
Thank you for this presentation, it inspires me.
amazing work congratulations!!!
There are 12 datasets. Do we have to prepare them one by one, or join them and then work with the result???
Does anyone know how to merge the 12 months? Is there a way to do so... or is it even the correct approach?
There are multiple ways to make this work. If you are using SQL, you can UNION the monthly tables, or create a new table and INSERT INTO it from each file, and it should work.
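As a rough illustration of the "create a new table and INSERT INTO it" route, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is swapped in here only because it needs no server; the table name `trips`, the helper `load_month`, and the sample rows are assumptions for the example, not anything from the case study documents.

```python
import csv
import io
import sqlite3


def load_month(conn, source, table="trips"):
    """Insert every row of one open CSV source into `table`.

    The table is created from the CSV header on first use, so every
    monthly file is assumed to have the same columns in the same order.
    """
    reader = csv.reader(source)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', reader
    )
    conn.commit()


# Made-up in-memory "months"; with real data, pass open(path, newline="").
conn = sqlite3.connect(":memory:")
load_month(conn, io.StringIO("ride_id,member_casual\nA1,member\nA2,casual\n"))
load_month(conn, io.StringIO("ride_id,member_casual\nB1,casual\n"))

counts = conn.execute(
    "SELECT member_casual, COUNT(*) FROM trips "
    "GROUP BY member_casual ORDER BY member_casual"
).fetchall()
print(counts)
```

Replacing `":memory:"` with a file name keeps the combined table on disk, so the twelve months only need to be loaded once.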
Hi, how do I contact you? I need help with SQL. Do you have a tutoring channel or something? Thanks in advance.
Please attach the link to this dataset (the raw dataset).
Where can I download the complete dataset?
Hi, I just chose this case study. I downloaded the data and I don't know how to separate CSV and XLS. Also, which program is easier to use for cleaning and analysis?
Keep it in CSV only, do the cleaning in Excel, then upload all the data to SQL (or R if you're good at it) and do the analysis.
SQL or R will be better.
Excel crashed on me when I first tried to use it for my analysis
They give the Divvy trip dataset, which does not include price and does not show the gender, name, or age of the rider, which leaves me with a very limited dataset.
Exactly the point
I'm stuck where I can't upload the CSV files because they are too large.
Tip: You can join all of them using Power Query. Put ALL the .csv files in one folder, open a new Excel file, go to the DATA options, find the option to get data from a folder, and select the folder that holds all the files. It'll open a window with all the files selected. Click to load (the first option at the bottom, counting from left to right). Then, when a new pop-up appears, click on *CONNECTION ONLY* (the most important step). After that, you can insert a new pivot table, but select the second option, "from a database", and choose the connection you've just created.
Now you can use the pivot table (without showing all 5.6 million rows) and do your analysis.
Remember that if you want to change a column or create a new one, open the connection you've made and click on *EDIT* to open Power Query and make your changes (these steps aren't taught in this course).
Hope this helps.
Valuable
I have some questions
Congratulations! I have been stuck with this same dataset for two weeks now🤐, I don't know where to start 😑 Please, if there is anyone here who would like to work together on the Cyclistic case study to get insights, I will really appreciate it, as well as any suggestions on how I can go about it. I'm confused, and I have been stuck at Excel since last week.
I am just starting the project as well. We could run through it together. Although, because of the size of the data, I am using R to analyze mine.
@@edmondnathan7611 ohhh, I am not used to R yet, but I think I will use SQL
Hi, I'm using Excel for this case study. If you don't mind, we can work together, because I'm a little bit confused too. Idk what I should do after cleaning the dataset...
@@yisranindhita2282 Hi Yisra! I am almost done with my data cleaning and analysis using SQL, and it's really fun to work with. After SQL, I will use Power BI for it.
@@lateefopeyemi3808 can you please help me out? I am confused