the fact that this type of content is FREE is mind blowing
The fact that people don't know about this is another mind-blowing thing
@anishapostate4221 I wouldn't say that, there are plenty more people who don't know this than do
Great project...love the concept of dataquest's guided project walkthroughs. Thanks Vik
That day I joined the webinar slightly late so I was excited about watching this video.
I might be missing something, but...
Once you have trained and tested the model, what is the process to apply the model to predict the following year?
In this video you trained the model to predict the "Next_WAR", which in this case would be the player's 2022 WAR, and then evaluated the model based on the real result vs. your predicted result. But if you wanted to predict 2023 WAR, how would the code need to be adjusted?
Essentially, how do you use the trained model to predict 2023 player WAR?
You ever figure it out? I’m struggling there too
@willcarroll9762 This model can only predict one year out into the future. To predict 2023, you would need 2022 data. It's not necessarily a full time series analysis, but a linear regression model used to predict the following year's stats. Predicting Next WAR is predicting next year's stat. You could attempt to create a column for 2 years out into the future by shifting the 'WAR' column again, then test how well the model predicts two years into the future, and so on. My guess is it may start performing poorly at that stage.
You could train it on the first 3 months of data to predict the next 6 months of the season, or however you want. For my MLB ML model I train it on March-July to predict August-October.
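A minimal sketch of the idea in the replies above, not the video's actual code: fit on every season that already has a known Next_WAR, then call the model on the most recent season's rows, whose Next_WAR is still empty. The toy table, the `predictors` list, the `Predicted_Next_WAR` column, and the Ridge settings are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Toy stand-in for the batting table; every value here is invented.
batting = pd.DataFrame({
    "IDfg":   [1, 1, 1, 2, 2, 2],
    "Season": [2020, 2021, 2022, 2020, 2021, 2022],
    "HR":     [10, 20, 30, 5, 10, 15],
    "WAR":    [1.0, 2.0, 3.0, 0.5, 1.0, 1.5],
})

# Next_WAR is each player's WAR one season later.
# Rows from the latest season have no Next_WAR yet, so it is NaN there.
batting = batting.sort_values(["IDfg", "Season"])
batting["Next_WAR"] = batting.groupby("IDfg")["WAR"].shift(-1)

predictors = ["HR", "WAR"]  # assumed feature list
known = batting.dropna(subset=["Next_WAR"])                      # rows usable for training
latest = batting[batting["Season"] == batting["Season"].max()]   # rows to forecast from

model = Ridge(alpha=1.0)
model.fit(known[predictors], known["Next_WAR"])

# Predictions made from 2022 rows are forecasts of 2023 WAR.
forecast = latest[["IDfg", "Season"]].copy()
forecast["Predicted_Next_WAR"] = model.predict(latest[predictors])
print(forecast)
```

With real data, the 2022 rows would need to go through the same cleaning, scaling, and feature selection as the training rows before calling `predict`.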
Hello! This is an awesome project and walkthrough that you've done!
I actually wanted to try predicting HRs instead of WAR in this model, but when I scaled the data for ridge regression with the minmax scaler, I got predicted HR numbers between 0 and 1. If I skip that part, I get a whole number of predicted HRs for the next year. Would it still be accurate if we are just looking at HRs when I skip the scaling?
Again, Great Video!
You don't want to scale your target column. So if you're predicting HRs, you want to scale all of the columns except the HR column.
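As a rough sketch of the reply above: scale only the predictor columns and leave the target in its original units, so predictions come out as home run counts rather than values between 0 and 1. The column names (including the `Next_HR` target) and the numbers are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Invented example rows; Next_HR is the assumed name of the target column.
df = pd.DataFrame({
    "AB":      [300.0, 450.0, 600.0],
    "SLG":     [0.350, 0.420, 0.510],
    "Next_HR": [12, 22, 30],
})

target = "Next_HR"
predictors = [c for c in df.columns if c != target]

# Fit the scaler on the predictors only.
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df[predictors]),
    columns=predictors,
    index=df.index,
)

# Re-attach the target untouched.
scaled[target] = df[target]
print(scaled)
```

A model trained on `scaled[predictors]` against `scaled[target]` then predicts on the original HR scale.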
Hey Vik, coming here from your more recent video with NBA stats analysis. In this instance, is pybaseball replacing the more manual work being done by playwright and having to parse the specific html in order to scrape the data you need? Is there an equivalent for the NBA to pybaseball? I think there may be one for the NFL that I've seen in places but this is all new to me so I can't be sure. Just struggling a bit with adapting that previous video to be a regular python file instead of following along directly with your Jupyter tutorial is all.
A little confused on the Sequential Feature Selector, you mention that after normalising the data - it picks the features that it thinks will help with accuracy the most, how is it determining that? Sorry if that's a stupid question.
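For readers with the same question, here is a small sketch of how scikit-learn's `SequentialFeatureSelector` behaves in forward mode: it starts with no features, tries adding each remaining feature in turn, scores each candidate set with cross-validation using the estimator you pass in, keeps the feature that improves the score most, and repeats until it has the requested number. The synthetic data and the settings below are assumptions for illustration, not the video's configuration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge

# Synthetic data: only f0 and f3 actually drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"f{i}" for i in range(5)])
y = 3 * X["f0"] - 2 * X["f3"] + rng.normal(scale=0.1, size=200)

sfs = SequentialFeatureSelector(
    Ridge(alpha=1.0),
    n_features_to_select=2,
    direction="forward",
    cv=4,  # each candidate feature set is scored with 4-fold cross-validation
)
sfs.fit(X, y)

# get_support() returns a boolean mask over the columns.
selected = list(X.columns[sfs.get_support()])
print(selected)
```

So "helps accuracy the most" means "gives the best cross-validated score for this particular estimator", not a judgment about the feature in isolation.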
How would you be able to do this for "predicting" whether a player records a hit in a given game? Is that possible?
Any idea why the pybaseball package no longer loads? I tried pip install pybaseball, and I get an error.
Great video. Thanks 💯💯
Incredible video - thank you so much
I appreciate this content sir. Thank you so much!
A bit confused about the purpose of making the full copy and then calling dropna(). It doesn't seem like the full copy was used at all throughout the rest of the code?
Your videos are amazing. I'm starting to love ML.
What advice will you give to someone who is starting Data Science...
That's great to hear, Wanjohi! I actually started a site called Dataquest where you can learn data science from scratch - the data scientist path will teach you all the main data science skills - www.dataquest.io/path/data-scientist/ .
Thank you immensely for sharing
How would you adjust the code to predict 2023 WAR?
Did you solve this?
Is it possible to do this in R? I've just started to learn programming, so I don't have much knowledge about this.
What editor are you using for this?
It’s Jupyter Notebook
thank you for this content!
I don't think pybaseball is working any more. I get a blank .csv at the beginning after supposedly downloading the Fangraphs data.
Does anyone know why I wouldn’t be able to import pybaseball on JupyterLab anymore? I’m trying to follow along on my own notebook and for some reason I’m getting an error code that the module doesn’t exist. Thanks for any help in advance!
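One common cause of a "module doesn't exist" error in a notebook is that the package was installed into a different Python environment than the one the kernel runs. A quick diagnostic sketch using only the standard library (the helper name `is_importable` is made up here):

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter the notebook kernel is actually using.
print(sys.executable)

# False here means pybaseball is not installed for this interpreter.
print(is_importable("pybaseball"))
```

If it prints False, running `%pip install pybaseball` in a notebook cell installs into the kernel's own environment. This is only one possible cause and may not be what is happening in the comments above.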
Can you do something similar for English Premier league soccer?
I am having trouble finding the batting csv file
I’m not getting the CSV when I run this. Can anyone help?
'Customer segmentation and clustering in retail using machine learning' with a real data set. Please make a project tutorial on this😭😭😭😭
nice
I have a problem running this...help
removed_columns = ['NEXT_WAR', 'Name', 'Team' ,'IDfg', 'Season']
selected_columns = dataset.columns[~dataset.columns.isin(removed_columns)]
AttributeError: 'function' object has no attribute 'columns'
It looks like 'dataset' is a function for some reason. It should be a pandas Dataframe. Make sure you didn't accidentally assign to the `dataset` variable.
@Dataquestio I have that same issue. I have just started Dataquest and was using this as a follow-along project while I wasn't studying. I have some knowledge, but I'm not at this stage yet, just working towards familiarity.
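One way to end up with this exact error, shown as a guess rather than a diagnosis of anyone's notebook: assigning a function to `dataset` without calling it. The loader `load_batting` below is hypothetical and stands in for whatever builds the table.

```python
import pandas as pd


def load_batting():
    """Hypothetical loader standing in for however the notebook builds the table."""
    return pd.DataFrame({
        "Name": ["A", "B"],
        "WAR": [1.0, 2.0],
        "NEXT_WAR": [2.0, 1.5],
    })


# Bug: no parentheses, so dataset is the function object, not a DataFrame.
dataset = load_batting
try:
    dataset.columns
except AttributeError as err:
    message = str(err)
    print(message)

# Fix: call the function so dataset holds the returned DataFrame.
dataset = load_batting()
removed_columns = ["NEXT_WAR", "Name", "Team", "IDfg", "Season"]
selected_columns = dataset.columns[~dataset.columns.isin(removed_columns)]
print(list(selected_columns))
```

Running `type(dataset)` in the cell just before the failing line shows whether the variable is a DataFrame or something else.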
Ty for sharing. Amazing content.