Predict The Stock Market With Machine Learning And Python
- Published Jun 11, 2024
- In this tutorial, we'll learn how to predict tomorrow's S&P 500 index price using historical data. We'll also learn how to avoid common issues that make most stock price models overfit in the real world.
We'll start by downloading S&P 500 prices using a package called yfinance. Then, we'll clean up the data with pandas, and get it ready for machine learning.
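To make "get it ready for machine learning" concrete, here is a minimal sketch of one common preparation step: adding the next day's close and a binary up/down label. It assumes the downloaded table has a `Close` column indexed by date, as the yfinance history table does. To keep it runnable offline it uses a tiny made-up price table instead of calling `yf.Ticker("^GSPC").history(period="max")`; the helper name `add_target` is hypothetical.

```python
import pandas as pd

def add_target(prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add next-day close and a binary up/down label."""
    df = prices.copy()
    # Columns that carry little information for an index; ignored if absent
    df = df.drop(columns=["Dividends", "Stock Splits"], errors="ignore")
    # Next trading day's close, aligned with today's row
    df["Tomorrow"] = df["Close"].shift(-1)
    # 1 if tomorrow closes higher than today, else 0.
    # The last row has no known "Tomorrow" (NaN), so its 0 is not a real label.
    df["Target"] = (df["Tomorrow"] > df["Close"]).astype(int)
    return df

# Made-up prices standing in for the yfinance download
prices = pd.DataFrame(
    {"Close": [100.0, 101.0, 100.5, 102.0], "Dividends": 0.0, "Stock Splits": 0.0},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
print(add_target(prices))
```

Predicting direction (the 0/1 `Target`) rather than the exact price turns this into a classification problem, which is what the random forest below is trained on.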
We'll train a random forest model and make predictions using backtesting. Then, we'll improve the model by adding predictors. We'll end with next steps you can use to improve the model on your own.
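To illustrate the backtesting idea, here is a minimal walk-forward sketch: train on every row before position `i`, predict the next `step` rows, then move forward and retrain. It is an illustration under assumptions, not the tutorial's exact code: the helper names `predict` and `backtest`, the `threshold` parameter, and the synthetic random-walk data with a single `Return` predictor are stand-ins. The defaults `start=2500` and `step=250` correspond to roughly ten years of trading days, then one more year at a time, the loop commenters describe below; the example lowers them so it runs on 600 rows.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

def predict(train, test, predictors, model, threshold=0.5):
    """Fit on `train`, return the true Target and 0/1 predictions for `test`."""
    model.fit(train[predictors], train["Target"])
    proba_up = model.predict_proba(test[predictors])[:, 1]  # P(class 1 = "up")
    preds = pd.Series((proba_up >= threshold).astype(int), index=test.index, name="Predictions")
    return pd.concat([test["Target"], preds], axis=1)

def backtest(data, model, predictors, start=2500, step=250):
    """Walk-forward: train on rows [0, i), predict rows [i, i + step), then advance i."""
    all_predictions = []
    for i in range(start, data.shape[0], step):
        train = data.iloc[:i]
        test = data.iloc[i:i + step]
        all_predictions.append(predict(train, test, predictors, model))
    return pd.concat(all_predictions)

# Synthetic random-walk prices (an assumption, so the example runs offline)
rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 600)))
data = pd.DataFrame({"Close": close}, index=pd.bdate_range("2000-01-03", periods=600))
data["Return"] = data["Close"].pct_change()
data["Target"] = (data["Close"].shift(-1) > data["Close"]).astype(int)
data = data.iloc[1:-1]  # drop the first row (no return) and the last (no next close)

model = RandomForestClassifier(n_estimators=50, min_samples_split=20, random_state=1)
result = backtest(data, model, ["Return"], start=300, step=100)
print(precision_score(result["Target"], result["Predictions"], zero_division=0))
```

Because the model is refit at every step, each chunk of predictions comes from a model that never saw that chunk's data. On random-walk data like this, precision should hover around chance, which is a useful sanity check before trying real predictors.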
You can find an overview of the project and the code here - github.com/dataquestio/projec... .
If you enjoyed this tutorial, check out this link bit.ly/3O8MDef for free courses that will help you master data skills.
Chapters
00:00 - Introduction
01:28 - Downloading S&P 500 price data
03:30 - Cleaning and visualizing our stock market data
04:29 - Setting up our target for machine learning
08:19 - Training an initial machine learning model
17:01 - Building a backtesting system
23:05 - Adding additional predictors to our model
28:45 - Improving our model
33:37 - Summary and next steps with the model
---------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef
Hi everyone! You can find the code for this tutorial here - github.com/dataquestio/project-walkthroughs/tree/master/sp_500 .
Thanks Vik!
Thanks, Vic. However, your F1 score is 0.5. How does that factor in?
Thanks, but it's incomplete.
Hey Viki. You should have used df.dropna(inplace=True).
Great video. Will you or can you provide additional information on other useful classifiers and also how to merge other data sources like news and sentiment into this code?
Clear and to the point. I hate super long videos full of things that don't provide much value. This one was great. I like that he walked through general data science/machine learning steps, in particular the data cleansing, which many skip over but is actually an important step. Also, a pet peeve of mine is audio quality. In this video you can hear the presenter clearly, and he doesn't sound like he is working from a tin can.
Excellent. This tutorial corrects an error that pretty much every other video I have seen makes: don't seek a low MSE on the price itself as your goal. That's not what practitioners are looking for. Do what this educator has done instead. This model gets it right as used in the real world. Solid base to work with. Well done!
No, this is not even close to how practitioners have approached the problem in the last 30 years…
I’m new to coding but have always been an avid market watcher and looking for opportunities. Best video I’ve seen since I started scouring the depths of YouTube for this content last week. Thank you sir!
I cannot thank you enough! It's very straight to the point, and I've learned more from this video than from n online courses and articles.
This was an amazing walkthrough. I have learned so much!
Thank you so much for the video. I have taken various courses in different places, and your video and teaching style are certainly the best!
Very thorough and loved it sir. Thanks for the video lesson.
Great video. Thank you for the insights. Going to be tuning into more of your work.
Searched & watched a LOT of videos. This is the best. Well done man.
have you tried them? do they work on real data?
My man is doing noble work. Kudos!
Great video. Really clear and at a pace that allowed me to follow it easily and learn some new and simple techniques in how to manipulate data.
Excellent video, thank you for sharing this. Hopefully I can see more ML related videos going forward.
Watched up to 2:26 and I already know this is going to be excellent.
Clear and concise explanation from the start and you know this is going to be more than your ordinary YT tutorial
It's not excellent; you can't beat the market as a regular person. You basically compete with Harvard graduates with math, computer science, etc. degrees. Again, one YouTube video won't make you beat the market.
@alang.2054 someone had to break this kid's dreams of being rich off a YouTube vid
@alang.2054 Where in her comment did she say she would beat the market?
I read it as an observation stating that this video is higher quality than most YT videos that claim to teach you something specific yet just give you fluff.
That's a really good video, and it seems you really know what you are talking about. Thanks!
Vik, I echo the compliments on the excellent video. I was able to use my own bespoke weekly market timing signals aligned with weekly S&P closes to finally get a grounded statistical "opinion" on the predictability of forward returns - as only my second Python exercise! Thanks!
Very useful man, thanks for showing us the way!
The explanation is top-notch. Thank you!
DUDE THIS IS SO HELPFUL
This is awesome. Instead of showing what you need to learn or try, it shows how to actually build a model. This is very useful. Thank you!
Could we get a similar video but featuring a deep learning model instead?
What are you talking about? Do you really think this guy would show you real ways to make money? In the market you compete with professionals at multibillion-dollar hedge funds who have degrees; you can't beat them with a YouTube video.
Super helpful - Thank You !!!
Incredible video! This helped me a whole lot; I really do appreciate it! I just liked and subscribed!
Thank you very much for this! Truly found this useful for my first ML Project. However, a bit confused by the 'combined' graph - how did you get it? :) (I had to do mine using the train_test_split import.)
What a great framework for applying ML to time-series data for prediction. Thanks for sharing!
thank you thank you !! this is great, subscribed :)
Thanks so much, you're a blessing
Thanks for your great video. I'm curious to read more about the whole issue of predicting actual prices versus only the direction. Do you have a good source on this? I can see why the latter is more robust, but once you start accounting for transaction costs, the magnitude of the move is also important. Curious to get your thoughts on this too.
No one does what he did because it’s stupid. It’s been common practice for over 40 years to calculate the logged odds of the derivative of the price (logged odds of the returns).
Hi Vik. Thank you very much. Is it possible to predict two days in advance instead of just tomorrow?
Thank you so much. I’m learning to build and plot models; I basically copied your code and tried to understand it.
What’s your advice for learning how to do it yourself?
Great video, thank you!
I'm hoping you can do a follow up video to this. Would be great to see how you would incorporate macro data into your model, such as news or interest rates.
This was very well delivered. Thank you for sharing.
I will consider the suggestions you made and see how this works.
Very exciting with a bit of 😅.....
Vik, thank you for this video! Greetings from Poland. Please explain to me how to connect the model so that it buys and sells instruments while running on a virtual server. How do you put it all together?
How would you use the volume column?
Not sure how to use the volume, can we build some relative volume indicator? Can you give a hint, or maybe a link to a video, where you use volume somehow to improve your model?
Volume should influence the model significantly.
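One possible relative-volume indicator, sketched under assumptions: today's volume divided by the average volume of the previous `w` days. The feature name `RelVolume_{w}`, the windows, and the toy data are hypothetical; whether such a column actually improves the model is something a backtest would have to show.

```python
import pandas as pd

def add_relative_volume(df: pd.DataFrame, windows=(5, 20)) -> pd.DataFrame:
    """Hypothetical feature: today's volume divided by the average of the previous w days."""
    out = df.copy()
    for w in windows:
        # shift(1) so the average covers only the w days *before* today
        prior_avg = out["Volume"].shift(1).rolling(w).mean()
        # >1 means heavier-than-usual trading; the ratio is scale-free, unlike raw Volume
        out[f"RelVolume_{w}"] = out["Volume"] / prior_avg
    return out

example = pd.DataFrame({"Volume": [100.0, 100.0, 100.0, 200.0]})
print(add_relative_volume(example, windows=(3,)))
```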
Great tutorial!
Very good explanation, thanks.
This is a very nice way to get started using data science with the markets. It gives a nice framework to get started, and one can attempt to expand the predictors (RSI-based, change in open interest, or correlation with the major stocks composing that index). Thank you for sharing.
Congratulations on your explanation; it was very clear. I would like to suggest that you prepare a video that incorporates news about the stock into this model. Thanks
Cool Video! Thank you!!
These are great for practice. Keep 'em coming
Glad you like them, Prathamesh! -Vik
Great tutorial 🙏
This was an excellent presentation.
Wow, the concept of predicting the stock market using machine learning and Python is such a fascinating topic! The blend of finance and technology is always an area ripe for innovative approaches. It's impressive how machine learning can analyze vast amounts of data to find patterns that might not be obvious at first glance. Python, with its extensive libraries and community support, is an excellent choice for such complex computations. It's exciting to think about how these tools can provide insights into market trends and possibly even predict future movements. The intersection of machine learning and finance is definitely a space to watch! 📈💡🤖
Hi, great lesson,
I have a question.
I'm still new to data science.
But why didn't you use the data as a predictor?
I'm asking because, say, we want to predict what happens on the next day.
How do I pass it to the model when I didn't train with it?
Great job! I used most of your code, but for a specific company. My personal impression is that this "result" is a bit messy. Do you have any tips on how we could make a clear graph towards the end with "predicted values"? I tried graphing "Tomorrow" against "Close" but saw no difference. Part of the reason could be the wide x-axis.
Thanks again, looking forward to your answer! / Alexander
Thank you for your videos. But what if I have multiple stocks to predict, and when I pass one stock ID in, I want to get the prediction for that ID only? Is that feasible?
Excellent video!
Brilliant video Vik! Towards the end, you mentioned adding news to the model. Could you share how one could integrate that?
Thanks!
Hi Jeevan - the easiest way to do it is to scrape daily headlines from, say, the New York Times, and create a "sentiment" model to indicate confidence in the market. The output of that model could then be a predictor column. Of course, you could get a lot more complicated than this :)
Cool, went through the whole process on Miniconda.
Great video, I hope to see more tutorials like this in the future.
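One way the sentiment column described above could be attached to the price table, as a sketch: the scores, dates, and `add_sentiment` helper are hypothetical, and filling days without scored headlines with a neutral 0.0 is an assumption. To avoid look-ahead, a score dated on a given day should only use headlines published before that day's close.

```python
import pandas as pd

def add_sentiment(prices: pd.DataFrame, sentiment: pd.Series) -> pd.DataFrame:
    """Join a daily sentiment score onto the price table as a predictor column."""
    # Left join keeps every trading day; scores on non-trading days (weekends) are dropped
    out = prices.join(sentiment.rename("Sentiment"), how="left")
    # Days with no scored headlines get a neutral 0.0 (an assumption)
    out["Sentiment"] = out["Sentiment"].fillna(0.0)
    return out

prices = pd.DataFrame(
    {"Close": [4700.0, 4705.0, 4690.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
# Hypothetical scores in [-1, 1] from a headline sentiment model
sentiment = pd.Series(
    [0.4, -0.2, 0.9],
    index=pd.to_datetime(["2024-01-02", "2024-01-04", "2024-01-06"]),
)
print(add_sentiment(prices, sentiment))
```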
Great stuff!
Amazing video!! Have you looked at the performance of other ML techniques, e.g., MLPRegressor?
Great video. It seems that the yfinance API is no longer functioning. Could you please do an updated video using a different method to collect the data? Thanks.
Hi, how do I predict the next one, for instance on new data?
Hello Vik, thanks for the great tutorial, really informative. Do you know how to add Lorentzian classification to the model in your example?
Hey man, how did you get into this kind of work? I'm so keen to find some work doing what you did but am finding limited possibilities.
Excellent video. Thank you for sharing. Question: how can we compare the 'influence' of another stock in the same industry, i.e., two retail stocks or two energy stocks?
Correlation, maybe.
Hi, I wonder how reliable this would be if I predicted 10, 20, or more candles into the future with an accuracy of 75 to 90 percent. Do you think it's going to be useful in the financial markets? I did create features that predict prices with an accuracy of 85 percent.
Great channel; I will try to find some time to do something meaningful with the help of Dataquest.
How do you add additional columns that display information from Yahoo Finance, such as the P/E ratio, dividends, and so on?
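To put a number on how much one stock moves with another in the same industry, one option is a rolling correlation of daily returns. A minimal sketch, assuming two aligned closing-price Series; the 60-day window, the function name, and the synthetic "sector" data are hypothetical. Note that this measures same-day co-movement; to use it as a predictor you would still need a relationship that holds with a lag.

```python
import numpy as np
import pandas as pd

def rolling_return_correlation(close_a: pd.Series, close_b: pd.Series, window: int = 60) -> pd.Series:
    """Rolling Pearson correlation of two stocks' daily returns."""
    # Use returns, not price levels: two trending price series can look
    # highly correlated even when their day-to-day moves are unrelated
    returns_a = close_a.pct_change()
    returns_b = close_b.pct_change()
    return returns_a.rolling(window).corr(returns_b)

rng = np.random.default_rng(42)
idx = pd.bdate_range("2023-01-02", periods=250)
common = rng.normal(0, 0.01, 250)  # shared "sector" move (synthetic)
stock_a = pd.Series(100 * np.exp(np.cumsum(common + rng.normal(0, 0.005, 250))), index=idx)
stock_b = pd.Series(50 * np.exp(np.cumsum(common + rng.normal(0, 0.005, 250))), index=idx)
corr = rolling_return_correlation(stock_a, stock_b)
print(corr.dropna().describe())
```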
The features used for the random forest cannot be the High, Close, Low, Open values directly without any transformation, because what the model is essentially doing is overfitting nonlinear decisions to certain price ranges. It is basically memorizing that when the close was above X and the open below Y, it should predict 1 or 0. You need to normalize the predictors in some way so that the model can use them regardless of the stock's price level and truly create generalizable rules. Ratios are good since they use percentages instead of absolute values and allow the model to use information from multiple candles as well.
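A sketch of what "normalize the predictors" can look like, assuming OHLC columns named as in the yfinance table. The feature names and horizons are hypothetical. The final check rescales every price by 10 and confirms the features are unchanged, which is exactly the property that raw `Open`/`Close` values lack.

```python
import numpy as np
import pandas as pd

def add_ratio_features(df: pd.DataFrame, horizons=(2, 5, 60)) -> pd.DataFrame:
    """Scale-free versions of raw OHLC columns (feature names are hypothetical)."""
    out = df.copy()
    for h in horizons:
        # Close relative to its h-day rolling mean: ~1.0 when price sits at its average
        out[f"Close_Ratio_{h}"] = out["Close"] / out["Close"].rolling(h).mean()
    # Intraday range and candle body as fractions of price
    out["Range_Pct"] = (out["High"] - out["Low"]) / out["Close"]
    out["Body_Pct"] = (out["Close"] - out["Open"]) / out["Open"]
    return out

# Synthetic OHLC data (an assumption, for illustration only)
rng = np.random.default_rng(7)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 30)))
ohlc = pd.DataFrame({"Open": close * (1 + rng.normal(0, 0.002, 30)), "Close": close})
ohlc["High"] = ohlc[["Open", "Close"]].max(axis=1) * 1.003
ohlc["Low"] = ohlc[["Open", "Close"]].min(axis=1) * 0.997

features = ["Close_Ratio_2", "Close_Ratio_5", "Range_Pct", "Body_Pct"]
small = add_ratio_features(ohlc, horizons=(2, 5))[features]
large = add_ratio_features(ohlc * 10, horizons=(2, 5))[features]  # same stock at 10x the price
print(np.allclose(small, large, equal_nan=True))  # features do not depend on price level
```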
Quite important comment.
Excellent video. I think you have great teaching ability. I'm surprised you did not start with the usual "THIS IS NOT FINANCIAL ADVICE..." disclaimer 😇
Sir your explaining skills are top notch
your ability to hide though is not....
Actually, you forgot to measure the expectancy of a trade given a precision of 42%, because what makes a strategy profitable is not the win rate but the expectancy of the trades. Still, it is a great video and a good tutorial about programming. Thanks and keep up the good work.
Good and clear explanation :) There are other factors to consider, though, like the bid-offer spread and commissions. Also, when the market goes against you, do you wait until the end of the day to close the losing position? Maybe setting a stop loss, including it in the model, and backtesting it can help. Thanks.
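Expectancy combines the win rate with the size of wins and losses: E = p * avg_win - (1 - p) * avg_loss - cost_per_trade. A minimal sketch; the function names are hypothetical and the numbers are made up, except the 42% win rate taken from the comment above. Returns are in percent per trade, and `avg_loss` is passed as a positive number.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float, cost_per_trade: float = 0.0) -> float:
    """Expected profit per trade: p * avg_win - (1 - p) * avg_loss - cost.

    avg_loss is the size of a typical losing trade, given as a positive number.
    """
    return win_rate * avg_win - (1 - win_rate) * avg_loss - cost_per_trade

def expectancy_from_trades(trade_returns, cost_per_trade: float = 0.0) -> float:
    """Same quantity estimated from a list of per-trade returns."""
    wins = [r for r in trade_returns if r > 0]
    losses = [-r for r in trade_returns if r <= 0]
    win_rate = len(wins) / len(trade_returns)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return expectancy(win_rate, avg_win, avg_loss, cost_per_trade)

# Hypothetical numbers (percent per trade): a 42% win rate can still be profitable...
print(expectancy(0.42, avg_win=1.5, avg_loss=0.8))
# ...or not, once a 0.2% round-trip cost is included
print(expectancy(0.42, avg_win=1.5, avg_loss=0.8, cost_per_trade=0.2))
```

With these made-up numbers the first call gives about 0.17% per trade and the second about -0.03%, which is the point the cost comments make: a positive edge can disappear after fees. `expectancy_from_trades` works out to the plain average of the trade returns minus the cost.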
how would commissions help? lol
@Mike-fm3km In backtesting, the model may seem profitable, but after accounting for commissions/transaction fees it might be unprofitable instead.
How has the model done this year? Does it show a topping formation?
Which software was used to run this code?
Hello sir, can this be used for day trading in the Indian market, for options trading of Bank Nifty and Nifty on a 5-minute candle time frame during market hours, feeding in real-time data?
Hi Vik - thank you for the great video
This could be a dumb question: in the "Improving Our Model" section, why didn't you change predictors to "NEW_Predictors" when you defined the function / when you copy-pasted it?
Does this matter?
Thank you,
AL
"NEW_Predictors" was passed while calling backtest function which calls predict function with "New_Predictors". Hence New_Predictors was used for modelling
Isn’t there leakage in the ‘Trend’ feature, considering it is a function of future values (‘Target’)?
Hi everyone, I am getting the following error when trying to get the predictions a second time with new_predictors:
Code: predictions = backtest(nifty50, model, new_predictors) FYI, I am using Nifty50 dataset.
ValueError: Length of values (1) does not match length of index (250)
Can anyone guide me through this error, I am not getting it. Any help would be much appreciated.
Thanks, Vic.
Thank you ❤❤
Hi, the way you explain is much better than anyone else's. But I get "ValueError: No objects to concatenate" after running
predictions = backtest(AXIS, model, predictors)
How to solve it?
You don't have enough data to concatenate.
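The reply above points at the likely cause: `pd.concat` raises `ValueError: No objects to concatenate` when it receives an empty list, and the backtest loop produces an empty list when the dataset has no more rows than `start` (for example, a stock with less than about ten years of daily history against `start=2500`). A sketch with an explicit check; `backtest_chunks` is a hypothetical helper that keeps only the loop bounds and replaces model fitting with a stand-in.

```python
import pandas as pd

def backtest_chunks(data: pd.DataFrame, start: int = 2500, step: int = 250) -> pd.DataFrame:
    """Hypothetical helper: the backtest loop's bounds with an explicit length check.

    Model fitting is left out so the example focuses on why the error appears.
    """
    if len(data) <= start:
        # Without this check, range(start, len(data), step) is empty, the list stays
        # empty, and pd.concat([]) raises "ValueError: No objects to concatenate".
        raise ValueError(
            f"Backtest needs more than start={start} rows, got {len(data)}; "
            "load more history or lower start/step."
        )
    all_predictions = []
    for i in range(start, len(data), step):
        all_predictions.append(data.iloc[i:i + step])  # stand-in for predict(train, test, ...)
    return pd.concat(all_predictions)

short_history = pd.DataFrame({"Close": range(1000)})  # e.g. about four years of daily data
print(len(backtest_chunks(short_history, start=750, step=125)))
```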
Do we have any latest updates to this model? Adding extended logic for improvements?
I got an error that all_predictions is not defined. Initialize it with all_predictions = [] before the loop and it will work for you. Hope this helps.
Amazing work! Although I have a few doubts. I selected 18 features - from global stock indices, currency, and commodity - to predict daily directional changes in Nifty 50.
1. I'm not using the closing price for the input variables; rather, I'm using the difference between the previous close and the current close. Is this a correct approach?
2. Also, can I split the target variable into 5 categories (Up, Down, Neutral, Extended Up, Extended Down)?
1) wouldn’t that be the same as using closing values?
2) Interesting idea, but it will probably reduce the overall effectiveness of the model, because it reduces the amount of training data in each category (5 categories vs. 2). I don’t know about Indian exchanges, but in the US, for example, Fidelity charges a $0 trade fee and keeps $0 from market makers for order flow; it all goes to the customer as price improvement. This is an extreme case, but my point is that in 2023 there should be markets you can trade for little to no cost. The brokers want your limit orders because they provide their other customers more liquidity without having to execute through a market maker. Also, they sell the limit order data to hedge funds that use that extra level of info to have an edge on the markets.
I didn't get the point of shift(1) in the Trend column. Why shift 1 forward?
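On splitting the target into five categories: a sketch, assuming the classes are defined by thresholds on the next day's percentage return. The ±0.2% and ±1% cutoffs and the function name are hypothetical and would need tuning. scikit-learn's `RandomForestClassifier` accepts multiclass labels like these directly; as the reply notes, the rarer "extended" classes will have fewer examples, which `value_counts()` makes visible.

```python
import numpy as np
import pandas as pd

def five_class_target(close: pd.Series, small: float = 0.002, large: float = 0.01) -> pd.Series:
    """Bucket the next day's return into 5 classes; the thresholds are hypothetical."""
    next_return = close.shift(-1) / close - 1
    bins = [-np.inf, -large, -small, small, large, np.inf]
    labels = ["Extended Down", "Down", "Neutral", "Up", "Extended Up"]
    # Intervals are right-closed, e.g. Neutral is (-0.2%, +0.2%]; the last row is NaN
    return pd.cut(next_return, bins=bins, labels=labels)

close = pd.Series([100.0, 102.0, 101.9, 102.2, 100.0])
target = five_class_target(close)
print(target.tolist())
print(target.value_counts())  # check class balance before training
```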
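On the leakage and `shift(1)` questions: a sketch, assuming the target is defined as `Close.shift(-1) > Close` and the trend feature as a rolling sum of the shifted target (the function names here are hypothetical). The check changes only the "future" closes and confirms that earlier trend values do not move.

```python
import pandas as pd

def add_trend(df: pd.DataFrame, horizon: int) -> pd.DataFrame:
    """Count up-days among the `horizon` days before each row."""
    out = df.copy()
    # Target[t] = Close[t+1] > Close[t]. shift(1) moves it down a row, so row t sees
    # Target[t-1], Target[t-2], ... Target[t-1] only needs Close[t] and Close[t-1],
    # both known at today's close, so the feature does not peek at the future.
    out[f"Trend_{horizon}"] = out["Target"].shift(1).rolling(horizon).sum()
    return out

def make_frame(closes):
    df = pd.DataFrame({"Close": closes})
    df["Target"] = (df["Close"].shift(-1) > df["Close"]).astype(int)
    return df

original = add_trend(make_frame([1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5]), horizon=2)
# Change the "future" (rows 5 onward) and recompute everything
altered = add_trend(make_frame([1.0, 2.0, 1.5, 3.0, 2.5, 0.0, 0.0]), horizon=2)
# Trend values up to row 4 are identical: they never used Close[5] or later
print(original["Trend_2"].iloc[:5].equals(altered["Trend_2"].iloc[:5]))
```

Without the `shift(1)`, row 4's trend would include `Target[4]`, which depends on `Close[5]`, and the comparison above would fail.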
Hello, thank you very much for the video, I am new to ML, I would like to know how to use the model? How do I see the prediction for the next day? thanks and greetings
Hello! Why the column "Tomorrow" wasn't used for training? 🤔
I suggest you google the semi strong efficient market hypothesis. Would save a lot of time.
In Step:31 (Time 20:00) the 10 year loop looks very similar to LSTM. Why not use LSTM instead?
Is it possible to make predictions on a daily basis, but train on intraday data to improve the daily view?
What did you use for the risk rate, as there is no such thing in finance?
Anyone know how to get stock market data api for NEPSE? I'm a beginner at these things and I wanted to know if a feature like yfinance is available for NEPSE for python. Thanks.
Hey Vik! You mentioned at 9:35 if someone could find a linear correlation in stock/financial market they’d do pretty well. Is this because linear relationships are easier to analyze than non-linear relationships?
Mathematically speaking, a linear relationship is when one thing goes up, something else goes up as well. So if you found the linear relationship between something and the stock market, if your indicator did something you would be able to tell exactly what the stock market would do next. Finding this would make you an infallible trader
Hint: on a recent MacBook you can use all its cores like this:
import joblib
from sklearn.ensemble import RandomForestClassifier
N_CORES = joblib.cpu_count(only_physical_cores=True)
...
model = RandomForestClassifier(n_estimators=100, min_samples_split=100, random_state=1, n_jobs=N_CORES)  # example values; use your own n_estimators / min_samples_split
The speedup is amazing
you don't need any information about the system to do this, n_jobs = -1 will use all the available cores with no imports or extra lines :)
If I'm running this code today, will I be able to see tomorrow's prediction in the Target column as 1 (if it's going up)? For some reason, I'm not able to see it in the table. Am I missing something? Can anyone guide me? Thanks.
I don't understand why we use backtest. That means we train our model several times (10 years, 11 years, etc.) and get all the predictions, but in the real world, if I want to predict tomorrow, I would use the last training in the loop, which is the whole dataset minus the last 250 days. Can anyone clarify if I am wrong?
I get how we can predict for one day, but can we predict with this model for several days, or what the trend will be for the next week?
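For multi-day horizons (two days ahead, a week ahead), one option is to shift by more than one row when building the label. A minimal sketch, assuming a `Close` column; `add_horizon_target` and the `Target_{days}d` naming are hypothetical. One caveat: consecutive labels now overlap, so in a walk-forward backtest the last `days` training rows have labels that depend on closes inside the test window; dropping those rows from each training chunk avoids that leak.

```python
import pandas as pd

def add_horizon_target(df: pd.DataFrame, days: int = 5) -> pd.DataFrame:
    """Label whether the close `days` trading days ahead is higher than today's."""
    out = df.copy()
    future_close = out["Close"].shift(-days)
    # 1.0 / 0.0 where the future close is known; NaN for the last `days` rows
    out[f"Target_{days}d"] = (future_close > out["Close"]).astype(float).where(future_close.notna())
    return out

prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 2.0, 1.0]})
print(add_horizon_target(prices, days=2))
```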
The confusion I have with this video: if you call fit multiple times on different periods of data, doesn't sklearn create a new random forest model from scratch? Shouldn't you put all the data together in one training set, with the test chunks left out and put into a single dataset?
Hey, can anyone tell me whether this is a classification model or a regression model?
The last row regenerated by the backtest is 2022-05-17. If the latest existing close is 2022-05-18 (assuming we’re in the morning of 2022-05-19), how is it we can predict the close of 2022-05-19?
I suppose this has something to do with dropping rows with NaN…
So a question: it’s currently 04-11 and I’m only getting predictions up to 04-10; I’m not getting the prediction for 04-12. It is also past 4 p.m., so I’m assuming this is because the Tomorrow price is computed with shift(-1). How is this a predictor, then, if it only gives me the prediction for the current day and not the day after?
I'll take a note: the model is without hyperparameter tuning. If hyperparameter tuning is done, when backtesting we no longer need to look for the best parameters, in contrast to cross-validation, which requires more tuning.
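Several comments ask how to get a prediction for the next session. One way to approach it, sketched under assumptions: keep the most recent row, whose `Tomorrow` is still unknown, out of training, fit on every labeled row, then call `predict_proba` on that last row. If a blanket `dropna()` removes the final row (its `Tomorrow` is NaN), there is nothing left to predict on, which may explain the missing "tomorrow" rows. The function `predict_next_session`, the synthetic prices, and the single `Return` predictor are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def predict_next_session(df: pd.DataFrame, predictors, model):
    """Fit on every row whose outcome is known, then score the most recent row."""
    labeled = df[df["Tomorrow"].notna()]  # rows whose next close already happened
    latest = df.iloc[[-1]]                # today's row: Tomorrow is still unknown (NaN)
    model.fit(labeled[predictors], labeled["Target"])
    prob_up = model.predict_proba(latest[predictors])[0, 1]
    return latest.index[0], float(prob_up)

# Synthetic prices standing in for a fresh download (an assumption)
rng = np.random.default_rng(1)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 400)))
df = pd.DataFrame({"Close": close}, index=pd.bdate_range("2022-01-03", periods=400))
df["Return"] = df["Close"].pct_change()
df["Tomorrow"] = df["Close"].shift(-1)
df["Target"] = (df["Tomorrow"] > df["Close"]).astype(int)
df = df.iloc[1:]  # the first row has no return; the last row is kept on purpose

model = RandomForestClassifier(n_estimators=100, min_samples_split=50, random_state=1)
as_of, prob_up = predict_next_session(df, ["Return"], model)
print(f"Probability that the session after {as_of.date()} closes higher: {prob_up:.2f}")
```

This also reflects the design choice raised in the backtesting question: for a live prediction you would typically refit on all labeled history, while the backtest exists only to estimate how that procedure would have performed.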
I'm quite confused about the plot at 16:50 . Can someone help explain it? Thanks
What a deep voice
Super 👏👏💪