You pinpointed exactly what I was wondering about. As someone who worked in the financial markets for more than a decade and only recently learned ML, min-max scaling was a big question mark for me. First, you never know the maximum of a price, especially in a market (like gold) that keeps making higher highs. The minimum price is not really known from the data set either, so unless you are a bank with 80-150 years of recorded data, your data set will never reflect the true lows or true highs. As a result, most ML models in YT tutorials simply panic and fail when the price makes new historical highs or lows (relative to the data set they were trained on, not the actual historical highs or lows). Scaling and standardization are crucial, no doubt, but the MinMax technique is fundamentally wrong here and reflects ignorance of market dynamics and of the core principles of training an ML model.
Just normalize the data. Also, to his point, if anyone has an actual edge, why post it online? If he has an edge and makes millions, why YouTube and why courses, eh?
On top of your points, some tutorials even fit the min-max scaler on both train and test. This is so wrong: it uses future data to fit the scaler. Most of the tutorials are really just trash.
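A minimal sketch of the two complaints above, using made-up prices and hand-written helpers (`fit_min_max` and `min_max_transform` are illustrative names, not a library API). It fits the bounds on the training slice only, shows that later prices land outside [0, 1], and shows what changes when the scaler is wrongly fitted on train and test together.

```python
def fit_min_max(values):
    """Return (lo, hi) learned from the given values."""
    return min(values), max(values)


def min_max_transform(values, lo, hi):
    """Scale with previously fitted bounds; no clipping is applied."""
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical closing prices in a rising market.
prices = [100.0, 104.0, 110.0, 120.0, 135.0, 150.0, 170.0, 200.0]
train, test = prices[:5], prices[5:]

# Correct order: fit on the past only, then apply to both slices.
lo, hi = fit_min_max(train)
train_scaled = min_max_transform(train, lo, hi)
test_scaled = min_max_transform(test, lo, hi)

# Leaky variant: fitting on train + test uses the future maximum.
leaky_lo, leaky_hi = fit_min_max(prices)
leaky_train_scaled = min_max_transform(train, leaky_lo, leaky_hi)

print("train:", train_scaled)
print("test:", test_scaled)          # every value is above 1.0
print("leaky train:", leaky_train_scaled)
```

With a train-only fit, the test values fall in a range the model never saw during training. With the leaky fit, everything looks tidy only because the future high was already known.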
This video is a gem. I saw a lot of blogs and tutorials repeating the mistakes you had mentioned.
could you please add a link to where I could find those topic, much appreciated. Thanks
I believe it's important to keep things in the same scale because the algorithms apply the same learning rate to all feature dimensions.
It's important for models that estimate their weights using gradients, because you will most likely get small weights for features that have a large range and large weights for the opposite.
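A rough sketch of the learning-rate point, assuming a bias-free linear model trained with mean squared error on made-up data. The helper names are hypothetical. It compares the gradient per weight before and after standardizing the feature columns.

```python
def mse_gradient(rows, targets, weights):
    """Gradient of mean squared error for a linear model without bias."""
    n = len(rows)
    grad = [0.0] * len(weights)
    for x, y in zip(rows, targets):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad


def standardize_columns(rows):
    """Give every feature column zero mean and unit (population) std."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        scaled_cols.append([(v - mean) / std for v in col])
    return [list(r) for r in zip(*scaled_cols)]


# Feature 0 lives around 0.1-0.4, feature 1 around 100-400 (made-up data).
rows = [[0.1, 100.0], [0.2, 300.0], [0.3, 200.0], [0.4, 400.0]]
targets = [1.0, 2.0, 3.0, 4.0]

grad_raw = mse_gradient(rows, targets, [0.0, 0.0])
grad_std = mse_gradient(standardize_columns(rows), targets, [0.0, 0.0])

print("raw gradient:", grad_raw)
print("standardized gradient:", grad_std)
```

On the raw features the gradient for the large-range column is several hundred times bigger, so a single learning rate is either too large for one weight or too small for the other. After standardizing, the two gradients are of the same order.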
A sequence length of 1 is fine if the LSTM is stateful (the hidden state from the previous period is used as input as well). If the LSTM is stateless, you need to pass the whole sequence (and a zero hidden state as input). So it basically depends on what kind of LSTM you are training (stateful or stateless). But LSTMs are still useless for price prediction, because they tend to output the last price of the input sequence. That's what I learned when playing around with LSTMs for stock price prediction.
pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html (shows both approaches as well)
"Stateful" is just another way of saying that you are passing in a sequence... the hidden state h(t) is derived from the inputs x(1), ..., x(t).
If you are holding the state from past samples then there's something seriously wrong.
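To make the exchange above concrete, here is a toy one-unit tanh RNN in plain Python. It is not an LSTM and not any framework's API; `rnn_run` and its weights are made up for illustration. It compares three cases: the whole sequence in one call, one step per call with the state carried forward, and a single step from a zero state.

```python
import math


def rnn_run(inputs, h0=0.0, w_x=0.5, w_h=0.8):
    """Run a one-unit tanh RNN over `inputs` and return the final hidden state."""
    h = h0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
    return h


sequence = [0.2, -0.1, 0.4, 0.3, 0.5, 0.1]

# Whole sequence in one call, starting from a zero state.
h_full = rnn_run(sequence)

# One step per call, carrying the hidden state forward by hand.
h_carried = 0.0
for x in sequence:
    h_carried = rnn_run([x], h0=h_carried)

# Sequence length 1 from a zero state: only the last input is ever seen.
h_reset = rnn_run(sequence[-1:])

print(h_full, h_carried, h_reset)
```

The first two results match, which is the sense in which carrying state is the same as passing the sequence. The third differs because the earlier inputs never reach the model.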
@@LazyProgrammerOfficial did you have a chance to evaluate the performance of transformers (multi-head attention) on stock price prediction? I am thinking of giving it another try :)
You’re right, but we can use price and min-max scaling locally to find patterns. I usually apply it locally when sampling data, not on the whole data set.
ya exactly, there is a local min max if you define a timeframe
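One way to read "local" min-max scaling is to rescale each sampled window by its own minimum and maximum. The sketch below assumes that reading; the window size, the prices, and the helper names are made up.

```python
def window_min_max(window):
    """Scale one window to [0, 1] using only that window's own min and max."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return [0.0 for _ in window]   # flat window: nothing to scale
    return [(v - lo) / (hi - lo) for v in window]


def rolling_windows(series, size):
    """All contiguous windows of length `size`, oldest first."""
    return [series[i:i + size] for i in range(len(series) - size + 1)]


prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 115.0]
scaled = [window_min_max(w) for w in rolling_windows(prices, 4)]

for window in scaled:
    print(window)
```

Each window keeps its shape but loses its price level, and the next price after a window can still fall outside that window's [0, 1] range.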
Thank you very much. I recommend not saying that the idea of scaling is to make values "small". People who take you literally might think you mean very small values (e.g. 1.2x10^-20). I would also introduce stationarity at your timestamp for "Stock returns instead of ..." since this is a step towards that.
This video is so underrated. I have seen the same errors. I would also advise using a power transform and standardizing instead of using a min-max scaler.
Thanks for this video. I finally took action and bought the course on Udemy. I am broke so I usually find a way to get stuff for free so this was a big step for me. I have been trading for more than 2 years now and wanted to apply ML in ways different than what I have seen online. So, thank you for making this course!
How is trading going with deep learning?
@@YaShaheed it's a grift, unfortunately.
LOVE this video! Cannot help laughing when watching the virus part, but it is so true! I am really glad that I didn’t use min max scaler in my time series tutorials. Thank you for your contribution to the machine learning community, sir!
This is just a promotion video, if you think carefully.
Interesting, please do share
@@LazyProgrammerOfficial I classified it as promotion video, as there are various contradicting information.
@@alaincheong7275 I welcome you to elaborate
Literally true, shitting on everyone else's code with a very pretentious tone, plugging a course every other minute
@@shahinhashemi5799 No he's pretty objective, and the promotion part isn't every other minute, it's a dedicated part of the video. Just like any YouTube video. Exaggeration is for those who think louder means smarter.
I cannot find your other videos about the other mistakes.
I agree about using the return value instead of price as input. However, this will result in an input range between -1 and 1. What activation function would you use then?
Use a moving average to continually plot a time series graph. There are indicators, and there are the Greeks to measure total risk. Returns aren't normal, they are lognormal, so you'll obviously have skew. The point of the video is to form sequences from multiple models that make up your strategy, then feed them in so it evolves over time (ML is basically used for parameter optimization, so you can get the best timeframe for a strategy, or use a timeframe to figure out the best moving average window, RSI levels, bla bla, so sequences are supposed to be multi-dimensional, not a scalar).
...and all of these are already derived from the underlying returns and standard deviations, so they are normalized to fit the mean/variance of the underlying position/portfolio. You can just add them to your expected profit at face value, so there's your expected returns.
Check my website for a link to all videos. Using returns would not limit the range. The range of returns is unlimited.
@@LazyProgrammerOfficial Why is the range of returns unlimited? Doesn't it also have a maximum value in your training data?
@@doragababa3433 Any fixed set of data would have a min and a max, that's not what is meant by "unlimited". Unlimited refers to the allowable values.
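A small worked example for the thread above, with made-up prices. For positive prices a simple return cannot go below -1, but it has no upper bound, so returns are not confined to [-1, 1]. Any sample still has an observed min and max, which is a different thing from the allowable range.

```python
def simple_returns(prices):
    """Simple (arithmetic) returns: p_t / p_{t-1} - 1."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


# Hypothetical prices, including a tripling between the 3rd and 4th value.
prices = [10.0, 12.0, 9.0, 27.0, 13.5]
returns = simple_returns(prices)

print(returns)
print("sample min:", min(returns), "sample max:", max(returns))
```

The tripling gives a return of 2.0, well outside [-1, 1]. The sample min and max printed here describe this data set only and say nothing about what the next return may be.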
Integer differencing is excessive and may significantly erode memory content. There exists some degree of tempered fractional differencing that has minimum information destruction with "good enough" stationarity.
Ha, I think you're trying to sound smart, but first differencing is standard in time series, whilst "tempered fractional differencing" shows not even 1 page of search results.
@AbhishekML Overstationarizing is one of the single greatest detriments to predictability, via reduction of a time series' memory content. Dr. Marcos López de Prado highlights the tradeoff by building models on the weakest degree of fractional differencing that rejects the null of nonstationarity (ADF statistic). The differencing-memory tradeoff is not universal (it doesn't hold for all processes).
@AbhishekML Financial time series (especially fragile assets) exhibit semi-long range dependency in the cmeans but especially in cvol even when one accounts for spurious fd via structural breaks. Integer Differencing destroys an excessive amount of predictability to ensure stationarity
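For readers who have not met the term, here is a sketch of plain fractional differencing with a fixed-length weight window. The tempered variant mentioned above is not implemented, and choosing `d` with an ADF test is not shown. The weights come from the binomial expansion of (1 - B)^d; the function names and the truncation length are illustrative choices.

```python
def frac_diff_weights(d, n_weights):
    """First `n_weights` coefficients of (1 - B)^d, B being the lag operator."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, n_weights):
    """Apply the truncated weights; the first n_weights - 1 points are dropped."""
    w = frac_diff_weights(d, n_weights)
    out = []
    for t in range(n_weights - 1, len(series)):
        out.append(sum(w[k] * series[t - k] for k in range(n_weights)))
    return out


print(frac_diff_weights(1.0, 4))   # integer differencing: only one lag is used
print(frac_diff_weights(0.5, 4))   # fractional: weights decay slowly
print(frac_diff([1.0, 3.0, 6.0, 10.0, 15.0], 1.0, 2))
```

With `d = 1` the weights after the first lag are zero, so the result is the ordinary first difference and older values are discarded. With `d = 0.5` older values keep a small, slowly decaying weight, which is the "memory" the comments refer to.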
Very insightful, very true the bit about using 1 sequence not multiple.
Are these courses applied machine learning or advanced machine learning in depth of its working mechanism to object layer?
Thank you very much for this video. I was starting to think I was wrong until I saw it. There are tons of mistakes out there, specifically on this topic.
Are you planning to explain more about the other mistakes?
Yes
Thank you, I was going to make this as a final project in a course :V thank you so much, I'll definitively go to the course
I am a little confused. If I standardize the data, then it is i.i.d. data and no longer sequential. In this case, does it make sense to use an LSTM at all?
You can standardize data that is not IID. They are not related.
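A quick check of the reply above, on a made-up series. Standardizing subtracts a constant and divides by a constant, so the order of the points and their lag-1 autocorrelation are unchanged. The helper names are illustrative.

```python
def standardize(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


# A slowly moving series: neighbouring values are clearly related.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
z = standardize(series)

print(lag1_autocorr(series), lag1_autocorr(z))
```

Both numbers are the same and clearly positive, so the standardized series is just as sequential as the original.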
Where are the other videos? are they coming?
Hey, thanks for the great content,
i am using R, do you think it's as good as python for this kind of analysis ?
Stationarity with heteroskedasticity....LN rets is fine. How can you normalise with a window that extends beyond the lookback being used??? Lol
Ooof please post the other videos soon
I have a question: is it possible to use that idea to find patterns in hours instead of days? I mean, there are some observable patterns, like: "some stocks gain or lose right before they close and begin the day up (and lose) or low and increase over the day". Is it possible?
For Min Max scaling why not just use Zero Mean Normalization instead
why arent priests talking about this?
Nice video. Keep up this series :)
one of the best mentor I had ever seen !!!!!! RESPECT from INDIA
Really good breakdown, nice one!
Why is this episode 16? Where is episode 15? Is this part of your paid for course?
These are not part of a course, these are part of YouTube. You can click on my YouTube channel to see all the videos I've uploaded as usual on YouTube.
Why should we not predict the price?
Why say it's useless?
That's just because the stock price is very volatile, which causes outliers, and deep learning is very sensitive to outliers. If you use price as the target, the model can catch the pattern, because of how volatile the price is
**can't**
It is not important to keep it on the same scale. But I did not make a prediction of the next N steps of price. I only made buy, sell or wait decisions in CFD forex :)
Then, when you understand it, you can normalize the input data. I have 25200 ticks of AUDUSD as input data. But for the normalization I do not use a formula from statistics or the internet; I have my own formula to normalize the input data :)
Yes, you are right. Do not copy code.
Understand how it works and write it yourself. And test it. And test it.. and test it.. And when your later versions are better you can make more versions^^ If not, start at the beginning and learn to understand how Q-learning works^^
Greetings from Switzerland, and yes, my English is bullshit xD
I have been doing this for 1 year as a hobby.. In the first version 23% of all trades were winning trades, and at the moment 32% of all trades are winning trades, and when I had 34% the AI made a profit with 50 pips TP and 20 pips SL :P
And NEVER EVER use the SAME input DATA twice!! You do not want your AI to be able to trade only YOUR INPUT data. Look at YT videos: 99.9999% only train on the same INPUT DATA for so long that the AI KNOWS the input data xD That's bullshit
I use test data from AUDUSD at different times, USDCAD, EURUSD, AUDCAD etc etc etc
4 years of data and more... in 1 training step, to see whether this version has potential or is crap. But you NEVER know how LONG you must TRAIN to KNOW that it works. YOU NEVER KNOW :P
@@51nibblerWtf are you saying you Indian.
6:05 lmao
50 euros? I will wait for a 9.99 offer, thanks.... :)
not every course goes to $9.99 lols
@@spinLOL533 that's true, as much as, not every course will sell... :)
@@jdaniele
$9.99 course: predict stock prices with LSTMs!
$50 course: pointing out all the mistakes in the $9.99 course.
I'd rather pay more; at least the instructor is honest ;)
@@datascienceprofessor
Yes, maybe. So we need a $150 course pointing out "all" the mistakes of the $50 course. 😋
And we'll need a $500 course pointing out "all" the mistakes of the $150 course.... OMG😮.. 😋😋
If you go through the process, it's an asymptotic curve.
Then, if we can afford it, the best is to buy a $10.000 course. hahahah😂
Will it cover all the errors? Who knows....🤔
So a $50 course could still have many errors, right?
If, for example, a $9.99 course offers 85% of right information and a $50 course reaches the 92%, that 7% more (actually 8.2% more if compared to 85%), costs me 500% ($50/$10), a bit too much.
So I should pay +400% to get just +8.2% more. Is it worth it?
Anyway, most of the discounted courses on Udemy ($9.99 for just a few days a year) are usually sold for between $30 and $200.
Following your reasoning, a $200 course should be better than a $50, right?
So I think if we buy a $100-$200 course discounted to $9.99, it has the best value for that money, even if it is not perfect yet, for sure better than a FIXED $50 priced course!
Fixed priced courses just pissed me off... sorry! 😅
@@jdaniele You're just reaching and making up fictitious examples. Lazy is well known for having actually studied this type of material and applying it day to day. The others are obviously just marketers trying to capitalize on trends like ML and crypto. If you can't tell the difference, then you're probably not the target audience for this kind of course.
lstm is old. use transformers
I am in the sector, The only useful video about stock price prediction !
hmu when he makes the course free, i dont have money to buy it
I'm creating my own library with GPT now, so I don't have to rely on looking for scraps of other coders' code.
Unbeknownst to you, GPTs are trained using Github code and therefore make the exact same mistake. I covered examples in one of my courses.
@@LazyProgrammerOfficial I use it to implement the initial structure to save time, but I know what you mean.
heh heh heh
Sir, I need a stock price prediction project with an LSTM model (backpropagation algorithm) and a website or web app, or using Streamlit. I will pay you. Reply to this comment.
Meme is speaking about LSTM without understanding the core idea of LSTM 😂
If you think there's a mistake here, can you state it?
@@LazyProgrammerOfficial LSTM works with sequences.
@@andrey5197 The video doesn't state that LSTM doesn't work with sequences. Did you not understand the video?
@@LazyProgrammerOfficial 2:40
@@andrey5197 Not sure what you're implying - you didn't understand what was being said at 2:40?
In this video you comment that using prices is wrong but using returns is correct. Does using logs of prices have the same problem? (I ask because logs of prices are commonly used in finance because they have the property that adding logs gives the return over a period of time.) Logs of prices have no min or max, so I imagine they are similarly wrong.
Yes, the same logic applies to log prices
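A short check of the property mentioned in the question above, on made-up prices. Differences of log prices (log returns) add up to the log return over the whole period, while the log price level itself still drifts with the price and has no fixed bounds.

```python
import math

# Hypothetical prices.
prices = [100.0, 105.0, 98.0, 110.0, 121.0]
log_prices = [math.log(p) for p in prices]

# One-period log returns are differences of consecutive log prices.
log_returns = [b - a for a, b in zip(log_prices, log_prices[1:])]

# Summing them gives the log return over the whole period.
total_log_return = sum(log_returns)
direct_log_return = math.log(prices[-1] / prices[0])

print(total_log_return, direct_log_return)
print("simple return over the period:", math.exp(total_log_return) - 1.0)
print("highest log price so far:", max(log_prices))
```

The additivity applies to the differences. The log price series itself reaches a new maximum whenever the price does, which is why the same objection carries over to log prices as inputs.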
After trying an LSTM on stock prices, I agree that LSTMs don't work for stock prediction, because the predicted direction of the stock is totally off, and even adding technical indicators or candlestick patterns does not help much either.
I really like your video. I could not agree more about all of those video / code examples shared on YouTube as if they really work xD. Thank you for this video! I am currently creating a real AI trading bot using a deep RNN, and I wanted to use LSTM cells and maybe GRU cells as well, but I ended up not getting good results during my training process. Hopefully your video will help me understand a little better why I am not able to get a better recall. (Yes, I am doing a classification prediction.)
any luck with your project ?
Hi Arii, please reach out to me if you're still working on this project. Would love to talk with you.
Holy shit I just standardized the data on one of my LSTM models and I instantly got over 10x less loss
awesome...i saw so many examples, with this mistake, but always, i felt what they are doing has some flaw. But, was unable to reason it myself. Thanks for the clarification.
What happens if we use rolling returns instead of just returns?
Discount?
Can be found at my website!
6:25 "Some people are using a sequence length of 1... Nor is it funny or entertaining"
Thank you so much for explaining these concepts properly, it can be seen that you have a lot of experience in this subject. I started learning machine learning techniques for analyzing economical data but I could not figure out the best method in order to forecast stock prices.
I'd discover a better way of normalize the data for stock prediction.
It took me a while to grasp, but thank you a lot. Mistake number 5 should be all over the internet! Everybody: using a training window does not mean you are using a sequence; it is about the sequence of training windows!!!!!
Is it the same with lagged prices as inputs?
This video is about lagged prices as inputs!
I'm working on imports and exports data. I'm using Time Series Generator-LSTM. The prediction on my training data has R² = 0.99, while the prediction on the test data has -0.39. What parameters do you suggest for better results on the test predictions?
Ask chatgpt
did you find any solution ? facing the same issue
Hi. How to get VIP materials please
I just discovered your channel and I’m interested in the VIP course.
Welcome! You can find links to the VIP versions of my courses via my website, lazyprogrammer.me
@@LazyProgrammerOfficial thank you
tl;dr: use min/max only on bounded quantities, like the outputs of some indicators, not on the price itself. Don't use it on price action, because you "cap" it at a maximum value that could easily still be growing.