That was a pretty clear presentation, and the presenter did not adopt a know-it-all attitude. Superb!
That audience talked a bit too much.
Chao Pan's questions are cool; anecdotal questabrags suck.
no such thing as too much , talk anyx and anyx can b perfx
Well done on LSTM explanation -- very thorough.
Useful, much more practical than Siraj Raval's
I think Siraj's video lectures are not for beginners; if you have advanced knowledge of AI, then they are good for you.
Nice work and presentation!
I was very entertained. Thank you very much for sharing.
Thanks a lot for the code walkthrough, very helpful!
Thank you Jakob! Are there any advances you've seen in the state of the art since this presentation? More than four years of experience, pretty cool!! Good for you
Absolutely, 4 years on, state-of-the-art models have evolved past LSTMs to attention-based architectures, and more recently Compressive Transformers. That's not to say LSTMs aren't still useful ;)
Cool. I have learned a lot from your code. Thanks!
Great video!
At 16:39, I am unclear on what is meant by "Shift sequence window to remove 0th element and push predicted value as nth element."
Is this just saying that the window is slid forward 1 point and now includes the predicted value? I understand the concept, but...
I'm picturing, and the math works out such that, the 100 windows of size 50 each would fit "perfectly" into the 5001 datapoints with no overlap. Sliding each size-50 window forward only 1 datapoint at a time would introduce overlap (intentionally, I believe) and result in far more than 100 windows. What am I missing?
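The window arithmetic in the question above can be checked directly. A quick sketch (the figures 5001 and 50 come from the question; everything else is just counting):

```python
import numpy as np

data = np.arange(5001)   # 5001 datapoints, as in the question
window_size = 50

# Non-overlapping windows: each window consumes 50 fresh points
n_disjoint = len(data) // window_size           # 100 windows

# Stride-1 (overlapping) windows: one window starts at each valid index
n_overlapping = len(data) - window_size + 1     # 4952 windows

print(n_disjoint, n_overlapping)  # 100 4952
```

So both readings are internally consistent: 100 is the disjoint-window count, while sliding one point at a time gives far more windows, exactly as the commenter suspects.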
My question: if 50 steps feed into 1 sample, and the result of the sample is y_predicted - y_actual for observation 51, then, coming back to the first question above, what is fed into the input neuron? Y - Y_predicted? If so, how are we then accounting for feature2 = X2 (e.g. opening price), feature3 = X3 (e.g. volume), etc.? Really appreciate any answer here, this is still confusing to me. Second question: I get that a batch is defined by the number of sets of, say, 50 observations (AKA samples), but how are these chained to each other through each epoch run? My understanding is that each batch is one complete pass through the neural net, so how are they linked up?
skydiving? you are awesome!
Very disappointed with this code once I understood that y_train is composed of a sequence of 30-day returns. Once you introduce a step to de-normalise both y_test and y_predictions, you find out, unfortunately, that as with other codes where the data are not 'normalised' but scaled with some scaler, the predictions are shifted to the right as usual. You clearly mention the use of the code for volatility predictions, but as the predictive power is lagged ...
The training data is shuffled ( np.random.shuffle(train) ), would that affect the prediction since the order in which the values appear is vital for finding patterns?
He first split the data into training sequences, then shuffled those sequences. He's using LSTMs in a way such that at the end of each input sequence the internal state of the network is reset, so shuffling the sequences does not affect the result of the predictions. Shuffling the input data is in any case always advisable, since it helps the learning algorithm converge faster when using backprop with a small batch size.
As a little supplement to Simone's comment: that would indeed be a problem if the LSTM were used not on fixed-length sequences but on arbitrarily long ones. In that scenario (arbitrary length) you would feed measurements/samples one by one at each timestep, and you would have to set the `stateful` parameter of `layers.LSTM` to True. With that config the LSTM keeps its internal state until you explicitly call `model.reset_states()`, which is advised after each epoch (for this specific scenario). Shuffling the data before feeding such a model would indeed destroy all the patterns in the data.
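Simone's point about per-sequence state resets can be illustrated without Keras at all. Below is a toy recurrent cell in NumPy (the weights and sequences are made up, purely for illustration): because the hidden state is reset at the start of every sequence, each sequence's output is independent of the order in which sequences are processed, so shuffling them is harmless.

```python
import numpy as np

def run_sequence(seq, w_x=0.5, w_h=0.9):
    """Toy recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0  # state reset at the start of each sequence (like a stateless LSTM)
    for x in seq:
        h = np.tanh(w_x * x + w_h * h)
    return h

seqs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
out_ordered = [run_sequence(s) for s in seqs]
out_shuffled = [run_sequence(s) for s in reversed(seqs)]

# Per-sequence outputs are identical regardless of processing order
assert sorted(out_ordered) == sorted(out_shuffled)
```

With `stateful=True` the hidden state would instead carry over between calls, and the same shuffle would scramble the temporal pattern, which is exactly the failure mode described above.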
This is great.
great one!
cool content!
Turn off the overhead lights, for goodness' sake!
Great presentation... unfortunate that participants couldn't just shut up and take it all in.
Hi Jakob, rather than trying to predict the S&P 500 close prices, could it instead predict whether the next close is going to be up or down? This would limit the output to either 1 or 0, and it would also be easier to evaluate whether the network is learning anything.
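The up/down reframing suggested above only changes the target construction; the prices here are made up for illustration:

```python
import numpy as np

close = np.array([100.0, 101.5, 101.2, 103.0, 102.8])

# Label each step 1 if the next close is higher than the current one, else 0
labels = (np.diff(close) > 0).astype(int)
print(labels)  # [1 0 1 0]
```

With a sigmoid output unit and binary cross-entropy loss, accuracy against such labels gives a quick sanity check on whether the network beats a coin flip.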
Hi! Thank you so very much for the video! I have a question: in order to compute my prediction errors I need to de-normalise the values. I understand the de-normalisation formula you mentioned on your website, but I don't know how the window size for the testing data works. Should I run the exact same windowing code as for normalisation, with the inverse formula, on the predictions?
Thanks for uploading this, Jakob. Really clear explanation on LSTMs. I'm interested in adapting this code to accept multiple input dimensions from a CSV but am struggling with importing and normalising the vector. Do you have any advice on how to do this?
Thanks Harry, I try to elaborate on the import and normalising a bit more in my blog article: www.jakob-aungiers.com/articles/a/LSTM-Neural-Network-for-Time-Series-Prediction
And if you have any more issues with the import bit, just Google "importing csv into numpy" and you'll get lots of examples.
Perfect, thanks again!
Why don't you try using pandas for importing CSV files? It's really simple. Just check Google for that.
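A minimal sketch of the pandas route (the column names and values here are assumptions for illustration, not the actual sp500.csv schema; `io.StringIO` stands in for a real file path):

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("sp500.csv")
csv_text = "close,volume\n100.0,5000\n101.5,5200\n99.8,4800\n"
df = pd.read_csv(io.StringIO(csv_text))

# Each column becomes a NumPy array, ready for windowing/normalisation
close = df["close"].to_numpy()
volume = df["volume"].to_numpy()
print(close.shape, volume.shape)  # (3,) (3,)
```

For a multi-feature model, the per-column arrays can then be stacked into a (samples, timesteps, features) tensor for the LSTM.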
Hi,
How do I handle the persistence-model problem? When doing time series analysis I get output that appears to be one time step behind the actual series. How do I rectify this? I'm getting this with several ML and DL methods, as well as with statistical algorithms. Please do reply!
Hi Jakob, nice explanation. I am a bit confused about the prediction part; can you tell me a bit more? The LSTM model predicts on X_test, which comes from the sp500.csv file, and that suggests the dataset the model is validated against is known in advance. How do we extend the predictions to the situation where we don't have data, i.e. the future, where there is no dataset?
Or is my understanding of the dataset wrong?
Got it! He has mentioned it in the blog linked: www.jakob-aungiers.com/articles/a/LSTM-Neural-Network-for-Time-Series-Prediction
If however we want to do real magic and predict many time steps ahead we only use the first window from the testing data as an initiation window. At each time step we then pop the oldest entry out of the rear of the window and append the prediction for the next time step to the front of the window, in essence shifting the window along so it slowly builds itself with predictions, until the window is full of only predicted values (in our case, as our window is of size 50 this would occur after 50 time steps). We then keep this up indefinitely, predicting the next time step on the predictions of the previous future time steps, to hopefully see an emerging trend.
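The shifting-window loop quoted above can be sketched as follows. The model here is a stand-in function (any deterministic rule works for the sketch), not the actual trained LSTM:

```python
import numpy as np

def fake_model_predict(window):
    """Stand-in for model.predict() on one window."""
    return window.mean()

window = np.arange(50, dtype=float)  # initiation window from the test data
predictions = []
for _ in range(60):
    y_hat = fake_model_predict(window)
    predictions.append(y_hat)
    # Pop the oldest entry, push the prediction in as the newest entry
    window = np.append(window[1:], y_hat)
```

After 50 iterations the window contains only predicted values, so every subsequent step is a prediction made purely on earlier predictions; errors compound, which is why only the broad trend (not the exact values) is meaningful this far out.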
Jakob, if I have one data point for every day of the year, do I need a window size of at least 365 to capture the yearly seasonality?
I think so, since you are shuffling all the windows in the training set too.
How can you use the predictions to calculate expected returns?
The only thing this video reminds me of is the necklace.
why no CC ?
Hi Jakob, thanks a lot for such a framework.
But I am a bit confused: how do I denormalise a predicted value?
For example:
X          y
[1,2,3]    [4]
[2,3,4]    [5]
[3,4,5]    [x]
If the example above is normalised by the given formula, how can we denormalise the predictions 4, 5 and x?
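Assuming the window normalisation discussed here is n_i = (p_i / p_0) - 1 (each value taken relative to the first raw value of its own window, as described on the blog), it inverts exactly. A sketch with made-up prices:

```python
import numpy as np

def normalise(window):
    # n_i = (p_i / p_0) - 1, relative to the window's first value
    return window / window[0] - 1.0

def denormalise(n, p0):
    # Inverse: p_i = p0 * (n_i + 1)
    return p0 * (n + 1.0)

window = np.array([100.0, 102.0, 98.0, 105.0])
n = normalise(window)
recovered = denormalise(n, window[0])
assert np.allclose(recovered, window)
```

The key point for the question above: each predicted value must be de-normalised with the p_0 of the window it was predicted from, not with a single global scaler.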
It's hard to figure out exactly what the input data matrix is. The link to the CSV file no longer works. Thanks
You can find it in the data folder on the GitHub repo; worked for me :D
Does RNN/LSTM consider seasonality?
It seems to me that this NN predicts reversion to the mean value or stationary state of the series, so its predictive power seems very doubtful.
That's incorrect; the NN doesn't revert to the series mean values at all and is mapping higher-level non-linear relationships. However, there is an issue, especially prevalent with a time series like stock prices, which this particular NN does not deal with: time series non-stationarity. There is work being done to tackle that via Bayesian nonparametrics within LSTM NNs; however, that work is far outside the scope of this video/talk.
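One common, partial remedy for price-series non-stationarity (not covered in the talk, and far simpler than the Bayesian approaches mentioned above) is to model returns rather than raw price levels; a minimal sketch with made-up prices:

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 104.0, 103.0])

# Simple returns: r_t = (p_t - p_{t-1}) / p_{t-1}; removes the price-level trend
returns = np.diff(prices) / prices[:-1]

# The transform is invertible: rebuild prices from the first price and the returns
rebuilt = prices[0] * np.cumprod(1.0 + returns)
assert np.allclose(rebuilt, prices[1:])
```

Returns are closer to stationary than raw prices (the level trend is gone), though their variance can still drift over time, which is where the more sophisticated methods come in.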