This video is a gold mine for multivariate time series data. After searching for hours online, you were the ONLY person that was capable of explaining everything in a simple way.
Thank you!
This is by far one of the best videos I have seen about data preprocessing for Time Series Data. Keep up the good work please!
8:32 - iterating over rows in pandas is usually much slower than doing a column-wise operation.
Instead of this:
df["close_change"] = df.progress_apply(
lambda row: 0 if np.isnan(row.prev_close) else row.close - row.prev_close,
axis = 'columns'
)
Try this:
df["close_change"] = (df['close'] - df['prev_close']).fillna(0)
Maybe Venelin was trying to show how to use progress_apply, but the way you calculated close_change is the better approach.
If you only want the change in value, you can also use diff().
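For what it's worth, a minimal sketch of the diff() approach (toy values; column names assumed to match the video's dataframe):

```python
import pandas as pd

df = pd.DataFrame({"close": [10.0, 12.5, 11.0, 13.0]})
# diff() computes the difference with the previous row; the first row is NaN,
# so fill it with 0 just like the original lambda did
df["close_change"] = df["close"].diff().fillna(0)
```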
Love the sweet song you shared in your notebook. Been vibin to Common's music while going through the code. Great stuff thanks for sharing!
Great tutorial! Thanks! One comment on the preprocessing step: iterating over each row to create a dictionary and appending those dictionaries to a list is much, much slower than copying the dataframe and creating the columns you need, like so:
features_df = df.copy()
features_df['day_of_week'] = features_df['date'].dt.dayofweek
features_df['day_of_month'] = features_df['date'].dt.day
features_df['week_of_year'] = features_df['date'].dt.week
features_df['month'] = features_df['date'].dt.month
Great suggestion Greg, and I agree it felt faster. Interestingly, I got a deprecation warning on .week, so went with features_df['date'].dt.isocalendar().week
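Putting the two comments together, a sketch of the vectorized feature creation with the non-deprecated week accessor (sample dates are made up):

```python
import pandas as pd

# Hypothetical date column, mirroring the snippet above
features_df = pd.DataFrame({"date": pd.to_datetime(["2021-01-04", "2021-06-15"])})
features_df["day_of_week"] = features_df["date"].dt.dayofweek
features_df["day_of_month"] = features_df["date"].dt.day
# .dt.week is deprecated; isocalendar() returns a frame with a `week` column
features_df["week_of_year"] = features_df["date"].dt.isocalendar().week
features_df["month"] = features_df["date"].dt.month
```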
Great job Venelin!! Waiting for a video on fine-tuning a Transformer-based recommender :)
probably the best video on time series.
The scaler part is a huge weakness in the model: by using a MinMax scaler you are assuming that the historical ATH (all-time high) price will never be reached again, which is a fundamental mistake, as (asset) prices are continuous. Therefore, the model will likely not be able to predict a resistance.
Thanks for the video Venelin. It is really good for learning the coding side of things. To those who want to do real-life projects, I suggest not applying the same scaling to every feature. I might be wrong, but I don't think it is a good idea to scale day of week (0-6 range) or month etc. with MinMaxScaler(-1, 1). They are not numerical features like price or volume; they are categorical data, if I am not wrong, and scaling them this way will confuse the algorithms.
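If it helps, one common alternative (my own suggestion, not from the video) is to encode cyclical calendar features with sine/cosine so adjacent values stay close, e.g. Sunday (6) ends up next to Monday (0):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"day_of_week": [0, 1, 2, 3, 4, 5, 6]})
# Map the 0-6 cycle onto the unit circle; adjacent days stay close,
# and day 6 wraps around to sit next to day 0
df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)
```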
Complete Working Code:
github.com/ibadrather/pytorch_learn/blob/main/Part%2013%20-%20Multivariate_Time_Series_Data_Preprocessing_with_Pandas.ipynb
Thank you very much ;) I was scrolling through all the comments to find this
Great video Venelin! I guess there is a small mistake at 15:00: it should be [:train] and [train:], shouldn't it? In your case, e.g. x = [1, 2, 3, 4, 5] and train_size = 2 ->
x1, x2 = x[:2], x[3:] -> x1 = [1, 2] and x2 = [4, 5] -> element '3' is missing.
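In code, the fix described above would look something like:

```python
x = [1, 2, 3, 4, 5]
train_size = 2
# Reuse the same index in both slices so no element is dropped
x1, x2 = x[:train_size], x[train_size:]
```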
I have one question about the create_sequences function (23:45):
If the reason behind creating (sequence, label) pairs is teaching our model "if you see a sequence like ..., its label is ...", shouldn't "label_position" be equal to (i + sequence_length - 1)?
I can answer your second question. In fact, we are creating a sequence to predict the label of the next row, e.g. you take 60 rows to predict the closing price of Bitcoin on the 61st row. That is not explained in the video, but this should be the reason.
@@mehmetnaml5073 This was really helpful Mehmet; I was getting confused by the video at that point but your explanation makes it much clearer, thanks. It is a bit more obvious how the function works if you add a couple of extra values to the sample_data dataframe, e.g. sample_data = pd.DataFrame(dict(feature=[1, 2, 3, 4, 5, 6, 7], label=[6, 7, 8, 9, 10, 11, 12])). That way you can see it's more of a sliding window function, not a simple "split in the middle" as I originally thought it might be
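A sketch of the sliding-window behaviour described in this thread, using the expanded sample_data (names are assumptions; this may not match the video's code exactly):

```python
import pandas as pd

def create_sequences(df, target_column, sequence_length):
    # Each window of `sequence_length` rows is paired with the target
    # of the row that immediately follows it (the "next row" label)
    sequences = []
    for i in range(len(df) - sequence_length):
        sequence = df[i:i + sequence_length]
        label = df.iloc[i + sequence_length][target_column]
        sequences.append((sequence, label))
    return sequences

sample_data = pd.DataFrame(dict(feature=[1, 2, 3, 4, 5, 6, 7],
                                label=[6, 7, 8, 9, 10, 11, 12]))
pairs = create_sequences(sample_data, "label", sequence_length=3)
```

With seven rows and a window of 3, you get four overlapping (sequence, label) pairs, which makes the sliding-window nature easy to see.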
This is a very high quality video, thanks!!
Have you done any anomaly detection on a multi variate time series?
Hey man, you are awesome! Thank you so much for your easy and understandable video. This is the best of the best, thank you so much 👍👍👍
Great content! Thanks for your efforts!
Venelin, hey! Very nice video! Can you comment why you picked range from -1 to 1 for scaling?
Thank you for this wonderful video showcasing PyTorch for LSTM time series
Thank you very much for this video. I have a question, please: how do we prepare our data for a multivariate analysis with repeated dates, for example if the Symbol variable has different values (BTC, ETH, LTC, ...)? (So we don't have a unique key.)
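One way to handle that (column names here are my assumptions, not from the video) is to group by the symbol first and build sequences within each group, so windows never mix assets:

```python
import pandas as pd

# Toy frame with repeated dates across two symbols
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "symbol": ["BTC", "BTC", "ETH", "ETH"],
    "close": [29000.0, 29400.0, 730.0, 775.0],
})

# One per-symbol frame each, sorted by date; build sequences per frame
per_symbol = {
    symbol: group.sort_values("date").reset_index(drop=True)
    for symbol, group in df.groupby("symbol")
}
```

(date, symbol) together then act as the unique key, and each per-symbol frame is an ordinary univariate-date time series again.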
Thank you for the very informative video. May I ask why we need to transform our pandas DataFrame into sequences?
20:14 is where we write the create_sequences function
Thank you for your kindness, it's a nice video
It won't let me import pytorch_lightning as pl. It says "ModuleNotFoundError: No module named 'torchtext.legacy'". What do I do???
I like the play button. What ide are you using?
Great video! What is the meaning behind creating the sequences?
Great videos! Thanks for sharing!
Excellent tutorial! Thanks a lot!
Beautiful explanation
Hi, can we use this approach for a time series problem (employee attendance prediction for 30 days)?
thank you for this great video. very helpful
Thank you for your video. I do have one question: how do we preprocess the data if we have variable-length series in the training data?
We pad the sequences, like in NLP.
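A minimal NumPy sketch of that idea, zero-padding at the front so the most recent values stay aligned (PyTorch's torch.nn.utils.rnn.pad_sequence does something similar, though it pads at the end):

```python
import numpy as np

# Two toy series of different lengths
sequences = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0, 6.0])]
max_len = max(len(s) for s in sequences)
# Pad the front of each shorter series with zeros, then stack into one batch
padded = np.stack([
    np.pad(s, (max_len - len(s), 0), constant_values=0.0)
    for s in sequences
])
```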
Thanks Venelin.
Hi, great video, but I have a few questions regarding the notebook:
what is the difference between Google Colab and Jupyter notebook, which one is better to use, and is it possible to have auto fillers in Jupyter notebook?
Hey,
Google Colab is a Jupyter-like environment which gives you free compute (CPU & GPU). It is open source:
github.com/googlecolab/colabtools
And you can read more about it:
research.google.com/colaboratory/faq.html
Don't know what auto fillers are. IMO, Jupyter lab is better.
Thanks for watching!
I was inspired to start using Colab Jupyter notebooks through this channel, but a challenge you will find is how to save and access project files on your Google Drive. It's not straightforward like on your local machine. Hopefully one day Venelin will create a video for that task.
@@venelin_valkov Thanks very much
@@paulntalo1425 Thanks very much
@@paulntalo1425 yes, Colab needs some form of permanent storage (like Google Drive) for your files. What problems/questions do you have regarding that?
Amazing video 😍
Great Video!!
I'm importing a DataFrame from a CSV file but cannot access its columns by name. What's going on? df.head() returns all the columns in the dataframe, but df.columns returns only Index.
It's because the first row of the Excel file contains the index; I just removed the first row by hand :)
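For reference, pandas can skip that junk line for you, so no hand-editing is needed (sketch with made-up file contents):

```python
import io
import pandas as pd

# Simulate a file whose first line is junk (e.g. an exported index row)
raw = "junk line that is not a header\nclose,volume\n10,100\n12,150\n"
# skiprows=1 drops the bad first line; the next line becomes the header
df = pd.read_csv(io.StringIO(raw), skiprows=1)
```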
@@mp3311 thanks
bless u