Multivariate Time Series Data Preprocessing with Pandas in Python | Machine Learning Tutorial

  • Published Oct 21, 2024

COMMENTS • 54

  • @MaximeAntoine97
    @MaximeAntoine97 2 years ago +10

    This video is a gold mine for multivariate time series data. After searching for hours online, you were the ONLY person that was capable of explaining everything in a simple way.
    Thank you!

  • @Asparuh.Emilov
    @Asparuh.Emilov 3 years ago +8

    This is by far one of the best videos I have seen about data preprocessing for Time Series Data. Keep up the good work please!

  • @AlistairWalsh
    @AlistairWalsh 3 years ago +12

    8:32 - iterating over rows in pandas is usually much slower than doing a column-wise operation.
    Instead of this:
    df["close_change"] = df.progress_apply(
    lambda row: 0 if np.isnan(row.prev_close) else row.close - row.prev_close,
    axis = 'columns'
    )
    Try this:
    df["close_change"] = df['close'] - df['prev_close']
    df["close_change"] = df["close_change"].fillna(0)

    • @maxmohamed9878
      @maxmohamed9878 3 years ago

      Maybe Venelin was trying to show how to use the progress_apply. But the way you calculated the close_change is the best way.

    • @sinabirecik
      @sinabirecik 2 years ago

      If you only want the change in the value, you can also use diff()
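
The column-wise suggestions in this thread can be sketched as follows. This is a minimal illustration on made-up prices, assuming `prev_close` is simply the previous row's `close` (built here with `shift(1)`); the toy values and the `close_change_diff` column name are hypothetical, not taken from the video.

```python
import pandas as pd

# Hypothetical toy prices; the tutorial's real data is not reproduced here.
df = pd.DataFrame({"close": [100.0, 102.5, 101.0, 105.0]})

# Assumption: prev_close is the previous row's close.
# The first row has no predecessor, so it becomes NaN.
df["prev_close"] = df["close"].shift(1)

# Column-wise subtraction, then replace the leading NaN with 0.
df["close_change"] = (df["close"] - df["prev_close"]).fillna(0)

# diff() computes the same row-to-row difference in one call.
df["close_change_diff"] = df["close"].diff().fillna(0)

print(df)
```

Both columns should hold the same values; the subtraction keeps `prev_close` around as a feature, while `diff()` skips the intermediate column.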

  • @MayssaRekik
    @MayssaRekik 5 months ago

    Love the sweet song you shared in your notebook. Been vibin to Common's music while going through the code. Great stuff thanks for sharing!

  • @gregjuva
    @gregjuva 3 years ago +3

    Great tutorial! Thanks! One comment in the preprocessing step. Iterating over each row to create a dictionary and appending those dictionaries to a list is much much slower than copying the dataframe and creating the columns you need like so:
    features_df = df.copy()
    features_df['day_of_week'] = features_df['date'].dt.dayofweek
    features_df['day_of_month'] = features_df['date'].dt.day
    features_df['week_of_year'] = features_df['date'].dt.week
    features_df['month'] = features_df['date'].dt.month

    • @Aegilops
      @Aegilops 2 years ago

      Great suggestion Greg, and agree it felt faster. Interestingly got a deprecation warning on .week, so went with features_df['date'].dt.isocalendar().week
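
Putting the two comments above together, a sketch of the column-wise date features with the non-deprecated week accessor might look like this. The three dates are hypothetical sample values, and the cast to `int` is an optional choice made here, not something the comments specify.

```python
import pandas as pd

# Hypothetical sample dates, chosen only for illustration.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2021-01-01", "2021-01-04", "2021-02-15"])}
)

features_df = df.copy()
features_df["day_of_week"] = features_df["date"].dt.dayofweek  # Monday = 0
features_df["day_of_month"] = features_df["date"].dt.day
# isocalendar() returns year/week/day columns; week replaces the old .dt.week.
features_df["week_of_year"] = features_df["date"].dt.isocalendar().week.astype(int)
features_df["month"] = features_df["date"].dt.month

print(features_df)
```

Note that ISO weeks can belong to the previous year: 2021-01-01 falls in ISO week 53 of 2020, so `week_of_year` and `month` do not always move together.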

  • @Deepakkumar-sn6tr
    @Deepakkumar-sn6tr 3 years ago +2

    great job Venelin!!...waiting for a video on fine-tuning Transformer based recommender :)

  • @ephi124
    @ephi124 3 years ago

    probably the best video on time series.

  • @kadourkadouri3505
    @kadourkadouri3505 1 year ago +1

    The scaler part is a huge weakness in the model: by using a min-max scaler you are assuming that the historical ATH (all-time high) price will never be exceeded, which is a fundamental mistake because (asset) prices are continuous. Therefore, the model will likely not be able to predict a resistance.
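
The concern above can be illustrated with a hand-rolled min-max scaler. This is only a sketch: `fit_min_max` is a hypothetical helper written for this example (not the scaler used in the video), and the prices are made up. It shows that a price above the maximum seen while fitting lands outside the target range.

```python
import numpy as np


def fit_min_max(values, feature_range=(-1.0, 1.0)):
    """Return a scaling function fitted on `values` (hypothetical helper)."""
    lo, hi = float(np.min(values)), float(np.max(values))
    a, b = feature_range

    def transform(x):
        # Linear map that sends lo -> a and hi -> b; nothing is clipped.
        return a + (np.asarray(x, dtype=float) - lo) * (b - a) / (hi - lo)

    return transform


# Made-up training prices: the fitted range is [100, 200].
train_prices = np.array([100.0, 150.0, 200.0])
scale = fit_min_max(train_prices)

scaled_train = scale(train_prices)   # stays inside [-1, 1]
scaled_new_high = scale([250.0])     # a new all-time high maps outside [-1, 1]

print(scaled_train, scaled_new_high)
```

Whether this matters in practice depends on the model; scaling returns or price changes instead of raw prices is one commonly used way around it, though that is a different setup from the one in the video.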

  • @mehmetnaml5073
    @mehmetnaml5073 3 years ago +3

    Thanks for the video, Venelin. It is really good for learning the coding side of things. To those who want to do real-life projects, I suggest not scaling all features the same way. I might be wrong, but I don't think it is a good idea to scale days of the week (0-6 range), months, etc. with MinMaxScaler(feature_range=(-1, 1)). They are not numerical features like the price or volume. They are categorical data, if I am not wrong, and scaling them the way it is done here will confuse the algorithms.
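
One possible alternative to scaling calendar features, sketched here with one-hot encoding via `pd.get_dummies`. This technique is swapped in for illustration; neither the comment nor the video prescribes it, and the toy frame and the `dow` prefix are assumptions.

```python
import pandas as pd

# Hypothetical toy frame: one calendar feature and one numeric feature.
features_df = pd.DataFrame(
    {"day_of_week": [0, 1, 6], "close": [100.0, 101.0, 99.5]}
)

# Replace day_of_week with one indicator column per observed category.
# Only categories present in the data get a column.
encoded = pd.get_dummies(
    features_df, columns=["day_of_week"], prefix="dow", dtype=float
)

print(encoded)
```

The numeric `close` column is left untouched and can still be scaled separately.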

  • @ibadrather
    @ibadrather 2 years ago +2

    Complete Working Code:
    github.com/ibadrather/pytorch_learn/blob/main/Part%2013%20-%20Multivariate_Time_Series_Data_Preprocessing_with_Pandas.ipynb

    • @saurabhvarshneya4639
      @saurabhvarshneya4639 2 years ago +1

      Thank you very much ;) I was scrolling through all the comments to find this

  • @bahadrbasaran8908
    @bahadrbasaran8908 3 years ago +3

    Great video, Venelin! I think there is a small mistake at 15:00. The slices should be [ : train] and [train : ], shouldn't they? In your case, e.g. with x = [1,2,3,4,5] and train_size = 2:
    x1, x2 = x[ : 2] , x[3 : ] -> x1 = [1,2] and x2 = [4,5] -> element '3' is missing.
    I have one question about the create_sequences function (23:45):
    If the reason behind creating (sequence, label) pairs is to teach our model "if you see a sequence like ..., its label is ...", shouldn't "label_position" be equal to (i + sequence_length - 1)?

    • @mehmetnaml5073
      @mehmetnaml5073 3 years ago +2

      I can answer your second question. In fact, we are creating a sequence to predict the label of the next row. For example, you take 60 rows to predict the closing price of Bitcoin on the 61st row. That is not explained in the video, but this should be the reason.

    • @Aegilops
      @Aegilops 2 years ago

      @mehmetnaml5073 This was really helpful, Mehmet; I was getting confused by the video at that point, but your explanation makes it much clearer, thanks. It is a bit more obvious how the function works if you add a couple of extra values to the sample_data dataframe, e.g. sample_data = pd.DataFrame(dict(feature=[1, 2, 3, 4, 5, 6, 7], label=[6, 7, 8, 9, 10, 11, 12])). That way you can see it's more of a sliding-window function, not a simple "split in the middle" as I originally thought it might be.
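
The two points in this thread can be sketched together. This is a reconstruction from the comments only, not the video's actual code: it assumes the label is taken from the row right after each window (`label_position = i + sequence_length`, as described above) and uses the `sample_data` frame and the `x`/`train_size` values quoted in the thread.

```python
import pandas as pd


def create_sequences(input_data, target_column, sequence_length):
    """Sliding window: each window of rows is paired with the NEXT row's target."""
    sequences = []
    for i in range(len(input_data) - sequence_length):
        sequence = input_data.iloc[i : i + sequence_length]
        label_position = i + sequence_length  # the row right after the window
        label = input_data.iloc[label_position][target_column]
        sequences.append((sequence, label))
    return sequences


sample_data = pd.DataFrame(
    dict(feature=[1, 2, 3, 4, 5, 6, 7], label=[6, 7, 8, 9, 10, 11, 12])
)
sequences = create_sequences(sample_data, "label", sequence_length=3)

# Contiguous split: both slices use the same boundary, so no element is lost.
x = [1, 2, 3, 4, 5]
train_size = 2
train, test = x[:train_size], x[train_size:]

print(len(sequences), sequences[0][1], train, test)
```

With seven rows and a window of three, the windows start at rows 0 through 3, and each label comes from the row just past its window.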

  • @mohammadfadel6447
    @mohammadfadel6447 1 year ago

    This is a very high-quality video, thanks!!
    Have you done any anomaly detection on a multivariate time series?

  • @marlonlopezpereyra
    @marlonlopezpereyra 9 months ago +1

    Hey man, you are awesome. Thank you so much for your easy and understandable video, this is the best of the best, thank you so much 👍👍👍👍👍👍👍👍👍👍

  • @antonbozhinov
    @antonbozhinov 2 years ago +1

    Great content! Thanks for your efforts!

  • @gj2u
    @gj2u 1 year ago

    Venelin, hey! Very nice video! Can you comment why you picked range from -1 to 1 for scaling?

  • @paulntalo1425
    @paulntalo1425 3 years ago

    Thank you for this wonderful video showcasing PyTorch for LSTM time series.

  • @Wissam-rk7tv
    @Wissam-rk7tv 1 year ago

    Thank you very much for this video. I have a question, please: how should we prepare our data for a multivariate analysis when dates are repeated, for example if the Symbol variable has several values (BTC, ETH, LTC......)? (So we don't have a unique key.)

  • @Rody2013
    @Rody2013 2 years ago

    Thank you for the very informative video. May I ask why we need to convert our Pandas data frame to sequences?

  • @SaudBako
    @SaudBako 2 years ago

    20:14 is where we write the create_sequences function

  • @piramid53
    @piramid53 1 year ago

    Thank you for your kindness. It's a nice video.

  • @justin9915
    @justin9915 2 years ago

    It won't let me import pytorch_lightning as pl. It says "ModuleNotFoundError: No module named 'torchtext.legacy'". What do I do???

  • @hi_brante3
    @hi_brante3 3 years ago

    I like the play button. What IDE are you using?

  • @mp3311
    @mp3311 2 years ago

    Great video! What is the meaning behind creating the sequences?

  • @priodyutipradhan66
    @priodyutipradhan66 2 years ago

    Great videos! Thanks for sharing!

  • @zenfascist
    @zenfascist 3 years ago

    Excellent tutorial! Thanks a lot!

  • @tattwadarshipanda491
    @tattwadarshipanda491 2 years ago

    Beautiful explanation

  • @SP-wt9lo
    @SP-wt9lo 2 years ago

    Hi, can we use this approach for a time series problem (employee attendance prediction for 30 days)?

  • @alteshaus3149
    @alteshaus3149 3 years ago

    thank you for this great video. very helpful

  • @dhavalpatel5595
    @dhavalpatel5595 3 years ago

    Thank you for your video. I do have one question: how do we preprocess the data if the series in train_data have variable lengths?

  • @Ks-oj6tc
    @Ks-oj6tc 3 years ago

    Thanks Venelin.

  • @mikheilmgebrishvili9571
    @mikheilmgebrishvili9571 3 years ago

    Hi, great video, but I have a few questions regarding the notebook:
    What is the difference between Google Colab and Jupyter Notebook, which one is better to use, and is it possible to have auto fillers in Jupyter Notebook?

    • @venelin_valkov
      @venelin_valkov 3 years ago

      Hey,
      Google Colab is a Jupyter-like environment which gives you free compute (CPU & GPU). It is open source:
      github.com/googlecolab/colabtools
      And you can read more about it:
      research.google.com/colaboratory/faq.html
      Don't know what auto fillers are. IMO, Jupyter lab is better.
      Thanks for watching!

    • @paulntalo1425
      @paulntalo1425 3 years ago

      I was inspired to start using Colab Jupyter notebooks through this channel, but the challenge you will find is how to save and access project files on your Google Drive. It's not as straightforward as on your local machine. Hopefully, maybe one day Venelin will create a video for that task.

    • @mikheilmgebrishvili9571
      @mikheilmgebrishvili9571 3 years ago

      @venelin_valkov Thanks very much

    • @mikheilmgebrishvili9571
      @mikheilmgebrishvili9571 3 years ago

      @paulntalo1425 Thanks very much

    • @venelin_valkov
      @venelin_valkov 3 years ago

      @paulntalo1425 Yes, Colab needs some form of permanent storage (like Google Drive) for your files. What problems/questions do you have regarding that?

  • @HipHop-cz6os
    @HipHop-cz6os 3 years ago

    Amazing video 😍

  • @vanish6839
    @vanish6839 3 years ago

    Great Video!!

  • @charmz973
    @charmz973 3 years ago

    I'm importing a dataframe from a csv file, but cannot access its columns by name. What's going on? df.head() returns all the columns in the dataframe but df.columns returns only Index

    • @mp3311
      @mp3311 2 years ago +1

      It's because the first row of the Excel file contains the index; I just removed the first row by hand :)

    • @charmz973
      @charmz973 2 years ago

      @mp3311 thanks
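
Instead of deleting the first row by hand, the extra line can be skipped at load time. This is a sketch under a stated assumption: the file is assumed to have one stray line above the real header, which may not match the actual file discussed above. The inline `raw` text and its column names are made up.

```python
import io

import pandas as pd

# Hypothetical CSV content with one stray line above the real header row.
raw = "exported index\ndate,close\n2021-01-01,100.0\n2021-01-02,101.5\n"

# skiprows=1 drops the first line so the second line is used as the header.
df = pd.read_csv(io.StringIO(raw), skiprows=1, parse_dates=["date"])

print(list(df.columns))
print(df["close"])
```

If the extra material is a leading index column rather than a leading row, `index_col=0` is the `read_csv` option to try instead.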

  • @lyudmilabilerminiagavrilov9132
    @lyudmilabilerminiagavrilov9132 4 months ago

    bless u