PyCon.DE 2017 Nils Braun - Time series feature extraction with tsfresh - “get rich or die..

  • Published 21 Jul 2024
  • Time series feature extraction with tsfresh - “get rich or die overfitting”
    Nils Braun (@_nilsbraun)
    Currently I am doing my PhD in Particle Physics, which mainly involves developing software in a large collaboration. I love working with Python and C++ to process large amounts of data. Of course, it needs to be processed as quickly as possible. I am working on the core reconstruction algorithms for our experiment, which are steered and controlled using Python. Apart from that, I was working as a Data Science Engineer for Blue Yonder, a leading machine learning company, where the idea for tsfresh was born. I am still heavily involved in the project. When I am not writing code, I am updating myself on the newest technical geek stuff (mostly cloud computing and deep learning) or playing the guitar.
    Abstract
    Tags: pydata time series data-science machine learning python ai
    Have you ever thought about developing a time series model to predict stock prices? Or do you consider log time series from the operation of cloud resources as being more compelling? In this case you really should consider using the time series feature extraction package tsfresh for your project.
    Description
    Trends such as the Internet of Things (IoT), Industry 4.0, and precision medicine are driven by the availability of cheap sensors and advancing connectivity, which, among other things, increases the availability of temporally annotated data. The resulting time series are the basis for manifold machine learning applications. Examples are the classification of hard drives into risk classes concerning a specific defect, the log analysis of server farms for detecting intruders, or regression tasks like the prediction of the remaining lifespan of machinery. tsfresh also makes it easy to set up a machine learning pipeline that predicts stock prices, which we will demonstrate live during the presentation ;). The problem of extracting and selecting relevant features for classification or regression in these domains is especially hard to solve if each label or regression target is associated with several time series and meta-information simultaneously, which is a common pattern in industrial applications. This talk introduces a distributed and parallel feature extraction and selection algorithm: the recently published Python library tsfresh. The fully automated extraction and importance selection not only allows better machine learning classification scores to be reached but, in combination with the speed of the package, also allows tsfresh to be incorporated into automated AI pipelines.
    Recorded at PyCon.DE 2017 Karlsruhe: pycon.de
    Video editing: Sebastian Neubauer & Andrei Dan
    Tools: Blender, Avidemux & Sonic Pi

COMMENTS • 24

  • @shivamant 4 years ago

    Nice talk. QA session

  • @nathanboeger9329 6 years ago +3

    Nice talk, nice library, very odd room....

  • @knormz1 5 years ago +1

    This is great. Can you tell me: if you have a univariate time series, how would you perform feature selection when you have no y values?

    • @nilsbraun5266 3 years ago

      That would be unsupervised feature selection, a use case that is not implemented in tsfresh. We have some (small) discussion here: github.com/blue-yonder/tsfresh/discussions/861

  • @zapy422 5 years ago +1

    Is feature selection more or less equivalent to anomaly detection?

    • @nilsbraun5266 3 years ago +2

      Not directly. Feature selection is about finding which features are relevant to solve a given problem (e.g. predicting the next time series value or classifying some data). Of course, it could be one step in your anomaly detection pipeline.

  • @RichieWan 3 years ago +2

    Hey, great talk!
    Hope you could answer this:
    Would this feature extraction still be relevant for a regression task? I mean not "auto"regression (forecasting), but rather a setting where a single scalar output needs to be predicted from one or multiple time series. So basically I want to use tsfresh to extract features, and then use those features in a standard regression setting to predict a target value (linear regression, XGBoost regression, etc.).
    Would this still make sense to use?

    • @kishore961 2 years ago

      I'm looking for the same answer. I am trying to use tsfresh for a regression task. Could you let me know if you found an answer?

    • @Malikk-em6ix 2 years ago

      @kishore961 Hey! I am looking for the same answer. Did you find anything?
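
The setup asked about in this thread can be sketched without any library: collapse each time series into one fixed-length row of features, then align those rows with one scalar target per series id. The sketch below uses only the Python standard library; the data, the `summarize` helper, and the five hand-picked features are hypothetical stand-ins for the much larger feature set that the `extract_features` method mentioned elsewhere in these comments produces. It is an illustration of the data layout, not the library's implementation.

```python
from statistics import mean, pstdev

# Hypothetical long-format data: (series id, time step, value).
rows = [
    ("a", 0, 1.0), ("a", 1, 2.0), ("a", 2, 3.0),
    ("b", 0, 5.0), ("b", 1, 5.0), ("b", 2, 8.0),
    ("c", 0, 2.0), ("c", 1, 4.0), ("c", 2, 9.0),
]
# One scalar regression target per series id (made-up numbers).
target = {"a": 10.0, "b": 30.0, "c": 25.0}


def summarize(values):
    """Collapse one series into a fixed-length feature dict (hypothetical helper)."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
        "last_minus_first": values[-1] - values[0],
    }


# Group the values by series id, keeping them in time order.
series = {}
for sid, _, value in sorted(rows, key=lambda r: (r[0], r[1])):
    series.setdefault(sid, []).append(value)

ids = sorted(series)
feature_names = list(summarize([0.0]).keys())

# Feature matrix: one row per series id, one column per feature.
X = [[summarize(series[sid])[name] for name in feature_names] for sid in ids]
# Target vector in the same id order as the rows of X.
y = [target[sid] for sid in ids]
```

With `X` and `y` aligned by id like this, any standard regressor (linear regression, gradient boosting, and so on) can be trained on them; the only requirement is that each row of features and its target refer to the same series.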

  • @ashwinimagar4822 5 years ago

    Nice talk. Which technique is best for feature selection in the case of unsupervised learning or clustering of time series, using the feature extraction approach with tsfresh?

    • @IFFranciscoME 4 years ago

      You can look for "sequential clustering", and the MASS technique. github.com/matrix-profile-foundation/mass-ts

    • @nilsbraun5266 3 years ago +1

      We also have some discussion here: github.com/blue-yonder/tsfresh/discussions/861

  • @pworeo625 3 years ago

    Is it possible to use the library with non-binary y values? In my case, ideally 9 (or rather 3x3) different classes?

    • @nilsbraun5266 3 years ago

      Yes, that is possible. tsfresh supports multi-class feature selection as well.

  • @franciscobahamondes1313 1 year ago

    How can I utilize tsfresh to generate features of my univariate sales time series? e.g.
    date id units_sold
    15-01-23 1 34.0
    16-01-23 1 43.0
    17-01-23 1 19.0
    where 'id' is the ID of the product. In this case, my time series belongs to a single product only...
    I did manage to obtain lots of features with 'extract_features' method, but then when I tried to 'extract_relevant_features' or 'select_features' I was unable to go any further since I do not have a 'y' pandas series.
    Is anyone facing the same challenge? Thanks for your help.

    • @khoile1269 10 months ago

      Does your dataset only contain sales for 1 product?
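
One common way around the missing 'y' for a single series, sketched here under stated assumptions, is to frame the problem as one-step-ahead forecasting: cut the series into sliding windows and use the value that follows each window as its target. The sketch below uses only the Python standard library; `make_windows` is a hypothetical helper, and the last two sales values are made up so that more than one window exists. tsfresh ships utilities for this kind of windowing (`roll_time_series` and `make_forecasting_frame` in `tsfresh.utilities.dataframe_functions`); check the documentation of your installed version for their exact arguments.

```python
# The three values from the question, plus two made-up values so that
# there is more than one window to look at.
units_sold = [34.0, 43.0, 19.0, 27.0, 31.0]  # last two values are hypothetical


def make_windows(values, window):
    """Return (window_values, next_value) pairs for one-step-ahead forecasting."""
    pairs = []
    for end in range(window, len(values)):
        # The window covers values[end - window:end]; the target is values[end].
        pairs.append((values[end - window:end], values[end]))
    return pairs


pairs = make_windows(units_sold, window=3)

# Each window acts as its own small time series with its own id
# (its position in the list); y holds one target per window.
X_windows = [window_values for window_values, _ in pairs]
y = [next_value for _, next_value in pairs]
```

Each window can then be given its own id and passed through feature extraction, and the resulting `y` has one entry per window, which is the shape that supervised feature selection expects.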

  • @wangrichard2140 4 years ago

    Where can I download the code?

    • @nilsbraun5266 3 years ago

      You can find the code here: github.com/blue-yonder/tsfresh

    • @wangrichard2140 3 years ago

      @nilsbraun5266 Thanks!

  • @Doomer6969 5 years ago

    This guy is pretty funny; the crowd is so dead.

  • @doclsvlc7878 3 years ago

    5:15

  • @_testing_2024 2 years ago

    quiet room but it is in Germany so...

  • @_testing_2024 2 years ago

    Actually the stock example is a pretty bad example