Forecastegy
Fix Imbalanced Data In Machine Learning
A simple trick for dealing with imbalanced classes when training machine learning models, with code examples in Scikit-learn, XGBoost, and TensorFlow/Keras.
*Video style heavily inspired by @Fireship
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// SUPPORT THE CHANNEL 👇❤️
Sign up for a Coursera course:
imp.i384100.net/EaDmQe
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// SOCIAL MEDIA
LinkedIn: www.linkedin.com/in/mariofilho/
Kaggle: kaggle.com/mariofilho
Twitter: mariofilhoml
Blog: forecastegy.com
Some links above can be from partnerships where I get a commission if you buy a product, without any additional cost to you. Thanks for the support!
Views: 1,452

Videos

Feature Engineering Secret From A Kaggle Grandmaster
38K views · 3 years ago
Learn how to do feature engineering for tabular data like a Kaggle Grandmaster and get high-performance machine learning models. Like the video? Subscribe and turn on the notifications to get more tips :) 0:00 Intro 1:38 The One Question To Ask Yourself 2:40 Credit Card Fraud Examples 6:34 Brief Info On Categorical Features 7:23 Time Series Feature Engineering 11:53 An Extremely Valuable Exerci...
How To Fill Missing Data With Pandas Fillna - Data Science For Beginners
789 views · 3 years ago
Check my blog for more machine learning content: forecastegy.com Learn how to replace missing values in your pandas DataFrame with the fillna function. Like the video? Subscribe and turn on the notifications to get more tips :) Docs: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
How To Drop Columns In a Pandas DataFrame - Data Science For Beginners
383 views · 3 years ago
Check my blog for more machine learning content: forecastegy.com Learn how to drop one or more columns in a DataFrame using pandas. Like the video? Subscribe and turn on the notifications to get more tips :) Docs: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
Multiple Time Series Forecasting With Scikit-Learn
36K views · 3 years ago
You have a lot of time series data points and want to predict the next step (or steps). What should you do now? Train a model for each series? Is there a way to fit a model for all the series together? Which is better? I have seen many data scientists approach this problem by creating a single model for each product. Although this is one of the possible solutions, it's not likely ...

COMMENTS

  • @GeoffDevitt
    @GeoffDevitt 3 days ago

    A very comprehensive video, thank you. One question: you mention that if you have multiple datasets at different time periods, they can be combined to train one model instead of multiple models (e.g. daily, weekly, monthly). My datasets are identical; only the datetime changes. Does this also apply to other models such as RandomForestClassifier?

  • @svitlanatuchyna7154
    @svitlanatuchyna7154 2 months ago

    You are creating amazing videos! Thank you! So well explained, easy to understand, helps to solve real ML problems and at the same time entertaining! Please keep creating more:))

  • @svitlanatuchyna7154
    @svitlanatuchyna7154 2 months ago

    Thank you for such an amazing video!!! It is incredibly useful!!!

  • @sarasatti1070
    @sarasatti1070 2 months ago

    Hi Mario, and thank you for a very clear and concise explanation. One question I have is, how would you handle it if several of the products are only selling intermittently such that there are many zeros in the series?

  • @user-gz2po7dx3k
    @user-gz2po7dx3k 2 months ago

    Where are the new videos?!

  • @necuspam
    @necuspam 2 months ago

    A more intriguing question is how to train a model on thousands of time series, each determined by multiple parameters, and then simulate or forecast a single time series from a new set of those parameters.

  • @parvneema
    @parvneema 2 months ago

    Very nicely explained. Your videos are good. Why did you start making them?

  • @dianavi3961
    @dianavi3961 2 months ago

    Thank you a lot!

  • @userhandle-u7b
    @userhandle-u7b 3 months ago

    It would be better if you used slides with key points. The handwriting on the screen was distracting and hard to read. Anyway, thanks.

  • @anwarsaidan3959
    @anwarsaidan3959 4 months ago

    Thank you very much for this amazing video! Can we use cross-validation for hyperparameter tuning in the case of RandomForest with time series data?

  • @abdullahalmahfuz6700
    @abdullahalmahfuz6700 4 months ago

    Should I learn feature engineering in 2024?

  • @mamyrak1114
    @mamyrak1114 5 months ago

    Can I follow the same process if, instead of a week number, I have a date like yyyy-mm-dd? And how should I handle the year?

  • @bennyadrianmartinez
    @bennyadrianmartinez 5 months ago

    Thank you. You covered so much in so little time, compared with what TWO different bootcamp instructors managed in far more time...

  • @yuvrajchauhan9874
    @yuvrajchauhan9874 8 months ago

    00:01 Learn feature engineering for high-performance models.
    02:00 Aggregation is essential for extracting useful information from tables and can be compared to the group-by function in various programming languages.
    03:56 Feature engineering involves creating customer-specific features to predict fraud in transactions.
    06:01 Feature engineering is all about aggregation and encoding for capturing patterns and anomalies.
    08:00 Feature engineering techniques like lag, difference, rolling, and date components are significant for analyzing time series data.
    09:55 Seasonal patterns and time differences for feature engineering.
    11:55 Reverse engineer feature computation from Kaggle solutions.
    13:57 Feature engineering can be applied universally in tabular data for extracting features from multiple tables.
    15:47 Feature engineering techniques used in data processing.
    17:41 Utilizing feature engineering to create indicators for bot usage from IP data.
    19:22 Geolocation and network features are key for advanced feature engineering.
    21:03 Graph features are important for model prediction.
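The lag, difference, rolling, and date-component techniques listed in this comment can be sketched with pandas roughly as follows. This is a minimal illustration, not the code from the video: the data, the column names (`product`, `date`, `sales`), and the window size are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical weekly sales for two products (illustrative data only).
df = pd.DataFrame({
    "product": ["A"] * 4 + ["B"] * 4,
    "date": list(pd.date_range("2024-01-01", periods=4, freq="W-MON")) * 2,
    "sales": [10, 12, 9, 15, 100, 90, 95, 110],
}).sort_values(["product", "date"]).reset_index(drop=True)

# Lag: the previous value within each product's own series.
df["sales_lag_1"] = df.groupby("product")["sales"].shift(1)

# Difference: change between consecutive values of the same product.
df["sales_diff_1"] = df.groupby("product")["sales"].diff(1)

# Rolling: mean of the two previous values (shifted first, so the
# current row's own value is not part of its feature).
df["sales_roll_mean_2"] = df.groupby("product")["sales"].transform(
    lambda s: s.shift(1).rolling(2).mean()
)

# Date components: calendar parts that can capture seasonality.
df["month"] = df["date"].dt.month
df["week"] = df["date"].dt.isocalendar().week.astype(int)
```

Grouping by product before shifting keeps one product's history from leaking into another's features; the first rows of each product are left as missing values because no history exists yet.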

  • @jackcarter97
    @jackcarter97 8 months ago

    How do I find the season effect features?

  • @chungrandy780
    @chungrandy780 9 months ago

    🎯 Key takeaways for quick navigation:

    00:00 📊 Understanding Feature Engineering for Tabular Data
    - Feature engineering is essential for high-performance machine learning models.
    - The key to feature engineering is aggregation, which involves grouping and summarizing data.
    - Aggregations can be applied to various types of data, including categorical and numerical variables.

    06:22 🔄 Common Feature Engineering Techniques
    - Feature engineering techniques include lag, difference, rolling, date components, and time differences.
    - Lag captures the previous value of a variable in a sequence.
    - Difference calculates the difference between consecutive values in a sequence.
    - Rolling involves computing aggregations over a rolling window of data.
    - Date components extract information like month or day from dates for seasonality patterns.
    - Time differences measure the time elapsed between events.

    15:21 🧩 Reverse Engineering Features from a Kaggle Solution
    - Analyzing features from a Kaggle competition example.
    - Median time between bids can be computed by grouping by user and calculating time differences between bids.
    - Mean number of bids per auction is determined by grouping by user and auction, then counting bid occurrences.
    - Detecting IP addresses used by both users and bots involves complex filtering and merging based on IP data.

    21:05 🌐 Advanced Feature Engineering
    - Geolocation features can be important: calculating distances between locations and aggregating spatial data.
    - Network or graph features involve representing data as graphs and computing graph-related metrics.
    - Suggests exploring the Instacart competition for advanced feature engineering with multiple tables.

    22:16 📺 Conclusion and Next Steps
    - Encourages viewers to like, subscribe, and leave comments.
    - Offers a link to a time series forecasting workshop for further learning.

    Made with HARPA AI
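The two bid features summarized in this comment (median time between bids per user, and mean number of bids per auction per user) could be computed along these lines. This is only a sketch under assumptions: the table layout, the column names (`user`, `auction`, `time`), and the values are hypothetical and are not taken from the Kaggle solution itself.

```python
import pandas as pd

# Hypothetical bid log (illustrative values; time in arbitrary units).
bids = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2", "u2", "u2"],
    "auction": ["a1", "a1", "a2", "a1", "a1", "a1", "a3"],
    "time": [0, 10, 40, 5, 6, 7, 8],
}).sort_values(["user", "time"])

# Time elapsed since the same user's previous bid.
bids["time_since_prev_bid"] = bids.groupby("user")["time"].diff()

# Median time between bids: group by user, aggregate the gaps.
median_gap = bids.groupby("user")["time_since_prev_bid"].median()

# Mean bids per auction: count bids per (user, auction), then
# average those counts for each user.
bids_per_auction = bids.groupby(["user", "auction"]).size()
mean_bids = bids_per_auction.groupby(level="user").mean()

# One row per user, ready to be merged into a training table.
features = pd.concat(
    [
        median_gap.rename("median_time_between_bids"),
        mean_bids.rename("mean_bids_per_auction"),
    ],
    axis=1,
)
```

Both features follow the same pattern described in the comment: pick a grouping key, compute something per group, then aggregate it down to one value per entity.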

  • @LifeKiT-i
    @LifeKiT-i 11 months ago

    I just checked out this amazing video after your feature selection engineering video! I have no idea why this video isn't more popular!!! Respect for the effort you put into this!

  • @LifeKiT-i
    @LifeKiT-i 11 months ago

    I am in a Kaggle competition. Learnt a lot from this video!! Thank you so much for uploading this video for us!!

  • @pcdowling
    @pcdowling 1 year ago

    Thank you.

  • @dy8576
    @dy8576 1 year ago

    Love the videos and blogs- absolute mad content, thank you very much

  • @paulkim244
    @paulkim244 1 year ago

    Fantastic video, so many useful references, I'm glad I watched the entire thing!

  • @VG-yw2mp
    @VG-yw2mp 1 year ago

    Why don't we use product_code as one of the features while training?

  • @Gabriel-iw3hc
    @Gabriel-iw3hc 1 year ago

    How do I forecast into the future with this method, e.g. week 52? I think I would also need to forecast the other series used as features.

  • @ElChe-Ko
    @ElChe-Ko 1 year ago

    Nice! It would be interesting to see what to do if the time series have different lengths.

  • @Septumsempra8818
    @Septumsempra8818 1 year ago

    Are we going to get a video on cross-validation and selecting the right model? Your time series videos have been a wealth of knowledge.

  • @RodrigoLima-o5b
    @RodrigoLima-o5b 1 year ago

    Mario, good afternoon. Do you have any tips for using an LSTM for multi-step-ahead predictions in a MISO system?

  • @zulhas9
    @zulhas9 1 year ago

    Hi Mario, thanks for the wonderful presentation. One question: how can you use the "Sales" feature to predict sales? With that feature, when you call the .predict function, you have to pass it as an argument. In reality, you would not have that information available.

  • @chengeeri
    @chengeeri 1 year ago

    Good One!!!!! Expecting more from You!!!!!!

  • @ThePaintingpeter
    @ThePaintingpeter 1 year ago

    I just found your video and it's great. The reference to FeatureTools was frustrating, to say the least. The documentation on the site is not working and the GitHub repo also has examples that just don't work. It's too bad.

    • @dimka11ggg
      @dimka11ggg 1 year ago

      Try different versions; the examples were probably written for older versions.

  • @stonesupermaster
    @stonesupermaster 1 year ago

    Hello Mario, I have a question: how does the model know that we're trying to predict multiple products at once? I've been trying to train a model to predict the sales of 2000 SKUs, and my main concern now is how to do it efficiently. I watched everything that you did but I still have the same problem. Do you know where I can find an example of it? Thank you very much for your video.

    • @AskApt05
      @AskApt05 4 months ago

      Hi @stonesupermaster, I'm facing the same problem. Have you found a solution? It would be really helpful if you could share it. Thanks!

  • @therussiankid7296
    @therussiankid7296 1 year ago

    #getthistrending

  • @Mohammad-vr9dj
    @Mohammad-vr9dj 1 year ago

    Thanks for the useful video. Is it possible to model independent spatial sequences simultaneously? I have a dataset that consists of 1000 independent spatial sequences with dimension 2*7 (2 for x and y, and length 7 for the positions at each time step). I implemented it with a simple RNN, LSTM, and GRU. Can I do it with transformers (attention mechanism)? Could you point me to a practical example?

  • @MrGhustavo22
    @MrGhustavo22 2 years ago

    give more, please!

  • @SuperHddf
    @SuperHddf 2 years ago

    Thank you! :) ♥

  • @gregoryoliveira8358
    @gregoryoliveira8358 2 years ago

    I used this on my last project. It is very important to read the library documentation and find these class-imbalance parameters.

  • @garcialn
    @garcialn 2 years ago

    Hi, Mario. Big fan of yours from DataHackers here! Do you know if the same applies to imbalanced datasets in anomaly detection, such as default prediction or fraud detection problems? Usually the imbalance is not a sampling problem; it comes from the nature of those problems. I don't know if it would end up creating bias or data leakage because of that. Do you know better techniques for these kinds of problems?

    • @Forecastegy
      @Forecastegy 2 years ago

      Hi Lucas, you can use it for anomaly detection. This is just a way of telling the model to pay more attention to the less frequent examples. Just remember to calibrate your predictions if you need probabilities instead of just a ranking score.
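The reply above describes telling the model to pay more attention to the rarer examples and then calibrating if probabilities are needed. One common way to do that in scikit-learn is class weighting followed by probability calibration, sketched below. Treat this as an assumption about what the video's trick looks like in code: the synthetic data, the choice of `LogisticRegression`, and the sigmoid calibration are all illustrative. XGBoost's `scale_pos_weight` and the `class_weight` argument of Keras `Model.fit` play a similar role in those libraries.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, illustrative data with roughly 5% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# "balanced" weights are n_samples / (n_classes * count_per_class),
# so the rare class gets the larger weight.
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

plain = LogisticRegression(max_iter=1000).fit(X, y)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

# Weighting distorts the predicted probabilities, so wrap the weighted
# model in a calibrator when probabilities (not just a ranking) matter.
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000, class_weight="balanced"),
    method="sigmoid",
    cv=3,
).fit(X, y)

base_rate = y.mean()
mean_plain = plain.predict_proba(X)[:, 1].mean()
mean_weighted = weighted.predict_proba(X)[:, 1].mean()
mean_calibrated = calibrated.predict_proba(X)[:, 1].mean()
```

Comparing `mean_weighted` and `mean_calibrated` against `base_rate` shows why the calibration step is worth remembering: the weighted model's average predicted probability sits well above the true share of positives.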

  • @anwarhermuche
    @anwarhermuche 2 years ago

    Very clear explanation! Thank you for the video Mario

  • @Kevin-fp6gk
    @Kevin-fp6gk 2 years ago

    Loved the way you presented.

  • @RicardoZibordi
    @RicardoZibordi 2 years ago

    Clear, objective and very practical - congratulations!

  • @sekiro_19
    @sekiro_19 2 years ago

    Thank you so much man crazy good explanation

  • @VamosCoringar
    @VamosCoringar 2 years ago

    I didn't expect this one, hahaha

  • @gauravmalik3911
    @gauravmalik3911 2 years ago

    It would be great if you could also show a demo. Thank you for the information.

  • @snk2288
    @snk2288 2 years ago

    Taking differences between time features would lead to negative values. Do we apply a min-max scaler after that?

    • @ozan4702
      @ozan4702 1 year ago

      You would want to take the difference so that the earlier value is subtracted from the later one, so it's never negative.

    • @darkchoco7407
      @darkchoco7407 1 year ago

      There's no problem at all with having negative values as features.

  • @Mohammad-vr9dj
    @Mohammad-vr9dj 2 years ago

    Thanks for your useful video. If our dataset has two target columns, how should we write the code?

  • @Learner_123
    @Learner_123 2 years ago

    Thank you for making the topic simple. Since you combined all the product sales to train and validate your model, how can one use this model to predict sales for any single product only?

    • @zabmaz10
      @zabmaz10 2 years ago

      I have the same question, but I guess one way is to convert the product code into dummy variables and use those as features in the random forest.
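The dummy-variable idea suggested in this reply could look roughly like the sketch below. It is not the approach confirmed in the video: the column names (`product_code`, `sales_lag_1`, `sales`), the data, and the model settings are hypothetical and chosen only to show how a single product's row is built at prediction time.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training table with all products stacked together.
df = pd.DataFrame({
    "product_code": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "sales_lag_1": [10, 12, 9, 100, 90, 95],
    "sales": [12, 9, 15, 90, 95, 110],
})

# One-hot encode the product code so the model can tell products apart.
X = pd.get_dummies(
    df[["product_code", "sales_lag_1"]], columns=["product_code"], dtype=int
)
y = df["sales"]

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Predicting for one product: build its row, encode it, and align the
# columns with the training matrix (absent dummies are filled with 0).
new_row = pd.DataFrame({"product_code": ["P2"], "sales_lag_1": [110]})
new_X = pd.get_dummies(new_row, columns=["product_code"], dtype=int)
new_X = new_X.reindex(columns=X.columns, fill_value=0)

pred = model.predict(new_X)
```

The `reindex` step matters because a single-row frame only produces the dummy column for its own product; without it the prediction matrix would not match the columns the model was trained on.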

  • @winniethepooh4891
    @winniethepooh4891 2 years ago

    This channel is a hidden gem !!!

  • @kaianchan7768
    @kaianchan7768 2 years ago

    Thanks for this tutorial. Will you provide some videos about many features? Thanks!

  • @faraza5161
    @faraza5161 2 years ago

    The SimpleImputer fills missing values with the mean of the entire column. Shouldn't that be done per product as well? Thanks for a wonderful lecture, by the way :-)
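The difference this comment points at can be seen in a small sketch comparing a column-wide mean with a per-product mean. The data and column names (`product_code`, `sales_lag_1`) are hypothetical, and the per-product version uses a plain pandas group-by rather than anything shown in the video.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical feature column with gaps, for two products on very
# different scales.
df = pd.DataFrame({
    "product_code": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "sales_lag_1": [10.0, np.nan, 14.0, 100.0, 110.0, np.nan],
})

# Column-wide mean: every gap gets (10 + 14 + 100 + 110) / 4 = 58.5,
# regardless of which product the row belongs to.
global_filled = (
    SimpleImputer(strategy="mean").fit_transform(df[["sales_lag_1"]]).ravel()
)

# Per-product mean: each gap is filled from its own product's values.
per_product = df.groupby("product_code")["sales_lag_1"].transform(
    lambda s: s.fillna(s.mean())
)
```

With time series, the means should be computed on the training period only and then reused for later data; otherwise future values leak into the imputed features.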