Forecastegy
Brazil
Joined 7 Jul 2021
I am a Machine Learning (ML) Expert & Data Scientist with 7+ years of experience helping companies globally.
Kaggle Grandmaster with multiple 1st-place finishes in global Kaggle competitions and a top global rank of 12th out of 50,000+.
Content Consultant for Applied Data Science with Venture Applications: Data-X (INDENG 135/235) at University of California, Berkeley
Site: forecastegy.com
Follow me on Twitter: mariofilhoml
Kaggle profile: www.kaggle.com/mariofilho
LinkedIn: linkedin.com/in/mariofilho/
🤖 Building machine learning systems since 2014
🏆 2x Prize Winner Kaggle Competitions Grandmaster
📊 Former Lead Data Scientist @ Upwork
🎓 @UCBerkeley Data-X Consultant
Fix Imbalanced Data In Machine Learning
A simple trick for dealing with imbalanced classes when training machine learning models, with code examples in Scikit-learn, XGBoost, and TensorFlow/Keras.
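The trick the description refers to is class weighting. As a minimal sketch (synthetic data, not the video's exact code), Scikit-learn exposes it via the `class_weight` parameter:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic, roughly 9:1 imbalanced binary problem (illustrative only)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# class_weight="balanced" reweights each class inversely to its frequency,
# so minority-class errors cost as much as majority-class errors in the loss
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X, y)
print(model.score(X, y))
```

XGBoost exposes the same idea as `scale_pos_weight` (commonly set to n_negative / n_positive), and Keras accepts a `class_weight` dict in `model.fit()`.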
Remember to like and subscribe. Thanks!
*Video style heavily inspired by @Fireship
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// SUPPORT THE CHANNEL 👇❤️
Sign up for a Coursera course:
imp.i384100.net/EaDmQe
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// SOCIAL MEDIA
LinkedIn: www.linkedin.com/in/mariofilho/
Kaggle: kaggle.com/mariofilho
Twitter: mariofilhoml
Blog: forecastegy.com
Some links above can be from partnerships where I get a commission if you buy a product, without any additional cost to you. Thanks for the support!
Views: 1,452
Videos
Feature Engineering Secret From A Kaggle Grandmaster
Views: 38K · 3 years ago
Learn how to do feature engineering for tabular data like a Kaggle Grandmaster and get high-performance machine learning models. Like the video? Subscribe and turn on the notifications to get more tips :) 0:00 Intro 1:38 The One Question To Ask Yourself 2:40 Credit Card Fraud Examples 6:34 Brief Info On Categorical Features 7:23 Time Series Feature Engineering 11:53 An Extremely Valuable Exerci...
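The techniques the video lists for time series (lag, difference, rolling, date components) can be sketched in a few lines of pandas on a made-up weekly sales series:

```python
import pandas as pd

# Toy weekly sales series for one product (hypothetical numbers)
df = pd.DataFrame({
    "date": pd.date_range("2021-01-03", periods=6, freq="W"),
    "sales": [10, 12, 9, 15, 14, 20],
})

df["lag_1"] = df["sales"].shift(1)                  # lag: previous value
df["diff_1"] = df["sales"].diff(1)                  # difference vs previous value
df["roll_mean_3"] = df["sales"].rolling(3).mean()   # rolling-window aggregation
df["month"] = df["date"].dt.month                   # date component for seasonality

print(df)
```

The first rows of the lag/rolling columns are NaN by construction, which is why these features are usually combined with a row filter or an imputation step before training.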
How To Fill Missing Data With Pandas Fillna - Data Science For Beginners
Views: 789 · 3 years ago
Check my blog for more machine learning content: forecastegy.com Learn how to replace missing values in your pandas DataFrame with the fillna function. Like the video? Subscribe and turn on the notifications to get more tips :) Docs: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
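A minimal sketch of the `fillna` usage the video covers (toy data, not the video's exact example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, np.nan, 12.0], "qty": [1.0, 2.0, np.nan]})

# Fill every missing value with a constant
filled_zero = df.fillna(0)

# Fill each column's missing values with that column's mean
filled_mean = df.fillna(df.mean())

print(filled_mean)
```

Note that `fillna` returns a new DataFrame by default; assign the result (or use `df = df.fillna(...)`) to keep the change.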
How To Drop Columns In a Pandas DataFrame - Data Science For Beginners
Views: 383 · 3 years ago
Check my blog for more machine learning content: forecastegy.com Learn how to drop one or more columns in a DataFrame using pandas. Like the video? Subscribe and turn on the notifications to get more tips :) Docs: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
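A minimal sketch of the `drop` usage the video covers (toy data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Drop a single column; columns= targets columns rather than row labels
df_one = df.drop(columns=["b"])

# Drop several columns at once
df_many = df.drop(columns=["b", "c"])

print(list(df_one.columns), list(df_many.columns))
```

As with `fillna`, `drop` returns a new DataFrame unless you reassign the result.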
Multiple Time Series Forecasting With Scikit-Learn
Views: 36K · 3 years ago
You got a lot of time series data points and want to predict the next step (or steps). What should you do now? Train a model for each series? Is there a way to fit a model for all the series together? Which is better? I have seen many data scientists think about approaching this problem by creating a single model for each product. Although this is one of the possible solutions, it's not likely ...
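The "one model for all series" idea can be sketched by stacking the series into one table and computing lag features per series. This is a toy illustration with hypothetical data, not the video's exact code:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Two toy product series stacked into a single long table
df = pd.DataFrame({
    "product": ["A"] * 5 + ["B"] * 5,
    "week": list(range(5)) * 2,
    "sales": [10, 11, 12, 13, 14, 50, 52, 49, 55, 53],
})

# Lag computed within each product, so one series' history
# never leaks into another series' rows
df["lag_1"] = df.groupby("product")["sales"].shift(1)
df = df.dropna()

# A single model fit on all series together
X = df[["week", "lag_1"]]
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, df["sales"])
print(model.predict(X.tail(1)))
```

With enough series, the shared model can exploit patterns that hold across products, which is why it often beats training one model per series.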
A very comprehensive video, thank you. One question: you mention that multiple datasets covering different time periods (e.g. daily, weekly, monthly) can be combined to train one model instead of several. My datasets are identical; only the datetime changes. Does this also apply to other models, such as RandomForestClassifier?
You are creating amazing videos! Thank you! So well explained, easy to understand, helps to solve real ML problems and at the same time entertaining! Please keep creating more:))
Thank you for such an amazing video!!! It is incredibly useful!!!
Hi Mario, and thank you for a very clear and concise explanation. One question I have is, how would you handle it if several of the products are only selling intermittently such that there are many zeros in the series?
where are new videos!!!
A more intriguing question is: how do you train a model on thousands of time series, each determined by multiple parameters, and then simulate/forecast a single time series from a new set of those parameters?
Very nicely explained. Your videos are good. Why did you start making them?
Thank you a lot!
It would be better if you used slides with key points. The hand-writing on the screen was distracting and hard to read. Anyway, thanks.
Thank you very much for this amazing video ! Can we use Cross Validation for hyperparameter tuning in the case of RandomForest with time series data ?
Should i learn feature Engineering in 2024?
Can I follow the same process if, in place of the week, I have a date like yyyy-mm-dd? And how do I handle the year?
Thank you. You covered more in this short video than TWO different bootcamp instructors did in far more time...
00:01 Learn feature engineering for high-performance models
02:00 Aggregation is essential for extracting useful information from tables and can be compared to the group-by function in various programming languages.
03:56 Feature engineering involves creating customer-specific features to predict fraud in transactions.
06:01 Feature engineering is all about aggregation and encoding for capturing patterns and anomalies.
08:00 Feature engineering techniques like lag, difference, rolling, and date components are significant for analyzing time series data.
09:55 Seasonal patterns and time differences for feature engineering
11:55 Reverse-engineer feature computation from Kaggle solutions
13:57 Feature engineering can be applied universally in tabular data for extracting features from multiple tables.
15:47 Feature engineering techniques used in data processing
17:41 Utilizing feature engineering to create indicators for bot usage from IP data.
19:22 Geolocation and network features are key for advanced feature engineering.
21:03 Graph features are important for model prediction.
How do I find the season effect features?
🎯 Key Takeaways for quick navigation:
00:00 📊 *Understanding Feature Engineering for Tabular Data*
- Feature engineering is essential for high-performance machine learning models.
- The key to feature engineering is aggregation, which involves grouping and summarizing data.
- Aggregations can be applied to various types of data, including categorical and numerical variables.
06:22 🔄 *Common Feature Engineering Techniques*
- Feature engineering techniques include lag, difference, rolling, date components, and time differences.
- Lag captures the previous value of a variable in a sequence.
- Difference calculates the difference between consecutive values in a sequence.
- Rolling involves computing aggregations over a rolling window of data.
- Date components extract information like month or day from dates for seasonality patterns.
- Time differences measure the time elapsed between events.
15:21 🧩 *Reverse Engineering Features from a Kaggle Solution*
- Analyzing features from a Kaggle competition example.
- Median time between bids can be computed by grouping by user and calculating time differences between bids.
- Mean number of bids per auction is determined by grouping by user and auction, then counting bid occurrences.
- Detecting IP addresses used by both users and bots involves complex filtering and merging based on IP data.
21:05 🌐 *Advanced Feature Engineering*
- Geolocation features can be important: calculating distances between locations and spatial data aggregations.
- Network or graph features involve representing data as graphs and computing graph-related metrics.
- Suggests exploring the Instacart competition for advanced feature engineering with multiple tables.
22:16 📺 *Conclusion and Next Steps*
- Encourages viewers to like, subscribe, and leave comments.
- Offers a link to a time series forecasting workshop for further learning.
Made with HARPA AI
I just checked out this amazing video after your feature selection video! I have no idea why this video isn't more popular!!! Respect for the effort you put into this!
I am in a Kaggle competition. Learnt a lot from this video!! Thank you so much for uploading this video for us!!
Thank you.
Love the videos and blogs- absolute mad content, thank you very much
Fantastic video, so many useful references, I'm glad I watched the entire thing!
Why dont we use product_code as one of the features while training?
How do I forecast the future with this method? E.g., forecast week 52? I think you would need to forecast the other feature series too.
Nice! It would be interesting to see what to do if the time series have different lengths.
Are we going to get a video on cross-validation and selecting the right model? Your time series videos have been a wealth of knowledge.
Mario, good afternoon. Any tips for using an LSTM for multi-step-ahead predictions in a MISO system?
Hi Mario, thanks for the wonderful presentation. One question: how can you use "Sales" as a feature to predict sales? When you call the .predict function, you have to pass that feature as an argument, but in reality you would not have that information available.
Good One!!!!! Expecting more from You!!!!!!
I just found your video and it's great. The reference to FeatureTools was frustrating to say the least. The documentation on the site is not working and the github repo also has examples that just don't work. It's too bad
Try different versions, probably examples for some old versions
Hello Mario, I have a question: how does the model know that we're trying to predict multiple products at once? I've been trying to train a model to predict sales for 2,000 SKUs, and my main concern is how to do it efficiently. I watched everything you did but still have the same problem. Do you know where I can find an example? Thank you very much for your video.
Hi @stonesupermaster, Facing same problem. Have you found a solution? It would be really helpful if you can share. Thanks!
#getthistrending
Thanks for the useful video. Is it possible to model independent spatial sequences simultaneously? I have a dataset consisting of 1,000 independent spatial sequences with dimension 2x7 (2 for x and y, and length 7 for the positions at each time). I implemented it with a simple RNN, LSTM, and GRU. Can I do it with transformers (attention mechanism)? Could you point me to a practical example?
give more, please!
Thank you! :) ♥
I used this on my last project. It is very important to read the library documentation and find these imbalance parameters.
Hi, Mario. Big fan of yours from DataHackers here! Do you know if the same applies to imbalanced datasets for anomaly detection, such as default prediction or fraud detection problems? There, the imbalance usually isn't a sampling problem; it's the nature of those problems. I don't know if it would end up creating bias or data leakage because of that...? Do you know better techniques for these kinds of problems?
Hi Lucas, you can use it for anomaly detection. This is just a way of telling the model to pay more attention to the less frequent examples. Just remember to calibrate your predictions if you need probabilities instead of just a ranking score.
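The calibration the reply mentions can be done in Scikit-learn with `CalibratedClassifierCV`. A minimal sketch on synthetic imbalanced data (not the author's code):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic, roughly 9:1 imbalanced binary problem (illustrative only)
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# A class-weighted model produces usable rankings, but its scores are
# not probabilities; sigmoid (Platt) calibration maps them onto [0, 1]
base = LinearSVC(class_weight="balanced", max_iter=5000)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X, y)
proba = calibrated.predict_proba(X)
print(proba[:2])
```

Class weighting shifts the model's score distribution, so calibrating after reweighting matters whenever downstream decisions consume the probabilities rather than the ranking.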
Very clear explanation! Thank you for the video Mario
Loved the way you presented.
Clear, objective and very practical - congratulations!
Thank you so much man crazy good explanation
I wasn't expecting that one hahaha
It would be great if you could also show a demo. Thank you for the information!
Differencing time features can lead to negative values. Should we apply a min-max scaler after that?
You would want to compute the difference as the future value minus the past value, so it's never negative.
No problem having negative values as features, at all
Thanks for your useful video. If our dataset has two target columns, how should we write the code?
Thank you for making the topic simple. Since you have combined all the product sales to train and validate your model, How can one use this model to predict sales for 'any single' product only?
I have the same question, but I guess one way is to convert the product code into dummy variables and use those as features in the random forest.
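The dummy-variable idea the commenter suggests can be sketched with `pd.get_dummies` (toy data, hypothetical column names):

```python
import pandas as pd

df = pd.DataFrame({"product_code": ["A", "B", "A"], "lag_1": [10, 50, 11]})

# One-hot encode the product code so a single model can learn
# per-product behavior alongside the shared lag features
X = pd.get_dummies(df, columns=["product_code"])
print(list(X.columns))
```

At prediction time you then pass a row whose product dummy columns select the product you want, with its own lag features filled in.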
This channel is a hidden gem !!!
Thanks for this tutorial. Will you provide some videos about many features? Thanks!
The SimpleImputer will fill missing values with the mean of the entire column. Shouldn't that be done per product as well? Thanks for a wonderful lecture btw :-)