Advanced Feature Engineering Tips and Tricks - Data Science Festival
- Published 1 Dec 2024
- Title: Advanced Feature Engineering Tips and Tricks
Speaker: T. Scott Clendaniel
Abstract: While beginners often focus on algorithm selection, professionals know that the real power in Artificial Intelligence and Machine Learning is often Feature Engineering. Unfortunately, this process can be incredibly time-consuming and complicated. This training will give you a robust set of Tips and Tricks to get the most performance in the shortest time, regardless of the algorithm you choose.
Subscribe to our channel: www.youtube.co...
Website: datasciencefes...
LinkedIn: / data-science-festival
Twitter: / datasciencefest
This is amazing. I really like this.
This is one of the best videos, if not the best, I've seen on this topic so far. Most people are too focused on code to get deep into the what and how.
This is GOLD and please keep in mind this metal was created by very big stars.
Can we get a LinkedIn post or video update on whether these tips are still applicable today? Also, you highly encourage target mean encoding, but I think it inherently leaks target information into the training set. Am I wrong to assume that? Thanks, and nice video, btw.
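The leakage worry here is well founded: if a category's target mean is computed on the same rows it then encodes, each row's own label leaks into its feature. A standard mitigation is out-of-fold target encoding, where each row is encoded using means computed on the other folds only. A minimal sketch, assuming pandas and scikit-learn; the column names `city` and `churned` in the usage line are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import KFold

def oof_target_mean_encode(df, cat_col, target_col, n_splits=5, seed=0):
    """Encode cat_col with target means computed only on out-of-fold rows,
    so no row's encoding ever sees its own target value."""
    encoded = pd.Series(index=df.index, dtype=float)
    global_mean = df[target_col].mean()  # fallback for categories unseen in a fold
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in kf.split(df):
        fold_means = df.iloc[train_idx].groupby(cat_col)[target_col].mean()
        encoded.iloc[val_idx] = (
            df.iloc[val_idx][cat_col].map(fold_means).fillna(global_mean).values
        )
    return encoded

# Hypothetical usage:
# df["city_te"] = oof_target_mean_encode(df, "city", "churned")
```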
What's funny looking back at this now is that moment Google stepped back. That was when they first got BERT to a pre-RLHF GPT-3 level of competence, but the rumor is some execs got spooked and back-burnered it. And 2.5-ish years on, people started unironically, intentionally using Bing for the first time since they downloaded Chrome. I expect those execs got canned, but I haven't followed closely.
What do you think about Boruta, based on a permuted random forest, to help with feature selection? What about doing some "brute force" feature engineering, then Boruta, and then connecting the most important features to the dependent variable with an interpretable model like generalized additive models? Thank you.
Unfortunately, I know almost nothing about Boruta, so I can't help you on this one.
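For readers who do want to explore the questioner's suggestion: Boruta is an "all-relevant" feature selection wrapper that pits each real feature against randomly permuted "shadow" copies of itself inside a random forest, keeping only features that reliably beat their shadows. A minimal sketch, assuming the third-party `boruta` package (BorutaPy) and synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy  # pip install Boruta

# Synthetic data: 20 features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=42)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=42)
selector = BorutaPy(rf, n_estimators="auto", random_state=42)
selector.fit(X, y)  # BorutaPy expects NumPy arrays, not DataFrames

print("Confirmed features:", np.where(selector.support_)[0])
print("Tentative features:", np.where(selector.support_weak_)[0])
```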
Awesome talk. Very well prepared. Thanks a lot from Berlin
Great talk! Can you please elaborate on the feature selection method you typically use?
Another question: how can we take the output of unsupervised algorithms and use it as input to supervised algorithms?
You can take the cluster assignment, as in the cluster ID, and use that as a new feature in the supervised algorithm. Thanks for asking!
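A minimal sketch of that cluster-ID idea, assuming scikit-learn and synthetic data. The unsupervised model is fit on training rows only, and its assignments become one extra feature column:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the unsupervised model on training data only, then append its
# cluster assignments as a new feature column.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_train)
X_train_aug = np.column_stack([X_train, kmeans.predict(X_train)])
X_test_aug = np.column_stack([X_test, kmeans.predict(X_test)])

# Note: the raw integer ID reads as ordinal to a linear model; in practice
# you would usually one-hot encode it. Tree models can use it as-is.
clf = LogisticRegression(max_iter=1000).fit(X_train_aug, y_train)
print("Accuracy with cluster-ID feature:", clf.score(X_test_aug, y_test))
```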
Thanks a lot for a great session. Also, can you please elaborate on the method of taking the leaves of a decision tree as a new feature? As far as I know, the leaves of a decision tree hold either labels (classification) or target values (regression), while features can be categorical or numeric. I'm not sure whether this happens after converting the categoricals to some numeric form through one of the usual encoding methods. So, according to you, each row's feature vector is converted to a single value by passing it through the fitted decision tree. Am I right, or is the method something different?
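One common reading of this trick (a sketch of that interpretation, not necessarily the speaker's exact method): fit a tree on already-numeric features, then use scikit-learn's `apply()`, which returns the ID of the leaf each row lands in rather than a predicted label, and one-hot encode those IDs as new features. So yes, categoricals must be converted to numeric first, and each row's feature vector is indeed summarized by a single leaf ID:

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# The tree needs numeric inputs, so categoricals are encoded beforehand.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# apply() returns the leaf ID each row falls into, not a prediction:
# one categorical value per row, summarizing the whole feature vector.
leaf_ids = tree.apply(X).reshape(-1, 1)

# One-hot the leaf IDs into a binary feature per leaf for a downstream model.
leaf_features = OneHotEncoder(handle_unknown="ignore").fit_transform(leaf_ids)
print(leaf_features.shape)  # (n_samples, n_leaves)
```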