Rachit Toshniwal
Removing constant & Quasi constant features using Variance Threshold | Machine Learning
#variancethreshold #constant #quasiconstant
In this video, we will look at how we can effortlessly remove constant and quasi constant features from our datasets and make them leaner and more robust, using scikit-learn's VarianceThreshold implementation.
I've uploaded all the relevant code and datasets used here (and in all other tutorials, for that matter) on my GitHub page:
github.com/rachittoshniwal/machineLearning
If you like my content, please do not forget to upvote this video and subscribe to my channel.
If you have any qualms regarding any of the content here, please feel free to comment below and I'll be happy to assist you in whatever capacity possible.
Thank you!
Views: 3,184
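As a quick reference, here is a minimal sketch of the idea covered in the video, using scikit-learn's VarianceThreshold on a made-up toy frame (the column names and the threshold value are illustrative, not from the video's dataset):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# toy data: "const" never varies, "quasi" barely varies, "useful" does
df = pd.DataFrame({
    "const": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "quasi": [5.0, 5.0, 5.0, 5.0, 5.0, 5.1],
    "useful": [3.0, 7.0, 1.0, 9.0, 4.0, 6.0],
})

# threshold=0 drops only constant columns; a small positive threshold
# also drops quasi-constant columns whose variance falls below it
selector = VarianceThreshold(threshold=0.01)
selector.fit(df)

kept = df.columns[selector.get_support()]
print(kept.tolist())                 # ['useful']
X_reduced = selector.transform(df)   # array containing only the kept column(s)
```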

Videos

Principal Component Analysis (PCA) Intuition | Machine Learning
941 views · 3 years ago
#pca #machinelearning #intuition In this video, we'll look at the WHAT and the HOW of PCA. Thanks!
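A minimal PCA sketch in scikit-learn on synthetic data (illustrative only): standardize first, then inspect how much variance each component explains.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# two strongly correlated features plus one independent noisy feature
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200),
                     rng.normal(size=200)])

X_std = StandardScaler().fit_transform(X)   # PCA is variance-based, so scale first

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)        # first component captures most of the variance
```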
Machine Learning Project | Credit Risk Analysis | Learning Curves | Overfitting | Python
22K views · 3 years ago
#machinelearning #python #project In this video we will look at a Machine Learning project that will try to predict whether someone will get their loan sanctioned or not. We will use Randomized Search to find an optimal set of hyperparameters. We will then use precision-recall curves and learning curves to assess the model performance. We will rectify the case of overfitting in the model and make am...
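A rough sketch of that workflow on synthetic data (the estimator and parameter ranges here are illustrative, not the exact ones used in the video):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, learning_curve

# imbalanced synthetic stand-in for the loan data
X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(100, 500),
                         "max_depth": randint(3, 15)},
    n_iter=10, scoring="average_precision", cv=5, random_state=0)
search.fit(X, y)

# learning curves: a large gap between train and validation scores
# suggests overfitting; regularizing the model should shrink the gap
sizes, train_scores, val_scores = learning_curve(
    search.best_estimator_, X, y, cv=5, scoring="average_precision")
print(train_scores.mean(axis=1), val_scores.mean(axis=1))
```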
Machine Learning Project | Predicting Student Marks | Python
8K views · 3 years ago
#ml #project #python In this video, we will make a quick and dirty ML model to predict the marks of a student. We will do some basic EDA, then use Column Transformers and Pipelines to make the model, use the GridSearchCV to find the best performing model and then save it using joblib. The link to the data and the notebook can be found here: github.com/rachittoshniwal/ML-projects/ Hope it helps!
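A minimal sketch of such a pipeline; the column names ("hours_studied", "gender") and the saved file name are made up for illustration:

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"hours_studied": [1, 2, 3, 4, 5, 6],
                   "gender": ["M", "F", "F", "M", "F", "M"],
                   "marks": [35, 48, 60, 70, 82, 90]})
X, y = df.drop(columns="marks"), df["marks"]

# scale numeric columns, one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["hours_studied"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
])
pipe = Pipeline([("prep", preprocess), ("model", Ridge())])

grid = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

joblib.dump(grid.best_estimator_, "student_marks_model.joblib")  # save for later use
```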
How to tune hyper parameters using Grid Search CV | With and without a Pipeline | Machine Learning
4.9K views · 3 years ago
#gridsearch #machine #learning #python In this tutorial, we'll look at Grid Search CV, a technique for finding the optimal set of hyper-parameters and fine-tuning our ML model. Table of contents: 0:00 Intro 1:08 Randomized Search CV 1:52 Python code for Grid Search CV 3:20 Without a pipeline 8:21 With a pipeline I've uploaded all the relevant code and datasets used...
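To illustrate the with/without-a-pipeline point: inside a Pipeline, each hyper-parameter is addressed with the step name as a prefix (the step name "clf" below is just an example). The data is scikit-learn's built-in iris sample:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# without a pipeline: parameter names are used directly
grid_plain = GridSearchCV(LogisticRegression(max_iter=1000),
                          {"C": [0.1, 1, 10]}, cv=5).fit(X, y)

# with a pipeline: prefix each parameter with "<step name>__"
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid_pipe = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5).fit(X, y)

print(grid_plain.best_params_, grid_pipe.best_params_)
```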
Churn Modeling Tableau Project for beginners
21K views · 3 years ago
#tableau #project #beginners In this video, we'll build a simple Tableau dashboard for analyzing customer churn at a bank. We'll use filters, parameters, histograms, dashboard actions, and some formatting to prettify our viz. Table of contents : 0:01 Final dashboard 2:51 Making the sheets 16:08 Making the dashboard 22:50 Dashboard actions You can access the workbooks from my tableau public prof...
How to create and use groups to clean data and create higher level dimensions | Tableau Series
113 views · 3 years ago
#tableau #groups In this video, we'll look at how to create and use groups in Tableau in two different scenarios: 0:01 Using groups for handling messy data 4:23 Using groups for creating higher level "positions" dimension You can access the workbooks from my tableau public profile: public.tableau.com/profile/rachit.toshniwal#!/ Link for all the relevant materials used in these videos: github.co...
How to split different records having same values in a column in Tableau | Tableau Series
567 views · 3 years ago
#tableau In this video, we'll look at how to split different records having the same values in a column in Tableau. You can access the workbook from my tableau public profile: public.tableau.com/profile/rachit.toshniwal#!/ Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvote this video! Thank y...
How to systematically organize fields into folders | Tableau Series
94 views · 3 years ago
#tableau #folders #organize In this video, we'll look at how to systematically organize fields into folders in Tableau. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvote this video! Thank you for watching!
How to create hierarchies in Tableau | Tableau Series
147 views · 3 years ago
#tableau #hierarchies In this video, we'll look at how to create hierarchies in Tableau. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvote this video! Thank you for watching!
Using the replace function in tableau to clean messy columns | Tableau Series
4.4K views · 3 years ago
#tableau #replace #function In this video, we'll look at how to use the in-built "replace" function in tableau to clean messy data. 0:01 Replace function in a calculated field 3:33 Exercises for solidifying concepts 4:17 Solutions for the exercises 7:08 Hiding the unwanted columns Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video...
Using the split function in a calculated field to clean messy data | Tableau Series
462 views · 3 years ago
#tableau #split #calculated #field In this video, we'll look at how to use the in-built split function in Tableau to clean messy data. 0:17 Problems with using auto and custom split for non-uniform columns 1:55 Using split in a calculated field Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my c...
Using auto and custom split in Tableau to split columns | Tableau Series
390 views · 3 years ago
#tableau #split #custom #auto In this video, we'll look at how to use the auto and custom split methods in Tableau to split string columns into n-number of different columns. 0:18 What is splitting and how does it happen? 1:41 Custom split 4:20 Auto split 6:49 Where auto split fails Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this vid...
Editing the metadata in Tableau | Tableau Series
439 views · 3 years ago
#tableau #editing #metadata In this video, we'll look at how to edit the metadata in tableau, to make the data ready for analysis and drawing inferences from. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvote this video! Thank you for watching!
Data types in Tableau | Numeric, String, Geographic, Boolean, Date, Date & Time | Tableau Series
344 views · 3 years ago
#tableau #data #types In this video, we'll look at the different data types in Tableau. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvote this video! Thank you for watching!
When to add a new connection vs a Data source in Tableau | Tableau Series
1.8K views · 3 years ago
When to add a new connection vs a Data source in Tableau | Tableau Series
Unions in Tableau | How to use a wildcard for unions in Tableau | Tableau Series
806 views · 3 years ago
Unions in Tableau | How to use a wildcard for unions in Tableau | Tableau Series
Blending multiple distinct data sources in Tableau | Tableau Series
284 views · 3 years ago
Blending multiple distinct data sources in Tableau | Tableau Series
Joins in Tableau | Inner, Outer, Left and Right Joins | Physical and Logical Layer | Tableau Series
1.5K views · 3 years ago
Joins in Tableau | Inner, Outer, Left and Right Joins | Physical and Logical Layer | Tableau Series
Relationships - The new Tableau data model | Understanding the Performance Options | Tableau Series
595 views · 3 years ago
Relationships - The new Tableau data model | Understanding the Performance Options | Tableau Series
Connecting to a data source in Tableau | Different types of connections | Tableau Series
247 views · 3 years ago
Connecting to a data source in Tableau | Different types of connections | Tableau Series
Downloading Tableau | Tableau Desktop or Tableau Public? Advantages and limitations | Tableau Series
434 views · 3 years ago
Downloading Tableau | Tableau Desktop or Tableau Public? Advantages and limitations | Tableau Series
(Code) Iterative Imputer | MICE Imputer in Python | Machine Learning
14K views · 3 years ago
(Code) Iterative Imputer | MICE Imputer in Python | Machine Learning
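For reference, a minimal sketch of scikit-learn's IterativeImputer (the MICE-style imputer the video covers); the toy values are made up:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer

X = np.array([[25.0, 2.0, 40000.0],
              [30.0, np.nan, 52000.0],
              [np.nan, 8.0, 61000.0],
              [45.0, 15.0, np.nan]])

# each column with missing values is modelled as a regression on the others,
# iterating until the imputed values stabilise
imputer = IterativeImputer(max_iter=10, random_state=0)
print(imputer.fit_transform(X))
```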
(Code) What is Winsorization | Using percentiles for capping outliers in Python | Machine Learning
6K views · 4 years ago
(Code) What is Winsorization | Using percentiles for capping outliers in Python | Machine Learning
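A short sketch of percentile-based capping (winsorization); the series and the 5th/95th percentile caps are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([22, 25, 27, 26, 24, 28, 23, 250], name="salary")  # 250 is an outlier

lower, upper = np.percentile(s, [5, 95])   # 5th and 95th percentile caps
s_capped = s.clip(lower=lower, upper=upper)
print(s_capped.tolist())
```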
(Code) Trimming outliers using the IQR method | Machine Learning
2.2K views · 4 years ago
(Code) Trimming outliers using the IQR method | Machine Learning
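A sketch of trimming (dropping) outliers with the IQR rule on an illustrative column:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 14, 11, 95]})

q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # inner fences

trimmed = df[(df["value"] >= lower) & (df["value"] <= upper)]  # drop rows outside the fences
print(trimmed)
```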
(Code) Capping outliers using the IQR method | Machine Learning
6K views · 4 years ago
(Code) Capping outliers using the IQR method | Machine Learning
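And the capping variant: instead of dropping rows, values beyond the IQR fences are clipped to the fences (same illustrative data):

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 14, 11, 95])

q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
capped = s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)  # outliers pulled to the fences
print(capped.tolist())
```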
Using IQR for handling outliers | Calculating Percentiles | Inner & Outer Fences | Machine Learning
603 views · 4 years ago
Using IQR for handling outliers | Calculating Percentiles | Inner & Outer Fences | Machine Learning
(Code) Trimming outliers using the Z-score method | Machine Learning
1.1K views · 4 years ago
(Code) Trimming outliers using the Z-score method | Machine Learning
(Code) Capping outliers using the Z-score method | Machine Learning
1.8K views · 4 years ago
(Code) Capping outliers using the Z-score method | Machine Learning
When to and when NOT to use Z-scores for handling outliers | Machine Learning
766 views · 4 years ago
When to and when NOT to use Z-scores for handling outliers | Machine Learning
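A combined sketch of the z-score approach from the three videos above: flag points more than 3 standard deviations from the mean, then either trim or cap them. The caveat in the last title applies: this assumes the feature is roughly normal; on heavily skewed data the mean and standard deviation are themselves pulled by the outliers. The data below is synthetic:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(np.append(rng.normal(50, 2, size=100), 200.0))  # one extreme value

z = (s - s.mean()) / s.std()
trimmed = s[z.abs() <= 3]                       # trimming: drop flagged rows
capped = s.clip(lower=s.mean() - 3 * s.std(),   # capping: pull them to the fence
                upper=s.mean() + 3 * s.std())

print(len(s), len(trimmed))   # the extreme value is dropped
print(round(capped.max(), 1))  # ...or clipped to mean + 3*std
```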

COMMENTS

  • @leemeiwah · 1 month ago

    Thank you for the very clear explanation 🙂

  • @RobertoChavezQuintero · 1 month ago

    Very well explained!!

  • @akshatjain1746 · 1 month ago

    short simple informative!

  • @ОксанаСуряк-я7в · 2 months ago

    Thank you so much Rachit, video is really awesome

  • @fightsatan2408 · 2 months ago

  • @venkyvenky4715 · 2 months ago

    But you can do get_dummies before train_test_split.

  • @MRahdianEgakurnia · 2 months ago

    I have data that is already scaled with PowerTransformer. Can I really scale new data outside it, using the already-scaled data as the standard via fit? Because I've tried this and the new data doesn't seem to match the scaled data. Thank you

  • @DataAnalystVictoria · 3 months ago

    Thanks! ❤

  • @umutg.8383 · 3 months ago

    MICE part is good but the missingness definitions are all wrong.

  • @rishi1901 · 3 months ago

    Excellent demonstration! I really appreciate your efforts. Very helpful for me as a beginner

  • @wtfashokjr · 3 months ago

    Why is pd.get_dummies not working for me?

  • @nishantwhig7206 · 4 months ago

    Very clearly explained. Thank you.

  • @DhirajSahu-ct1jp · 4 months ago

    Thank you so much!!

  • @r.s.572 · 4 months ago

    thank you for explaining this! :) poor PhDs are thankful for people like you who use their free time to do such videos!

  • @KA00_7 · 4 months ago

    in-depth and best explanation video

  • @KA00_7 · 4 months ago

    learned something new today. Thank you so much

  • @rishidixit7939 · 4 months ago

    If I want to use SimpleImputer on two different columns but with different strategies on each column, what should I do?

  • @preethirathod6751 · 4 months ago

    You have explained so clearly

  • @deeptimittal4552 · 5 months ago

    Wow, now I completely understand pivoting. I was struggling to get the concepts; now it's all clear. Thank you Rachit.

  • @longtuan1615 · 6 months ago

    That's the best video I've seen! Thank you so much. But in this video, the "purchased" column is ignored because it is fully observed. So what happens if missing values are only present in the "age" column? I mean, if "experience", "salary" and "purchased" are fully observed, then for the same reason we would ignore them, so we'd only have the "age" column and couldn't use the regression? Please help me!

  • @cadeepakgoyal7500 · 7 months ago

    thanks a lot. really helpful

  • @jahnavinama8534 · 7 months ago

    Well explained, bro. I have watched 5-6 videos about the split method, but the only video that was helpful for me is yours.

  • @DrizzyJ77 · 7 months ago

    Thanks! Needed a clear explanation for my missed class 😅

  • @dinushachathuranga7657 · 7 months ago

    Bunch of thanks for the clear explanation❤

  • @philcrom6299 · 8 months ago

    Wow, that really helped me in evaluating my master thesis!!!

  • @MrTau123 · 8 months ago

    Ek number (number one)!

  • @exanessa1234 · 9 months ago

    How imbalanced does the data need to be before AUPRC is preferred?

  • @focus72343 · 9 months ago

    very simple explanation, thank you and subscribed!

  • @ItzLaltoo · 10 months ago

    Hey, the video was very helpful. Can anyone explain: when implementing MICE in RStudio we get two columns, Iteration & Imputation; how can we connect that with this video? In RStudio, for each iteration we get 5 imputed datasets (by default), but from this video we only get one dataset per iteration. It would be really helpful if anyone could explain this. Thanks in advance.

  • @cuoivelo8360 · 11 months ago

    Can you turn on subtitles for the videos? I'm bad at listening to English.

  • @anonymeironikerin2839 · 11 months ago

    Thank you very much for this great explanation

  • @shoaibahmed5848 · 11 months ago

    What about the missing value in row 1 and the missing value in row 4? Is it necessary to fill those values?

  • @ShubhamKumar-xy6kj · 11 months ago

    Great video bro...

  • @roshantonge1952 · 1 year ago

    very good video

  • @modhua4497 · 1 year ago

    Thanks, do you have an example of how to incorporate a LOG or SQRT transformation of features before modeling?

  • @zk321 · 1 year ago

    The word “namaste" in Sanskrit means “bowing to you". Muslims believe that one can bow/prostrate only to Allah. We don't bow down to any human. It's important to note that religious beliefs and practices can vary among individuals and communities.

  • @ubaidghante8604 · 1 year ago

    Brother found some specific examples to explain MAR and MNAR 😅

  • @martinngobye3574 · 1 year ago

    Great explanation regarding ColumnTransformer and Pipeline; however, how do you get the data frame column names back instead of numbers? Thank you!!

  • @ishikaagarwal6945 · 1 year ago

    Nicely explained

  • @osoriomatucurane9511 · 1 year ago

    Namaste. Awesome, sir. I must admit, this is by far the best pandas groupby tutorial I have ever come across. Keep it up.

  • @bellatrixlestrange9057 · 1 year ago

    best explanation!!!

  • @mathewfernand8460 · 1 year ago

    Sir, how can I get the dataset?

  • @meenatyagi9740 · 1 year ago

    Very good explanation. I was struggling to get clarity on it.

  • @skyrayzor3693 · 1 year ago

    This tutorial is awesome!!

  • @subtlehyperbole4362 · 1 year ago

    (Note: this is not an issue specific to your video, but something I have been getting confused by for a long time; this is just the first time I decided to stop and try to ask about it in the comment section.) It seems like it should be necessary (or maybe if not necessary, at least useful) to tell the model which column each imputed indicator is indicating for, right? But in the final dataset that you produce, the imputed-data indicators are all bunched up as the first four columns. How does the model know A) these features are imputed-data indicators and, more importantly, B) which of the remaining 93 columns in the dataset each one is supposed to be for? They could be indicating for columns 5, 6, 7, and 8, or they could be indicating for columns 45, 72, 8, and 92, or any other combination of the remaining 93 feature columns. How does this not affect how the model trains? My brain is thinking that possibly the algorithm somehow susses that out on its own... but I don't understand how or why it can do that. Am I making much ado about nothing here?

    • @rachittoshniwal · 1 year ago

      At the very core of things, the computer only understands 0s and 1s haha. For human interpretability, yes, it might be necessary to label the columns to see what a specific missing-indicator column is for, but for a computer it doesn't matter. The column headers are just for us; the model only cares about the data being in a 2D array format. For: A) the model doesn't know that a particular column is indicating missing values in some other column. It only cares about the values in it. B) Again, to reiterate, the model doesn't care what each of the other 93-odd columns stands for; it is only looking at their values. You can shuffle the column ordering or pass in X.values instead of a dataframe X to the model, and it will not affect performance.

    • @subtlehyperbole4362 · 1 year ago

      @@rachittoshniwal Yeah, I understand that, but the 93 columns are all features in and of themselves; the 4 imputation indicators aren't really features of the data in the same underlying way, right? They are more like features about other features, not features about the event whose label it is trying to train on. It seems like the imputation indicator data point's entire utility is essentially to point at a single data point in another column and say "don't take this data point too seriously because it was made up" -- wouldn't the entire weighting system need to treat those types of columns differently? It feels like it would be problematic (or at least not useful) to treat those columns as if they were just additional feature columns, no different from any of the existing 93 features. (I mean, I guess it depends on the particular algorithm; I imagine decision-tree-based algos would probably be able to handle that kind of thing, but others wouldn't be well served by treating those columns like they were just any other feature columns.)

    • @rachittoshniwal · 1 year ago

      @@subtlehyperbole4362 Although the missing-indicator column is based off some column X, it is essentially a brand-new column holding the information that "there is a column X which has missing values for these rows", so the model will test whether NOT having values in that column X is indicative of something or not.
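To make the thread above concrete, a small illustrative sketch (not taken from the video) of add_indicator=True, which appends one 0/1 column per feature that had missing values; the model simply sees these as extra numeric columns:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 10.0],
              [np.nan, 12.0],
              [3.0, np.nan],
              [4.0, 14.0]])

imp = SimpleImputer(strategy="mean", add_indicator=True)
X_out = imp.fit_transform(X)
print(X_out)
# columns: [imputed col 0, imputed col 1, "col 0 was missing", "col 1 was missing"]
```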

  • @imranyounas4478 · 1 year ago

    I did not understand: if s = df['population'], then how do we remove outliers from the whole dataset instead of just the population column?

  • @prashu25925 · 1 year ago

    Do we apply scaling techniques on categorical columns after encoding? Plz help

    • @rachittoshniwal · 1 year ago

      Even if you apply scaling after encoding, the 0s and 1s will be converted to some new numbers, but all 0s will be the same number x and all 1s will be the same number y. So it is again essentially encoded, just that instead of 0s and 1s you have x and y.
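A quick numeric illustration of that reply (assuming StandardScaler; the dummy column is made up): scaling a 0/1 dummy just maps every 0 to one number and every 1 to another, so the column stays effectively binary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

dummy = np.array([[0.0], [1.0], [0.0], [1.0], [1.0]])
scaled = StandardScaler().fit_transform(dummy)
print(np.unique(scaled.round(6)))  # exactly two distinct values, x and y
```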

  • @amitblizer4567 · 1 year ago

    Very well explained video. Thank you!

  • @kylehankins5988 · 1 year ago

    I have also seen univariate imputation refer to a situation where you are only trying to impute one column, instead of multiple columns that might have more than one missing value.

  • @olatheog · 1 year ago

    Great video, Rachit. Thank you. I also heard OneHotEncoding is not good for large categorical data in real-world projects. Which method do you advise, or is there a video of yours doing it that we can watch? Thank you so much