4.6. Train Test Split | Splitting the dataset to Training and Testing data | Machine Learning Course

  • Published Jan 10, 2025

COMMENTS • 44

  • @realAB459
    @realAB459 3 years ago +13

    One of the best, most underrated and most helpful channels I have ever found on YouTube

  • @ishtiaqmarwat
    @ishtiaqmarwat 7 months ago +1

    I was searching for the best content to learn Machine Learning and Data Science. This is the best resource I have found online. Thank you, brother.

  • @rokhamkhawaja5323
    @rokhamkhawaja5323 1 year ago +1

    Please don't stop. You are the best.

  • @Zarakkahnn
    @Zarakkahnn 2 years ago +2

    Thanks. I love your elaborations; I get them very easily.

  • @palesatjale
    @palesatjale 2 years ago +1

    Thank you so much for this.

  • @sezermezgil9304
    @sezermezgil9304 3 years ago

    Great explanation, I will keep going with this course!

  • @tendulkartejesh3318
    @tendulkartejesh3318 5 months ago +1

    What is the use of the random_state parameter, bro? You say that if we give the same value, the data will be split in the same way as yours. But my question is: how does it split, and what does the parameter do?
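
A minimal sketch of the idea behind random_state, assuming a simplified shuffle-then-cut split. This is not scikit-learn's actual implementation; `simple_split` and its parameters are hypothetical names used only for illustration. The seed fixes the order in which the row indices are shuffled, so the same seed always yields the same train and test rows.

```python
import random


def simple_split(rows, test_size=0.2, seed=None):
    """Shuffle row indices with a seeded generator, then cut them in two."""
    rng = random.Random(seed)            # the seed plays the role of random_state
    indices = list(range(len(rows)))
    rng.shuffle(indices)                 # same seed -> same shuffled order
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


rows = list(range(10))
train_a, test_a = simple_split(rows, seed=2)
train_b, test_b = simple_split(rows, seed=2)

print(test_a == test_b)                  # same seed, same test rows
print(sorted(train_a + test_a) == rows)  # every row lands in exactly one part
```

With `seed=None` the generator is seeded unpredictably, so repeated runs would usually give different splits.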

  • @Unstoppable_chAmpions
    @Unstoppable_chAmpions 9 months ago

    Great video, well explained, thanks.

  • @sonamraj5323
    @sonamraj5323 6 months ago

    Amazing 😍

  • @thahoorzain1367
    @thahoorzain1367 1 year ago +2

    Should we apply data standardization before or after the train test split?

    • @maverick5056
      @maverick5056 1 year ago +1

      After the split. I know I am late, in case you haven't figured it out yet.

    • @rounakmukherjee5642
      @rounakmukherjee5642 7 months ago

      Absolutely before splitting; after splitting means you need to standardize the train and test data separately.
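
For reference, a hedged sketch of one commonly recommended pattern, which is an assumption here and not something shown in the video: split first, fit the scaler on the training rows only, and reuse those training statistics on the test rows. The data is synthetic, and `test_size=0.2` and `random_state=2` are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # mean/std learned from training rows only
X_test_std = scaler.transform(X_test)        # test rows reuse the training statistics

print(X_train_std.shape, X_test_std.shape)
```

In this pattern the two parts are not standardized independently: the test set is transformed with the mean and standard deviation learned from the training set.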

  • @vinaynaik953
    @vinaynaik953 3 years ago +1

    Thanks for your effort

  • @nehapasi9393
    @nehapasi9393 2 years ago

    Thank you😊

  • @digilearncommunity5598
    @digilearncommunity5598 3 years ago +3

    Could you please tell me about inplace=True and inplace=False? When do we use them?

    • @Siddhardhan
      @Siddhardhan  3 years ago +2

      When you pass inplace=True, the change is saved in your original DataFrame. If you don't pass it, the original DataFrame is left unchanged and the method returns a modified copy.

    • @digilearncommunity5598
      @digilearncommunity5598 3 years ago

      @Siddhardhan thanks, got it!
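
A small sketch of the difference discussed in this thread, using a made-up DataFrame and `dropna` as the example method (both are assumptions for illustration; the video may use a different method):

```python
import pandas as pd

# Toy data with one missing value in column "a".
df = pd.DataFrame({"a": [1, 2, None], "b": [4, 5, 6]})

# Default (inplace=False): a new DataFrame is returned and df is left unchanged.
cleaned = df.dropna()
rows_original = len(df)
rows_cleaned = len(cleaned)

# inplace=True: df itself is modified; nothing useful is returned.
df.dropna(inplace=True)
rows_after_inplace = len(df)

print(rows_original, rows_cleaned, rows_after_inplace)
```

Without `inplace=True`, the usual way to keep the change is to assign the result, for example `df = df.dropna()`.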

  • @NgrinPersianVlogs
    @NgrinPersianVlogs 1 month ago

    Thanks for the video. However, the standardization is done wrongly, before splitting the data.

  • @satviksingh6229
    @satviksingh6229 3 years ago +1

    Can you tell me some mandatory and important steps to follow, from getting a dataset to making a final model, that can be used on every project?

    • @Siddhardhan
      @Siddhardhan  3 years ago +8

      1. Data collection
      2. Data cleaning & processing
      3. Exploratory data analysis
      4. Model selection
      5. Model training
      6. Tuning the parameters & optimization of the model
      7. Model evaluation
      8. Deployment

  • @sachinvithubone4278
    @sachinvithubone4278 3 years ago +1

    1. What is the difference between data preprocessing and data analysis?
    2. How do we deploy a model?

    • @Siddhardhan
      @Siddhardhan  3 years ago +2

      In data preprocessing, we make the data suitable to feed to a machine learning model: for example, standardizing the data, splitting the data, converting text data to numerical data, etc. (You can refer to the videos in the 4th module.) Data analysis is about understanding the data: seeing which features are related to each other, finding correlations, etc. That's the difference.

    • @Siddhardhan
      @Siddhardhan  3 years ago +1

      For deployment, you can use tools like Flask. Deployment won't be covered in this course. We can do it later.

    • @sachinvithubone4278
      @sachinvithubone4278 3 years ago +1

      @Siddhardhan Thank you. I got the difference between data preprocessing and data analysis.

    • @Siddhardhan
      @Siddhardhan  3 years ago +1

      You're welcome 😇

    • @universal4334
      @universal4334 3 years ago

      Will exploratory data analysis be covered here?

  • @sezermezgil9304
    @sezermezgil9304 3 years ago +2

    Hey, I watched your data standardization video, and I think in that video we split the data before we standardized it, but here we standardized first. Is that a problem or not? Thank you.

    • @farahamirah2091
      @farahamirah2091 2 years ago +1

      I watched it too.
      He said:
      "We can also standardize the data before splitting, but if our data has some outliers then that would be a problem, because outliers are abnormal; it's better to split the data first, then standardize." Maybe there are no outliers in this diabetes data?

    • @tusharmawa2542
      @tusharmawa2542 2 years ago +1

      @farahamirah2091 @Sezer Mezgil
      He said that about the Placement_Dataset, which is regression-based, hence outliers are possible there. In this video he is talking about the diabetes dataset, which is a classification one, in which outliers hold no meaning as there are only two possible values, 1 or 0, while in the previous case many different salaries are possible.

  • @Tony-vo9ok
    @Tony-vo9ok 5 months ago

    sick👍

  • @Diamond_Hanz
    @Diamond_Hanz 2 years ago +1

    Okay!

  • @bikashthapa8622
    @bikashthapa8622 1 year ago

    Thank you

  • @gesrauw
    @gesrauw 3 years ago

    The content is good stuff, but it would be more pleasant to listen to if the talking were a bit slower (0.75x) in future videos. Other than that, big thanks!

    • @Siddhardhan
      @Siddhardhan  3 years ago

      Thanks for the tip 😇 Will work on that.

    • @farahamirah2091
      @farahamirah2091 2 years ago

      Hi, I think the pace is good. Maybe you could go to the settings and set the playback speed to 0.75.

  • @ironman4775
    @ironman4775 3 years ago +1

    Hi,
    Is it better to do data analysis, train test split, data preprocessing, or
    data preprocessing, analysis, train test split?
    Which order is best?
    My doubt is that if we do preprocessing and analysis first, it may lead to bias, as we impute/drop some data,
    and if we do the splitting at the end, it may cause feature leakage, as imputed values or transformations are based on the whole data.
    Could you please clarify?

    • @aminesahraoui5422
      @aminesahraoui5422 3 years ago

      I think we do data preprocessing, analysis, train test split.
      You can read the comments above to understand the concepts.
      (And we are all waiting for Mr Siddhardhan's answer to your question.)

    • @Siddhardhan
      @Siddhardhan  3 years ago +4

      Hi! It depends on the dataset and the approach we take. I often prefer doing analysis, data preprocessing, and then the train test split. Data leakage won't happen all the time; it happens only when we have outliers. If we find that the model is overfitting, then it may be a sign that data leakage is occurring. In that case we can do analysis, train test split, and then data preprocessing. This is the method I follow.
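
To make the leakage concern in this thread concrete, here is a small sketch with made-up numbers. The values and the fixed 8/2 split are assumptions for illustration only. When an outlier lands in the test rows, statistics computed on the whole dataset differ sharply from those computed on the training rows alone, so standardizing before the split lets the test rows influence the training features.

```python
import numpy as np

# Toy feature column; the last value is an outlier that ends up in the test part.
values = np.array([10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0, 11.0, 500.0])
train, test = values[:8], values[8:]  # fixed split keeps the example deterministic

# Statistics from the whole column include the test rows (the leaky variant).
mean_all, std_all = values.mean(), values.std()
# Statistics from the training rows only.
mean_train, std_train = train.mean(), train.std()

leaky_train = (train - mean_all) / std_all
clean_train = (train - mean_train) / std_train

print(mean_all, mean_train)
print(leaky_train.round(3))
print(clean_train.round(3))
```

In the leaky variant every training value ends up below zero, because the outlier in the test rows pulls the overall mean far above the training values.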

  • @soothingrelaxation8725
    @soothingrelaxation8725 3 years ago

    How can you visualize your split data in folders by using the split function, please?

  • @anisharaj9854
    @anisharaj9854 18 days ago

    9:14

  • @bosszz1282
    @bosszz1282 3 years ago

    Can you tell us how 'random_state' works?

    • @restphilanthropy
      @restphilanthropy 1 year ago

      It is a seed for the pseudo-random number generator that lets you reproduce the same train_test_split each time you run it.
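
A quick way to check this with scikit-learn's `train_test_split`, sketched on toy arrays. The data, `test_size=0.2` and `random_state=2` are arbitrary assumptions, not values taken from the video.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each
y = np.array([0, 1] * 5)          # toy binary labels

# Two calls with the same random_state produce identical splits.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.2, random_state=2
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=2
)

same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)
print(same_split)
print(X_train_a.shape, X_test_a.shape)
```

Leaving `random_state` unset means the shuffle is seeded differently on each run, so the rows in each part can change between runs.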