Scikit-Learn Model Pipeline Tutorial

  • Published 13 Oct 2024
  • Thank you for watching the video!
    Learn Python, SQL, & Data Science for free at mlnow.ai/ :)
    Subscribe if you enjoyed the video!
    Best Courses for Analytics:
    ---------------------------------------------------------------------------------------------------------
    IBM Data Science (Python): bit.ly/3Rn00ZA
    Google Analytics (R): bit.ly/3cPikLQ
    SQL Basics: bit.ly/3Bd9nFu
    Best Courses for Programming:
    ---------------------------------------------------------------------------------------------------------
    Data Science in R: bit.ly/3RhvfFp
    Python for Everybody: bit.ly/3ARQ1Ei
    Data Structures & Algorithms: bit.ly/3CYR6wR
    Best Courses for Machine Learning:
    ---------------------------------------------------------------------------------------------------------
    Math Prerequisites: bit.ly/3ASUtTi
    Machine Learning: bit.ly/3d1QATT
    Deep Learning: bit.ly/3KPfint
    ML Ops: bit.ly/3AWRrxE
    Best Courses for Statistics:
    ---------------------------------------------------------------------------------------------------------
    Introduction to Statistics: bit.ly/3QkEgvM
    Statistics with Python: bit.ly/3BfwejF
    Statistics with R: bit.ly/3QkicBJ
    Best Courses for Big Data:
    ---------------------------------------------------------------------------------------------------------
    Google Cloud Data Engineering: bit.ly/3RjHJw6
    AWS Data Science: bit.ly/3TKnoBS
    Big Data Specialization: bit.ly/3ANqSut
    More Courses:
    ---------------------------------------------------------------------------------------------------------
    Tableau: bit.ly/3q966AN
    Excel: bit.ly/3RBxind
    Computer Vision: bit.ly/3esxVS5
    Natural Language Processing: bit.ly/3edXAgW
    IBM Dev Ops: bit.ly/3RlVKt2
    IBM Full Stack Cloud: bit.ly/3x0pOm6
    Object Oriented Programming (Java): bit.ly/3Bfjn0K
    TensorFlow Advanced Techniques: bit.ly/3BePQV2
    TensorFlow Data and Deployment: bit.ly/3BbC5Xb
    Generative Adversarial Networks / GANs (PyTorch): bit.ly/3RHQiRj

COMMENTS • 50

  • @GregHogg
    @GregHogg  1 year ago

    Take my courses at mlnow.ai/!

  • @TheCsePower
    @TheCsePower 1 year ago +3

    Thanks Greg. This made me realise how non-standard my code is.
    I learnt:
    - Use copy or deepcopy and not assignment.
    - Always perform preprocessing on the train and test separately.
    - sklearn pipelines have nothing to do with ETL pipelines from Data Engineering.
    - sklearn transformers have nothing to do with NLP Transformers.
    - sklearn estimators have nothing to do with Statistics estimators.

    • @GregHogg
      @GregHogg  1 year ago

      Super glad you got some useful pointers!!
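
The first point in @TheCsePower's list (copy instead of assignment) can be illustrated with a minimal sketch. The array and dictionary names here are made up for illustration and are not taken from the video's notebook.

```python
import copy

import numpy as np

# Hypothetical data standing in for a training array.
X_train = np.array([1.0, 2.0, 3.0])

alias = X_train               # plain assignment: a second name for the same array
independent = X_train.copy()  # a separate array with its own data

alias[0] = 99.0        # this change is also visible through X_train
independent[1] = -1.0  # X_train is unaffected by this change

print(X_train)

# copy.deepcopy plays the same role for nested Python containers.
settings = {"scaled_columns": ["rooms"]}  # hypothetical config dict
settings_copy = copy.deepcopy(settings)
settings_copy["scaled_columns"].append("age")
print(settings["scaled_columns"])
```

Editing `alias` changes `X_train` because both names point at one object, while the copies can be modified freely.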

  • @crepantherx
    @crepantherx 3 years ago +4

    Keep posting, Greg. I am a data analyst by profession and your videos certainly help a lot.

    • @GregHogg
      @GregHogg  3 years ago

      That's awesome! Thank you 😄

  • @hansenmarc
    @hansenmarc 2 years ago +6

    Great stuff! I’m curious why you used FunctionTransformer instead of ColumnTransformer, which could run the two scalers in parallel? Also, since FunctionTransformer is stateless, the documentation says that fit just checks the input rather than actually fitting the scaling parameters. Doesn’t that lead to data leakage since applying transform to test data won’t use parameters learned from fitting on the training data?
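
A rough sketch of the alternative @hansenmarc describes, assuming synthetic data and an arbitrary split of columns between the two scalers (neither comes from the video). Placing the scalers inside a `ColumnTransformer` within a `Pipeline` means their parameters are learned in `fit` on the training data only and then reused on the test data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data; the real dataset and column choices are assumptions.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 4))
y_train = X_train @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=80)
X_test = rng.normal(size=(20, 4))

# Each scaler is applied to its own subset of columns.
preprocess = ColumnTransformer([
    ("standard", StandardScaler(), [0, 1]),
    ("minmax", MinMaxScaler(), [2, 3]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

# fit() learns the scaling parameters from X_train only.
model.fit(X_train, y_train)

# predict() transforms X_test with those stored parameters; nothing is re-fit.
test_predictions = model.predict(X_test)

# The fitted scaler can be inspected to confirm it saw only the training data.
fitted_standard = model.named_steps["preprocess"].named_transformers_["standard"]
print(fitted_standard.mean_)
```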

  • @kyleGrealis
    @kyleGrealis 2 months ago

    thanks, Greg. really good explanation and structured example. this makes it easy to create a template for easy reuse!

  • @AmitabhSuman
    @AmitabhSuman 1 year ago

    A very practical video on pipelines that I came across. Thank you for this video!

    • @GregHogg
      @GregHogg  1 year ago

      Awesome that's great to hear. You're very welcome ☺️☺️

  • @alexrook5604
    @alexrook5604 1 year ago

    I understand what you are doing here, but I have two questions that I think would be helpful and would make it easier to follow along and replicate your steps.
    1) Where did you get the data? I can't find a california_housing dataset that is already in train/test form.
    2) Why not use scikit-learn tooling rather than doing it yourself? You could have used train/test split or pipelines (or column transformer, or similar). That just has me confused.

  • @JJGhostHunters
    @JJGhostHunters 2 years ago

    Great tutorial! I use the MinMaxScaler with the option to scale from -1 to 1 instead of 0 to 1 when I am dealing with values that can be positive and negative. Seems to be fine, but I may need to reconsider going forward. I have never noticed any issues though.
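
The option @JJGhostHunters mentions is the `feature_range` parameter of `MinMaxScaler`. A small sketch with made-up values shows one thing to keep in mind: the observed minimum and maximum map to -1 and 1, so an original value of 0 lands at 0 only when the data is symmetric around zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single feature with both negative and positive values.
X = np.array([[-10.0], [0.0], [30.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# The minimum (-10) maps to -1 and the maximum (30) maps to 1.
# The original 0 maps to (0 - (-10)) / 40 * 2 - 1 = -0.5, not 0.
print(X_scaled.ravel())
```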

  • @brandonn8166
    @brandonn8166 1 year ago +3

    Just out of curiosity, is there a reason you don't use train_test_split to get X and y values?

    • @NikitaShilyaev
      @NikitaShilyaev 10 months ago

      Yes, and why does he use X_train for train_predictions instead of a separate dataset such as X_valid?
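
For readers with the same question, here is a minimal sketch of the approach raised in this thread: split with `train_test_split` and score on a held-out set as well as on the training set. The data is synthetic and the variable names are assumptions; this is not the code from the video.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the real dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Training error shows how well the model fits data it has seen;
# validation error estimates how well it generalises to unseen data.
train_mse = mean_squared_error(y_train, model.predict(X_train))
valid_mse = mean_squared_error(y_valid, model.predict(X_valid))
print(train_mse, valid_mse)
```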

  • @TheFrankyguitar
    @TheFrankyguitar 10 months ago

    Thanks for this amazing video! Would that work also with a statsmodels model?

    • @GregHogg
      @GregHogg  10 months ago +1

      Thanks so much!! And I'm not sure, haven't tried :)

  • @rahiiqbal1294
    @rahiiqbal1294 8 months ago

    This was very helpful, thank you :)

  • @JJGhostHunters
    @JJGhostHunters 2 years ago

    I would love to see a tutorial that covers using pipelines with multilayer perceptron models (MLPs), CNNs and LSTMs.

  • @talyb7383
    @talyb7383 2 years ago

    Thanks for the great tutorial! What do I need to change to create a pipeline for an image classification model, like the cifar10 model?

    • @GregHogg
      @GregHogg  2 years ago

      Well, everything. You probably won't be using scikit for that. And you're very welcome!

    • @talyb7383
      @talyb7383 2 years ago

      @@GregHogg I didn't explain myself clearly... I want to create a pipeline that receives a trained cifar10 model and also does preprocessing on the data set. So I can't use your way?

  • @ilanyutsis9653
    @ilanyutsis9653 2 months ago

    When you do StandardScaler().fit on the dataframe, what does this operation mean? What is happening?
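
As a short illustration of what `fit` does here: `StandardScaler().fit` computes and stores the per-column mean and standard deviation, and `transform` later subtracts that mean and divides by that standard deviation. The tiny arrays below are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test rows with two feature columns.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

# fit() only learns statistics from X_train; it does not change the data.
scaler = StandardScaler().fit(X_train)
print(scaler.mean_)   # per-column mean of X_train
print(scaler.scale_)  # per-column (population) standard deviation of X_train

# transform() applies (x - mean_) / scale_ using the stored training statistics,
# so the test rows are scaled with parameters learned from the training rows.
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)
```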

  • @lythien390
    @lythien390 2 years ago

    Thank you Greg! It's a great video!

  • @Nadia-db6nb
    @Nadia-db6nb 1 year ago

    Thanks for the great tutorial. Can you make a video on how to combine multiple feature selection methods and feature extraction using python?

  • @marcofogale9719
    @marcofogale9719 7 months ago

    Perfect explanation. Thanks a lot

    • @GregHogg
      @GregHogg  7 months ago

      Very welcome 😁

  • @nabanitadasgupta
    @nabanitadasgupta 10 months ago

    Thank you for the video!

  • @junaidlatif2881
    @junaidlatif2881 1 year ago

    How do I transform the y variable and then fit the model? And afterwards, how do I reverse the transform for the scatter plot?
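
One possible answer, sketched on synthetic data: scikit-learn's `TransformedTargetRegressor` scales `y` before fitting and applies the inverse transform inside `predict`, so predictions come back in the original units and can be plotted directly. The manual version underneath does the same thing step by step with `inverse_transform`. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features and a target on a large scale (for example, prices).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 50_000 + 10_000 * X[:, 0] - 5_000 * X[:, 1] + rng.normal(scale=100, size=100)

# Option 1: let scikit-learn handle the target transform and its inverse.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=StandardScaler(),
)
model.fit(X, y)
predictions = model.predict(X)  # already back in the original units of y

# Option 2: the same steps done by hand.
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()
manual_model = LinearRegression().fit(X, y_scaled)
manual_predictions = y_scaler.inverse_transform(
    manual_model.predict(X).reshape(-1, 1)
).ravel()
```

Either `predictions` or `manual_predictions` can then be plotted against `y` in a scatter plot, since both are on the original scale.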

  • @allanmachado2011
    @allanmachado2011 6 months ago

    Thank you!

  • @00SeijiHan00
    @00SeijiHan00 10 months ago

    TYSM bro really appreciate this

    • @GregHogg
      @GregHogg  10 months ago +1

      Very welcome!!

  • @adriandiaz5688
    @adriandiaz5688 1 year ago

    Great Video!

  • @Supernyv
    @Supernyv 11 months ago

    Awesome !

  • @krzysztofzaucha3592
    @krzysztofzaucha3592 6 months ago

    nice video Greg

    • @GregHogg
      @GregHogg  6 months ago +1

      Thanks so much!!

  • @fabio336ful
    @fabio336ful 2 years ago

    Did you say pipelines doesn't function for classification problems? At 1:07.

    • @GregHogg
      @GregHogg  2 years ago +1

      Does, not doesn't

    • @fabio336ful
      @fabio336ful 2 years ago

      @@GregHogg thanks 🙏🏼

  • @juampaaa90
    @juampaaa90 1 year ago

    awesome ty

  • @tareq8109
    @tareq8109 3 years ago

    Bro, can you show how to make a YouTube (or any video) downloader with Python?

  • @m18293
    @m18293 1 year ago

    Can you share this notebook?

    • @GregHogg
      @GregHogg  1 year ago

      dang i think i lost it, sorry

  • @MrAhsan99
    @MrAhsan99 2 years ago

    you are ❤

  • @AceOnBase1
    @AceOnBase1 9 months ago

    Bro you literally just copied this out of a textbook lmao but I respect the grind.

  • @johnspivack
    @johnspivack 10 months ago +1

    Too confusing. Too many tangents, doesn't cover the main idea clearly. Downvoted.

    • @GregHogg
      @GregHogg  10 months ago +4

      Well I upvoted it to counter you

    • @n8trh
      @n8trh 11 days ago

      What tangents? This video was not only to the point from the start, but it also went into depth with useful examples. If you thought those were tangents, I recommend watching again, maybe with more care this time.