Distributed Machine Learning with Apache Spark / PySpark MLlib

  • Published Aug 17, 2022
  • The Kaggle housing.csv file: www.kaggle.com/datasets/camnu...
    The Colab Notebook: colab.research.google.com/dri...
    PySpark RDD Introduction: • Apache Spark / PySpark...
    PySpark SQL Introduction: • PySpark Tutorial: Spar...
    PySpark MLlib Docs: spark.apache.org/docs/latest/...
    Thank you for watching the video! You can learn Data Science FASTER at mlnow.ai/ :)
    Master Python at mlnow.ai/course-material/python/!
    Learn SQL & Relational Databases at mlnow.ai/course-material/sql/!
    Learn NumPy, Pandas, and Python for Data Science at mlnow.ai/course-material/data...!
    Become a Machine Learning Expert at mlnow.ai/course-material/ml/!
    Don't forget to subscribe if you enjoyed the video :D

COMMENTS • 23

  • @GregHogg
    @GregHogg  10 months ago

    Take my courses at mlnow.ai/!

  • @nickie2222
    @nickie2222 10 months ago +3

    Thank you for the video on MLlib. I haven't watched much of it yet, but it looks promising.
    The machine learning content starts around 12:00; the beginning is a warm-up on PySpark. Chapters/timestamps would have been helpful for a 40-minute video (with each chapter covering a different stage in the process or a different function).

  • @davtg8172
    @davtg8172 1 year ago

    Thanks Greg! It's a very good PySpark tutorial: comprehensive, with a lot of examples.

  • @maximinmaster7511
    @maximinmaster7511 1 year ago

    Thank you for this tutorial on PySpark !

    • @GregHogg
      @GregHogg  1 year ago +1

      You're very welcome 🙂

  • @erint.4917
    @erint.4917 1 year ago +3

    Great tutorial, Greg - really appreciate how you distilled such a comprehensive overview into a single video. Would you consider doing a video showing how to create a complete ML pipeline -- i.e., using output from Imputer(), StringIndexer(), OneHotEncoderEstimator(), VectorAssembler(), and VectorIndexer() -- for a dataset with multiple categorical and numerical features?
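    A minimal sketch of the kind of pipeline the comment asks about, under stated assumptions: the column names and toy DataFrame below are hypothetical, not from the video, and OneHotEncoderEstimator (the Spark 2.x name) is written here as OneHotEncoder, its name since Spark 3.0.

    ```python
    # Hedged sketch of an end-to-end MLlib Pipeline for mixed
    # categorical/numeric data. All column names and data are made up.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Imputer, StringIndexer, OneHotEncoder, VectorAssembler

    spark = SparkSession.builder.master("local[*]").appName("pipeline-sketch").getOrCreate()

    df = spark.createDataFrame(
        [("a", 1.0, None), ("b", 2.0, 3.0), ("a", None, 4.0)],
        ["cat", "x1", "x2"],
    )

    stages = [
        Imputer(inputCols=["x1", "x2"], outputCols=["x1_i", "x2_i"]),  # fill missing numerics
        StringIndexer(inputCol="cat", outputCol="cat_idx"),            # string -> numeric index
        OneHotEncoder(inputCols=["cat_idx"], outputCols=["cat_vec"]),  # index -> one-hot vector
        VectorAssembler(inputCols=["x1_i", "x2_i", "cat_vec"],         # merge into one column
                        outputCol="features"),
    ]

    model = Pipeline(stages=stages).fit(df)
    out = model.transform(df)
    out.select("features").show(truncate=False)
    ```

    Fitting the Pipeline fits each stage in order on the output of the previous one, so the whole preprocessing chain can be saved and reapplied to new data with a single transform() call.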

  • @drjabirrahman
    @drjabirrahman 1 year ago

    Good information Greg! Thanks for sharing.

    • @GregHogg
      @GregHogg  1 year ago

      Glad to hear it! You're very welcome

  • @Value_Pilgrim
    @Value_Pilgrim 1 year ago

    Thanks. That was pretty comprehensive.

  • @suman14san
    @suman14san 11 months ago

    Fantastic tutorial.

  • @user-kv2mn8bo4z
    @user-kv2mn8bo4z 2 months ago

    This is really helpful.
    Thank you!

    • @GregHogg
      @GregHogg  2 months ago

      Super glad to hear it, you're very welcome! Thanks so much for the support ❤️

  • @arsheyajain7055
    @arsheyajain7055 1 year ago

    Oh awesome thanks!

  • @GregThatcher
    @GregThatcher 24 days ago

    Thanks!

    • @GregHogg
      @GregHogg  24 days ago

      Greg! You're too nice hahaha

  • @ammaralhawashem560
    @ammaralhawashem560 1 month ago

    Thank you.
    I found one thing confusing:
    you did the standard scaling AFTER merging the features into one column.
    Shouldn't you have done it for each column before merging?

    • @GregHogg
      @GregHogg  1 month ago +1

      I don't remember sorry. But you're probably right

    • @ammaralhawashem560
      @ammaralhawashem560 1 month ago

      @@GregHogg Thank you for your reply.
      I ran an experiment: to observe the effect, I used two features with a large difference in values
      and 5 million rows.
      It seems that even if we merge all the features before applying the scaling, it still calculates the parameters (mean & std dev) for each feature.
      In summary, you did NOT make any mistake.
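    That observation matches how MLlib's StandardScaler behaves: fit on an assembled vector column, it computes a separate mean and standard deviation for each slot of the vector, so scaling after VectorAssembler acts like scaling each column individually. A small sketch of the same experiment on toy data (column names and values are hypothetical, chosen so the two features differ by ~1000x):

    ```python
    # Hedged sketch: StandardScaler fit on an assembled vector column keeps
    # per-feature statistics rather than one statistic for the whole vector.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler, StandardScaler

    spark = SparkSession.builder.master("local[*]").appName("scaler-sketch").getOrCreate()

    df = spark.createDataFrame(
        [(1.0, 1000.0), (2.0, 2000.0), (3.0, 3000.0)],
        ["small", "big"],
    )
    assembled = VectorAssembler(inputCols=["small", "big"],
                                outputCol="features").transform(df)

    scaler = StandardScaler(inputCol="features", outputCol="scaled",
                            withMean=True, withStd=True)
    model = scaler.fit(assembled)

    # One mean/std entry per original column:
    print(model.mean)  # ~[2.0, 2000.0]
    print(model.std)   # ~[1.0, 1000.0]
    ```

    Because the statistics are kept per slot, the "big" column does not dominate the "small" one after scaling, which is the behavior the experiment above confirmed.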

  • @shiminglu3940
    @shiminglu3940 1 year ago +1

    I wish I had seen this when I took Econ 424 (ML) at UW 😂