Distributed Machine Learning with Apache Spark / PySpark MLlib
- Published Aug 17, 2022
- The Kaggle housing.csv file: www.kaggle.com/datasets/camnu...
The Colab Notebook: colab.research.google.com/dri...
PySpark RDD Introduction: • Apache Spark / PySpark...
PySpark SQL Introduction: • PySpark Tutorial: Spar...
PySpark MLlib Docs: spark.apache.org/docs/latest/...
Thank you for watching the video! You can learn Data Science FASTER at mlnow.ai/ :)
Master Python at mlnow.ai/course-material/python/!
Learn SQL & Relational Databases at mlnow.ai/course-material/sql/!
Learn NumPy, Pandas, and Python for Data Science at mlnow.ai/course-material/data...!
Become a Machine Learning Expert at mlnow.ai/course-material/ml/!
Don't forget to subscribe if you enjoyed the video :D
Take my courses at mlnow.ai/!
Thank you for the video on MLlib. I haven't watched much yet, but it looks promising.
The machine learning content starts around 12:00; the beginning is a warm-up on PySpark. Chapters/timestamps would have been helpful for a 40-minute video (with each chapter covering a different stage in the process or a different function).
Thanks, Greg! It's a very good PySpark tutorial: comprehensive, with a lot of examples.
Glad to hear it :)
Thank you for this tutorial on PySpark !
You're very welcome 🙂
Great tutorial, Greg - really appreciate how you distilled such a comprehensive overview into a single video. Would you consider doing a video showing how to create a complete ML pipeline -- i.e., using output from Imputer(), StringIndexer(), OneHotEncoderEstimator(), VectorAssembler(), and VectorIndexer() -- for a dataset with multiple categorical and numerical features?
Good information Greg! Thanks for sharing.
Glad to hear it! You're very welcome
Thanks. That was pretty comprehensive.
Glad to hear it!
Fantastic tutorial.
This is really helpful.
Thank U
Super glad to hear it, you're very welcome! Thanks so much for the support ❤️
Oh awesome thanks!
No prob 😊
Thanks!
Greg! You're too nice hahaha
Thank you
I found one thing confusing: you did the standard scaling AFTER merging the features into one column. Shouldn't you have done it for each column before merging?
Sorry, I don't remember. But you're probably right.
@GregHogg Thank you for your reply.
I ran an experiment: to make the effect visible, I used two features with a large difference in values and 5 million rows. It turns out that even if we merge all the features before applying the scaling, it still calculates the parameters (mean and standard deviation) for each feature. In summary, you did NOT make any mistake.
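The finding above can be illustrated with a small NumPy sketch (synthetic data, assumed here for illustration). Spark's `StandardScaler` likewise keeps a separate mean and standard deviation per dimension of the assembled vector, so scaling each column before merging and scaling the merged matrix column-wise give the same result:

```python
import numpy as np

# Two features with very different scales, mimicking the experiment above.
rng = np.random.default_rng(0)
a = rng.normal(10, 2, size=1000)       # small-scale feature
b = rng.normal(1e6, 5e4, size=1000)    # large-scale feature

# Order 1: scale each column separately, then merge.
scaled_separately = np.column_stack([
    (a - a.mean()) / a.std(),
    (b - b.mean()) / b.std(),
])

# Order 2: merge first, then scale the combined matrix column-wise
# (each column gets its own mean and standard deviation).
merged = np.column_stack([a, b])
scaled_after_merge = (merged - merged.mean(axis=0)) / merged.std(axis=0)

# Both orders give identical results.
assert np.allclose(scaled_separately, scaled_after_merge)
```

Because the per-column statistics are computed either way, the order of assembling and scaling doesn't change the output.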
I wish I had seen this when I took Econ 424 (ML) at UW 😂
Lol, well, I think you didn't search enough.