I was looking for such in-depth description for so long - take a bow - many many thanks. - loved it.
First video I've found that truly goes into the details on how to actually build a Spark cluster (with the MinIO bonus) with Kubernetes, instead of showing everything already built. Thank you very much!
EXACTLY WHAT I HAVE BEEN LOOKING FOR
Underrated tutorial and YouTuber!!!! Love this. So practical and end-to-end demo... pro level. I work on K3s daily and sometimes K8s.
00:14 Setup Data Lake (MinIO) on Kubernetes
03:03 Explore MinIO on Kubernetes and its options
04:20 Explore MinIO web UI (contains kubefwd part)
08:02 Define objective (write analytics job on top of MinIO)
08:34 Install local Spark
13:59 Start writing PySpark job
15:05 Install PySpark
16:00 Writing analytics job in Python
20:24 Run the PySpark job
20:56 Missing Spark dependency (expected) error
21:20 Add the missing dependencies
24:00 Rerun the PySpark job
24:53 Containerize the PySpark job
25:28 Spark-operator setup
27:33 Writing Dockerfile for the spark job
34:18 Build image from Dockerfile
34:49 Push image to registry
36:10 Deploy Spark job
40:27 Check job outcome
Thank you Brad!
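For anyone following the Dockerfile part at 27:33, here is roughly the shape mine ended up taking. This is only a sketch: the base image and tag, the jar versions, and the job.py name are placeholders to swap for whatever the video and your Spark version actually use.

# Sketch Dockerfile for the PySpark job; image tag, jar versions and paths are placeholders
FROM apache/spark-py:v3.4.0

USER root
# S3A dependencies so the job can talk to MinIO (versions must match the image's Hadoop build)
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar /opt/spark/jars/
RUN chmod 644 /opt/spark/jars/hadoop-aws-3.3.4.jar /opt/spark/jars/aws-java-sdk-bundle-1.12.262.jar

# The analytics job itself, referenced later as local:///opt/spark/work-dir/job.py
COPY job.py /opt/spark/work-dir/job.py

# Drop back to the image's non-root spark user
USER 185

Build and push it as in the 34:18 and 34:49 steps, then point the SparkApplication manifest at the pushed image.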
By far the best tutorial that I have seen on this topic! Thank you!!
I figured out how to achieve this in a Mesos cluster, also with MinIO. It was funny to watch because this would have saved me a lot of time haha. Great content sir
21:04 Spark S3 dependencies
25:13 Spark on K8s
Nice video. I am trying to publish MinIO events to Kafka and connect them to a Spark Streaming app; the video helps a lot.
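If it helps, the Spark side of that pipeline can be sketched with Structured Streaming's Kafka source (it needs the org.apache.spark:spark-sql-kafka-0-10 package on the classpath); the broker address and topic below are placeholders for your own setup:

from pyspark.sql import SparkSession

# Sketch: read MinIO bucket-notification events that were published to a Kafka topic
spark = SparkSession.builder.appName("minio-events").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka.kafka.svc.cluster.local:9092")  # placeholder broker
    .option("subscribe", "minio-events")                                      # placeholder topic
    .load()
)

# Kafka values arrive as bytes; cast to string to see the raw JSON notification payload
query = (
    events.selectExpr("CAST(value AS STRING) AS event")
    .writeStream
    .format("console")
    .start()
)
query.awaitTermination()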
I'm planning to run Spark on a K8s cluster made of a few Raspberry Pis. This was very helpful.
Thanks, much appreciated! Good luck with the Raspberry cluster.
@bradsheppard6650 Thanks
Nice and informative video.
thank you so much, this was very informative!!
This is a very helpful video thank you
Hi Brad, this is a great video. There are few videos/articles that describe in detail how to connect Spark to MinIO. Keep it up!
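In case it's useful to others looking for the Spark-to-MinIO piece, this is roughly the SparkSession setup that does it. It is only a sketch: the endpoint, credentials and bucket are placeholders for your own MinIO install, and the jar versions must match your Hadoop build.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("minio-example")
    # hadoop-aws and aws-java-sdk-bundle provide the S3A filesystem
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio.minio.svc.cluster.local:9000")  # placeholder service URL
    .config("spark.hadoop.fs.s3a.access.key", "minio-access-key")       # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "minio-secret-key")       # placeholder
    .config("spark.hadoop.fs.s3a.path.style.access", "true")            # MinIO needs path-style addressing
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")      # plain HTTP inside the cluster
    .getOrCreate()
)

df = spark.read.csv("s3a://my-bucket/data.csv", header=True)  # placeholder bucket/object
df.show()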
you are a gigachad for doing this
The lesson is explained in a very interesting way: simple, clear, and well-focused on its topic. Good!
This is a great video. Thanks for the great effort. Deserves a subscription.
Expansion of this mini-datalake platform with more components such as kafka streaming, spark streaming, etc. would be awesome.
Thanks Brad. This was helpful. What do you think of submitting jobs via Jupyter notebook vs spark operator?
Hey John. Great question! Personally I find Jupyter Notebooks are super helpful for messing around and trying things out, but they are less intended for production code. Once I have code working in a notebook, I'll generally containerize it after the fact and then run it similar to how I showed it in the video.
The other great thing about the Spark Operator is its scheduling capabilities (via the ScheduledSparkApplication CRD). So if you have a job that you want to run nightly for example, then you can very easily get that automation from the Operator.
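For reference, a nightly job with the ScheduledSparkApplication CRD looks roughly like this (field names are from the operator's v1beta2 API; the image, file path and resource sizes are placeholders, so double-check against the chart version you install):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: nightly-analytics                 # placeholder name
  namespace: default
spec:
  schedule: "0 2 * * *"                   # cron syntax: every night at 02:00
  concurrencyPolicy: Forbid               # don't start a new run if the previous one is still going
  template:                               # same shape as a regular SparkApplication spec
    type: Python
    mode: cluster
    image: "my-registry/analytics-job:latest"                   # placeholder image
    mainApplicationFile: "local:///opt/spark/work-dir/job.py"   # placeholder path
    sparkVersion: "3.4.0"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: spark
    executor:
      instances: 1
      cores: 1
      memory: "512m"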
Great explanation! Thank you!
really good video, you gave me a lot of cool things to work with
That was great, man. Very informative!
thanks bro. saved my day
Awesome video!
Is there a GitHub URL with the commands to apply?
This helped a great deal, Brad, thanks a ton.
I got an exception while running the spark-pi example because of the latest updates to the repo; the cause was: spark-pi-driver is forbidden: error looking up service account default/spark: serviceaccount spark not found
Resolved this with a small change to the helm install command:
helm install spark-operator spark-operator/spark-operator --set serviceAccounts.spark.name=spark-user
and by using the same serviceAccount (spark-user) in spark-pi.yaml.
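In spark-pi.yaml that change is roughly just the driver's serviceAccount field (a fragment, assuming the standard example layout):

spec:
  driver:
    serviceAccount: spark-user   # must match serviceAccounts.spark.name passed to helm install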
Thank you for the great explanation!