How to run Spark with Minio in Kubernetes

  • Published 26 Jan 2025

COMMENTS • 29

  • @anmfaisal964
    @anmfaisal964 1 year ago +1

    I was looking for such an in-depth description for so long. Take a bow, many many thanks. Loved it.

  • @citormussa
    @citormussa 3 years ago +5

    First video I've found that truly goes into the details on how to actually build a Spark cluster (with the MinIO bonus) with Kubernetes, instead of showing everything already built. Thank you very much!

  • @apollon456
    @apollon456 5 months ago

    EXACTLY WHAT I HAVE BEEN LOOKING FOR

  • @DevelopersHubChannel
    @DevelopersHubChannel 2 years ago

    Underrated tutorial and YouTuber!!!! Love this. So practical, an end-to-end demo, pro level. I work on K3s daily and sometimes K8s.

  • @th9679
    @th9679 2 years ago

    00:14 Set up the data lake (MinIO) on Kubernetes
    03:03 Explore MinIO on Kubernetes and its options
    04:20 Explore the MinIO web UI (contains the kubefwd part)
    08:02 Define the objective (write an analytics job on top of MinIO)
    08:34 Install local Spark
    13:59 Start writing the PySpark job
    15:05 Install PySpark
    16:00 Write the analytics job in Python
    20:24 Run the PySpark job
    20:56 Missing Spark dependency (expected) error
    21:20 Add the missing dependencies
    24:00 Rerun the PySpark job
    24:53 Containerize the PySpark job
    25:28 Spark Operator setup
    27:33 Write the Dockerfile for the Spark job
    34:18 Build the image from the Dockerfile
    34:49 Push the image to the registry
    36:10 Deploy the Spark job
    40:27 Check the job outcome
    Thank you Brad!
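
    As a rough sketch of what those chapters add up to: the PySpark job below reads from and writes to MinIO via the S3A connector. The endpoint, bucket, column name, and credentials are placeholders, not the exact values from the video, and the hadoop-aws version must match the Hadoop version your Spark build ships with.

    # Sketch of a PySpark job against MinIO over S3A; all names are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("minio-analytics")
        # hadoop-aws provides the S3A filesystem (the "missing dependency"
        # around 21:20); pick the version matching your Spark's Hadoop build.
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
        # Point S3A at the in-cluster MinIO service instead of AWS S3.
        .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
        .config("spark.hadoop.fs.s3a.access.key", "minio-access-key")
        .config("spark.hadoop.fs.s3a.secret.key", "minio-secret-key")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
        .getOrCreate()
    )

    # Read a CSV from a MinIO bucket, aggregate, and write Parquet back.
    df = spark.read.csv("s3a://data/input.csv", header=True, inferSchema=True)
    df.groupBy("category").count().write.mode("overwrite").parquet("s3a://data/output/")

    spark.stop()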

  • @321andyy
    @321andyy 11 months ago

    By far the best tutorial that I have seen on this topic! Thank you!!

  • @GamerPCForever
    @GamerPCForever 3 years ago

    I figured out how to achieve this on a Mesos cluster, also with MinIO. It was funny to see, because this would have saved me a lot of time, haha. Great content, sir.

  • @rahul-qo3fi
    @rahul-qo3fi 2 years ago

    21:04 Spark S3 dependencies
    25:13 Spark on K8s

  • @nikschuetz4112
    @nikschuetz4112 7 months ago

    Nice video. I am trying to publish MinIO events to Kafka and connect that to a Spark streaming app; the video helps a lot.
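
    For that setup, the Spark side might look roughly like the sketch below; the broker address and the minio-events topic name are assumptions, and the spark-sql-kafka package version must match your Spark and Scala versions.

    # Sketch: consume MinIO bucket notifications from Kafka with Spark
    # Structured Streaming. Broker address and topic name are assumptions.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("minio-events-stream")
        # Kafka source for Structured Streaming (match Spark/Scala versions).
        .config("spark.jars.packages",
                "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
        .getOrCreate()
    )

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")
        .option("subscribe", "minio-events")  # topic MinIO publishes to
        .load()
    )

    # MinIO notifications arrive as JSON in the Kafka message value.
    query = (
        events.selectExpr("CAST(value AS STRING) AS event_json")
        .writeStream
        .format("console")
        .start()
    )
    query.awaitTermination()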

  • @marekkucak6581
    @marekkucak6581 4 years ago +2

    I'm planning to run Spark on a K8s cluster made of a few Raspberry Pis. This was very helpful.

    • @bradsheppard6650
      @bradsheppard6650  4 years ago +2

      Thanks, much appreciated! Good luck with the Raspberry cluster.

    • @marekkucak6581
      @marekkucak6581 4 years ago

      @@bradsheppard6650 Thanks

  • @anshuman9
    @anshuman9 6 months ago

    Nice and informative video.

  • @rahul-qo3fi
    @rahul-qo3fi 2 years ago

    thank you so much, this was very informative!!

  • @특이점이온다-l6d
    @특이점이온다-l6d 1 year ago

    This is a very helpful video, thank you.

  • @jijosunny8626
    @jijosunny8626 4 years ago

    Hi Brad, this is a great video; there are few videos/articles that describe in detail how to connect from Spark to MinIO. Keep it up.

  • @hi-kp7jg
    @hi-kp7jg 1 year ago

    You are a gigachad for doing this.

  • @优雅世界
    @优雅世界 2 years ago

    The lesson is explained in an interesting way, simple and clear, with a well-defined topic. Good.

  • @ylcnky9406
    @ylcnky9406 4 years ago

    This is a great video. Thanks for the great effort; it deserves a subscription.
    Expanding this mini data lake platform with more components such as Kafka streaming, Spark streaming, etc. would be awesome.

  • @johnmason9788
    @johnmason9788 3 years ago +1

    Thanks Brad. This was helpful. What do you think of submitting jobs via a Jupyter notebook vs. the Spark Operator?

    • @bradsheppard6650
      @bradsheppard6650  3 years ago +1

      Hey John, great question! Personally I find Jupyter notebooks super helpful for messing around and trying things out, but they are less intended for production code. Once I have code working in a notebook, I'll generally containerize it after the fact and then run it much as I showed in the video.
      The other great thing about the Spark Operator is its scheduling capability (via the ScheduledSparkApplication CRD). So if you have a job that you want to run nightly, for example, you can very easily get that automation from the Operator.
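
      For reference, a nightly ScheduledSparkApplication manifest might look roughly like this sketch; the image, application name, and script path are placeholders, not values from the video.

      # Sketch of a nightly ScheduledSparkApplication; image, name, and
      # script path are placeholders.
      apiVersion: sparkoperator.k8s.io/v1beta2
      kind: ScheduledSparkApplication
      metadata:
        name: nightly-analytics
      spec:
        schedule: "0 2 * * *"        # cron syntax: every night at 02:00
        concurrencyPolicy: Forbid    # skip a run if the previous one is still going
        template:
          type: Python
          mode: cluster
          image: my-registry/pyspark-job:latest
          mainApplicationFile: local:///app/job.py
          sparkVersion: "3.1.1"
          restartPolicy:
            type: Never
          driver:
            cores: 1
            memory: "512m"
            serviceAccount: spark
          executor:
            cores: 1
            instances: 2
            memory: "512m"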

  • @tratkotratkov126
    @tratkotratkov126 2 years ago

    Great explanation! Thank you!

  • @sahmed0211
    @sahmed0211 3 years ago

    Really good video; you gave me a lot of cool things to work with.

  • @fmfvieira
    @fmfvieira 3 years ago

    That was great, man. Very informative!

  • @kimted3272
    @kimted3272 3 years ago

    Thanks bro, saved my day.

  • @afshinyavari7422
    @afshinyavari7422 3 years ago

    Awesome video!

  • @iamindigamer
    @iamindigamer 4 years ago

    Is there any GitHub URL with the commands to apply?

  • @amarvakacharla
    @amarvakacharla 2 years ago

    This helped a great deal, Brad; thanks a ton.
    I got an exception while running the spark-pi example because of the latest updates to the repo. The reason is that the spark-pi-driver is forbidden: error looking up service account default/spark: serviceaccount spark not found.
    Resolved this with a little change to the helm install command:
    helm install spark-operator spark-operator/spark-operator --set serviceAccounts.spark.name=spark-user
    and using the same serviceAccount=spark-user in spark-pi.yaml.
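
    The relevant part of spark-pi.yaml would then look something like this sketch; only the service-account lines are shown, with the other fields left as in the stock example.

    # Excerpt: the driver must reference the service account created by the
    # helm flag above (spark-user here); other fields as in the stock example.
    apiVersion: sparkoperator.k8s.io/v1beta2
    kind: SparkApplication
    metadata:
      name: spark-pi
    spec:
      driver:
        serviceAccount: spark-user  # must match serviceAccounts.spark.name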

  • @tratkotratkov126
    @tratkotratkov126 2 years ago

    Thank you for the great explanation!