Intro To Databricks - What Is Databricks

  • Published Nov 26, 2024

COMMENTS • 94

  • @SeattleDataGuy  2 years ago +6

    Also, if you'd like to dive deeper into data strategy and infrastructure and you'd like to support me, you can consider becoming a paid member of my Substack. I have over 100 articles that cover everything from data engineering 101 to leading data teams. Sign up with the link below and get 30% off.
    seattledataguy.substack.com/148e9023

  • @BuyNLarge_  8 months ago +9

    🎯 Key Takeaways for quick navigation:
    01:36 *🚀 Databricks offers managed Spark services along with other tools like Delta Lake and MLflow, providing options for data processing and model deployment.*
    03:56 *🏠 Databricks and Snowflake both promote the concept of data lakehouses, combining data warehouse and data lake functionality, but with different emphases on use cases.*
    05:37 *🛠️ Key components of Databricks include workspaces, notebooks, tables, clusters, jobs, and libraries, providing an integrated environment for data processing and analysis.*
    09:33 *📊 Databricks simplifies the transition from notebooks to production by allowing users to create jobs directly from their notebooks, enabling seamless integration and scheduling.*
    11:13 *🌟 Databricks facilitates easier productionization of data science workflows compared to alternatives like Snowflake, with integrated features like job creation and version control.*
    Made with HARPA AI
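The notebook-to-job step described in these takeaways can also be scripted against the Databricks Jobs REST API. The sketch below only builds the request for `POST /api/2.1/jobs/create` and does not send it. The workspace URL, token, notebook path, cluster ID, and job name are all hypothetical placeholders, and the payload fields should be checked against the Jobs API reference for your workspace before relying on them.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own workspace URL, token, and notebook path.
WORKSPACE_URL = "https://example.cloud.databricks.com"
TOKEN = "dapi-REPLACE-ME"
NOTEBOOK_PATH = "/Users/someone@example.com/daily_etl"


def build_job_payload(notebook_path: str, cluster_id: str) -> dict:
    """Build a Jobs API 2.1 body that runs one notebook on a daily schedule."""
    return {
        "name": "daily-etl-from-notebook",  # hypothetical job name
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        # Quartz cron fields: seconds minutes hours day-of-month month day-of-week.
        # "0 0 6 * * ?" means every day at 06:00.
        "schedule": {"quartz_cron_expression": "0 0 6 * * ?", "timezone_id": "UTC"},
    }


def build_create_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST to the jobs/create endpoint."""
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
        method="POST",
    )


payload = build_job_payload(NOTEBOOK_PATH, cluster_id="0000-000000-example")
request = build_create_request(payload)
# To actually create the job you would call urllib.request.urlopen(request).
print(request.full_url)
```

Building the request separately from sending it keeps the sketch safe to run anywhere. It also lets the payload be inspected or versioned alongside the notebook.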

  • @anishninan8374  2 years ago +34

    The best part about Databricks is that it unifies batch and streaming workloads.
    It also provides a single source for structured, semi-structured and unstructured data.

    • @SeattleDataGuy  2 years ago +3

      Yes, those are a lot of the reasons why I like Databricks, especially their streaming functionality.

  • @gardnmi  2 years ago +12

    I've used both. Snowflake is toast. Serverless clusters will take away the pain of managing clusters, and they are making really fast improvements to Delta Lake, which will reduce a lot of the common pains of filters, joins, and updates in Spark.

    • @SeattleDataGuy  2 years ago +6

      Oh boy! I want the competition, though. Users win in that world.

    • @Practicalinvestments  1 year ago

      @SeattleDataGuy This is all a foreign language to me, but I am an investor who's been very closely watching this company, and it sounds top-notch from what you say.

    • @jaserogers997  11 months ago +2

      This didn't age well.

  • @kerimsever6674  2 years ago +5

    I'm transitioning to Databricks, and this is a great introduction!

  • @dahof2789  9 months ago +68

    Wanted to watch, but I can't filter out that goofy background noise... Everyone does it, and it adds negative value for the listener.

    • @Skandawin78  6 months ago +5

      After reading this, I can't listen to his voice.

    • @waffles418  8 days ago

      Thanks for posting this... By the end of the video I was going nuts. The more I need to focus on what's being said, the less tolerance I have for background noise.

  • @hughesadam87  11 months ago +1

    Super helpful. I have been using Databricks inside another system without ever really understanding how much of that other system was simply Databricks.

  • @devencareer  1 year ago +2

    Your approach in this particular video is simple and precise: not deep, just right. Thanks! Devendra

  • @rachidt2764  2 years ago +60

    It would be cool to see a mini project in Databricks where you compare and highlight why you don't see it as a data engineering/BI-first tool.

    • @SeattleDataGuy  2 years ago +6

      Yeah, I will likely do a comparison between this and Foundry.

  • @codestrap8031  2 years ago +13

    Awesome breakdown, Ben. I'm looking forward to a comparison with Foundry. My personal opinion is that Databricks has the edge when it comes to their paid version of Spark and Delta. It's hard to do a direct head-to-head, though, because many of Foundry's overlapping capabilities are not documented. IMO Foundry has much better e2e capabilities with a built-in version control system, online IDE, CI/CD, and Data Apps/ML/AI tools. I'm a huge fan of both companies and really think these are the two players that will be left standing at the end of the Big Data OS wars.

    • @SeattleDataGuy  2 years ago +1

      Glad you enjoyed it! I think I might finally be able to start filming the PLTR video next week.

    • @MichaelStephenLau  2 years ago

      @SeattleDataGuy Please help us better understand, from a technical/professional perspective, how Palantir's solutions stack up against others in the market (Databricks, Snowflake, AWS, Google, etc.).

  • @jackgolding4235  1 year ago +2

    Had this on while I was working and it just clicked for me, thank you!

  • @NroShock  2 years ago +5

    Great video! I would love to see another one on Databricks: moving raw data from blob storage, transforming it, and storing it in Databricks tables.

  • @hotpeppermovie  2 years ago +6

    Databricks is awesome. It's so easy to work with.

  • @severtone263  2 years ago +3

    You earned my sub. I am all in. Thank you for this, it was a great help!

  • @ShashankData  2 years ago +2

    🔥 video! Learning so much from your vids, man. I'm still not really getting the practical difference between Databricks and Snowflake.

    • @SeattleDataGuy  2 years ago +1

      Filming that video now :)

    • @ShashankData  2 years ago

      @SeattleDataGuy Woohoo, looking forward to it!

  • @akshaybaura  2 years ago +5

    Great starter! It'd be interesting to dive a little deeper into the Delta Lake format and compare it with the Iceberg and Hudi formats, i.e. where they are similar, where they differ, and which situations suit one over the others.

    • @SeattleDataGuy  2 years ago +2

      Sounds like a great video. Let me see if I can get Ryan Blue in the video.

  • @edwinfokobo5680  11 months ago

    I love this and would love to learn more and understand what's needed as a prerequisite before I can get a job.

  • @TheAndrewjoynson  1 year ago

    Brilliant intro, well done, and surprisingly I understand a lot of this (I'm neither a dev nor a data engineer) 😄

  • @venkateshkothapalli  2 years ago +3

    Love Databricks!

    • @SeattleDataGuy  2 years ago

      Seems to be a decent amount of love for it!

  • @artandrock4all  2 years ago +3

    Really cool all-around video. I would love to see more videos on Databricks with a deeper dive into each topic ;)

    • @SeattleDataGuy  2 years ago +1

      I will add it to the list of future data engineering videos

    • @DatabricksPro  11 months ago

      This channel is great, but you may also want to check out mine. Cheers.

  • @aiautoglasscrm  1 year ago

    Subscribed. When you said "productionize your job", I expected to see API endpoints you could hit to get results, for example a regression price-prediction model that returns a number after you input the variables.
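On the API-endpoint question: Databricks Model Serving exposes a deployed model over REST, and scoring requests are POSTed to `/serving-endpoints/<endpoint-name>/invocations`. The following is a minimal sketch that assumes a model has already been deployed behind an endpoint. The endpoint name, feature names, workspace URL, and token are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders for the workspace, credentials, and endpoint.
WORKSPACE_URL = "https://example.cloud.databricks.com"
TOKEN = "dapi-REPLACE-ME"
ENDPOINT_NAME = "price-regression"


def build_scoring_request(rows: list) -> urllib.request.Request:
    """Prepare a scoring POST using the dataframe_records input format."""
    body = {"dataframe_records": rows}  # one dict per row: feature name -> value
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
        method="POST",
    )


# Feature names below are made up for illustration.
request = build_scoring_request([{"sqft": 1200, "bedrooms": 3, "zip_code": "98101"}])
# Sending it would return the predictions as JSON:
#   json.load(urllib.request.urlopen(request))
print(request.full_url)
```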

  • @surfh3r0  1 year ago +1

    I'm new to the subject. Well explained!

  • @nguyetdang111  8 months ago +2

    Is Databricks different from Azure Databricks?

  • @zonezero3290  6 months ago +1

    Thank you for sharing this!

  • @alineremy4273  4 months ago +1

    Super! Thank you so much for the video!

  • @horaciosoldman4481  2 years ago +1

    Thanks for this informative video Ben 🙌

  • @femaledeer  1 year ago +4

    The video didn't explain what Databricks does. When was the table shown being built?

  • @JohnNettuno  3 months ago

    Very clear -- very nice presentation

  • @keviny9392  2 years ago

    Excellent content. Exactly what I needed to get started. Thanks

  • @mashagalitskaia8642  9 months ago +1

    A really cool introduction, thanks a lot!

  • @tallalmoshrif6643  1 year ago

    Great content as always.
    Thanks for sharing.

  • @nujanai  2 years ago +1

    Great overview. Thanks!

  • @wrburggraaf  2 years ago +1

    How would you compare this to SAS Viya? It seems like this is more for building a data lakehouse, whereas SAS is primarily for data analysis and analytics (so it might connect to Databricks to get the data). Could you also do data analysis and analytics well in Databricks?

  • @raphaeldwain7834  2 years ago +2

    Very useful. Thanks.

  • @Tic45544  3 months ago

    Thanks for your job Ben ;)

  • @marklambrecht  1 year ago

    We need to have you do a video about SAS Viya too!

  • @DavidKoleckar  7 months ago +1

    That's a high-quality vid, thx :]

  • @stelluspereira  1 year ago

    Enjoyed your video, thx.
    Do you know how to install Databricks & PySpark LOCALLY (on a laptop) and code & test locally, WITHOUT depending on AWS/Azure?
    Perhaps a video on that would be appreciated by the community.

  • @ganonymous8448  2 years ago +2

    Great product overall, but it sucks that you can't use Airflow on it.

    • @SeattleDataGuy  2 years ago +1

      I could have sworn I saw a partnership between them and Astronomer.

    • @fenderbender28  2 years ago +1

      Their Workflows orchestrator has gotten super powerful and is easier than Airflow, IMO.

  • @mikenashtech  2 years ago

    Great vid, Ben. I like the way you give tips with commercial thinking behind them. Thanks, Mike

  • @AliTwaij  2 years ago +2

    Thank you

  • @manticomar1146  1 year ago

    Good work

  • @zacharythatcher7328  2 years ago +1

    To me this just looks like a great way to encourage developers to deploy untested code. Are there any testing pipelines built in that prevent job deployment prior to passing tests?

    • @gautam3305  2 years ago +2

      Testing in the data field is nothing like typical software-engineering CI/CD unit testing.
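One common compromise is to pull transformation logic out of the notebook into plain functions that can be unit-tested in CI before a job is deployed. Separate data-quality checks then cover problems that only show up in real data. The sketch below shows both ideas in plain Python. It is not a built-in Databricks feature, and the `customer_id`/`updated_at` schema and all function names are hypothetical.

```python
def deduplicate_latest(records):
    """Keep only the most recent record per customer_id (hypothetical schema)."""
    latest = {}
    for rec in records:
        key = rec["customer_id"]
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["customer_id"])


def check_no_null_keys(records):
    """Data-quality check: fail fast if any record is missing its key."""
    bad = [r for r in records if r.get("customer_id") is None]
    if bad:
        raise ValueError(f"{len(bad)} record(s) missing customer_id")


def test_deduplicate_latest():
    """Unit test for the transformation logic, runnable in CI before deployment."""
    rows = [
        {"customer_id": 1, "updated_at": "2024-01-01", "tier": "free"},
        {"customer_id": 1, "updated_at": "2024-03-01", "tier": "pro"},
        {"customer_id": 2, "updated_at": "2024-02-01", "tier": "free"},
    ]
    result = deduplicate_latest(rows)
    assert [r["tier"] for r in result] == ["pro", "free"]


if __name__ == "__main__":
    test_deduplicate_latest()
    check_no_null_keys([{"customer_id": 1}])
    print("all checks passed")
```

The unit test guards the code. The quality check guards the data and would run inside the job itself, which is closer to the kind of testing the reply above describes.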

  • @chrisbreen3188  2 months ago +1

    After watching, I am still no clearer on how this software is helpful to me, a data analyst. What is the benefit of using it instead of running Python scripts on my local computer to ETL my data into Power BI? Am I missing something?

  • @mikipatel2434  1 year ago

    Great video ruined by the annoying background music. The background music was really distracting and annoying. But very good information. Thank you

  • @VicusBass  1 year ago

    Whenever it's Databricks, there's a Romanian around :)

  • @cerberus1321  1 year ago +1

    It's the same as Domino.

    • @SeattleDataGuy  9 months ago

      There are some similar features, but it is kind of an apples-to-oranges comparison.

  • @piotr780  1 month ago

    Kubeflow is a much, much larger and more powerful tool than MLflow.

  • @JimRohn-u8c  2 years ago +3

    I hate Databricks, would rather use Snowflake.

    • @michaeld9682  2 years ago +3

      Why?

    • @SeattleDataGuy  2 years ago

      I would also love to know why!

    • @sevegarza  2 years ago +1

      Agreed. Snowflake + Prefect = 😇

    • @JimRohn-u8c  2 years ago +5

      @SeattleDataGuy A couple of reasons:
      1. When I'm building a data pipeline, there are a lot of SQL queries I need to write just to analyze the data and start understanding it before I create certain metrics. Because of Databricks' 1,000-row limit, this is harder to do.
      If those tables were in an RDBMS or Snowflake, I wouldn't feel as hindered with this very common task. I know it may seem weird to some people, but sometimes just being able to see more of your data and scroll through it helps; maybe this is a junior data engineer thing, idk.
      2. Idk if this is easier in Snowflake, but passing a parameter from a widget into Spark SQL was a pain in the ass in Databricks; the only reason I figured it out was that a notebook written by someone else did the same thing. We use Azure Synapse notebooks, and I like those much more than Databricks as well; it was easier to do some of the same things there.
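On point 2, the general pattern is to read the widget value in Python and pass it to the query as a bound parameter rather than formatting it into the SQL string. The sketch below uses SQLite from the standard library as a stand-in for Spark SQL so it runs anywhere. `get_param`, the `sales` table, and the `region` widget are hypothetical. In a Databricks notebook the value would come from `dbutils.widgets.get`, and on Spark 3.4 or later `spark.sql` accepts named parameters through its `args` argument.

```python
import sqlite3


def get_param(name: str, default: str) -> str:
    """Read a notebook widget if running in Databricks, otherwise use a default (hypothetical helper)."""
    try:
        # dbutils is only defined inside a Databricks notebook.
        return dbutils.widgets.get(name)  # noqa: F821
    except NameError:
        return default


region = get_param("region", "EMEA")

# SQLite stands in for Spark SQL here. On Spark 3.4+ the equivalent call would be
#   spark.sql("SELECT SUM(amount) FROM sales WHERE region = :region", args={"region": region})
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)],
)

# Bind the widget value as a named parameter instead of pasting it into the SQL text.
total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = :region", {"region": region}
).fetchone()[0]
print(region, total)  # EMEA 150.0
```

Binding also avoids quoting bugs and SQL injection when the widget holds free text.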

    • @gautam3305  2 years ago

      @JimRohn-u8c I understand data discovery is a key step before modeling, but that can be achieved with GROUP BY, LIMIT, DISTINCT, window functions, etc.; you don't need to print a million rows and export them to Excel for that.
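The kind of profiling described in this reply can be sketched with a few aggregate queries. SQLite stands in for Spark SQL here so the example runs with only the standard library (window functions need SQLite 3.25 or later). The `orders` table and its rows are made up for illustration.

```python
import sqlite3

# Toy table standing in for a large Databricks table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "shipped", 20.0),
        (2, "alice", "shipped", 35.0),
        (3, "bob", "cancelled", 15.0),
        (4, "carol", "shipped", 50.0),
        (5, "bob", None, 10.0),
    ],
)

# 1. Distinct customers and NULL statuses, without scrolling through rows.
profile = conn.execute(
    "SELECT COUNT(DISTINCT customer), SUM(status IS NULL) FROM orders"
).fetchone()
print(profile)  # (3, 1)

# 2. Distribution of a categorical column via GROUP BY.
status_counts = dict(
    conn.execute("SELECT status, COUNT(*) FROM orders GROUP BY status").fetchall()
)
print(status_counts)

# 3. Window function: each customer's largest order.
top = conn.execute(
    """
    SELECT customer, amount FROM (
        SELECT customer, amount,
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn
        FROM orders
    ) WHERE rn = 1
    ORDER BY customer
    """
).fetchall()
print(top)  # [('alice', 35.0), ('bob', 15.0), ('carol', 50.0)]
```

Each query returns a handful of rows no matter how large the table is, which sidesteps any display row limit.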