Creating Managed Tables in Databricks Unity Catalog

  • Published 26 Aug 2024
  • 🔗 Links and Documentation:
    - learn.microsof...
    💻 Check out my Databricks courses on Udemy:
    Azure Databricks and Spark SQL (Python): www.udemy.com/...
    Databricks SQL for Data Analysts: www.udemy.com/...

COMMENTS • 3

  • @SilvioDiMartino, 6 months ago

    A VERY useful playlist, a big thanks! Just a few questions, as I'm not sure I fully understand the difference between managed and external tables:
    - the EXTERNAL tables are managed by the cloud provider and stored on the cloud (S3 bucket or ADLS container). You can register the table in a Unity Catalog metastore and access it through storage credentials, so that you can read, manage, and audit Databricks access;
    - the MANAGED tables are fully managed by Unity Catalog. You can access them through the Catalog Explorer, but they are still stored on the cloud (S3 bucket or ADLS container, by default in the storage location of the metastore), aren't they?
    Thanks again!

    • @pathfinder-analytics, 6 months ago

      I'm glad you found the playlist useful! Your understanding is on the right track.
      External Tables
      External tables refer to data that is stored outside of Unity Catalog, typically in a cloud storage service like Amazon S3 or Azure Data Lake Storage (ADLS). The storage of the data is managed by the cloud provider, but the metadata (information about the table like schema, data location, etc.) can be registered and managed in Unity Catalog.
      By registering an external table in Unity Catalog, you enable data access and management within Databricks, including reading, writing, and auditing access. The actual data remains in its original location, and Databricks interacts with it through storage credentials.
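      As a rough illustration of the registration step described above, here is a minimal Python sketch that builds the kind of statement involved. The helper `external_table_ddl`, the table name, the columns, and the storage path are all hypothetical; the one piece taken from Databricks SQL itself is the `LOCATION` clause, which points the table at files that already live at a cloud path. It assumes that path is already covered by an external location and storage credential.

```python
def external_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Build a CREATE TABLE statement that registers data at an existing cloud path."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    # The LOCATION clause is what makes the table external:
    # the data files stay at this path, only metadata is registered.
    return f"CREATE TABLE {table} ({cols}) LOCATION '{location}'"


# Hypothetical catalog.schema.table name and storage path, for illustration only.
external_ddl = external_table_ddl(
    "main.sales.orders_ext",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
    "abfss://container@account.dfs.core.windows.net/orders",
)
print(external_ddl)
```

      In a Databricks notebook, the resulting string could be passed to `spark.sql(external_ddl)`; the sketch only builds the text so that it can be read and checked without a workspace.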
      Managed Tables
      Managed tables are more tightly integrated within Unity Catalog. Both the data and the metadata are managed by Unity Catalog, making operations like managing, monitoring, and securing data more streamlined.
      The data for managed tables is still stored on the cloud (e.g., S3 or ADLS), but the key difference is that the location and lifecycle of the data are controlled by Unity Catalog. For managed tables, Unity Catalog decides the storage location (typically within a specified path in the metastore's associated storage) and handles tasks like data format optimization, partitioning, and cleanup (e.g., deleting data when a table is dropped).
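      To make the lifecycle point concrete, here is a small Python sketch, assuming hypothetical helper names (`managed_table_ddl`, `drop_table_ddl`) and a made-up table name. The statement for a managed table has no `LOCATION` clause, so Unity Catalog chooses where the files go, and dropping it is what triggers the cleanup mentioned above.

```python
def managed_table_ddl(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement with no storage path."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    # No LOCATION clause: Unity Catalog decides where the data files are stored.
    return f"CREATE TABLE {table} ({cols})"


def drop_table_ddl(table: str) -> str:
    """Build a DROP TABLE statement."""
    # For a managed table, dropping it also leads to the underlying data being
    # cleaned up; for an external table only the metadata entry is removed.
    return f"DROP TABLE IF EXISTS {table}"


# Hypothetical catalog.schema.table name, for illustration only.
managed_ddl = managed_table_ddl("main.sales.orders", {"order_id": "BIGINT", "amount": "DOUBLE"})
drop_ddl = drop_table_ddl("main.sales.orders")
print(managed_ddl)
print(drop_ddl)
```

      As with any DDL, these strings would be run through `spark.sql(...)` or a SQL editor; the absence of a path in the first statement is the only thing that makes the table managed.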

    • @SilvioDiMartino, 6 months ago

      @pathfinder-analytics cool! super useful, thanks again!