Finally, a video on the Databricks Hive Metastore that is well explained. Thanks Bryan!
Great video! I'm subscribing for the Red Green reference alone!
I was just pondering doing a deep dive into this today and reading a lot of docs, and then you put out the video 😂 awesome work Bryan!
That's the right level of detail that I needed. Well explained. Thank you.
You're Welcome!
This is what I was really looking for. Thank you very much for providing such an amazing explanation.
You're welcome. Glad to help.
Pretty clear... very much needed before exploring Unity Catalog... waiting for the next one!
Very clear explanation. Thanks @Bryan!
Thank you so much for taking the time, despite your great knowledge of the subject, to explain it so even I can understand!! 😍
You're welcome. Glad it helps.
super clear explanation, loved the analogy used in the beginning
Thank You!
Thank you very much for the video and the channel. I'm from Brazil and your work helps me a lot!
So glad my videos are helping you!
Love the direct and clear content! Keep it going!
I love this video!! thanks a lot.
Waiting for the Unity Catalog video!
YW.
That's so good, I enjoyed the video thoroughly. I am just starting to understand more about Azure Databricks.
Thanks for making it simple to understand.
You're Welcome! Glad it helped.
Great video, waiting for the next one on Unity Catalog. 🙌
Yeah. There's a lot to Unity Catalog. Also doing one on the Databricks AI Assistant, which is very cool.
great explanation. Thanks!
Thanks for the video, but I'm a bit confused: when you do saveAsTable() and then drop the table, will the physical data be deleted from the original source? For example, if I read data from AWS S3 and call saveAsTable(), but then drop the table, will the data in S3 also be deleted?
When you create a schema on top of an existing file (schema on read), it's really a read-only pseudo table. You can also create tables that are unmanaged, which means Spark will not delete the underlying files when you drop the table. If the table is defined as a managed table, dropping it will also drop the underlying data. You need to make sure you know whether you have a managed or unmanaged table to avoid bad surprises.
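Since saveAsTable() with no path option writes a new managed copy of the data into the warehouse, dropping that table deletes the copy, not your original S3 files. Here's a minimal PySpark sketch of the two behaviors, assuming a Databricks notebook (where `spark` is predefined) and hypothetical S3 paths and table names:

```python
# Managed: saveAsTable() with no path copies the data into Spark's warehouse.
df = spark.read.parquet("s3://my-bucket/raw/sales/")   # hypothetical source
df.write.saveAsTable("sales_managed")
spark.sql("DROP TABLE sales_managed")   # deletes the managed copy; the S3 source is untouched

# Unmanaged (external): the catalog entry just points at files you own.
df.write.option("path", "s3://my-bucket/tables/sales/").saveAsTable("sales_external")
spark.sql("DROP TABLE sales_external")  # removes only the catalog entry; the files remain
```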
Excellent narrative ❤❤❤
Thanks Bryan for this amazing session
YW
Deadly clear, awesome 👌👌👌💯💯💯
Exceptional explanation. Thank you.
Glad it was helpful.
Can you tell us when you are releasing your take on Unity Catalog? Looking forward to it.
So many things to cover these days. Hopefully, soon. Thanks!
Thanks A LOT!
One question: at 17:05, did you mean "Delta files" instead of "Delta tables" when you said "Delta tables are rather interesting..."?
Just that a Delta file is really a Delta table that has not been cataloged in the Hive Metastore or Unity Catalog. Just by pointing to the Delta file's path, you can use it as a table.
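As a small illustration in a Databricks notebook (the path is hypothetical), both of these read the same uncataloged Delta data:

```python
# DataFrame API: load the Delta table directly by its storage path.
df = spark.read.format("delta").load("/mnt/lake/dimgeography")

# SQL: the delta.`<path>` syntax lets you query the path as a table.
spark.sql("SELECT * FROM delta.`/mnt/lake/dimgeography` LIMIT 10").show()
```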
Hi @BryanCafferky, if a CSV file's metadata changes, does the Hive Metastore automatically update its metadata, or are there steps we need to take to refresh it?
A Hive table definition over a CSV file is read-only, and to get the metadata reloaded, I believe you would need to drop and re-create the table.
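A minimal sketch of that drop-and-recreate step in Spark SQL, assuming a hypothetical table name and file location:

```python
# The CSV schema is captured in the metastore at CREATE time, so changes
# to the file layout are not picked up automatically.
spark.sql("DROP TABLE IF EXISTS sales_csv")
spark.sql("""
    CREATE TABLE sales_csv
    USING CSV
    OPTIONS (header 'true', inferSchema 'true')
    LOCATION '/mnt/lake/raw/sales/'
""")
```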
very good explanation, thank you very much man
YW
I'm a bit confused about unmanaged vs. managed. In the step `create delta table that stored in hive`, the type of dimgeography is Managed, but dropping it doesn't get rid of the physical files, just like an Unmanaged (External) table. So what's the actual difference?
Yes. It is confusing. Think of a managed table as being like a SQL Server table if that helps. A SQL Server table and all its data are dropped together via a DROP TABLE statement. Spark supports similar functionality for Managed tables, in which the table schema and underlying data are created at the same time. This mimics SQL-database-style functionality. Unmanaged tables are when you already have an external file and you create a schema defining the column names and types describing the table, so Spark can allow you to use SQL queries against it. Since the file pre-exists and is maintained separately from the Hive Metastore or Unity Catalog, you don't want the physical file deleted when you issue a SQL DROP TABLE statement. Bottom line: if you want the table to be treated just like an RDBMS would treat it, i.e. catalog entry and physical data handled via SQL, you want Managed. If you want to use SQL queries against a pre-existing data file, you want to define it as Unmanaged. Make sense?
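A sketch of the two definitions side by side (table names and location are hypothetical); DESCRIBE EXTENDED is a handy way to check which kind you have before dropping anything:

```python
# Managed: Spark owns both the catalog entry and the data files.
spark.sql("CREATE TABLE dim_managed (id INT, city STRING) USING DELTA")

# Unmanaged (external): the catalog entry points at pre-existing files Spark doesn't own.
spark.sql("CREATE TABLE dim_external USING DELTA LOCATION '/mnt/lake/dimgeography'")

# The Type row shows MANAGED or EXTERNAL; that tells you what DROP TABLE will do.
spark.sql("DESCRIBE EXTENDED dim_external").show(truncate=False)
```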
Great content... Like always
You’re doing the Lord’s work. 👌
Hey Bryan, do you plan to create something about Unity Catalog?
Great video!
Thanks!
Thanks Bryan!!
You're welcome!
Superb! thank you.
You're Welcome!
Loved it
This gave a really good idea of the topic.
Good video. Though I understand Hive Metastore, it confuses me why everything in data has a dependency on it. For instance, Iceberg seems to need it for everything even though it’s supposed to be a self describing table format.
Technically, you don't need the Hive Metastore to read Delta tables, but it provides a lookup for where the table is physically stored. Otherwise, you need to provide the full path to the storage location. It also stores schemas for files that don't have built-in schemas, like CSV and text files.
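In other words, the metastore mainly saves you from spelling out the storage path. Something like this, with hypothetical names and paths:

```python
# With a catalog entry, the metastore resolves the table name to its storage path:
df1 = spark.table("sales.dimgeography")

# Without one, you must supply the full path yourself:
df2 = spark.read.format("delta").load(
    "abfss://lake@myaccount.dfs.core.windows.net/tables/dimgeography"
)
```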
Very Good.