Thanks. For some reason all the other explanations I found on Google are so complicated. Was just trying to get a basic understanding, and this was great!
This video updates a mistake that had been nagging me for weeks. ghettoghurle pointed out the mistake, and it pushed me to replace the old video with this one. Thanks ghettoghurle!
Certainly this can be done through indexing. Any really good DBA can synthesize this, but columnar DB vendors focus on providing much of the columnar optimization out of the box. The purpose of the video is to try and simplify the concept to business people so they aren't confused about why their traditional row based queries aren't performing. A lot of the Columnar DB vendors make it appear that traditional databases are unfit for service when the reality is that it's a structural change.
Also if another American league team is added it will have row id outside the sequence of 1 - 258. As 259 onwards start national teams. So in this case the query won't be as simple of 1 - 258. It will need to include team such as 569 row ID.
Any decent RDBMS can accomplish this if you normalize it to 6NF and then put it back together as a view. A query against the view would access only the underlying tables needed.
So how is this different than a relational database? select League, Team from MyTable where League = 'American' Especially if I have and index on League and a numeric value for the American league
So, all data in the columns are sorted? Why couldn't there be 20 American league teams listed first, then 10 National league teams, then 5 more American league teams?
You stated the performance problem is reading every row and every column; the columns that aren't necessary too. I assume it is these unnecessary columns that are causing the performance. So why not the other solution of only selecting the column you require and not SELECT * Or does the query engine not actually work that way?
A row store stores the entire row as a single piece. So if there 10 records, it stores each record as a separate piece.Hence even if you give select a,b from emp, it has to scan all the columns in that record.Whereas columnar database stores data into columns.So the values of each column is stored separately. Hence if you give select a,c from emp, then it will select/read only column a and c even though there are 100's of other columns.That 's the reason why columnar databases are good for read intensive and aggregate queries/use cases. Hope this helps
What if I want to retrieve team and player name. Each player has to be matched with its team. So u need to have to row id in each column for the linking to work.
When columnar databases run their queries, much of the query optimization is about what data is NOT being queried. Now recognize that in most cases queries are not ran for single records (like an OLTP use case). In most cases the pull of data is in mass quantities. The query optimizer is going to attempt at first to eliminate records where the desired results are not present. So if you parameterize your queries by players of certain teams, the query optimizer will first eliminate the teams you don't want to eliminate additional scanning. Then it will dive into scanning down the record set, and there are optimizations for that scanning as well based on how the data is partitioned. In the case where you were wanting a single record the performance will NOT be good. This is because the overhead isn't tuned for the use case. Here's a good whitepaper on the topic which highlights Snowflake, but dives into how it works a little more.
You need to put "FOR NON ENGINEERS" in the title. You have watered it down to the point that it is just good for chitchat. And honestly I knew what was coming when you used a baseball example in a DB tutorial.
Yes let's compare an unoptimized table scan to an ideal query scenario on a purpose optimized database. Adding a covering index for the query (and avoiding a table scan) would yield similar performance characteristics.
The difference is that every column in a columnar store is optimized not just the ID and League. This way the users don't have to be DBA's to get performance in their queries. Tactically, you can avoid purchasing a columnar store by trying to preempt the requirements and build all the indices ahead of time, but then you'd need a crystal ball in your back pocket.
After nearly 14 years, and still the best explanation, so absurd
Thanks. For some reason all the other explanations I found on Google are so complicated. Was just trying to get a basic understanding, and this was great!
Best explanation on the internet. Thank you
This video updates a mistake that had been nagging me for weeks. ghettoghurle pointed out the mistake, and it pushed me to replace the old video with this one. Thanks ghettoghurle!
Thanks a mil! This is the best explanation on columnar databases ever. On point and very easy to grasp thanks to the use of an example illustration.
Man you really did simplify it. Thanks a bunch. Great Job on the video!
A 13 year old video did a brilliant job in explaining this concept! Kudos.
Certainly this can be done through indexing. Any really good DBA can synthesize this, but columnar DB vendors focus on providing much of the columnar optimization out of the box. The purpose of the video is to try and simplify the concept to business people so they aren't confused about why their traditional row based queries aren't performing. A lot of the Columnar DB vendors make it appear that traditional databases are unfit for service when the reality is that it's a structural change.
An index over the row-based DB still wont be as fast because, those indexes refer to the primary key and not the value of the field.
@@amanaggarwalgogo Not exactly true. There are index-only scans and covering indexes supported by all primary vendors.
This is the simplest explanation. Thank you.
This made me understand the concept so much better than reading
Excellente! Simple and straigh to the point, thanks!
Thank you so much for the nice and clear explaination!
Great video thank you! Nice and short
Was a mistake from a former video on the same topic. This video fixes that mistake, it was regarding the database type.
"Hi, i'm Sherlock Holme"...
It is a concise and easy to understanding presentation. I keep watching other videos uploaded. I love them.
Perfect explaination.
So easy to understand. Thank you
amazing content
Best explanation I've seen.
Also if another American league team is added it will have row id outside the sequence of 1 - 258. As 259 onwards start national teams. So in this case the query won't be as simple of 1 - 258. It will need to include team such as 569 row ID.
Here’s a white paper on it that can explain in a little more detail www.intricity.com/learningcenter/snowflake-is-a-semi-not-a-motorcycle
Any decent RDBMS can accomplish this if you normalize it to 6NF and then put it back together as a view. A query against the view would access only the underlying tables needed.
So how is this different than a relational database?
select League, Team
from MyTable
where League = 'American'
Especially if I have and index on League and a numeric value for the American league
Comment Tator it's still an rdbms. The underlying storage system is what differs.
@@someguyO2W What differs are the internals... luckily most of us mere mortals will never need to know these details..which is kind of cool
@@apamwamba very cool. If we had to know everything, we'd explode!
best ad ever
The point of the video series is to simplify the concept not to do DBA training.
its a good video
good introduction video.. could you also explain how the data is stored in the database ? how is the ID column associated with the "Team" column ?
Thank you. This was really informative. :)
Awesome explanation.. thank you
So, all data in the columns are sorted? Why couldn't there be 20 American league teams listed first, then 10 National league teams, then 5 more American league teams?
excellent visual comparison of these 2 technologies, what produkt did you use for your presentation purpose?
Simple and well explained! Thank you.
You stated the performance problem is reading every row and every column; the columns that aren't necessary too. I assume it is these unnecessary columns that are causing the performance. So why not the other solution of only selecting the column you require and not SELECT *
Or does the query engine not actually work that way?
A row store stores the entire row as a single piece. So if there 10 records, it stores each record as a separate piece.Hence even if you give select a,b from emp, it has to scan all the columns in that record.Whereas columnar database stores data into columns.So the values of each column is stored separately. Hence if you give select a,c from emp, then it will select/read only column a and c even though there are 100's of other columns.That 's the reason why columnar databases are good for read intensive and aggregate queries/use cases.
Hope this helps
Thanks a ton.. this question is what I was searching for. great explanation..
What if I want to retrieve team and player name. Each player has to be matched with its team. So u need to have to row id in each column for the linking to work.
When columnar databases run their queries, much of the query optimization is about what data is NOT being queried. Now recognize that in most cases queries are not ran for single records (like an OLTP use case). In most cases the pull of data is in mass quantities. The query optimizer is going to attempt at first to eliminate records where the desired results are not present. So if you parameterize your queries by players of certain teams, the query optimizer will first eliminate the teams you don't want to eliminate additional scanning. Then it will dive into scanning down the record set, and there are optimizations for that scanning as well based on how the data is partitioned.
In the case where you were wanting a single record the performance will NOT be good. This is because the overhead isn't tuned for the use case.
Here's a good whitepaper on the topic which highlights Snowflake, but dives into how it works a little more.
What about setting an complex index on ID and League? performance will be the same
Nicely explained. Thanks.
You need to put "FOR NON ENGINEERS" in the title. You have watered it down to the point that it is just good for chitchat. And honestly I knew what was coming when you used a baseball example in a DB tutorial.
The channel name is Intricity101… 101 is an intro course in college.
@@Intricity101 My bad, just had a rough day 🌹
nice presesntation, and easy for people to get what is columnar database.
Overused statement but
3 minutes of this beats hours of uni lecturers
Thank you. This was really helpfull
Very good explanation.
thank you
Thanks! Give us a share on socials. We're hoping to raise awareness about our channel.
fucking amazingly explained simple to te point fast and guddddddddd
i just started learning the term column family and i guess this is the same beast.
Yes let's compare an unoptimized table scan to an ideal query scenario on a purpose optimized database. Adding a covering index for the query (and avoiding a table scan) would yield similar performance characteristics.
The difference is that every column in a columnar store is optimized not just the ID and League. This way the users don't have to be DBA's to get performance in their queries.
Tactically, you can avoid purchasing a columnar store by trying to preempt the requirements and build all the indices ahead of time, but then you'd need a crystal ball in your back pocket.
nice explanation! thank you!
very well articulated
Sounds like index table to me.
Thanks, this really helps
Sybase is the best Columnar database .
you need to adjust the content
well presented
This came in my exam but i didn't do it
Excellent
Thanks a ton.Awesome.
thanks
Thanks