I've been working with MySQL for over 20 years now, but I learned a TON from this video - thank you for putting out such great learning content, Aaron!
Maybe I missed it, but what was the specific reason a window function was used here instead of a simple GROUP BY on "bucket"?
My guess is it's so you can have multiple, separate "groups" and operate on them all within the same query.
@planetscale???
Window functions are a great way to compute running totals, averages, sums, etc. Or do something specific with the first or last row. Or peek at the next row or the previous row within a group. Or group by different things within a single query. They're very powerful!
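For anyone who wants to see that in one place, here's a minimal sketch (the readings(dt, v) table and column names are made up for illustration, assuming MySQL 8+):

SELECT
  dt,
  v,
  SUM(v)         OVER w AS running_total,   -- running total within the day
  AVG(v)         OVER w AS running_avg,     -- running average within the day
  LAG(v)         OVER w AS previous_v,      -- peek at the previous row
  FIRST_VALUE(v) OVER w AS first_v_of_day   -- value from the day's first row
FROM readings
WINDOW w AS (PARTITION BY DATE(dt) ORDER BY dt);

Each row keeps its own dt and v but can also see its whole group (here, its day) through the window, which a plain GROUP BY can't do without collapsing the rows.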
@@PlanetScale Yes, but this specific task would be better served by GROUP BY. Still a great thing to know.
In this case it could be done with GROUP BY, right? Or am I missing something?
Wow, Steve's graphic design skills are getting better; compared to the previous video, the Aaron in this one looks more realistic and is barely distinguishable from the real one 👍😁
Anyway, even when the query is fast enough, we also need to consider the number of times we may execute it (multiple users requesting the data), so using a summary table is definitely helpful here.
Thanks Aaron!
If I understand this correctly, one important feature of a window function is that everything inside is on a rolling basis. That is for sure not something you could ever do with a simple GROUP BY. I do have an application where this may actually come in very handy. Thanks Aaron, great content as always!
Thank you for this video!! Wonderful to dive into SQL like this, and absolutely fantastic to see your enthusiasm, Aaron. It's very inspiring, and I am really excited to learn about the summary table when you make that video. Such a nice idea to make a query faster by just sort of "computing it before it's needed"!
Classic Aaron, using MySQL as a calculator
Haha didn't even think about it
One of the best YouTube channels when it comes to SQL!
Great job! So far, the best tutorial clips on common window functions.
This channel just keeps on getting better 😍
The fact this content is free is outstanding. Beyond blessed to be a developer in 2024!
Amazing! Been trying to do this for ages! The explanation is great.
Thank you!!! Good to see this type of content. 🍻
Video request: Geospatial indexing
Great video @aarondfrancis, I would love to see the summary/rollup table video. It's a pity you didn't have time to make it, but one can hope you will cover it someday on your personal channel :) Good luck man
Perfect content right there!
With a rollup table, a potential problem is presenting data in different time zones, especially when doing per-day stats.
This is like learning chess openings: at first everything is clear and easy, but in the end, your mind will explode.
Thanks for a great video. SQL is clearly not amazingly fast when it comes to bigger datasets and more complex analysis. Wouldn't it be beneficial to use another tool for a task like this, something like Spark perhaps?
MySQL lacks materialized views; is there any way to get similar functionality using just DB features? I've experimented with triggers on insert/update/delete that rerun the query to fill the table, but in the case of multiple inserts or updates it's not quite what I expected.
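One pattern that can behave better than rerunning the whole query is maintaining the summary incrementally from the trigger. A rough sketch, with made-up readings / readings_summary names and assuming readings_summary has a unique key on bucket:

-- Keep a per-hour running sum and count; the average is total_v / row_count at read time.
CREATE TRIGGER readings_after_insert
AFTER INSERT ON readings
FOR EACH ROW
  INSERT INTO readings_summary (bucket, total_v, row_count)
  VALUES (DATE_FORMAT(NEW.dt, '%Y-%m-%d %H:00:00'), NEW.v, 1)
  ON DUPLICATE KEY UPDATE
    total_v   = total_v + NEW.v,
    row_count = row_count + 1;

It still isn't a real materialized view - updates and deletes would need matching triggers that subtract - but it avoids recomputing everything on every insert.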
GROUP BY looking at you menacingly from the corner 😂
Perfect timing!
Cool, detailed explanation of niche-case handling - keep it up!
I get the same output with GROUP BY but many times faster; the only thing missing is first_value(dt), but that can be replaced by MIN(dt).
Thanks Steve!
Idea for your summary table video: in order to maximize performance and reduce unnecessary storage, I like to do summary view + index instead of an actual table.
If we have the bucket as a calculated column, would GROUP BY do the job?
17 minutes in and I'm still asking myself why we are not grouping by key and bucket instead of trying to get the last row of each group.
My thoughts exactly :P You could probably even do `GROUP BY bucket ORDER BY bucket`.
For the aggregations, a simple GROUP BY should work as well. It would complicate the query if you actually need the first and last timestamp fields (although one could be calculated back from the bucket).
In the end, using window functions was most likely the more interesting and educational topic for a video, even though this simple example could be solved in other ways.
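For reference, a GROUP BY version that recovers the first/last timestamps with MIN/MAX might look roughly like this (table and column names are invented; it relies on MySQL's non-standard ability to group by a select alias):

SELECT
  DATE_FORMAT(dt, '%Y-%m-%d %H:00:00') AS bucket,  -- hourly bucket
  MIN(dt) AS first_seen,                           -- first timestamp in the bucket
  MAX(dt) AS last_seen,                            -- last timestamp in the bucket
  MIN(v)  AS min_v,
  MAX(v)  AS max_v,
  AVG(v)  AS avg_v
FROM readings
GROUP BY bucket
ORDER BY bucket;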
@@goaserer I think it's more like a lead-in video for the summary tables topic. I recently worked on something like this, so I keep seeing GROUP BY solutions everywhere I look.
@@developertpl I agree that showing the window functions approach here is a nice educational choice. However, since it was mentioned that it is a no-go for PlanetScale, it would be nice to also show the GROUP BY approach as an alternative here, especially after we filtered to only the last record of each bucket, which is literally what GROUP BY would achieve.
From my point of view, window functions are best used when you're reviewing per-record data in the context of its group/bucket, which wasn't the case here.
Tl;dr I enjoyed the video and always appreciate videos on this channel, as they are of high quality and very informative. Just pointing out that showing the GROUP BY approach as well would make it "whole" :)
@@Szarlus Great content for sure! Always a pleasure.
Can't you just SELECT MIN(dt), AVG(v), MIN(v), MAX(v) FROM timeseries GROUP BY FLOOR(UNIX_TIMESTAMP(dt) / bucket_length_in_secs)?
Bump! I was thinking the same.
Another video I'd be very interested to see is a deep dive on MySQL replication.
Multi master is so much fun 🤔😬
Hi.
I want to thank you for being a great teacher. I have learned a lot of things from you.
I have a question: traditionally, when I have a case where I need to do some calculations and then reuse them, I usually create a temporary table so that I can consume the calculated data without repeating the work.
What would your recommendation be: subqueries, a temporary table, or the one I just learned, the WITH clause?
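For the WITH (CTE) option, here's a tiny sketch of reusing one calculation twice in a single statement (hypothetical readings(dt, v) table, MySQL 8+):

-- The CTE is referenced twice below but only defined once, and it lives only for this statement.
WITH daily_totals AS (
  SELECT DATE(dt) AS day_bucket, SUM(v) AS total
  FROM readings
  GROUP BY DATE(dt)
)
SELECT day_bucket, total
FROM daily_totals
WHERE total > (SELECT AVG(total) FROM daily_totals);

Unlike a temporary table, the CTE disappears after the statement, so the choice mostly depends on whether you need the intermediate result across several queries.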
slick animations!
Isn't a summary table similar to a temp table?
Why not just group by bucket and use MySQL's AVG function to get the average value in the bucket?
Is there an advantage counting these on the database level rather than the application level? Strictly speaking, is there any performance advantage when doing this calculation on the database?
It is faster.
First, you don't pull tens of thousands of rows over the network.
Second, even if MySQL is not optimized for time-series data, it is still dozens of times faster than, say, JavaScript or Java. Once you have the grouped result, which is much smaller, you pull that data up to the application level and display it as charts.
@@_MB_93 But will it create another bottleneck elsewhere? I mean, database connections are a resource. Will performing the calculation hold those connections? If so, wouldn't it be better to just dump the data and perform the calculation on the application side, since the application server is typically easier to scale than the DB server? Of course, this also needs to be decided after profiling.
@@YandiBanyu It is a bottleneck no matter where you put it; it's just that the database is specialized software for aggregating and transforming data. I personally think it is "reinventing the wheel" when things such as GROUP BY are performed on the application side.
As for scaling, well, as Aaron said in the video, MySQL can take you a long, long way before you actually need it.
@@_MB_93 Yeah, there will always be a bottleneck somewhere, but some are easier to solve than others. I agree that not many applications will scale beyond what an RDBMS such as MySQL offers, but it is still good to keep in mind when doing costly operations at the database level.
After all, it is (IMHO) harder to reduce the bottleneck of a single database server (before moving to a more complex clustering strategy) than, say, to spin up another server instance behind a load balancer. This can matter when building an application that needs on-premise data for security/legal reasons.
Hi PlanetScale,
I have a table with one column containing the timestamps at which a server was observed alive.
I want to write a SELECT query that returns a row for every minute:
1 if there exists a timestamp in that minute, else 0.
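One possible approach (a sketch, assuming MySQL 8+ and a hypothetical heartbeats(observed_at) table; the date range is hard-coded for illustration): generate the minute grid with a recursive CTE and LEFT JOIN the timestamps against it.

-- Generate one row per minute (one hour here; raise cte_max_recursion_depth for longer ranges),
-- then mark each minute 1 or 0 depending on whether any heartbeat falls inside it.
WITH RECURSIVE minutes AS (
  SELECT TIMESTAMP('2024-01-01 00:00:00') AS minute_start
  UNION ALL
  SELECT minute_start + INTERVAL 1 MINUTE
  FROM minutes
  WHERE minute_start < TIMESTAMP('2024-01-01 00:59:00')
)
SELECT
  m.minute_start,
  IF(COUNT(h.observed_at) > 0, 1, 0) AS was_alive
FROM minutes m
LEFT JOIN heartbeats h
  ON h.observed_at >= m.minute_start
 AND h.observed_at <  m.minute_start + INTERVAL 1 MINUTE
GROUP BY m.minute_start;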
What about data gaps?
Have you ever heard of an espresso martini?
Can you provide the data file for the data you used?
Video suggestion:
How to handle the case where you have to insert n rows and then need the primary keys (ids) of those rows to insert child rows, WITHOUT making n insert queries.
😁 I only understand about 1%, but I'm sooo impressed by what you are able to do and what MySQL is capable of. I only use it as a middleman in my Java and C# programs.
this is great!
I don't use MySQL, I don't even use PlanetScale (no offense), but I learn a lot of things from you, and I like how you explain stuff.
If you keep making videos like this, my `favourites` saved list is just going to look like your channel's `videos` page.
Sad that there is no more hobby plan on PlanetScale :/
I always forget the END on the CASE. Every time.
Can't you just group by the bucket calculation?
🤯👏
Totally new for me😅
brb gonna go build an APM tool
MySQL Gandalf
With all the talk on indexing, what does it really do? Is there such a thing as too many indexes? I'd imagine there's eventually a limit, either through practicality or performance, stopping you from adding an index for each and every column.
I was lost around 10 minutes in.....
GROUP BY on steroids 😅
😅😂😂😂😂