I discovered your channel a few days ago; it's been amazing. Keep up the excellent work.
Thank you! I appreciate that a lot
Prepared statements should not be used because they're faster. They should be used because they're much safer.
The speed increase is just a free bonus.
This is a great point.
As a Java developer, I don't even remember the last time I used an unprepared statement.
@@tipeon I have been reliably informed that Oracle have imprisoned a Night Elf in their compiler and whenever it sees a java.sql.Statement, he screams "YOU ARE NOT PREPARED", teleports to it and starts wailing.
It may be true, but I haven't used a PreparedStatement since Blizzard released Heroes of the Storm, so...
While that is correct, I am pretty sure prepared statements were initially developed for performance.
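To make the thread concrete: in PostgreSQL itself a prepared statement looks roughly like this (a minimal sketch; the `users` table and `find_user` name are made up for illustration, and this needs a live server session):

```sql
-- Prepare once: the query is parsed and planned a single time.
PREPARE find_user (text) AS
    SELECT id, email
    FROM users
    WHERE email = $1;

-- Execute many times; $1 is always treated as data, never as SQL,
-- so a value like ''' OR 1=1 --' cannot inject anything.
EXECUTE find_user('alice@example.com');

-- Prepared statements are per-session; clean up when done.
DEALLOCATE find_user;
```

Client libraries such as JDBC's `PreparedStatement` do the equivalent protocol-level prepare for you, which is where both the safety and the repeat-execution speedup come from.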
Our source has a lot of unprepared statements ❤ I gave up on security concerns at my company 😂 at least we only have a small number of accounts. It's not great if someone targets us to exploit data, but I haven't rewritten that bullshit
Great info which was valuable to me, and your channel is top notch anyway.
And to add some humor into the mix: Beware, if you have more microservices than users, your
system probably does not need indexing or the rest 🙂 Stay healthy, stay fresh, and good luck out there
Suggestion for next sql video is "how to vectorize sql database for fast searching"
This is a great suggestion!
@@dreamsofcode Have you done it?
Great Video. Thank you!
Your index explanation is not entirely correct. Postgres does offer hash-based indexes, which are a lot closer to your explanation, but the default index type (which you used in your creation example) is a B-tree index, and the data structure is very different.
Partitions don't do anything meaningful to speed up writes; they only speed up reads. Instead of scanning a whole table for a record, you only need to scan the single partition (assuming you're querying a single key) where you know your record lives. It's the same concept as database sharding, but on one machine instead of multiple.
Thanks for your comment! I'm not sure what you're referencing, but yes, B-tree is the default index type, which uses a balanced tree structure for its lookups.
Partitions can speed up writes, usually when associated with B-tree indexes.
One factor is B-tree rebalancing, which on a partitioned table is usually less intensive than on the entire data set. Another performance gain comes when performing deletes keyed on the partition column: dropping the partition rather than deleting rows from the table avoids rebalancing altogether.
This is common in time series data and dramatically improves write performance.
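The time-series pattern described above can be sketched like this (table and partition names are hypothetical; requires PostgreSQL 10+ declarative partitioning):

```sql
-- Hypothetical time-series table, partitioned by month.
CREATE TABLE metrics (
    recorded_at timestamptz NOT NULL,
    value       double precision
) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_2024_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE metrics_2024_02 PARTITION OF metrics
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Retiring January's data is a cheap metadata operation:
DROP TABLE metrics_2024_01;
-- versus a row-by-row DELETE that touches every index:
-- DELETE FROM metrics WHERE recorded_at < '2024-02-01';
```

Each partition also carries its own smaller indexes, which is where the lighter rebalancing on inserts comes from.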
@@dreamsofcode great point, when dealing with 100 000 000+ rows, I need to try this!
Thank you for the detailed explanation
"You can lower the roof and feel the wind in your hair", I love Dreams of Code, I love PostgreSQL
Really well made video. Staying here for more!
Slight addition to COPY: you can use \copy from the client if you don't have access to store input files on the server, i.e. you can stream a local CSV to the server.
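Side by side, the two variants look like this (file path and table name are made up; \copy is a psql client command, not server-side SQL):

```sql
-- Server-side COPY: the file must exist on the database server,
-- readable by the postgres process.
COPY users FROM '/var/lib/postgresql/users.csv' WITH (FORMAT csv, HEADER);

-- Client-side \copy (psql meta-command): the file is read on your
-- local machine and streamed to the server over the connection.
\copy users FROM 'users.csv' WITH (FORMAT csv, HEADER)
```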
Can you explain this in a better way?
Seems kinda wild that the first point is prepared statements. It's not even a drop in the bucket for performance optimizations compared to indexing
I want to be from ur first subscribers so when u reach a million in the next year i will comment i was here when he was getting started (i was here at 5k)
Haha that would be awesome
I have found the channel ! awesome thank you
Bravo, subbed!
Who considered Mongo 'fancy'!? I thought everyone had got over the NoSql silliness.
Awesome video
Thank you 🙏
I'm from the Oracle world, a lot of familiar concepts
I have restricted my studies to data manipulation tasks so far. There is still a lot to explore in data definition and control!
Any book recommendations on how to optimize PostgreSQL?
1. The Art of PostgreSQL
2. SQL Antipatterns
Thanks!
Thank you so much for the support. It's really appreciated!!!
Nice and informative vid.
Care to share what mic you are using? Sounds very nice
Thank you! I appreciate the feedback.
For this video I used the Electro-Voice RE20. I also recorded in a sound-treated room this time, which made a difference!
@@dreamsofcode Ah, that explains it! I am looking to upgrade my gear, but the RE20 is really out of my budget 😂 Thanks for the reply
@@xucongzhan9151 It's pricey! I think the Shure SM57 is pretty decent as well and much cheaper, I use that one whenever I travel!
Thanks for the video, very good content and well edited, I'd just recommend putting more dynamism in your voice to match the pacing
Thanks for the tips!
There is no link to your code in the video description. Very interesting video!
Oh shoot! Thank you for letting me know. I'll fix that now.
@ThePaulCraft Fixed! Thank you again.
I would like to see content on good ways to implement multi-tenancy on Postgres
You should add a timestamp for the COPY statement part of the video
Nice! Now I don’t have to use web3 and store my data on crypto and pay per request and have huge latencies and non acid transactions. 😂
Mahn, please do the dadbod plugins for NvChad
Joined as a sub, excellent content, especially on read replicas
MOOOORE
🔥any good resources to learn more?
There's very little out there really on optimizing PostgreSQL. If it's something of interest I can dedicate some more videos into optimization!
@@dreamsofcode yes pls!!!
@@dreamsofcode YES that would be really helpful
Why would you use prepared statements instead of stored procedures? They are automatically "prepared" and don't need to be recreated in every session
Stored procedures are wonderful, but prepared statements have the advantage of being more dynamic in nature. Imagine e.g. a web page displaying a list with multiple columns each with different filters and sorting options. It would be a nightmare to implement with a stored procedure, but using a prepared statement you can dynamically build the necessary query.
@@Lightbeerer Kinda disagree here. Especially with web pages, the connections/sessions are very short and only exist for the short time the page is rendered. And if you only execute the query once per request or paging request, preparing the statement makes it slower. And you can absolutely implement dynamic filtering/sorting etc. with an SP... and with a lot less SQL injection danger...
@@Lightbeerer of course it's all trade-offs but especially for web pages, preparing doesn't make sense if you don't call the query multiple times.
With connection pooling, prepared statements make sense because the connections are actually long lived.
@@tipeon That's not what connection pooling is for, and I would consider this bad design. Connection pooling mitigates the connection overhead, but you're not supposed to assume that the connection is pooled or in any particular state; you should assume it's a new or effectively reset connection. So you would have to first ask the server whether there's already a prepared statement in the session and, if not, recreate it. That would make things slow again. And it's probably not even possible, because you couldn't reattach to your prepared statement after you reconnect. Which API lets you reconnect to an already prepared statement on the server once you let go of the statement object you held? I'm not aware of any. So for this to work you'd need to implement your own connection pooling and keep track of the statement... and that's an even worse idea.
Which language would I write a postgreSQL extension in? PL/SQL? ECMA? Python?
SQL and C code are typically used for creating an extension. Mainly SQL, with C if you need something more powerful!
I am wondering how you inserted 20 million rows into a table. Where did you get that data from?
I just randomly generated it using a mock data library in Go.
chef
What tool are you using? The terminal looks so good
Alacritty
mongodb is web scale
Prepared statements STILL don't work with pgbouncer and most other db proxies. No thanks.
People are not using indexes in an SQL service?
Also, what people should learn about indexes:
if you create an index on a column, the db will search on it faster.
If you have 3 WHERE conditions, for example, then you need to create an index on that 3-column combination for speed
Definitely at some of the places I've worked at.
Indexes are kind of interesting, they're not very useful for small data sizes, and there's the risk of over optimizing for them.
>if you have 3 where conditions, for example, then you need to create an index for those 3 colum combination for speed
Query planners are smart - if you have very large data sets you can do multi-column indexes and make sure the set reduction is in the correct order, but in my experience even up to a few billion records just having b-trees on each column individually is enough.
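The two approaches being compared can be sketched like this (table and column names are invented for illustration):

```sql
-- One composite index covering all three WHERE columns.
-- Column order matters: the index is usable left-to-right,
-- so lead with the column you always filter on.
CREATE INDEX orders_cust_status_date_idx
    ON orders (customer_id, status, created_at);

-- The alternative from the reply: independent single-column
-- B-tree indexes, which the planner can combine with bitmap scans.
CREATE INDEX orders_customer_idx ON orders (customer_id);
CREATE INDEX orders_status_idx   ON orders (status);

-- EXPLAIN shows which strategy the planner actually picks:
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';
```

Checking the plan with EXPLAIN before adding a composite index is usually the cheapest way to settle this argument for your own data.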
A good database should be fast by default. If something requires deep knowledge to make it fast, it's a nerdy database.
Which databases would you consider "fast by default"?
So everything in programming is nerdy then cuz you need to learn to make things work.
wait, that indeed makes some sense...
@@dreamsofcode MySQL is faster but lacks features, so neither is great. MongoDB is pretty good; unfortunately it's not SQL, which is what most people need. That's why companies should make better databases: something that's plug-and-play and fast.
It would be brilliant to have some database that is fast by default, but I'm afraid that is not possible for every use case. Every choice in a database is a tradeoff. (Indexes, for instance, make some reads a lot faster but every write a lot slower…)
I think the main selling point for PostgreSQL is that it is relatively easy to change these tradeoffs after you build your application.
MongoDB for speed? PostgreSQL is a faster document db than mongo and it’s not even the main focus.
Thanks!
Thank you so much!