Why did the "LIKE" query take less time than the plain "SELECT" with no "LIKE"? Was that also cached? @HusseinNasser just need some explanation on that!
Thank you for clarifying the concept of database indexing. I appreciate your insights, but I'm still a bit puzzled about a few things. While I understand that traditional relational databases have robust implementations of indexing, I'm curious to know why one might opt for search engines, like Elasticsearch, over them. Specifically, how does Elasticsearch indexing differ from that of relational databases? Moreover, are there specific challenges or limitations associated with relational database indexing that Elasticsearch indexing can address more effectively? I'd greatly appreciate any further insights you can provide on this topic. Thank you in advance for your time and assistance.
I watched this video 6 months ago and understood nothing. Now I'm watching it again while working on a project that needs indexing, and I finally understand. Thanks Hussein! But wait a little bit, how do you design your youtube thumbnails?
It depends. Too many indexes on a table can actually slow down writes, because the need to update all those indexes and structures becomes an overhead. The trick is to index exactly what we need and put the columns we need inline to get index-only scans, which are the best. Good question
good info... thanks... very interested in how:
- a database decides in what page to save a record
- how variable-size rows are stored (a page can have 1 row... or 10 rows)
- what happens if a record is bigger than the page size?
keep it up!
@Hussein Nasser how does a multi-column index (c1, c2, c3) work? If I have a query like WHERE c1='xyz' AND c3='123' AND c2='456', will it use the index that was created? Also, I am getting a result called a row identifier scan in the explain output and I'm not sure what it is. Can you please throw some light on these?
That was a great video! Can you explain what exactly the b-tree looks like when an index is created? I mean, what does a b-tree node contain? (Key, value, pointer to row id, etc.?)
@@hnasr I would really appreciate it if you could do one. I have been trying to understand it from here: use-the-index-luke.com/sql/anatomy. Will wait for the video!
Great video. What would be the best way to deal with more expressive queries, like the example you ran into at the end with 'LIKE'? Perhaps you have an index key that encodes multiple data points, and you want to run a regex across each index key to quickly get back a set of ids that satisfy your search. Is there a better method for approaching this type of requirement?
Thanks! You can create an index in postgres with the gin extension that allows for expressive queries such as LIKE niallburkley.com/blog/index-columns-for-like-in-postgres/
can I get the 11-million-row database link? Where can I find sample databases for learning? I don't want to waste my time adding a huge amount of data manually. I know I can use a script, but I still want some sample databases for testing, exploring and learning.
Hey, I know this is not Stackoverflow but I have a question and I thought you could help :) My question is: if I have a table with columns a, b, c, d, e, f and I create a primary key on a, b, c, d and e, is there a way I can create a hash index on those 5 columns? So only 1 hash index that stores 1 hash for each row with all 5 columns combined and hashed, because I'm always going to query on those 5 columns. That way I don't waste memory storing 5 times more hashes, and it would also be faster, right?
Hard to discuss such problems over YT comments but I'll give it a shot. The question is very broad and depends on your use case. I would first ask if the database uses a clustered or non-clustered index (you said primary so I'm assuming clustered). If so, then generally this doesn't look like a good idea; inserts will be extremely slow assuming those columns are not ordered. I would suggest generating a hash of those columns and using that as the primary key, but again you have the same problem with randomness. I suggest using a database where you can control clustering (in postgres all indexes are secondary). If you absolutely need to keep using a clustered primary key then I suggest creating a new primary field that is serial and creating a unique composite index on the 5 columns instead... Also make sure that your query uses "AND" and not "OR", otherwise the index is not as effective.. Hope that helps
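A rough sketch of that last suggestion, with hypothetical column names a through f (a serial surrogate key plus a unique composite index over the five lookup columns):

```sql
CREATE TABLE t (
    pk bigserial PRIMARY KEY,  -- cheap, monotonically increasing clustering key
    a int, b int, c int, d int, e int,
    f int,
    CONSTRAINT t_lookup_key UNIQUE (a, b, c, d, e)  -- backed by one composite index
);

-- Lookups that AND all five columns can be answered by the unique index:
-- SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3 AND d = 4 AND e = 5;
```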
Flankymanga correct. Composite indexes contain multiple fields that are all "key" columns, meaning they are built to be searchable in the index. What I was referring to with multi-column is the ability to add a non-key column to the index; it doesn't necessarily have to be searchable, it's just used for selecting after searching on the key columns.
This video is really helpful. I have a few questions: 1. Can an insert query become faster with indexing? 2. After creating an index, if I insert some new rows, will the index work for those new rows? Do I need to refresh the table's data for the new rows or recreate the index?
Inserts on tables with indexes are "slower" than when no indexes exist, the reason being the additional work required to update the indexes. That being said, it depends on the rows inserted; nulls for instance are mostly not indexed, so no cost there.
What is the best way to make my query execute faster if I have millions of records in a table and I need to perform select, delete and insert operations one by one in a single API call? Currently I am using Postgres (in Python) without indexing and it's taking more than 3 mins to finish the API call.
I ran the same command "explain analyze select id from country where id = 8" on a table named country which I have, but it was showing "Heap Fetches: 1". I am on Postgres version 9.3. Any idea why that is?
This means that this data was recently inserted or updated. Postgres had to go to the heap to check if the row is visible or not to the current transaction. If you run vacuum you should not see the heap fetch, unless something else is preventing the vacuum from updating the visibility map.
The worst-case scenario is O(n), and it has to scan everything, but I think if the requested record was at the beginning of the table, it should be faster. Is this how it works?
Correct, assuming a full table scan, and if we are lucky the row is at the beginning. I am not sure, however, that all databases scan top to bottom. That is why a database sometimes prefers a seq scan over an index scan: if it knows it could find the row faster with a sequential scan, or if the index scan is going to be slower because of the scattered nature of the data.
Hey Hussein, I hope this comment finds you. I had a question related to Fulltext Search (for MSSQL but I understand you have a preference for PostGres). I have an issue where if there is a search involving multiple CONTAINS calls, any subsequent search on one of the columns takes a really long time to execute. Oddly enough, after around 10 mins or so, it then becomes rapid. This is on a table with around 71k rows so not massively huge. I wondered if it was something to do with indexes but it would be great to see a video on Fulltext. I can send you the query exactly how I have it
there are two types of inverted indexes you can use for fulltext search: gin and gist. each one has pros and cons. for gin: it's good for lookups but not so good with inserts, deletes or updates. for gist: it's the opposite (good for updates, not lookups). if you use gin on a table that changes often it will be slow, as it takes time to build/rebuild the index. whenever you run a query while the index is building, it won't use the index and will do a full table scan, but once the index finishes building it will use it. so that could be the reason for your problem: using gin on a table that changes so often causes the index to rebuild each time, so you can't use it until it finishes building.
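For reference, the two index types from the comment above can be declared like this in Postgres (a hypothetical docs table with a body text column is assumed):

```sql
-- GIN: fast lookups, expensive to maintain on write-heavy tables
CREATE INDEX docs_fts_gin ON docs USING gin (to_tsvector('english', body));

-- GiST: cheaper to maintain, slower lookups
CREATE INDEX docs_fts_gist ON docs USING gist (to_tsvector('english', body));
```

In practice you would pick one or the other for a given column, depending on the read/write mix.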
@@mohamedmohamedy3085 interesting, was not aware of gin/gist. Will look into that a bit more. I gave up with that search implementation and decided to stick with the current Azure cog search we had on the project 😂
That is a good question. The answer is not always, because more indexes mean more structures to update, which means slow writes. The rule of thumb is to understand which are the most important columns that will limit the search space to a small number of rows, so that a sequential scan becomes easier. For example, say your where clause has both firstname and lastname. If you put an index on firstname, that can still leave lots of rows to search (e.g. John is a very common name, so lots of last names to search). If you put an index on lastname, the search space is lowered: fewer people have the last name Smith compared to the first name John, so you narrow the search space further. So it might be good to test only an index on last name; if you can afford both, that is even better, but think of the cost. So my advice is test, test, test..
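One way to run that test, sketched against a hypothetical employees table with firstname/lastname columns: index the more selective column first and compare plans with explain analyze.

```sql
-- Composite index with the more selective column leading
CREATE INDEX employees_last_first ON employees (lastname, firstname);

EXPLAIN ANALYZE
SELECT id FROM employees
WHERE lastname = 'Smith' AND firstname = 'John';
```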
when I select only the ID (pkey) it says Index Only Scan but shows " Heap Fetches: 1" and when I do a select *, there is no mention of Heap Fetches. Can you please explain why that might be happening?
Great question. When you do an insert, update or delete, postgres puts some visibility metadata on the new rows for MVCC reasons, something like "this row is visible between this transaction id and this transaction id". This visibility information is stored in the table. Heap fetches happen when postgres has to fetch from the heap table despite only requiring an index scan: postgres needs to check whether the index entry is visible or not, and that information is stored in the heap. This happens when you actively edit the table. You can fix it by running vacuum on the table, which effectively removes the dead tuples and updates the visibility map. Select * always goes to the heap because you're asking for everything, so it doesn't bother mentioning heap fetches; those are just reads..
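A minimal sketch of that fix, assuming the employees-style table from the video:

```sql
-- VACUUM removes dead tuples and updates the visibility map,
-- so index-only scans no longer need to double-check the heap.
VACUUM employees;

EXPLAIN ANALYZE SELECT id FROM employees WHERE id = 8;
-- Once the pages are marked all-visible, the plan should show "Heap Fetches: 0".
```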
how to performance tune?
- explain analyze select ... (performance report)
- table vs index (two different data structures on the disk)
- slow vs fast: index scan, no index scan, like
Get my Fundamentals of Database Engineering udemy course to learn more , link redirects to udemy with coupon applied database.husseinnasser.com
hey can you tell us how sql indexes works on long text , lets we have text indexes in no-sql(mongo) which works good in long text and statements but how long text and statements can be efficiently searched with sql indexes.
Already bought it and enjoying it 😀
9:08 "5080"
A good example of how difficult it is to get good benchmarks. ID=5000 was fetched, but why was fetching 5080 so fast while fetching 7080 was slow again? Because PostgreSQL stores rows in pages, which are 8KB blocks of disk space. Hussain made rows of 2 integers and three(?) characters, so one 8KB block can hold about 1000 rows. When the database fetched the page that 5000 was in, that page was cached by the operating system (not the database), which inadvertently cached the roughly 1000 rows around id=5000.
14:04 "The primary key is usually stored with every single index"
I have never heard of that behavior. Primary keys are always indexed, but as far as I am aware they are never automatically added to every index you create. The index contains tuple and row information to enable a lookup, but not the PK.
I get the feeling you're seeing the ID rather quickly because the pages were already in cache from the previous queries.
And about LIKE not using an index; that's a good topic for a separate video: Trigram indexes.
Thanks Vinny, this is very valuable! And yes, me pulling id 5000 and then 5080 also came back quick because the OS cached the page.. Neat how databases work
Yeah, InnoDB I believe works this way; it stores the primary key alongside every index you create on other columns
@@hnasr It seems you are correct about InnoDB adding the PK to the index... wow...weird design choice. But then my opinion of MySQL has never been very high :-)
I was about to comment on the same thing explaining why the query with id = 5080 was faster than the one with id = 7080.
also about the page size, I think it's not fixed in every system. for example I just checked on my mac($getconf PAGESIZE), it's 4KB.
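The page layout described in this thread can be seen directly in Postgres through the hidden ctid column, whose value is (page_number, tuple_index) for each row. A quick sketch, assuming the video's table is named employees (the actual name isn't shown in the comments):

```sql
-- ctid's first number is the 8KB heap page the row lives on.
-- Rows with nearby ids typically share a page, so fetching id 5000
-- warms the cache for id 5080 but not for id 7080.
SELECT ctid, id FROM employees WHERE id IN (5000, 5080, 7080);
```

If 5000 and 5080 report the same page number, one read covered both; 7080 sitting on a different page would explain the second slow fetch.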
What do you mean by page here? Also, if the primary key (or more generally, a reference to the record) is not stored with the indexed row, for example the name column here (which I assume is stored in a separate b-tree structure), how does the database find the actual record when I say "Select * from table where name='name';"?
tnx
I was just wasting time in youtube and suddenly your video pop up to my screen. Good information summarized. Thanks man! Take my like and keep uploading videos!!!
Thank you 😊 glad you enjoyed the content !
Checkout my Introduction to Database Engineering Udemy course
database.husseinnasser.com
Chapters
intro 0:00
What is an index 0:30
Describe Table 2:15
SELECT [ID] WHERE ID 4:00
SELECT [NAME] WHERE ID 6:30
SELECT [ID] WHERE [NAME] (No index) 9:20
SELECT [ID] WHERE NAME LIKE 11:12
CREATE INDEX ON [NAME] 12:00
SELECT [ID] WHERE NAME (Indexed) 12:50
SELECT [ID] WHERE NAME LIKE (Indexed) 14:30
Summary 16:00
Wow... how did you put that character to comment section?
Athaariq Ardiansyah thats custom emoji that I built, members of the channel can use it 😍
@@hnasr We can program our own emoji? That blows my mind :o
**Highlights**:
+ [00:00:00] **Introduction to database indexing**
* What is an index and why it is useful
* How indexes are built and stored
* Examples of index types: B-tree and LSM tree
+ [00:03:00] **Querying with and without indexes**
* How to use explain analyze to measure query performance
* How to compare the execution time and cost of different queries
* How to avoid full table scans and use index scans instead
+ [00:12:08] **Creating an index on a column**
* How to create a B-tree index on a name column
* How to use the index to speed up queries on the name column
* How to avoid going to the heap and use inline queries
+ [00:16:30] **Querying with expressions and wildcards**
* How expressions and wildcards prevent the use of indexes
* How to avoid using like with percentage signs
* How to use hints to force the use of indexes
Started today, My 15th video in a row. Thanks a lot man, getting all this knowledge for free is a blessing for us.
nice! thanks for commenting and take some rest and pick up some other time :)
all the best
Good stuff. I'm a DBA of about 20 years. I remember using PostgreSQL before the SQL interface was added. :D Anywho, you mentioned the primary key being stored with the data. It's actually the opposite. The data is stored or "clustered" with the primary key. It's the only index that exists with the data. All others are index lookups that reference the location of the data. Great explanation.
didn't know postgres primary index is clustered! thanks
Thank you so much! This was clear, concise and very helpful! More postgres tutorials plz!
Man, I absolutely love your attitude and style of teaching. Deeply grateful for your content... thank you sir!
Actually normally I would never say something about the accent that anyone has, but wow bro your accent is perfect!
Thank you for creating so much valuable content on youtube like this and please keep doing it!
I have learned in this video more than I learned in a complete university semester
May God protect you ❣
You are a gift from God for us backend developers.
Me: Man I'm really interested in (insert subject here). I wonder if there's a video on this.
*Hussein has entered the chat*
just finished my first proper postgresql view that takes about 10 seconds and i see hussain nasser upload this video, coincidence? i think not...
You're orders of magnitude better than the TA for my DB course! Thank you very much for the explanation 😊
Love you my guy, most valuable tech content creator in youtube.
Great video!
You talked about using a multicolumn index as a way to save time by not having to go to disk. It would be interesting to have a video showing the tradeoffs of this approach. It seems Postgres does not recommend using a multicolumn index, only when really necessary.
This was extremely helpful. One of those topics that is never really explained in detail!
in this quick video I jumped into Database Flow world!!! really appreciate your work, bro
When you have multiple columns in the where clause, sometimes the index will not be hit. Looking forward to a 2nd part of this video explaining best practices: when there are multiple columns in the where clause, what kind and order of index should be made? A separate index on each column or a combined index, and how will this impact write time?
Also, if we write the LIKE query as 'Zs%', will the index be hit?
The best channel on youtube!
thank you for this! I love these videos that actually show how theory works in a concrete example!
Love this video. Really fascinating learning about the intuition behind indexes!
wow, exactly what i am looking for! awesome, Hussein.
Nasser, your videos are really informative and they help me picture the topic. More power to you, god bless
appreciate you dear, thanks for your comment!
Thank you so much! very helpful :)
this channel deserves millions of subscribers
Your explaining skills are just excellent.
Great video, thank you!
12:00 wouldn't it be much slower if you searched for something else like '%wr%' because 'zs' results have been cached as a result from the query you ran before the LIKE-query? I mean, the "= 'zs'" query took about 3.2 seconds, the LIKE + wildcard query only 1.x seconds?
Probably because of caching. EXPLAIN ANALYZE will run the SELECT query and tell what is going on.
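One way to confirm the caching theory in Postgres is the BUFFERS option, which splits the work into cached vs uncached page reads (the employees table name is assumed here):

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM employees WHERE name LIKE '%wr%';
-- "shared hit=N" -> pages served from Postgres' own buffer cache
-- "read=N"       -> pages that had to come from the OS cache or disk
```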
man your videos are great. simple and exactly to the point.
Would be nice to know if the index helps in case the expression is "like 'Za%'". Intuitively it should be able to find the rows starting with Za and take advantage of the index. What do you think?
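For what it's worth, a left-anchored pattern like 'Za%' can use a b-tree in Postgres, but unless the database runs in the C locale the index needs pattern operators. A sketch, again assuming an employees table with a text name column:

```sql
-- Use varchar_pattern_ops instead if the column is varchar.
CREATE INDEX employees_name_pattern ON employees (name text_pattern_ops);

EXPLAIN SELECT id FROM employees WHERE name LIKE 'Za%';  -- can use the index
EXPLAIN SELECT id FROM employees WHERE name LIKE '%Za%'; -- no prefix, still a scan
```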
At 12:58 you quickly run through the result of this 'explain' query, but since we just created an index on the name column, what are this 'Bitmap heap scan' and 'Bitmap index scan'? Why isn't it an 'Index only scan'? Could you please elaborate on this?
At 7:08 I think it has a use case. Let's say I want to have some kind of validation before inserting data for an employee: I can just select an id, and if I get a response, that means the id is valid and I can proceed further.
Or does this approach have some issues, and should I select some other column to be sure that it exists?
I'm 8 months too late, but for anyone else wondering this in the future: this is a perfectly valid use case. I've seen it in use quite a few times in production at major IT companies, though if I remember correctly you can also just "SELECT 1 FROM table WHERE ..." and it will return 1 for every row found.
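The same check can also be wrapped in EXISTS, which lets the planner stop at the first matching row (hypothetical employees table again):

```sql
-- Returns a single true/false instead of one row per match
SELECT EXISTS (SELECT 1 FROM employees WHERE id = 5000);
```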
What is the best thing to use instead of a LIKE query if similar functionality needs to be implemented?
Full-text search, Elasticsearch, or something else? Or is there another way?
This was SO GOOD, congratulations, highest quality class
Hello Hussein! Great video as always. This left me with some questions. How are we supposed to implement search functionality if "like" is not a good idea? Should we create as many indexes as possible, or should we create indexes on the most used fields? Thanks again.
Great question! This is something I didn't touch upon in the video: you can actually create an index based on the LIKE predicate. Some databases also support full text search capabilities in an efficient manner. And finally there are databases specialized in text-based search
Would you be able to make a video regarding that?
". How are we supposed to implement a search functionality if "like" is not a good idea?"
There is nothing inherently wrong with LIKE. Hussain's example uses a BTREE index, and that index type cannot search for wildcards at the beginning. Other index types such as Trigram indexes can do that.
Fulltext mostly won't help if you are really looking for substrings, because fulltext indexes generally don't implement that. Searching is a whole different subject, but generally speaking PostgreSQL's fulltext with Trigram indexes and a little bit of manual labour is more than sufficient. No need to jump to Lucene and the like unless you are doing very specific work or at a large scale.
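A trigram index in Postgres looks roughly like this (the pg_trgm extension must be installed; employees/name are assumed names):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX employees_name_trgm ON employees USING gin (name gin_trgm_ops);

-- Unlike a b-tree, this can serve a leading-wildcard search:
EXPLAIN SELECT id FROM employees WHERE name LIKE '%liam%';
```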
Thank you for the video, it was really helpful to watch. I am working with databases, and since you already spoke about ACID, indexing and pooling, another topic I'd be very interested in is views: how and when they are computed, their benefits over regular queries, and also materialized views, which I think are a great Postgres feature.
Which one is better: having different indexes for different columns, or having one index containing multiple columns? And can you give examples of which case calls for which option?
Informative with crystal clear explanation. Thank you.
Amazing video, glad to watch this and understand at first glance. I am surely gonna watch more videos to enhance my technical skill.
When creating the index, does this create a transaction and lock down inserts? I would think because the b-tree is being stored in another place that it doesn't have to do this, but if it doesn't lock the table down, as new records are inserted while the b-tree is being built they won't get put into it until later.
tmanley1985 great question! Lots of databases tackle this differently. I believe Postgres blocks inserts, updates, and deletes during the CREATE INDEX operation but allows reads. In Postgres 12 I believe they introduced a new feature to allow writes during index creation, though that may take longer.
It's fascinating to read about this stuff and see how each database performs in certain situations. That's what makes engineers pick and choose which database to use.
Awesome question
By default PostgreSQL will take a lock during CREATE INDEX that allows reads but blocks writes. But there is a CONCURRENTLY option (add the CONCURRENTLY keyword after CREATE INDEX) that builds the index without blocking writes.
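A minimal sketch of the two variants (table and index names are hypothetical):

```sql
-- Default build: reads are allowed, writes are blocked until done
CREATE INDEX idx_employees_name ON employees (name);

-- Concurrent build: INSERT/UPDATE/DELETE keep working while the
-- index is built, at the cost of a slower build; note it cannot
-- run inside a transaction block
CREATE INDEX CONCURRENTLY idx_employees_name ON employees (name);
```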
Thank you so much, Hussein! This explanation is incredible! I'd also like to ask if you can explain when an index is bad for a database (sparse tables, how much it can cost in database size, etc.). That would be great, and I will be grateful!
So how can we make matching expressions fast? And will a document-oriented database face the same problem and search through all documents in a collection to match, say, a name, or will it be faster? (I mean, do we have an option to optimize queries like this?)
Thank you Hussein for this video, was waiting for this one.
My pleasure
Hi,
As far as I know, PostgreSQL includes the primary key along with the secondary index.
Now, I have a table - tbl_questions that has:
1. id - primary key
2. question_set_id - secondary index
I am using the query:
EXPLAIN ANALYZE SELECT * FROM tbl_questions WHERE question_set_id = 3 AND id > 50 LIMIT 10;
This query is doing an Index scan on question_set_id and then filtering out records where id > 50
Here's the output:
Limit (cost=0.14..7.89 rows=1 width=582) (actual time=0.009..0.009 rows=0 loops=1)
-> Index Scan using tbl_questions_question_set_id_idx on tbl_questions (cost=0.14..7.89 rows=1 width=582) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (question_set_id = 3)
Filter: (id > 50)
Planning Time: 0.073 ms
Execution Time: 0.021 ms
(6 rows)
My question is, if the id is stored along with question_set_id, then why is the condition not like Index Cond: (question_set_id = 3) AND (id > 50)
I have tried switching the position for id and question_set_id in the query but still the same result.
However, when I created a composite index like below, it was working as expected:
1. id - primary key
2. question_set_id, id- secondary index
Here's the query and output:
EXPLAIN ANALYZE SELECT * FROM tbl_questions WHERE question_set_id = 5 AND id > 10 LIMIT 10;
Index Scan using tbl_questions_question_set_id_idx on tbl_questions (cost=0.14..8.01 rows=1 width=582) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: ((question_set_id = 5) AND (id > 10))
Planning Time: 0.074 ms
Execution Time: 0.021 ms
(5 rows)
It will be very helpful if you can clear this out or let me know if I am doing anything wrong.
Thanks
That's because only MySQL's InnoDB engine adds the primary key to every secondary index, and you are not using InnoDB; PostgreSQL stores a pointer to the heap tuple instead.
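For reference, the composite index the commenter ended up with can be written as below; with both columns as key columns, Postgres can fold both predicates into the index condition:

```sql
CREATE INDEX tbl_questions_question_set_id_idx
    ON tbl_questions (question_set_id, id);

-- Both predicates now appear in the plan's Index Cond, e.g.
-- Index Cond: ((question_set_id = 5) AND (id > 10))
EXPLAIN ANALYZE
SELECT * FROM tbl_questions
WHERE question_set_id = 5 AND id > 10
LIMIT 10;
```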
Please create a video on the time complexity of SQL joins, as you have done for index scan, index-only scan, and table scan.
perfect explanation! thanks Hussein...
Thanks for this video. Please also make a video on everything about SQL Query optimisation. I really need it soon, thanks!
Sure thing!
Why the "like" command took less time than the actual "select" command with no "like"? Was that also cached?
@HusseinNasser Just needed some explanation over that !?
Superb simple explanation ❤❤
Thanks for making indexing simple for a layman... It's complicated, but this video made it child's play.
Thank you for clarifying the concept of database indexing.
I appreciate your insights, but I'm still a bit puzzled about a few things.
While I understand that traditional relational databases have robust implementations of indexing, I'm curious to know why one might opt for search engines, like Elasticsearch, over them. Specifically, how does Elasticsearch indexing differ from that of relational databases?
Moreover, are there specific challenges or limitations associated with relational database indexing that Elasticsearch indexing can address more effectively?
I'd greatly appreciate any further insights you can provide on this topic.
Thank you in advance for your time and assistance.
I watched this video 6 months ago and understood nothing; now I'm watching it again while working on a project that needs indexing.
Finally, I understand.
Thanks Hussein.
But wait a little bit, how do you design your YouTube thumbnails?
This is awesome, I'm looking to get better with PostgreSQL. I'm wondering if you can cover RLS policies in Postgres.
Yes, you can read about that in the manual: www.postgresql.org/docs/13/ddl-rowsecurity.html
Really good video, well explained and just the right degree of details that I was looking for!
Thanks. It is a really amazing lesson I have learnt from this discussion.
Thanks for cool video.
I have a question: what is the effect of adding too many indexes to a table?
It depends. Too many indexes on a table can actually slow down writes, because of the need to update all those index structures; it becomes an overhead.
The trick is to index exactly what we need, and to include the columns we need inline so we get index-only scans, which are the best.
Good question
Thanks for your time! Great intro to indexes
this was a nice demonstration of indexing, thanks!
Waiting for the next indexing video.
Does the execution time include the CPU time and the release memory time?
Amazing video, thanx Hussein
Ok so is there a smarter way to do the like query?
good info... thanks...
Very interested in how:
- a database decides in what page to save a record
- variable-size rows are stored (a page can have 1 row... or 10 rows)
- what happens if a record is bigger than the page size
Keep it up!
@Hussein Nasser
How does a multi-column index (c1, c2, c3) work? If I have a query like WHERE c1='xyz' AND c3='123' AND c2='456', will it use the index that was created? Also, I am getting a "row identifier scan" in the EXPLAIN output and I'm not sure what it is. Can you please throw some light on these?
Yes, an AND query over those columns would use the composite index; I talked about it in this video ua-cam.com/video/mBonHTpQO7E/v-deo.html
Great video and teaching skills. Inspired me to buy your udemy course on databases. Can't wait to learn more.
Is indexing a random SHA-256 key a good idea for millions of records?
Is there any way to optimize like query?
That's a bloody good video mate. What's the RAM speed you're running there?
That was a great video! Can you explain what exactly the b-tree looks like when an index is created? I mean, what does a b-tree node contain? (Key, value, pointer to row id, etc.?)
Good idea for a video ☝️
@@hnasr I would really appreciate if you can do one. I have been trying to understand it from here: use-the-index-luke.com/sql/anatomy. Will wait for the video!
How to make queries containing like conditions faster?
Thank you for this informative video❤. You're worth subscribing❤
You are an amazing teacher. Thank you!
This was simply wow 👏🏻
Really great job bro, thanks for all this information.
Great video.
What would be the best way to deal with more expressive queries, like the LIKE example you ran into at the end? Perhaps you have an index key that encodes multiple data points, and you want to run a regex across each index key to quickly get back a set of ids that satisfy your search. Is there a better method for approaching this type of requirement?
Thanks! You can create a GIN index in Postgres with the pg_trgm extension, which allows expressive predicates such as LIKE.
niallburkley.com/blog/index-columns-for-like-in-postgres/
@@hnasr Thanks for the info. Great stuff!
Will putting an index on a View help?
Thanks Hussein! Amazing content!
@Hussein Nasser Can you also discuss multi-column index? Thanks!
can I get the 11 million rows database link?
From where I can find sample database for learning because I don't want to waste my time for adding huge amount of data manually . I know I can use script but still I want some sample databases for testing, exploring and learning.
Very helpful. Thank You.
Subscribed to the channel.
Hey, I know this is not Stack Overflow, but I have a question and I thought you could help :) If I have a table with columns a, b, c, d, e, f and I create a primary key on a, b, c, d, and e, is there a way I can create a hash index on those 5 columns? That is, one hash index that stores a single hash per row, with all 5 columns combined and hashed. I'm always going to query on those 5 columns, so I wouldn't waste memory storing 5 separate hashes, and it would also be faster, right?
Hard to discuss such problems over YouTube comments, but I'll give it a shot.
The question is very broad and depends on your use case. I would first ask whether the database uses a clustered or non-clustered primary index (you said primary key, so I'm assuming clustered). If so, then generally this doesn't look like a good idea: inserts will be extremely slow, assuming those columns are not ordered.
You could generate a hash of those columns and use that as the primary key, but then you have the same problem with randomness. I suggest using a database where you can control clustering (in Postgres, all indexes are secondary).
If you absolutely need to keep using a clustered primary key, then I suggest creating a new serial primary key field and a unique composite index on the 5 columns instead...
Also make sure that your query uses AND and not OR, otherwise the index is not as effective.
Hope that helps
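A sketch of that last suggestion, with a hypothetical table (column names taken from the question):

```sql
-- New serial field as the primary key instead of the 5-column key
CREATE TABLE t (
    rid bigserial PRIMARY KEY,
    a int, b int, c int, d int, e int,
    f text
);

-- A single unique composite index over the five lookup columns
CREATE UNIQUE INDEX t_abcde_uq ON t (a, b, c, d, e);

-- Queries that AND all five columns can use the one index
SELECT f FROM t
WHERE a = 1 AND b = 2 AND c = 3 AND d = 4 AND e = 5;
```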
@@hnasr Yeah it really did help! My question was not the best 😅 Thank you so much! Also keep up the UA-cam stuff, I've been watching all your videos
How do you create the index shown at 12:20?
17:32 aren't those called composite indexes?
Flankymanga correct
Composite indexes contain multiple fields that are all "key" columns, meaning they are searchable in the index.
What I was referring to as multi-column is the ability to add a non-key column to the index; it isn't necessarily searchable, it's just carried along so it can be returned after searching on the key column.
@@hnasr hmm, that's something new to me... looking forward to seeing your video
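The non-key-column idea Hussein describes maps to PostgreSQL's INCLUDE clause (available since Postgres 11; table, column, and index names here are hypothetical):

```sql
-- name is the searchable key column; salary is a non-key payload
-- column carried in the index leaf pages
CREATE INDEX idx_employees_name_incl
    ON employees (name) INCLUDE (salary);

-- Can be answered by an index-only scan, no heap visit needed
SELECT name, salary FROM employees WHERE name = 'Zs';
```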
Why did '%Zs%' execute so much faster than 'Zs'? Because of cache again?
This video is really helpful. I have a few questions:
1. Can an INSERT query become faster with indexing?
2. After creating an index, if I insert some new rows, will the index cover those new rows? Do I need to refresh the table's data or recreate the index?
Inserts on tables with indexes are "slower" than when no indexes exist, the reason being the additional work required to update the indexes.
That being said, it depends on the rows inserted; NULLs, for instance, are mostly not indexed, so there's no cost there.
What is the best way to make my query faster if I have millions of records in a table and need to perform select, delete, and insert operations one by one in a single API call?
Currently I am using Postgres (from Python) without indexing, and the API call takes more than 3 minutes to finish.
How does a b-tree index store data for a string column? A b-tree on a numeric primary key works with less-than/greater-than comparisons, but how does that work for strings?
A string is compared much like a number, so it works similarly; the comparison is just a little different since there are multiple bytes to compare.
I ran the same command "explain analyze select id from country where id = 8" on a table named country, but it showed "Heap Fetches: 1".
I am on Postgres version 9.3.
Any idea why that is?
This means that the data was recently inserted or updated. Postgres had to go to the heap to check whether the row is visible to the current transaction.
If you run VACUUM, you should not see the heap fetch, unless something else is preventing the vacuum from updating the visibility map.
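A sketch of what that looks like in practice (exact plan output will vary):

```sql
-- Right after inserts/updates, an index-only scan may report:
--   Heap Fetches: 1
EXPLAIN ANALYZE SELECT id FROM country WHERE id = 8;

-- VACUUM removes dead tuples and updates the visibility map
VACUUM country;

-- The same query should now report no heap fetches
EXPLAIN ANALYZE SELECT id FROM country WHERE id = 8;
```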
@@hnasr thanks for explaining it 😃
Thanks you so much for such a details explanation
The worst-case scenario is O(n), where it has to scan everything, but I think if the requested record is at the beginning of the table, it should be found faster. Is this how it works?
Correct, assuming a full table scan, and if we are lucky the row is at the beginning. I am not sure, however, that all databases scan top to bottom.
That is why the database sometimes prefers a sequential scan over an index scan: if it knows it could find the rows faster sequentially, or if the index scan is going to be slower because of the scattered nature of the matching rows.
Well done! Congrats 👏
@Hussien Naseer can you do same for mongodb and lookups
Hey Hussein, I hope this comment finds you. I had a question related to full-text search (for MSSQL, but I understand you have a preference for Postgres). I have an issue where, if a search involves multiple CONTAINS calls, any subsequent search on one of the columns takes a really long time to execute. Oddly enough, after around 10 minutes or so, it becomes rapid. This is on a table with around 71k rows, so not massively huge.
I wondered if it was something to do with indexes but it would be great to see a video on Fulltext. I can send you the query exactly how I have it
There are two types of inverted indexes you can use for full-text search: GIN and GiST.
Each one has pros and cons.
GIN is good for lookups but not so good for inserts, deletes, or updates.
GiST is the opposite (good for updates, not lookups).
If you use GIN on a table that changes often, it will be slow, as it takes time to build/rebuild the index.
If you run a query while the index is building, it won't use the index and will do a full table scan; once the index finishes building, it will be used.
So that could be the reason for your problem: using GIN on a table that changes so often that the index keeps rebuilding, and you can't use it until it finishes.
@@mohamedmohamedy3085 interesting, I was not aware of GIN/GiST. I'll look into that a bit more. I gave up on that search implementation and decided to stick with the Azure Cognitive Search we already had on the project 😂
I wonder how a b-tree works for string types.
Is it recommended to index all the columns that are commonly used in WHERE conditions?
That is a good question. The answer is not always, because more indexes mean more structures to update, which means slower writes. The rule of thumb is to figure out which columns limit the search space to a small number of rows, so that the remaining scan becomes easy.
For example, say your WHERE clause has both firstname and lastname. If you put an index on firstname, that still leaves lots of rows to search (e.g. John is a very common name, so there are lots of last names to scan through).
If you put an index on lastname, the search space is smaller: fewer people have the last name Smith than the first name John, so you narrow the search further.
So it might be good to test an index on last name only.
If you can afford both, that is even better, but think of the cost...
My advice is: test, test, test.
LIKE '%za%' is slow;
however, I feel
LIKE 'za%' would benefit from an index.
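That intuition is right in PostgreSQL, with one caveat: a plain b-tree only serves prefix LIKE under the "C" collation; otherwise you need the pattern operator class. A sketch (table and index names hypothetical):

```sql
-- Operator class that supports prefix pattern matching
-- regardless of the database collation
CREATE INDEX idx_employees_name_pat
    ON employees (name text_pattern_ops);

-- Can use the index (anchored prefix):
SELECT id FROM employees WHERE name LIKE 'za%';

-- Still cannot use a b-tree (leading wildcard):
SELECT id FROM employees WHERE name LIKE '%za%';
```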
How does indexing work in non-relational databases?
When I select only the id (primary key), it says Index Only Scan but shows "Heap Fetches: 1",
and when I do a SELECT *, there is no mention of heap fetches.
Can you please explain why that might be happening?
Great question.
When you do an insert, update, or delete, Postgres attaches visibility metadata to the affected rows for MVCC reasons, something like "this row is visible between this transaction id and that transaction id". This visibility information is stored in the table (the heap).
Heap fetches happen when Postgres has to visit the heap despite only requiring an index scan: it needs to check whether the index entry is visible, and the information that says whether a row is visible lives in the heap, not the index.
This happens when you actively edit the table. You can fix it by running VACUUM on the table, which removes the dead tuples and updates the visibility map.
SELECT * always goes to the heap because you're asking for every column, so EXPLAIN doesn't bother mentioning heap fetches; those are just ordinary reads.
Clearly explained; your voice resembles Harsha Bhogle at 1.5x speed.
Awesome explanation
How to performance tune:
- EXPLAIN ANALYZE SELECT ... (performance report)
- table vs index (two different data structures on disk)
- slow vs fast
- index scan vs no index (sequential) scan
- LIKE