Learn the fundamentals of database engineering with me: my Udemy course covers over 19 hours of database engineering first principles. Head to database.husseinnasser.com for a discount coupon; the link redirects to Udemy with the coupon applied.
I feel this is a good explanation of how an offset works. In some cases it doesn't matter, as the user won't go to page 100+ but will use search if it exists. Plus, in cases where the user jumps to some random page, this won't work. But this video got me thinking. Great work!!
Sometimes you can do "spatial pagination" when working with geospatial data. Basically assign each row a geohash, iterate over (shorter) geohashes for a region of interest, use the shorter geohash to lookup a page of rows. Just have to make sure the geohashes assigned to your rows are shorter than the geohashes covering your region of interest.
Basically, this is the exact way that the big platforms like Google or Facebook are using for pagination. They give you a pointer, and you can fetch the next page results by that pointer.
FB uses Cassandra, and that is exactly how Cassandra pagination works. The limit per request in Cassandra is 5000, so it's not possible to retrieve all data at once. For the next page you pass the pointer and then get another 5000 results (or whatever smaller number you have set), like you mentioned.
Nice
Also, Firebase's Firestore implements pagination in the exact same way!
Awesome content, how do we optimize the paging when the IDs are UUIDs?
Good question to raise
I was thinking of this, maybe using creation time column for filtering
But then we have to index creation time column, not to mention whether it exists or not in the first place
That would be difficult since UUIDs are randomly generated. I would rely on another field (int or date) to perform this.
UUIDs are great but this is one limitation that I personally ran into too
Auto incrementing IDs are the way to go then haha
This is a good solution when you can use it, but in many cases the where condition or sorting does not allow you to do that. Just apply filtering or ordering by any field other than id and this approach stops working.
Why is that?
@@samadams7278 - Because your ids can be all over the place and in no particular order when filtering by anything else
If ID is a primary key then a B-Tree Index is created on it which sorts it by definition. Therefore paginating using a where clause is optimal.
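To make that concrete, a minimal sketch of both queries against the news table from the video (:last_id is a hypothetical placeholder for the smallest id on the previous page):

-- offset produces and throws away 100,000 rows before returning 10:
select * from news order by id desc limit 10 offset 100000;
-- keyset pagination seeks straight into the primary-key B-tree instead:
select * from news where id < :last_id order by id desc limit 10;

The second query touches only the 10 rows it returns, which is why it stays fast no matter how deep the page is.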
I just feel lucky to stumble upon your channel. : )
Omg! Thank you for this. I've been using this forever; I never knew it was common or that it had effects this big, but it felt comfortable with my infinite scrolling lists, so later on I did the same with traditional pagination. Glad it did better.
Whoa.. I totally didn't know that.
Btw, Happy New Year brother.
Happy New year!!
What if the user clicks directly on page 7? We don't have the id of page 6 or 5; any solution for that?
Correct, you can no longer jump to a specific page without the hint of an additional filter (id or date), but personally I would rather disable that feature and gain performance by letting the user scroll pages incrementally. I don't recall ever clicking on anything on the second or third page.
If that feature is a must-have, I might add another table that maps page to id, maybe?
I don't know, I would like to see what others think; this is interesting 🧐
# page 7 (10 entries per page)
select * from customers where id > (7 - 1) * 10 and id <= 7 * 10
@@lighty8561 I see what you are saying, but your code assumes we have results in each and every row of the previous ids, which might not be the case.
Take the following case: I have results in ids 1 to 10, then nothing until id 100.
Now the equation you provided doesn't hold true. Thank you for the answer buddy.
@@BernardoBits if you go with the offset then you are doing the bulk of the work; assume we want to go for rows 1,000,000 to 1,000,010.
Thank you buddy, but I think there are too many edge cases for this one.
But what if Google throws the user directly to your site on page 10, site_com/news/10? Or what if the "user" is Googlebot, which crawls page 7 directly? There are lots of real-world scenarios where you have to offer pages so search engines can crawl your content, etc.
But say someone has to jump to page 100 directly without having seen any posts before. What id number would you choose then, given that sometimes ids in the middle would be missing if someone deleted their posts? If a lot of people started deleting their posts, wouldn't this cause an issue?
Interesting,
I think the explained solution is best for comments/infinite scrolling UX where the user can't decide which page to look into
The explained approach is not suitable for directly jumping to a particular page. It works well for infinite scroll or for just providing a 'next' button. Also, in practice we don't use id alone for ordering; we use multiple columns, and in such cases the where clause becomes complex as the number of order columns increases.
For an auto-incremented id you can do something like this: multiply the page number by the limit value, add the first id in the database, and subtract 1.
?page=1777&per_page=10
Let's assume the first id in the db is 10000.
Your new id for the where clause is:
10000 + (1777 x 10) - 1
The first value in the db can be replaced by the starting point of auto_increment.
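A sketch of that arithmetic, assuming ids are strictly contiguous with no gaps (table name hypothetical):

-- per_page = 10, first id = 10000, so page N starts at id 10000 + (N - 1) * 10
-- page 1777 therefore covers ids 27760 to 27769:
select * from news where id >= 10000 + (1777 - 1) * 10 order by id limit 10;

As the next reply points out, a single deleted row breaks the contiguity assumption and shifts every page after it.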
@@MyBerthony This does not work if you delete more than 10 items from the db, for instance. I imagine you would have to have a column that you edit every time something is deleted, or there could be another function that returns the ids to the client app, so the client can see which ids are on which page.
GREAT, thanks for the warning. Performance AND integrity.
The workaround depends on the context: size, latency, concurrency, the index you can work with...
In the old days I used to solve this with temporary non-normalized tables built on the fly;
performance and integrity were far better, but I didn't face the crazy sizes of modern databases.
I guess by that definition, I should make a field in the table that's indexed for pagination use (like created_at, etc)
Maybe, I think id field is a better choice, if you use it with Hashids
there's always hackier way to do things :)
Please correct me if I am wrong: the order by column in this case will need an index to be performant. So this will not be a good idea when the user has the option to sort by many fields, like in a back-office application.
Not very useful for grids that can be sorted by any column desc/asc
same here, do we have any solution for this?
Incorporate the search and sort into the paging query. Your users will rarely have to go beyond a few pages.
Learning so much from you, thank you Hussein!
for anyone looking this technique up on Google, it's called keyset pagination
Yes, however this only works if the column you offset from has values that are both unique and serially sorted by the sorting parameters. If that condition is not true then you still have to use offset, or create a column that has that property, which makes inserting a new row a job that requires first fetching the two values the new row has to sit between and calculating the average of those sorting values.
Yes, and this is also problematic when the client requires rows 100 to 200 but rows 150 - 170 have been deleted: we have only 80 rows while the client expects 100, so we make additional SQL queries to get the required rows. When we do that, we also want to inform the client: in the next request you need to ask from this id, for example 230. Kind of expensive, but probably much better than offset.
My eyes are opened, thanks man, this means a lot.
Glad I could help (L) thanks
What would be the best approach if we have a paginated grid with different columns and you can sort by any of them? Love your videos btw.
you cannot
Great content as always.
I guess you could also spell out the use cases, like infinite scroll, or where the user only goes page by page.
Because if the user is allowed to jump pages, how would the system know which id it should move forward from...
the client needs to know the last post it fetched, and you then paginate using that id...
when scroll hits 1400
var id = items[items.length - 1].id
callPaginate(id)
@@optimumcool Thank you for clarification.
wow I didn't know offset was this inefficient :O
But the alternative doesn't work for queries with special filtering and ordering ;(
Yes it does. Any filters should be fine. To change the ordering you have to change the "next page hint" to the new ordering fields rather than id.
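For example, a sketch of carrying the ordering fields in the hint (the movies table, its columns, and the :placeholders are hypothetical), using the row-value comparison supported by e.g. Postgres and MySQL:

select * from movies
where genre = 'sci-fi'                        -- any filter is fine
and (score, id) < (:last_score, :last_id)     -- the hint is now the ordering fields of the last row seen
order by score desc, id desc                  -- id breaks ties between equal scores
limit 10;

A composite index on (genre, score desc, id desc) keeps this a seek rather than a scan.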
Awesome content, loved the way you explained the things 😊
❤️❤️❤️
Why does "offset" work this way though? Seems like a pretty simple optimization that if you can determine the rows that "offset" and "limit" specify only from an index, then you don't fetch the rows you wouldn't return.
3 years later but the short answer is that the data on disk for the entire table is not necessarily stored contiguously, therefore you cannot simply jump to an offset in the database file. For very specialized database systems this may be possible but for the majority it is not
I'm asking a question that might show my inexperience with databases.
How do we handle skipping rows with the index approach when we are also filtering records on different fields? Consider an API endpoint for search that returns paginated results.
I'm considering that we use auto-increment ids as PK/index
@@ahmadsaeed2420 What happens when some records/rows are deleted? I think this technique may not be suitable if we want to jump directly to some page number. It works well for infinite scroll or for just providing a 'next' button. Also, in practice we don't use id alone for ordering; we use multiple columns, and in such cases the where clause becomes complex as the number of order columns increases.
Nice video man, clear and simple.
What I don't understand is why db makers don't optimize offset so that, when used without a where clause, it jumps past the offset part and fetches only the limit part from disk.
At the application level we need to write an inner query to fetch the last id and then use it in the outer query.
This won't work if you order your records with a field other than the auto-increment key. Users ordered by first name for example
We need to use the same columns which are used to order our results. If we have ANY kind of ordering this technique can be used.
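A sketch under that rule for the first-name example above (users table and :placeholders hypothetical), with id as a tie-breaker for duplicate names:

select * from users
where (first_name, id) > (:last_first_name, :last_id)  -- last row of the previous page
order by first_name, id
limit 10;

A composite index on (first_name, id) lets the database seek directly to the boundary.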
You should not use first-name/any other useful column as a primary key; it violates the Single Responsibility Principle. You can create an index (unique or not) on that column, but it is recommended to keep the primary key as auto-increment.
@@rahulsangvikar7973 I didn't mention anything regarding using first name as a primary key. My comment points out potential issues caused by ordering using a non increment field.
@@Peter-bg1ku My bad, I misunderstood, but in that case I agree with @Dhanaraj D
Damn! Totally did not know this. Thank you brother.
Great video! Definitely going to be using this.
Excellent video. Also please make a video about Cassandra. Thanks 👍
I do wonder how you would implement a pager on a website where a user can immediately jump to say page 12. Using this technique you'd need to know where page 11 ended, so in the end you still need to read all records to determine where each page ends?
The explained approach is not suitable for directly jumping to a particular page. It works well for infinite scroll or for just providing a 'next' button. Also, in practice we don't use id alone for ordering; we use multiple columns, and in such cases the where clause becomes complex as the number of order columns increases.
If the ids are consecutive then you can use a bit of mathematics: *last_id_of_current_page + (new_page_number * limit)*
@@mushifalinawaz it would only work if there are no deletions and ordering is only based on id.
@@RajDhana586 Yes. That's why I mentioned: *"If the ids are consecutive"*
You could also do a select for only the id column with limit pagesize * pagenumbers, ignore rows until you hit your pagesize, add that id to the page index list, and then reset the counter and repeat/continue.
That would work for generating paging indexes for further pages at least, using the same method as explained in the video. Though you'd end up throwing away some of the id values, it should still be decently fast unless you use too big a number of pages or too big a pagesize.
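A sketch of that idea (news table hypothetical; window-function syntax per Postgres or MySQL 8+): one index-only scan over id, keeping every pagesize-th value as a page boundary.

-- last id of each of the first 100 pages, pagesize 10:
select id from (
select id, row_number() over (order by id) as rn
from news order by id limit 1000  -- 10 per page * 100 pages
) t where rn % 10 = 0;
-- each returned id is the "where id > ?" boundary for the following page

Since only the id column is read, this can be satisfied from the primary-key index alone.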
Thanks for sharing. I was wondering: if offset is not good for pagination, then what was it created for? Is there any use case for offset?
With this method you can't just open random page. This works if you go from start to finish or from finish back to the start.
Great video! But how do we handle the situation where the user jumps to a random page
Wow this was very helpful. Thanks!
Great video. Which tool do you use for slide creation?
Amazing!!! thank you again Hussein
Interesting I can use that where id trick for app data sync as well.
This solution looks great while this can have problems when a we can jump into any pages for example if i want to visit page 8 after 1st page
That was very helpful!
your content is awesome, thanks a lot
This only works when the Id is the primary key and you are not sorting by anything else, which is not likely. It is very rare that you can use this in real life.
Thank you for the important tip!
When you don't have too many rows (billions), I don't see offset as an issue, even if the query takes half a second at a certain point. As others pointed out, if the user can navigate directly to page X, this won't work.
Anyway, great explanation
Great video.... Can we use this for reverse (or back) pagination as well? I'm guessing we will need extra coding using an ORDER BY ... DESC? Or is there a better way?
Correct the change here is you determine where to start with an indexed column instead of using an offset
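A sketch of going back one page, assuming pages are shown newest-first by id (news table and :placeholder hypothetical): flip both the comparison and the sort, then restore the display order.

-- previous page: the 10 rows just above the first id currently on screen
select * from (
select * from news where id > :first_id_on_page order by id asc limit 10
) t order by id desc;

The inner query grabs the nearest 10 ids above the boundary; the outer one puts them back in descending display order.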
Hey, good content, thanks for such helpful videos!
Thank you, that's a great tip!
Awesome stuff
How does this work when you paginate through items that are already filtered by the user. Let's say you have a list of movies that are sorted by score and filtered by certain genres and release dates. Each combination of where clauses and order statements is going to give you a different set of items across pages. In this example you use "< id" but there is no pattern or reference when the items are already subset in a given order. Is there a way in this situation?
Hey Nasser, good one. But paging is usually used with sorting, and then our ids may not reflect the sort order we really want. Like a sort by number of likes plus pagination.
Hello Hussein, this is a nice video as always. What happens if my primary id is not an autoincrement but a uuid? Is it the same as explained in the video? Thanks so much.
As long as you do an order by on the UUID field the paging should work. However, the problem is that unlike an integer primary key, a UUID is random, so it won't give you the "latest" entries (news in my example) when you order by UUID.
@@hnasr Wow! I wanted to know the answer to this query and was thinking that this is an old video and no one would actually respond. But Hussein, you are awesome
Awesome 😎 But how can we use this technique with a where clause for pagination APIs?
In the case of APIs, we can send an opaque string which encodes the last row's column values from the last page (the columns used in ordering). In the API result we can name this opaque string nextPage. When clients make the next API call they pass it as a parameter; we then decode it, get the column values, and form the where clause needed.
Nice one Hussain!
But what if my query is doing a join with another table? How can I use the id to mimic the pagination?
What if the id is not an integer or a number?
Why would that matter? As long as you can order by it, it's fine
However, people don't page thru 1000 pages. Usually they do a search to limit the results, then use paging. My users rarely go beyond page 2 or 3.
That is sequential pagination; it won't work for all cases. Using offset is still the common approach, and the only reliable one. Could you utilize keyset pagination to fetch the 5th page's content based on the first page? We can mix both cases: if the front end sends the last seen id we use keyset, else we use offset. That's what I think.
I think I can't do the index-based pagination, because it needs the full result set before paginating even begins; the actual pagination happens only after all the filtration and aggregation has occurred.
Good stuff! But I think this approach doesn't protect against the duplicate-row reads mentioned at the beginning of the video either. Correct me if I'm wrong.
It will protect: since you are fetching rows after a given id, you are guaranteed to get the next page without duplicate rows.
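Concretely, a sketch with newest-first ordering (table name and the specific id are hypothetical):

-- page 1:
select * from news order by id desc limit 10;
-- say the smallest id returned is 1009983; page 2:
select * from news where id < 1009983 order by id desc limit 10;
-- rows inserted after page 1 get ids larger than 1009983, so they cannot reappear on page 2

With offset, those newly inserted rows would shift everything down and push page-1 rows back into page 2.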
This solution won't work for every database table, or even many. What if the id in my table doesn't start at a fixed value? What if the sequence starts at different points, so something like 91100 is actually only the first id?
Can you explain more why SQL server would have crashed with the 100000 offset?
What should I do if I have only 20 records, the limit is 10, ordered ascending, and in the second API call I receive id 20? How do I determine whether I need to send another request, especially if the current query already returns no records?
What if I want to paginate not the whole table, but just a subset of my "news"?
In such a case your approach lets you go to the next page, but not to a random page in your list of "news".
What if I use a uuid for the primary key? The records are not sequential and I cannot do math on this field.
Hello Hussein,
We are building a service that gets data from various services, where each has its own structure, so they are related only topic-wise. Our structure reflects our needs and finds common ground with the others to some degree. Each provider is handled separately via API, using a queue system. Some relations are in place for the product types.
Generally speaking, how would you approach this task? I am mostly concerned about database performance. Would you use a separate database for the originally imported data? Are relations a good idea, and how would you solve them? I imagine using UUIDs would speed things up here as well. This is an open question; looking for advice. Thanks!
Sounds like something the database driver should optimise, not the end user. Also, it seems to only be a problem once you go into the 100 k territory. I doubt even PromHub has that many videos (at least for now).
I understand the main idea, but I can't see how it can work on complicated tables. For example, at my previous job we had to move "deleted" records to another table to make the main table smaller. And how do we work with records of different users, or "hidden" records? I do not remember the details exactly, but as far as I know, when we roll back a transaction we have a chance to skip an auto-increment value.
Here is an example:
id | user | status | data
1 | 1 | null | "data"
2 | 2 | null | "data"
3 | 1 | "hidden" | "data"
17 | 1 | null | "data"
...
10000 | 1 | null | "data"
and when the user looks at the second page, he sees exactly 9 records from the first page.
select * from history where id > 10 and user = 1 and status is null order by id limit 10;
As far as I can see, the only way to make it work is a virtual table with another column of ids over the preselected rows, but that is probably more costly than offset. Is there another way?
But what if the user "skips" pages and doesn't know the id?
Users should not randomly go to any page. The BE should expose appropriate filters so that users can find what they are looking for within at most 2-3 paginated queries.
Very common behavior on forums. I go to a topic and then go to the last page
Only useful in the case of auto-increment keys. Doesn't seem useful in the case of a composite primary key.
we can use datetime as an id datatype for this to work
doesn't it expose your primary id to the world? what if I use any uuid or something? what'd be the case?
My guess is some companies encrypt the numeric ID so as not to expose the information directly. Just have paging links with something like '76c43r67' in the URL.
Thank you for always creating awesome and simple-to-understand content. If you can, would you create a tutorial about algorithms and data structures?
Id may not be the best because... there can be custom sort applied... Can be on any column other than id....
Saved for tonight
Hi Hus! How do you do pagination with Cloud Firestore (for a native Android app)? There is an offset value that the Cloud Firestore query provides, and also an after() query modifier. So should offset never be used to fetch the next page? And even with after(), I think a newly added document on top will not be added on its own to the first page.
more content and insights on databases
What if your id is not auto-increment, but randomly generated numbers, or numbers and letters? How do you go about it?
Interesting, but would creating an auto-incrementing paging column with a B-tree index be a bit more flexible though? Numeric identifiers are convenient but usually not preferred over uuid4 (guid). Also, why is it so horrifically slow? Rows are generally fixed size, so it's weird that it needs a full linear scan... why doesn't it seek (offset * row_size) into the record set? Does the way PG persists records interleave them? I'd expect a fragmented store to be slow, but partial fragmentation should still be on par with a B-tree index lookup when using a simple (offset * row_size). What am I missing here?
ok but what if you want to order by title or any other field
Problematic if you want to jump to page N.
What if we are using a database where the IDs are randomly generated, like Elasticsearch? Elasticsearch does not support auto-incrementing IDs either.
If we use any columns for ordering the result, then we can use those columns in the where clause. If we don't have any kind of order when presenting the result, that is not good UX. So as long as we have some order, the above technique can be used.
Every page should have an exact id pointer by which I can paginate.
But then you can't jump to a page :/
What if I need to filter some rows out? Let’s say I have some blog posts that I marked as deleted and I want to show the last 10 non-deleted ones?
it works the same. you add: where id > x and deleted_at is null order by id limit 10
I don't know why the user would page up to the 100k or 1 million offset
Skipping to page x would be a use case, using offset just kills the performance in this case
@@hnasr I agree, but the web interface should provide filters so the user can find what he wants, and pagination should be just forward or rewind. I get your point in the video, but in most cases the real problem would be a bad user interface. So maybe we should first analyze why the user is paging up to 100k to find something. I really get your point in the video; my comment is just a use case involving front end and back end.
What if someone directly entered a URL that points to page 100? domain.com/products/?page=100
In this case, we won't have the pointer.
Correct, that would not give us enough information to point to page 100; the server should capture that and fail the call. Additional filters are required, such as date or id.
Sadly it doesn't work with an unsorted primary key.
You're just telling us not to use offset queries anymore, but that won't help us, because it leaves some new cases open.
If we provide an API to the public, we can't expose the internal reference id
Well, you should be able to though; if something is so secret about an incrementing number then maybe something is wrong with the API or the internals themselves?
Do you perhaps have a requirement to hide from the public which entries came before or after each other?
@@motalasuger Not really, I was just mapping this onto the original API that was already exposed, with the page number and count per page.
The sorting problem is a persistence-layer issue; I was wondering why we should expose it to the API consumer.
POINTER!!! Thanks a lot, Dr Comments.
id is random
how to operate with a varchar ID?
Hi, I am new to databases, but shouldn't the WHERE clause be id > 1009993 instead of id < 1009993?
Depends on your order by. He's using desc.
@@ggwp-gz1so I was quite stupid when I wrote the comment, I just realized it now lmao, thanks anyways
@@parkuuu no worries, happened to me xd
I don't think that will work if you are using uuids instead of normal integers
Supporting only the next page could be another solution.
Great Great
Will this be applicable if you use a UUID primary key?
I guess this is a use-case-dependent solution, but a nice hack; it fails if a user wants to go directly to a page. How about making a video on that too, for when the user points directly to a page?
isn't this the same as cursor pagination?
What are you gonna do if the ID is of UUID type? :)
Answer by Hussein from another similar comment:
As long as you do an order by on the UUID field the paging should work. However, the problem is that unlike an integer primary key, a UUID is random, so it won't give you the "latest" entries (news in my example) when you order by UUID.
I think it's not a good idea: if we sort based on the title column, the ids will shuffle, so it is difficult to get the next page's data by passing the page's last id (you may get unexpected results).
i use keyset pagination