Extremely FAST Paging With Cursor Pagination And Database Index Seek

Поділитися
Вставка
  • Опубліковано 25 лип 2024
  • ☄️ Master the Modular Monolith Architecture: bit.ly/3SXlzSt
    📌 Accelerate your Clean Architecture skills: bit.ly/3PupkOJ
    🚀 Support me on Patreon to access the source code: / milanjovanovic
    When you think about pagination, you're probably familiar with offset-based pagination. This type of paging may or may not rely on a database index. You specify a page and page size, and in the database, you execute a skip-take type of query to fetch the data. However, a more performant way to do pagination relies on database index seeks. It's called cursor pagination, which boils down to providing a cursor value. The cursor acts as a pointer to the next page to be returned. If the cursor is also indexed, you can perform an index seek to directly query the required page of data.
    Join my weekly .NET newsletter:
    www.milanjovanovic.tech
    Read my Blog here:
    www.milanjovanovic.tech/blog
    Subscribe for more:
    / @milanjovanovictech
    Chapters
    0:00 Implementing Offset pagination
    3:39 Offset pagination performance
    4:56 How Offset pagination works under the hood
    5:48 Implementing Cursor pagination
    8:35 Cursor pagination performance
    11:22 Completing Cursor pagination implementation
    14:25 Where is Cursor pagination useful
  • Наука та технологія

КОМЕНТАРІ • 141

  • @MilanJovanovicTech
    @MilanJovanovicTech  Рік тому +3

    Want to master Clean Architecture? Go here: bit.ly/3PupkOJ
    Want to unlock Modular Monoliths? Go here: bit.ly/3SXlzSt

  • @000bobrock
    @000bobrock Рік тому +12

    Real world example would be where you have a grid with 50 columns where 30 of them can be filtered and all need to be sortable... then the SQL is the King. But nice example, I apreciate your work, keep going.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      In that case you need vanilla offset pagination 😁

    • @Wouldntyouliketoknow2
      @Wouldntyouliketoknow2 Рік тому +1

      Exactly this. Most real world grids allow user to dynamically sort on any of the columns, and page. Probably should mention in the video that if you need dynamic sorting you can rule out cursor pagination..

  • @shuvo9131
    @shuvo9131 Рік тому +1

    Thanks for sharing the concept Milan.

  • @rstobing
    @rstobing 10 місяців тому +1

    This is super awesome! Thanks for sharing Milan!

  • @yeevirgen
    @yeevirgen Рік тому +1

    Awesome video, not very long and quite educating. I subbed and liked so please make more

  • @kodindoyannick5328
    @kodindoyannick5328 5 місяців тому +1

    Thank for this wonderful concept.

    • @MilanJovanovicTech
      @MilanJovanovicTech  5 місяців тому

      I've an interesting video on this coming next week, with benchmarks and execution plans

  • @DavidSoles
    @DavidSoles 10 місяців тому +1

    Very well explained. Thanks.

  • @broadshare
    @broadshare 2 місяці тому +1

    Good work, I like the blog also. Good content.

  • @MortvmMM
    @MortvmMM Рік тому +19

    Great video but 2 important concerns:
    1. If you can't order by the given cursor column, for example if your Id column is Guid this might not work as intended
    2. The OrderBy call should come before the Where call (in theory you first need to sort by that cursor, then filter)

    • @pedrofiuza8604
      @pedrofiuza8604 Рік тому +3

      For the number 1, if you have a CreatedAt column I think it's a good way to proceed with the OrderBy.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +8

      1. Agreed, you'd need something sortable like a CreatedAt column, like Pedro suggested
      2. It doesn't make a difference since it's converted into SQL, and SQL is deterministic when it comes to execution order.

    • @MortvmMM
      @MortvmMM Рік тому +5

      @@MilanJovanovicTech I like the fact that it is deterministic but that is SQL. If we make abstraction of that, it's good to know in general the order of operation matters when paging by cursor

    • @islandparadise
      @islandparadise Рік тому +2

      @@MortvmMM ​ @Milan Jovanović I think PKs should always be created with sequential logic. Using `Guid.NewGuid()` like you said has no sequential logic, and hence would make index fragmentation on your PK column go up like crazy, impacting performance and hence DB maintainability. 2 possibilities are either:
      1) ONLY create upon insert (using SQL's NEWSEQUENTIALID() method), or
      2) Use EFCore library's SequentialGuid ValueGeneration method.
      Don't know if mixing both generation methods in the same column would achieve the intended purpose though, so I'd suggest sticking to one for the same table.
      Hope this helps.

    • @BorisSiljanovski
      @BorisSiljanovski Рік тому

      @@islandparadise if using SQLServer you should use ValueGeneratedOnAdd (if using EF) and let the Guid be generated by the SQL engine, that way it will be sequential. This doesn't work on Postgres, because Postgres has a different logic for clustered indexes, or something along those lines... (i still don't fully understand it TBH) 😅

  • @vicky2118
    @vicky2118 Рік тому +1

    Awesome video
    Thanks

  • @rakkarajput
    @rakkarajput Рік тому +1

    Excellent 👌😊👌

  • @thallesteodoro5278
    @thallesteodoro5278 Рік тому +1

    Awesome video. There is a problem with cursor pagination when you need to apply a filter, so keep that in mind.

  • @andreyt1818
    @andreyt1818 Рік тому +4

    Hi, thanks for video.
    It seems 'cursor' is a little bit confusing term in this video.
    I mean the 'Cursor' in SQL DB (forward-only pointer to row in query result) is not that is in this video - just Id.
    In this video the main point is using advantages of clustering index for fetching range of rows ordered by such index, for example pages.
    This approach has best performance because, you know, rows are physically ordered by clustering index within db table.
    EF is using clustering index for Id field by default.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +2

      Unfortunate, but I didn't come up with the name

    • @pilotboba
      @pilotboba Рік тому +1

      it's also know as keyset. So you can use that or you can use "last" or "lastseen" as your parameter.

    • @gurth8371
      @gurth8371 Рік тому

      Definitely a bad term in video name and code. I tought i will see something similar to what I saw in 30 years old legacy code, but using EF - paggination/scrolling build on top of sql cursors with fetch count based on amount of list items being draw/visible.

  • @MrAymenmatador
    @MrAymenmatador Рік тому +1

    Great video thanks.
    Does the processing of large data by batch works better with a cursor pagination than with a skip take ?

  • @pandagamedev1177
    @pandagamedev1177 5 місяців тому

    Thank you

  • @Wouldntyouliketoknow2
    @Wouldntyouliketoknow2 Рік тому +1

    Well done for mentioning a sensible use case for cursor pagination. Video could have been improved by discussing the implications of dynamic sort order for cursor pagination viability!

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      I think I'll touch on one more video to showcase that, plus also include some benchmarks so people can understand the performance difference.

  • @justgame5508
    @justgame5508 11 місяців тому +1

    You can just change Cursor >= request.Cursor to Cursor > request.Cursor rather than returning take + 1. This is fine for databases that use some form of auto increment as the first value is 1, so you don’t have an issue with off by 1 errors

  • @reggyA898
    @reggyA898 Рік тому +1

    Hi, I have question, it's a bit not about the topic in this video. I'm using cqrs, when I handle my command with creating Observation, I need to send notification to fron-end that is based on some logic through SignalR. I rise domain event with MediatR Notification and Handling all logic there. But now I also need to update my Patient in the same transaction(when i Create observation), as I know CQRS can update only one aggregate per command? So what should I do in this case.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Update both of them in a single transaction, be practical.
      Or update Patient asynchronouslt.

    • @satelibra
      @satelibra Рік тому +1

      @@MilanJovanovicTech okay, thanks, if u start googling there is a big holy war between ppl can we update 2 aggregates in one transaction or not :)

  • @pilotboba
    @pilotboba Рік тому +2

    Good video.
    I've always known this as keyset pagination.
    BTW: There is a keyset pagination library for EF core. It does a lot of work for you.
    I suggest don't just use this for infinite scrolling, use it even for paged grids. It does mean you lose the ability to navigate to a specific page, but with search/filter features and or an index that's not really that important. Not only is perf better for a single query, but overall scalability is better because your database server is doing less work.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Yes, it's also known as Keyset pagination! I do believe google uses this for Gmail? Since I can only go forward/backward through my email list.

  • @Real-Hindu-Us88
    @Real-Hindu-Us88 Рік тому +1

    Great

  • @vitaliimaheria6317
    @vitaliimaheria6317 Рік тому +1

    Hi, thanks for example. Do you know some efficient way for offset (not cursor) pagination with filtering, that could not be transmitted into where statement (for example, some sort of auth rule that could be applied only for materialized result after query execution). Would be great to watch your new video with such implementation

  • @patrykklimas4398
    @patrykklimas4398 Рік тому +1

    Very nice video. Could you also share with us how do you add to this different kind of filtering data?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      That's where it becomes a little tricky. You would need to include indexes for the filter columns for it to be fast, and then everything else should end up being the same. I would Google 'Keyset pagination' or 'Cursor pagination' and explore the topic some more.

    • @wisnu7734
      @wisnu7734 Рік тому

      ​@@MilanJovanovicTechhi milan...how about if i changed order by into Name column....????....

  • @lukaskuchta1010
    @lukaskuchta1010 11 місяців тому

    What elastic search topics are they under the hood ?

  • @elgunlee
    @elgunlee 8 місяців тому +1

    You can still order by different columns while doing keyset (cursor) pagination. Even there are syntactic sugars for that in some databases, it is written like (a, b) > (?, ?). For example in PostgreSQL:
    select * from posts where (name, id) > (“john”, 10) order by name limit 25
    This syntax expands to name > “john” or (name = “john” and id > 10)
    If your db hasn’t this syntax, you can always use second syntax
    Also in EFCore Npgsql has translation for this syntax to postgresql, it is EF.Functions.GreaterThan(ValueTuple.Create(a, b), ValueTuple.Create(x, y))

  • @softcodeacademy
    @softcodeacademy Рік тому +1

    What if the user is in first page and clicks on 10th page instead of second page? How would that work in Cursor approach?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      There is no 10th page with cursors, really. Only the next page, and in some cases previous. It's a slightly different concept.

  • @rr.vasconcelos
    @rr.vasconcelos Рік тому +1

    that sounds great to me, but what about when my primary key is of type guid?

  • @Nisa-Julie
    @Nisa-Julie Рік тому +3

    Hi Milan. What about if I have my primary key is GUID. What should I do with millions of data to be fast. I face this issue with skip input of millions query is time out

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      Guid isn't sortable, so you can't use it with cursor pagination. You need a unique column that is also sortable.

    • @Nisa-Julie
      @Nisa-Julie Рік тому

      @@MilanJovanovicTech let say ur example above ID is GUID not long what is the best way to speed up the query. Thanks u for reply Milion.
      I'm ur fan for next videos

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      @@Nisa-Julie There isn't one, really. Unless you use a sortable GUID. 🤷‍♂
      The biggest cost with GUID will be sorting, not paging.

    • @OlegKosmakov
      @OlegKosmakov Рік тому +3

      Cursor should not be whatever your primary key is, cursor must be whatever field (or fields) you sort by. For example, you do not order by guid, therefore there is no need to use it in cursor, instead you probably sort by datetime, so that's your first cursor candidate. If you use more than one field in sorting, or even have the customizable endpoint where you can control the sorting order per request - then you need to figure out how to transform fields for this particular sort order back to cursor, and also translate the cursor into individual field values to search by.

    • @elerius2
      @elerius2 Рік тому +1

      This should work with Guid too. Guids are indeed sortable. However, you typically don't want to use them as primary keys. With random guids, every insert will be on a random page of your table, and can lead to terrible performance. You might look into using newsequentialid() in TSql, or there are various implementations of sequential guid algorithms if you need to allocate outside the DB.

  • @Dalet_
    @Dalet_ Рік тому +1

    I often use this pattern except that I do not include the cursor in the result. The next cursor is simply the id of the last item, you don't have to query one more item this way.

  • @ilovepandaypoe6056
    @ilovepandaypoe6056 Рік тому +2

    I think the approach is very limited since most of the search will require parameters like patter to search based on example tag, product name and not rely on Id? How would cursor solve this?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +2

      Then simply use offset pagination if you need flexible paging like that

    • @ruirodrigues1971
      @ruirodrigues1971 5 місяців тому

      You can use use non-cluster index wtih a particular column (or set of columns) that are used for your filtering pattern. In the request to database you use both index and have the same speed effect. The only difference in the code that was showed is that you a more complex Where( in the video && extra columns), but the idea remains.

  • @stefanotorelli3688
    @stefanotorelli3688 Рік тому +1

    Where is the DB index see, can I see the query plan?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      I didn't post it, but you can read some more here: use-the-index-luke.com/no-offset

  • @DuQuels93
    @DuQuels93 Рік тому +1

    At the end with the complete cursor implementation, the performance is a BIT better than the approach with Take and Skip.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Disregard my tests in Debug mode, it's MUCH faster with cursor pagination

  • @vietphamquang6202
    @vietphamquang6202 6 місяців тому +1

    This is amazing. Excuse my english, I surely remember that in a video of you, i heard about how to optimize the send email operation. The thing is, when a new user registered successfully, I will send email before return the result of request. But you introduce a way that sending email in background, and now i lost thay video. Please i need your help to reference that video or you can give me some keyword search. Appreciate a lot, thank you so much

  • @handler572
    @handler572 2 місяці тому

    /api/user?sort=age
    In these case what to do coz tabel get unordered there will no cursor

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 місяці тому +1

      I don't recommend cursor pagination if you need random sorting capability

  • @marcosphs
    @marcosphs Рік тому +1

    Is the example code shared on github?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      I share the code on my Patreon: www.patreon.com/milanjovanovic

  • @JosephRuhundwa
    @JosephRuhundwa Рік тому +1

    At 13:26 the milliseconds spiked to a thousand. Is it because of the extra logic added to return the cursor position?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      No, my PC just slowed down 😅
      Cursor pagination should have constant time in production

    • @JosephRuhundwa
      @JosephRuhundwa Рік тому

      @@MilanJovanovicTech Ooh i see. And probably the API was warming up after restarting. Thank you for your great video. God bless you.

  • @buildingphase9712
    @buildingphase9712 Рік тому +1

    However this only works if you are ordering by ids, if you are ordering by something like datetime you would need to have a different implementation. And may even be hustle

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Yes, that's very important. Sortability is a requirement for this to work.

  • @souravsingha-lq9mm
    @souravsingha-lq9mm Рік тому +1

    Please make a video use case of cancelation token ,how ,why use this cancelation token

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Alright! I've been planning to do it for a while, but I kept postponing it 😅

  • @Silky987
    @Silky987 Рік тому +2

    Looks like performance went back to Skip/Take performance after all that work. Might not be as beneficial since Skip/Take is fairly common and understood.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      Not really, my PC was throwing fits. Skip/Take slows down linearly the further away from the start. Cursor/keyset pagination has constant performance wherever you are. There are many articles about it, with benchmarks.

  • @chswin
    @chswin 9 місяців тому +1

    I like the part where you copy it into a list twice 😂

  • @zikkrype
    @zikkrype Рік тому +1

    If you build some feed like Twitter/Instagram/TikTok have then consider using cursor pagination, but if you need ability to navigate to specific page (for example you're developing a porn site) then offset pagination may be more suitable

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      Agreed, it's a good solution for a specific problem. Not a general solution for all paging.

    • @zikkrype
      @zikkrype Рік тому

      @@MilanJovanovicTech also if you did not mention (or I missed): cursor pagination helps prevent page data shift in case if new records have been added to the database

  • @MarcusKaseder
    @MarcusKaseder Рік тому +1

    I don't know if I can agree with that sample. While it's a good showcase for the cursor it's way too simplified. In almost all cases you don't do a paging by ascending order. It's more by descending order.
    Like you've mentioned in your last example. You want to show the latest post in your feed and add older posts while scrolling down.
    But it's more difficult than shown in your example. If you order descending, you also have to save your start position for the skip/take approach. Cursor is fine though since you return the next starting cursor position.
    Maybe an additional (huge) advantage to the cursor approach.
    If you don't save your state (last position or start point) for the descending order, you can mess up your paging if new items get added.
    Seen that error a lot. Especially for junior devs and wanted to mention that.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Actually... You just pass the MaxValue as the cursor. And if an indexed is sorted in ASC, it can also be traversed in DESC order from behind. So you can actually use the same index for both sort orders, but just need to change the cursor logic. 😁

    • @MarcusKaseder
      @MarcusKaseder Рік тому +1

      @@MilanJovanovicTech For the starting point, yes. You can start with int.MaxValue, DateTime.MaxValue and do checks like index

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      @@MarcusKaseder True, but it's very useful to know about Cursor pagination when you run into a situation where you need it

    • @MarcusKaseder
      @MarcusKaseder Рік тому

      @@MilanJovanovicTech True, and also about its limitations 😉

  • @KunalMukherjee3701
    @KunalMukherjee3701 Рік тому +1

    This is also called keyset pagination

  • @MaxSupercars
    @MaxSupercars Рік тому +1

    You could use Tuple as a return value or? Not necessary to create response class object.

    • @bogella2225
      @bogella2225 Рік тому

      I find tuples to be less readable than response class objects.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +2

      I prefer classes when working with queries, and I use tuples typically for internal results

  • @gerezd2335
    @gerezd2335 Рік тому +1

    Why didn't you just changed the >= to > to solve the duplicate problem? It results in much less code, you can use the ID of the last element in the returned page.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Kind of missed it on the first try, and improvised later 😅

    • @gerezd2335
      @gerezd2335 Рік тому

      @@MilanJovanovicTech Haha it happens. Thanks for the honest answer. 🙂

  • @gbbigardi
    @gbbigardi Рік тому +2

    Nice video bro! Please, raise a little bit the gain of your mic, the volume of your voice is low....

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      Ah damn it, thanks for the feedback! Will do.

    • @gbbigardi
      @gbbigardi Рік тому

      @@MilanJovanovicTech Don't worry, it's not a big problem, just an improvement to make it better :)

  • @ThanhPhamTien-vr4gf
    @ThanhPhamTien-vr4gf 5 місяців тому

    first query cursor = 0 , take 51 so it should return 51 records not 50 ? why this happening ?

  • @alwaseem5309
    @alwaseem5309 9 місяців тому

    The time it took still around 160ms?

    • @MilanJovanovicTech
      @MilanJovanovicTech  9 місяців тому +1

      I really need to do a benchmark video about this to prove I'm not bulshitting 😅

    • @alwaseem5309
      @alwaseem5309 9 місяців тому

      😆 I just noticed the value in postman....

  • @brechtlaitem
    @brechtlaitem Рік тому +1

    13:47 Check that response time after the cursor response refactor. Back at 180ms instead of 30ms. So almost no profit gain at all…

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      Cold start.

    • @brechtlaitem
      @brechtlaitem Рік тому

      Nope. The cold start was at 1600ms and the subsequent calls are at 180ms.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +1

      @@brechtlaitem It's faster. Trust me 😅
      I'll put my money on it.

    • @brechtlaitem
      @brechtlaitem Рік тому +2

      Hmmm interesting 😂 But you see the timings on the video right? The cold start was at 1600ms (cursor=0), then you do a new call (the one with cursor 51) that took 160ms, then a new call with cursor 101 that took 175ms.
      No where near the 30ms before the cursor respone refactoring.
      I think the additional Take() call is to blame.

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому +3

      @@brechtlaitem That Take is in-memory, so I doubt it made a difference. I'll do a proper benchmark to prove my point.

  • @semen083
    @semen083 Рік тому

    Maybe first .ToListAsync(...) is redundant?

    • @MilanJovanovicTech
      @MilanJovanovicTech  Рік тому

      It's to materialize the list from the DB

    • @semen083
      @semen083 Рік тому

      The mehod "Handle" will work as well, if you remove first materialization.

  • @davidmumbrudeleon178
    @davidmumbrudeleon178 7 місяців тому

    I wanna know why the time got longer

    • @MilanJovanovicTech
      @MilanJovanovicTech  7 місяців тому

      Cold start, need to do a proper benchmark on this 😅

  • @xaxfixho
    @xaxfixho Рік тому +1

    How about sorting?
    Used something similar to delete old data from a database with hundreds of millions of rows, reduced batch jobs time by 60%.
    Previous solution was hibernate though 🤭😉

  • @KarwanEssmat
    @KarwanEssmat Рік тому

    It is perfect, Unfortunately It can not be used for all situations such as for GUID and PageNumber clickable.