Bookmarks 3:45 Ticket claim approaches - no claim FCFS, with TTE, with TTE along with wait list [perhaps for all site like queueit] 7:45 Data partition 12:30 Claim expiration, fair claim etc
Jordan you’re the sh*t bro thanks for taking the time out to create all these videos! This channel is by far the best resource I’ve found for system design.
Ticket booking sites have waiting lists, but I've never seen one that has waiting lists for specific seats or rows. They have a waiting list (like a "deli ticket" system) *before* entering seat selection/checkout. Sometimes that's strictly FIFO, but in some cases you join a "lobby" and your position in queue is randomly or semi-randomly assigned when tickets go on sale. Buyers in the queue are then admitted to seat selection in batches. The point of these systems is to reduce load on the seat selection and checkout systems. I.e to avoid the thundering herd you mentioned a few times. The question to ask is how do you design a system to handle extreme QPS peaks by spreading out the load over time. The solution presented in this video where you shard by section or row doesn't really solve the hot shard or thundering herd problems. Front-row center is always going to be the most popular row. At least if I understand your design correctly, all buyers will simultaneously enter seat selection at the same time. And most of those people are going to compete for the same most desirable seats. Those will be very hot shards, which hundreds of thousands of people will be hitting simultaneously for popular events. Then once seats are selected, usually you have a few minutes to complete payment. So then you will have tens of thousands of people who managed to select seats hitting your payment systems at the same time.
Yeah I covered that in like the first few minutes of the video. Agreed that it's not that crazy of a problem, hence why I think it's stupid that everyone bothers asking it. Agreed that the hold off approach is more practical, though I do suppose it can lead to a thundering herd when one person's timeout ends.
@@jordanhasnolife5163 agreed, hot sharding has been mentioned here, and should be remarked as something to fix with the help of dynamic re sharding/autoscaling+loadbalancing
correct me if I'm wrong but we don't need serializable transactions for booking one seat itself(i am not talking about claiming e.t.c). if we have a lock. Let's suppose that you acquire a distributed lock when you try to create/update the booking. Even if you decide to have a separate payment entity, most probably you don't need to update the balance, only create it with a reference to the booking id enough. You are gonna use a 3rd party vendor to send the data, so if the data is successfully transferred you just commit the transaction. Atomicity is provided by default. There are no 2 users that can have a lock at the same time (provided by consensus algorithm), and you just release the lock when a transaction is committed.
I was wrong in my previous message. Technically, 2 booking requests can occur at the same time when TTL expires(and you are already booking) and another one decided to acquire a lock(reserved) and then immediately started booking. This can be fixed using either Serializable transactions OR a better way to construct a booking id to be ticket_id + concert_time + seat_number. Thus, your entire insert transaction would fail if one of those constraint violated. For updates, you could just use optimistic locking. Am I right ?
I think as long as on a per seat level you will pretty much only claim a seat as long as it is empty when you make the claim, if you have seat level serializable transactions, you should be good to go
The real answer is that I don't care at all about the high traffic scenario. That accounts for 1% of my total events, and those are so popular that 100% of tickets will be sold no matter what. It doesn't matter if the system is fair, of if the UX is terrible, or even if it goes down. That's just the difference between selling all the tickets in 1 minute and selling them all in 10 minutes. Same revenue. The important part is the UX for all the customers buying tickets to less popular events. If you put them off, you might actually fail to sell a ticket that you could have.
Assigning N seats in a section to a group of k people each requesting {s_1, ..., s_k} seats is a Bin-packing problem. Simply allocating seats in FIFO order may lead to unfairness and sub-optimal allocation (people at the end of the list and requesting higher number of seats will be most impacted; and we want to sell as many tickets as possible). Although, it's not practically possible to solve Bin-packing in real-time, a simple optimization would be to maintain the sum of total seats requested by people for every section. In real-time, while the bookings are on-going, that sum must not exceed N. The actual seat allocation may be done offline e.g., seats can be finalized after the booking closes, and the users can be sent their tickets via mail/phone [AWS SNS] with actual seat numbers.
Hey Jordan, quick question about 34:35. If we were to write to the replica or kafka synchronously, is that considered a "two phase commit"? Or does that term only apply when we try to do a distributed transaction across two services, with a third service coordinating the process?
I'd say a two phase commit is a two phase commit if rather than pipelining writes like in synchronous replication, you have one coordinator making writes in a prepare and commit phase, hence the name "two phase" commit
Fandango locks tickets when you go to the payment page using a separate database table with a ttl. There's a session timeout hook and back event to clear out the table as well. The actual available seats are still returned for the show for next user, but are greyed out by cross referencing against that other table.
That doesn't surprise me, that's basically the first solution I cover. The second one is complete overkill and is really only in there because I have no idea what an interviewer will ask for :)
Thank you so much for these videos. They are by far the best. A couple of questions, Can we CDC from a server, I see that in the final diagram? I thought we could have it only from database tables. Also, when should we consider using an In-memory message broker instead of Kafka? In most of your videos, you have used Kafka.
Thank you for making these videos. They are very helpful. Would it be possible to make a video on monitoring as well? Or maybe introducing a monitoring setup in one of the applications? Thanks again!
24:34 I thought about putting orders into a message que, but then, the clients can't receive an asap information whether they have claimed something or they are waiting in queue. If we do that as "Client -> ClaimService -> MQ -> DB" then ClaimService cannot tell the Client in real time what is happening with his claim after it publishes to the queue... or can it? Great video Jordan as always!
So in this case, the DB is a derived data store, the claim service is actually the source of truth for both bookings and claims, hence why we can tell the client in real time whether they have booked. Thanks and I hope this makes sense!
Hi Jordan. Could you please tell me how the reads would flow for a particular event. That is, when user logs in and searches for an event, how do we read for available seats? Does the request go to DB ( if so isn't the data stale as cache is current?), does it go to cache? ( if so, since each row is in different partition, do we need to query every row for an event and then aggregate???). Thanks!
I think reading from the cache would be sufficient, and yeah as you mentioned depending on the front end settings here we may have to perform a query on a few caches.
Hi Jordan, how do you plan to invalidate a fencing token if no subsequent request comes in from the user? For example, if a user with userID 123 receives a fencing token like and then drops off, how would those reserved seats become available for someone else? Do we need to have some sort of cron jobs in background to set those particular rows back to available ?
You don't need to "invalidate" a fencing token. A server simply checks if it's seen a fencing token with a higher number before and that tells you whether to disregard this one. As for when to make seats "available" again, we use a TTL for that. The fencing token protects us against delayed packets from old lock owners.
Hi Jordan Can you help me clarify this: You said we are going to do a hash range partition on (EventId + SectionId + RowId) for OrderClaimServers. 1. You are referring to consistent hashing here isn't it ? 2. If we are doing a hash range partitioning, how can we ensure that seats with same rows end up on same partition ? Anyways, appreciate all your hardwork.
Hey Jordan, sorry I am getting a bit confused here. Is what you presented in the end basically a write through cache? And to keep data consistent between cache & DB -> we use change data capture instead of 2PC to speed things up? If this is the case, why you also mentioned write back cache at the beginning? That won't guarantee this cache & DB consistency at all right?
Hi Aria - yes albeit it's not a true write back cache because it holds additional data to the bookings (namely the queues). As for your second question, yes. CDC should help us guarantee *eventual* consistency, which is different than two phase commit (that's what we'd use with a write through cache, not a write back cache which is eventually consistent)
For such ticket purchasing system, is eventual consistency okay? If we serve all the read/write requests via cache and it's replicated, curious how can we guarantee atomic operations with cache? @@jordanhasnolife5163
Thanks for the response. For this kind of ticket purchasing system, can we just guarantee eventual consistency? I assume we should have stronger guarantee. It seems what you mentioned here is that we use cache as source of truth and use write back cache to sync to DB. In this case, how can we guarantee in distributed settings (reads /writes) from caches are synced with each other? eg: one user read from cache1 see a seat is bookable and book it, and that write is not synced to cache2 and user from cache2 also booked it? Getting a bit lost here and appreciate your help! @@jordanhasnolife5163
Hi Jordan. I am confused the concept of waiting for tickets. I took a look at the ticketmaster, a seat is either available or blocked, whether a use completes the payment or not. If the payment doesn't go through, the blocked seats are released for booking. My question is why would a user want to wait on a seat for 10mins when the chances of getting it are slim? wouldnt he just look for different seats in different section. I dont get why this waiting for tickets feature exists at first place. What are we exactly trying to solve?
Yeah you basically described why I don't like this problem. There are a million different ways to build a ticket booking site. What you just described is the "unfair solution", which I covered in this video as well. But that one's easier so felt I'd take things up a notch haha
Hey Jordan, really love your videos. Could you please attach the sources (if there are any) for videos that have " question specific" concepts like Fair claim implementations in this video, Route recreations in your Uber video etc...
Sometimes there are sources, in this one there aren't really. This time around I've mostly been trying to avoid external sources, as I think that they cause me to laser in on one potential approach and not think about things rationally. That's not to say I haven't considered other approaches, I just am also using my old videos for inspiration, where I actually relied more heavily on sources.
In this case, no for the seat balancing given the cache is our source of truth. Maybe for payments, but you could also just let them claim the seat, ask for payment, and if it doesn't come by a certain time, revoke the seat.
Could you also discuss a raffle system, in any system design video like this later? Its something nike does to handle bots. I mean a third party service does it for them, but anyway seems alright.
I probably won't make a video on this in a while but I'd just batch everything in kafka, and then have some consumer that after a certain point in time picks the winners and then notifies the winners via some sort of fan out (email, notifications, etc).
Hey! I think that's ok but then you run into the two phase commit issue of having to store the booking into a database (I'm assuming you're talking about the "unfair" implementation)
@@jordanhasnolife5163 Ah let me clarify sorry. I meant for locking on ticket_id with TTL using something like redlock in order to avoid the double purchase problem. Great video btw thanks for illuminating us peasants.
@@jordanhasnolife5163 Thanks for replying. Some companies have a product architecture round instead of system design. But yeah, I’ll have a look at your videos.
Maybe 45 minute system design should not go through that much details and complexity. Semi fair is good enough, we are not building supreme court. Interviewer might think we are overengineers.
Yeah to be clear, these videos aren't supposed to be like mock interviews. It's trying to just do an in depth discussion of many possible routes that this problem could take.
These would basically be two separately maintained indexes. Perhaps gets a bit complicated when you accept an order and need to delete it from the treeset, could be short sighted on my end.
This feels like you're breaking up with me Jk, I know this problem was particularly convoluted, because normal ticket master is just... use a database.
Bookmarks
3:45 Ticket claim approaches - no claim FCFS, with TTE, with TTE along with wait list [perhaps for all site like queueit]
7:45 Data partition
12:30 Claim expiration, fair claim etc
Jordan you’re the sh*t bro thanks for taking the time out to create all these videos! This channel is by far the best resource I’ve found for system design.
thanks man!!
Ticket booking sites have waiting lists, but I've never seen one that has waiting lists for specific seats or rows. They have a waiting list (like a "deli ticket" system) *before* entering seat selection/checkout. Sometimes that's strictly FIFO, but in some cases you join a "lobby" and your position in queue is randomly or semi-randomly assigned when tickets go on sale. Buyers in the queue are then admitted to seat selection in batches.
The point of these systems is to reduce load on the seat selection and checkout systems. I.e to avoid the thundering herd you mentioned a few times. The question to ask is how do you design a system to handle extreme QPS peaks by spreading out the load over time.
The solution presented in this video where you shard by section or row doesn't really solve the hot shard or thundering herd problems.
Front-row center is always going to be the most popular row. At least if I understand your design correctly, all buyers will simultaneously enter seat selection at the same time. And most of those people are going to compete for the same most desirable seats. Those will be very hot shards, which hundreds of thousands of people will be hitting simultaneously for popular events.
Then once seats are selected, usually you have a few minutes to complete payment. So then you will have tens of thousands of people who managed to select seats hitting your payment systems at the same time.
Yeah I covered that in like the first few minutes of the video. Agreed that it's not that crazy of a problem, hence why I think it's stupid that everyone bothers asking it. Agreed that the hold off approach is more practical, though I do suppose it can lead to a thundering herd when one person's timeout ends.
@@jordanhasnolife5163 agreed, hot sharding has been mentioned here, and should be remarked as something to fix with the help of dynamic re sharding/autoscaling+loadbalancing
Jordan, thank you so much for the uploads!! Appreciate all your videos !
correct me if I'm wrong but we don't need serializable transactions for booking one seat itself(i am not talking about claiming e.t.c). if we have a lock. Let's suppose that you acquire a distributed lock when you try to create/update the booking. Even if you decide to have a separate payment entity, most probably you don't need to update the balance, only create it with a reference to the booking id enough. You are gonna use a 3rd party vendor to send the data, so if the data is successfully transferred you just commit the transaction. Atomicity is provided by default. There are no 2 users that can have a lock at the same time (provided by consensus algorithm), and you just release the lock when a transaction is committed.
I was wrong in my previous message. Technically, 2 booking requests can occur at the same time when TTL expires(and you are already booking) and another one decided to acquire a lock(reserved) and then immediately started booking. This can be fixed using either Serializable transactions OR a better way to construct a booking id to be ticket_id + concert_time + seat_number. Thus, your entire insert transaction would fail if one of those constraint violated. For updates, you could just use optimistic locking. Am I right ?
I think as long as on a per seat level you will pretty much only claim a seat as long as it is empty when you make the claim, if you have seat level serializable transactions, you should be good to go
The real answer is that I don't care at all about the high traffic scenario. That accounts for 1% of my total events, and those are so popular that 100% of tickets will be sold no matter what. It doesn't matter if the system is fair, of if the UX is terrible, or even if it goes down. That's just the difference between selling all the tickets in 1 minute and selling them all in 10 minutes. Same revenue. The important part is the UX for all the customers buying tickets to less popular events. If you put them off, you might actually fail to sell a ticket that you could have.
This may be the case for you, and it may even be the case for ticket master, but I'm curious if it'll be the case for your interviewer
@@jordanhasnolife5163 Probably not. Which is why I'm binge watching your channel.
@@tfk933 Welcome good sir
This comment has a separate like base. The horizon where software development meets sales and business !!
Assigning N seats in a section to a group of k people each requesting {s_1, ..., s_k} seats is a Bin-packing problem. Simply allocating seats in FIFO order may lead to unfairness and sub-optimal allocation (people at the end of the list and requesting higher number of seats will be most impacted; and we want to sell as many tickets as possible). Although, it's not practically possible to solve Bin-packing in real-time, a simple optimization would be to maintain the sum of total seats requested by people for every section. In real-time, while the bookings are on-going, that sum must not exceed N. The actual seat allocation may be done offline e.g., seats can be finalized after the booking closes, and the users can be sent their tickets via mail/phone [AWS SNS] with actual seat numbers.
That's totally fine by me if it's allowed by your interviewer to compute exact seats after the fact!
Thanks dude for the videos. I just started with the system design 2.0 playlist and already learned a great deal.
Woo!
Awesome content again. Thank you for all that you do, Jordan!
Hey Jordan, quick question about 34:35. If we were to write to the replica or kafka synchronously, is that considered a "two phase commit"? Or does that term only apply when we try to do a distributed transaction across two services, with a third service coordinating the process?
I'd say a two phase commit is a two phase commit if rather than pipelining writes like in synchronous replication, you have one coordinator making writes in a prepare and commit phase, hence the name "two phase" commit
Fandango locks tickets when you go to the payment page using a separate database table with a ttl. There's a session timeout hook and back event to clear out the table as well. The actual available seats are still returned for the show for next user, but are greyed out by cross referencing against that other table.
That doesn't surprise me, that's basically the first solution I cover. The second one is complete overkill and is really only in there because I have no idea what an interviewer will ask for :)
Thank you so much for these videos. They are by far the best. A couple of questions, Can we CDC from a server, I see that in the final diagram? I thought we could have it only from database tables.
Also, when should we consider using an In-memory message broker instead of Kafka? In most of your videos, you have used Kafka.
We're doing CDC from redis here, but even still CDC from a server would just be sinking to Kafka!
Thank you for making these videos. They are very helpful. Would it be possible to make a video on monitoring as well? Or maybe introducing a monitoring setup in one of the applications? Thanks again!
I believe there already is one in this series.
Love these videoes man, helping me crank up my growth trajectory
Lots of cranking to be done on this channel
24:34 I thought about putting orders into a message que, but then, the clients can't receive an asap information whether they have claimed something or they are waiting in queue. If we do that as "Client -> ClaimService -> MQ -> DB" then ClaimService cannot tell the Client in real time what is happening with his claim after it publishes to the queue... or can it?
Great video Jordan as always!
So in this case, the DB is a derived data store, the claim service is actually the source of truth for both bookings and claims, hence why we can tell the client in real time whether they have booked.
Thanks and I hope this makes sense!
@@jordanhasnolife5163 ah yeah, makes sense. Brain farted on that one, thanks
Hi Jordan. Could you please tell me how the reads would flow for a particular event. That is, when user logs in and searches for an event, how do we read for available seats? Does the request go to DB ( if so isn't the data stale as cache is current?), does it go to cache? ( if so, since each row is in different partition, do we need to query every row for an event and then aggregate???).
Thanks!
I think reading from the cache would be sufficient, and yeah as you mentioned depending on the front end settings here we may have to perform a query on a few caches.
Hi Jordan, how do you plan to invalidate a fencing token if no subsequent request comes in from the user? For example, if a user with userID 123 receives a fencing token like and then drops off, how would those reserved seats become available for someone else?
Do we need to have some sort of cron jobs in background to set those particular rows back to available ?
You don't need to "invalidate" a fencing token. A server simply checks if it's seen a fencing token with a higher number before and that tells you whether to disregard this one.
As for when to make seats "available" again, we use a TTL for that. The fencing token protects us against delayed packets from old lock owners.
@@jordanhasnolife5163 Got it. Thank you.
Hi Jordan
Can you help me clarify this:
You said we are going to do a hash range partition on (EventId + SectionId + RowId) for OrderClaimServers.
1. You are referring to consistent hashing here isn't it ?
2. If we are doing a hash range partitioning, how can we ensure that seats with same rows end up on same partition ?
Anyways, appreciate all your hardwork.
1) Yes
2) Because seats in the same row have the same event Id, section Id, and row Id :)
Hey Jordan, sorry I am getting a bit confused here. Is what you presented in the end basically a write through cache? And to keep data consistent between cache & DB -> we use change data capture instead of 2PC to speed things up? If this is the case, why you also mentioned write back cache at the beginning? That won't guarantee this cache & DB consistency at all right?
Hi Aria - yes albeit it's not a true write back cache because it holds additional data to the bookings (namely the queues).
As for your second question, yes. CDC should help us guarantee *eventual* consistency, which is different than two phase commit (that's what we'd use with a write through cache, not a write back cache which is eventually consistent)
For such ticket purchasing system, is eventual consistency okay? If we serve all the read/write requests via cache and it's replicated, curious how can we guarantee atomic operations with cache? @@jordanhasnolife5163
Thanks for the response. For this kind of ticket purchasing system, can we just guarantee eventual consistency? I assume we should have stronger guarantee. It seems what you mentioned here is that we use cache as source of truth and use write back cache to sync to DB. In this case, how can we guarantee in distributed settings (reads /writes) from caches are synced with each other? eg: one user read from cache1 see a seat is bookable and book it, and that write is not synced to cache2 and user from cache2 also booked it? Getting a bit lost here and appreciate your help! @@jordanhasnolife5163
Hi Jordan. I am confused the concept of waiting for tickets. I took a look at the ticketmaster, a seat is either available or blocked, whether a use completes the payment or not. If the payment doesn't go through, the blocked seats are released for booking.
My question is why would a user want to wait on a seat for 10mins when the chances of getting it are slim? wouldnt he just look for different seats in different section. I dont get why this waiting for tickets feature exists at first place. What are we exactly trying to solve?
Yeah you basically described why I don't like this problem.
There are a million different ways to build a ticket booking site. What you just described is the "unfair solution", which I covered in this video as well. But that one's easier so felt I'd take things up a notch haha
Hey Jordan, really love your videos. Could you please attach the sources (if there are any) for videos that have " question specific" concepts like Fair claim implementations in this video, Route recreations in your Uber video etc...
Sometimes there are sources, in this one there aren't really. This time around I've mostly been trying to avoid external sources, as I think that they cause me to laser in on one potential approach and not think about things rationally. That's not to say I haven't considered other approaches, I just am also using my old videos for inspiration, where I actually relied more heavily on sources.
Can you some tell me is this product design or system design? I'm very confused with meta
SD and PD
I think they're basically the same, with PD having a slight emphasis on the API
@@jordanhasnolife5163 thanks man for your reply.. Really appreciate it.. Got confused between two
Hi, Dont we need two phase commit to processes payment and update balance seats?
In this case, no for the seat balancing given the cache is our source of truth. Maybe for payments, but you could also just let them claim the seat, ask for payment, and if it doesn't come by a certain time, revoke the seat.
Could you also discuss a raffle system, in any system design video like this later? Its something nike does to handle bots. I mean a third party service does it for them, but anyway seems alright.
I probably won't make a video on this in a while but I'd just batch everything in kafka, and then have some consumer that after a certain point in time picks the winners and then notifies the winners via some sort of fan out (email, notifications, etc).
Jordan, thank you so much. Can you please share the notes for all the videos. Thanks
Hey Senthil, when this series is over I'll upload all of the notes in batch!
@@jordanhasnolife5163 Thanks a bunch.
Jordan, what about saving the ticket claim on redis with ttl and redis can takes that off when time expires
Hey! I think that's ok but then you run into the two phase commit issue of having to store the booking into a database (I'm assuming you're talking about the "unfair" implementation)
@@jordanhasnolife5163 why is that the case? Redis has master-slave replication.
Hey jordan is it possible at all to share your beautifully well written (except for the narcissism) notes? 🥺👉🏻👈🏻
Lol - hey Kaushik! I'll do this eventually, but will probably do a bulk export once I finish my current series in like 2 months
@@jordanhasnolife5163 hmm makes sense. anyways keep shit posting technical stuff dude. I will be a loyal subscriber XD
What about a distributed lock?
What for?
@@jordanhasnolife5163 Ah let me clarify sorry. I meant for locking on ticket_id with TTL using something like redlock in order to avoid the double purchase problem. Great video btw thanks for illuminating us peasants.
@@gatodetaco It wont scale imagine 10k million people waiting on distributed lock for some seat
Hey, Jordan, do you have any content on product architecture?
Not sure what you mean by product architecture but I'd just take a scroll through my existing videos and see what seems useful to you
@@jordanhasnolife5163 Thanks for replying. Some companies have a product architecture round instead of system design. But yeah, I’ll have a look at your videos.
Thank you!
Maybe 45 minute system design should not go through that much details and complexity. Semi fair is good enough, we are not building supreme court. Interviewer might think we are overengineers.
Yeah to be clear, these videos aren't supposed to be like mock interviews. It's trying to just do an in depth discussion of many possible routes that this problem could take.
great video
Fucking ticket master US wouldn't sell me a ticket with a German credit card. I hope you consider this in your design.
No Germans ✅, can do
Now I know why I did not get Taylor Swift tickets!
Just kidding
27:37 How come DLL turned into a TreeSet for deleting scenario?
These would basically be two separately maintained indexes. Perhaps gets a bit complicated when you accept an order and need to delete it from the treeset, could be short sighted on my end.
You're great guy but I think you overcomplicated that a bit - hellointerview has much simpler solution
This feels like you're breaking up with me
Jk, I know this problem was particularly convoluted, because normal ticket master is just... use a database.