Do watch the video in full; set the playback speed to 1.25x or 1.5x as per your convenience, and do not depend on the PPT :)
I think you have underestimated the total size of the table for a month. Why? Because you seem to have taken the average number of stops per train per day as 80. That works only if people book tickets from the train's source station to its destination station, whereas in reality people hardly ever do that. So you need to store all possible station combinations: if a train has n stops, there are n*(n-1)/2 such combinations, i.e. instead of 80 that number becomes 3160, which pushes the total table size for a month to ~245 GB with the rest of your estimates.
I personally feel a graph database would be a better choice for this kind of problem.
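The n*(n-1)/2 count in the comment above is easy to check. This minimal Python sketch lists every bookable (from, to) pair on an ordered route. The route and the function names are made up for illustration and are not part of the video's design.

```python
from itertools import combinations


def station_pairs(route):
    """Every (from, to) pair a passenger could book on a route, in travel order."""
    # combinations() keeps the input order, so `from` always precedes `to`.
    return list(combinations(route, 2))


def pair_count(n):
    """A route with n stops has n*(n-1)/2 bookable (from, to) pairs."""
    return n * (n - 1) // 2


route = ["A", "B", "C", "D", "E"]
print(station_pairs(route)[:4])  # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('A', 'E')]
print(len(station_pairs(route)), pair_count(5))  # 10 10
print(pair_count(80))  # 3160
print(pair_count(10))  # 45
```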
For n*(n-1)/2 combinations (40*79 = 80*79/2 = 3160 pairs per train):
Storage required -> (20000*140*40*79)/8 = 1.1 GB per day, around 30 GB per month
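Here is the reply's arithmetic reproduced in Python. The factors are copied from the comment. The comment does not say what the 140 and the /8 stand for, so they are kept as plain numbers. The result is treated as bytes, which is what the reply's "1.1 GB" figure implies.

```python
# Factors copied from the reply; the meaning of 140 and of the /8 is not stated there.
trains = 20000
factor_140 = 140
pairs_per_train = 40 * 79  # same as 80 * 79 / 2 = 3160 station pairs

bytes_per_day = trains * factor_140 * pairs_per_train / 8
gb_per_day = bytes_per_day / 1e9  # decimal GB
gb_per_month = gb_per_day * 30  # 30-day month

print(pairs_per_train)  # 3160
print(round(gb_per_day, 2))  # 1.11
print(round(gb_per_month, 1))  # 33.2
```

So the reply's "around 30 GB per month" is about 33 GB over 30 days.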
Why do we also keep past train details in the train_details table? To make search super fast, just keep the train data for the next 3 months and create a separate table for past trains.
Thanks for the useful tutorial. Another improvement: we could move historical train details into a separate table/database, so that we can put them on cheaper storage or compress them to save costs. The reasoning is that customers are unlikely to query historical train details. What do you think?
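The archiving idea from these two comments can be sketched with Python's built-in sqlite3. The table and column names (train_details_archive, journey_date, seats_left) are hypothetical. The copy and the delete run in one transaction, so no row is lost or duplicated if either statement fails.

```python
import sqlite3
from datetime import date


def archive_past_trains(conn, today):
    """Move train_details rows dated before `today` into train_details_archive."""
    cutoff = today.isoformat()  # ISO date strings compare correctly as text
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT INTO train_details_archive "
            "SELECT * FROM train_details WHERE journey_date < ?", (cutoff,))
        cur = conn.execute(
            "DELETE FROM train_details WHERE journey_date < ?", (cutoff,))
    return cur.rowcount  # number of rows moved


conn = sqlite3.connect(":memory:")
for table in ("train_details", "train_details_archive"):
    conn.execute(f"CREATE TABLE {table} (train_no TEXT, journey_date TEXT, "
                 "from_station TEXT, to_station TEXT, seats_left INTEGER)")
conn.executemany("INSERT INTO train_details VALUES (?, ?, ?, ?, ?)", [
    ("T100", "2024-01-01", "A", "B", 0),
    ("T100", "2024-01-02", "A", "B", 10),
    ("T100", "2024-03-01", "A", "B", 72),
])

moved = archive_past_trains(conn, date(2024, 1, 15))
print(moved)  # 2
print(conn.execute("SELECT COUNT(*) FROM train_details").fetchone()[0])  # 1
```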
Queries:
1. It is assumed that there are 80 stations per train, but when we store data for train search, shouldn't we be storing on the order of n² entries per train (strictly n*(n-1)/2 station pairs, with n = 80)? If so, the estimate needs to be corrected.
2. We are sharding the train_details_info table by date, but if a user wants to cancel a ticket or call get_booking_details by PNR, how will that info be found? Will we be searching every shard for the PNR?
3. Shouldn't the payment and booking DBs be the same, since we won't get transactional guarantees across multiple DBs?
Could you please help with these queries?
Thanks!!!
At any time, the train details table will hold 3 months of data, with new data added every day. However, old data can be archived.
You mentioned partitioning; is that table partitioning?
Yes, it is table partitioning.
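Daily table partitioning with a booking window can be pictured with this Python sketch. It works out which per-day partitions must stay live and which can be archived. The partition naming scheme is hypothetical, and the 90-day window is an assumption taken from the "3 months" mentioned above.

```python
from datetime import date, timedelta

BOOKING_WINDOW_DAYS = 90  # assumption: "3 months" from the thread, taken as 90 days


def partition_name(d):
    """Hypothetical naming scheme: one train_details partition per journey date."""
    return f"train_details_{d:%Y_%m_%d}"


def partitions_to_keep(today, window=BOOKING_WINDOW_DAYS):
    """Partitions that must stay live: today plus the advance-booking window."""
    return [partition_name(today + timedelta(days=i)) for i in range(window + 1)]


def is_archivable(partition_date, today):
    """Partitions for journey dates already in the past can be archived."""
    return partition_date < today


today = date(2024, 1, 15)
keep = partitions_to_keep(today)
print(keep[0], keep[-1], len(keep))
# train_details_2024_01_15 train_details_2024_04_14 91
```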
There are a few flaws in the train details design. You are storing from and to, so if a train stops at 10 stations there will be 45 DB entries for it (10*9/2). So instead of 80 you have to take 3160 (80*79/2). That increases the size to ~240 GB per month and ~720 GB for 3 months.
What would the seat allotment logic be?
The important part is simply skipped.
I came here to check how the case is handled where we want to book from B->D
when the train route is A->B->C->D. Felt my time was wasted here. Sigh!
Hello Anant,
Did you find the answer? I am looking for similar answers! Can you please provide a reference?
Great content! I think we should introduce an API gateway in place of the load balancer, with the API gateway pointing to separate load balancers for the respective services (Booking Ticket Service, User Profile Service, etc.).
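The gateway-in-front-of-per-service-load-balancers idea comes down to a routing table. This Python sketch routes a request path by its longest matching prefix. The paths and internal hostnames are invented; a real gateway product would express the same thing in its own configuration format.

```python
# Hypothetical route table: path prefix -> internal load balancer for that service.
ROUTES = {
    "/booking": "http://booking-lb.internal",
    "/users": "http://user-profile-lb.internal",
    "/search": "http://search-lb.internal",
}


def route(path):
    """Pick the upstream load balancer for a request path by longest matching prefix."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        return None  # a real gateway would answer 404 here
    return ROUTES[max(matches, key=len)]


print(route("/booking/PNR123"))  # http://booking-lb.internal
print(route("/users/42/profile"))  # http://user-profile-lb.internal
print(route("/bookingsX"))  # None
```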
Glad it was helpful. Do like and subscribe and share with others 🙂
Could you please make a detailed video on how to design APIs, including the coding part? That would give us a good idea of how to build an API. Thanks for your hard work and constant effort to make a positive contribution to the community.
Hmm, good point, will add this, but it may take a while as I have some other things on my priority list. Next week will be TikTok or Bloom filter 😉
@@TheTechGranth Thanks sir.
Thanks a lot for such a great video! Could you please suggest some resources for LLD and system design? Thanks
There is no one source from which you can learn everything; that is the gap I am trying to fill with this channel. I would suggest reading the engineering blogs of tech companies like Uber.
Nice video 👏
Could you please help me understand the concepts mentioned, such as orchestration and choreography?
Glad it was helpful. Do like and subscribe and share with others.
For these core concepts, check out my video on microservices design patterns, where I have explained both of these as part of the SAGA pattern:
ua-cam.com/video/Bt7aC-7mEw0/v-deo.html
Hope you will find it helpful 🙂
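Orchestration in the SAGA pattern can be shown with a minimal Python sketch. A central orchestrator runs each step and, if one fails, undoes the completed steps in reverse order. This is also the usual answer when booking and payment live in separate DBs with no shared transaction. The step names (hold_seat, charge_payment, etc.) are hypothetical stand-ins for calls to the booking and payment services. A production saga would also persist its progress so it can resume after a crash.

```python
class StepFailed(Exception):
    pass


def run_saga(steps):
    """Orchestrated saga: run (action, compensation) pairs in order.

    If an action raises, run the compensations of the already-completed
    steps in reverse order, then re-raise. Returns the action results.
    """
    done = []  # compensations for steps that succeeded so far
    results = []
    try:
        for action, compensate in steps:
            results.append(action())
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        raise
    return results


# Hypothetical booking flow: each service owns its DB, so failures are undone
# by compensating actions rather than a cross-DB rollback.
log = []


def hold_seat():
    log.append("seat held")
    return "seat-1"


def release_seat():
    log.append("seat released")


def charge_payment():
    raise StepFailed("card declined")


def refund_payment():
    log.append("refund")


try:
    run_saga([(hold_seat, release_seat), (charge_payment, refund_payment)])
except StepFailed as e:
    log.append(f"saga aborted: {e}")

print(log)  # ['seat held', 'seat released', 'saga aborted: card declined']
```

Choreography would drop the central `run_saga` loop: each service would instead listen for the previous service's events and emit its own.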
Good explanation, thank you :)
Hope it was helpful. Do like and subscribe and share with others 🙂
Please do videos on Google Translate and Google News system design.
How can I get the IRCTC APIs? How do I buy them, and where should I go?
A lot of things are not clear (at least to me).
Example: if a train goes A->B->C->D->E, how are you storing the data in the DB? I may want to search from B->D, and this train needs to show up in the search results.
You never spoke about how we are going to handle concurrency, which is very important for train ticket booking.
This is a very high-level design; it needs to incorporate a lot more detail.
In train details we are storing the starting point and end point for a train. If a train stops at A, B, C, D, E, then in train details you will have entries for A->B, B->C and so on. So your problem is solved 🙂
@@TheTechGranth now searching becomes too complex. Let's say a train has 26 stops named by the letters of the alphabet, in order. Now if I want to go from C to O, we have to find trains that have C to D, D to E, E to F, ..., N to O.
@@koteshwarraomaripudi1080 you have to pass from_station as C and to_station as O for that day. Anyway, we have partitioned the table on a daily basis.
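The C-to-O search doesn't have to chain every intermediate segment. One option is to keep each train's stops in travel order (the static route table idea mentioned later in the thread) and check that the source comes before the destination. Here is a minimal Python sketch with made-up trains. It scans every train only for clarity; a real system would first narrow candidates, for example to that day's partition.

```python
# Hypothetical static route table: train number -> stops in travel order.
TRAIN_ROUTES = {
    "T1": ["A", "B", "C", "D", "E"],
    "T2": ["D", "C", "B", "A"],
    "T3": ["B", "X", "D"],
}


def trains_between(frm, to, routes=TRAIN_ROUTES):
    """Trains that stop at `frm` and later at `to` on the same run."""
    result = []
    for train, stops in routes.items():
        pos = {stop: i for i, stop in enumerate(stops)}  # stop -> position on route
        if frm in pos and to in pos and pos[frm] < pos[to]:
            result.append(train)
    return result


print(trains_between("B", "D"))  # ['T1', 'T3']
print(trains_between("D", "B"))  # ['T2']
```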
@@TheTechGranth Wouldn’t this logic be too complex?
For example, if the train route is a, b, c, d, e, f, g, h (8 stations), then, as mentioned, there will be 7 segment entries in train info details (a->b through g->h). Who will load these entries per date for each train? Will there be a separate service, or would it be some job?
If a user wants to travel from b to g and books a ticket, the seat counts/capacities have to be deducted in 5 segment rows (b->c through f->g) for that journey. Is there a chance of optimization?
Also, in the booking API design, why is PNR generation an API? It is important, but it is an internal API. And why does fetching a train route need a date as input?
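Here is the segment-deduction question made concrete. In this Python sketch, each day's availability is a dict keyed by (stop, next_stop). A booking touches every segment between the two stations, and a journey has a free seat only if every segment does, so availability is the minimum over those segments. The dict layout and function names are assumptions standing in for the train_details rows. There is no concurrency control here.

```python
def segments(route, frm, to):
    """Consecutive (stop, next_stop) segments covered by a frm -> to journey."""
    i, j = route.index(frm), route.index(to)
    if i >= j:
        raise ValueError(f"{frm} does not come before {to} on this route")
    return list(zip(route[i:j], route[i + 1:j + 1]))


def seats_available(free, route, frm, to):
    """A seat is free for the whole journey only if every segment has one."""
    return min(free[seg] for seg in segments(route, frm, to))


def book(free, route, frm, to, seats=1):
    """Deduct `seats` from every segment of the journey (no locking in this sketch)."""
    legs = segments(route, frm, to)
    if min(free[seg] for seg in legs) < seats:
        return False
    for seg in legs:
        free[seg] -= seats
    return True


route = list("abcdefgh")
free = {seg: 2 for seg in zip(route, route[1:])}  # 2 seats on each of the 7 segments
print(segments(route, "b", "g"))
# [('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g')]
print(book(free, route, "b", "g"), seats_available(free, route, "a", "c"))  # True 1
```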
@@kvv6452 1) It will be a service. We can maintain the train route in a static table, and train_details will be populated each day, because for any particular day we can only book tickets about 3-4 months in advance.
2) It should not be too bad even in the worst case; you can think of it as a graph and order it topologically.
3) We need an API for PNR generation because the number of PNRs generated each day will be high, so it will be faster if we just pick a pregenerated value.
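Point 3 (handing out a pregenerated PNR) could look roughly like this single-threaded Python sketch: a pool filled in batches and drained one PNR per booking. The 10-digit format, the batch size and the class name are assumptions, not the actual IRCTC scheme.

```python
import secrets
from collections import deque


class PnrPool:
    """Pre-generated pool of unique PNRs, refilled in batches (single-threaded sketch)."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.pool = deque()
        self.issued = set()  # stand-in for a unique index in the PNR table

    def refill(self):
        # Top the pool up to batch_size, skipping any PNR generated before.
        while len(self.pool) < self.batch_size:
            pnr = f"{secrets.randbelow(10**10):010d}"  # hypothetical 10-digit format
            if pnr not in self.issued:
                self.issued.add(pnr)
                self.pool.append(pnr)

    def next_pnr(self):
        if not self.pool:
            self.refill()  # a background job would normally keep the pool topped up
        return self.pool.popleft()


pool = PnrPool(batch_size=5)
pnrs = [pool.next_pnr() for _ in range(12)]
print(len(set(pnrs)))  # 12
```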
Let's say a train travels from x to z through y. How will you handle bookings like x to y and y to z for different classes on any date? How will you store the data?
To keep it simple, it should be 2 bookings.
You skipped the most important part: handling concurrency. What is the use of this video without it? Concurrency here is different from BookMyShow, because we need to take locks on a range rather than on a particular row as in BookMyShow. If a train goes from A -> D and we need the journey from B -> C, how will we handle concurrent users trying to book that B -> C journey specifically?
Can you teach me how to code Tatkal booking software?
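For the range-locking question, one possible approach (not something the video covers) is a lock per segment, always acquired in route order. Acquiring locks in a consistent order is the standard way to avoid deadlock between overlapping ranges. This in-process Python sketch uses threading.Lock as a stand-in for row locks on the segment rows, such as SELECT ... FOR UPDATE. The class and method names are hypothetical.

```python
import threading


class SegmentInventory:
    """Per-segment seat counts for one train/class/date, with one lock per segment.

    Locks are always taken in route order, so overlapping bookings (A->C and B->D)
    cannot deadlock, while disjoint ranges (A->B and C->D) do not block each other.
    """

    def __init__(self, route, seats):
        self.route = route
        # segment i covers route[i] -> route[i + 1]
        self.free = {i: seats for i in range(len(route) - 1)}
        self.locks = {i: threading.Lock() for i in self.free}

    def book(self, frm, to, seats=1):
        start, end = self.route.index(frm), self.route.index(to)
        if start >= end:
            raise ValueError(f"{frm} does not come before {to} on this route")
        legs = range(start, end)  # segment indices, already in route order
        held = []
        try:
            for i in legs:  # ascending acquisition order prevents deadlock
                self.locks[i].acquire()
                held.append(i)
            if any(self.free[i] < seats for i in legs):
                return False
            for i in legs:
                self.free[i] -= seats
            return True
        finally:
            for i in reversed(held):
                self.locks[i].release()


inv = SegmentInventory(list("ABCD"), seats=1)
results = []
threads = [threading.Thread(target=lambda: results.append(inv.book("B", "C")))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [False, False, False, False, True]
print(inv.book("A", "B"))  # True: A->B does not overlap B->C
```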
Nice One :)
Thanks 🙂
Go into depth: transactions, locks, scalability, etc. This was just scratching the surface.
Kindly check out the complete system design playlist. It is really tough to cover everything in one video; it would become a long, monotonous video. Transactions are covered in the payment gateway video, and locks in the BookMyShow video. Please let me know if further details are required; I will have a look and come up with some examples.