In a booking system do we really care about the schedule of the trains? I would think making the DB more centric to "stations" would make the queries much simpler. So each "station" will have a list of arriving trains and their times. When a user searches for source->destination, we would get all the trains arriving at the source and all the trains arriving at the destination, and return the train IDs common to both tables (when arrival time at destination > arrival time at source). But perhaps we are using more space this way.
Instead of arrival time and departure time, we can give a number to each train stop. The origin station is 0, the next station is 1, and we increment it by 1 for each following stop. So the benefit would be we can use this TrainStop in the query to find out if sourceStop
@gk296 But doesn't that run into the same problem of recalculating paths from source to destination every time we have a query? Since trains can go back as well, you'd have to calculate this path every time.
I believe we can keep the destination as the last stop, e.g. D, for all sources (A, B, C). The first row has the starting point A as source and D as destination. The next row has B, the station after A, as source and D as destination, and so on, with C as source and D as destination. Now, if the traveler's destination is not D, our search will be in the Source column only, and it will be much easier because both the source and the destination are in the same column, i.e. Source. We search that column for the source and the destination, then subtract the start times, e.g. the time for C in the Source column minus the time for A or B as the start location. Please share your thoughts.
This interview is way better than other interviews on youtube because over here Gaurav is really thinking off the bat. You can see him struggling with issues and eventually coming up with solutions. Also kudos to Keerti for making very valid points wherever needed.
This is the first “system design” interview I’ve watched. I’m not even familiar with a lot of terms mentioned by Gaurav but it was absolutely amazing to watch the thought process! I’m surprised how similar system design is to solving a case as a consultant. The entire structured thought process and planning gives a wholesome experience to both the interviewer and the interviewee. Good work!
I think Gaurav was not prepared for the interview, or he wasn't familiar with the IRCTC application. I have been following him for the past 3 years, and this is the first time I have seen Gaurav struggling for a solution. Meanwhile, Keerti was well prepared for her panel.
Just an FYI, in System Design Interviews, you shouldn't be asked to design a 'system'. The best way to approach an interview is to ask the interviewer what we need to design, rather than just throwing everything around. Design interviews don't have a question along the lines of 'need to design IRCTC' - but rather, say, design a train booking system, where we can do X, Y and Z etc. And then further collaboration leads to figuring out more requirements.
umm no - system design interviews are not only about "system design" but also about how the candidate is able to take an open ended question and deal with the ambiguity , break down the problem into parts , negotiate which ones are important and then come up with a solution. May be for SDE 1 and 2 your approach is right but for higher level roles there has to be some ambiguity baked in the problem
@shrikantkhadilkar4019 yeah - and here are two SDE2's trying to show what a system design interview looks like for an L6 IC. That's my point saying why it's "ridiculous". This isn't the best way to conduct design interviews, and anyone with enough decent interview experience knows that.
@Amritanjali That would be incorrect in that case. Moreover, anyone could have written that section of their wiki page. I know Passport Seva was developed by TCS for sure (it is there right at the bottom of its home page 😛) PS: I wonder if YouTube was designed by TCS too. They are flagging my comment for no reason.
In IRCTC, if there is no direct train between two stations then it tells me no direct trains found, unlike the case of flights. That leads me to think of the following possibilities for the database. - The one which Gaurav pointed out as a brute force, where he first filters out the sources and then, for those rows, filters out the destinations. - Having two more tables, one called arrivals and one called departures, with the columns being the station (of arrival or departure), the time, and the train id. Intermediate stations exist in both tables, and we query each of them and get the common trains. The disadvantage would be that there would be 2 entries in the worst case for each train and station pair (one in each table), which, given an assumption of 50 stations per train journey and around 13 thousand trips per day, would lead to 2 * 50 * 13k, which is 1.3 million records. Assuming each record is 24 bytes (4 chars for the station making it 4 * 2 = 8 bytes, assuming station codes like SBC, SMET, NDLS and so on, + 8 bytes of date + 8 bytes for id), it is roughly 1.5 million * 25 bytes = 37.5 MB, i.e. about 0.04 GB. I do not think it is a graph DB in IRCTC, because then it would allow connecting destinations which do not have a direct train. Also, all of these assumptions (also in the video) might be for a single day, right? In a real case we must multiply them by 130 for the 3 month future, right? Am I missing something?
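Working the estimate through from the assumptions stated in the comment above gives a figure in the tens of megabytes per day. This is only a back-of-the-envelope sketch: the stop count, trip count, record size, and the 130-day window are the commenter's guesses, not official IRCTC figures, and the constant names are made up for illustration.

```python
# Back-of-the-envelope sizing using the comment's own assumptions.
STOPS_PER_TRAIN = 50        # assumed stops per train journey
TRIPS_PER_DAY = 13_000      # assumed trips per day
TABLES = 2                  # one row in arrivals, one in departures
BYTES_PER_RECORD = 8 + 8 + 8  # station code + date + train id
DAYS_AHEAD = 130            # booking window assumed in the comment

records_per_day = TABLES * STOPS_PER_TRAIN * TRIPS_PER_DAY
bytes_per_day = records_per_day * BYTES_PER_RECORD
bytes_total = bytes_per_day * DAYS_AHEAD

print(f"{records_per_day:,} records per day")
print(f"{bytes_per_day / 1e6:.1f} MB per day")
print(f"{bytes_total / 1e9:.2f} GB for {DAYS_AHEAD} days")
```

Even with the 130-day multiplier the raw rows come to a few gigabytes before indexes, which a single relational instance can usually hold.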
Better way to store the schedules would be storing all stations(st) for a particular schedule(one way full run of train). That way you just have to join the table to itself and query would be something like (select * where station is source) as t1 join (select * where station is destination) as t2 on t1.scheduleId = t2.scheduleId and t2.arrival > t1.departure
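The self-join described above can be tried out with an in-memory SQLite database. This is a minimal sketch, not the schema from the video: the `stops` table, its column names, and the sample rows are assumptions, and times are stored as same-day `HH:MM` strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stops (schedule_id INTEGER, station TEXT, arrival TEXT, departure TEXT)"
)
conn.executemany(
    "INSERT INTO stops VALUES (?, ?, ?, ?)",
    [
        # Schedule 1 runs A -> B -> C, schedule 2 runs C -> B -> A.
        (1, "A", "08:00", "08:05"),
        (1, "B", "09:00", "09:05"),
        (1, "C", "10:00", "10:05"),
        (2, "C", "07:00", "07:05"),
        (2, "B", "08:00", "08:05"),
        (2, "A", "09:00", "09:05"),
    ],
)


def find_schedules(source, destination):
    """Return schedule ids that reach `destination` after leaving `source`."""
    rows = conn.execute(
        """
        SELECT t1.schedule_id
        FROM stops AS t1
        JOIN stops AS t2 ON t1.schedule_id = t2.schedule_id
        WHERE t1.station = ? AND t2.station = ?
          AND t2.arrival > t1.departure
        ORDER BY t1.departure
        """,
        (source, destination),
    ).fetchall()
    return [row[0] for row in rows]


print(find_schedules("A", "C"))
print(find_schedules("C", "A"))
```

The time comparison is what filters out the schedule running in the opposite direction; journeys that cross midnight would need full timestamps rather than `HH:MM` strings.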
I really loved this video! One of the best system design videos I have seen. Rather than watching videos where the design is already prepared and just explained to the viewers, this format helps us to know how it is to be a part of a real interview and the problem chosen is very complex of course. I kept on thinking how I would approach the problem while watching it. Great content ! I have a newly found respect for IRCTC as well !
Great video, and awesome explanation by Gaurav. Also, the interviewer helped him ease the process. Some possible design improvements/suggestions for anyone preparing for a future system design interview:
11:25 We can provide an option to choose window or non-window seats (and pick consecutive seats, not random seats, which is a general human requirement).
17:06 Exact scale as of 2021: Indian Railways is among the world's largest rail networks, and its route length network is spread over 67,956 km, with 13,169 passenger trains and 8,479 freight trains, plying 23 million travelers and 3 million tonnes (MT) of freight daily from 7,349 stations.
31:55 Gaurav's design is great, but here is one thing that can also be done; maybe it is not as good a design as the one he proposed. There are 7,325 stations all over India. So better not focus on time; focus on 7 days * (7325 (stations) * 7325) entries. That is assuming that from source x to source y there may be roughly around 100 stops for a train in the worst case [Amritsar Express] (most trains are limited to 10 to 20). In my approximate calculation, if we store all the entries in a single table there will be a total of around 10 million entries, and querying on a date for a journey is much simpler than binary searching over time indices. This method is easily extensible to graph databases too. The follow-up operation of locking is also not very costly if we have separate normalized tables for all 7,349 sources; the number of rows to lock for all the stations involved in a journey is small, close to ~(20-100).
Some things that can be addressed or enhanced in the system: 1. Audit queries 2. Tatkal mode 3. Refund system
Follow-up design for Tatkal: hold back 10% of the seats in each category so they cannot be booked until 11 AM on the day before the train's journey starts. Then separately follow the same design/locking structure over the new table.
Regarding separate categories, only the cost will vary, as these seats will have the same journey schedule, routes, locking etc., so add an extra cost column to the tables to make life simpler. Thank you! Pardon me if some information above is incorrect.
To search for the trains between destination A to B … how about we store all the trains coming at destination A , retrieve it then get all the trains reaching at destination B and then take their intersection.we will get all the train from source A to destination B.
For the query to get train information/trains available for a particular day - 1. Assume we are going to show availability for the next 2 months 2. Keep the train schedule info for the next 2 months in a JSON structure in an ES kind of store 3. Model the day in epoch format - it becomes easy to search 4. Get all trains running between station A and B on day 10-10-2021 (consider that we show only stations A/B, not any nearby ones) a. Get all trains that have a stoppage at station A - then filter out trains that don't stop at station B b. Consider trainId, runDate from the results of the previous step. Search is read-intensive work and doesn't need an ACID compliant DB. An even smarter way would be to maintain a cron expression for each train id that says which dates the train runs on. This way, once we get the trains that have a stoppage at station A/B, we can easily evaluate the run date against the cron expression - we really don't need a BTree structure here
I think handling the routes via a graph table like Neo4j could be an approach where you map one route out with multiple nodes and then have a in memory reference to all the times a train can be at a particular route.
Great efforts by both 👍 this was indeed a complex problem. I think using a relational DB makes it more intuitive to design. We can use a routes table with route id, source, destination. Each train has multiple routes and each route can have multiple trains, so simply create a mapping table. Although the number of rows will be higher, search will be faster with proper indexes. This also helps with booking. We can simply preallocate certain seats to each route for a particular train id (not very fair, but easier for getting to a working solution). For concurrency we have a few options - assign workers statically, but this won't work for a variable number of workers. So we can simply set the locking mode to row level and read and update the count within a transaction. Again, the route mapping would help here as we have a single row to handle. Assigning actual seats can be handled later, based on preference or randomly in code, so there is no need to consider that in the DB. Let me know your thoughts on this.
At 26:00 minutes , for Search Train query, I believe there is no need for BFS, you just found some rows where B is source and other rows where D is destination, After this you only need to check matching Train Ids ( natural join or something) and timing. Let me know if I am missing something.
I think you are right. Let's say the table [trains] has only the following columns: id, src, dest (ignore departure and arrival time). Then the query will be: SELECT T1.id, T1.src, T2.dest FROM trains AS T1 INNER JOIN trains AS T2 ON T1.id = T2.id WHERE T1.src = 'A' AND T2.dest = 'D';
You guys missed a very crucial point here. Assume you have to travel from Delhi to Jaipur and someone else has to travel from Alwar to Dausa (these stations fall on the Delhi to Jaipur route). Then the table proposed will never have a source to destination entry; rather, the entries will be Stop 1 to Stop 2, then Stop 2 to Stop 3, and so on. So a source to destination match can never happen unless you do a DFS or some sort of traversal. One way I am thinking of, which will obviously increase the storage complexity, is to have a map as the value. You store all possible combinations, Stop1: [Stop2 … StopN], and similarly for the same train Stop2: [Stop3 … StopN]. I know it's not the optimal way, but it can be the quickest way to answer a query like "give me all trains between Mumbai and Gujarat for tomorrow".
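The "store every onward combination" idea above can be sketched as a precomputed index from (source, destination) pairs to train ids. This is an illustrative sketch only: the train ids and stop lists are made up, and a real system would also need dates and times on top of this.

```python
from collections import defaultdict


def build_pair_index(routes):
    """Map every (earlier stop, later stop) pair to the trains serving it."""
    index = defaultdict(set)
    for train_id, stops in routes.items():
        for i, src in enumerate(stops):
            # Only stops after position i are reachable from src on this train.
            for dst in stops[i + 1:]:
                index[(src, dst)].add(train_id)
    return index


# Hypothetical routes, listed in travel order.
routes = {
    "T1": ["Delhi", "Alwar", "Jaipur"],
    "T2": ["Jaipur", "Alwar", "Delhi"],
    "T3": ["Delhi", "Jaipur"],
}
index = build_pair_index(routes)

print(sorted(index.get(("Delhi", "Jaipur"), set())))
print(sorted(index.get(("Alwar", "Jaipur"), set())))
```

A lookup is then a single key access with no traversal, at the cost of n(n-1)/2 entries for a train with n stops.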
Wonderful guys. Good work. Thanks! One of the approaches to find the available trains for a given source and destination could be (in a single query): (An example to find trains for 'A' to 'C') with a as (select train, src, count(src) from route where src = 'A' group by train, src union select train, dest, count(dest) from route where dest = 'C' group by train, dest) select a.train from a group by a.train having count(a.train) > 1 Other Auxiliary option perhaps could be to store the Trains and their stations in another pre-computed table where stations can be a simple 'Array' type and query that first and get the trains in a table backed by a simple LRU cache. The assumption is the relationship between the trains and stations hardly change and therefore precomputing can bring a good gain.
Almost all interviews are conducted online now using some drawing tool, which means no access to the whiteboard. Putting thoughts down and drawing diagrams is a lot faster/easier on a whiteboard, resulting in effective communication & able to cover a lot more things in the 40-45mins interview. I suggest all future interviews in this channel to use a drawing tool rather than a whiteboard to make it close to real-interviews.
I found your channel through LinkedIn, and I would definitely say this is one of the best channels I have found so far. We'll find CP people being interviewed here, we'll find system design people, and so on. And the good thing is we actually get a feel for how an interview happens, because we don't know what the interviewer will be like or how we should talk to them. Everything is on your channel. Truly appreciate your work.
It's my first time seeing the System Design Interview. Didn't consider this as important part earlier. But seems like designing Consumer centric applications are way harder than designing enterprise applications.
One of the most challenging problems for IRCTC is managing Tatkal bookings from millions of users simultaneously. Possible approaches imo 1. Sharded Booking Counter: sharded RDBMS setup to ensure strong consistency, may impact overall availability though. 2. Leaderless Replication-Based Data Store: Utilize a CRDT-based system like Redis, though this might result in occasional overbookings or underbookings due to its eventual consistency model.
@29:05 why not get all the trains which start from station B, separately get all the trains which end at station D, and do an inner join between these two data sets where the key is the train id?
Fckin hell. This interview gave me so much idea about system design and also the importance of DSA upto certain point. Thank y'all both. Subscribed to your channel btw.
@44:35 in the route table, even though the number of seats available for A and D is 1, he/she won't be able to book the seat because the intermediate stations B and C have 0 seats. Then what is the significance of having available seats for A and B?
I wonder if DB will scale if you do pessimistic locking while booking the tickets. Most probably, you will have to do optimistic locking, the first one wins and subsequent ones will fail and they will have to retry. Optimistic locking will increase the scalability.
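The "first one wins, the rest retry" behaviour described above is commonly implemented with a version column and a conditional update. The sketch below uses an in-memory SQLite table to show the idea; the `seats` table, its columns, and the helper names are assumptions for illustration, not anything from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (train_id TEXT PRIMARY KEY, available INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO seats VALUES ('T1', 1, 0)")
conn.commit()


def read_seats(train_id):
    """Read the current availability together with its version."""
    return conn.execute(
        "SELECT available, version FROM seats WHERE train_id = ?", (train_id,)
    ).fetchone()


def try_book(train_id, seen_available, seen_version):
    """Book one seat only if the row is unchanged since it was read."""
    if seen_available < 1:
        return False
    cur = conn.execute(
        "UPDATE seats SET available = available - 1, version = version + 1 "
        "WHERE train_id = ? AND version = ?",
        (train_id, seen_version),
    )
    conn.commit()
    # rowcount is 0 when another writer bumped the version first.
    return cur.rowcount == 1


# Two users read the same snapshot, then both try to book the last seat.
snapshot_a = read_seats("T1")
snapshot_b = read_seats("T1")
first = try_book("T1", *snapshot_a)
second = try_book("T1", *snapshot_b)
print(first, second, read_seats("T1"))
```

The losing caller would re-read the row and retry, or report that the seat is gone. No lock is held between the read and the write, which is where the scalability gain comes from.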
I instantly got an idea: once we have sorted them based on ETA and EDA, we get the ids of all the trains that have our preferred starting point in any of the source entries, and then we check whether the trains we got from the first query have our preferred destination in their destination column. Whatever we get here should be the answer.
I think if we maintain a map having the place name as key and list of trains running thru that place as values as at one point of time one place will be searched. When user clicks on search we have the set of trains as those will be common between 2 maps. To fetch the details of time, days train runs, vacant seats can be found from different db. Also if think passenger trains those don't run more than a day.
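The map-and-intersect idea that comes up in several comments here can be sketched with a dictionary of sets. The train ids and routes below are hypothetical, and the sketch deliberately shows the approach's blind spot: the intersection alone ignores travel direction.

```python
from collections import defaultdict

# station name -> set of train ids that stop there
trains_by_station = defaultdict(set)


def add_route(train_id, stations):
    """Register a train under every station on its route."""
    for station in stations:
        trains_by_station[station].add(train_id)


def trains_through(source, destination):
    """Trains that stop at both stations, regardless of direction."""
    return trains_by_station.get(source, set()) & trains_by_station.get(
        destination, set()
    )


add_route("T1", ["A", "B", "C"])
add_route("T2", ["B", "C", "D"])
add_route("T3", ["C", "B", "A"])  # runs the other way

print(sorted(trains_through("A", "C")))
```

`T3` shows up for A to C even though it runs from C to A, so a second check on stop order or arrival time is still needed after the intersection.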
Do we really need graph DB? Let Admin create Trip where trip contains sequence of stops and associated train. When user search by entering from and to stops, get the list of trips passing through to and from stops. Intersection of that provides list of trips.
I think in Schedule table, it should list stations instead of subtrip between adjacent nodes/stations with estimated arrival and departure time. That way we can build a query with having destination station should have higher ETA than that of ETD of source station.
For the search query, we can find the source as discussed in the video and for the destination, we can search the database for an entry having the same train ID and the requested destination as destination with expected arrival time > EDT of the source and EAT < end time. If multiple entries exist with EAT < end time for the same train ID, take the one with the lowest EAT
For booking, we can do it in memory: for each train we can have an in-memory linked list. Each node in the linked list would be a stop, sorted from source to destination. We will also have the seats available corresponding to each node in this linked list. So if a query comes in for available seats from source to destination, we will query the linked list nodes and give the result back to the UI. The seat ids which are available in all the nodes between source and destination will be returned. The complexity would be (number of stops * number of seats requested). Once we go ahead and book the seats, we will have to take the lock on the linked list and remove the taken seats from the DS. One problem with this approach is coordinating the in-memory data structure across different servers which are getting the same request. For this we can have a quorum specially dedicated to this purpose, where the master node serves the availability and booking requests and the followers read the WAL to follow the master's updates to this linked list. If we want high consistency we can avoid split brain and only commit the booking once the linked list update has reached a majority of the followers. We can also have X such quorums and shard the train ids across them. While updating we will only take the lock on the linked list which we are updating, i.e. the one corresponding to the train. If the master fails, the follower which is consistent with the master would become the leader (for this we would need a RAFT-like consensus protocol). Doing it all in memory would decrease the latency. For each update to the linked list, we need to update the persistent store, i.e. SQL in this case, which would hold bookingId, seatId, trainId, source, destination. If the complete data center goes down we can rebuild the in-memory store from SQL.
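The single-node part of the idea above, one per-train structure with free seats tracked along the route and a per-train lock, can be sketched as follows. This is an assumption-laden simplification: it uses a Python list of legs (stop i to stop i+1) instead of a linked list of stops, the class and method names are invented, and replication, the WAL, and persistence are left out entirely.

```python
import threading


class TrainSeats:
    """Free seat ids per leg of one train, guarded by one lock per train."""

    def __init__(self, stops, seat_ids):
        self.stops = list(stops)
        # free[i] holds the seats still free on the leg stops[i] -> stops[i + 1].
        self.free = [set(seat_ids) for _ in self.stops[:-1]]
        self.lock = threading.Lock()

    def _legs(self, source, destination):
        i, j = self.stops.index(source), self.stops.index(destination)
        if i >= j:
            raise ValueError("destination must come after source")
        return range(i, j)

    def available(self, source, destination):
        """Seats free on every leg between source and destination."""
        legs = self._legs(source, destination)
        return set.intersection(*(self.free[k] for k in legs))

    def book(self, source, destination, count):
        """Reserve `count` seats on all legs, or nothing if too few are free."""
        with self.lock:
            seats = sorted(self.available(source, destination))[:count]
            if len(seats) < count:
                return []
            for k in self._legs(source, destination):
                self.free[k] -= set(seats)
            return seats


train = TrainSeats(["A", "B", "C", "D"], seat_ids=[1, 2])
print(train.book("B", "C", 1))
print(train.available("A", "D"))
print(train.available("A", "B"))
```

Locking per train keeps bookings on different trains from blocking each other, which matches the sharding-by-train-id suggestion in the comment.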
Nice video, learned a lot. For parallel seat booking: whenever the 1st user clicks on a particular seat, it shows as filled/booked for other customers. If the 1st user is not able to book that seat within the given amount of time, it becomes available for booking to all customers again.
Another approach can be to store all the stop points for every train route-wise, so when you search for a train between two stations you can just compare the first row's generated id with the other row's generated id. If the first one is less than the second one, we can say there is a route from A to B.
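The temporary hold described above can be sketched as a small in-memory structure with an expiry time per seat. This is a single-process illustration only: the class name, the 300-second window, and the seat and user ids are assumptions, and a real deployment would need a shared store plus locking around these checks.

```python
import time


class SeatHolds:
    """Temporary seat holds that lapse after `hold_seconds`."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock  # injectable so the expiry can be simulated
        self.holds = {}   # seat_id -> (user_id, expires_at)
        self.booked = {}  # seat_id -> user_id

    def hold(self, seat_id, user_id):
        """Place or refresh a hold; fails if someone else holds or booked it."""
        now = self.clock()
        if seat_id in self.booked:
            return False
        holder = self.holds.get(seat_id)
        if holder and holder[1] > now and holder[0] != user_id:
            return False
        self.holds[seat_id] = (user_id, now + self.hold_seconds)
        return True

    def confirm(self, seat_id, user_id):
        """Turn an unexpired hold owned by `user_id` into a booking."""
        holder = self.holds.get(seat_id)
        if not holder or holder[0] != user_id or holder[1] <= self.clock():
            return False
        del self.holds[seat_id]
        self.booked[seat_id] = user_id
        return True


# Simulated clock so the timeout can be shown without sleeping.
now = [0.0]
holds = SeatHolds(hold_seconds=300, clock=lambda: now[0])
print(holds.hold("12A", "user1"))  # first user grabs the seat
print(holds.hold("12A", "user2"))  # second user sees it as taken
now[0] = 301.0                     # first user's window lapses
print(holds.hold("12A", "user2"))  # seat is available again
```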
In place of optimistic locks or pessimistic locks ..what if there is lock on minimum values ..For example if in a Set of source to destination we will find the minimum value of number of seats and put those as locking criteria .This will make system available more. And at same time more users will be able to book tickets...right
I don't know why no one has noticed, but to get trains from a source to a destination we can simply run a join query. Query: [rows that have source A] X (join on train_id) [rows that have destination B]. You'll get the trains between those stations and then you can sort by time.
While searching for trains going from a particular source to a destination, since we are using a relational database, can't we do a self join on this table where (trainId of tableA = trainId of tableB) and (source of tableA = sourceProvidedByUser) and (destination of tableB = destinationProvidedByUser)? Based on this we'll simply get the date and time for every train going from source to destination on that particular day, and we can sort them using an ORDER BY clause instead of using a graph-based algorithm, which would make the implementation more complex.
I think we can use a NoSQL DB to store the trains, with the train id as the key and an array-like data structure as the value for storing the route. Every train has its route with the arrival and departure time at every station. If we have to search by source and destination, we can check which train ids have both the source and the destination in their route and return only those trains.
Locking for the transaction and booking a seat is difficult practically. That is the main reason why IRCTC came up with WAITING LIST feature.. which actually made the system more flexible, easy and most importantly profitable :)
I would have opted for RDBMS, preferably Postgres to store train schedules. As rightly mentioned by @Gaurav, it's cheaper than NoSql and the records are meant to be read-heavy but instead keeping multiple records for one train, I think it would be better to keep only one record per train (15000 records) and keep schedule as json in a jsonb column, coz in general, trains do not change but their schedule are likely to change a lot, also the train can have different schedule for each day. Please let me know if someone agrees or have some other opinion. Thanks 😊
I wouldn't hire Gaurav based on this interview :) :) It's so confusing, complete system architecture has been skipped as well. It ended up being data model design and not really an application or system design.
Dbms part is very much confusing .. just define stations.. then routes .. then schedules.. While multiple people is trying to book locking may not be the solution. You have to check compensating transaction design pattern .. if 10 people is trying to book ticket if some one booked successfully then revert all other transactions
Definitely IRCTC is very complex system, very good interview. We must have discussed user input throughly i.e. From & To Station start + Date + Class(Optional) before designing data base etc.
My query would be focused on source and destination (that's a separate table) and join that with the table containing the schedule, with train id being the join key (for RDBMS). Select train_id from tab_from_to where Source=x and destination=y; Select scheduled from tab_schedule where train_id in (Select train_id from tab_from_to where Source=x and destination=y)
Instead of locking entire rows for booking, how about having different rows per seat with source and destination as columns and lock that row? One thing that will be needed is if one books from A->C and train goes from A->Z, then there should be another row added for C->Z. Otherwise, we can add one entry for each {source, destination} combination per ticket. This might give better parallelism.
Even though this video is from 2 years ago, I have one doubt about the selection of a SQL DB for storing train schedules. For a train with 11 stations, the number of entries in the DB will be 55. For 51 stations the number of entries will be 25*51. With that, the entire assumption changes: the DB becomes about 25 times larger, which corresponds to 2GB of data.
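The counts quoted above match "one row per ordered pair of stops", which is n choose 2 for a train with n stations. A quick check, assuming that is the storage scheme the comment has in mind (the function names are made up):

```python
from math import comb


def pair_rows(stations):
    """Rows needed when every (earlier stop, later stop) pair is stored."""
    return comb(stations, 2)


def adjacent_rows(stations):
    """Rows needed when only consecutive stops are stored."""
    return stations - 1


for n in (11, 51):
    print(n, "stations:", pair_rows(n), "pair rows vs", adjacent_rows(n), "adjacent rows")

print("growth factor:", round(pair_rows(51) / pair_rows(11), 1))
```

Going from 11 to 51 stations multiplies the pair rows by about 23, a little under the 25x quoted, while storing only adjacent legs grows linearly.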
Great video. I had a question, will Prime music and other music streaming services be similar to say 'Netflix' as they are just streaming Music instead of videos?
in an IRCTC schedule LLD i was given to design this table.. schedule/stops.. so it was fun seeing gaurav struggling here.. :D but maintaining seats station was a great usecase
This was a wonderful mock interview. I like how Gaurav approached it, and mainly the fact that he kept the focus on data. I would have been sidetracked by some other components as well, like what microservices would be there (search service, booking service, a way to add availability every day, caching the search results etc.). I do feel that the data could have been modelled better in a graph DB. Nodes could be stations, and edges could be lists of trip objects connecting two stations. Doing a BFS to find a path would be easier on a graph, I guess. Booking and availability could still be in MySQL to ensure concurrency. My 2 cents about asking the interviewer for estimates: it depends on the seniority level you are targeting. For junior levels, you could ask what the requirements are, and the interviewer would reply "search and booking". For senior levels, you would come up with them yourself and think about what the system is supposed to do. The same goes for estimates, where you are guessing the non-functional requirements. Just my thoughts. Thanks for doing it!
That's very interactive and good take away to learn. It would be really nice, If we prepare in depth and then come up with a video. Though we will miss the excitement of Gaurav. Nice to watch the interaction and discussion.
To search for trains that are going from a source to a destination within a particular time window, we can create a self join. Take the first copy of the table as the source and the second as the destination and run the query. The query will look like this: select t1.* from train t1 inner join train t2 on t1.trainId = t2.trainId where t1.source = sourceStation and t2.destination = destinationStation and t1.edt > startTime and t2.eat < endTime;
This was a really great video on the system design interview. I watched it in segments over a span of a week. My overall understanding of how to approach system design has improved a lot. Thanks for taking the effort to make such videos and thanks to Gaurav for explaining such insightful points.
Great interview discussion since it has a lot of things to look at during interview. Few things I think like seats instead of counter why not keep individual rows in the row since every seat will have status and other details. Contention can also be controlled on the row only instead of in counter case. Also for trains I believe why to keep A->B and B->C instead why not keep an entry like 1. train_id, A, ETA 2. train_id, B, ETA 3. train_id, C, ETA and then query simply the trains in memory travelling to A and C and since the data will be less in memory can simply run a loop to check ETA_A < ETA_C and return. Something that sort of.
Nice! - So - if we order by ETA, and then group by trainId - that way we can know all trains that have a SRC and a DEST both In their routes - using group_concat and a like clause in the query
Hi, when do we discuss about just the database schema design or the high level system architecture? It looks impossible to cover both in just one hour.
Can I have video on 1. How to handle idempotency in consumer for an event if there are too much load of events so that there will not be much load on database? 2. How does payment service work when payment gateway such as phonepe screen times out while payment? In general, how payment service works with payment gateways?
Don't think IRCTC allows to select a specific seat. So if 10 seats are there it will allocate seats in first come first serve basis. What could be the design for this specific feature?
Please correct me if my suggestion is wrong but for storing lets say we are keeping it in mysql db for acid prod and consistency and for searching we can have a ds like graphs for traversals so our search query boils down to travel from source to destination and we can apply graph traversal algorithms on the graph ds. Also we can pre process it and store it in cache or something so that it can be available to users in real time without any latency delay. Similar preprocesses can be done for prices low to high, or less durations [can use heaps as a DS and again preprocessing and storing it in cache]. Would love to hear your suggestions on this.
This clearly shows no matter whether you have designed tons of system design videos, given a complex system in one hour window would make anybody nervous. He didn't even go close to what the system should be. Train search service didn't work, was just able to dodge. Even keeping the system without particular seat arrangement he was not able to successfully book the seats. The entire discussion was around databases which also didn't work out. No architecture diagram, no scalability talk, no fault tolerance. Was it really a discussion which can land a job honestly?
lock, so called strong read/write/transaction, are generally provided by any production ready storage solution nowadays. what you have described here for implementing lock should be handled by database API.
But we never got a proper solution for the search of a train query. How do we do that in an optimized fashion without worrying about creating a graph for every query???
great content Keerti but i think there is some issue with how gaurav is calculating and saving the seat no.s at 46:00 where we can actually accommodate 4 people i.e. 1 from a to b, 1 from b toc , 1 from c to d and 1 from a to d ( 2 people travelling at any given time), his algo won't actually allow to do this as count of b and c are already 0.
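The scenario above works out if availability is counted per leg between consecutive stops rather than per station. The sketch below is one way to model that, with an assumed capacity of 2 and invented class and method names; it is not the algorithm from the video.

```python
class LegCounters:
    """Free-seat counters per leg (stop i -> stop i + 1) of one train."""

    def __init__(self, stops, capacity):
        self.stops = list(stops)
        self.free = [capacity] * (len(self.stops) - 1)

    def _legs(self, source, destination):
        i, j = self.stops.index(source), self.stops.index(destination)
        if i >= j:
            raise ValueError("destination must come after source")
        return range(i, j)

    def available(self, source, destination):
        """A journey can only use as many seats as its tightest leg has free."""
        return min(self.free[k] for k in self._legs(source, destination))

    def book(self, source, destination):
        if self.available(source, destination) < 1:
            return False
        for k in self._legs(source, destination):
            self.free[k] -= 1
        return True


train = LegCounters(["a", "b", "c", "d"], capacity=2)
results = [
    train.book("a", "d"),  # occupies one seat on every leg
    train.book("a", "b"),
    train.book("b", "c"),
    train.book("c", "d"),
]
print(results, train.free)
```

All four bookings fit because at most two passengers are on board on any leg. Taking the minimum over the legs is also the "lock on minimum values" criterion suggested in another comment.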
Are we storing the EDT EAT for all the trains for next 3 months? Doesn't the route remain same on a weekly basis.. that way we would be able to save a lot on space?
I am studying this in sept 2023 , this is the first video for me to study system design , anyways I didn't get much but I get to know how these big sites are being made on this much high scale
Checkout more beginner friendly videos on my channel, for example- there are videos where I am teaching my father what is system design. I really think you will find them interesting. Let me know!
Can we decouple the source and destination(two different tables). First we can get a list of trains which have the requested Source on the date then have a join query to find out which all trains from these have destination. On destination we can index the data on train id and in source I agree with Gaurav it should be on departure date. We may not bother about arrival time on destination for our query.
while querying for source to the destination we could have queried with matching source and then queried destination whose arrival time greater than equal to source departure time in the previous query because if the arrival time of destination of a particular train is greater than source departure time it is certain that there is a route from source to destination. And if the user wants a route that can be separately queried as I don't think there would be a lot of queries for that.
I think having just one column named 'STOP' instead of 'Source' and 'Destination' and other columns like 'Duration of time to reach a stop from source' and 'Sequence number of stop' could have solved many complications. I am not too sure, but I think so.
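A single `stop` column with a sequence number and a minutes-from-origin column can be queried with one self-join. The sketch below uses in-memory SQLite; the `train_stops` table, its column names, and the sample data are assumptions, and a train's return run is modelled as a separate train id so that sequence numbers stay unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE train_stops ("
    "train_id TEXT, stop TEXT, stop_seq INTEGER, minutes_from_origin INTEGER)"
)
conn.executemany(
    "INSERT INTO train_stops VALUES (?, ?, ?, ?)",
    [
        ("T1", "A", 0, 0), ("T1", "B", 1, 60), ("T1", "C", 2, 150),
        # T2 is the reverse run, stored as its own train.
        ("T2", "C", 0, 0), ("T2", "B", 1, 90), ("T2", "A", 2, 150),
    ],
)


def search(source, destination):
    """Trains where `source` comes before `destination`, with travel time."""
    return conn.execute(
        """
        SELECT s.train_id,
               d.minutes_from_origin - s.minutes_from_origin AS travel_minutes
        FROM train_stops AS s
        JOIN train_stops AS d ON d.train_id = s.train_id
        WHERE s.stop = ? AND d.stop = ? AND s.stop_seq < d.stop_seq
        ORDER BY travel_minutes
        """,
        (source, destination),
    ).fetchall()


print(search("B", "C"))
print(search("C", "A"))
```

Comparing sequence numbers answers "is there a route" without any graph traversal, and the duration column gives the travel time by subtraction.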
I think the best way to handle concurrency is maybe to use queues. We can maybe have a queue for each train/couple of trains and only one instance will read the data from that queue.. So now all the requests coming to one train will be handled by that instance and we can completely avoid locks and thus improve the performance a lot on a scale
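The queue-per-train idea above can be sketched with the standard library: all requests for a train go through one queue, and a single worker thread owns the seat count, so no lock is needed around the counter itself. The class name and the reply-queue mechanism are illustrative assumptions; in a distributed setup the queue would be a message broker partitioned by train id.

```python
import queue
import threading


class TrainWorker:
    """Serialises all bookings for one train through a single consumer."""

    def __init__(self, seats):
        self.seats = seats
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Only this thread ever touches self.seats while the worker is running.
        while True:
            item = self.requests.get()
            if item is None:  # shutdown signal
                break
            count, reply = item
            if count <= self.seats:
                self.seats -= count
                reply.put(True)
            else:
                reply.put(False)

    def book(self, count):
        """Enqueue a request and wait for the worker's answer."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((count, reply))
        return reply.get()

    def stop(self):
        self.requests.put(None)
        self.thread.join()


worker = TrainWorker(seats=3)
outcomes = []
callers = [
    threading.Thread(target=lambda: outcomes.append(worker.book(1)))
    for _ in range(5)
]
for t in callers:
    t.start()
for t in callers:
    t.join()
worker.stop()
print(sorted(outcomes), worker.seats)
```

Five concurrent callers compete for three seats and exactly three succeed. The trade-off is that one hot train is limited by the throughput of its single consumer.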
I am not much clear on the db structure like how to store all the stop's info for a train and how to search on it efficiently. Gaurav you tried your best to explain the concepts. You are the best. I would request you to come up with one more video where we will only discuss about the database schema
Wow, system design seems very interesting topic with so many small problems to solve that beautifully come together as one system. Loved the thought process by Gaurav and the feedback later 👍
System Design questions should actually be this hard then people won't speak out exact words they read somewhere with poker face on and will actually have to come up with solutions based on their experience.
Great quality content, didn't think of any other thing while watching neither did feel to take a break to understand, the pace of video and overall thought process is very gradual. Nice job Gaurav and Keerti.
This is the first video I have seen about system design, and while watching it I went through all the comments. Your replies to them were funny. You were very fascinating throughout the video.
In a booking system do we really care about the schedule of the trains? I would think having the DB to be more centric to "stations" would make the queries much simpler. So each "station" will have a list of arriving trains and their times.
When a user searches for source->destination, we would get all the trains arriving at the source and all the trains arriving at the destination, and return the train IDs common to both tables (when arrival time at destination > arrival time at source). But perhaps we are using more space this way.
Instead of arrival time and departure time, we can give numbers to train stops. The origin station is 0, the next station is 1, and we increment by 1 for each following stop.
So benefit would be we can use this TrainStop in query to find out if sourceStop
@@gk296 But doesn't that run into the same problem of recalculating paths from source to destination every time we have a query?
Since trains can go back as well, you'd have to calculate this path every time.
I believe if we keep the destination as last destination like D for all Source like(A,B,C) location.
Now we have entry from starting point for e.g A as source and D as Destination.
Then next row will B station after A as source and D as Destination and so on like next Station as C and destination D.
Now, if the destination for the traveler is not D, then our search will be in the Source column only, and it will be much easier as source and destination will both be in the same column, i.e. Source.
We do a search in that column and once get the source and destination. Then subtract Destination start time i.e. source column e.g C location in source minus A or B as Start location.
Please share your thoughts
This interview is way better than other interviews on youtube because over here Gaurav is really thinking off the bat. You can see him struggling with issues and eventually coming up with solutions. Also kudos to Keerti for making very valid points wherever needed.
Thanks Diptanshu! Means a lot❤️❤️😇😇
Agreed
I can feel Gaurav's nervousness. I like the comments you attached wherever he made a mistake. I love it. Great creator.
Thanks Alok, means a lot😇😇
@@KeertiPurswani One more compliment: I loved your paintings 🎉. All are awesome 😍. Btw Happy Teachers Day Gaurav Bhaiya and Keerti ❤️
Today I get to know the Importance of Data Structures and DBMS that I am learning in my undergrad.
And that’s one of the best comments I got on this video😇
This is the first “system design” interview I’ve watched. I’m not even familiar with a lot of terms mentioned by Gaurav but it was absolutely amazing to watch the thought process! I’m surprised how similar system design is to solving a case as a consultant. The entire structured thought process and planning gives a wholesome experience to both the interviewer and the interviewee. Good work!
So glad you liked the video Shivangi, hoping it will motivate you to know more about system design. You can checkout our channels for the same😊😊
Definitely will! :)
@@KeertiPurswani
At what package he got selected
I think Gaurav was not prepared for the interview or he wasn't aware of the IRCTC application. I have been following him for the past 3 years, and this is the first time I have seen Gaurav struggling for a solution. Meanwhile Keerti was well prepared for her panel..
Who's the guy in the red shirt?
No idea but he said he has taken Gaurav Sen course so probably a good guy.
@Cinsad, wish I had the talent of coming up with replies like you🤭🤭
@@cinsad8023 hahaha 😁😛
He's Rachit Jain
@gaurav hiding behind the wall
Just an FYI, in System Design Interviews, you shouldn't be asked to design a 'system'. The best way to approach an interview is to ask the interviewer what we need to design, rather than just throwing everything around. Design interviews don't have a question along the lines of 'need to design IRCTC' - but rather, say, design a train booking system, where we can do X, Y and Z etc. And then further collaboration leads to figuring out more requirements.
umm no - system design interviews are not only about "system design" but also about how the candidate is able to take an open ended question and deal with the ambiguity , break down the problem into parts , negotiate which ones are important and then come up with a solution. May be for SDE 1 and 2 your approach is right but for higher level roles there has to be some ambiguity baked in the problem
@@shrikantkhadilkar4019 yeah - and here are two SDE2's trying to show what a system design interview looks like for an L6 IC. That's my point saying why it's "ridiculous". This isn't the best way to conduct design interviews, and anyone with enough decent interview experience knows that.
@@thealphaking816 meh
people watching this video to switch to good product based company
but fun fact is IRCTC designed by TCS❤
That's a wrong fact. IRCTC was developed by Centre for Railway Information Systems (CRIS). They are the maintainers as well!
Come on, you liar.
@@manishbhatt1101 I got to know that from wiki
go to wiki page then go to service then you can see
even passport india developed by TCS
@@Amritanjali That would be incorrect in that case. Moreover, anyone could have written that section of their wiki page.
I know Passport Seva was developed by TCS for sure (it is there right at the bottom of its home page 😛)
PS: I wonder if YouTube was designed by TCS too. They are flagging my comment for no reason.
@@manishbhatt1101 lol
In IRCTC, if there is no direct train between two stations then it tells me no direct trains found, unlike the case of flights, that leads me to thinking the following possibilities of the database.
- The one which Gaurav pointed out as a brute force, where he first filters out the sources and then for those rows, filters out the destinations.
- Having two more tables, one called arrivals and one called departures, whose columns are the station (of arrival or departure), the time, and the train id. Intermediate stations exist in both tables, and we query each table and take the common trains. The disadvantage is that in the worst case there are 2 entries for each train and station pair (one in each table). Assuming 50 stations per train journey and around 13 thousand trips per day, that is 2 * 50 * 13k = 1.3 million records. Assuming each record is 24 bytes (4 chars for the station at 2 bytes each = 8 bytes, for station codes like SBC, SMET, NDLS and so on, plus 8 bytes for the date and 8 bytes for the id), that is roughly 1.5 million * 25 bytes = 37.5 MB, or about 0.04 GB.
I do not think it is graph db in IRCTC, because then it would allow to connect destinations which do not have a direct train.
Also all of these assumptions ( also in the video ) might be for a single day right? In a real case we must multiply them by 130 for the 3 month future right? Am I missing something?
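Working through the arithmetic with the comment's own assumptions (50 stops per trip, 13 thousand trips per day, 24 bytes per record, 130 days of future schedules) gives the figures below. All constants are taken from the comment and are estimates, not measured values.

```python
# Back-of-the-envelope storage estimate; every constant is an assumption from the comment.
TRIPS_PER_DAY = 13_000
STOPS_PER_TRIP = 50
TABLES = 2                    # one arrivals row and one departures row per stop
BYTES_PER_RECORD = 8 + 8 + 8  # station code + date/time + train id
DAYS_AHEAD = 130

records_per_day = TABLES * STOPS_PER_TRIP * TRIPS_PER_DAY
bytes_per_day = records_per_day * BYTES_PER_RECORD
total_bytes = bytes_per_day * DAYS_AHEAD

print(f"{records_per_day:,} records per day")
print(f"{bytes_per_day / 1e6:.1f} MB per day")
print(f"{total_bytes / 1e9:.2f} GB for {DAYS_AHEAD} days")
```

Under these assumptions a single day is about 31 MB and the full window is about 4 GB, before indexes and row overhead.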
Better way to store the schedules would be storing all stations(st) for a particular schedule(one way full run of train). That way you just have to join the table to itself and query would be something like (select * where station is source) as t1 join (select * where station is destination) as t2 on t1.scheduleId = t2.scheduleId and t2.arrival > t1.departure
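A minimal sketch of that self join, using Python's built-in `sqlite3` module. The table name `schedule_stop`, its columns, and the sample rows are assumptions made up for illustration; timestamps are stored as ISO strings so that string comparison matches time order.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schedule_stop ("
        " schedule_id INTEGER, station TEXT, arrival TEXT, departure TEXT)"
    )
    rows = [
        # schedule 1 runs A -> B -> C -> D
        (1, "A", None, "2021-10-10 08:00"),
        (1, "B", "2021-10-10 09:00", "2021-10-10 09:05"),
        (1, "C", "2021-10-10 10:00", "2021-10-10 10:05"),
        (1, "D", "2021-10-10 11:00", None),
        # schedule 2 is the return run D -> C -> B -> A
        (2, "D", None, "2021-10-10 12:00"),
        (2, "C", "2021-10-10 13:00", "2021-10-10 13:05"),
        (2, "B", "2021-10-10 14:00", "2021-10-10 14:05"),
        (2, "A", "2021-10-10 15:00", None),
    ]
    conn.executemany("INSERT INTO schedule_stop VALUES (?, ?, ?, ?)", rows)
    return conn


def find_trips(conn, source, destination):
    """Schedules that stop at both stations and reach the destination later."""
    query = (
        "SELECT t1.schedule_id, t1.departure, t2.arrival"
        " FROM schedule_stop AS t1"
        " JOIN schedule_stop AS t2 ON t1.schedule_id = t2.schedule_id"
        " WHERE t1.station = ? AND t2.station = ?"
        "   AND t2.arrival > t1.departure"
        " ORDER BY t1.departure"
    )
    return conn.execute(query, (source, destination)).fetchall()
```

The time comparison is what filters out the return run: schedule 2 also stops at both B and D, but it reaches D before B, so it does not match a B to D search.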
I really loved this video! One of the best system design videos I have seen. Rather than watching videos where the design is already prepared and just explained to the viewers, this format helps us to know how it is to be a part of a real interview and the problem chosen is very complex of course. I kept on thinking how I would approach the problem while watching it. Great content ! I have a newly found respect for IRCTC as well !
Thanks Anirudh. Means so much!😇😇
Great video, and awesome explanation by Gaurav Also, the interviewer helped him ease the process.
Some Possible design improvements/suggestions for someone looking in future interview preparation for System design.
11:25 We can provide an option to chose window or non-window seats. (And Pick consecutive seats, not random seats(general human requirement))
17:06. Exact scale as of 2021.
Indian Railways is among the world's largest rail networks, and its route length network is spread over 67,956 km, with 13,169 passenger trains and 8,479 freight trains, plying 23 million travelers and 3 million tonnes (MT) of freight daily from 7,349 stations.
31:55 Gaurav's design is great but here is one thing that also can be done maybe it is not that good a design as proposed by him.
There are 7,325 stations all over India. So better not focus on time focus on 7days * (7325(stations) * 7325) entries. that is assuming that from source x to source y there may be roughly around 100 stops for a train in the worst case[Amritsar Express]. (for most of the trains they are limited to 10 to 20). In my approximate calculation for all entries if we want to go for storing all the entries in a single table then there will be a total of around 10 Million entries and querying on a date for a journey is much simpler rather than binary searching over time indices.
This method is easily extensible on graph databases too.
The follow-up operation of locking is also not much costly if for all 7349 sources we have separate normalized tables and locking those rows for all the stations involved in a journey is very less close to ~ (20-100)
Some things that can be addressed or enhanced in the system.
1. Audit queries
2. Tatkal Mode.
3. Refund system
Follow-up design for Tatkal: reserve 10% of the seats in each category so that they cannot be booked until 11 AM on the day before the train's journey starts.
And then separately follow the same design/locking structure over the new table.
Regarding separate categories,only the cost will vary as these seats will have the same journey schedule routes locking etc. so add an extra column cost in the tables. To make life more simple.
Thank You! Pardon if some information is incorrect above.
To search for the trains between destination A to B … how about we store all the trains coming at destination A , retrieve it then get all the trains reaching at destination B and then take their intersection.we will get all the train from source A to destination B.
For the query to get train information/trains available for a particular day -
1. Assume we are going to show availability for the next 2 months
2. Keep a train's schedule info for the next 2 months as a JSON structure in an ES-like store
3. Model the day in epoch format - it becomes easy to search
4. Get all trains running between station A and B on day 10-10-2021 (consider that we only show stations that are exactly A/B, not nearby ones)
a. Get all trains that have a stoppage at station A - then filter out trains that don't stop at station B
b. Consider trainId, runDate from the results of the previous step
Search is read-intensive work, so it doesn't need an ACID-compliant DB.
Or an even smarter way would be to maintain a cron expression for each train id that says which dates the train runs on. This way, once we get the trains that have a stoppage at station A/B, we can easily evaluate the run date against the cron expression - we really don't need a B-tree structure here
I think handling the routes via a graph table like Neo4j could be an approach where you map one route out with multiple nodes and then have a in memory reference to all the times a train can be at a particular route.
EXACTLY, THAT'S WHAT I WAS THINKING AS WELL.
Great efforts by both 👍 this was indeed a complex problem. I think using a relational DB makes it more intuitive to design. We can use a routes table with route id, source, destination. Each train has multiple routes and each route can have multiple trains, so we simply create a mapping table. Although the number of rows will be higher, search will be faster with proper indexes. This also helps with booking: we can simply preallocate certain seats to each route for a particular train id (although not very fair, it is easier for giving a working solution). For concurrency we have a few options. We could assign workers statically, but this won't work for a variable number of workers. So we can simply set the locking mode to row level and read and update the count within a transaction. Again, the route mapping helps here as we have a single row to handle. Assigning actual seats can be handled later, programmatically, based on preference or randomly, so there is no need to consider that in the DB. Let me know your thoughts on this.
At 26:00 minutes , for Search Train query, I believe there is no need for BFS, you just found some rows where B is source and other rows where D is destination, After this you only need to check matching Train Ids ( natural join or something) and timing. Let me know if I am missing something.
I think you are right,
Let's say the table [trains] has following columns only id,src,dest (ignore departure and arrival time)
then the query will be,
SELECT T1.id, T1.src, T2.dest FROM trains AS T1 INNER JOIN trains AS T2 ON T1.id = T2.id WHERE T1.src = 'A' AND T2.dest = 'D';
You guys missed a very crucial point here. Assume you have to travel from Delhi to Jaipur and someone else has to travel from Alwar to Dausa (these stations fall on the Delhi to Jaipur route). The proposed table will never have a source-to-destination entry; rather, the entries will be Stop 1 to Stop 2, then Stop 2 to Stop 3, and so on. So a source-to-destination match can never happen unless you do a DFS or some sort of traversal. One way I am thinking of, which will obviously increase the storage complexity, is to have a map as the value. You store all possible combinations, Stop1: [Stop2 ... StopN], and similarly for the same train Stop2: [Stop3 ... StopN]. I know it's not the optimal way, but it can be the quickest way to answer a query like "give me all trains between Mumbai and Gujarat for tomorrow".
Wonderful guys. Good work. Thanks!
One of the approaches to find the available trains for a given source and destination could be (in a single query):
(An example to find trains for 'A' to 'C')
with a as (select train, src, count(src) from route where src = 'A' group by train, src
union
select train, dest, count(dest) from route where dest = 'C' group by train, dest)
select a.train from a group by a.train having count(a.train) > 1
Other Auxiliary option perhaps could be to store the Trains and their stations in another pre-computed table where stations can be a simple 'Array' type and query that first and get the trains in a table backed by a simple LRU cache.
The assumption is the relationship between the trains and stations hardly change and therefore precomputing can bring a good gain.
Almost all interviews are conducted online now using some drawing tool, which means no access to the whiteboard. Putting thoughts down and drawing diagrams is a lot faster/easier on a whiteboard, resulting in effective communication & able to cover a lot more things in the 40-45mins interview.
I suggest all future interviews in this channel to use a drawing tool rather than a whiteboard to make it close to real-interviews.
I found your channel through LinkedIn, and I would definitely say this is one of the best channels I have found so far. Here we find CP people being interviewed, system design people, and so on. And the good thing is that we actually get a feel for how an interview happens, because we don't know what the interviewer will be like or how we should talk to them. Everything is on your channel. I truly appreciate your work.
It's my first time seeing the System Design Interview. Didn't consider this as important part earlier. But seems like designing Consumer centric applications are way harder than designing enterprise applications.
One of the most challenging problems for IRCTC is managing Tatkal bookings from millions of users simultaneously. Possible approaches imo
1. Sharded Booking Counter: sharded RDBMS setup to ensure strong consistency, may impact overall availability though.
2. Leaderless Replication-Based Data Store: Utilize a CRDT-based system like Redis, though this might result in occasional overbookings or underbookings due to its eventual consistency model.
@29:05 why not get all the trains which start from staion B and separately get all the trains which ends in station D and do an inner join between these two data sets where the key will be train id
Concurrency handling for booking a seat in IRCTC case is really tough. Wish they publish some white papers (don't know if already there) on it?
Fckin hell. This interview gave me so much idea about system design and also the importance of DSA upto certain point. Thank y'all both. Subscribed to your channel btw.
Thank you so much Sagar, means a lot😇😇
@44:35 in the route table even though no of seats available for destination A and D =1 he/she won't be able to book the seat bcz intermediate stations B and C has no of seats 0. then what is the significance of having available seats of A and B!!!
I wonder if DB will scale if you do pessimistic locking while booking the tickets. Most probably, you will have to do optimistic locking, the first one wins and subsequent ones will fail and they will have to retry. Optimistic locking will increase the scalability.
Yup, we should use optimistic locking. I think this is how it is handled in e-commerce systems also.
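A minimal sketch of optimistic locking with a version column, using the built-in `sqlite3` module. The table `seat_count` and its columns are hypothetical. The update only succeeds if the row still has the version the client read, so the first writer wins and the second has to re-read and retry.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seat_count ("
        " train_id INTEGER PRIMARY KEY, available INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO seat_count VALUES (1, 1, 0)")  # one seat left
    conn.commit()
    return conn


def read_state(conn, train_id):
    """Return (available, version) as seen by a client before it tries to book."""
    return conn.execute(
        "SELECT available, version FROM seat_count WHERE train_id = ?",
        (train_id,),
    ).fetchone()


def try_book(conn, train_id, seen_version):
    """Compare-and-set: book only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE seat_count"
        " SET available = available - 1, version = version + 1"
        " WHERE train_id = ? AND version = ? AND available > 0",
        (train_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means we lost the race
```

No row is locked while the user is filling in details or paying; the conflict is only detected at write time, which is why this tends to scale better than holding a pessimistic lock for the whole booking flow.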
I instantly got an idea: once we have sorted the rows by ETA and EDA, we get the ids of all the trains that have our preferred starting point in any of their source entries, and then we check whether the same trains from the first query have our preferred destination in their destination column.
Whatever we get here should be the answer.
I think if we maintain a map having the place name as key and list of trains running thru that place as values as at one point of time one place will be searched. When user clicks on search we have the set of trains as those will be common between 2 maps. To fetch the details of time, days train runs, vacant seats can be found from different db. Also if think passenger trains those don't run more than a day.
Do we really need graph DB? Let Admin create Trip where trip contains sequence of stops and associated train. When user search by entering from and to stops, get the list of trips passing through to and from stops. Intersection of that provides list of trips.
I literally love the way you clean my dirty screen at the beginning of every video so that I can see and understand clearly.
Much love, Thanks.
0:00
😂😂😂😂 your welcome. This is the only type of cleaning I am good at🤭✌️
Haha! Happy Diwali (cleaning) ♥️
This will probably be the toughest HLD I have ever seen. Excellent choice of question !
Hehe! Thanks Fakrudeen! 😇😇
I think in Schedule table, it should list stations instead of subtrip between adjacent nodes/stations with estimated arrival and departure time. That way we can build a query with having destination station should have higher ETA than that of ETD of source station.
Later realised he fixed his approach and used the same way I suggested 😛
For the search query, we can find the source as discussed in the video and for the destination, we can search the database for an entry having the same train ID and the requested destination as destination with expected arrival time > EDT of the source and EAT < end time. If multiple entries exist with EAT < end time for the same train ID, take the one with the lowest EAT
For booking,
we can do it in memory
for each train we can have in memory linked list. Each node in the linked list would be stop where it is sorted from source to destination.
we will also have the seats available corresponding to each node in this linked list.
So if the query asks for available seats from source to destination, we query the linked list nodes and give the result back to the UI. The seat ids that are available in all the nodes between source and destination are returned. The complexity would be (number of stops * number of seats requested).
Once we go ahead and book the seats, we take the lock on the linked list and remove the taken seats from the DS.
One problem with this approach is coordinating the in-memory data structure across the different servers that receive the same request. For this we can have a quorum dedicated to this purpose, where the master node serves the availability and booking requests and the followers read the WAL to follow the master's updates to this linked list. If we want high consistency, we can avoid split brain and only commit the booking once the linked list update has reached a majority of the followers. We can also have X such quorums and shard the train ids across them. While updating, we only take the lock on the linked list being updated, i.e. the one corresponding to the train.
If the master fails, the follower which is consistent with master would become the leader (For this we would need RAFT like consensus protocol). Doing it all in memory would decrease the latency
For each update in the linked list, we need to update the persistent store i.e SQL in this case which would be bookingid, seatId, trainId, source, destination.
If the complete data center goes down we can build the in memory store from the SQL
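The single-node part of this idea can be sketched as below. It is an illustration only: it uses an ordered Python list instead of a hand-written linked list, treats the seats stored at a stop as the seats free on the leg leaving that stop, and leaves out replication, the WAL and the SQL write-through. The class name `TrainSeats` is hypothetical.

```python
import threading


class TrainSeats:
    """Free seat ids per leg of one train, guarded by one lock per train."""

    def __init__(self, stops, seat_ids):
        self.stops = list(stops)
        # free[k] holds the seats free on the leg from stops[k] to stops[k + 1]
        self.free = [set(seat_ids) for _ in self.stops[:-1]]
        self.lock = threading.Lock()

    def _legs(self, source, destination):
        i, j = self.stops.index(source), self.stops.index(destination)
        if i >= j:
            raise ValueError("destination must come after source on this route")
        return range(i, j)

    def available(self, source, destination):
        """Seats that are free on every leg between source and destination."""
        legs = self._legs(source, destination)
        return set.intersection(*(self.free[k] for k in legs))

    def book(self, source, destination, count=1):
        """Reserve `count` seats on all legs, or return None if not possible."""
        with self.lock:
            common = sorted(self.available(source, destination))
            if len(common) < count:
                return None
            chosen = common[:count]
            for k in self._legs(source, destination):
                self.free[k].difference_update(chosen)
            return chosen
```

For example, on a route A, B, C, D with two seats, a booking from A to C and another from B to D use different seats, and the seat freed at C can still be sold from C to D.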
Many times i thought, ask gaurav to make video on IRCTC System Design...... But today he did it.
Thanks Gaurav 👍
nice video. learned a lot
for parallel seat booking, whenever 1st user click on perticular seat, then it shows filled/booked for other customer. If 1st user can't be able to book that seat on given amount of time then it is available for booking to all customers.
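A minimal sketch of such a temporary seat hold, assuming a single process and an in-memory store. `SeatHolds` and its methods are hypothetical names; the clock is injectable so that expiry can be exercised without waiting.

```python
import time


class SeatHolds:
    """Seats are held for a limited time; an expired hold frees the seat again."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.holds = {}    # seat_id -> (user, expires_at)
        self.booked = {}   # seat_id -> user

    def _holder(self, seat_id):
        hold = self.holds.get(seat_id)
        if hold and hold[1] > self.clock():
            return hold[0]
        self.holds.pop(seat_id, None)  # drop a missing or expired hold
        return None

    def hold(self, seat_id, user):
        """Try to hold a seat; fails if it is booked or held by someone else."""
        if seat_id in self.booked:
            return False
        holder = self._holder(seat_id)
        if holder not in (None, user):
            return False
        self.holds[seat_id] = (user, self.clock() + self.hold_seconds)
        return True

    def confirm(self, seat_id, user):
        """Turn a still-valid hold into a booking."""
        if self._holder(seat_id) != user:
            return False
        del self.holds[seat_id]
        self.booked[seat_id] = user
        return True
```

In a real deployment the holds would have to live in a shared store with expiry support rather than in one process's memory; that part is outside this sketch.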
Another approach can be to store all the stop points for every train in route order. Then, when you search for a train between two stations, you can just compare the generated id of the first station's row with the generated id of the second station's row; if the first is less than the second, we can say there is a route from A to B.
In place of optimistic locks or pessimistic locks ..what if there is lock on minimum values ..For example if in a Set of source to destination we will find the minimum value of number of seats and put those as locking criteria .This will make system available more. And at same time more users will be able to book tickets...right
Your Dad is very smart, I like both of you ❤. I am also from EEE background now in Cloud Tech, your videos are very helpful. Thanks
I don't know why no one has noticed, but to get trains from a source to a destination we can simply run a join query.
Query : [rows have source A] X(join on basis of train_id) [rows have destination B].
you'll get trains between those routes and then you can perform time sorting.
While searching for trains going from a particular source to a destination, since we are using a relational database, can't we do a self join on this table where (trainId of tableA = trainId of tableB) and (source of tableA = sourceProvidedByUser) and (destination of tableB = destinationProvidedByUser)? Based on this we simply get the date and time for every train going from source to destination on that particular day, and we can sort them with an ORDER BY clause instead of using a graph-based algorithm, which would make the implementation more complex.
I think we can use a NoSQL DB to store the trains, with the train id as the key and an array-like data structure as the value for storing the route. Every train has its route with the arrival and departure time at every station. If we have to search by source and destination, we check which train ids have both the source and the destination in their route and return only those trains.
Locking for the transaction and booking a seat is difficult practically. That is the main reason why IRCTC came up with WAITING LIST feature.. which actually made the system more flexible, easy and most importantly profitable :)
I would have opted for RDBMS, preferably Postgres to store train schedules. As rightly mentioned by @Gaurav, it's cheaper than NoSql and the records are meant to be read-heavy but instead keeping multiple records for one train, I think it would be better to keep only one record per train (15000 records) and keep schedule as json in a jsonb column, coz in general, trains do not change but their schedule are likely to change a lot, also the train can have different schedule for each day.
Please let me know if someone agrees or have some other opinion. Thanks 😊
i too felt that its way better if we kept trains start and end only in a column,
I wouldn't hire Gaurav based on this interview :) :) It's so confusing, complete system architecture has been skipped as well. It ended up being data model design and not really an application or system design.
Dbms part is very much confusing .. just define stations.. then routes .. then schedules..
When multiple people are trying to book, locking may not be the solution. You should look at the compensating transaction design pattern: if 10 people are trying to book a ticket and one of them books successfully, revert all the other transactions.
Definitely IRCTC is very complex system, very good interview. We must have discussed user input throughly i.e. From & To Station start + Date + Class(Optional) before designing data base etc.
"Pessimistic lock is reflecting my personality in general." 😂😂😂😂
Really enjoyed the video.
Thanks to CRIS for designing such an efficient system.
Fantastic video. Great learning and exposure
Keerti, thanks for this collab with Gaurav. Recently discovered your channel and you have amazing content.
My query would be focused on source and destination (that's a separate table), joined with the table containing the schedule, with train id as the join key (for RDBMS).
Select train_id from tab_from_to where Source=x and destination =y
Select scheduled from tab_sechedule where train_id in (Select train_id from tab_from_to where Source=x and destination =y)
Pl have some more system design interviews with @gkcs
I really appreciate the questions being asked to gkcs
Amazing
Instead of locking entire rows for booking, how about having different rows per seat with source and destination as columns and lock that row? One thing that will be needed is if one books from A->C and train goes from A->Z, then there should be another row added for C->Z. Otherwise, we can add one entry for each {source, destination} combination per ticket. This might give better parallelism.
Even though this video is 2 years ago just have one doubt about selection of SQL db for storing train schedules.
For a train having 11 stations in between no. of entry in db will be 55.
For 51 stations no. of entry will be 25*51. now like the entire assumption changed db becomes 25 times larger which corresponds to 2GB of data
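The row counts in this comment follow from storing one row per ordered (source, destination) pair of stations on a route, which is n * (n - 1) / 2 for n stations. A quick check, with one row per stop shown for comparison (function names are illustrative):

```python
def pair_rows(stations: int) -> int:
    """Rows needed if every (earlier stop, later stop) pair gets its own row."""
    return stations * (stations - 1) // 2


def stop_rows(stations: int) -> int:
    """Rows needed if each stop of the train is stored once."""
    return stations


for n in (11, 51):
    print(n, "stations:", pair_rows(n), "pair rows vs", stop_rows(n), "stop rows")
```

Going from 11 to 51 stations multiplies the pair rows by about 23 (55 to 1275), while one row per stop grows only linearly.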
Great video. I had a question, will Prime music and other music streaming services be similar to say 'Netflix' as they are just streaming Music instead of videos?
It should be similar, streaming is streaming. The difference will be in number of bytes and codec🙂
in an IRCTC schedule LLD i was given to design this table.. schedule/stops..
so it was fun seeing gaurav struggling here.. :D
but maintaining seats station was a great usecase
This was wonderful mock interview. I like how Gaurav approached it, and mainly the fact that he kept the focus on data. I would have been side tracked by some other components as well - like what microservices would be there, ( search service, booking service, a way to add availability every day, caching the search results etc.)
I do feel that data could have been modelled better in graph DB. Nodes could be stations, and edges are List of trip object, connecting two stations. Doing a BFS to find a path would be easier I guess on graph. Booking and Availability still could be MySQL to ensure concurrency.
My 2 cents about the asking the interviewer about estimates - It depends on seniority level you are targeting. For junior levels, you could ask - what are the requirements, and interviewer would reply - search and booking. For senior levels, you would come it to yourself and think what system is supposed to do. Same goes for estimates where you are guessing the non-functional requirements. Just my thoughts.
Thanks for doing it !
awesome content, servers & storage estimation is something which can be included after functional & non functional requirement gathering.
Latency and availability...When try improve those non functional requirements in a system what should we do.
Please do more of this.
Love how impromptu it is
That's very interactive and good take away to learn. It would be really nice, If we prepare in depth and then come up with a video. Though we will miss the excitement of Gaurav. Nice to watch the interaction and discussion.
To search for trains that are going from a source to a destination and between a particular time, we can create self join. From first table, take it as source and second table as destination and do the query. Query will look like this:
select t1.* from train t1 inner join train t2 on t1.trainId = t2.trainId where t1.source = sourceStation and t2.destination = destinationStation and t1.edt > startTime and t2.eat < endTime;
This was a really great video on System design interview. I watched it in segments over a span of a week. My overall understanding of how to approach System Design has improved a lot. Thanks for taking efforts to make such videos and thanks to Gaurav for explaining such insighful inputs/points.
Great interview discussion since it has a lot of things to look at during interview. Few things I think like seats instead of counter why not keep individual rows in the row since every seat will have status and other details. Contention can also be controlled on the row only instead of in counter case.
Also for trains I believe why to keep A->B and B->C instead why not keep an entry like
1. train_id, A, ETA
2. train_id, B, ETA
3. train_id, C, ETA
and then query simply the trains in memory travelling to A and C and since the data will be less in memory can simply run a loop to check ETA_A < ETA_C and return. Something that sort of.
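The in-memory check described above could look roughly like this. The row layout (train id, station, ETA) comes from the comment; the function names and the choice of minutes since midnight for the ETA are assumptions for illustration.

```python
from collections import defaultdict


def build_index(stop_rows):
    """Index rows of (train_id, station, eta) as station -> {train_id: eta}."""
    index = defaultdict(dict)
    for train_id, station, eta in stop_rows:
        index[station][train_id] = eta
    return index


def trains_between(index, source, destination):
    """Trains that stop at both stations and reach the source first."""
    at_source = index.get(source, {})
    at_destination = index.get(destination, {})
    result = []
    for train_id, eta_src in at_source.items():
        eta_dst = at_destination.get(train_id)
        if eta_dst is not None and eta_src < eta_dst:
            result.append(train_id)
    return sorted(result)


rows = [
    (101, "A", 480), (101, "B", 540), (101, "C", 600),   # runs A -> B -> C
    (202, "C", 500), (202, "B", 560), (202, "A", 620),   # runs C -> B -> A
    (303, "A", 700), (303, "B", 760),                    # never reaches C
]
index = build_index(rows)
print(trains_between(index, "A", "C"))
```

This assumes each train visits a station at most once per run and that all ETAs fall on the same day; a run crossing midnight would need full timestamps.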
Nice! - So - if we order by ETA, and then group by trainId - that way we can know all trains that have a SRC and a DEST both In their routes - using group_concat and a like clause in the query
@Keerti, +1 for the question
Hi, when do we discuss about just the database schema design or the high level system architecture? It looks impossible to cover both in just one hour.
It is very important for any architect or BA to ask right questions and listen to customer requirements… which was clearly lacking in this interview.
Can I have video on
1. How to handle idempotency in consumer for an event if there are too much load of events so that there will not be much load on database?
2. How does payment service work when payment gateway such as phonepe screen times out while payment? In general, how payment service works with payment gateways?
Don't think IRCTC allows to select a specific seat. So if 10 seats are there it will allocate seats in first come first serve basis. What could be the design for this specific feature?
Please correct me if my suggestion is wrong, but for storage let's say we keep the data in a MySQL DB for its ACID properties and consistency, and for searching we keep a data structure like a graph, so that a search query boils down to travelling from source to destination and we can apply graph traversal algorithms on it. We can also preprocess the results and store them in a cache so that they are available to users in real time with little latency. Similar preprocessing can be done for prices low to high or shorter durations (heaps could be used as the DS, again with preprocessing and caching). Would love to hear your suggestions on this.
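As a small illustration of the graph traversal part, here is a breadth-first search over a station graph held in memory. The adjacency dict and station names are made up; a real search would also need train ids and timings on the edges, which this sketch leaves out.

```python
from collections import deque


def shortest_path(graph, source, destination):
    """BFS over station -> [next stations]; returns the path with fewest hops."""
    if source == destination:
        return [source]
    previous = {source: None}
    pending = deque([source])
    while pending:
        station = pending.popleft()
        for nxt in graph.get(station, []):
            if nxt in previous:
                continue
            previous[nxt] = station
            if nxt == destination:
                # walk back from destination to source to rebuild the path
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            pending.append(nxt)
    return None  # destination not reachable


graph = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "E": []}
print(shortest_path(graph, "A", "D"))
```

Since routes change rarely, results like these are good candidates for the precomputation and caching the comment suggests.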
As an interviewer, this topic is an ocean of questions and scenarios to pose... But as a candidate, this topic is hell. Honest opinion.
This clearly shows that no matter how many system design videos you have made, being given a complex system in a one-hour window would make anybody nervous. He didn't even come close to what the system should be. The train search service didn't work; he was just able to dodge it. Even keeping the system without a particular seat arrangement, he was not able to book the seats successfully. The entire discussion was around databases, which also didn't work out. No architecture diagram, no scalability talk, no fault tolerance. Honestly, was it really a discussion that could land a job?
Locks, and so-called strong reads/writes/transactions, are generally provided by any production-ready storage solution nowadays. What you have described here for implementing a lock should be handled by the database API.
But we never got a proper solution for the train search query. How do we do that in an optimized fashion without worrying about creating a graph for every query?
Great content, Keerti, but I think there is an issue with how Gaurav calculates and saves the seat numbers at 46:00. We can actually accommodate 4 people: 1 from A to B, 1 from B to C, 1 from C to D, and 1 from A to D (2 people travelling at any given time). His algorithm won't allow this, as the counts of B and C are already 0.
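One possible way to make the 4-passenger case above work is to count free seats per segment (A-B, B-C, C-D) instead of per station. This is a sketch under that assumption, not the approach from the video, and `SegmentInventory` is a hypothetical name.

```python
class SegmentInventory:
    """Tracks free seats for each consecutive pair of stations on one train."""

    def __init__(self, stations, seats):
        self.index = {name: i for i, name in enumerate(stations)}
        # free[i] is the number of free seats between station i and i + 1
        self.free = [seats] * (len(stations) - 1)

    def book(self, src, dst):
        lo, hi = self.index[src], self.index[dst]
        # A booking needs one free seat on every segment it covers.
        if lo >= hi or min(self.free[lo:hi]) == 0:
            return False
        for i in range(lo, hi):
            self.free[i] -= 1
        return True

inventory = SegmentInventory(["A", "B", "C", "D"], seats=2)
bookings = [("A", "D"), ("A", "B"), ("B", "C"), ("C", "D")]
results = [inventory.book(src, dst) for src, dst in bookings]
```

With 2 seats, all four bookings in `results` succeed because no segment ever carries more than 2 passengers, and a fifth booking on any segment is refused.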
I really didn't even think that the IRCTC backend could be this complex!
Are we storing the EDT and EAT for all the trains for the next 3 months? Doesn't the route remain the same on a weekly basis? Storing it once per week would save a lot of space.
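A small sketch of what the weekly idea could look like: store the weekdays on which each train runs once, and expand them into concrete dates only when needed. The `WEEKLY` template and both function names are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical weekly template: train_id -> weekdays it runs (Monday = 0)
WEEKLY = {"T1": {0, 2, 4}, "T2": {5, 6}}

def trains_running_on(day):
    """Trains whose weekly template includes this date's weekday."""
    return sorted(t for t, weekdays in WEEKLY.items() if day.weekday() in weekdays)

def run_dates(train_id, start, days_ahead):
    """Expand the weekly template into concrete dates on demand."""
    candidates = (start + timedelta(days=i) for i in range(days_ahead))
    return [d for d in candidates if d.weekday() in WEEKLY[train_id]]
```

Per-date data such as seat availability would still need a row per run; only the timetable itself is shared across weeks in this sketch.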
Great content. The entire process was very structured, and I could see the thought process behind every step of the design. Thanks.
Thanks Abhinandan, means a lot😇😇
I am studying this in Sept 2023, and this is my first video on system design. I didn't follow much, but I got to know how these big sites are built at such a high scale.
Check out more beginner-friendly videos on my channel. For example, there are videos where I teach my father what system design is. I really think you will find them interesting. Let me know!
@KeertiPurswani Yeah, sure. I am just exploring your YouTube channel, and it has highly useful content for beginners. It will help me a lot 😅
Can we decouple the source and destination into two different tables? First we get the list of trains that have the requested source on the given date, then use a join query to find which of these trains also have the destination. On the destination table we can index the data on train ID, and on the source table, I agree with Gaurav, it should be on departure date. We may not need to bother about arrival time at the destination for our query.
We could have used SQL join queries for searching, right?
BTW, this was one of the best mock interviews I have seen. Thank you guys!
While querying for source to destination, we could have first queried for trains matching the source, and then queried for the destination with an arrival time greater than or equal to the source departure time from the previous query. If a particular train's arrival time at the destination is later than its departure time from the source, it is certain that there is a route from source to destination. And if the user wants the full route, that can be queried separately, as I don't think there would be a lot of queries for that.
@Keerti Purswani @Gaurav Sen I would love to know whether this was a correct optimization?
I think having just one column named 'STOP' instead of 'Source' and 'Destination' and other columns like 'Duration of time to reach a stop from source' and 'Sequence number of stop' could have solved many complications. I am not too sure, but I think so.
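A minimal sketch of the single 'STOP' column idea, using an in-memory SQLite table and a self-join on the sequence number. The table name `train_stop`, its columns, and the sample rows are assumptions, not a schema from the video.

```python
import sqlite3

def build_db():
    """Create a hypothetical one-row-per-stop table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE train_stop ("
        "train_id TEXT, stop TEXT, seq INTEGER, minutes_from_origin INTEGER)"
    )
    conn.executemany(
        "INSERT INTO train_stop VALUES (?, ?, ?, ?)",
        [
            ("T1", "A", 0, 0), ("T1", "B", 1, 60), ("T1", "C", 2, 120),
            ("T2", "C", 0, 0), ("T2", "B", 1, 50), ("T2", "A", 2, 110),
        ],
    )
    return conn

def direct_trains(conn, src, dst):
    """Trains that visit src before dst, with the travel time in minutes."""
    return conn.execute(
        """
        SELECT s.train_id, d.minutes_from_origin - s.minutes_from_origin
        FROM train_stop AS s
        JOIN train_stop AS d ON d.train_id = s.train_id
        WHERE s.stop = ? AND d.stop = ? AND s.seq < d.seq
        ORDER BY s.train_id
        """,
        (src, dst),
    ).fetchall()
```

The `s.seq < d.seq` condition plays the role of the direction check, so the same table answers both A to C and C to A searches without storing every source and destination pair.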
Thank you for Quality Content.
Determined to buy getinterviewready course ♥️. Got very interesting insights! Thanks both of you 🤗. Thanks again master 💗♥️
I think the best way to handle concurrency is maybe to use queues. We could have a queue for each train (or for a couple of trains), with only one instance reading from that queue. All the requests for one train will then be handled by that instance, so we can completely avoid locks and thus improve performance a lot at scale.
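A toy, single-process sketch of the queue-per-train idea, where one worker thread stands in for the "one instance" that owns a train. `TrainWorker` and its methods are hypothetical names; a real system would more likely use a message broker partitioned by train ID.

```python
import queue
import threading

class TrainWorker:
    """Single consumer that owns the seat count of one train."""

    def __init__(self, seats):
        self.seats_left = seats
        self.confirmed = []
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Only this thread touches seats_left, so the booking logic
        # itself needs no explicit lock.
        while True:
            user = self.requests.get()
            if user is None:  # sentinel: stop the worker
                break
            if self.seats_left > 0:
                self.seats_left -= 1
                self.confirmed.append(user)

    def submit(self, user):
        self.requests.put(user)

    def close(self):
        self.requests.put(None)
        self.thread.join()
```

Any number of threads can call `submit` concurrently; because requests are processed one at a time, the worker never confirms more bookings than there are seats.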
I am not very clear on the DB structure, e.g. how to store all the stops' info for a train and how to search on it efficiently. Gaurav, you tried your best to explain the concepts. You are the best. I would request you to come up with one more video where we discuss only the database schema.
Keerti was enjoying Gaurav's nervousness 😄😄😁
Absolutely 🤭🤭
Wow, system design seems like a very interesting topic, with so many small problems to solve that beautifully come together as one system. Loved the thought process by Gaurav and the feedback later 👍
And a newfound appreciation for IRCTC designers 🙂
Thank you Sai Krishna!😇😇
Good one guys. Keep doing system design interviews.
Thanks Praveen, means so much!❤️😇
Please make a detailed video on perks and benefits that Intuit offers its employees.
Yes yes!!❤️❤️
In IRCTC you do not have UI-based seat selection; seat allocation is based on pure physics, to balance weight.
System design questions should actually be this hard. Then people won't recite the exact words they read somewhere with a poker face on, and will actually have to come up with solutions based on their experience.
Great quality content. I didn't think of anything else while watching, nor did I feel the need to take a break to understand; the pace of the video and the overall thought process are very gradual. Nice job, Gaurav and Keerti.
Yaaaay, thank you so much Gaurav. Hope you like the rest of the videos as well😇😇
This channel is gonna boom pretty soon
But no doubt great effort, appreciated 👍
This is the first video I have seen about system design, and while watching it I went through all the comments; your replies to them were funny. You were fascinating throughout the whole video.