If you are caching video files in CDN, by which I presume you are referring to CloudFront given you are storing video files in s3, you don't have to replicate the s3 bucket in every region you have your users in. CDN is caching it at the edge locations anyway. Replicating s3 is redundant and would add to the costs.
Also, the processing of video files is missing in detail: how the current infrastructure would handle traffic spikes during Diwali, Christmas, or New Year's, auto scaling, and how to relinquish instances or tasks in the case of a serverless offering when cost estimation is considered.
The write-to-read ratio should be even lower, since a single consumer watches way more videos per day than a creator uploads per day.
Creator : Consumer = 1 : 100
Creations per day per creator : consumptions per day per consumer = 1 : 100
=> Write : Read ratio = 1 : (100 x 100) = 1 : 10k
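The arithmetic in this estimate can be sanity-checked in a few lines; the 1:100 ratios are just the assumptions stated above, not measured numbers:

```python
# Assumptions (from the comment above):
# - 1 creator for every 100 consumers
# - each creator uploads 1 video/day; each consumer watches 100 videos/day
consumers = 100
creators = 1
uploads_per_creator_per_day = 1
views_per_consumer_per_day = 100

writes_per_day = creators * uploads_per_creator_per_day   # 1
reads_per_day = consumers * views_per_consumer_per_day    # 10_000

print(f"write:read = 1:{reads_per_day // writes_per_day}")  # write:read = 1:10000
```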
Overall a good interview, but I want to point out that this is a difficult problem for a 45-minute interview. I would only ask about ingestion or serving in 45 minutes.
Didn't understand the part about splitting the video in chunks for parallel processing? Are we converting different chunks into different formats at different times?
I felt she was explaining YouTube's design rather than TikTok's. Creating chunks of videos, fetching thumbnails for the user: these are all YouTube design, not TikTok.
I am new to system design and coming up to speed of late. If we use a key-value store for the video metadata here, wouldn't it affect the read time? When the user opens the app, he/she expects to see thumbnails of the videos on their feed almost instantly, so the read latency for video metadata should be as low as possible. If my understanding of the read performance of a key-value store here is wrong, please correct me! Thanks in advance.
Thanks Gaurav and Yogita. I have a query regarding the protocol used while uploading or serving the video: why can't we use UDP instead of TCP, as it would help us reduce latency for sure? Or are there caveats to using UDP?
I had similar points in my head while Yogita was answering this. Well, for upload, it has to be TCP because you don't want to miss any packets and also the order is important to store it as a correct video file, but that's on transport layer. These TCP chunks will further be wrapped by layers above it and some other protocol will add its payload, for example: TLS with HTTP, and I think that's where her point was. Now coming to the streaming or download, this could be over both TCP and UDP depending on several factors for example: user's network bandwidth,
I have one question: if a user is uploading a video and the internet disconnects when half the upload is done, which protocol or process resumes the upload from the point where it stopped? And the same question for downloading as well.
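For reference, one common approach (used by resumable-upload schemes such as tus and HTTP's Content-Range mechanism) is for the client to ask the server how many bytes it already has, then resume from that offset. A minimal sketch of the offset bookkeeping; the helper name is made up:

```python
def next_chunk_range(bytes_on_server: int, total_size: int, chunk_size: int):
    """Return (start, end, header) for the next chunk to upload,
    or None when the upload is already complete. Hypothetical helper."""
    if bytes_on_server >= total_size:
        return None
    start = bytes_on_server
    end = min(start + chunk_size, total_size) - 1  # Content-Range end is inclusive
    header = f"bytes {start}-{end}/{total_size}"
    return start, end, header

# Resuming a 1000-byte upload that died after 600 bytes, with 256-byte chunks:
print(next_chunk_range(600, 1000, 256))  # (600, 855, 'bytes 600-855/1000')
```

Downloads work the same way in reverse: the client sends a `Range: bytes=600-` request header to fetch only what it is missing.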
Hey! When sudoCODE says that we have a "low SLA" because we are able to make the processing faster using parallel processing, that seems... weird. Doesn't a low SLA mean falling short of what was promised? This is what I got from ChatGPT: "A low SLA means that the level of service that is being provided falls short of what is expected or agreed upon in the SLA. This could mean that the system is experiencing downtime or is not responding quickly enough to user requests. It could also mean that the system is not meeting other performance metrics that are defined in the SLA." What am I missing?
I think what she meant here was the agreed number of requests to be processed per second will be small in quantity. Low SLA wasn't the right term here. Low guarantee is what she meant, I think.
For fetching video from the CDN to a mobile device, which protocol will we use? You didn't cover this. What about gRPC for video streaming? Can you please explain in a comment?
9:28 - Column storage like Mongo... are you sure? 10:12 - I hope you meant RDBMSs, not MySQL specifically. 12:30 - You mean you are building this app and locking it to AWS using S3? I hope you have considered the cost: storing into S3 can be very cheap, but you will be accessing its contents many more times, and that can hit the profits. 38:30 - I doubt the query for all the videos of a given user can be done with the current setup. In my experience, the interviewer looks at the system the way s/he has seen it somewhere, so it is not about right or wrong; it's all about what the interviewer likes and wants to hear. I am pretty sure that if you removed your credentials during the interview, this design would be rejected.
Thanks Gaurav and Yogita for the knowledgeable discussion. Regarding storage cost, could we use any compression technique without compromising core user requirements like latency, upload time, access, etc.?
Can't we just store the video and then, depending on the user requirement (resolution or format), use a video pipeline for conversion, rather than converting at the upload stage? Converting everything upfront requires a lot of storage space after conversion. With my method it would just be conversion on demand: if required we do it, if not then no. Hope it makes some sense; if I am wrong please let me know, as I am learning system design 😀, and this is only with respect to TikTok.
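Your on-demand idea is essentially transcode-on-first-request with a cache in front, so the conversion cost is paid only for renditions somebody actually asks for. A rough sketch under that assumption, with the transcoder stubbed out (the real one would be something like FFmpeg):

```python
renditions = {}  # cache keyed by (video_id, format, resolution)

def get_rendition(video_id, fmt, res, transcode):
    """Return a cached rendition, invoking the expensive transcode only on a miss."""
    key = (video_id, fmt, res)
    if key not in renditions:
        renditions[key] = transcode(video_id, fmt, res)  # runs at most once per key
    return renditions[key]

# Stub transcoder that records how often it is called.
calls = []
fake_transcode = lambda v, f, r: (calls.append((v, f, r)) or f"{v}.{r}.{f}")

get_rendition("vid1", "mp4", "720p", fake_transcode)
get_rendition("vid1", "mp4", "720p", fake_transcode)  # second call served from cache
print(len(calls))  # 1
```

The trade-off the video chose the other way: pre-converting gives predictable first-play latency, while on-demand saves storage but makes the first viewer of each rendition wait for the transcode.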
Another awesome delivery, thanks Gaurav. One thought: we increased storage to ~6x to cover different resolutions and formats, which we could handle by introducing 2 entities in the system. One, to avoid different formats, we can provide a dedicated video player to the user which understands our format only. The second entity is a resolution manager placed before the streaming engine, which can upgrade or downgrade resolution as per user bandwidth or user request. Take Netflix and YouTube as examples: they have their own media players which understand their recording formats. Yes, one extra task will be to convert uploaded videos to the application's format at upload time, but that will be fruitful in saving the 6x storage cost. Resolution can also be handled at runtime in 2 ways:
- One, by always keeping a high-resolution copy and downgrading it at runtime before serving. The downside is a storage increase because of the high-resolution copies.
- Another, by always keeping a low-resolution reference copy with some pixel-pattern files to convert it to a high-resolution copy at runtime. The upside is that we can reduce storage cost significantly.
For performance in conversion, a dedicated system with predefined resolution-converter filters can work.
@@edwardspencer9397 It's not just about creating an app which can play video. You'll of course have an app. Different formats have different properties: some have small file sizes but require hardware acceleration to perform well, which may not be available on all devices. So even if you create your own player, it will fall back to software decoding, which is slow; users will complain about phones getting warm, high battery consumption, and sluggish performance. Instead you create different formats that are optimized for a particular family of hardware. There can always be a basic format as a fallback, but you should cover a large percentage of devices with formats optimized for them.
@@lhxperimental Large percentage of devices is no longer true. Businesses always prefer those who have medium / high end phones/devices capable of hardware acceleration because all the others owning low end phones are mostly poor people who have no intention to spend any money on subscriptions or visit advertisers. So even if a poor guy uninstalls something due to overheating issues it shouldn't be a problem.
These kinds of mock discussions on system design are really helpful. They give the viewer a thought process for dealing with such questions. Kindly do more videos of this kind...
Great video! One piece of feedback: I didn't see the 1.2 TB/day you calculated being used; a translation into how many servers (with resources like CPU, RAM, disk, IO, etc.) would be needed for the ingestion pipeline as well as storage would have been helpful. Also, some interesting scenarios like the thundering herd, and data compression to reduce cost, would have been of great help. And don't you think putting all the videos in the CDN would be cost-heavy? There should be some strategy based on popularity/recency/TTL to upload/remove videos from the CDN.
A very average video on system design. I think Gaurav at least was better at channeling the answers, but Yogita has to improve a lot on her skills. Not very appropriate for someone who is already halfway into system design and knows systems well (otherwise OK).
That was really amazing... like how smoothly she explains bits and pieces of the problem. loved it. Learned a lot. . . Thanks a lot for this content guyz.
I quite did not understand how chunking a video would help. A video file has headers and then frame data. Now once the video is ingested we can't simply chunk the blob data (the way we chunk text data normally) without the header. The chunked video would not make any sense without the header. Now if the idea was to 30mb video => chunked to 10mb + 10mb + 10mb video => and then pass each chunk to converters which convert to different codec and resolution, we can't do it with basic blob data chunking because the converters in the pipeline could not work without the headers of original video file. Because the header will be attached to the first 10mb chunk itself. So essentially the pipeline should look something like ```sh 30mb video files = [header]framedata = 10mb file [header]framedata ==> converter_fn ===> corrupted file ==\ + 10mb file framedata ==> converter_fn ===> corrupted file========= | ==> corrupted file after joining + 10mb file framedata ==> converter_fn ===> corrupted file======== / ``` So AFAIK, if we have to chunk 30sec video => 10sec + 10 sec + 10sec video => [1] our best shot here is to pass it through ffmpeg to chunk the video (which essentially recreates the header with proper value with frames) . [2] somehow make the client send 3 different videos and during consumption play it as 1 video from UI layer. Note: When you think about video header, it's not like text HTTP header, it would have general video codec + playback + compression related metadata etc. And then [I guess] each frame would have its own header (inside the frame blocks) => Chunking needs to take care of all these facts => hence we need something like ffmpeg . I probably misunderstood the point Yowgita (apologies if I misspelled the name) had made, or probably there are some techniques she implied which I don't know. In either case, would love to know if I missed something in my understanding.
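For what it's worth, this header problem is exactly what FFmpeg's segment muxer handles: it writes valid container headers per chunk, so each segment is an independently playable file, and the concat demuxer re-joins them. A sketch of the command lines involved (only building the argv lists here so the logic is checkable; the filenames are made up):

```python
def segment_cmd(src, seconds=10):
    # Split src into ~`seconds`-long standalone files without re-encoding;
    # -reset_timestamps makes each segment start at t=0 so it plays on its own.
    return ["ffmpeg", "-i", src, "-c", "copy", "-f", "segment",
            "-segment_time", str(seconds), "-reset_timestamps", "1",
            "chunk_%03d.mp4"]

def concat_cmd(list_file, out):
    # Re-join the segments listed in list_file using the concat demuxer.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", out]

print(segment_cmd("input.mp4")[:4])  # ['ffmpeg', '-i', 'input.mp4', '-c']
```

Note that `-c copy` splits only at keyframe boundaries, so segment lengths are approximate; that is usually fine for a parallel transcoding pipeline.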
Good points. You are really familiar with video file formats! I guess in a real interview the interviewer would not expect candidates to know the stuff you've mentioned; they would probably give bonus points for recognizing that we should chunk videos when they're huge (think YouTube). For TikTok I guess we should be OK just uploading the videos as-is.
It is really sad that freshers have to prepare for system design. You should never have to prepare for it, you learn it as you grow and interact with your leads and architects. Interview system is completely broken.
Yes, and even S3 storage classes. If this app is ingesting 1.2 TB of files every day, it'd make sense to store the raw files in an S3 storage class that costs less money. Version your videos, and set up lifecycle rules to transition them to the "Infrequent Access" storage class after, let's say, 15 days.
@@kanuj.bhatnagar I agree with storing the videos in object storage (S3 or Google Cloud Storage), but applying the lifecycle transition at 15 days is not good. We can easily update the cache to remove older videos, but wait at least 180 days before moving to the next lifecycle tier, since retrieval of infrequently accessed data is charged at a high rate in cloud storage systems.
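For concreteness, an S3 lifecycle rule of the kind being discussed looks roughly like this; the `raw/` prefix is a made-up example, and the 180-day threshold follows the suggestion above:

```json
{
  "Rules": [
    {
      "ID": "raw-videos-to-infrequent-access",
      "Status": "Enabled",
      "Filter": { "Prefix": "raw/" },
      "Transitions": [
        { "Days": 180, "StorageClass": "STANDARD_IA" }
      ]
    }
  ]
}
```

STANDARD_IA bills per-GB on retrieval (and has a minimum storage duration), which is exactly why transitioning still-warm data too early backfires.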
Liar, it's nowhere near real-world projects...!! Although they are really good, it only gives us an idea of an MVP and also how to crack interviews!! Real-world scenarios are much worse and terrifying👻😱!!
Few ideas!
- Utilising the fact that most requests are for videos that are in trend, and trends die in ~a month: instead of storing all the transcoded files, have a live transcoder and store the result in a cache (or CDN) with a TTL of ~a month (the exact time can be decided by data analysis). Twitter did this and was able to save millions in storage costs.
- We can keep live websockets with online users, so that whenever a video is ready we can notify them, and maybe also the users who were tagged or are very engaged with the account.
- Instead of dividing videos into chunks after receiving the whole video, let the client do the chunking and upload chunks only. This results in far fewer failures: if an upload fails after 95% of the video, you don't need to re-upload the entire file.
- Maybe have caches on top of the databases.
There should be some questions asked upfront before diving in, such as "do we want video searching", "do we want to generate a news feed", "what about video sharing", "can users download videos", "can users follow other people", etc. After that we can focus on what the interviewer is really interested in.
I didn't quite get how the queue adds an advantage if done in a sync way. The video needs to be uploaded at least once anyway. Since the video is uploaded into a queue message (or somewhere else) and then copied to S3, more IOs and network calls are needed. Since S3 is getting used anyway, one could give temporary (and restricted) access to S3 objects so that the user can upload directly to S3 without much security concern, and then simply enqueue just the URL of the uploaded object along with the other user metadata. The overall upload API time would then be just creating an empty S3 object, with the client uploading in the background asynchronously. Creating something like AWS S3 or Azure Storage or any distributed-file-system-like storage is quite hard, and it doesn't make sense to build them from scratch unless budget and scale are limited. Also, there are multiple ways to implement queues: a pre-existing queue provided by some cloud platform, or one built in-house on top of a database. What happens when a node reading a queue message crashes or the power is unplugged? What kind of message-popping mechanism is used while reading messages, so that the system is crash-resilient and we don't lose the client's request because the power plug was pulled right after a message was popped from the queue? Does the queue have a strict FIFO requirement (I don't think it is necessary in this case)? What happens if multiple queue messages are enqueued because of retries from the client? Last but not least, we cannot keep the videos forever and they need to be GCed; how frequently will the GC run and remove those videos? A lot of questions can be asked, and that's what I like about design interviews. Ask anything and everything :)
Side note: I have never used AWS S3; however, I have used Azure Storage extensively, and I looked up the AWS equivalents of the Azure Storage concepts. So please forgive me if I have wrongly used AWS terminology.
These are excellent questions. The queue mentioned here has events (just the id, url and some metadata about the video). You need a persistent queue with retries here. Ordering isn't necessary. I'd use something like Apache Kafka for this.
I was thinking along the same lines. Even though a queue can do that job, I felt all those high-compute tasks should be pushed as backend batch jobs that enrich the metadata; that way it is easier to maintain the consistency of the system during upload. During upload the video gets pushed into an S3 bucket, an authenticated URL with a token is generated, and then the URL along with other information is either stored in the DB or pushed to a queue for storage in the DB, and the user is given a response. Once the data is in the DB (or another queue), a new event can trigger the behind-the-scenes variable-bit-rate conversion and other needed enrichment. I am just citing an approach, IMHO; it would be helpful if someone could validate this.
"What happens when a node reading a queue message crashes or the power is unplugged?" After a message is read from the queue, a visibility timeout kicks in and the message won't be available to the other worker nodes. After doing the processing, the worker must send an acknowledgment to the queue and delete the message. If the queue doesn't receive an ack before the visibility timeout expires, the message becomes available again. With this, if a worker crashes the message will be visible to the other workers and the system will be more resilient. There should also be a mechanism (S3 provides it) to monitor infrequently accessed videos: videos that are not accessed for some time could be moved to Infrequent Access to save cost. Please reply if you have any thoughts on this.
@@letsmusic3341 Yeah, the visibility-timeout approach should work. However, there are extreme corner cases: say the worker node couldn't finish processing in time, and the message becomes visible to other nodes. To avoid that, there is a need for some sort of lock over the message in the queue (which can simply mean extending the visibility timeout again). Even extending the visibility timeout repeatedly from a separate thread until the job finishes can have problems, especially if the thread fails to extend it in time. There can also be a rare clock-skew problem: the thread refreshing the lock can have a different notion of time than the queue service, and hence might not refresh the visibility timeout at the right moment. But these are extremely rare scenarios. External locking with larger timeouts outside the queue service, plus some buffer time, can help in such cases.
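The redelivery behaviour described above can be simulated in a few lines. This is a toy in-memory model of SQS-style visibility timeouts with a logical clock, not any real client API:

```python
class ToyQueue:
    """In-memory queue with an SQS-style visibility timeout (toy model)."""

    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self.messages = {}  # msg_id -> time at which it becomes visible

    def send(self, msg_id):
        self.messages[msg_id] = 0  # visible immediately

    def receive(self, now):
        # Deliver the first visible message and hide it until the timeout.
        for msg_id, visible_at in self.messages.items():
            if visible_at <= now:
                self.messages[msg_id] = now + self.timeout
                return msg_id
        return None

    def ack(self, msg_id):
        # Worker finished processing: delete the message for good.
        self.messages.pop(msg_id, None)

q = ToyQueue(visibility_timeout=30)
q.send("video-123")
assert q.receive(now=0) == "video-123"   # worker A takes it...
assert q.receive(now=10) is None         # ...hidden from worker B meanwhile
assert q.receive(now=31) == "video-123"  # worker A never acked: redelivered
```

Extending the timeout mid-processing (the "lock refresh" above) would just be another `self.messages[msg_id] = now + self.timeout` issued by the worker that currently holds the message.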
@Amitayush Thakur Nice question. @Gaurav Sen to be more clear, upload happens to S3. MetaData about the video and User is passed as event to the queue for further processing.
One question I had: before uploading to S3, all the format x resolution copies of a single video will be in memory as part of the workflow/pipeline. If 100k users are simultaneously uploading videos, how is this going to be handled in memory before being uploaded?
If you are preparing for a system design interview, try get.interviewready.io.
All the best 😁
S3 is object storage, not file storage.
Hi. Could you please share the name of the online tool you are using for collaborating?
@@vishal733 Most online meeting services have a whiteboard built in, such as Webex, Zoom, etc.
I have 2 questions on the final architecture diagram. One: why is raw video sent directly from ingestion to S3? S3 only takes the final processed video after processing by the workers, right? And second: why does the arrow go from the different devices to the CDN instead of from the CDN to the different devices?
What software is used for drawing in this video?
Two of my favourite YouTubers on system design.
I am watching this video after almost 2 years. Thanks for uploading these kind of videos, They are very helpful.
Thank you!
Coincidentally Akamai CDN was down just a few days after this video was uploaded
Scrolling tiktok for 45 min. - No
Watch whole video for 45 min. - Yes, it's great.
The best mock I saw in my 2 months studying for my interview.
Wow, the end-to-end request flow was really smart. As we're just returning the list of metadata it'll be fast, and the metadata will have the actual video link too.
It is a very good video!!!!
Few things to add:
1. Logging is very important
2. Authentication and Authorization
3. Metrics and Reports
4. Containerization & process orchestration (Docker, Kubernetes)
5. API gateway & Microservices
No interviewer is as humble as Gaurav; when we ask for requirements, they say "think of it yourself".
@Gaurav: the suggested system design mostly works for just reading and writing videos. It cannot do much about scaling along with processing speed, and the recommendation engine has problems. Queues alone are not enough; we need a streaming mechanism along with the reactor pattern. And if we are talking about low latency based on region, we need to build a multi-tenant approach on top of the rest.
The idea of splitting the video file into chunks and processing them in parallel is really interesting, and it feels very fundamental to processing input in general.
How does that happen exactly, by the way? Do you literally split a 1 MB file into three ~333 KB files, convert them using a file-format converter like FFmpeg, and then merge them again?
During the format and resolution discussion, Gaurav should have mentioned the network capabilities over which the video will be served. That's one of the primary reasons for keeping multiple resolutions of the same video.
True.
Good interview. One correction: she mentions a KV store and considers MongoDB, but MongoDB is a document-store database.
Mongo was considered, and is the correct choice, because one of its prime use cases is storing unstructured data. I would've gone this route, as you can have a document per object that is updated as time goes by (the change mentioned), with a database that can scale for both writes and reads. A KV store should not be used in this instance, where you need fast transactions with potentially frequent updates, and KVs are generally structured. A firewall session or rule table, a shopping cart, location data, etc. are where KV structures excel. If some of that data is needed in KV format and other portions aren't, use something like Kafka (or Kinesis in AWS) with Spark to load the data to the proper place in the proper format (e.g. add a data lake or warehouse into the mix for longer-term analytics).
One of the best video on this channel.
The metadata nosql choice is not justified. How will you query for all the videos of a user?
It would be great to have Yogita interview you in a similar way.
It's really good, actually. What if we use AWS Aurora as a common DB for user data and video metadata? It can store JSON as well, so for scale I think it would be useful.
Yes it should be fine
Thank you for your effort, your time, and most importantly this insanely valuable content. This video proves how intelligent Indians generally are...
We Iranians/Persians admire how intellectual and intelligent they are, we simply cannot take our eyes off them, in all honesty not only they are super brilliant minded but also super hard-working people.😊
Lots of love and respect for you all from IRAN ❤️
That was great fun... Thanks a lot... So much knowledge in a 45-minute video.
Awesome video. Just want to check, at 13:13 she mentioned "We can have multiple S3s in multiple Regions". I believe, S3 is a global service and each bucket name should be unique.
Thanks Yogita and Gaurav, looking forward to more such videos
To be honest, for most of this video I had trouble distinguishing who's the interviewer and who's the interviewee ;) In a real interview, you won't get as much help as Gaurav has shown here.
For uploading videos you can use WebSockets to connect to the servers, because you keep the same TCP connection across multiple requests.
At one instance, she says "low SLA"; which I think she said it accidentally. She actually meant "low latency".
Excellent session, very helpful. You guys are actual heroes for devs like us.
Good content and a nice way of putting things. But in the final architecture snapshot that was shown, I guess there should be some connection between VideoMetaData and the User Service, to serve all the video metadata of a given user.
@28 minutes: the front-end server should have the minimum number of steps in the workflow; the more steps it takes, the longer the user has to wait. Some tasks can be transferred to a backend server: first get all the data and notify about it, then a second step for publication after post-processing.
Hi, first of all thank you both so much for sharing how things work. I wish you both the best for the future.
Video metadata can be renamed to activity log. Video metadata is different and it is very well connected to users.
amazing video...You should do videos like these more often....
More of this please! ♥️
Fabulous video.. Thank you @Gaurav and @Yogitha
HTTP or FTP doesn't make sense; UDP, gRPC, WSS, etc. are better suited to blob upload.
It is worth mentioning what tech would be used for the encoding/decoding and resolution service, e.g. FFmpeg.
An interviewer would expect at least a rough DB schema, relations, and API routes in 45 minutes.
No sharding, LB, or replicas on users and metadata?
What about client-side caching?
1.2 TB/day and growing with 1M DAU is not scalable or practical at all.
Lots of potential flaws; I still didn't understand whether it is monolithic or microservices.
Ffmpeg alternatives?
I don't agree with 23:59m - even services like Netflix use AWS to keep costs down. Matching the quality and the integration that services like S3, Elastic Media Encoder and Cloudfront have would be a huge amount of work and costs would become a massive issue. Now, this isn't to say that internally companies wouldn't look for innovative solutions to keep costs down and implement new processes to maintain costs lower, but 'buying a datacenter' is not necessarily the most cost-effective solution.
I wish my interviewers asked a lot like Gaurav. Normally, they just sit there and keep quiet most of the time :)
What white board software was that?
This is really difficult! I always wonder why companies like to ask System Design in the interview?
Yep it's hard, it seems that you can fail for whatever reason.
Fantastic discussion and really helpful. Can you please let us know what software you guys have used for the Whiteboarding and capturing requirements?
++ idea
An API gateway for distributing load and enhancing security.
Video file verification for size and content type should be done on the client side. That may save a few million hits.
This video is very informative , thanks to both of u .
whats SLA?
This is good for someone with 3-4 yoe, but as a senior engineer I can see opportunity for improvements as suggested by others in the comments as well.
In a dilemma about whether to choose a full-stack role or pursue a career as an architect? Now I see where I'm going. One of the best things that YouTube did for me 🙃. You guys are amazing and we would love to see more real-time examples 👌👌🤗
Super one, good work you both 👍
Saw this video twice to understand in depth. Very well explained. Clears a lot of key concepts. Just one question, in the final architecture there is a raw video link going to S3, if our video is going via ingestion service -> workers - >S3, why would we need that? Thanks in advance!
That's a good question. I think, because Gaurav mentioned low latency as a requirement while video upload, you would want to upload a copy of raw video directly in an s3 bucket without processing. The async workflows which will validate, break into chunks and create videos in different formats will delay the upload processing causing bad user experience. Correct me if I am wrong.
With parallel processing converting to different formats, your write rate is 4x the original.
I don't think any of the things mentioned (changing the description, adding a deleted flag) constitute a mutable schema that would really require a flexible-schema DB. All of that is easily handled in a relational DB, and it seems to me that video metadata should have a roughly consistent format across videos, apart from list-like fields such as tags.
If you are caching video files in CDN, by which I presume you are referring to CloudFront given you are storing video files in s3, you don't have to replicate the s3 bucket in every region you have your users in. CDN is caching it at the edge locations anyway. Replicating s3 is redundant and would add to the costs.
Also, detailed processing of video files is missing: how the current infrastructure will handle traffic spikes during Diwali, Christmas, or New Year, auto-scaling, and how to relinquish instances or tasks in a serverless offering when cost estimation is considered.
Why AWS Cloudfront was not suggested as a CDN solution? Asking just out of curiosity
This was an amazing 45mins of my time
Awesome Gaurav and Team👍👊
The write-to-read ratio should be even lower, since a single consumer watches far more videos per day than a creator uploads.
Creator : consumer ratio = 1 : 100
Uploads per creator per day : views per consumer per day = 1 : 100
==> Write : read ratio = 1 : (100 × 100) = 1 : 10k
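The back-of-envelope arithmetic above can be sanity-checked in a couple of lines; note the 1:100 ratios are the commenter's assumptions, not measured numbers:

```python
# Assumed ratios from the comment above (not measured data).
consumers_per_creator = 100   # creator : consumer = 1 : 100
views_per_upload = 100        # uploads/day per creator : views/day per consumer = 1 : 100

# Each write (upload) is amplified by both ratios on the read side.
reads_per_write = consumers_per_creator * views_per_upload
print(f"write:read = 1:{reads_per_write}")  # write:read = 1:10000
```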
You guys are amazing
Great Video Gaurav and Yogita! Can I know Which app is being used for demonstration?
@GauravSen Awesome explanation in the video. Please have some more videos like these mock interviews in future.
Seems unique and useful. Was security skipped.
Yes it was. We didn't touch upon it in this interview.
Gaurav please have some data center mock interviews related to mechanical cooling and electrical power.
Awesome content !!
Gaurav da is always Inspiration and Love ❤️❤️❤️
Overall a good interview, but i want to point out that this is a difficult problem for a 45 mins interview. I would only ask ingestion or serving in 45 mins.
What's the answer to the protocols question? When should we use what?
Didn't understand the part about splitting the video in chunks for parallel processing? Are we converting different chunks into different formats at different times?
For a long video, we break the video into chunks and convert each chunk into different formats and resolutions.
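A sketch of that fan-out (the function and field names here are hypothetical, just to illustrate the idea): each (chunk, format, resolution) combination becomes one independent job that a worker can process in parallel.

```python
from itertools import product

def plan_transcode_jobs(num_chunks, formats, resolutions):
    """Fan out one independent job per (chunk, format, resolution),
    so workers can transcode all of them in parallel."""
    return [
        {"chunk": c, "format": f, "resolution": r}
        for c, f, r in product(range(num_chunks), formats, resolutions)
    ]

jobs = plan_transcode_jobs(3, ["mp4", "webm"], ["480p", "1080p"])
print(len(jobs))  # 3 chunks x 2 formats x 2 resolutions = 12 jobs
```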
TikTok currently uses Akamai; that's why the latency of playing videos is far better than Reels.
I felt she was explaining YouTube's design rather than TikTok's. Creating chunks of videos, fetching thumbnails for the user: these are all YouTube design concerns, not TikTok.
Hi Gaurav, can you also make a video on designing the CoWIN app, Aarogya Setu app, etc.?
I am new to system design and coming up to speed of late. If we use a key-value store for video metadata here, wouldn't it affect the read time? When the user opens the app, they expect thumbnails of videos on their feed almost instantly. In that case, the read complexity for video metadata should be as low as possible.
If my understanding of the read performance of a key-value store here is wrong, please correct me! Thanks in advance.
Imagine being in a real design interview with Gaurav.
Thanks Gaurav and Yogita. I have a query regarding the protocol used while uploading or serving the video: why can't we use UDP instead of TCP, since it would surely help reduce latency? Or are there caveats to using UDP?
I had similar points in my head while Yogita was answering this. Well, for upload, it has to be TCP because you don't want to miss any packets and also the order is important to store it as a correct video file, but that's on transport layer. These TCP chunks will further be wrapped by layers above it and some other protocol will add its payload, for example: TLS with HTTP, and I think that's where her point was.
Now coming to the streaming or download, this could be over both TCP and UDP depending on several factors for example: user's network bandwidth,
I have one question: if a user is uploading a video, half the upload is done, and the internet gets disconnected, which protocol or process lets the upload resume from the point where it stopped? Same question for downloading.
Hey!
When sudoCODE says that we have a "low SLA" because we can make processing faster using parallel processing, that seems... weird.
Doesn't low SLA mean falling short of what was promised?
This is what I got from ChatGPT: "A low SLA means that the level of service that is being provided falls short of what is expected or agreed upon in the SLA. This could mean that the system is experiencing downtime or is not responding quickly enough to user requests. It could also mean that the system is not meeting other performance metrics that are defined in the SLA."
What am I missing?
I think what she meant here was the agreed number of requests to be processed per second will be small in quantity.
Low SLA wasn't the right term here. Low guarantee is what she meant, I think.
@@gkcs Got it
Nice video @Gaurav
What is the queue system being used during upload, and why?
Wow Amazing ,
Nice video, but can you put the subtitles on it?
Awesome work 👍🏻. Well I also don't like TikTok. But to analyse and study anything of such huge scale Design, is always good.
For fetching video from the CDN to a mobile device, which protocol will we use? You didn't cover this. What about the gRPC protocol for video streaming? Can you please explain this in a comment?
9:28 - Column storage like Mongo...Are you sure?
10:12 - I hope you meant RDBMS DBs not MySQL specifically
12:30 - You mean that you are building this app and locking it in with AWS by using S3? I hope you have considered the cost: storing into S3 can be very cheap, but you will be accessing its contents many more times, and that can hit the profits.
38:30 - I doubt the query for all the videos of a given user can be done with the current setup.
In my experience, interviewers look at the system the way they have seen it somewhere, so it is not about right or wrong; it's all about what the interviewer likes and wants to hear.
I am pretty sure that if you removed your credentials during these interviews, this design would be rejected.
Thanks Gaurav and Yogita for the knowledgeable discussion. Regarding storage cost, could we use a compression technique without compromising core user requirements like latency, upload time, and access?
Compression usually takes some processing and time. The savings could be significant if the data transfer times and cost reduce.
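As a toy illustration of that trade-off: generic lossless compression spends CPU time to shrink bytes. Note this uses synthetic repetitive data; real video is already codec-compressed, so zlib would gain very little on actual video bytes.

```python
import zlib

# Synthetic, highly repetitive payload; real video data would compress far less.
raw = b"frame-data " * 50_000          # ~550 KB

compressed = zlib.compress(raw, level=9)   # costs CPU time up front...
ratio = len(compressed) / len(raw)         # ...but cuts transfer and storage size

assert zlib.decompress(compressed) == raw  # lossless round-trip
print(f"compression ratio: {ratio:.4f}")   # tiny ratio on this repetitive input
```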
What software used for drawing in this video?
Can't we just store the video and then, depending on the user's requirement (resolution or format), use a video pipeline for conversion, rather than converting everything at upload time, which requires a lot of storage space after conversion? With my method it would be conversion on demand: if required we do it, if not then no. Hope it makes some sense; if I am wrong please let me know, as I am learning system design 😀. And this is only with respect to TikTok.
Another awesome delivery, thanks Gaurav.
One thought: we increased storage to ~6x to cover different resolutions and formats. We could handle this by introducing two entities into the system. First, to avoid multiple formats, we can give users a dedicated video player that understands our format only. The second entity is a resolution manager, placed before the streaming engine, which can upgrade or downgrade resolution per the user's bandwidth or request.
Take Netflix and YouTube as examples: they have their own media players that understand their recording formats. Yes, one extra task is converting uploaded videos into the application's format at upload time, but that pays off by saving the 6x storage cost.
Resolution can also be handled at runtime in two ways:
- One: always keep a high-resolution copy and downgrade it at runtime before serving to the user. The downside is increased storage because of the high-resolution copies.
- Another: always keep a low-resolution reference copy plus some pixel-pattern files to convert it to a high-resolution copy at runtime. The upside is that we can reduce storage cost significantly.
For performance in conversion, a dedicated system with predefined resolution-converter filters can work.
Brilliant points, thanks!
It would also be a good idea to take a look at ffmpeg and the creation of "ts" files.
Yes it is common sense to create your own video player which supports all devices instead of creating 20 formats lol.
@@edwardspencer9397 It's not just about creating an app that can play video. You'll of course have an app. Different formats have different properties. Some have small file sizes but require hardware acceleration to perform well, which may not be available on all devices. So even if you create your own player, it will fall back to software decoding, which is slow: users will complain about phones getting warm, high battery consumption, and sluggish performance. Instead, you create different formats optimized for particular families of hardware. There can always be a basic format as a fallback, but you should cover a large percentage of devices with formats optimized for them.
@@lhxperimental Large percentage of devices is no longer true. Businesses always prefer those who have medium / high end phones/devices capable of hardware acceleration because all the others owning low end phones are mostly poor people who have no intention to spend any money on subscriptions or visit advertisers. So even if a poor guy uninstalls something due to overheating issues it shouldn't be a problem.
I feel like the interviewer is doing a lot of the talking, not sure that is correct.
These kinds of mock discussion on SD is really helpful. Provides viewer a thought process while dealing such questions. Kindly do more these kinds of video ...
Why do u have two spaces around "viewer"
++
Great video! One feedback - I didn't see the usage of the 1.2TB data you calculated, I mean a translation of how many servers (with resources like CPU, RAM, Disk, IO, etc) would be needed for ingestion pipeline as well as storage would have been helpful. Also, some interesting scenarios like thundering herd, data compression to reduce cost would have been of great help. And don't you think, putting all the video in the CDN would be cost heavy. Should have some strategy based on popularity/recency/TTL and upload/remove the video from CDN.
We can use a rating engine for this: according to the reach of a video in a particular region, replicate the video into that region, like FB does.
A very average video on system design. I think Gaurav was better, at least at channeling the answers, but Yogita has to improve her skills a lot. Not very appropriate for someone who is already halfway into system design and knows systems well (otherwise OK).
Very detailed, touches very important system design aspects. Gives many pointers for further research!
A zillion Thanks!
Instead of uploading files through the API, you can upload files directly into S3 using a signed S3 URL.
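A minimal sketch of how such a signed, expiring upload URL works. This is a simplified HMAC scheme for illustration only; real S3 pre-signed URLs use AWS Signature Version 4 and are typically generated via the SDK (e.g. boto3's `generate_presigned_url`). The host name and secret here are made up.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"  # hypothetical key; never shipped to clients

def make_signed_upload_url(bucket, key, expires_in=3600, now=None):
    """Issue an upload URL that is only valid until `Expires`."""
    expires = int(time.time() if now is None else now) + expires_in
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expires, "Signature": signature})
    return f"https://{bucket}.example-storage.test/{key}?{query}"

def verify_upload(bucket, key, expires, signature, now=None):
    """Storage side: reject expired or tampered requests."""
    if (time.time() if now is None else now) > int(expires):
        return False
    payload = f"PUT\n{bucket}\n{key}\n{int(expires)}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The point of the design: the API server does one cheap signing operation and hands the client a URL, so the heavy video bytes flow straight to storage without passing through the application tier.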
That was really amazing... like how smoothly she explains bits and pieces of the problem.
loved it.
Learned a lot.
Thanks a lot for this content guyz.
You're very welcome!
I love this video, and I got to know the system design approach at least at a basic level.
I quite did not understand how chunking a video would help. A video file has headers and then frame data. Now once the video is ingested we can't simply chunk the blob data (the way we chunk text data normally) without the header. The chunked video would not make any sense without the header. Now if the idea was to 30mb video => chunked to 10mb + 10mb + 10mb video => and then pass each chunk to converters which convert to different codec and resolution, we can't do it with basic blob data chunking because the converters in the pipeline could not work without the headers of original video file. Because the header will be attached to the first 10mb chunk itself.
So essentially the pipeline should look something like
```sh
30mb video file = [header]framedata
  = 10mb file [header]framedata ==> converter_fn ==> corrupted file ==\
  + 10mb file framedata         ==> converter_fn ==> corrupted file ===> corrupted file after joining
  + 10mb file framedata         ==> converter_fn ==> corrupted file ==/
```
So AFAIK, if we have to chunk a 30-sec video into 10 + 10 + 10 sec videos, then [1] our best shot is to pass it through ffmpeg to chunk the video (which essentially recreates the headers with proper values for the frames), or [2] somehow make the client send 3 different videos and play them as one video from the UI layer during consumption.
Note: When you think about video header, it's not like text HTTP header, it would have general video codec + playback + compression related metadata etc. And then [I guess]
each frame would have its own header (inside the frame blocks) => Chunking needs to take care of all these facts => hence we need something like ffmpeg .
I probably misunderstood the point Yowgita (apologies if I misspelled the name) had made, or probably there are some techniques she implied which I don't know. In either case, would love to know if I missed something in my understanding.
Excellent points
Good points. You are really familiar with video file formats! I guess in a real interview the interviewer would not expect candidates to know the stuff you've mentioned; they would probably give bonus points for recognizing that we should chunk videos when they're huge (think YouTube). For TikTok I guess we'd be OK just uploading the videos as-is.
It is really sad that freshers have to prepare for system design. You should never have to prepare for it, you learn it as you grow and interact with your leads and architects. Interview system is completely broken.
In case video editing is also a requirement, S3 versioning of files can be helpful. So choosing S3 fits that too. Thoughts?
Yes, and even S3 storage classes. If this app is ingesting 1.2TB of files every day, it'd make sense to store the raw files in an S3 storage class that costs less. Version your videos, and set up lifecycle rules to transition them to "Infrequent Access" after, let's say, 15 days.
@@kanuj.bhatnagar I agree with storing the videos in object storage (S3 or Google Cloud Storage), but applying a lifecycle rule at 15 days is not good. We can easily update the cache to remove older videos, but wait at least 180 days before moving to the next lifecycle tier, as the access charges for infrequently accessed data are high in cloud storage systems.
This video is so good. It's so helpful, like talking to an engineering manager.
Liar, it's nowhere near real-world projects! Although they are really good, this only gives us an idea of an MVP and how to crack interviews! Real-world scenarios are much worse and terrifying 👻😱!!
A few ideas!
- Utilising the fact that most requests are for videos that are in trend, and trends die in about a month, instead of storing all the transcoded files we can have a live transcoder and store the result in a cache (or CDN) with a TTL of about a month (the exact time can be decided by data analysis). Twitter did this and saved millions on storage costs.
- We can keep live websockets with online users, so that whenever the video is ready we can notify them, and maybe also users who were tagged or are very engaged with that account.
- Instead of dividing videos into chunks after receiving the whole video, let the client do the chunking and upload chunks only. This results in far fewer failures: if an upload fails after 95% of the video is uploaded, you don't need to re-upload the entire file.
- Maybe have caches on top of the databases.
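The first idea (transcode live and cache the result with a TTL) can be sketched with a tiny in-memory cache; the TTL value and the cache-key scheme below are assumptions for illustration:

```python
import time

class TTLCache:
    """Tiny TTL cache sketch: keep live-transcoded outputs for a while,
    re-transcode on a miss instead of storing every rendition forever."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or now - entry[1] > self.ttl:
            self._store.pop(key, None)  # expired entries are evicted lazily
            return None
        return entry[0]

    def put(self, key, value, now=None):
        self._store[key] = (value, time.time() if now is None else now)
```

Usage would look like `cache.get("video123:720p")`, falling back to the live transcoder on `None` and `put`-ting the result back.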
S3 also has multiple tiers. You can set a rule to move files to a lower tier after a set time, and further down over time.
Agree with chunking the video on the client side!
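A rough sketch of the client-side chunking and resume idea from the list above (the chunk size and the `send` callback are hypothetical placeholders for the real upload call):

```python
def split_into_chunks(data, chunk_size):
    """Split a blob into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def resume_upload(chunks, acked_indices, send):
    """Re-send only the chunks the server has not acknowledged yet,
    so a 95%-complete upload doesn't start over from zero."""
    sent = []
    for idx, chunk in enumerate(chunks):
        if idx not in acked_indices:
            send(idx, chunk)
            sent.append(idx)
    return sent
```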
Some questions should be asked upfront before diving in, such as "do we want video search?", "do we want to generate a news feed?", "what about video sharing?", "can users download videos?", "can users follow other people?", etc. After that we can focus on what the interviewer is really interested in.
ya i was wondering the same
That would really be a microservices concern, AFAIK. A scalable architecture is the first goal, followed by additive services.
@@ashishprasad1963 correct
What Whiteboard are you guys using?
By watching this video I have fallen in love with System Design 😅
I didn't quite get how the queue was adding more advantage if done in a sync way. The video needs to be anyway uploaded at least once. Since the video is uploaded onto a queue message (or somewhere else) and then copied to S3, more IOs and network calls will be needed. Since S3 is anyway getting used, so one can even give temporary (and restricted) access to the S3 objects so that the user can directly upload to S3 without much security concerns and then simply queue just the URL of the uploaded object, along with other user metadata in the queue. So the overall upload API time will be just creating an empty S3 object with the client uploading in an async way in the background. Creating something like AWS S3 or Azure Storage or any distributed File System like storage is quite hard, and it doesn't make sense to really create them from scratch unless budget and scale are limited.
Also, there are multiple ways to implement queues: a pre-existing queue provided by some cloud platform, or one built in-house on top of a database. What happens when some node reading a queue message crashes or the power is unplugged? What kind of message-popping mechanism is used when reading messages, to make the system crash-resilient so that we don't lose the client's request just because the power plug was pulled right after a message was popped from the queue? Does the queue have a strict FIFO requirement (I don't think it is necessary in this case)? What happens if multiple messages are enqueued because of client retries? Last but not least, we cannot keep the videos forever and they need to be GCed; how frequently will the GC run and remove those videos?
A lot of questions can be asked, and that's what I like about Design interviews. Ask anything and everything :)
Side Note: I have never used AWS's S3, however, I have used Azure Storage extensively and I could search for Azure Storage equivalents in AWS's S3. So please forgive me if I have wrongly used AWS terminologies.
These are excellent questions. The queue mentioned here has events (just the id, url and some metadata about the video).
You need a persistent queue with retries here. Ordering isn't necessary. I'd use something like Apache Kafka for this.
I was thinking along the same lines. Even though a queue can do that job, I felt all those heavy computing tasks should be pushed as backend batch jobs that enrich the metadata. That way it is easier to maintain the consistency of the system during upload. During upload, the videos get pushed into an S3 bucket, an authenticated URL with a token is generated, and then the URL along with other information is either stored in the DB or pushed to a queue for storage in the DB, and the user is given a response. Once the data is in the DB (or another queue), a new event can trigger work behind the scenes that performs the variable-bit-rate conversion and other needed steps and enriches the data. I am just citing an approach, IMHO; it would be helpful if someone could validate this.
What happens when some node reading the queue message crashes or the power is unplugged?
After reading a message from the queue, there can be a visibility timeout, during which the message won't be available to other worker nodes. After processing, the worker must send an acknowledgment to the queue and delete the message. If the queue doesn't receive an ack before the visibility timeout expires, the message becomes available again. With this, if a worker crashes, the message becomes visible to other workers and the system is more resilient.
There should be an algorithm (S3 provides one) to monitor infrequently accessed videos. Videos that are rarely accessed could be moved to Infrequent Access (to save cost) after some time or after not being used for a while.
Please reply if you have any thoughts on this.
@@letsmusic3341 Yeah, the visibility-timeout approach should work. However, there are extreme corner cases: say the worker node couldn't finish processing the message in time, and the message becomes visible to other nodes. To avoid this, you need some sort of lock on the message in the queue (which can simply mean extending the visibility timeout again). Even extending the visibility timeout repeatedly from a separate thread can have problems, especially if that thread fails to refresh it in time. There can also be a rare clock-skew problem: the thread renewing the lock can have a different notion of time than the queue service and might not refresh the visibility timeout at the right moment. These are extremely rare scenarios, though. External locking with larger timeouts outside the queue service can help, and a large enough timeout with additional buffer time also helps.
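The visibility-timeout semantics discussed in this thread can be modeled with a toy in-memory queue. SQS's real API differs; this only illustrates the redeliver-until-acked behavior.

```python
import time

class VisibilityQueue:
    """Sketch of SQS-style visibility timeout: a received message becomes
    invisible for a while, and reappears if it is never acknowledged."""

    def __init__(self, visibility_timeout):
        self.timeout = visibility_timeout
        self._messages = {}   # msg_id -> (body, invisible_until)
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = (body, 0.0)
        self._next_id += 1

    def receive(self, now=None):
        now = time.time() if now is None else now
        for msg_id, (body, invisible_until) in self._messages.items():
            if now >= invisible_until:
                # Hide the message from other workers until the timeout elapses.
                self._messages[msg_id] = (body, now + self.timeout)
                return msg_id, body
        return None

    def ack(self, msg_id):
        """Delete the message after successful processing."""
        self._messages.pop(msg_id, None)
```

If a worker crashes before calling `ack`, the message simply becomes visible again after the timeout and another worker picks it up.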
@Amitayush Thakur Nice question. @Gaurav Sen to be more clear, upload happens to S3. MetaData about the video and User is passed as event to the queue for further processing.
One question I had: before uploading to S3, all the format × resolution copies of a single video will be in memory as part of the workflow/pipeline. If 100k users are simultaneously uploading videos, how is that handled in memory before being uploaded?
One suggestion: for video upload, put these tasks into a message queue like Kafka and have workers process them asynchronously.
Kafka or something like AWS Kinesis would help if we were streaming something live. In this scenario, AWS SQS or RabbitMQ are the right tools.