I don't understand. If under the same bucket, let's say (2021-2022), we have multiple nodes, how are reads any faster? Logs for that bucket will be distributed across servers and still need to be queried across servers, which is slow. My understanding is that bucketing didn't help improve read performance.
Yes, sharding improves write performance at the expense of query latency (unless we shard by something more clever!). However, we can still handle a high throughput of reads. This latency vs throughput problem is a common tradeoff with large-scale systems! Hope that helps :)
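To make that concrete, here's a minimal Python sketch of a bucketed read; the bucket map, node names, and `query_node` helper are all made up for illustration. The point is that a query fans out only to the shards of the one bucket covering the requested time range, not to every node in the system:

```python
# Illustrative bucket layout: each time bucket owns its own small set of shards.
BUCKETS = {
    "2020-2021": ["node1", "node2"],
    "2022-2023": ["node3", "node4"],
}

def query_node(node: str, filters: dict) -> list[str]:
    """Stand-in for a network call to one shard; returns matching log lines."""
    return []

def read_logs(bucket_key: str, filters: dict) -> list[str]:
    # Scatter-gather, but only across the shards of the one bucket that
    # covers the requested time range; other buckets are never touched.
    results = []
    for node in BUCKETS[bucket_key]:
        results.extend(query_node(node, filters))
    return sorted(results)  # merge the partial results from each shard
```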
Great video, man! How would you go about designing the data ingestion part?
Great point! There’s a lot that goes into ingesting logs while optimizing network performance and maintaining context. Check out our full video on monitoring systems on interviewpen.com :)
😊😊 ok @interviewpen
Great video, very clear.
Thanks!
Kafka -> Loki -> S3
If you're looking for an existing solution :)
Yep, S3 does a lot of the things discussed here behind the scenes. Thanks for watching!
So in 2018, every service was writing logs to node 3. Didn't we go back to bad write complexity by doing bucketing?
Yep, bucketing makes query performance better, so we introduce sharding as well to distribute writes within a bucket.
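As a rough sketch of that routing (the two-year bucket scheme and `NUM_SHARDS_PER_BUCKET` are assumptions for illustration), a write first picks a bucket by timestamp, then a shard within it by hashing the log's source:

```python
import hashlib
from datetime import datetime

NUM_SHARDS_PER_BUCKET = 3  # illustrative

def route_write(timestamp: datetime, source: str) -> tuple[str, int]:
    # Bucketing: pick the two-year bucket from the timestamp, so old
    # buckets stop receiving writes entirely.
    start = timestamp.year - timestamp.year % 2
    bucket = f"{start}-{start + 1}"
    # Sharding: hash the log's source to spread current writes across
    # several nodes instead of hammering a single one.
    shard = int(hashlib.sha1(source.encode()).hexdigest(), 16) % NUM_SHARDS_PER_BUCKET
    return bucket, shard

print(route_write(datetime(2018, 6, 1), "auth-service"))  # e.g. ('2018-2019', 1)
```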
Why not use Kafka for high throughput?
Kafka is an event streaming platform, so it wouldn't solve any of the log storage problems we're addressing here. But if you have any thoughts on how to incorporate it, feel free to share!
@interviewpen Use Kafka Streams + Cassandra: process the events through consumers and save them in an HBase DB for analytics.
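For what it's worth, a minimal Python sketch of that idea using the kafka-python and happybase clients (the topic name, table name, column family, and row-key scheme here are all assumptions):

```python
from kafka import KafkaConsumer  # pip install kafka-python
import happybase                 # pip install happybase

# Consume raw log events from Kafka...
consumer = KafkaConsumer("logs", bootstrap_servers="localhost:9092")

# ...and persist them to an HBase table for later analytics.
table = happybase.Connection("localhost").table("log_events")

for msg in consumer:
    # Hypothetical row key: timestamp + offset keeps rows unique and
    # roughly time-ordered, so time-range scans stay cheap.
    row_key = f"{msg.timestamp}-{msg.offset}".encode()
    table.put(row_key, {b"d:raw": msg.value})
```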
Great video
Thanks!
Great video!! Thanks! But could you create a video on designing an effective and efficient ticketing system?
Sure, we'll add it to the backlog. Thanks for watching!
Great video!!! Please slow down the pace; as someone new to the topic, it's a bit fast to grasp the concepts.
Ok, noted!
Suppose every two years it ingests 2PB and migrates 1PB. How could three sets be enough to cycle after 12 years?
Great question! At any given time, we have three "hot" nodes--two are migrating data to cold storage and one is ingesting new data. We only showed one cold storage node in the example, but we would need at least 2 to make this work long-term. Hope that helps!
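Here's a toy simulation of that rotation using the thread's numbers (2PB ingested per active node and 1PB drained per draining node each two-year period; the 2018 start year is just for illustration):

```python
INGEST = 2  # PB written to the active node each two-year period
DRAIN = 1   # PB each non-active node migrates to cold storage per period

hot = [0, 0, 0]  # PB remaining on each of the three hot nodes
cold = 0         # PB accumulated in cold storage

for period in range(6):  # 6 two-year periods = 12 years
    writer = period % 3  # rotate which node takes new writes
    assert hot[writer] == 0  # the writer must have fully drained by now
    hot[writer] += INGEST
    for node in range(3):
        if node != writer:
            moved = min(DRAIN, hot[node])
            hot[node] -= moved
            cold += moved
    print(f"{2018 + 2 * period}-{2019 + 2 * period}: hot={hot} cold={cold}PB")
```

Each node gets two full periods to drain its 2PB before it becomes the writer again, so the three-node cycle never runs out of space.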
I love the cute computer in the background
Thank you :)
By 2026 you will have 2 clusters with 2PB (2022-2023, 2024-2025) of data and one with 1PB of data (2021). What do you do then? 😅
While we're writing data for 2024/25, we can migrate data to cold storage from both clusters at once, meaning 2020 and 2022 data can both be migrated during those 2 years. Thanks for watching :)
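A quick back-of-envelope check in Python (PB figures from the thread):

```python
# During the 2024-25 period, both older clusters migrate in parallel:
ingest = 2      # PB written to the new 2024-25 cluster
drain_2021 = 1  # PB: the leftover 2021 data finishes moving to cold storage
drain_2022 = 1  # PB: half of the 2022-23 cluster moves to cold storage

# Migration keeps pace with ingestion, so the hot tier never grows.
assert drain_2021 + drain_2022 == ingest
```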