I'm so confused, why are there two paths, one to calculate top K per min and the other to calculate top K per hour?
Why can't the map-reduce job be configurable to calculate at a different cadence so you can retire the upper path?
Hi, I assume you are referring to the 'Foundational System Design' part. The 'min' in the second path stands for 'minimum' and not 'minute'. The path without 'min heap' represents a naive solution. Sorry for any confusion created. You can assume that we calculate the top K per hour; the actual period is not important for describing the design. Thank you for the feedback!
Thanks for the video. Can you explain how the results of the two paths are stored in the data store, one from the map-reduce job and the other from the naive path (hash map or count-min sketch)? How are queries performed, and how do you figure out which path to use?
Thanks for your question. I will try a short version in the comment section; it wasn't covered in the video because DB schemas and queries seemed like too much to add to a video already packed with processes)
Let's name things:
storage will be a PostgreSQL DB
storage (distributed file system) will be AWS S3
1) The first flow, with the "preprocessor" and "merger", allows us to obtain the top K for a specified period, but we expect to choose only one period (hour or day or week or whatever).
2) In case we later decide that we want to be flexible and allow an arbitrary time interval and time period, we propose a flow where AWS S3 storage is added to allow construction of an arbitrary time period (top K every hour, day, week) and time interval.
Rough outline of the Flow #2 ETL process:
1) The Saver will dump a file, say CSV/Parquet (in reality it would be a good idea to use batching, but for simplicity let's dump every request).
2) Every request creates a record in a predefined CSV file. It is a good idea to partition by id, writing events for the same id to the same file. We can also partition by id + time interval, say starting a new file every day (most of the time you need recent info, so this lets you touch only, say, the last week's files when calculating counts every hour for the past week)
with the following fields:
timestamp (in ms or seconds, to allow aggregation to minutes, hours, ...), id
1713876131, 1
1713876134, 2
1713876136, 1
3) The count job takes its one or several files for calculation; because we partitioned by id, we can run the count on every file independently
4) The counting process goes as in Flow #1: aggregating into a hash map and then into a min-heap (see the sketch after this list)
5) After this we run the "merging job" to get the overall top K for the defined time interval and time period and save it to PostgreSQL (storage)
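Since it wasn't in the video, here is a minimal Python sketch of steps 3-5 under the assumptions above (CSV rows of `timestamp, id`, files partitioned by id). The function names, file names and K value are hypothetical, and filtering rows by timestamp for the chosen interval is omitted for brevity.

```python
import csv
import heapq
from collections import Counter

def count_file(path):
    """Steps 3-4: count occurrences of each id in one partitioned CSV file."""
    counts = Counter()
    with open(path, newline="") as f:
        for timestamp, item_id in csv.reader(f):
            counts[item_id.strip()] += 1
    return counts

def top_k(counts, k):
    """Step 4: keep only the k largest counts using a min-heap of size k."""
    heap = []  # (count, id) pairs; the smallest kept count sits at the root
    for item_id, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, item_id))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, item_id))
    return heap

def merging_job(heaps, k):
    """Step 5: because files are partitioned by id, each id appears in at most
    one heap, so summing and re-selecting the top k is exact."""
    merged = Counter()
    for heap in heaps:
        for count, item_id in heap:
            merged[item_id] += count
    return sorted(top_k(merged, k), reverse=True)  # rows to save to PostgreSQL

# Usage: top 2 ids for one time interval across two hypothetical partition files.
# heaps = [top_k(count_file(p), 2) for p in ["part-0.csv", "part-1.csv"]]
# print(merging_job(heaps, 2))
```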
Hope this helps clarify things; feel free to ask if the explanation was confusing.
When you merge multiple heaps, it doesn't seem like the merged top K is the right answer across all heaps.
Hey, can you elaborate on why it will not be the right answer?
If I understood your question correctly, the concern is that the hash maps used as the basis for the min-heap calculation on servers 1 & 2 might overlap? Excuse me for not articulating it more clearly, but because the Message Queue in front, as well as the "processor" service itself, is distributed, we partition by the id of the post/video/whatever. In that case, no hash map on one server shares an id with another server's hash map. Then each min-heap is a correct representation of its partition, and merging the independent min-heaps indeed gives you the correct overall top K.
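To illustrate why the merge stays exact, here is a minimal Python sketch (the ids, the K value and the hash-based partitioning are hypothetical, not taken from the video): events are partitioned by id across two "processor" servers, each keeps its own hash map and size-K min-heap, and merging the local heaps reproduces the global top K because every id's full count lives on exactly one server.

```python
import heapq
from collections import Counter

K = 2            # hypothetical top-K size
NUM_SERVERS = 2  # two "processor" servers behind the partitioned Message Queue

events = ["video-1", "video-2", "video-1", "video-3", "video-2", "video-1"]

# Partitioning by id: every event for a given id lands on the same server, so
# each server's hash map holds the complete count for the ids it owns.
# (A real queue would use a stable hash; Python's hash() is enough for a demo.)
per_server_counts = [Counter() for _ in range(NUM_SERVERS)]
for item_id in events:
    per_server_counts[hash(item_id) % NUM_SERVERS][item_id] += 1

def local_top_k(counts, k):
    """Each server keeps only its local top k in a min-heap of size k."""
    heap = []
    for item_id, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, item_id))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, item_id))
    return heap

# Merge: any global top-K id must also be in its own server's local top K
# (its local count equals its global count), so selecting the top K from the
# union of the local heaps gives the exact overall answer.
candidates = [entry for c in per_server_counts for entry in local_top_k(c, K)]
print(heapq.nlargest(K, candidates))  # [(3, 'video-1'), (2, 'video-2')]
```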
Generally an API gateway can do TLS offloading. Why do we need an LB before the API gateway? I would expect the API gateway first and then the LB. Can someone explain, please?
I got it, but the author needs to explain it so a future audience can understand it.