Design a Distributed Geospatial Data Platform | System Design
- Published Jul 14, 2024
- Visit Our Website: interviewpen.com/?...
Join Our Discord (24/7 help): / discord
Join Our Newsletter - The Blueprint: theblueprint.dev/subscribe
Like & Subscribe: / @interviewpen
In this video, we discuss a high-level design of a geospatial data aggregation platform. This system would be responsible for ingesting multiple formats of data from a variety of sources, aggregating and cleaning the data, and providing a performant and convenient dashboard to interact with the processed dataset.
Table of Contents:
0:00 - Introduction
0:35 - Requirements
2:12 - Data Processing (Single-Node)
3:22 - Data Processing (Distributed)
4:14 - Workflow Orchestration
4:58 - Data API
5:30 - Caching
6:12 - Conclusion
6:35 - interviewpen.com
Socials:
Twitter: / interviewpen
Twitter (The Blueprint): / theblueprintdev
LinkedIn: / interviewpen
Website: interviewpen.com/?...
Thanks for the video! It would be great to also see how you would build it in a real application.
I would like to point out that there are database extensions for GIS data, such as PostGIS for Postgres, so you could in fact query a database directly. Other databases also have similar extensions or native features.
Yes, for our vector-based data this is a good solution. However, for raster data there's no direct equivalent. We sort of glossed over this in the interest of time, so really good thoughts here!
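To illustrate the distinction raised here: vector data (points, lines, polygons) maps naturally onto PostGIS-style SQL queries, while raster data is processed as grids of cells. The sketch below (pure Python, hypothetical data; real pipelines would use a library like rasterio or GDAL) shows the kind of grid aggregation a raster pipeline performs, which has no direct analogue in a query over vector geometries:

```python
# Minimal sketch: aggregate a raster grid by averaging non-overlapping
# 2x2 blocks -- a "zoom out" step common in raster tile pipelines.
# The data is hypothetical; real systems operate on georeferenced tiles.

def downsample_2x2(grid):
    """Average each non-overlapping 2x2 block of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, 2):
        row = []
        for c in range(0, cols, 2):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(sum(block) / 4)
        out.append(row)
    return out

raster = [
    [1, 1, 4, 4],
    [1, 1, 4, 4],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
print(downsample_2x2(raster))  # [[1.0, 4.0], [2.0, 8.0]]
```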
Very good and explicative video, thank you very much.
I am currently building an internal data platform, and I was going to use Prefect on a VM, but after seeing your video I believe the best way to go would be: Prefect + Dask Scheduler + Dask Worker on Azure Kubernetes Service. Does that make sense to you? Then I could benefit from autoscaling of the workers.
Thanks again!
Yep, that sounds like a great solution! There are also fully managed options like Snowflake and Databricks, if those suit your use case. Thanks for watching!
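This isn't the actual Prefect or Dask API, but the scheduler/worker fan-out pattern discussed above can be sketched with the standard library alone. Dask layers distributed scheduling on the same partition-and-aggregate shape, and running its workers on a Kubernetes service is what adds autoscaling; the function names and data here are hypothetical:

```python
# Sketch of the scheduler/worker fan-out pattern (stdlib only).
# A distributed framework like Dask replaces the local pool below
# with a cluster whose workers can autoscale; the partition ->
# process -> aggregate shape of the code stays the same.
from concurrent.futures import ProcessPoolExecutor

def process_partition(partition):
    # Hypothetical per-partition work, e.g. cleaning one data chunk.
    return sum(x * x for x in partition)

if __name__ == "__main__":
    partitions = [range(0, 100), range(100, 200), range(200, 300)]
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(process_partition, partitions))
    print(sum(results))  # total over all partitions
```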
Did something similar, but at a very large scale, at PayPal.
Cool cool!
This made me wonder whether systems like Hadoop and MapReduce are still used/built.
Hadoop MapReduce could absolutely be used in place of Spark/Dask as our distributed data-processing cluster. However, building the kinds of aggregations we need from scratch on it would take a lot of manual work. Good point!
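To make the comparison concrete, here is the MapReduce pattern itself in plain Python on toy, hypothetical records. Hadoop distributes each of these three phases across a cluster; Spark and Dask give you the same aggregation through a higher-level API (e.g. a groupby/count) without hand-writing the phases:

```python
# The classic map -> shuffle -> reduce pattern in plain Python.
# Toy example: count records per region code (hypothetical data).
from collections import defaultdict

records = ["us-west", "eu-central", "us-west",
           "ap-south", "eu-central", "us-west"]

# Map: emit (key, 1) pairs.
mapped = [(region, 1) for region in records]

# Shuffle: group emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each key's values.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'us-west': 3, 'eu-central': 2, 'ap-south': 1}
```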