Design a Distributed Geospatial Data Platform | System Design

  • Published 14 Jul 2024
  • Visit Our Website: interviewpen.com/?...
    Join Our Discord (24/7 help): / discord
    Join Our Newsletter - The Blueprint: theblueprint.dev/subscribe
    Like & Subscribe: / @interviewpen
    In this video, we discuss a high-level design of a geospatial data aggregation platform. This system would be responsible for ingesting multiple formats of data from a variety of sources, aggregating and cleaning the data, and providing a performant and convenient dashboard to interact with the processed dataset.
    Table of Contents:
    0:00 - Introduction
    0:35 - Requirements
    2:12 - Data Processing (Single-Node)
    3:22 - Data Processing (Distributed)
    4:14 - Workflow Orchestration
    4:58 - Data API
    5:30 - Caching
    6:12 - Conclusion
    6:35 - interviewpen.com
    Socials:
    Twitter: / interviewpen
    Twitter (The Blueprint): / theblueprintdev
    LinkedIn: / interviewpen
    Website: interviewpen.com/?...

COMMENTS • 10

  • @joseavellaneda4921 5 months ago +4

    Thanks for the video! It would be great to also see how you would write it in a real application.

  • @rankala 5 months ago +1

    I would like to point out that there are database extensions for GIS data, such as PostGIS for Postgres. So in fact you could query a database. Other databases also have extensions or native features.

    • @interviewpen 5 months ago +1

      Yes, for our vector-based data this is a good solution. However, for raster data we don’t have any direct equivalent. We sort of glossed over this in the interest of time, so really good thoughts here!

  • @pmshadow 5 months ago

    Very good and explanatory video, thank you very much.
    I am currently building an internal data platform, and I was going to use Prefect on a VM, but after seeing your video I believe the best way to go would be: Prefect + Dask Scheduler + Dask Worker on Azure Kubernetes Service. Does that make sense to you? Then I could benefit from autoscaling of the workers.
    Thanks again!

    • @interviewpen 5 months ago

      Yep, that sounds like a great solution! There are also fully managed solutions like Snowflake and Databricks, if that suits your use case. Thanks for watching!

  • @yashpandey7433 5 months ago +1

    Did something similar, but on a very large scale, at PayPal.

  • @pieter5466 5 months ago +1

    This made me wonder whether systems like Hadoop and MapReduce are still used/built.

    • @interviewpen 5 months ago

      Hadoop MapReduce could absolutely be used in place of Spark/Dask as our distributed data processing cluster. However, building the types of aggregations we need from scratch would be a lot of manual work. Good point!
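
To make the "manual work" point concrete, here is a minimal single-process sketch of what a hand-written MapReduce-style aggregation might look like: averaging a measured value per grid cell. It is not taken from the video; the record layout `(lat, lon, value)`, the 1-degree cell size, and all function names are assumptions made purely for illustration.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 1.0  # hypothetical grid resolution, chosen for illustration


def map_point(record):
    """Map step: turn one (lat, lon, value) record into a (grid_cell, value) pair."""
    lat, lon, value = record
    cell = (math.floor(lat / CELL_SIZE_DEG), math.floor(lon / CELL_SIZE_DEG))
    return cell, value


def shuffle(pairs):
    """Shuffle step: group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(values):
    """Reduce step: collapse all values for one grid cell into their mean."""
    return sum(values) / len(values)


def mean_by_cell(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    grouped = shuffle(map_point(r) for r in records)
    return {cell: reduce_cell(vals) for cell, vals in grouped.items()}


if __name__ == "__main__":
    sample = [(10.2, 20.7, 4.0), (10.9, 20.1, 6.0), (-0.5, 3.3, 1.0)]
    print(mean_by_cell(sample))
```

Every step here (keying, grouping, reducing) has to be written by hand, whereas a higher-level engine such as Spark or Dask typically expresses the same aggregation as a short group-by followed by a mean.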