How

  • Published 23 Jul 2024
  • System Design for SDE-2 and above: arpitbhayani.me/masterclass
    System Design for Beginners: arpitbhayani.me/sys-design
    Redis Internals: arpitbhayani.me/redis
    Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
    Sign up and get 40% off - app.codecrafters.io/join?via=...
    In this video, I discuss how Twitter maintains search at scale using Elasticsearch. Twitter built tooling around Elasticsearch to handle surges in search traffic, real-time ingestion, and backfill. The course on system design focuses on building intuition and covers real-world system design scenarios. Twitter's tooling includes an Elasticsearch proxy for standardization and a backfill service for staggered data ingestion. By deferring writes and keeping reads synchronous, Twitter maintains stability and scalability in its Elasticsearch clusters.
    Recommended videos and playlists
    If you liked this video, you will find the following videos and playlists helpful
    System Design: • PostgreSQL connection ...
    Designing Microservices: • Advantages of adopting...
    Database Engineering: • How nested loop, hash,...
    Concurrency In-depth: • How to write efficient...
    Research paper dissections: • The Google File System...
    Outage Dissections: • Dissecting GitHub Outa...
    Hash Table Internals: • Internal Structure of ...
    Bittorrent Internals: • Introduction to BitTor...
    Things you will find amusing
    Knowledge Base: arpitbhayani.me/knowledge-base
    Bookshelf: arpitbhayani.me/bookshelf
    Papershelf: arpitbhayani.me/papershelf
    Other socials
    I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
    LinkedIn: / arpitbhayani
    Twitter: / arpit_bhayani
    Weekly Newsletter: arpit.substack.com
    Thank you for watching and supporting! It means a ton.
    I am on a mission to bring out the best engineering stories from around the world and make you all fall in love with engineering. If you resonate with this then follow along, I always keep it no-fluff.
  • Science & Technology

COMMENTS • 45

  • @baibhabmondal1740 1 year ago +5

    Thanks Arpit, this helps in drawing parallels to other systems as well. And it's so nice to see that the fundamentals are much the same when handling large-scale infra.

  • @ShivamKumar-bt9nn 1 year ago +1

    Love these stories of great engineering. Request to please bring these more often. Thanks a lot 🙂

  • @abhishekvishwakarma9045 1 year ago +4

    such a great explanation, learned a lot from you arpit sir 😎, keep going 🔥

  • @biswajit-k 8 months ago

    Very helpful! Thanks a lot sir.

  • @architshukla8076 1 year ago +1

    Thanks Arpit for such informative content.

  • @navneetkalra3934 4 months ago

    Similar to what we built at Oracle... Oracle Knowledge AI search has a similar kind of architecture. We have also introduced vector search in Elasticsearch.

  • @random4573 1 year ago

    In database systems we can segregate writes and reads across different DB nodes and eventually make the read node consistent with the write node's data.
    I don't know much about ES, but was that not an option in ES?

  • @manjunathyaji7316 1 year ago

    Thanks Arpit! This was a great video! I had a question.
    In the backfill process, how does the orchestrator know how many workers to spawn? How do you monitor and calculate the amount of data yet to be processed in HDFS?
    If Kafka was used instead of HDFS, I know there's a way to calculate the consumer lag, which can be used to trigger orchestrator's rules.
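
The consumer-lag idea in the comment above can be sketched as follows. Per-partition lag is the log-end offset minus the consumer group's committed offset. The scaling rule (`workers_needed`, `per_worker_capacity`, `max_workers`) is a hypothetical illustration only; the video does not say how Twitter's orchestrator sizes its worker pool, and for HDFS some other signal would be needed.

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no commit yet is read from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


def workers_needed(total_lag: int, per_worker_capacity: int,
                   max_workers: int) -> int:
    """Hypothetical rule: one worker per `per_worker_capacity` pending messages."""
    if total_lag <= 0:
        return 0
    needed = -(-total_lag // per_worker_capacity)  # ceiling division
    return min(needed, max_workers)


lag = consumer_lag({0: 100, 1: 50}, {0: 90, 1: 50})
total = sum(lag.values())
```

Here `total` is the number of messages still waiting across partitions, which an orchestrator could feed into a rule such as `workers_needed`.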

  • @shishirchaurasiya7374 1 year ago

    This was really a crispy one

  • @sureshchaudhari4465 1 year ago

    Very good

  • @guitar-nation-gautam 1 year ago +2

    But I guess if the read operation is an I/O-intensive one, like fetching a yearly orders report from ES, it shouldn't be a synchronous operation. Rather, it should follow the write flow you described, i.e. send the report metadata as an event to Kafka topics, and workers can later mail the reports asynchronously.

    • @sharemomentsindian 1 year ago

      Here too, if the report is big, how we fetch it could be discussed.

    • @meditationdanny701 1 year ago

      But why use ES for analytical queries?

    • @AbhishekTiwari-uy7of 1 year ago

      Better to directly run a Spark job on S3/HDFS and refrain from using Elasticsearch for such use cases.

  • @abhiyanker 1 year ago +1

    Why was HDFS used here? A simple queue (like SQS), or Kafka if Twitter wanted a retry mechanism, would have achieved the same.

    • @AsliEngineering 1 year ago

      Staging storage for subsequent consumption.

    • @abhiyanker 1 year ago

      @AsliEngineering Thanks for the reply! I was not expecting to get a reply here.
      When backfill is not required, Twitter puts the data into Elasticsearch directly, and for backfill they put it in HDFS. I think the reason would be the memory constraints of Kafka or SQS. S3 or HDFS do not have those.

  • @chetan_bommu 1 year ago

    Thanks for the great explanation. I have a basic question. What is backfill and its job here?
    Is it about parsing each tweet and doing analysis?

    • @uditchaudhary9117 1 year ago

      Backfilling updates the index with the latest data crawled from various sources.

  • @dharins1636 1 year ago

    Hey Arpit, I am confused. Initially you said every team had its own cluster. Is the proxy a common service for all the clients of the different clusters, or will each cluster have its own proxy service?

    • @AsliEngineering 1 year ago +1

      A hybrid setup is a possibility.
      Some services may have an isolated proxy, while a few may share one.

  • @kpicsoffice4246 1 year ago

    Great video dude.
    I wanted to ask you about your recording setup. Are you using OBS and screen mirroring your iPad or something? Please mention any hardware/software you need for these videos.

    • @AsliEngineering 1 year ago +1

      OBS plus iPad. Nothing more.

    • @kpicsoffice4246 1 year ago

      @AsliEngineering I see. So is it the iPad that you screen mirror, plus OBS on a MacBook? And is the app Notability? Btw your handwriting is awesome!!

  • @gomathivigneshmurugan5409 1 year ago

    Hi Arpit, thanks for your videos. Sorry if my question is stupid. I have seen this video and your BookMyShow video as well, and in both, scaling always happens during write operations only. What about a huge amount of read traffic on a particular API? How is API stability ensured? Kindly reply, please...

  • @baibhabmondal1740 1 year ago

    Any HighLevel folks watching this,
    It would be very similar to our eventing (& mongo-indexing) service, and the backfill is basically our snapshot service.

  • @user-ot3ro8zc6x 1 year ago

    Since the write happens asynchronously, that particular tweet wouldn't show up among the user's tweets immediately, right? So how will the user see their tweet immediately?

    • @AsliEngineering 1 year ago

      How likely is the user going to search his/her own tweet immediately after posting it?

    • @user-ot3ro8zc6x 1 year ago

      @AsliEngineering How would you handle such a use case, if there is one?

    • @AsliEngineering 1 year ago +1

      @user-ot3ro8zc6x Search systems are never designed to be strongly consistent.
      But if you want strong consistency, then your API will have to write synchronously to the DB and to the search engine. That is massive overkill, tbh.

    • @user-ot3ro8zc6x 1 year ago

      @AsliEngineering Yeah, got it.
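
The trade-off discussed in this thread (deferred writes, synchronous reads, eventual consistency of search) can be sketched with an in-memory stand-in. Everything here is an assumption for illustration: `SearchProxy` is a hypothetical class, a `queue.Queue` stands in for the Kafka topic, and a plain dict stands in for the Elasticsearch index.

```python
import queue


class SearchProxy:
    """Toy proxy: writes are queued, reads hit the index directly."""

    def __init__(self):
        self.write_queue = queue.Queue()  # stand-in for a Kafka topic
        self.index = {}                   # stand-in for an Elasticsearch index

    def write(self, doc_id, text):
        # Deferred write: acknowledge as soon as the event is queued.
        self.write_queue.put((doc_id, text))
        return "accepted"

    def search(self, term):
        # Synchronous read: only sees documents already indexed.
        return sorted(d for d, text in self.index.items() if term in text)

    def drain(self):
        # Worker step: apply every queued write to the index.
        applied = 0
        while True:
            try:
                doc_id, text = self.write_queue.get_nowait()
            except queue.Empty:
                break
            self.index[doc_id] = text
            applied += 1
        return applied


proxy = SearchProxy()
proxy.write("t1", "hello world")
before = proxy.search("hello")   # the tweet is not searchable yet
proxy.drain()
after = proxy.search("hello")    # searchable once the worker has run
```

The window between `write` and `drain` is the period in which a user would not find a freshly posted tweet through search.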

  • @atulyadav21j 1 year ago

    Thanks Arpit for making this video!
    I had some curious follow-up questions:
    - What happens when there is too much data on Kafka during backpressure while indexing?
    - Can MapReduce create a file that Elasticsearch understands, which can be used for bulk insertion? In the current architecture the workers will again be making 1:1 calls.
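
On the bulk-insertion question: Elasticsearch's `_bulk` endpoint accepts newline-delimited JSON, where each document is preceded by an action line and the body ends with a newline. A minimal sketch of building such a body is below. The helper name `to_bulk_body`, the index name `tweets`, and the sample document are assumptions, and the video does not say whether Twitter's workers batch their writes this way.

```python
import json


def to_bulk_body(index_name, docs):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    `docs` is an iterable of (doc_id, document_dict) pairs.
    """
    lines = []
    for doc_id, doc in docs:
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body("tweets", [("1", {"text": "hi"})])
```

The resulting string would be sent in a single `POST /_bulk` request with `Content-Type: application/x-ndjson`, replacing many 1:1 indexing calls with one request per batch.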

  • @tesla1772 1 year ago

    Why doesn't the API server write directly to Kafka instead of going through the proxy?

    • @AsliEngineering 1 year ago +3

      Because it was a system rewrite and they did not want to change any upstream.

    • @baibhabmondal1740 1 year ago +2

      To add to Arpit's answer: I would assume the proxy would still have authority over the rate of requests, plus some kind of auth. In case of a strange burst, we could avoid pushing a lot of unwanted data to Kafka.
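
One common way a proxy can cap the request rate before anything reaches Kafka is a token bucket. The sketch below is an assumption for illustration: the commenter only guesses that the proxy does rate limiting, and the class name, parameters, and injected clock (`now`, in seconds) are all hypothetical.

```python
class TokenBucket:
    """Allow up to `capacity` burst requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now):
        # Refill according to the time elapsed since the last call.
        elapsed = max(now - self.last, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # forward the request (e.g. publish to Kafka)
        return False      # reject or delay the request


bucket = TokenBucket(rate_per_sec=1.0, capacity=2)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

Passing the clock in explicitly keeps the sketch deterministic; a real proxy would read a monotonic clock instead.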

  • @AadiManchekar 1 year ago

    What if Kafka gets too many messages? Will it drop some messages?

    • @AsliEngineering 1 year ago +1

      Back pressure.

    • @dharins1636 1 year ago +2

      No, the beauty of Kafka is its append-only log: it will add the message and you just have to consume it. You can then configure topics to delete "older" data based on configuration (bytes, time, or both). Of course there are compacted topics, but that's another way of "reducing" the data space (and it has its own problems :) ).

    • @AadiManchekar 1 year ago

      @dharins1636 Thank you.
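
The retention behaviour described in this thread corresponds to Kafka's topic-level `retention.ms` and `retention.bytes` settings. A toy model of an append-only log with both limits is sketched below. `RetainedLog` is a hypothetical class, and it drops individual records for simplicity, whereas Kafka removes whole log segments.

```python
from collections import deque


class RetainedLog:
    """Append-only log that trims its oldest records by age and total size."""

    def __init__(self, retention_ms=None, retention_bytes=None):
        self.retention_ms = retention_ms
        self.retention_bytes = retention_bytes
        self.records = deque()  # (timestamp_ms, payload) in append order
        self.size = 0           # total payload bytes currently retained

    def append(self, ts_ms, payload):
        # Appends always succeed; nothing is dropped at write time.
        self.records.append((ts_ms, payload))
        self.size += len(payload)

    def enforce(self, now_ms):
        # Drop from the head while the oldest record violates either limit.
        dropped = 0
        while self.records:
            ts, payload = self.records[0]
            too_old = (self.retention_ms is not None
                       and now_ms - ts > self.retention_ms)
            too_big = (self.retention_bytes is not None
                       and self.size > self.retention_bytes)
            if not (too_old or too_big):
                break
            self.records.popleft()
            self.size -= len(payload)
            dropped += 1
        return dropped


log = RetainedLog(retention_ms=1000, retention_bytes=10)
log.append(0, b"aaaa")
log.append(500, b"bbbb")
log.append(900, b"cccc")
dropped_by_size = log.enforce(now_ms=950)
dropped_by_age = log.enforce(now_ms=1600)
```

Note that data is removed by retention policy, not because producers sent too much: a slow consumer only loses messages if it falls behind far enough for retention to delete them first.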

  • @rohit_starker34 1 year ago

    Hi sir, I'm a first-year student. Should I buy your system design course?

    • @AsliEngineering 1 year ago +2

      Not at all. Meant for more than 2 years of work experience.

  • @sharemomentsindian 1 year ago

    Are the worker nodes Spark jobs that stream from Kafka and write to Elasticsearch at a particular window or interval? @arpit @asliengineering

    • @AsliEngineering 1 year ago

      Could be. Implementation can be anything. Raw consumers, or Spark jobs.