System Design: Why is Kafka fast?

  • Published 29 Sep 2024
  • Weekly system design newsletter: bit.ly/3tfAlYD
    Check out our bestselling System Design Interview books:
    Volume 1: amzn.to/3Ou7gkd
    Volume 2: amzn.to/3HqGozy
    Other things we made:
    Digital version of System Design Interview books: bit.ly/3mlDSk9
    Twitter: bit.ly/3HqEz5G
    LinkedIn: bit.ly/39h22JK
    ABOUT US:
    Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.

COMMENTS • 470

  • @ByteByteGo
    @ByteByteGo  2 years ago +250

    Subscribe and Kafka will say thank you :)

    • @tubenzr
      @tubenzr 2 years ago

      ok, it's done Sir

    • @DrRishabhGarg
      @DrRishabhGarg 2 years ago +4

      What software do you use to create these awesome motion graphics?

    • @rpidugu99
      @rpidugu99 1 year ago +4

      May I know what tool you guys use to make these animated videos? Just curious..!!

    • @ropro9817
      @ropro9817 1 year ago +1

      I just discovered this video in my feed. _Sometimes_ the YouTube algorithm actually works! 🤠 Great video! I just subscribed to your channel!

    • @colossus95
      @colossus95 1 year ago

      I wish you were my professor in college.

  • @jay6645
    @jay6645 2 years ago +1880

    The absence of any background music makes this video great.

  • @ervamate
    @ervamate 2 years ago +18

    Mentioned a lot in the comments, but I have to say as well: what a great explanation, straight to the point, no bs and gives enough info without overwhelming with details. Thank you!

  • @Youvko
    @Youvko 1 year ago

    Wow, this one is super cool. No background music, cool minimalistic diagrams, calm voice!

  • @dishantchauhan4775
    @dishantchauhan4775 2 years ago +6

    Seriously, thanks a lot Alex for all the stuff you convey through your LinkedIn network and YouTube videos. Just love the way you distil the topics and make them beautifully easy to understand.

  • @jigneshnakhva1546
    @jigneshnakhva1546 2 years ago +1

    I love all the System-design Content posted by you!
    Thanks for sharing your knowledge! 🙏

  • @MrRunchSlam
    @MrRunchSlam 2 years ago +8

    You guys are doing amazing work here. I love the aesthetics, pace, explanations, topics, and cadence of it all. Kudos!

  • @TheAceEditor
    @TheAceEditor 1 year ago

    Essential collection of videos in this channel for a software developer

  • @przemekkobel4874
    @przemekkobel4874 2 years ago +1

    Wow. Never heard about Kafka, but after this brilliant video now I know why it is so fast. Still no idea what it is, though. And so many totally not astroturfed comments. Sweet.

  • @sherhy3689
    @sherhy3689 2 years ago +1

    i wanted to comment that i appreciate the level of detail in the explanations in the video.
    looking forward to more useful content!

  • @NuncNuncNuncNunc
    @NuncNuncNuncNunc 2 years ago +1

    Very clear explanation. Thank You!

  • @dowlathbashag65
    @dowlathbashag65 2 years ago +1

    Awesome explanation of Kafka, amazing... Thank you, Alex

  • @prathibavijayasekaran4173
    @prathibavijayasekaran4173 1 year ago

    Very simple with good animation to explain things clearly. Keep publishing these kinds of useful videos.

  • @ChandraShekhar-by3cd
    @ChandraShekhar-by3cd 2 years ago +3

    Loved the animation and explanation. Keep enlightening us all!

  • @dansokolsky3963
    @dansokolsky3963 2 years ago +2

    We need so much more of this.

  • @nicklaspillay7923
    @nicklaspillay7923 2 years ago +1

    This is an amazing video.
    Actually putting it out there - I LIKED AND SUBBED!
    Well deserved for great content 💯

  • @robertredziak6461
    @robertredziak6461 7 months ago

    Great video, only one more thing could be mentioned: sendfile doesn't work when TLS is terminated on Kafka.

  • @fokerfakerfuker
    @fokerfakerfuker 2 years ago

    wow the comments are right. simple and clear... subscribed

  • @gopalsv5230
    @gopalsv5230 1 year ago

    Nice intro about Kafka, learned quickly, now you got a new subscriber 👍

  • @Xerxes17
    @Xerxes17 2 years ago

    Me, who has zero interaction with Kafka:
    "Hmm yes, very interesting, liked and subscribed."

  • @svworld01
    @svworld01 2 years ago +1

    very nicely explained 😊

  • @santoshbhat7847
    @santoshbhat7847 8 months ago +2

    How are such animations made?

  • @AungBaw
    @AungBaw 2 years ago

    Short & sweet. Thank you.

  • @san4net
    @san4net 1 year ago

    nice simple explanation

  • @rbelatamas
    @rbelatamas 1 year ago

    beautiful footage ❤

  • @VenuKoka
    @VenuKoka 1 year ago

    Very elegantly done. I wonder what animation tool they are using ??

  • @luizadolphs6084
    @luizadolphs6084 1 year ago +1

    Awesome video!!!!!! How are those animations made? In After Effects??

  • @darkswordsmith
    @darkswordsmith 2 years ago

    I was hoping for an explanation of how reads are efficient when Kafka only performs sequential writes... without actually knowing Kafka, I'd guess sequential page tables?

  • @mujtabanadeem7116
    @mujtabanadeem7116 2 years ago

    I feel like I'm getting some shaolin training just listening to this man

  • @andrewcenteno3462
    @andrewcenteno3462 2 months ago

    Wow this was incredible

  • @lytung1532
    @lytung1532 2 years ago +2

    The tutorial is useful. Thanks for sharing. Could you explain more about how Kafka enforces sequentiality on the disk? Do we need specialized disks or dedicated settings? As far as I know, a file can be stored in fragments on the disk.

  • @睡不醒的小麦
    @睡不醒的小麦 1 year ago

    Useful information.

  • @mohammedshabaaz9625
    @mohammedshabaaz9625 2 years ago

    Can you bring more about kafka?

  • @thecloudterminal
    @thecloudterminal 1 year ago

    Thank you for great explanation and putting so much effort

  • @gouravbatra3656
    @gouravbatra3656 10 months ago +1

    @ByteByteGo what exactly is the use of the socket buffer? Are there any consequences if we don't copy the data from disk to the application buffer and socket buffer?

  • @suvankardas7932
    @suvankardas7932 2 years ago

    It's just a phenomenal explanation!!! Is there any way to get this in a book, or do you have a course available through an online training organization??? Waiting for your quick response.

  • @NhanLe-tl7vz
    @NhanLe-tl7vz 1 year ago

    Thank you so much

  • @danalex2991
    @danalex2991 2 years ago

    Amazing video !

  • @AndyThomasStaff
    @AndyThomasStaff 2 years ago +1

    I don't understand the hard disk section. Is part of kafka's selling point that you can use it on HDDs?

    • @maximKvyatkovskij
      @maximKvyatkovskij 2 years ago

      Yes. Kafka came to life in 2011 and SSDs were just getting started then. But even now SSDs did not replace HDDs in clouds and data centers. You can see that new, larger HDDs are still being introduced to the market. If you use HDD then with Kafka you can technically never remove your messages since storage is cheap and performance doesn't take a hit. If you keep your messages then you can replay them from any moment which has its benefits.

  • @sangamsahai9823
    @sangamsahai9823 1 year ago

    Love your videos. Can you please share what software do you use to make the graphics ? Thanks

  • @abhiramrustagi8456
    @abhiramrustagi8456 2 years ago

    Ahh, great work!

  • @gkgk8508
    @gkgk8508 1 year ago

    May I ask are you the author of the book of system design?

  • @machiii7394
    @machiii7394 2 years ago

    I'm bored, it's 4 AM, and I was waiting for a tldr of what Kafka is, ngl.

  • @vancouverbill
    @vancouverbill 2 years ago

    Excellent video. Are there any situations when Kafka cannot use zero copy and has to resort to read flow without zero copy? Thanks

    • @jasonraynar6955
      @jasonraynar6955 1 year ago

      When doing SSL-based connections to brokers, so any real-world situation where you store any form of PII and don't want it potentially "seen" in transit. But with cheap servers you generally don't care; it is extremely fast anyway, because reads and writes are sequential and most consumers sit at the head of the partitions, which the OS caches in memory, so the backing topic never gets read from disk. (The interesting optimisations are the time index and record offset files, which allow pseudo-random access / positioning in partition files.)
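
The offset index mentioned in this reply can be pictured with a small sketch. It assumes a sparse index held as a sorted list of (record offset, byte position) pairs; the function name, the sample entries, and the layout are hypothetical and are not Kafka's on-disk format. A binary search finds the closest indexed entry at or before the wanted offset, and a reader would then scan forward sequentially from that byte position.

```python
import bisect


def find_start_position(index, target_offset):
    """Return the byte position to start scanning from for target_offset.

    index is a sorted list of (record_offset, byte_position) pairs that
    covers only some records (a sparse index). All names are illustrative.
    """
    offsets = [offset for offset, _ in index]
    # Rightmost indexed entry whose offset is <= target_offset.
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes every entry: start at the segment start
    return index[i][1]


# Hypothetical sparse index: one entry every 100 records.
sparse_index = [(0, 0), (100, 4096), (200, 8192)]
start = find_start_position(sparse_index, 150)
```

Here a lookup for offset 150 lands on the entry for offset 100, so only the records from 100 onwards need to be scanned rather than the whole file.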

  • @linleo816
    @linleo816 2 years ago

    awesome!! 😃

  • @insaikishoreseelamsetty3066

    Exemplary Illustration on Kafka

  • @wusluf
    @wusluf 2 years ago

    Where did you get the numbers for writes per second from?

  • @오5-p4j
    @오5-p4j 2 years ago

    awesome

  • @blikenoother
    @blikenoother 2 years ago

    can we compare these 2 concepts with traditional queue RabbitMQ/AWS SQS?

  • @andrewstephens5971
    @andrewstephens5971 2 years ago

    If your systems are appending records to the end of lists constantly without reordering them, won't it be problematic for the client application?

  • @qm3ster
    @qm3ster 2 years ago

    Wait, but how does encryption work with zero-copy DMA?!

  • @Universal.x
    @Universal.x 2 years ago

    Is there difference between the 2 volumes of the book?

  • @dronihack
    @dronihack 2 years ago +7

    me not having a clue what kafka is about

  • @catcoder12
    @catcoder12 7 months ago +9

    Simple video without being a pretentious tech bro.

  • @kurtmueller2089
    @kurtmueller2089 2 years ago +535

    What an amazing tutorial: Just the necessities, no annoying background music, no annoying calls to "subscribe and like".
    If all youtube channels were like that, we could heal the world.
    Also, I checked your channel page and was shocked to find that this was only your 3rd video.
    Keep being awesome!

    • @martinmusli3044
      @martinmusli3044 2 years ago +12

      This tutorial is insanely "Zen" but he said "please subscribe" right at the end :P

  • @nemeziz_prime
    @nemeziz_prime 2 years ago +244

    These videos are amazingly simple and clear. The animations are spot on!! Too good xD I wish this channel never stops uploading new content

  • @ridealongreactions2601
    @ridealongreactions2601 1 year ago +53

    I 100% believe you should make a whole series on Kafka, your way of simplifying the subject is legendary.

  • @julianosanm
    @julianosanm 1 year ago +1

    The index to the shards on the Businesses DB should be user_id or business_id? I think the latter makes more sense not sure if it was a typo on the video

  • @Dezdichado1000
    @Dezdichado1000 1 month ago +1

    The bigger the diameter of the pipe, the larger the amount of liquid that can go through it. Put some nsfw tag on this thank you.

  • @jell_pl
    @jell_pl 2 years ago +2

    wtf? that whole video is about distinction between sequential and random access on... magnetic based storages, which are now used as a last step/level in caching sequence... (quite like magnetic tape storage back in the days...)
    this "Advertisement" for kafka is speaking about hdd vs ssd (distinction which was valid few years ago) ignoring that there are things like nvme which are in the middle or even closer in cost to ssd, being only ~10x slower than ddr kind of memories (while ssds are ~100x slower than ddrs)
    nicely done video, but srsly strongly misleading...

  • @Spiritualleace
    @Spiritualleace 2 years ago +33

    How can one keep things so deep and yet stunningly simple. Hats off!

  • @volodymyrliashenko1024
    @volodymyrliashenko1024 2 years ago +1

    Great video!
    BTW, how to create such kind of animations?

  • @nishantparmar
    @nishantparmar 2 years ago +42

    Short, high quality, clean and extremely precise content...Many Thanks!

  • @sultown4343
    @sultown4343 1 year ago +1

    is there a risk of Kafka accessing other areas of the memory cache in which system calls could send wrong/private data?

  • @brunowu1356
    @brunowu1356 2 years ago +2

    Why is it called zero copy when it clearly shows that there is one copy made to the NIC buffer?

    • @ByteByteGo
      @ByteByteGo  2 years ago +6

      This is a great question. Let's use Wikipedia's definition of zero copy, which says:
      "Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another or in which unnecessary data copies are avoided.
      The "spirit" of zero-copy is to eliminate unnecessary data copies. In Kafka's case, sendfile() does reduce the number of copies. In the best case, if a NIC which supports scatter/gather DMA is used, the copy does not involve the CPU, making it truly zero-copy.
      There is quite a bit of nuance when it comes to zero-copy. Maybe we can make a video about it down the road.
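
A minimal sketch of the two paths discussed in this reply, written in Python rather than Kafka's Java. It assumes a Unix-like system: Python's `socket.sendfile()` uses `os.sendfile()` where the platform provides it and otherwise falls back to ordinary `send()` calls. The function names, the payload, and the local socket pair are illustrative only and are not Kafka code.

```python
import os
import socket
import tempfile


def send_with_copies(path, sock, chunk_size=64 * 1024):
    """Traditional path: read() into a user-space buffer, then send() it."""
    sent = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            sock.sendall(chunk)  # data crosses into user space and back
            sent += len(chunk)
    return sent


def send_zero_copy(path, sock):
    """sendfile path: ask the kernel to move file data to the socket."""
    with open(path, "rb") as f:
        return sock.sendfile(f)  # returns the number of bytes sent


def demo():
    """Send the same small file both ways over a local socket pair."""
    payload = b"kafka-record\n" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        results = {}
        for name, sender in (("copy", send_with_copies),
                             ("sendfile", send_zero_copy)):
            left, right = socket.socketpair()
            with left, right:
                sent = sender(path, left)
                left.shutdown(socket.SHUT_WR)  # signal end of stream
                received = b""
                while data := right.recv(4096):
                    received += data
            results[name] = (sent, received == payload)
        return results
    finally:
        os.unlink(path)


results = demo()
```

The receiver sees identical bytes in both cases; the difference is only in how many times the sender's data is copied between kernel and user space on the way out.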

    • @brunowu1356
      @brunowu1356 2 years ago

      Thanks for the explanation Alex

  • @mohimen1617
    @mohimen1617 2 years ago +2

    amazing video i love the animation so much , what application do you use to make such animation?

  • @amigochan
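    @amigochan 2 years ago +10

    The video explains two reasons why Apache Kafka can deliver high throughput for large volumes of records:
    1. Sequential I/O
    In C, for example, when fopen() opens a file in append mode, the file pointer is positioned at the end of the file, ready to keep adding new data. This is faster than moving the pointer to a specific position before each write. Thinking of it in terms of sequential versus random reads and writes on a hard disk makes it easier to understand.
    File-based databases such as dBASE, COBOL + ISAM, and Paradox also write new records directly at the end of the file. You can open the file with PC-Tools and inspect the hex code to confirm this. The risk is that if the EOL cannot be written in time and the file is not closed cleanly, the file becomes corrupted and data is lost.
    Deleting a record likewise only marks the record; nothing is actually removed until a compact database operation runs. So when I design systems that must truly delete customers' personal data, I overwrite it with a meaningless string, because a plain delete is really just a mark and the data is still there.
    2. [Zero Copy](en.wikipedia.org/wiki/Zero-copy) avoids copying the same data again between different memory regions, shortening the transfer path. For example, where DMA is available, a system call places data that has already been read into the memory buffer directly into the NIC buffer for transmission, skipping the socket buffer path.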
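
The append-only pattern described in point 1 can be sketched as a toy log. This is Python rather than C, the class and its length-prefixed record layout are hypothetical, and it is not Kafka's storage format; it assumes the log file starts out empty, since the position index lives only in memory.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy log: length-prefixed records, always written at the end of the file."""

    def __init__(self, path):
        self.path = path
        self.positions = []  # in-memory index: record number -> byte position
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, record):
        with open(self.path, "ab") as f:  # append mode: writes go to the end
            position = f.seek(0, os.SEEK_END)
            f.write(struct.pack(">I", len(record)) + record)
        self.positions.append(position)
        return len(self.positions) - 1  # record number of the new record

    def read(self, record_number):
        """Jump straight to one record using the in-memory index."""
        with open(self.path, "rb") as f:
            f.seek(self.positions[record_number])
            (length,) = struct.unpack(">I", f.read(4))
            return f.read(length)

    def scan(self):
        """Read every record front to back, i.e. one sequential pass."""
        with open(self.path, "rb") as f:
            while header := f.read(4):
                (length,) = struct.unpack(">I", header)
                yield f.read(length)


with tempfile.TemporaryDirectory() as directory:
    log = AppendOnlyLog(os.path.join(directory, "topic-0.log"))
    first = log.append(b"order created")
    second = log.append(b"order paid")
    one_record = log.read(second)
    all_records = list(log.scan())
```

Existing bytes are never rewritten, so each write lands right after the previous one, and a consumer that reads from the start touches the file strictly in order.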

  • @kiwi-mf2do
    @kiwi-mf2do 1 year ago +1

    Uhh I thought this was a final fantasy video

  • @learnersparadise7492
    @learnersparadise7492 2 years ago +4

    Hi Alex, Just a suggestion, please make some videos on different consensus algorithm (RAFT, PAXOS).

  • @rasimbot
    @rasimbot 2 years ago +1

    Disk read and network transfer delays are orders of magnitude longer than copying between buffers

  • @143Support
    @143Support 11 months ago +3

    This is not the same Kafka I was expecting, but happy to learn. thanks for sharing!

  • @acuteaura
    @acuteaura 2 years ago +1

    I don't know a single person who has a Kafka scale problem and not enough money for SSDs. And essentially every modern DBMS will use a WAL and indexes in sequential space too. And the speed comparison is hard extremes in the range of grabbing single bytes off a 4k read. You add more than one consumer (or, dare I say it, write at the same time that you read from a partition) and your perfect scenario where you never seek goes poof. Please put Kafka on SSDs.
    Asking why Kafka is fast is really the wrong question. Under those two conditions, SQLite is fast. It even skips the network! Why don't you ask why Kafka can scale horizontally without much effort, for instance?
    And please, be measured in your chosen technology. Kafka is big iron and not likely to be useful in a lot of architectures. Yes, you can use it as a message broker for your queue, but that's really not a good use. Add one partition and now your FIFO guarantee is out the window. Kafka is for sharing data and continuous updates on the same with an unknown, but usually large quantity of other services. And usually in an enterprise.
    Oh and also, please don't tell people the latency hierarchy is a lie. Access patterns improve latency, but memory will always be faster than any disk.

  • @yjhsjtu
    @yjhsjtu 1 year ago +1

    Each Kafka broker could have data from multiple topic partitions. Reading data from the same Kafka broker still needs to switch the disk head. Will that hurt the benefit of sequential IO?

  • @StephenMoreira
    @StephenMoreira 2 years ago +1

    Never heard of Kafka. Thank you YouTube algorithm.

    • @JuliaT522
      @JuliaT522 5 months ago

      What are you gonna do with this knowledge now?

  • @AgrimGupta
    @AgrimGupta 6 months ago +1

    @ByteByteGo - How do you create those animations? Which software, what process?

  • @SaitamaTheLegend
    @SaitamaTheLegend 2 years ago +8

    In 5 minutes I learned a lot! Amazing video!
    You are a good teacher!
    Thank you and I hope to see more videos from you!

  • @cryptomania3553
    @cryptomania3553 2 years ago +1

    3 videos in 14 days and 48k subs ,nice

  • @simon199418
    @simon199418 1 year ago +1

    Cool stuff, the concepts described here are known by embedded system designers who suffered enough.

  • @joshuatsui
    @joshuatsui 2 years ago +2

    So the question is, why before it had to first copy into socket buffer, and not anymore with Kafka?

  • @GughaGSrinivasan
    @GughaGSrinivasan 2 years ago +5

    ASMR experience :)
    i have subscribed...
    Neat explanations...
    I am not curious about Kafka, but curious about the optimization techniques and strategies they have accomplished which I would like to learn...
    Please do more!

  • @cipherxen2
    @cipherxen2 2 years ago +1

    This video explained nothing. Just a bunch of buzzwords.

  • @BryanChance
    @BryanChance 2 years ago +1

    When using DMA sendfile(), is it transparent to the receiver? I mean does the receiving side have to do anything special or just receive the file through the normal socket as usual?

  • @DevNarayan
    @DevNarayan 2 years ago +1

    Amazing details about frequently used software. Lucky to bump into this page. Thanks

  • @paul8683
    @paul8683 1 year ago +1

    So this video was in my recommends so I clicked it. 1/2 way done and I still don't know what kafka is. Is it a database? Is it a SAN? Is it a program that runs in an os to transfer files? Wtf is kafka! So I got to the end and I still have no clue what kafka is.

    • @juliap.5375
      @juliap.5375 1 year ago

      Same! 😂 Clicked on recommendation, still idk what Kafka is.
      Moreover, I studied such optimizations in college back in the 2000s; for example, anyone who opened a file without memory-mapping got a low grade. WTF, why are they explaining basics that any ordinary developer uses?

  • @StephenGillie
    @StephenGillie 2 years ago +1

    This helps to explain why the sequential read speed of HDDs is on the AWS Cloud Solutions Architect study guides.

  • @shashankhegde4007
    @shashankhegde4007 2 years ago +1

    Nice work!, I would like to learn the content of your book in the video format.

  • @Athmarr
    @Athmarr 2 years ago +5

    I have used kafka before but never had to think about why it is actually fast. This was very informative. I like the format of the video as well

  • @northmania5332
    @northmania5332 2 years ago +1

    What software are you using to make these awesome presentations?

  • @TricoliciSerghei
    @TricoliciSerghei 2 years ago +1

    Very informative video, thank you so much!!

  • @_sudipidus_
    @_sudipidus_ 2 years ago +1

    I had only thought about sequential IO because of the append only log file.. this was insightful
    are there any security implications of zeroCopy? Since the data is directly being written within the kernel space how does it honor the boundary?

    • @acuteaura
      @acuteaura 2 years ago +1

      Yes, you can’t use TLS. Transparent memory encryption (e.g AMD SME) may also prevent zero copy, and you probably want that enabled on a server.

  • @toukaK
    @toukaK 2 years ago +6

    excited to see Sahn on youtube!
    this is by far the best tech video I've watched. concise without losing any depth! looking forward to more videos like this.
    I've had the fortune to (indirectly) work with Sahn and review his code. one of the few top talents that any company is lucky to have. this video is as high quality as other production of his.
    2 questions for Sahn:
    1. there's a small disconnection between "sequential IO throughput vs random IO throughput" and "HDD vs SSD". is there any perf number difference on sequential IO throughput on HDD vs SSD?
    2. is there any perf number difference(ops per sec or latency) for zero-copy vs traditional buffer copies?

  •  2 years ago +13

    Great technical explanation. I just want to add that Kafka can be used for much more than just data ingestion sending data from a data source to a data sink. The Apache Kafka open source project also includes Kafka Connect for data integration and Kafka Streams for data processing. Therefore, you can leverage the characteristics explained in this video to build a modern data flow with a single (scalable and reliable) real-time infrastructure instead of combining several different components (like Apache Kafka for ingestion, Apache Camel for data integration, and another stream processing framework like Apache Flink for real-time analytics).

    • @EverydayRoadster
      @EverydayRoadster 1 year ago +2

      Reliability of Kafka has yet to be proven. Every so often it does not meet data integration core requirements on reliability, especially in the area of disruption and recovery, where it quickly says goodbye to “At-most-once” semantics. Don’t get me wrong, Kafka is really great for what it is designed for: efficient streaming in BigData architecture, but that architecture will tolerate a certain fuzziness of data, which pure data integration architecture would not allow for.

  • @tubenzr
    @tubenzr 2 years ago +1

    your video is very clear and on-point Sir, thanks a lot 👍👍

  • @lifessummerleaves
    @lifessummerleaves 2 years ago +2

    Very deep insight! Looking forward to your next videos, please keep going

  • @sidforreal
    @sidforreal 2 years ago +2

    Hey Alex, love your content.
    Still waiting for SD Volume 2 in Amazon India 🥲

  • @javadoctor101
    @javadoctor101 2 years ago +1

    Great video! May I know what do you use for those animations? These animations are basic yet effective!

  • @betims
    @betims 2 years ago +1

    Amazing explanation. Thank you sir.

  • @sakthikumar4721
    @sakthikumar4721 2 years ago +2

    I really appreciate your work. Excellent video. Superbly Articulated. Easy to grab the concepts. Great work. 😍

  • @aryanrahman3212
    @aryanrahman3212 2 years ago +4

    Really great presentation! I was scared when I saw Kafka but you explained it really well.

  • @siruitao
    @siruitao 2 years ago +1

    Thanks for the useful instruction!