00:28 1. Log analysis
01:18 2. Real-time machine learning pipelines
02:28 3. Real-time system monitoring and alerting
03:39 4. Change Data Capture (CDC)
04:50 5. System Migration
Best youtube channel ever
I never knew Kafka had so many use cases! This video was really informative.
@ByteByteGo Please slow down the speaking speed and the animation speed a bit
You can use 1.5 speed
Or 0.75 and 0.5
Reduce? I always watch it at 1.5x speed
Too fast, I have to stop
Great content, thanks a lot!
Do you have a video on how to choose a data transfer system?
I have a question. When should I use Kafka over the equivalent proprietary cloud services (AWS, Azure, GCP)?
At scale, managing, maintaining, and operating your own Kafka fleet requires expertise and staffing and comes with operational overhead. All of that has a cost. You need to compare it against a cloud-managed Kafka service, which will probably cost more for the service itself, but where the operational cost is probably lower.
When you grow tired of paying for Bezos’ lifestyle lol.
Host your own when you know the answer to this question.
Thank you for doing this!
What’s used to make the animations?
The animations are made with Lotus 1-2-3.
After Effects.
Same question
@tmd4951 True. It's always the same boring, lazy question on each of these videos.
Any plans on doing a head to head on Kafka vs. Pulsar?
I use Pulsar extensively in my job. It's taken the place of traditional queues (think RabbitMQ/ActiveMQ) as well as Kafka. With regards to the queue functionality, it's incredibly nice to be able to replay historical data when you want to do back-testing of a new version of your processor against actual production traffic. For streaming, it's great there as well for moving logs/etc. around.
The fact that it can scale to millions of topics and supports wildcard topic subscriptions (e.g. persistent://tenant/namespace/topic.subtopic.*) means you can create an individual topic per client and subscribe with a wildcard to capture new topics as they are added (there's a sketch of this below). You can also federate between brokers at either the namespace or (optionally) the topic level, so you can pick and choose what you're federating, and you can choose either one-way or two-way federation.
Tiered storage is one of the killer features. I love the fact that I can keep the past 30 minutes of data on the bookie, but have a month or more of data offloaded to S3 for additional review/testing if needed.
I can't say enough good stuff about it.
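In case anyone wants to see what the wildcard subscription mentioned above looks like in code, here's a minimal sketch using the Pulsar Java client. The service URL, tenant/namespace, topic pattern, and subscription name are placeholders I made up, not anything from the video or the comment; tiered storage/offload is configured on the broker side, so it doesn't appear here.

```java
import java.util.regex.Pattern;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class WildcardSubscriber {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL for a local cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Regex subscription within one namespace: any per-client topic created
        // later that matches the pattern is picked up by this same consumer.
        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://tenant/namespace/topic\\.subtopic\\..*"))
                .subscriptionName("per-client-sub")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // ... process msg.getData() ...
            consumer.acknowledge(msg);
        }
    }
}
```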
No Event Sourcing?
"Top Kafka use cases..." != "All Kafka use cases"
How do you make animations like the ones in this video?
Can you do an architecture diagram of LinkedIn?
I don't understand why we put Kafka between the data source and the target. For example, in the CDC case, why don't we stream data directly from source to target? Or in the logging case, why don't we bypass Kafka and stream logs directly to Elasticsearch? Doesn't doing so remove the overhead caused by Kafka?
Well, that's what we do. Kafka is the streaming source, but it has multiple benefits: data retention (so you can replay history) and a smart publisher/subscriber model. You configure the source once, then multiple consumers can use it.
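For what it's worth, here's a minimal sketch of that "configure once, many consumers" idea with the Kafka Java client; the broker address, group id, and topic name ("cdc.orders") are made-up placeholders. Each consumer group tracks its own offsets, so independent consumers read the same topic without interfering, and a new group with auto.offset.reset=earliest replays whatever history is still retained.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Each consumer group gets its own committed offsets, so many
        // independent consumers can read the same topic side by side.
        props.put("group.id", "backtest-v2");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // A brand-new group starts from the earliest retained offset,
        // which is what makes replaying history possible.
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("cdc.orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```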
Kafka also helps to decouple a producer and a consumer. Kafka is fault-tolerant and highly available, but in a lot of cases the consumers may not be. In your case, let's say the consumer is a database like SQL Server. It can be down for a hardware issue, maintenance, or some bad workload making it unresponsive.
With Kafka, your producer need not worry about the consumer and keeps sending data to Kafka, which stores it according to the configured retention policy. When your consumer is up and healthy again, it can start consuming the data.
If Kafka were not there, messages would pile up on the producer's workers, leading to crashes and a bad user experience.
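To illustrate that decoupling, here's a rough producer-side sketch (again with a placeholder broker address and topic name): the producer only waits for the broker's acknowledgement, never for the downstream consumer, so writes keep landing in Kafka even while the database is down and the consumer catches up later within the retention window.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DecoupledProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all waits only for the brokers to replicate the write; the
        // downstream consumer (e.g. the database) is never in this path.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("cdc.orders", "order-42", "{\"status\":\"shipped\"}");
            // Asynchronous send with a callback: success means Kafka has the
            // message durably, regardless of whether any consumer is running.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("stored at partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```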
Kafka let's gooooo
Let's be clear, you're not gonna land a job at a FAANG company by watching a 5-min YouTube video.
Good. Good.
👍🏻
Pt-br