Fundamentally really good. I find it loses a bit of practicality when you reach the 'local copy of state' component of the architecture. Sure, if the local copy of state can be generated from day one by an immutable Kafka stream, then yes, that component of the architecture is sound. Agree 100%. But in today's world, in most use cases you will want to drop something like Kafka into the middle of systems that do not have the luxury of an immutable stream from day one. Sure, you could 'build' it somehow, maybe by snapshotting an initial state and streaming events from there. But even thinking about that, given the legacy systems you have to interact with... it loses its sexiness right about this point for me. That's not to say I don't like Kafka. What you're providing under the hood is tremendous. I really, really like it, and better yet, I intend to adopt it. I just think adopting this technology realistically, in a currently running real-world shop, demands looking directly at all of the various flavors of existing services that would tap into the Kafka architecture and improving/enhancing the available patterns there. Thanks again. I really love what you guys are doing.
Thanks for such a great talk Tim, really helpful!
Just 1 min in and I can tell I'm going to enjoy this. :) ps: great voice for voiceovers
Finally a great talk! Good Job.
I used a product called Apache Ignite as a distributed set and map, with the possibility of doing distributed compute operations as well. I am intrigued that Kafka has some custom code capabilities and that a DB is optional.
Great presentation!
I want that presentation pointer!
good luck debugging this in production
Capturing events from an Order system via CDC is a very simplistic proposition. Good luck doing it in an SAP order system, for example. There are so many tables you would have to deal with to compile a useful event. CDC cannot do this.
He looks like Gavin Belson :)
Key point: What are you if you're editing a log? :-)
Typically you "edit" the log by creating a new event that supersedes the previous event. For most use cases this is fine. When you build the view/projection newer events alter the state of older events aka event sourcing. There are a couple of places where this isn't enough. One is if the log is actually corrupt, for example because you had a bug in production that led to corrupt data being saved. Another is where you need to physically delete data, for example to comply with GDPR. Compacted topics let you delete data (they include a delete command) but it requires some extra thought (www.confluent.io/blog/handling-gdpr-log-forget/).
watch on 1.5x speed, you're welcome
Awesome! Thanks a bunch!
Thanks, very informative.
How does persistence work when I'd like to create a read model from events? Let's assume we have a topic of events which can be computed into an aggregate (event sourcing). From the video I understand that I could create a stream, group events by ID, merge them into an aggregate, and then forward that stream into a KTable for persistence. How can I read such aggregates through REST? Isn't this problematic when Kafka Streams is stateful? I'm assuming there are multiple replicas of the same service handling the streams.
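A minimal Kafka Streams sketch of the approach described in the question, assuming a topic named "order-events" keyed by order id with string values (all names are hypothetical): events are grouped by key and folded into an aggregate materialized as a named state store, which is the kind of store a REST layer can expose via interactive queries.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ReadModelSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-read-model");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Group events by order id and fold them into one aggregate per key.
        // The resulting KTable is backed by a local store named "order-aggregates".
        builder.<String, String>stream("order-events")
               .groupByKey()
               .aggregate(
                   () -> "",                                    // initial aggregate
                   (orderId, event, agg) -> agg + "|" + event,  // fold the next event in
                   Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("order-aggregates"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // The named store can be queried via KafkaStreams#store from a REST layer.
        // With multiple replicas, each instance holds only a subset of keys, so the
        // REST layer typically uses the streams metadata to route a request to the
        // instance that owns the requested key.
    }
}
```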
Man, that voice :O
When taking spoken audience questions like this, repeat the question for the rest of the audience. When it's in writing (like Quora), that's not necessary.
30:19 what's a partition?
I am new to the Kafka world and my question may sound naive.
I am confused about how a consumer group consumes messages from partitions. I have seen many talks on YouTube but still haven't clearly understood the theory.
I understand that Producer1 writes to Topic1-Partition1 and Producer2 writes to Topic1-Partition2. There are 2 consumers (C1 and C2) and each consumer can read from one partition only (is my understanding correct?).
Can someone direct me to an article where I can get more details about how a consumer/consumer group consumes messages?
A consumer group defines the set of consumers that, taken together, will read the whole topic. Here is a small list of facts that may help:
- A single consumer can read from any number of partitions. In fact, if you have a consumer group with only one consumer and the topic has N partitions, that single consumer will read from all N partitions.
- Each partition of a topic can be read by only one consumer within a consumer group. That means you can't have 2 different consumers reading from the same partition if those two consumers are part of the same consumer group.
- Two consumers that are part of two different consumer groups can read from the same topic and the same partition, but each of them will have its own offset. That means both will read the same data, and each message will be processed twice: once by each consumer.
- A consumer group will read all the data from one topic only once: each consumer inside the consumer group will read a part of the topic (a subset of the partitions). If you group all the events that are processed by all the consumers inside a consumer group, you get all the messages that were written to a topic.
Some examples that may make it clearer (a small code sketch follows after this thread):
- We have 1 topic with 3 partitions and a consumer group with only 1 consumer. That consumer will read all 3 partitions of the topic. If all consumers wrote every message they processed to an output, we would see exactly 1 message in the output for each message in the input.
- We have 1 topic with 3 partitions and a consumer group with 4 consumers. One consumer in the group will be idle, since each partition can only be read by 1 consumer within a consumer group, so there will be 3 consumers reading one partition each and one consumer doing nothing. Again, we would see exactly 1 message in the output for each message in the input.
- We have 1 topic with 3 partitions and 2 consumer groups, each with 1 consumer. Both consumers will process all 3 partitions. If both consumers wrote their messages to an output, we would see 2 messages in the output for each message in the input.
you are right! :D
@@spanishcrab what if we have 1 topic with 3 partitions and 2 consumers in the consumer group, will one of the consumers read from two partitions?
@@abhimanyuray6827 yes
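To make the mechanics described above concrete, here is a minimal sketch of a consumer in a group, using the plain Kafka Java consumer and assuming a topic named "orders" and a group id "order-processors" (both hypothetical). Running several copies of this process with the same group id makes Kafka split the topic's partitions among them; a copy started with a different group id gets its own offsets and reads everything again.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // All processes sharing this group.id form one consumer group;
        // each partition is assigned to exactly one of them.
        props.put("group.id", "order-processors");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe to the topic; partition assignment and rebalancing
            // across group members is handled automatically.
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```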
I'm assuming the customer service writes to the topic first? Then it consumes its own topic to update its local DB. Otherwise, if you saved to the DB first and then the publish to the topic failed...
The most common way to do it is to write to the topic and then recreate the view. This is CQRS. Another variant is to write "through" the database table first and then turn the database into an event stream with a CDC connector. Both give you the same general result. The "write through" approach lets the application read its own writes immediately. CQRS better suits high-performance use cases.
There is an example of the CQRS approach here: www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql
Write through is discussed here: www.confluent.io/blog/messaging-single-source-truth/
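As a rough illustration of the "write to the topic first, then rebuild the view" flow (the CQRS variant above), here is a sketch using the plain Kafka Java clients; the "customers" topic name and the in-memory map standing in for the local view are both hypothetical.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class WriteTopicThenViewSketch {
    // Stand-in for the service's local read model (would be a real DB table).
    private static final Map<String, String> localView = new HashMap<>();

    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        // 1. Write path: publish the fact to the topic first; the log stays the source of truth.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("customers", "42", "{\"name\":\"Ada\"}"));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "customer-view-builder");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        // 2. Read path: the same service consumes its own topic and (re)builds
        //    the local view from it.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("customers"));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                localView.put(record.key(), record.value());
            }
        }
        System.out.println(localView);
    }
}
```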
Heath Ledger's Joker?? Is it only me?
Giant Hog Weed
Coingeek website good site
28 minutes of ecosystem description 😔
Isn't that terribly inefficient?
Gosh, good content, but his burps and water drinking are so distracting...
Let's see your presentation.